# Wildfires in Brazil

## Data cleaning

Data regarding wildfires, deforestation and precipitation in Brazil was loaded into data frames and cleaned accordingly.

The following data was found:

1. Wildfires data for each state in Brazil per month in the period 1998-2017. Source: PORTAL BRASILEIRO DE DADOS ABERTOS.
2. Wildfires data for Brazil as a whole per month in the period 1998-2017. Source: PORTAL BRASILEIRO DE DADOS ABERTOS.
3. Wildfires data detected through satellite images. Source: Banco de Dados de Queimadas.

In [1]:
# Import the necessary libraries

import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import re
import requests
import json

### 1. Wildfires data for each state in Brazil per month in the period 1998-2017.

In [2]:
# Import the data 

fires_year_state = pd.read_csv("../Data/rf_incendiosflorestais_focoscalor_estados_1998-2017.csv")

In [3]:
# Check for missing values

fires_year_state.isnull().sum()

year      0
state     0
month     0
number    0
date      0
dtype: int64

In [4]:
# Convert "number" column to integer

fires_year_state["number"] = fires_year_state.number.astype('int64')

In [5]:
# Replace the column "month" with month id's
# First check the spelling of the months in Portuguese

fires_year_state["month"].unique()

array(['Janeiro', 'Fevereiro', 'Março', 'Abril', 'Maio', 'Junho', 'Julho',
       'Agosto', 'Setembro', 'Outubro', 'Novembro', 'Dezembro'],
      dtype=object)

In [6]:
# Create a dictionary with the months id's

months = {"Janeiro": 1, 
          "Fevereiro": 2, 
          "Março": 3, 
          "Abril": 4, 
          "Maio": 5, 
          "Junho": 6, 
          "Julho": 7, 
          "Agosto": 8, 
          "Setembro": 9, 
          "Outubro": 10, 
          "Novembro": 11, 
          "Dezembro": 12}

In [7]:
# Replace the column "month" with month id's

fires_year_state = fires_year_state.replace({"month": months})
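
The replacement above leaves any spelling that is not in the dictionary untouched, so a typo in the source would survive as text. A minimal sketch, assuming a small hypothetical frame `sample` in place of the real CSV, uses `Series.map` so that unmatched names become `NaN` and can be listed before any integer conversion:

```python
import pandas as pd

# Hypothetical sample standing in for the real data; "Marco" is a deliberate misspelling
sample = pd.DataFrame({"month": ["Janeiro", "Março", "Marco", "Dezembro"]})

months = {"Janeiro": 1, "Fevereiro": 2, "Março": 3, "Abril": 4,
          "Maio": 5, "Junho": 6, "Julho": 7, "Agosto": 8,
          "Setembro": 9, "Outubro": 10, "Novembro": 11, "Dezembro": 12}

# map() returns NaN for any value missing from the dictionary
month_ids = sample["month"].map(months)

# Collect the spellings that were not recognised
unmapped = sample.loc[month_ids.isna(), "month"].unique().tolist()
print(unmapped)
```

An empty list here would suggest that every month name was recognised.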

In [8]:
# Verify the data for each state

pd.Series(fires_year_state["state"]).value_counts()

Espirito Santo         239
Rio de Janeiro         239
Amazonas               239
Sao Paulo              239
Amapa                  239
Alagoas                239
Paraiba                239
Piauí                  239
Ceara                  239
Tocantins              239
Pará                   239
Minas Gerais           239
Bahia                  239
Pernambuco             239
Paraná                 239
Distrito Federal       239
Rondonia               239
Sergipe                239
Santa Catarina         239
Roraima                239
Mato Grosso do Sul     239
Maranhao               239
Goias                  239
Rio Grande do Norte    239
Acre                   239
Rio Grande do Sul      239
Mato Grosso            239
Name: state, dtype: int64

In [9]:
# All 27 federative units (26 states plus the Federal District) are present, each with the same number of entries (239)
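
Each state has 239 rows, one fewer than the 20 × 12 = 240 that a complete 1998-2017 monthly series would give, so a programmatic check may be more reliable than reading the counts by eye. The sketch below assumes a hypothetical miniature frame `sample` with the same `state`, `year` and `month` columns:

```python
import pandas as pd

# Hypothetical miniature of the state table: two states, three records each
sample = pd.DataFrame({
    "state": ["Acre"] * 3 + ["Bahia"] * 3,
    "year": [1998, 1998, 1999] * 2,
    "month": [1, 2, 1] * 2,
})

counts = sample["state"].value_counts()

# True when every state has the same number of rows
all_equal = counts.nunique() == 1

# True when no (state, year, month) combination is repeated
no_duplicates = not sample.duplicated(["state", "year", "month"]).any()
print(len(counts), all_equal, no_duplicates)
```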

In [10]:
fires_year_state.dtypes

year       int64
state     object
month      int64
number     int64
date      object
dtype: object

In [11]:
# Save the data frame to a csv file

fires_year_state.to_csv("../Data/clean_data/fires_year_state.csv", index=False)

### 2. Wildfires data for Brazil as a whole per month in the period 1998-2017.

In [12]:
# Import the data

fires_month = pd.read_csv("../Data/rf_incendiosflorestais_focoscalor_brasil_1998-2017.csv")

In [13]:
# Separate the data into columns
# First split the strings

fires_month = fires_month["Ano;Mês;Número;Período"].str.split(";", expand=True)

In [14]:
# Create new column names and then drop the previous columns

fires_month["year"] = fires_month[0]
fires_month["month"] = fires_month[1]
fires_month["number"] = fires_month[2]
fires_month["period"] = fires_month[3]

fires_month.drop(columns=[0, 1, 2, 3], inplace=True)
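
The split-and-rename steps above could probably be collapsed into the load itself, since the file appears to be semicolon-delimited. This is a sketch under that assumption; the `raw` string is a made-up excerpt with placeholder values, not the real file:

```python
import io
import pandas as pd

# Hypothetical excerpt mimicking the semicolon-delimited layout (placeholder values)
raw = "Ano;Mês;Número;Período\n1998;Janeiro;10;1998-01-01\n1998;Fevereiro;20;1998-01-01\n"

# sep=";" splits the fields on load; header=0 with names= replaces the Portuguese header row
fires_sample = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    header=0,
    names=["year", "month", "number", "period"],
)
print(fires_sample.columns.tolist())
```

With the real file, the summary rows would still need to be filtered out before the columns can be converted to numeric types.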

In [15]:
# Eliminate the rows that do not fit

fires_month = fires_month[~fires_month.year.str.contains("Máximo")]

In [16]:
fires_month = fires_month[~fires_month.year.str.contains("Média")]

In [17]:
fires_month = fires_month[~fires_month.year.str.contains("Mínimo")]

In [18]:
fires_month = fires_month[~fires_month.month.str.contains("Total")]
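
The four filtering cells above can be expressed as a single boolean mask. A minimal sketch, assuming a hypothetical frame `sample` that contains the same kinds of summary rows:

```python
import pandas as pd

# Hypothetical rows: one real record plus the kinds of summary rows seen in the file
sample = pd.DataFrame({
    "year": ["1998", "Máximo", "Média", "Mínimo", "1999"],
    "month": ["Janeiro", "Janeiro", "Janeiro", "Janeiro", "Total"],
})

# Rows whose year field holds a summary label rather than a year
is_summary = sample["year"].str.contains("Máximo|Média|Mínimo", regex=True)
# Rows holding yearly totals instead of a month name
is_total = sample["month"].str.contains("Total", regex=False)

# Keep only rows that are neither summaries nor totals
cleaned = sample[~(is_summary | is_total)]
print(cleaned)
```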

In [19]:
# Convert the year column into integer type

fires_month["year"] = fires_month.year.astype('int64')

In [20]:
# Create a dictionary with the months id's

months = {"Janeiro": 1, 
          "Fevereiro": 2, 
          "Março": 3, 
          "Abril": 4, 
          "Maio": 5, 
          "Junho": 6, 
          "Julho": 7, 
          "Agosto": 8, 
          "Setembro": 9, 
          "Outubro": 10, 
          "Novembro": 11, 
          "Dezembro": 12}

In [21]:
# Replace the column "month" with month id's

fires_month = fires_month.replace({"month": months})

In [22]:
# Convert the number column into integer type

fires_month["number"] = fires_month.number.astype('int64')

In [23]:
# Convert the period column into a date type

fires_month["period"] = fires_month.period.astype('datetime64[ns]')

In [24]:
fires_month.dtypes

year               int64
month              int64
number             int64
period    datetime64[ns]
dtype: object

In [25]:
# Verify the data for each year

pd.Series(fires_month["year"]).value_counts()

2007    12
2016    12
1999    12
2000    12
2001    12
2002    12
2003    12
2004    12
2005    12
2006    12
1998    12
2008    12
2009    12
2010    12
2011    12
2012    12
2013    12
2014    12
2015    12
2017    11
Name: year, dtype: int64

In [26]:
# Every year has 12 monthly entries except 2017, which has 11 (December 2017 is missing)
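
The counts show that 2017 is one entry short but not which month is absent. One way to confirm it, sketched here on a hypothetical frame `sample` rather than the real table, is to compare each year's months against the full set 1 to 12:

```python
import pandas as pd

# Hypothetical frame: a complete year and a year with one month absent
sample = pd.DataFrame({
    "year": [2016] * 12 + [2017] * 11,
    "month": list(range(1, 13)) + list(range(1, 12)),
})

expected = set(range(1, 13))

# For each incomplete year, list the month ids that do not appear
missing = {
    int(year): sorted(expected - set(group["month"]))
    for year, group in sample.groupby("year")
    if expected - set(group["month"])
}
print(missing)  # {2017: [12]}
```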

In [27]:
# Save the data frame to a csv file

fires_month.to_csv("../Data/clean_data/fires_month.csv", index=False)

### 3. Wildfires data for Brazil's Legal Amazon (BLA) per month in the period 1999-2019.

In [28]:
# Import the data 

fires_bla = pd.read_csv("../Data/inpe_brazilian_amazon_fires_1999_2019.csv")

In [29]:
# Check for missing values

fires_bla.isnull().sum()

year         0
month        0
state        0
latitude     0
longitude    0
firespots    0
dtype: int64

In [30]:
# Convert the state names to title case

fires_bla["state"] = fires_bla["state"].str.title()

In [31]:
# Verify the data for each state

pd.Series(fires_bla["state"]).value_counts()

Mato Grosso    252
Amazonas       250
Para           250
Rondonia       246
Roraima        243
Maranhao       241
Tocantins      221
Acre           204
Amapa          197
Name: state, dtype: int64

In [32]:
# We can see that only 9 states are present in this data frame. 
# These states belong to Brazil's Legal Amazon (BLA), which is the largest socio-geographic division in Brazil.
# We can also see that the number of entries for each state varies.
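
State spellings differ between the tables: the first data frame contains "Pará", while this one contains "Para". If the tables are later joined on the state name, the spellings would need to agree. The helper below is a hypothetical sketch (`strip_accents` is not part of the original notebook) that removes accent marks using only the standard library:

```python
import unicodedata

def strip_accents(name: str) -> str:
    """Return the name with combining accent marks removed."""
    # NFKD splits an accented letter into a base letter plus a combining mark
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Pará"))   # Para
print(strip_accents("Piauí"))  # Piaui
```

Applying the same function to the state column of every table, for example with `Series.map(strip_accents)`, would give one consistent spelling.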

In [34]:
# Save the data frame to a csv file

fires_bla.to_csv("../Data/clean_data/fires_bla.csv", index=False)

### 4. Deforestation data for Brazil's Legal Amazon (BLA) per year in the period 2004-2019.

In [None]:
# Import the data

url = "http://www.obt.inpe.br/OBT/assuntos/programas/amazonia/prodes"

In [None]:
html = requests.get(url).content

In [None]:
soup = BeautifulSoup(html, "html.parser")

In [None]:
soup

In [None]:
# Note: BeautifulSoup filters on the id attribute with the keyword id=, not id_=
col_values = [items.text for items in (soup
                                       .find('div', id='wrapper')
                                       .find('div', id='main')
                                       .find_all('table')[0]
                                       .find_all('td',{'style':'text-align: left;'}))]

col_keys = [items.text[:-1] for items in (soup
                                       .find('div', id='wrapper')
                                       .find('div', id='main')
                                       .find_all('table')[0]
                                       .find_all('td',{'style':'text-align: left;'}))]

names_dict = dict(zip(col_keys, col_values))
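
Because the cells above were not executed, the selector chain is easier to check against a small inline document. The markup below is hypothetical and the real page layout may differ; the sketch only illustrates how `find` and `find_all` filter on `id` and `style`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with placeholder values; the real page layout may differ
html = """
<div id="wrapper"><div id="main">
<table>
  <tr><td style="text-align: left;">Acre</td><td>10</td></tr>
  <tr><td style="text-align: left;">Amazonas</td><td>20</td></tr>
</table>
</div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# id is passed as a plain keyword; only class needs the trailing underscore (class_)
main = soup.find("div", id="wrapper").find("div", id="main")

# Select only the left-aligned cells of the first table
cells = main.find_all("table")[0].find_all("td", {"style": "text-align: left;"})
labels = [cell.text for cell in cells]
print(labels)
```

If any `find` call matches nothing it returns `None`, and the next call in the chain raises `AttributeError`, so checking each step separately can help when the page structure changes.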

### 5. Precipitation data for Brazil's Legal Amazon (BLA) per year in the period 2004-2019.

In [35]:
# Import the data 

precip = pd.read_csv("../Data/precipitation.csv")

In [36]:
# Check for missing values

precip.isnull().sum()

state            0
date             0
precipitation    0
dtype: int64

In [37]:
precip.columns

Index(['state', 'date', 'precipitation'], dtype='object')

In [38]:
precip["state"].unique()

array(['BA', 'RR', 'SE', 'AL', 'TO', 'GO', 'PI', 'MG', 'PR', 'MA', 'AP',
       'RJ', 'AC', 'AM', 'DF', 'PE', 'CE', 'PA', 'MT', 'PB', 'RS', 'RN',
       'SP', 'ES', 'SC'], dtype=object)

In [39]:
# Create a dictionary mapping state abbreviations to state names

states = {"BA": "Bahia", 
          "RR": "Roraima", 
          "SE": "Sergipe", 
          "AL": "Alagoas", 
          "TO": "Tocantins", 
          "GO": "Goias", 
          "PI": "Piauí", 
          "MG": "Minas Gerais", 
          "PR": "Paraná", 
          "MA": "Maranhao", 
          "AP": "Amapa", 
          "RJ": "Rio de Janeiro",
          "AC": "Acre", 
          "AM": "Amazonas", 
          "DF": "Distrito Federal", 
          "PE": "Pernambuco", 
          "CE": "Ceara", 
          "PA": "Pará", 
          "MT": "Mato Grosso", 
          "PB": "Paraiba", 
          "RS": "Rio Grande do Sul", 
          "RN": "Rio Grande do Norte", 
          "SP": "Sao Paulo", 
          "ES": "Espirito Santo", 
          "SC": "Santa Catarina"}

In [40]:
# Replace the column "state" with state names

precip = precip.replace({"state": states})
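
The `unique()` output above lists 25 abbreviations, while Brazil has 27 federative units. A quick set difference shows which ones have no precipitation records; the `all_units` set is typed in by hand for this sketch and is not taken from the data:

```python
# The 27 federative-unit abbreviations (26 states plus the Federal District)
all_units = {"AC", "AL", "AP", "AM", "BA", "CE", "DF", "ES", "GO",
             "MA", "MT", "MS", "MG", "PA", "PB", "PR", "PE", "PI",
             "RJ", "RN", "RS", "RO", "RR", "SC", "SP", "SE", "TO"}

# Abbreviations returned by precip["state"].unique() above
present = {"BA", "RR", "SE", "AL", "TO", "GO", "PI", "MG", "PR", "MA", "AP",
           "RJ", "AC", "AM", "DF", "PE", "CE", "PA", "MT", "PB", "RS", "RN",
           "SP", "ES", "SC"}

# Units with no rows in the precipitation file
absent = sorted(all_units - present)
print(absent)  # ['MS', 'RO']
```

Mato Grosso do Sul and Rondonia are therefore absent, which matters for any later join with the wildfire tables.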

In [41]:
precip["date"] = precip.date.astype('datetime64[ns]')

In [42]:
precip.dtypes

state                    object
date             datetime64[ns]
precipitation           float64
dtype: object

In [43]:
# Get the month and year in separate columns

precip['year'] = pd.DatetimeIndex(precip['date']).year
precip['month'] = pd.DatetimeIndex(precip['date']).month
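
An equivalent way to extract the calendar fields, sketched on a hypothetical two-row frame, is to parse with `pd.to_datetime` and then use the `.dt` accessor instead of wrapping the column in `pd.DatetimeIndex`:

```python
import pandas as pd

# Hypothetical records standing in for the precipitation file
sample = pd.DataFrame({"date": ["2004-01-31", "2004-02-29"]})

# pd.to_datetime parses the strings and raises on unparseable input by default
sample["date"] = pd.to_datetime(sample["date"])

# The .dt accessor exposes the calendar fields of a datetime column
sample["year"] = sample["date"].dt.year
sample["month"] = sample["date"].dt.month
print(sample)
```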

In [44]:
# Save the data frame to a csv file

precip.to_csv("../Data/clean_data/precipitation_brazil.csv", index=False)

### 6. Severity of climatic phenomena El Nino and La Nina in the period 1999-2019.

In [46]:
# Import the data 

ninos = pd.read_csv("../Data/el_nino_la_nina_1999_2019.csv")

In [47]:
ninos

Unnamed: 0,start year,end year,phenomenon,severity
0,2004,2005,El Nino,Weak
1,2006,2007,El Nino,Weak
2,2014,2015,El Nino,Weak
3,2018,2019,El Nino,Weak
4,2002,2003,El Nino,Moderate
5,2009,2010,El Nino,Moderate
6,2015,2016,El Nino,Very Strong
7,2000,2001,La Nina,Weak
8,2005,2006,La Nina,Weak
9,2008,2009,La Nina,Weak


In [48]:
# Create a dictionary with the severity level as numeric value

severity_dict = {"Weak": 1, 
                 "Moderate": 2, 
                 "Strong": 3, 
                 "Very Strong": 4}

In [49]:
# Create a new column with the severity level

ninos["severity_level"] = ninos["severity"].map(severity_dict)
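
As an alternative to the dictionary lookup, the severity labels could be stored as an ordered categorical, which keeps the ranking attached to the column itself. This is a sketch on a hypothetical `severity` series, not a change to the notebook's method:

```python
import pandas as pd

# Hypothetical series with the same labels as the severity column
severity = pd.Series(["Weak", "Moderate", "Very Strong", "Weak"])

# Categories listed from least to most severe
levels = ["Weak", "Moderate", "Strong", "Very Strong"]
ordered = pd.Categorical(severity, categories=levels, ordered=True)

# codes are zero-based positions in `levels`; add 1 to match severity_dict
severity_level = (ordered.codes + 1).tolist()
print(severity_level)  # [1, 2, 4, 1]
```

A label that is not in `levels` gets the code -1, so it would appear as 0 after the shift and is easy to spot.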

In [50]:
ninos

Unnamed: 0,start year,end year,phenomenon,severity,severity_level
0,2004,2005,El Nino,Weak,1
1,2006,2007,El Nino,Weak,1
2,2014,2015,El Nino,Weak,1
3,2018,2019,El Nino,Weak,1
4,2002,2003,El Nino,Moderate,2
5,2009,2010,El Nino,Moderate,2
6,2015,2016,El Nino,Very Strong,4
7,2000,2001,La Nina,Weak,1
8,2005,2006,La Nina,Weak,1
9,2008,2009,La Nina,Weak,1


In [51]:
# Save the data frame to a csv file

ninos.to_csv("../Data/clean_data/ninos.csv", index=False)