# Wildfires in Brazil

## Data cleaning

Data on wildfires, deforestation, precipitation, El Nino/La Nina severity and temperature in Brazil was loaded into pandas data frames and cleaned.

The following data was found:

1. Wildfires data for each state in Brazil per month in the period 1998-2017. Source: PORTAL BRASILEIRO DE DADOS ABERTOS.
2. Wildfires data for Brazil as a whole per month in the period 1998-2017. Source: PORTAL BRASILEIRO DE DADOS ABERTOS.
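3. Wildfires data for Brazil's Legal Amazon (BLA) per month in the period 1999-2019, detected through satellite images. Source: Banco de Dados de Queimadas.
4. Deforestation data for Brazil's Legal Amazon (BLA) per year in the period 2004-2019. Source: INPE PRODES http://www.obt.inpe.br/OBT/assuntos/programas/amazonia/prodes
5. Precipitation data for Brazil's Legal Amazon (BLA) per year in the period 2004-2019.
6. Severity of climatic phenomena El Nino and La Nina in the period 1999-2019.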
7. Temperature in Brazil in the period 1991-2016. Source: World Bank Data https://climateknowledgeportal.worldbank.org/download-data

In [1]:
# Import the necessary libraries

import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import re
import requests
import json

### 1. Wildfires data for each state in Brazil per month in the period 1998-2017.

In [None]:
# Import the data 

fires_year_state = pd.read_csv("../Data/rf_incendiosflorestais_focoscalor_estados_1998-2017.csv")

In [None]:
# Check for missing values

fires_year_state.isnull().sum()

In [None]:
# Convert "number" column to integer

fires_year_state["number"] = fires_year_state.number.astype('int64')

In [None]:
# Replace the month names in the "month" column with month numbers
# First check the spelling of the months in Portuguese

fires_year_state["month"].unique()

In [None]:
# Create a dictionary that maps Portuguese month names to month numbers

months = {"Janeiro": 1, 
          "Fevereiro": 2, 
          "Março": 3, 
          "Abril": 4, 
          "Maio": 5, 
          "Junho": 6, 
          "Julho": 7, 
          "Agosto": 8, 
          "Setembro": 9, 
          "Outubro": 10, 
          "Novembro": 11, 
          "Dezembro": 12}

In [None]:
# Replace the month names in the "month" column with month numbers

fires_year_state = fires_year_state.replace({"month": months})
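
One caveat with `replace` is that a month name missing from the dictionary is silently left as a string. A minimal sketch of a stricter check, using `Series.map` on a small made-up frame (the sample rows and the names `months_subset` and `unmapped` are illustrative assumptions, not part of the data set):

```python
import pandas as pd

# Hypothetical sample with one misspelled month name ("Marco")
sample = pd.DataFrame({"month": ["Janeiro", "Março", "Marco"]})

# Shortened dictionary, for illustration only
months_subset = {"Janeiro": 1, "Fevereiro": 2, "Março": 3}

# map() returns NaN for values that are not keys of the dictionary
month_id = sample["month"].map(months_subset)

# Collect the spellings that could not be mapped
unmapped = sample.loc[month_id.isna(), "month"].unique().tolist()
print(unmapped)
```

An empty `unmapped` list would suggest that every spelling in the column is covered by the dictionary.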

In [None]:
fires_year_state.dtypes

In [None]:
# Verify the data for each state

pd.Series(fires_year_state["state"]).value_counts()

In [None]:
# We can see that all 27 federative units (26 states plus the Federal District) are present, each with the same number of entries

In [None]:
fires_year_state.dtypes

In [None]:
# Save the data frame to a csv file

fires_year_state.to_csv("../Data/clean_data/fires_year_state.csv", index=False)

### 2. Wildfires data for Brazil as a whole per month in the period 1998-2017.

In [None]:
# Import the data

fires_month = pd.read_csv("../Data/rf_incendiosflorestais_focoscalor_brasil_1998-2017.csv")

In [None]:
# Separate the data into columns
# First split the strings

fires_month = fires_month["Ano;Mês;Número;Período"].str.split(";", expand=True)

In [None]:
# Create new column names and then drop the previous columns

fires_month["year"] = fires_month[0]
fires_month["month"] = fires_month[1]
fires_month["number"] = fires_month[2]
fires_month["period"] = fires_month[3]

fires_month.drop(columns=[0, 1, 2, 3], inplace=True)
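
The split-and-rename steps above could possibly be avoided by passing the delimiter to `read_csv`. The sketch below assumes the file is plain semicolon-separated text with the header `Ano;Mês;Número;Período`; it reads a made-up in-memory sample rather than the real file, and `fires_sample` is a hypothetical name.

```python
import io
import pandas as pd

# Made-up rows laid out like the header seen above
raw = (
    "Ano;Mês;Número;Período\n"
    "1998;Janeiro;10;1998-01-01\n"
    "1998;Fevereiro;20;1998-02-01\n"
)

# sep=";" splits the columns at load time;
# header=0 together with names= replaces the Portuguese header
fires_sample = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    header=0,
    names=["year", "month", "number", "period"],
)
print(fires_sample.columns.tolist())
```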

In [None]:
# Drop the summary rows (Máximo, Média, Mínimo, Total), which are not monthly observations

fires_month = fires_month[~fires_month.year.str.contains("Máximo")]

In [None]:
fires_month = fires_month[~fires_month.year.str.contains("Média")]

In [None]:
fires_month = fires_month[~fires_month.year.str.contains("Mínimo")]

In [None]:
fires_month = fires_month[~fires_month.month.str.contains("Total")]
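
The four filters above can also be written as a single boolean mask. A sketch on a made-up frame, assuming the summary rows carry the labels `Máximo`, `Média`, `Mínimo` and `Total` exactly as in the cells above (the names `sample`, `is_summary` and `monthly_only` are illustrative):

```python
import pandas as pd

# Made-up rows: two monthly observations and four summary rows
sample = pd.DataFrame({
    "year": ["1998", "1999", "Máximo", "Média", "Mínimo", "2000"],
    "month": ["Janeiro", "Janeiro", "Janeiro", "Janeiro", "Janeiro", "Total"],
})

# True where the year holds a summary label or the month is the yearly total
is_summary = (
    sample["year"].str.contains("Máximo|Média|Mínimo")
    | sample["month"].str.contains("Total")
)

monthly_only = sample[~is_summary]
print(monthly_only)
```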

In [None]:
# Convert the year column into integer type

fires_month["year"] = fires_month.year.astype('int64')

In [None]:
# Create a dictionary that maps Portuguese month names to month numbers

months = {"Janeiro": 1, 
          "Fevereiro": 2, 
          "Março": 3, 
          "Abril": 4, 
          "Maio": 5, 
          "Junho": 6, 
          "Julho": 7, 
          "Agosto": 8, 
          "Setembro": 9, 
          "Outubro": 10, 
          "Novembro": 11, 
          "Dezembro": 12}

In [None]:
# Replace the month names in the "month" column with month numbers

fires_month = fires_month.replace({"month": months})

In [None]:
# Convert the number column into integer type

fires_month["number"] = fires_month.number.astype('int64')

In [None]:
# Convert the period column into a date type

fires_month["period"] = fires_month.period.astype('datetime64[ns]')

In [None]:
fires_month.dtypes

In [None]:
# Verify the data for each year

pd.Series(fires_month["year"]).value_counts()

In [None]:
# We can see that every year has entries; only December 2017 is missing
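
The per-year counts show that a month is missing, but not which one. A sketch of one way to list the missing (year, month) pairs explicitly, on a made-up frame in which 2017 lacks December (the names `expected`, `observed` and `missing` are illustrative):

```python
import pandas as pd

# Made-up sample: 2016 is complete, 2017 lacks December
sample = pd.DataFrame({
    "year": [2016] * 12 + [2017] * 11,
    "month": list(range(1, 13)) + list(range(1, 12)),
})

# Every (year, month) pair that a complete series would contain
expected = pd.MultiIndex.from_product(
    [sample["year"].unique(), range(1, 13)], names=["year", "month"]
)

# The pairs that are actually present
observed = pd.MultiIndex.from_frame(sample[["year", "month"]])

missing = expected.difference(observed)
print(missing.tolist())
```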

In [None]:
# Save the data frame to a csv file

fires_month.to_csv("../Data/clean_data/fires_month.csv", index=False)

### 3. Wildfires data for Brazil's Legal Amazon (BLA) per month in the period 1999-2019.

In [None]:
# Import the data 

fires_bla = pd.read_csv("../Data/inpe_brazilian_amazon_fires_1999_2019.csv")

In [None]:
# Check for missing values

fires_bla.isnull().sum()

In [None]:
# Convert the state names to title case

fires_bla["state"] = fires_bla["state"].str.title()

In [None]:
# Verify the data for each state

pd.Series(fires_bla["state"]).value_counts()

In [None]:
# We can see that only 9 states are present in this data frame. 
# These states belong to Brazil's Legal Amazon (BLA), which is the largest socio-geographic division in Brazil.
# We can also see that the number of entries for each state varies.
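
To see where the entry counts differ, a state-by-year table can help. The sketch below assumes the frame has a `year` column next to `state` (an assumption; only `state` is shown above) and uses made-up rows.

```python
import pandas as pd

# Made-up rows: Amapa has no entries for 2000
sample = pd.DataFrame({
    "state": ["Acre", "Acre", "Acre", "Amapa", "Amapa"],
    "year": [1999, 1999, 2000, 1999, 1999],
})

# Rows are states, columns are years, cells are the number of entries
counts = pd.crosstab(sample["state"], sample["year"])
print(counts)
```

Cells equal to zero point at state and year combinations with no records at all.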

In [None]:
# Save the data frame to a csv file

fires_bla.to_csv("../Data/clean_data/fires_bla.csv", index=False)

### 4. Deforestation data for Brazil's Legal Amazon (BLA) per year in the period 2004-2019.

In [None]:
# Import the data

url = "http://www.obt.inpe.br/OBT/assuntos/programas/amazonia/prodes"

In [None]:
html = requests.get(url).content

In [None]:
soup = BeautifulSoup(html, "html.parser")

In [None]:
soup

In [None]:
# Locate the left-aligned cells of the first table on the page
# Note: BeautifulSoup filters on the id attribute with `id=`, not `id_=`
cells = (soup
         .find('div', id='wrapper')
         .find('div', id='main')
         .find_all('table')[0]
         .find_all('td', {'style': 'text-align: left;'}))

col_values = [item.text for item in cells]

# The keys are the same cell texts without their last character
col_keys = [item.text[:-1] for item in cells]

names_dict = dict(zip(col_keys, col_values))
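
The scraping step stops before a data frame is built. As a rough sketch of how an HTML table can be turned into one with BeautifulSoup, the example below parses a made-up HTML snippet; the markup, the column names and the values are hypothetical, and the real PRODES page may be laid out differently.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML; the real page layout and values may differ
sample_html = """
<div id="main">
  <table>
    <tr><th>Year</th><th>Area</th></tr>
    <tr><td>2004</td><td>100</td></tr>
    <tr><td>2005</td><td>90</td></tr>
  </table>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
table = sample_soup.find("div", id="main").find("table")

# The first row holds the header cells, the remaining rows hold the data
rows = table.find_all("tr")
header = [cell.get_text(strip=True) for cell in rows[0].find_all("th")]
records = [
    [cell.get_text(strip=True) for cell in row.find_all("td")]
    for row in rows[1:]
]

deforestation_sample = pd.DataFrame(records, columns=header)
print(deforestation_sample)
```

The cell texts are strings at this point, so numeric columns would still need a type conversion.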

### 5. Precipitation data for Brazil's Legal Amazon (BLA) per year in the period 2004-2019.

In [None]:
# Import the data 

precip = pd.read_csv("../Data/precipitation.csv")

In [None]:
# Check for missing values

precip.isnull().sum()

In [None]:
precip.columns

In [None]:
precip["state"].unique()

In [None]:
# Create a dictionary that maps state abbreviations to state names

states = {"BA": "Bahia", 
          "RR": "Roraima", 
          "SE": "Sergipe", 
          "AL": "Alagoas", 
          "TO": "Tocantins", 
          "GO": "Goias", 
          "PI": "Piauí", 
          "MG": "Minas Gerais", 
          "PR": "Paraná", 
          "MA": "Maranhao", 
          "AP": "Amapa", 
          "RJ": "Rio de Janeiro",
          "AC": "Acre", 
          "AM": "Amazonas", 
          "DF": "Distrito Federal", 
          "PE": "Pernambuco", 
          "CE": "Ceara", 
          "PA": "Pará", 
          "MT": "Mato Grosso", 
          "PB": "Paraiba", 
          "RS": "Rio Grande do Sul", 
          "RN": "Rio Grande do Norte", 
          "SP": "Sao Paulo", 
          "ES": "Espirito Santo", 
          "SC": "Santa Catarina"}

In [None]:
# Replace the column "state" with state names

precip = precip.replace({"state": states})

In [None]:
precip["date"] = precip.date.astype('datetime64[ns]')

In [None]:
precip.dtypes

In [None]:
# Get the month and year in separate columns

precip['year'] = pd.DatetimeIndex(precip['date']).year
precip['month'] = pd.DatetimeIndex(precip['date']).month

In [None]:
# Save the data frame to a csv file

precip.to_csv("../Data/clean_data/precipitation_brazil.csv", index=False)

### 6. Severity of climatic phenomena El Nino and La Nina in the period 1999-2019.

In [None]:
# Import the data 

ninos = pd.read_csv("../Data/el_nino_la_nina_1999_2019.csv")

In [None]:
# Create a dictionary with the severity level as numeric value

severity_dict = {"Weak": 1, 
                 "Moderate": 2, 
                 "Strong": 3, 
                 "Very Strong": 4}

In [None]:
# Create a new column with the severity level

ninos["severity_level"] = ninos["severity"].map(severity_dict)
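
An alternative to a separate numeric column is an ordered categorical, which keeps the labels and still supports comparisons. A sketch on made-up rows (the severity labels are the ones from the dictionary above; the frame and the names `levels` and `at_least_strong` are illustrative):

```python
import pandas as pd

# Made-up rows using the four severity labels
sample = pd.DataFrame({"severity": ["Weak", "Strong", "Moderate", "Very Strong"]})

# Declare the labels in increasing order of severity
levels = ["Weak", "Moderate", "Strong", "Very Strong"]
sample["severity"] = pd.Categorical(
    sample["severity"], categories=levels, ordered=True
)

# Ordered categoricals can be compared against one of their labels
at_least_strong = sample[sample["severity"] >= "Strong"]

# .cat.codes starts at 0, so add 1 to match the 1-4 scale
sample["severity_level"] = sample["severity"].cat.codes + 1
print(sample)
```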

In [None]:
ninos

In [None]:
# Save the data frame to a csv file

ninos.to_csv("../Data/clean_data/ninos.csv", index=False)

### 7. Temperature in Brazil per month from 1991 to 2016.

In [3]:
# Load the data

temp = pd.read_csv("../Data/temperature_1991_2016_BRA.csv")

In [4]:
temp.head()

Unnamed: 0,Temperature - (Celsius),Year,Statistics,Country,ISO3
0,25.6309,1991,Jan Average,Brazil,BRA
1,25.9331,1991,Feb Average,Brazil,BRA
2,25.6195,1991,Mar Average,Brazil,BRA
3,25.3122,1991,Apr Average,Brazil,BRA
4,24.6685,1991,May Average,Brazil,BRA


In [5]:
# Check for missing values

temp.isnull().sum()

Temperature - (Celsius)    0
 Year                      0
 Statistics                0
 Country                   0
 ISO3                      0
dtype: int64

In [6]:
# Checking datatypes

temp.dtypes

Temperature - (Celsius)    float64
 Year                        int64
 Statistics                 object
 Country                    object
 ISO3                       object
dtype: object

In [7]:
# Create the "Month" column by stripping the " Average" suffix

temp['Month'] = temp[' Statistics'].str.replace(" Average","")
temp

Unnamed: 0,Temperature - (Celsius),Year,Statistics,Country,ISO3,Month
0,25.6309,1991,Jan Average,Brazil,BRA,Jan
1,25.9331,1991,Feb Average,Brazil,BRA,Feb
2,25.6195,1991,Mar Average,Brazil,BRA,Mar
3,25.3122,1991,Apr Average,Brazil,BRA,Apr
4,24.6685,1991,May Average,Brazil,BRA,May
...,...,...,...,...,...,...
307,25.5629,2016,Aug Average,Brazil,BRA,Aug
308,25.9775,2016,Sep Average,Brazil,BRA,Sep
309,27.0781,2016,Oct Average,Brazil,BRA,Oct
310,26.7037,2016,Nov Average,Brazil,BRA,Nov


In [8]:
# Rename the columns and drop the ones that are no longer needed

temp = temp.rename(columns = {"Temperature - (Celsius)" : "Temperature", 
                              " Year" : "Year",
                              " Country" : "Country"}).drop(columns = [" Statistics", " ISO3"])


In [22]:
temp

Unnamed: 0,Temperature,Year,Country,Month
0,25.6309,1991,Brazil,Jan
1,25.9331,1991,Brazil,Feb
2,25.6195,1991,Brazil,Mar
3,25.3122,1991,Brazil,Apr
4,24.6685,1991,Brazil,May
...,...,...,...,...
307,25.5629,2016,Brazil,Aug
308,25.9775,2016,Brazil,Sep
309,27.0781,2016,Brazil,Oct
310,26.7037,2016,Brazil,Nov


In [26]:
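# Map month abbreviations to month numbers
# The keys keep the leading space carried over from the " Statistics" column

months = {" Jan": 1, 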
          " Feb": 2, 
          " Mar": 3, 
          " Apr": 4, 
          " May": 5, 
          " Jun": 6, 
          " Jul": 7, 
          " Aug": 8, 
          " Sep": 9, 
          " Oct": 10, 
          " Nov": 11, 
          " Dec": 12}

In [27]:
# Replace the month abbreviations in the "Month" column with month numbers

temp = temp.replace({"Month": months})
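
With numeric `Year` and `Month` columns, a single date column can be assembled, which may make it easier to join the temperature data with the other monthly data sets. A sketch on two rows taken from the output above; the `date` column and the choice of the first day of the month are assumptions made for illustration.

```python
import pandas as pd

# Two rows in the cleaned layout
sample = pd.DataFrame({
    "Temperature": [25.6309, 25.9331],
    "Year": [1991, 1991],
    "Month": [1, 2],
})

# to_datetime assembles dates from columns named year, month and day
sample["date"] = pd.to_datetime(
    pd.DataFrame({"year": sample["Year"], "month": sample["Month"], "day": 1})
)
print(sample)
```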

In [28]:
temp.dtypes

Temperature    float64
Year             int64
Country         object
Month            int64
dtype: object

In [29]:
temp

Unnamed: 0,Temperature,Year,Country,Month
0,25.6309,1991,Brazil,1
1,25.9331,1991,Brazil,2
2,25.6195,1991,Brazil,3
3,25.3122,1991,Brazil,4
4,24.6685,1991,Brazil,5
...,...,...,...,...
307,25.5629,2016,Brazil,8
308,25.9775,2016,Brazil,9
309,27.0781,2016,Brazil,10
310,26.7037,2016,Brazil,11


In [30]:
# Save the data frame to a csv file

temp.to_csv("../Data/clean_data/temp_clean.csv", index=False)