Fires are a serious problem in Brazil. As the dataset description puts it, "Understanding the frequency of forest fires in a time series can help to take action to prevent them". Pinpointing where and when fires are reported most often should clarify the scope of the problem we are looking at.

In [None]:
#Import all necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas
import plotly.express as px

In [None]:
#Using pandas to read the csv file, decoding it as ISO-8859-1
df = pd.read_csv('../input/forest-fires-in-brazil/amazon.csv', encoding = "ISO-8859-1")

In [None]:
#Examining the head of the dataset
df.head()

Let's start with a simple analysis and plot:
* Total fires reported by year
* Total fires reported by month
* Total fires reported by state
* Total fires reported by year for each state

In [None]:
#Creating a pivot with the total number of fires per year
#(the string "sum" avoids the warning newer pandas versions raise for np.sum)
pivot1 = pd.pivot_table(df,values="number",index=["year"],aggfunc="sum")

In [None]:
#Reading the pivot table generated
pivot1

In [None]:
#Plotting the graph
plt.figure(figsize=(15, 6))
sns.set_style("darkgrid")
ax = sns.barplot(x=pivot1.index, y="number", color="coral", data=pivot1)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45)
plt.xlabel("Year")
plt.ylabel("Count of fires")
plt.title("Number of Fires by Year")

The following observations can be drawn from the plot:
* Reported forest fires increase from 1998, with a sudden spike in 2002
* 2003 saw a large number of forest fires, followed by a decline
* In 2009 the count spiked again, and since then it has fluctuated between roughly 35K and 40K fires
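
Spikes and declines like these can also be checked numerically rather than read off the bars. The following is a minimal sketch that computes the year-over-year change on a small table shaped like `pivot1`. The values and years are invented for illustration and are not taken from the dataset; the names `yearly` and `biggest_jump_year` are assumptions.

```python
import pandas as pd

# Toy yearly totals, invented for illustration only (not the real dataset).
yearly = pd.DataFrame(
    {"number": [100.0, 150.0, 120.0, 180.0]},
    index=pd.Index([2000, 2001, 2002, 2003], name="year"),
)

# Absolute and relative change versus the previous year.
# The first year has no predecessor, so both columns start with NaN.
yearly["change"] = yearly["number"].diff()
yearly["pct_change"] = yearly["number"].pct_change() * 100

# Year with the largest increase over its predecessor (NaN is skipped).
biggest_jump_year = yearly["change"].idxmax()

print(yearly)
print("Largest year-over-year increase:", biggest_jump_year)
```

Applying the same two calls to `pivot1["number"]` should give the corresponding figures for the real data.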

In [None]:
#creating a dictionary with translations of months
month_map={'Janeiro': 'January', 'Fevereiro': 'February', 'Março': 'March', 'Abril': 'April', 'Maio': 'May',
          'Junho': 'June', 'Julho': 'July', 'Agosto': 'August', 'Setembro': 'September', 'Outubro': 'October',
          'Novembro': 'November', 'Dezembro': 'December'}
#mapping our translated months
df['month']=df['month'].map(month_map)

In [None]:
#Creating another pivot with the total number of fires per month
pivot2 = pd.pivot_table(df,values="number",index=["month"],aggfunc="sum")

In [None]:
pivot2

In [None]:
#Plotting the graph
plt.figure(figsize=(15, 6))
sns.set_style("darkgrid")
ax = sns.barplot(x=pivot2.index, y="number", data=pivot2)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45)
plt.xlabel("Month")
plt.ylabel("Count of fires")
plt.title("Fire vs Month")

The following observations can be drawn from the plot:
* February, March, April and May see the lowest numbers of forest fires
* A sudden spike occurs in June, and the elevated level continues until November
* July, August, October and November are the four months with the most forest fires
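
One caveat when reading a month plot: because `month` is stored as plain strings, the pivot index is sorted alphabetically (April, August, December, ...), not in calendar order. A minimal sketch of one way to get calendar order is to convert the column to an ordered categorical before aggregating. The toy data and the names `month_order`, `toy` and `by_month` are assumptions made for this example.

```python
import pandas as pd

# Calendar order of the translated month names.
month_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Toy monthly counts, invented for illustration only.
toy = pd.DataFrame(
    {
        "month": ["April", "August", "February", "January"],
        "number": [10, 40, 20, 30],
    }
)

# An ordered categorical sorts by its category order, not alphabetically.
toy["month"] = pd.Categorical(toy["month"], categories=month_order, ordered=True)

# observed=True keeps only the months that actually occur in the data.
by_month = toy.groupby("month", observed=True)["number"].sum()
print(by_month)
```

Applied to the notebook, the same conversion on `df['month']` before building `pivot2` should put the bars in calendar order.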

In [None]:
#Creating another pivot with the total number of fires per state
pivot3 = pd.pivot_table(df,values="number",index=["state"],aggfunc="sum")

In [None]:
pivot3

In [None]:
#Plotting the graph (a single, wider figure so the state labels fit)
sns.set_style("darkgrid")
plt.figure(figsize=(20, 6))
ax = sns.barplot(x=pivot3.index, y="number", data=pivot3)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45)
plt.xlabel("State")
plt.ylabel("Count of fires")
plt.title("Fire vs State")

The following observations can be drawn from the plot:
* Mato Grosso sees a very large number of forest fires
* Sergipe, Distrito Federal, Alagoas and Espirito Santo see the lowest numbers of forest fires
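
The highest and lowest states can also be listed directly by sorting the totals. This is a small sketch on a toy table shaped like `pivot3`. The state names are placeholders and the numbers are invented; `totals`, `ranked`, `top2` and `bottom2` are names assumed for the example.

```python
import pandas as pd

# Toy per-state totals with placeholder names, invented for illustration.
totals = pd.DataFrame(
    {"number": [300, 50, 900, 120]},
    index=pd.Index(["State A", "State B", "State C", "State D"], name="state"),
)

# Sort from most to fewest reported fires.
ranked = totals.sort_values("number", ascending=False)

top2 = ranked.head(2)     # states with the most fires
bottom2 = ranked.tail(2)  # states with the fewest fires

print(top2)
print(bottom2)
```

Passing `ranked.index` as the `order` argument of `sns.barplot` would also draw the bars from highest to lowest.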

Next, we plot the number of fires reported by year for each state in a grid, which makes it easy to scan the panels and compare states.
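
A tabular companion to such a grid is a pivot with years as rows and states as columns. The sketch below shows the idea on toy data. The column names match the dataset (`year`, `state`, `number`), but the values and state names are invented, and `toy` and `wide` are names assumed for the example.

```python
import pandas as pd

# Toy long-format records, invented for illustration only.
toy = pd.DataFrame(
    {
        "year": [2000, 2000, 2001, 2001, 2001],
        "state": ["State A", "State B", "State A", "State A", "State B"],
        "number": [10, 5, 7, 3, 8],
    }
)

# One row per year, one column per state, cells hold the summed counts.
wide = toy.pivot_table(values="number", index="year", columns="state", aggfunc="sum")
print(wide)
```

Each column of `wide` is then the yearly series for one state.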

In [None]:
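#Plotting one panel per state; FacetGrid creates its own figure, sized via height and col_wrap
g = sns.FacetGrid(df, col="state", col_wrap=3, height=5, xlim=(0, 60), ylim=(0, 500))
#ci=None turns off the error bars (newer seaborn releases prefer errorbar=None)
g.map(sns.pointplot, "year", "number", color="red", ci=None)
g.set_xticklabels(rotation=45)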

In [None]:


#Selecting nine states from pivot3 by position for the geo visualization
af_geo = pivot3.iloc[[2,4,8,10,9,16,19,18,22]].copy()
af_geo

In [None]:
#adding coordinates (latitude and longitude) for the nine selected states
lat=[0.035574, -11.409874, -16.665136, -16.350000, -5.56667, -22.90278, -27.593500,
     5.1433, -21.175]
long=[-51.070534, -41.280857, -49.286041, -56.666668, -46.74222, -43.2075, -48.558540,
     -60.7625, -43.01778]
#adding new coordinates as columns to subdataframe above
af_geo['Lat']=lat
af_geo['Long']=long
#checking changes in subdataframe for geo visualization
af_geo
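
Selecting rows by position and attaching coordinates from parallel lists works only as long as the row order never changes. A more defensive variant, sketched below under stated assumptions, keys the coordinates by state label and joins them onto the totals. The state names and coordinate values are placeholders, and `totals`, `coords`, `coords_df` and `geo` are names assumed for the example.

```python
import pandas as pd

# Toy per-state totals with placeholder names, invented for illustration.
totals = pd.DataFrame(
    {"number": [300, 50, 900]},
    index=pd.Index(["State A", "State B", "State C"], name="state"),
)

# Hypothetical coordinates keyed by state label (placeholder values).
coords = {
    "State C": (-10.0, -50.0),
    "State A": (-5.0, -45.0),
}

# Turn the mapping into a frame indexed by state label.
coords_df = pd.DataFrame.from_dict(coords, orient="index", columns=["Lat", "Long"])

# Select the states by label, then align the coordinates on the index.
geo = totals.loc[list(coords)].join(coords_df)
print(geo)
```

Because the join aligns on the label, each coordinate pair stays attached to its state even if the selection or its order changes.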

In [None]:
#using scatter geo with above created subdataframe
fig = px.scatter_geo(data_frame=af_geo, scope='south america',lat='Lat',lon='Long',
                     size='number', color='number', projection='hammer')
fig.update_layout(
        title_text = '1998-2017 Top States in Brazil with increasing reported fires')
fig.show()