In [None]:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

%matplotlib inline
sns.set()

### Data ingestion and cleaning

In [None]:
# The file contains Brazilian Portuguese text, so it is read as ISO-8859-1 (Latin-1).
# See: https://community.atlassian.com/t5/Jira-questions/File-encoding-for-Portuguese-Brazilian/qaq-p/680609
df = pd.read_csv('/kaggle/input/forest-fires-in-brazil/amazon.csv', encoding="ISO-8859-1")
df.head()

In [None]:
df.shape

So the dataset has 6 columns and 6454 rows. Let's check for missing values in the columns.

In [None]:
df.isna().sum()

So, we don't have any missing values in the data. A closer look at the `date` column separately tells us that it is not of much use. The `year` information is already captured, and the remaining date is always `01-01`. So we can easily discard the `date` column from any further analysis.
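
The claim that `date` carries no extra information can be checked before the column is dropped. Below is a minimal sketch on a made-up frame named `toy` (a hypothetical stand-in, assuming the same column layout as `df`); it drops `date` only if every value is January 1st of the row's own `year`.

```python
import pandas as pd

# Toy frame mimicking the assumed layout of the dataset (values are made up).
toy = pd.DataFrame({
    "year": [1998, 1998, 1999],
    "state": ["Acre", "Acre", "Acre"],
    "month": ["Janeiro", "Fevereiro", "Janeiro"],
    "number": [0.0, 10.0, 5.0],
    "date": ["1998-01-01", "1998-01-01", "1999-01-01"],
})

parsed = pd.to_datetime(toy["date"])

# The column is redundant if every date is January 1st of the row's own year.
date_is_redundant = (
    (parsed.dt.month == 1).all()
    and (parsed.dt.day == 1).all()
    and (parsed.dt.year == toy["year"]).all()
)

if date_is_redundant:
    toy = toy.drop(columns="date")
```

The same check should carry over to `df` unchanged, provided the `date` strings parse with `pd.to_datetime`.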

Next, let's do the following transformations on the dataset:
- Convert the `number` column from `float` to `int`.
- Convert the `month` column from Portuguese month names to month numbers.

In [None]:
# converting `number` to int
df.number = df.number.astype('int')

# replacing the month column with its numeric version
df["month"] = df["month"].map({'Janeiro': 1, 'Fevereiro': 2, 'Março': 3, 'Abril': 4, 'Maio': 5, 'Junho': 6, 'Julho': 7,
       'Agosto': 8, 'Setembro': 9, 'Outubro':10, 'Novembro':11, 'Dezembro':12}).astype('category')
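
One thing to keep in mind with the mapping above: `Series.map` with a dictionary returns `NaN` for any label that is not a key, so a misspelled or mis-decoded month name would silently become a missing value. A small sketch of how this could be detected, using made-up labels (the `months` series and the `"Marco"` entry are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical month labels; "Marco" (no cedilla) stands in for a
# misspelled or mis-decoded value.
months = pd.Series(["Janeiro", "Março", "Marco", "Dezembro"])

month_to_number = {
    "Janeiro": 1, "Fevereiro": 2, "Março": 3, "Abril": 4,
    "Maio": 5, "Junho": 6, "Julho": 7, "Agosto": 8,
    "Setembro": 9, "Outubro": 10, "Novembro": 11, "Dezembro": 12,
}

mapped = months.map(month_to_number)

# Labels that were not found in the dictionary come back as NaN.
unmapped_labels = months[mapped.isna()].unique().tolist()
print(unmapped_labels)
```

Running the equivalent check on `df["month"]` before overwriting the column would show whether any label needs special handling.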

Let's have a look at the transformed data now.

In [None]:
df.head()

### Visualizations and inferences

#### How is the number of forest fires varying over the years?

In [None]:
fires_by_year = df[["year", "number"]].groupby("year").sum()
fires_by_year.reset_index(inplace=True)

plt.figure(figsize=(18,6))
sns.lineplot(x="year", y="number", data=fires_by_year)
plt.xticks(fires_by_year["year"])
plt.title("Number of forest fires each year")
plt.show()

There is a clear increasing trend in the number of forest fires happening in Brazil over the years.
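
To put a number on the trend rather than reading it off the plot, one option is a straight-line fit to the yearly totals. The sketch below uses made-up totals in a hypothetical `yearly` frame (same shape as `fires_by_year`); it is an illustration of the idea, not a result from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up yearly totals, for illustration only.
yearly = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003, 2004],
    "number": [100, 120, 110, 150, 160],
})

# Measure years from the first year so the fit is numerically well behaved.
years_since_start = yearly["year"] - yearly["year"].min()

# Least-squares fit of number = slope * years_since_start + intercept.
slope, intercept = np.polyfit(years_since_start, yearly["number"], deg=1)

# A positive slope means the fitted line rises over time.
print(f"Average change per year: {slope:.1f} fires")
```

A single slope hides year-to-year swings, so it complements the plot rather than replacing it.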

#### Is there a relation between the month and the number of forest fires?

In [None]:
fires_by_month = df[["month", "number"]].groupby("month").sum()
fires_by_month.reset_index(inplace=True)

plt.figure(figsize=(18,6))
sns.lineplot(x="month", y="number", data=fires_by_month)
plt.xticks(fires_by_month["month"])
plt.title("Number of forest fires by month")
plt.show()

This clearly indicates that forest fires in Brazil are concentrated in specific months. The highest number of fires occurs in July, followed by August, October and November.
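
The ranking of months can also be read from a sorted table instead of the plot. Here is a minimal sketch with made-up monthly totals (the `monthly` frame is a hypothetical stand-in for `fires_by_month`), which also computes each month's share of the total:

```python
import pandas as pd

# Made-up monthly totals, for illustration only.
monthly = pd.DataFrame({
    "month": [6, 7, 8, 9],
    "number": [300, 900, 800, 500],
})

# Sort months from most to fewest fires.
ranked = monthly.sort_values("number", ascending=False).reset_index(drop=True)

# Fraction of all fires that falls in each month.
ranked["share"] = ranked["number"] / ranked["number"].sum()

top_months = ranked["month"].head(3).tolist()
print(ranked)
```

Looking at shares makes it easier to judge whether the peak months really dominate or are only slightly ahead of the rest.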

#### Which state has the most forest fires?

In [None]:
fires_by_state = df[["state", "number"]].groupby("state").sum()
fires_by_state.reset_index(inplace=True)

plt.figure(figsize=(18,6))
# states are unordered categories, so a bar plot is used instead of a line plot
sns.barplot(x="state", y="number", data=fires_by_state)
plt.xticks(rotation="vertical")
plt.title("Number of forest fires by state")
plt.show()

This is surprising. In the media, we generally hear about the forest fires in the Amazon, but the state of Mato Grosso has a far higher number of fires. It would be interesting to see how fires in the states with the most forest fires are distributed across years. So let's plot that next.

#### Distribution of forest fires by year - Top 10 states

In [None]:
top_10_states = fires_by_state.sort_values(["number"], ascending=False)[0:10]["state"]
fires_in_top_10_states = df[df.state.isin(top_10_states)][["state", "year", "number"]]
fires_in_top_10_states_by_year = fires_in_top_10_states.groupby(["state","year"]).sum().reset_index()

In [None]:
plt.figure(figsize=(18,6))
sns.lineplot(x="year", y="number", hue="state", data=fires_in_top_10_states_by_year)
# one tick per distinct year (the column repeats each year once per state)
plt.xticks(fires_in_top_10_states_by_year["year"].unique())
plt.title("Top 10 states with forest fires over the years")
plt.legend(loc="upper left")
plt.show()

Mato Grosso continues to experience forest fires on a far larger scale than the rest of the states. In 2009 there was a sharp increase in the number of forest fires in Mato Grosso.
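
A sharp increase like this one can be located programmatically by computing the year-over-year change within each state. The sketch below runs on made-up numbers in a hypothetical `state_year` frame (same columns as `fires_in_top_10_states_by_year`, with placeholder states `A` and `B`):

```python
import pandas as pd

# Made-up per-state yearly totals, for illustration only.
state_year = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B", "B"],
    "year": [2007, 2008, 2009, 2007, 2008, 2009],
    "number": [100, 110, 220, 50, 55, 60],
})

# Order rows so that diff() compares each year with the previous one.
state_year = state_year.sort_values(["state", "year"])

# Change relative to the previous year, computed separately for each state.
state_year["yoy_change"] = state_year.groupby("state")["number"].diff()

# Row with the largest single-year jump (the first year of each state is NaN
# and is skipped by idxmax).
largest_jump = state_year.loc[state_year["yoy_change"].idxmax()]
print(largest_jump)
```

Applied to the real per-state totals, this would show whether the 2009 jump in Mato Grosso is the largest in the data or just the most visible one in the plot.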

### Final conclusions

- Forest fires are increasing over the years and need to be taken seriously.
- A major proportion of the forest fires occurs in the second half of the year, roughly July through November. This period covers the Southern Hemisphere winter and spring and overlaps with the dry season in much of Brazil. Dry vegetation probably aids the spread of forest fires. An analysis of weather data for this period could indicate whether weather conditions play a role.
- The state of Mato Grosso has the highest number of forest fires. Amazonas comes 10th in the list.
- Despite all the public attention on forest fires, none of the top 10 states plotted above shows a clear decreasing trend.
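
The last point is based on reading the plot. One way it could be checked more systematically is to fit a straight line per state and look at the sign of each slope. This is only a sketch on made-up data: the `toy` frame, the placeholder states `A` and `B`, and the helper `yearly_slope` are all hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up per-state yearly totals: "A" rises, "B" falls.
toy = pd.DataFrame({
    "state": ["A"] * 4 + ["B"] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "number": [10, 20, 30, 40, 40, 30, 20, 10],
})


def yearly_slope(group):
    """Least-squares slope of `number` against years since the first year."""
    years = group["year"] - group["year"].min()
    return np.polyfit(years, group["number"], deg=1)[0]


# One slope per state.
slopes = {state: yearly_slope(group) for state, group in toy.groupby("state")}

# States whose fitted line goes down over time.
decreasing_states = [state for state, slope in slopes.items() if slope < 0]
print(decreasing_states)
```

An empty `decreasing_states` list on the real data would support the conclusion; a non-empty one would point to the states worth a closer look.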