# Analysis of Forest Fires in Brazil

This analysis is based on the Kaggle data set https://www.kaggle.com/man72331/forest-fires-data-analysis 

The goal of this analysis is to gain insight into the pattern of forest fires in Brazil. It addresses the following questions:

1. How has the number of fires evolved over the years?
2. Which state has seen the largest and smallest increase or decrease in forest fires?
3. Which month has the highest chance of a fire per region?
4. Does the chance of fire per month differ between regions?
5. Which year had the most and the fewest fires?
6. Which state has, on average, the most and the fewest fires?
7. Which year and which state show the largest deviation in forest fires?


By: <b>Bram van Schaik</b> 
<br>Date: <b> October 2019</b>

## Part 1: Import libraries

In [3]:
import pandas as pd
import numpy as np

from scipy import stats


#### Visuals ####
import plotly.express as px

import plotly as py
import plotly.graph_objs as go

## Part 2: Import the dataset

In [4]:
#### Read in the data set ####
df = pd.read_csv("C:\\Users\\schai495\\Documents\\GitHub\\DEMO_PROJECTS\\Brazil Forest Fires\\amazon.csv", sep=",", engine='python')

## Part 3: Exploratory Analysis
This part is about getting to know our dataset

In [5]:
#### Looking at the first and last few rows to gain an understanding of the content ####
print(df.head(3))
print("\n")
print(df.tail(3))

   year state    month  number        date
0  1998  Acre  Janeiro     0.0  1998-01-01
1  1999  Acre  Janeiro     0.0  1999-01-01
2  2000  Acre  Janeiro     0.0  2000-01-01


      year      state     month  number        date
6451  2014  Tocantins  Dezembro   223.0  2014-01-01
6452  2015  Tocantins  Dezembro   373.0  2015-01-01
6453  2016  Tocantins  Dezembro   119.0  2016-01-01


### Conclusions:
- The month column contains Portuguese month names rather than English ones
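One way to act on this observation is to translate the month names with a lookup dictionary. The sketch below is only an illustration: `MONTHS_PT_TO_EN`, `sample` and the `month_en` column are hypothetical names, and the spellings in the dictionary are assumed to match the file (only `Janeiro` and `Dezembro` are visible in the output above).

```python
import pandas as pd

# Assumed spellings of the Portuguese month names; adjust them if the
# values in the actual file differ (for example due to encoding issues).
MONTHS_PT_TO_EN = {
    "Janeiro": "January", "Fevereiro": "February", "Março": "March",
    "Abril": "April", "Maio": "May", "Junho": "June",
    "Julho": "July", "Agosto": "August", "Setembro": "September",
    "Outubro": "October", "Novembro": "November", "Dezembro": "December",
}

# Small stand-in for df, using the month values seen in head() and tail().
sample = pd.DataFrame({"month": ["Janeiro", "Dezembro", "Janeiro"]})

# map() yields NaN for any name missing from the dictionary, which makes
# spelling or encoding problems easy to spot afterwards.
sample["month_en"] = sample["month"].map(MONTHS_PT_TO_EN)
print(sample)
```

Applied to the full data set, `df["month"].map(MONTHS_PT_TO_EN).isnull().sum()` would then count the month names that did not match the dictionary.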


In [6]:
#### Now we look at the data types, shape, and missing/unique counts for each column ####
def resumetable(df):
    print(f"Dataset Shape: {df.shape}")
    summary = pd.DataFrame(df.dtypes,columns=['dtypes'])
    summary = summary.reset_index()
    summary['Name'] = summary['index']
    summary = summary[['Name','dtypes']]
    summary['Missing'] = df.isnull().sum().values    
    summary['Uniques'] = df.nunique().values
    return summary

resumetable(df)

Dataset Shape: (6454, 5)


     Name   dtypes  Missing  Uniques
0    year    int64        0       20
1   state   object        0       23
2   month   object        0       12
3  number  float64        0     1479
4    date   object        0       20


### Conclusions:
- There are 20 unique years in the data set
- There are 23 unique states in the data set
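- The state, month and date columns are stored as objects (strings)
- The date column has only 20 unique values, one per year, so it carries no information beyond the year column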
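Since these columns are stored as plain objects, a possible follow-up is to convert them to more specific types. The sketch below uses a small hypothetical `sample` frame built from the rows shown earlier, and assumes the dates follow the `YYYY-MM-DD` pattern seen in the output.

```python
import pandas as pd

# Stand-in for df, built from two of the rows printed above.
sample = pd.DataFrame({
    "year": [1998, 2014],
    "state": ["Acre", "Tocantins"],
    "month": ["Janeiro", "Dezembro"],
    "number": [0.0, 223.0],
    "date": ["1998-01-01", "2014-01-01"],
})

# Parse the date strings; the format is an assumption based on the output.
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")

# A categorical dtype suits a column with few distinct values such as state.
sample["state"] = sample["state"].astype("category")

print(sample.dtypes)
```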

### Question 1: How has the number of fires developed over the years?

In [12]:
#### Sum the fires by year and visualize ####
# Select 'number' explicitly so that the text columns are not included in the sum
YEAR = df.groupby('year', as_index=False)['number'].sum()
YEAR.head()

   year     number
0  1998  20013.971
1  1999  26882.821
2  2000  27351.251
3  2001  29071.612
4  2002  37390.600


In [16]:
fig = px.line(YEAR, x="year", y="number")
fig.update_layout(title='Number of forest fires in Brazil by year',
                   xaxis_title='Year',
                   yaxis_title='Number of forest fires')
fig.show()

In [38]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
x = YEAR[['year']]
y = YEAR[['number']]

model.fit(x, y)


# Report the fit quality (R^2) and the fitted parameters

r_sq = model.score(x, y)
print('coefficient of determination:', r_sq)
print('\n')
print('intercept:', model.intercept_)
print('\n')
print('slope:', model.coef_)

coefficient of determination: 0.4299563767268794


intercept: [-1277040.25038571]


slope: [[653.54244286]]


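### Conclusions:
- The fitted regression line has a slope of about 654, meaning the yearly total rises by roughly 654 fires per year on average
- The coefficient of determination is about 0.43, so the linear trend explains less than half of the variation between years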
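To make the regression formula concrete, the reported parameters can be plugged into `number = slope * year + intercept`. The sketch below copies the slope and intercept from the output above; `predicted_fires` is a hypothetical helper, and 2017 is assumed to be the last year in the data (20 consecutive years starting in 1998).

```python
# Parameters copied from the LinearRegression output above.
SLOPE = 653.54244286
INTERCEPT = -1277040.25038571


def predicted_fires(year):
    """Value of the fitted trend line for a given year."""
    return SLOPE * year + INTERCEPT


# Compare the trend line at the start and the assumed end of the data.
for year in (1998, 2017):
    print(year, round(predicted_fires(year)))
```

The trend value for 1998 lies well above the observed total of about 20,014, which fits the modest coefficient of determination: the line captures the general direction but not the individual years.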

In [46]:
Slope = model.coef_
Intercept = model.intercept_

In [48]:
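#### Fitted values of the regression line: slope * year + intercept ####
# model.coef_ has shape (1, 1) and model.intercept_ has shape (1,), so extract
# the scalars first; multiplying a (20, 1) DataFrame by the (1, 1) array fails.
fitted = x['year'] * Slope[0, 0] + Intercept[0]
fitted.head()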

In [51]:
fig = px.histogram(df, x="number")
fig.show()

In [11]:
#STATE = df.groupby(['state']).sum()
#STATE = STATE.reset_index()
#STATE = STATE.sort_values(['number'], ascending= False)

#### Keep only the years after 2015 and plot the fires per state ####
RECENT = df[df['year'] > 2015]

fig = px.bar(RECENT, x="state", y="number", color="state",
             facet_row="year", facet_col="month")

fig.update_layout(title='Number of forest fires in Brazil by state',
                   xaxis_title='State',
                   yaxis_title='Number of forest fires')

fig.show()
