# 💡 Exploratory Data Analysis 💡

Data taken from: https://www.kaggle.com/gustavomodelli/forest-fires-in-brazil/downloads/forest-fires-in-brazil.zip/1

## 🤔 Consider visualizing the following

* Multiple states overlaid on top of one another in **time series analysis**
    * **X-axis**: time
        * *month* or *year*? -- *omg **YEAR** as main, with **MONTH** as drilldown option*!
        * **In this EDA**, use *everything* to plot time series; **in deployment**, plot by _overall fires in the year_, then _drilldown to month-by-month_
    * **y-axis**: # of fires
    * **z-axis**: specific Brazilian state
* ***Cumulative*** fires in all of Brazil as a **bar chart**
    * **X-axis**: year (_again, with MONTH as a drilldown option_)
    * **y-axis**: total # of fires across all months and all regions
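
A minimal sketch of the aggregations these two charts would need, assuming the columns are named `date`, `year`, `state`, and `number`. The tiny `sample` frame is made-up illustration data, not values from the Kaggle file, and the variable names are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only (NOT real values from amazon.csv).
sample = pd.DataFrame({
    "date": pd.to_datetime(["1998-01-01", "1998-02-01", "1998-01-01",
                            "1998-02-01", "1999-01-01", "1999-01-01"]),
    "year": [1998, 1998, 1998, 1998, 1999, 1999],
    "state": ["Acre", "Acre", "Bahia", "Bahia", "Acre", "Bahia"],
    "number": [0.0, 3.0, 10.0, 5.0, 2.0, 7.0],
})

# Time series: one column per state, indexed by date, so the states overlay.
by_state = sample.pivot_table(index="date", columns="state",
                              values="number", aggfunc="sum")

# Bar chart: total fires per year across all months and all states.
yearly_total = sample.groupby("year")["number"].sum()

# Drilldown: month-by-month totals for one selected year.
monthly_1998 = sample[sample["year"] == 1998].groupby("date")["number"].sum()
```

With matplotlib installed, `by_state.plot()` would draw one line per state and `yearly_total.plot.bar()` would draw the yearly bars.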

In [25]:
ls

amazon.csv*                 forest-fires-in-brazil.zip


In [5]:
from scipy import stats
import math
import csv
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [88]:
df = pd.read_csv("amazon.csv", encoding= 'ISO-8859-1')
df.head(10)

Unnamed: 0,year,state,month,number,date
0,1998,Acre,Janeiro,0.0,1998-01-01
1,1999,Acre,Janeiro,0.0,1999-01-01
2,2000,Acre,Janeiro,0.0,2000-01-01
3,2001,Acre,Janeiro,0.0,2001-01-01
4,2002,Acre,Janeiro,0.0,2002-01-01
5,2003,Acre,Janeiro,10.0,2003-01-01
6,2004,Acre,Janeiro,0.0,2004-01-01
7,2005,Acre,Janeiro,12.0,2005-01-01
8,2006,Acre,Janeiro,4.0,2006-01-01
9,2007,Acre,Janeiro,0.0,2007-01-01


In [83]:
df['year'].unique()

array([1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008,
       2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017])

In [84]:
df['state'].unique().tolist()

['Acre',
 'Alagoas',
 'Amapa',
 'Amazonas',
 'Bahia',
 'Ceara',
 'Distrito Federal',
 'Espirito Santo',
 'Goias',
 'Maranhao',
 'Mato Grosso',
 'Minas Gerais',
 'Pará',
 'Paraiba',
 'Pernambuco',
 'Piau',
 'Rio',
 'Rondonia',
 'Roraima',
 'Santa Catarina',
 'Sao Paulo',
 'Sergipe',
 'Tocantins']

In [28]:
df['number'].dtype

dtype('float64')

In [34]:
df['date'].dtype

dtype('O')

In [35]:
df['date'] = pd.to_datetime(df['date'])
df.sort_values(by='date').head()

Unnamed: 0,year,state,month,number,date
0,1998,Acre,Janeiro,0.0,1998-01-01
6415,1998,Tocantins,Novembro,1.0,1998-01-01
2790,1998,Mato Grosso,Setembro,457.0,1998-01-01
2810,1998,Mato Grosso,Outubro,576.0,1998-01-01
2830,1998,Mato Grosso,Novembro,306.0,1998-01-01


In [36]:
df['date'].dtype

dtype('<M8[ns]')

In [42]:
df['year'].dtype

dtype('int64')

In [38]:
df.dtypes

year               int64
state             object
month             object
number           float64
date      datetime64[ns]
dtype: object

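Oooookay. Why, oh why, does 'date' show up as a different data type depending on whether I'm checking _date alone_ vs. checking the _entire dataframe_?

**Vincenzo says** it has something to do with endianness, and that checks out: `<M8[ns]` and `datetime64[ns]` are the same dtype, just printed two ways. `M8` is NumPy's short code for an 8-byte datetime, and the `<` marks little-endian byte order. (💡_Research more later! Not important for creating this project, but it'll be good to know!_)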
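
A quick check of the two spellings, as a small sketch using NumPy only. The variable names are just for illustration, and the leading `<` in the printed code assumes a little-endian machine.

```python
import numpy as np

short_code = np.dtype("M8[ns]")         # "M8" = 8-byte datetime, native byte order
long_name = np.dtype("datetime64[ns]")  # the spelled-out name of the same dtype

same_dtype = short_code == long_name    # two spellings, one dtype
print(same_dtype)
print(long_name.str)   # array-protocol code; starts with "<" on little-endian machines
print(long_name.name)  # the long name that df.dtypes displays
```

So checking `df['date'].dtype` shows the dtype's `repr` (the short code), while `df.dtypes` prints the long name.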

### TODO: Write a function that takes month + year and smashes 'em into a date that actually represents the month + year

1. Drop the current useless 'date' column (it shows January 1st even for rows from other months).
1. Iterate through the rows in the dataframe.
1. Instantiate variable `month_day` and set it equal to `''`.
1. If `row['month'] == 'Janeiro'`, then `month_day = '-01-01'` (and likewise for the other eleven months).
1. Add an extra column to the dataframe whose name is **date**. Then set `row['date'] = str(row['year']) + month_day`.
1. (_outside of for loop_) For the **whole dataframe**, cast the `date` column from **string** to **datetime**. TO CAST TO DATETIME: `df['date'] = pd.to_datetime(df['date'])`
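
The TODO above could be sketched as a small helper. `MONTH_NUMBERS` and `month_year_to_date` are hypothetical names introduced here, and the sketch assumes the month names are spelled exactly as they appear in the `month` column.

```python
import pandas as pd

# Portuguese month names mapped to their month numbers.
MONTH_NUMBERS = {
    "Janeiro": 1, "Fevereiro": 2, "Março": 3, "Abril": 4,
    "Maio": 5, "Junho": 6, "Julho": 7, "Agosto": 8,
    "Setembro": 9, "Outubro": 10, "Novembro": 11, "Dezembro": 12,
}

def month_year_to_date(month, year):
    """Return the first day of the given month and year as a Timestamp."""
    # Raises KeyError if the month name is not in MONTH_NUMBERS.
    return pd.Timestamp(year=int(year), month=MONTH_NUMBERS[month], day=1)

example = month_year_to_date("Novembro", 1998)
```

Returning a `Timestamp` directly would skip the string step, so no later cast with `pd.to_datetime` is needed for values built this way.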

In [89]:
df = df.drop(columns=['date'])
df.head()

Unnamed: 0,year,state,month,number
0,1998,Acre,Janeiro,0.0
1,1999,Acre,Janeiro,0.0
2,2000,Acre,Janeiro,0.0
3,2001,Acre,Janeiro,0.0
4,2002,Acre,Janeiro,0.0


In [90]:
df.insert(0, 'date', '')
df.head()

Unnamed: 0,date,year,state,month,number
0,,1998,Acre,Janeiro,0.0
1,,1999,Acre,Janeiro,0.0
2,,2000,Acre,Janeiro,0.0
3,,2001,Acre,Janeiro,0.0
4,,2002,Acre,Janeiro,0.0


In [57]:
df['year'][0]

1998

In [62]:
df.head()

Unnamed: 0,date,year,state,month,number
0,,1998,Acre,Janeiro,0.0
1,,1999,Acre,Janeiro,0.0
2,,2000,Acre,Janeiro,0.0
3,,2001,Acre,Janeiro,0.0
4,,2002,Acre,Janeiro,0.0


### Playtime

In [61]:
boo_thang = str(df['year'][0]) + '-01-01'
boo_thang

'1998-01-01'

In [73]:
row0 = 0
if df['month'][row0] == 'Janeiro':
    print('True')

True


### Function

In [91]:
# Portuguese month name -> 'MM' part of the date string
month_to_mm = {
    'Janeiro': '01', 'Fevereiro': '02', 'Março': '03', 'Abril': '04',
    'Maio': '05', 'Junho': '06', 'Julho': '07', 'Agosto': '08',
    'Setembro': '09', 'Outubro': '10', 'Novembro': '11', 'Dezembro': '12',
}
# Build 'YYYY-MM-01' for every row; unrecognized month names keep the empty string
for index, row in df.iterrows():
    mm = month_to_mm.get(row['month'])
    if mm is not None:
        df.loc[index, 'date'] = str(row['year']) + '-' + mm + '-01'
df.head()

Unnamed: 0,date,year,state,month,number
0,1998-01-01,1998,Acre,Janeiro,0.0
1,1999-01-01,1999,Acre,Janeiro,0.0
2,2000-01-01,2000,Acre,Janeiro,0.0
3,2001-01-01,2001,Acre,Janeiro,0.0
4,2002-01-01,2002,Acre,Janeiro,0.0
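
The row-by-row loop above gets the job done. As an alternative sketch, the same dates can be built for the whole column at once, since `pd.to_datetime` accepts a frame with `year`, `month`, and `day` columns. `MONTH_NUMBERS`, `sample`, and `parts` are hypothetical names, and the rows in `sample` are made up for illustration.

```python
import pandas as pd

# Portuguese month names mapped to their month numbers.
MONTH_NUMBERS = {
    "Janeiro": 1, "Fevereiro": 2, "Março": 3, "Abril": 4,
    "Maio": 5, "Junho": 6, "Julho": 7, "Agosto": 8,
    "Setembro": 9, "Outubro": 10, "Novembro": 11, "Dezembro": 12,
}

# Made-up rows for illustration only (NOT real values from amazon.csv).
sample = pd.DataFrame({
    "year": [1998, 1998, 2017],
    "month": ["Janeiro", "Março", "Dezembro"],
})

# Assemble year / month / day columns, then convert them in one call.
parts = pd.DataFrame({
    "year": sample["year"],
    "month": sample["month"].map(MONTH_NUMBERS),
    "day": 1,  # every date is pinned to the first of the month
})
sample["date"] = pd.to_datetime(parts)
```

A month name missing from the mapping would come out of `map` as NaN, so it is worth checking for that before converting.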


In [93]:
df.tail()

Unnamed: 0,date,year,state,month,number
6449,2012-12-01,2012,Tocantins,Dezembro,128.0
6450,2013-12-01,2013,Tocantins,Dezembro,85.0
6451,2014-12-01,2014,Tocantins,Dezembro,223.0
6452,2015-12-01,2015,Tocantins,Dezembro,373.0
6453,2016-12-01,2016,Tocantins,Dezembro,119.0


In [95]:
df.dtypes

date       object
year        int64
state      object
month      object
number    float64
dtype: object

In [96]:
df['date'] = pd.to_datetime(df['date'])

In [97]:
df.head()

Unnamed: 0,date,year,state,month,number
0,1998-01-01,1998,Acre,Janeiro,0.0
1,1999-01-01,1999,Acre,Janeiro,0.0
2,2000-01-01,2000,Acre,Janeiro,0.0
3,2001-01-01,2001,Acre,Janeiro,0.0
4,2002-01-01,2002,Acre,Janeiro,0.0


In [98]:
df.dtypes

date      datetime64[ns]
year               int64
state             object
month             object
number           float64
dtype: object