# Exploratory Data Analysis
---

## Dataset
1. Contains data recorded in two regions of Algeria:
   - Bejaia region, northeast Algeria
   - Sidi Bel-abbes region, northwest Algeria
2. Size: 244 instances (122 instances per region)
3. Contains 11 attributes/features and 1 target variable/class
4. The 244 instances are split into two classes:
   - Fire (138 instances)
   - Not Fire (106 instances)
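The counts above can be checked once the data is loaded. The sketch below is illustrative, not part of the original notebook: it assumes a `Region` column (one code per region, as in the cleaned file loaded later) and a `Classes` column holding `fire` / `not fire`. Labels are whitespace-stripped in case they are padded. The helper name `summarize_instances` is hypothetical, and the small frame is toy data used only to show the output shape. On the full dataset, the description above implies a total of 244, two regions with 122 rows each, and 138 `fire` / 106 `not fire`.

```python
import pandas as pd


def summarize_instances(df: pd.DataFrame) -> dict:
    """Count total rows, rows per region and rows per class (hypothetical helper)."""
    # Strip surrounding whitespace so 'fire ' and 'fire' count as the same label
    classes = df["Classes"].str.strip()
    return {
        "total": len(df),
        "per_region": df["Region"].value_counts().sort_index().to_dict(),
        "per_class": classes.value_counts().to_dict(),
    }


# Toy frame (not the real data), only to demonstrate the function
toy = pd.DataFrame({
    "Region": [0, 0, 1, 1],
    "Classes": ["not fire", "fire ", "fire", "fire"],
})
print(summarize_instances(toy))
```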
   
   
### Attributes/Features
|Attribute|Description|
|:---:|:---:|
|`Date`|Split into 3 columns: `day`, `month`, `year`|
|`Temperature`|Maximum noon temperature ($°C$)|
|`RH`|Relative humidity in $\%$|
|`Ws`|Wind speed in $km/h$|
|`Rain`|Total rainfall in $mm$|
|`FWI`|Fire Weather Index (FWI)|
|`FFMC`|Fine Fuel Moisture Code (FFMC) index from the FWI system|
|`DMC`|Duff Moisture Code (DMC) index from the FWI system|
|`DC`|Drought Code (DC) index from the FWI system|
|`ISI`|Initial Spread Index (ISI) from the FWI system|
|`BUI`|Buildup Index (BUI) from the FWI system|
|`Classes`|Target variable. Two possible values: `fire`, `not fire`|
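Because `Date` is stored as three separate columns and `Classes` is a text label, two common preparation steps are rebuilding a single datetime column and encoding the target as integers. This is a minimal sketch on a toy frame (the values are illustrative, not taken from the dataset). It relies on `pd.to_datetime` accepting a DataFrame whose columns are named `year`, `month` and `day`. The new column names `date` and `target`, and the mapping 1 = `fire`, 0 = `not fire`, are assumptions.

```python
import pandas as pd

# Toy frame with the same date and label columns as the dataset
df = pd.DataFrame({
    "day": [1, 2, 3],
    "month": [6, 6, 6],
    "year": [2012, 2012, 2012],
    "Classes": ["not fire", "not fire", "fire "],
})

# Assemble one datetime column from the year/month/day parts
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Encode the target: 1 for 'fire', 0 for 'not fire' (labels stripped of padding)
df["target"] = (df["Classes"].str.strip() == "fire").astype(int)

print(df[["date", "Classes", "target"]])
```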


---

## Dependencies
---

```python
# for data wrangling with dataframes
import pandas as pd
# for numerical computations
import numpy as np
# for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# IPython/Jupyter magic: render plots inside the notebook
%matplotlib inline
```
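For reproducibility it can help to record which versions of these libraries were installed when the analysis ran. This sketch uses only the standard library (`importlib.metadata`, Python 3.8+) and reports `None` for any package that is not installed instead of raising. The helper name `package_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Return {distribution name: installed version string, or None if missing}."""
    versions = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


print(package_versions(["pandas", "numpy", "matplotlib", "seaborn"]))
```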

### Loading the Cleaned Dataset

```python
# local path (machine-specific; adjust to where your copy of the file lives)
path = r"D:\End-to-End ML Project\dataset\Algerian_forest_fires_dataset_CLEANED.csv"
```
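The absolute Windows path above only works on one machine. A more portable option is to build the path relative to the project folder with `pathlib`. This sketch assumes the same layout as the original path (a project root containing a `dataset/` folder) and that the notebook is started from that project root. The helper `dataset_path` and the constant `FILE_NAME` are hypothetical.

```python
from pathlib import Path

FILE_NAME = "Algerian_forest_fires_dataset_CLEANED.csv"


def dataset_path(project_root) -> Path:
    """Build <project_root>/dataset/<FILE_NAME> without OS-specific separators."""
    return Path(project_root) / "dataset" / FILE_NAME


# Assumption: the current working directory is the project root
path = dataset_path(Path.cwd())
print(path, "exists:", path.exists())
```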

```python
# load as dataframe
dataset = pd.read_csv(path)

# check if loaded properly: show the first 5 rows
dataset.head()
```

|Unnamed: 0|day|month|year|Temperature|RH|Ws|Rain|FFMC|DMC|DC|ISI|BUI|FWI|Classes|Region|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|0|1|6|2012|29|57|18|0.0|65.7|3.4|7.6|1.3|3.4|0.5|not fire|0|
|1|2|6|2012|29|61|13|1.3|64.4|4.1|7.6|1.0|3.9|0.4|not fire|0|
|2|3|6|2012|26|82|22|13.1|47.1|2.5|7.1|0.3|2.7|0.1|not fire|0|
|3|4|6|2012|25|89|13|2.5|28.6|1.3|6.9|0.0|1.7|0.0|not fire|0|
|4|5|6|2012|27|77|16|0.0|64.8|3.0|14.2|1.2|3.9|0.5|not fire|0|
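The `Unnamed: 0` column in the output is what pandas produces when a CSV was saved together with its index and read back without telling pandas about it. This sketch assumes that is how the cleaned file was written, and shows two ways to avoid carrying the column into the analysis, using the first two rows shown above as an in-memory CSV (`io.StringIO` stands in for the file on disk).

```python
import io

import pandas as pd

# First rows of the cleaned file, assuming it was saved with an unnamed index column
csv_text = """,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes,Region
0,1,6,2012,29,57,18,0.0,65.7,3.4,7.6,1.3,3.4,0.5,not fire,0
1,2,6,2012,29,61,13,1.3,64.4,4.1,7.6,1.0,3.9,0.4,not fire,0
"""

# Option 1: use the first column as the index while reading
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Option 2: read normally, then drop any auto-named "Unnamed: N" columns
raw = pd.read_csv(io.StringIO(csv_text))
cleaned = raw.loc[:, ~raw.columns.str.startswith("Unnamed")]

print(list(cleaned.columns))
```

Alternatively, writing the cleaned file with `to_csv(..., index=False)` avoids creating the extra column in the first place.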


```python
# work on a copy so the originally loaded `dataset` stays unmodified
df = dataset.copy()
```
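The difference between `.copy()` and plain assignment is easy to see on a small frame. `DataFrame.copy()` makes a deep copy by default (`deep=True`), so edits to `df` leave `dataset` untouched, whereas `alias = dataset` only binds a second name to the same object. The toy frame below reuses a few values from the rows shown above and stands in for the loaded `dataset`.

```python
import pandas as pd

# Toy stand-in for the loaded `dataset`
dataset = pd.DataFrame({"Temperature": [29, 29, 26], "Rain": [0.0, 1.3, 13.1]})

df = dataset.copy()               # independent copy (deep=True by default)
df.loc[0, "Temperature"] = 99     # edit the working copy only
print(dataset.loc[0, "Temperature"])  # 29: the original is preserved

alias = dataset                   # no copy: both names refer to the same object
alias.loc[1, "Rain"] = -1.0
print(dataset.loc[1, "Rain"])     # -1.0: the original changed
```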