# Algerian Forest Fires Dataset 

## Data Set Information:
The dataset includes 244 instances that represent data from two regions of Algeria:

- **Bejaia region**: Located in the northeast of Algeria
- **Sidi-Bel Abbes region**: Located in the northwest of Algeria

There are 122 instances for each region, covering the period from June 2012 to September 2012.

The dataset contains **11 attributes** and **1 output attribute** (class).

- **Total Instances**: 244
- **Class Distribution**: 
  - **Fire**: 138 instances
  - **Not Fire**: 106 instances

---

## Attribute Information:

1. **Date**: (DD/MM/YYYY) 
   - Day, month ('June' to 'September'), year (2012)

2. **Temp**: Temperature at noon (maximum temperature) in Celsius (22 to 42°C)

3. **RH**: Relative Humidity in percentage (21% to 90%)

4. **Ws**: Wind speed in km/h (6 to 29 km/h)

5. **Rain**: Total daily rainfall in mm (0 to 16.8 mm)

---

### FWI Components:
6. **Fine Fuel Moisture Code (FFMC)**: Index from the FWI system (28.6 to 92.5)

7. **Duff Moisture Code (DMC)**: Index from the FWI system (1.1 to 65.9)

8. **Drought Code (DC)**: Index from the FWI system (7 to 220.4)

9. **Initial Spread Index (ISI)**: Index from the FWI system (0 to 18.5)

10. **Buildup Index (BUI)**: Index from the FWI system (1.1 to 68)

11. **Fire Weather Index (FWI)**: Index from the FWI system (0 to 31.1)

---

12. **Classes**: 
   - **Fire** 
   - **Not Fire**
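The documented ranges above can double as a quick sanity check once the data is loaded. The following is a minimal sketch, not part of the original notebook: the helper `out_of_range_counts` and the dictionary `DOCUMENTED_RANGES` are hypothetical names, the column names are assumed to match those used later in this notebook (`Temperature`, `RH`, `Ws`, and the FWI abbreviations), and the toy rows are invented.

```python
import pandas as pd

# Documented (min, max) ranges copied from the attribute list above.
# ASSUMPTION: the column names follow the ones used later in this notebook.
DOCUMENTED_RANGES = {
    "Temperature": (22, 42),
    "RH": (21, 90),
    "Ws": (6, 29),
    "Rain": (0, 16.8),
    "FFMC": (28.6, 92.5),
    "DMC": (1.1, 65.9),
    "DC": (7, 220.4),
    "ISI": (0, 18.5),
    "BUI": (1.1, 68),
    "FWI": (0, 31.1),
}


def out_of_range_counts(frame, ranges=DOCUMENTED_RANGES):
    """Return {column: number of values outside the documented range}."""
    counts = {}
    for column, (low, high) in ranges.items():
        if column not in frame.columns:
            continue  # skip columns the frame does not have
        # between() is inclusive on both ends and returns False for NaN,
        # so missing values are counted as out of range too.
        counts[column] = int((~frame[column].between(low, high)).sum())
    return counts


# Toy rows, not taken from the real dataset.
toy = pd.DataFrame({"Temperature": [29, 45, 22], "RH": [57, 20, 90]})
report = out_of_range_counts(toy)
print(report)
```

A non-zero count would point at either a data-entry problem or a column that has not been cleaned yet.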


In [135]:
# Goal: build a model that predicts the temperature from the input features

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [136]:
# header=1: row 1 (0-based) of the file supplies the column names
dataset = pd.read_csv('../dataset/Algerian_forest_fires_dataset.csv', header=1)

In [None]:
dataset.head()

In [None]:
dataset.info()

## Data Cleaning
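The cells in this section rely on three pandas idioms: counting nulls per column, selecting the rows that contain any null, and dropping those rows while renumbering the index. A minimal sketch on a toy frame (the rows are invented for illustration and are not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame, not the real data: the row with label 1 has a missing value.
toy = pd.DataFrame(
    {
        "Temperature": [29.0, np.nan, 31.0],
        "Classes": ["fire", "not fire", "fire"],
    }
)

missing_per_column = toy.isnull().sum()             # NaN count per column
rows_with_missing = toy[toy.isnull().any(axis=1)]   # rows with at least one NaN

# Drop those rows, then renumber so the index has no gaps.
cleaned = toy.dropna().reset_index(drop=True)

print(missing_per_column)
print(rows_with_missing.index.tolist())
print(cleaned.index.tolist())
```

Resetting the index matters here because later cells address rows by position and label, and a gap left by `dropna()` would shift which row a given label refers to.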

In [None]:
dataset.isnull().sum()

In [None]:
## Rows with missing values
# Keep only the rows that contain at least one missing (null) value
dataset[dataset.isnull().any(axis=1)]

## Region Column Addition

The dataset consists of 244 instances from two regions of Algeria:

1. **Bejaia Region**: The first 122 instances (indices 0 to 121).
2. **Sidi-Bel Abbes Region**: The remaining 122 instances (indices 122 to 243).

To distinguish between the two regions, we add a new numeric column **"Region"** to the dataset:

- For instances with indices from 0 to 121, **Region** is set to **0** (the "Bejaia Region Dataset").
- For instances with indices from 122 to 243, **Region** is set to **1** (the "Sidi-Bel Abbes Region Dataset").

This helps categorize the data based on its geographical region.
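One detail worth keeping in mind when assigning by index range: `.loc` slices by label and includes both endpoints, unlike ordinary Python slicing. A minimal sketch on a six-row toy frame (a stand-in for the real data, with invented values) shows the behaviour:

```python
import pandas as pd

# Toy frame with 6 rows standing in for the full dataset.
toy = pd.DataFrame({"Temperature": [29, 31, 30, 33, 35, 34]})

# .loc slices by label and includes BOTH endpoints,
# so labels 0, 1 and 2 are written here (three rows, not two).
toy.loc[:2, "Region"] = 0
toy.loc[3:, "Region"] = 1

# The new column is created as float (it briefly holds NaN),
# so cast it to int once every row has a value.
toy["Region"] = toy["Region"].astype(int)
print(toy["Region"].tolist())
```

By the same rule, a slice ending at label 121 covers exactly the first 122 rows of a default integer index.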


In [141]:
# Rows with labels 0 to 121 belong to the Bejaia region: assign Region = 0.
# .loc slicing is label-based and includes the end label.
dataset.loc[:121, "Region"] = 0
# Rows from label 122 onward belong to the Sidi-Bel Abbes region: assign Region = 1.
dataset.loc[122:, "Region"] = 1
df = dataset

In [None]:
df.info()

In [None]:
df.head()

In [None]:
df.tail()

In [145]:
df['Region'] = df['Region'].astype(int)

In [None]:
df.info()

In [None]:
df.head()

In [None]:
df.isnull().sum()

In [149]:
## Removing the null values
df=df.dropna().reset_index(drop=True)


In [None]:
df.head()

In [None]:
df.isnull().sum()

In [None]:
# Print the row at position 122
df.iloc[[122]]

In [153]:
## Remove the row at index 122 (the non-data row printed above)
df = df.drop(122).reset_index(drop=True)

In [None]:
df.iloc[[122]]

In [None]:
df.columns

In [None]:
## Strip leading and trailing spaces from the column names
df.columns=df.columns.str.strip()
df.columns

In [None]:
df.info()

#### Convert the required columns to integer data type

In [None]:
df.columns

In [159]:
int_columns = ['month', 'day', 'year', 'Temperature', 'RH', 'Ws']
df[int_columns] = df[int_columns].astype(int)

In [None]:
df.info()

In [None]:
df.head()

#### Convert the remaining columns to float data type
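The cells in this subsection use `astype(float)`, which raises a `ValueError` as soon as one string cannot be parsed. As a hedged alternative (not what this notebook does), `pd.to_numeric` with `errors="coerce"` converts unparseable entries to `NaN` so they can be inspected afterwards. A minimal sketch on invented toy values:

```python
import pandas as pd

# Toy frame, not the real data: numeric columns stored as strings,
# with one deliberately malformed entry ("bad").
toy = pd.DataFrame(
    {
        "Rain": ["0.0", "1.3", "0.2"],
        "FWI": ["0.5", "bad", "7.1"],
        "Classes": ["fire", "not fire", "fire"],
    }
)

# Every non-numeric column except the categorical target.
to_convert = [
    column
    for column in toy.columns
    if not pd.api.types.is_numeric_dtype(toy[column]) and column != "Classes"
]

for column in to_convert:
    # errors="coerce" turns unparseable strings into NaN instead of raising.
    toy[column] = pd.to_numeric(toy[column], errors="coerce")

print(toy.dtypes)
print(toy[toy["FWI"].isna()])  # rows whose FWI could not be parsed
```

The trade-off is that coercion hides bad values unless the resulting `NaN` entries are checked, whereas `astype(float)` fails loudly.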


In [162]:
# Columns still stored as strings (dtype object)
objects = [feature for feature in df.columns if df[feature].dtypes == 'O']

In [None]:
objects

In [164]:
for column in objects:
    if column != 'Classes':  # 'Classes' is a categorical feature, so leave it as is
        df[column] = df[column].astype(float)

In [None]:
df.info()

In [None]:
objects

In [None]:
df.describe()

In [None]:
df.head()

In [169]:
## Save the cleaned dataset
df.to_csv('../dataset/Algerian_forest_fires_cleaned_dataset.csv',index=False)

## Exploratory Data Analysis

In [170]:
## Drop day, month and year
df_copy = df.drop(['day', 'month', 'year'], axis=1)

In [None]:
df_copy.head()

In [None]:
## categories in classes
df_copy['Classes'].value_counts()

In [173]:
## Encode the categories in Classes: 0 if the label contains 'not fire', otherwise 1
df_copy['Classes'] = np.where(df_copy['Classes'].str.contains('not fire'), 0, 1)

In [None]:
df_copy.head()

In [None]:
df_copy.tail()

In [None]:
df_copy['Classes'].value_counts()

In [None]:
## Plot histograms for all features
plt.style.use('Solarize_Light2')
df_copy.hist(bins=50, figsize=(20, 15))
plt.show()
# To list the available styles: print(plt.style.available)


In [178]:
## Percentage for Pie Chart
percentage=df_copy['Classes'].value_counts(normalize=True)*100

In [None]:
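# Plot the pie chart; labels are derived from percentage.index (1 = fire, 0 = not fire)
# so they stay correct whichever class is more frequent
classlabels = ["Fire" if label == 1 else "Not Fire" for label in percentage.index]
plt.figure(figsize=(12, 7))
plt.pie(percentage, labels=classlabels, autopct='%1.1f%%')
plt.title("Pie Chart of Classes")
plt.show()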

## Correlation
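A correlation matrix like the one computed in this section is often easier to read when one column of interest is ranked against the rest. The sketch below uses invented toy values and assumes `Temperature` is the column of interest, as stated at the top of the notebook; it is an illustration, not a result from the dataset.

```python
import pandas as pd

# Toy values, invented for illustration only.
toy = pd.DataFrame(
    {
        "Temperature": [25, 28, 31, 34, 37],
        "RH": [80, 70, 65, 50, 40],
        "Ws": [14, 15, 13, 16, 14],
    }
)

corr = toy.corr()  # pairwise Pearson correlation

# Rank the other columns by the strength (absolute value) of their
# correlation with Temperature, strongest first.
ranked = (
    corr["Temperature"]
    .drop("Temperature")
    .abs()
    .sort_values(ascending=False)
)
print(ranked)
```

Taking the absolute value keeps strongly negative relationships, such as humidity falling as temperature rises in this toy data, near the top of the ranking.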

In [None]:
df_copy.corr()

In [None]:
sns.heatmap(df_copy.corr(),annot=True)

In [None]:
## Box plot of FWI
sns.boxplot(x=df_copy['FWI'], color='green')

In [None]:
df_copy.head()

In [184]:
# Normalise the raw class labels (which may carry stray whitespace) to 'not fire' / 'fire'
df['Classes'] = np.where(df['Classes'].str.contains('not fire'), 'not fire', 'fire')

In [None]:
## Monthly Fire Analysis (Sidi-Bel Abbes region, Region == 1)
dftemp = df.loc[df['Region'] == 1]
plt.subplots(figsize=(13, 6))
sns.set_style('whitegrid')
# Plot the filtered frame, not the full df, so only this region is counted
sns.countplot(x='month', hue='Classes', data=dftemp)
plt.ylabel('Number of Fires', weight='bold')
plt.xlabel('Months', weight='bold')
plt.title("Fire Analysis of Sidi-Bel Abbes Region", weight='bold')

In [None]:
## Monthly Fire Analysis (Bejaia region, Region == 0)
dftemp = df.loc[df['Region'] == 0]
plt.subplots(figsize=(13, 6))
sns.set_style('whitegrid')
# Plot the filtered frame, not the full df, so only this region is counted
sns.countplot(x='month', hue='Classes', data=dftemp)
plt.ylabel('Number of Fires', weight='bold')
plt.xlabel('Months', weight='bold')
plt.title("Fire Analysis of Bejaia Region", weight='bold')

The plots show that August had the most forest fires in both regions. From the monthly counts we can note a few things:

Fires were frequent in three months (June, July and August), with the peak in August.

September had fewer fires.
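Observations like these can be backed by an explicit table of fire counts per region and month, which is easier to compare than two separate plots. A minimal sketch on invented toy rows, assuming the column names `Region`, `month` and `Classes` used in the cleaned frame above:

```python
import pandas as pd

# Toy rows, invented for illustration; not the real data.
toy = pd.DataFrame(
    {
        "Region": [0, 0, 0, 1, 1, 1],
        "month": [6, 8, 8, 7, 8, 9],
        "Classes": ["fire", "fire", "not fire", "fire", "fire", "not fire"],
    }
)

# Keep only the fire days, then count them per (Region, month).
fires = toy[toy["Classes"] == "fire"]
fires_per_month = (
    fires.groupby(["Region", "month"])
    .size()
    .unstack(fill_value=0)  # months become columns; missing pairs become 0
)
print(fires_per_month)
```

On the real data, the month with the largest value in each row would be the peak fire month for that region.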