# Algerian Forest Fires Dataset

The Algerian Forest Fires Dataset contains meteorological measurements, fire-weather indices, and fire occurrence labels for two regions of Algeria, Bejaia and Sidi Bel-abbes. It is intended for fire prediction and related analysis. The main features and specifications of the dataset are listed below.

# Key Features
Day and Month:
The day and month the data was recorded.

Temperature:
Temperature recorded in degrees Celsius.

RH (Relative Humidity):
The relative humidity percentage.

Ws (Wind Speed):
Wind speed measured in km/h.

Rain:
Rainfall amount in mm.

FFMC (Fine Fuel Moisture Code):
Indicates moisture content in surface litter and small fuel materials, impacting fire spread.

DMC (Duff Moisture Code):
Indicates moisture content in loosely compacted organic layers, affecting fire ignition.

DC (Drought Code):
Represents moisture content in compact organic layers, reflecting long-term drought conditions.

ISI (Initial Spread Index):
A measure of fire spread rate.

BUI (Buildup Index):
Represents the total amount of fuel available for combustion.

FWI (Fire Weather Index):
A measure of fire intensity or potential fire spread.

Classes:
The binary classification target indicating whether a fire occurred, with values "fire" or "not fire".

# Dataset Specifications

Region-specific data:
Data is divided into Bejaia and Sidi Bel-abbes regions of Algeria.

Time Frame:
The observations cover June to September 2012, so they capture seasonality and a range of weather conditions.

Binary Classification:
The target is binary ("fire" or "not fire") to support classification models.

Number of Instances:
There are 244 instances in total, 122 for each region.

In [13]:
import numpy as np             
import pandas as pd        
import matplotlib.pyplot as plt  
import seaborn as sns     
%matplotlib inline            


In [14]:
df = pd.read_csv('Algerian_forest_fires_dataset.csv', header=0)

In [15]:
df.head()

Unnamed: 0,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
0,1,6,2012,29,57,18,0.0,65.7,3.4,7.6,1.3,3.4,0.5,not fire
1,2,6,2012,29,61,13,1.3,64.4,4.1,7.6,1.0,3.9,0.4,not fire
2,3,6,2012,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,not fire
3,4,6,2012,25,89,13,2.5,28.6,1.3,6.9,0.0,1.7,0.0,not fire
4,5,6,2012,27,77,16,0.0,64.8,3.0,14.2,1.2,3.9,0.5,not fire


In [18]:
df.isnull().sum()

day            1
month          2
year           2
Temperature    2
 RH            2
 Ws            2
Rain           2
FFMC           2
DMC            2
DC             2
ISI            2
BUI            2
FWI            2
Classes        3
dtype: int64
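
The output above lists ` RH` and ` Ws` with a leading space, which suggests that the CSV header carries stray whitespace. A minimal sketch of normalising the column names with `str.strip()`, using a small stand-in frame rather than the real file (the trailing spaces on `Classes` are an assumption made for illustration):

```python
import pandas as pd

# Stand-in frame whose headers mimic the stray whitespace seen above.
# The trailing spaces on "Classes" are hypothetical.
demo = pd.DataFrame({"day": [1], " RH": [57], " Ws": [18], "Classes  ": ["not fire"]})

# Index.str.strip() removes leading and trailing whitespace from every label.
demo.columns = demo.columns.str.strip()

print(list(demo.columns))
```

Applying the same line to `df` right after loading would make later lookups such as `df['RH']` independent of the header's spacing.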

In [16]:
df.describe()

Unnamed: 0,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
count,246,245,245,245,245,245,245,245.0,245.0,245,245.0,245,245.0,244
unique,33,5,2,20,63,19,40,174.0,167.0,199,107.0,175,127.0,9
top,1,7,2012,35,55,14,0,88.9,7.9,8,1.1,3,0.4,fire
freq,8,62,244,29,10,43,133,8.0,5.0,5,8.0,5,12.0,131
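
The summary reports 9 unique values in `Classes` although only two labels are expected. One plausible cause, assumed here rather than confirmed from the file, is inconsistent whitespace around the labels. A hedged sketch of cleaning such labels and encoding them as 0/1, on hypothetical values:

```python
import pandas as pd

# Hypothetical label variants; the real column may contain different ones.
labels = pd.Series(["not fire   ", "fire", "fire   ", "not fire", "fire ", None])

# Drop surrounding whitespace so variants collapse to two labels.
cleaned = labels.str.strip()

# Encode the target; missing or unmapped values stay NaN.
is_fire = cleaned.map({"not fire": 0, "fire": 1})

print(cleaned.nunique())
print(is_fire.tolist())
```

Running `df['Classes'].unique()` on the real data first would show which variants actually occur before any mapping is chosen.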


# Data cleaning

In [20]:
df[df.isnull().any(axis=1)]

Unnamed: 0,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
122,,,,,,,,,,,,,,
123,Sidi-Bel Abbes Region Dataset,,,,,,,,,,,,,
168,14,7.0,2012.0,37.0,37.0,18.0,0.2,88.9,12.9,14.6 9,12.5,10.4,fire,
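
Rows 122 and 123 are a blank line and a region title rather than observations. A sketch, on a tiny stand-in frame with illustrative values, of finding such rows by testing whether `day` parses as a number instead of hard-coding their positions:

```python
import pandas as pd

# Tiny stand-in for the raw frame: two data rows around a blank row and a
# region-title row (values are illustrative, not taken from the file).
raw = pd.DataFrame({
    "day": ["30", None, "Sidi-Bel Abbes Region Dataset", "1"],
    "month": ["9", None, None, "6"],
    "Classes": ["not fire", None, None, "fire"],
})

# A genuine observation has a numeric day; anything else is a separator row.
day_num = pd.to_numeric(raw["day"], errors="coerce")
separators = raw.index[day_num.isna()].tolist()

# Drop the separator rows and renumber the remaining observations.
clean = raw.drop(index=separators).reset_index(drop=True)

print(separators, len(clean))
```

This check would not catch row 168, whose `day` is valid: there the `DC` field reads `14.6 9` and the remaining values are shifted one column, so that row needs a manual repair. Because `reset_index` renumbers the rows, any position-based labelling should happen before the drop.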


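The file stacks the two regional datasets: the second one starts after the blank row at index 122 and the "Sidi-Bel Abbes Region Dataset" title row at index 123. We can therefore add a new column that encodes the region:

0: "Bejaia Region Dataset"
1: "Sidi-Bel Abbes Region Dataset"

Add the new column with the region code: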

In [21]:
df.loc[:121, "Region"] = 0   # rows 0-121: Bejaia
df.loc[122:, "Region"] = 1   # rows 122 onward: Sidi-Bel Abbes (including the separator rows)


In [22]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 247 entries, 0 to 246
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   day          246 non-null    object 
 1   month        245 non-null    object 
 2   year         245 non-null    object 
 3   Temperature  245 non-null    object 
 4    RH          245 non-null    object 
 5    Ws          245 non-null    object 
 6   Rain         245 non-null    object 
 7   FFMC         245 non-null    object 
 8   DMC          245 non-null    object 
 9   DC           245 non-null    object 
 10  ISI          245 non-null    object 
 11  BUI          245 non-null    object 
 12  FWI          245 non-null    object 
 13  Classes      244 non-null    object 
 14  Region       247 non-null    float64
dtypes: float64(1), object(14)
memory usage: 29.1+ KB


In [None]:
# Every row now has a region code, so the cast to int is safe.
df['Region'] = df['Region'].astype(int)
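
`df.info()` reports every measurement column as `object`, so the values are stored as strings and cannot be used in arithmetic or plots directly. A minimal sketch of converting such columns with `pd.to_numeric`, on a stand-in frame whose last `DC` value mimics the malformed entry seen in row 168:

```python
import pandas as pd

# Stand-in frame with string-typed measurements, as in the df.info() output.
demo = pd.DataFrame({
    "Temperature": ["29", "26", "37"],
    "Rain": ["0.0", "13.1", "0.2"],
    "DC": ["7.6", "7.1", "14.6 9"],   # last value mimics the malformed row
})

# errors="coerce" turns unparseable strings into NaN instead of raising.
numeric = demo.apply(pd.to_numeric, errors="coerce")

print(numeric.dtypes)
print(numeric["DC"].isna().sum())
```

Coercion hides bad values as `NaN`, so it is worth counting the missing entries per column afterwards and repairing them deliberately rather than silently dropping them.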