## About the dataset features

1. Date: (DD/MM/YYYY) day, month (June to September), year (2012)

### Weather data observations

2. Temp: noon temperature (daily maximum) in degrees Celsius: 22 to 42
3. RH: relative humidity in %: 21 to 90
4. Ws: wind speed in km/h: 6 to 29
5. Rain: total daily rainfall in mm: 0 to 16.8

### FWI Components

6. Fine Fuel Moisture Code (FFMC) index from the FWI system: 28.6 to 92.5
7. Duff Moisture Code (DMC) index from the FWI system: 1.1 to 65.9
8. Drought Code (DC) index from the FWI system: 7 to 220.4
9. Initial Spread Index (ISI) from the FWI system: 0 to 18.5
10. Buildup Index (BUI) from the FWI system: 1.1 to 68
11. Fire Weather Index (FWI): 0 to 31.1
12. Classes: two classes, namely fire and not fire

### ✨ FWI stands for Fire Weather Index. The FWI system is used worldwide to estimate fire danger ✨.

In [1]:
import numpy as np
import pandas as pd

In [2]:
data = pd.read_csv("https://raw.githubusercontent.com/Nitish-Satya-Sai/data-is-crucial/main/Algerian_forest_fires.csv")
data

Unnamed: 0,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
0,1,6,2012,29,57,18,0.0,65.7,3.4,7.6,1.3,3.4,0.5,not fire
1,2,6,2012,29,61,13,1.3,64.4,4.1,7.6,1.0,3.9,0.4,not fire
2,3,6,2012,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,not fire
3,4,6,2012,25,89,13,2.5,28.6,1.3,6.9,0.0,1.7,0,not fire
4,5,6,2012,27,77,16,0.0,64.8,3.0,14.2,1.2,3.9,0.5,not fire
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
239,26,9,2012,30,65,14,0.0,85.4,16.0,44.5,4.5,16.9,6.5,fire
240,27,9,2012,28,87,15,4.4,41.1,6.5,8,0.1,6.2,0,not fire
241,28,9,2012,27,87,29,0.5,45.9,3.5,7.9,0.4,3.4,0.2,not fire
242,29,9,2012,24,54,18,0.1,79.7,4.3,15.2,1.7,5.1,0.7,not fire


### 👉 Let's see the dimensions of the data

In [3]:
data.shape

(244, 14)

### 👉 Let's see the information of the data

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   day          244 non-null    int64  
 1   month        244 non-null    int64  
 2   year         244 non-null    int64  
 3   Temperature  244 non-null    int64  
 4    RH          244 non-null    int64  
 5    Ws          244 non-null    int64  
 6   Rain         244 non-null    float64
 7   FFMC         244 non-null    float64
 8   DMC          244 non-null    float64
 9   DC           244 non-null    object 
 10  ISI          244 non-null    float64
 11  BUI          244 non-null    float64
 12  FWI          244 non-null    object 
 13  Classes      243 non-null    object 
dtypes: float64(5), int64(6), object(3)
memory usage: 26.8+ KB
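The `object` dtype on `DC` and `FWI` suggests that at least one entry in each column is not a clean number. A minimal sketch of how such entries could be located, using a small made-up Series rather than the real data (the toy values are assumptions for illustration):

```python
import pandas as pd

# Toy column that mimics the problem: numbers stored as strings plus stray text.
fwi = pd.Series(["0.5", "0.4", "fire   ", "7.2"], name="FWI")

# errors="coerce" turns anything unparseable into NaN instead of raising.
as_number = pd.to_numeric(fwi, errors="coerce")

# Entries that were present originally but failed to parse are the culprits.
bad_entries = fwi[as_number.isna() & fwi.notna()]
print(bad_entries.tolist())
```

The same pattern should work on `data["DC"]` and `data["FWI"]` to list the offending rows before deciding how to repair them.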


### 👉 Let's check for missing values

In [5]:
data.isnull().sum()

day            0
month          0
year           0
Temperature    0
 RH            0
 Ws            0
Rain           0
FFMC           0
DMC            0
DC             0
ISI            0
BUI            0
FWI            0
Classes        1
dtype: int64

### 👉 Let's look at the column names of the data

In [6]:
data.columns

Index(['day', 'month', 'year', 'Temperature', ' RH', ' Ws', 'Rain ', 'FFMC',
       'DMC', 'DC', 'ISI', 'BUI', 'FWI', 'Classes  '],
      dtype='object')

### 👉 Some column names contain leading or trailing spaces, which can raise a KeyError when accessing those features, so let's rename them

In [7]:
data.columns = ['day', 'month', 'year', 'Temperature', 'RH', 'Ws', 'Rain', 'FFMC','DMC', 'DC', 'ISI', 'BUI', 'FWI', 'Classes']
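Typing the names out works for 14 columns. As an alternative sketch (not what the cell above does), the padding can be removed programmatically with `Index.str.strip()`; the toy frame below is an assumption that only mirrors the padded headers:

```python
import pandas as pd

# Toy frame with the same kind of padded headers as the raw CSV.
toy = pd.DataFrame(
    {" RH": [57], " Ws": [18], "Rain ": [0.0], "Classes  ": ["not fire"]}
)

# Index.str.strip() removes leading and trailing whitespace from every label.
toy.columns = toy.columns.str.strip()
print(toy.columns.tolist())
```

This avoids retyping the list and keeps working if columns are added later.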

In [8]:
data["FWI"].unique()

array(['0.5', '0.4', '0.1', '0', '2.5', '7.2', '7.1', '0.3', '0.9', '5.6',
       '0.2', '1.4', '2.2', '2.3', '3.8', '7.5', '8.4', '10.6', '15',
       '13.9', '3.9', '12.9', '1.7', '4.9', '6.8', '3.2', '8', '0.6',
       '3.4', '0.8', '3.6', '6', '10.9', '4', '8.8', '2.8', '2.1', '1.3',
       '7.3', '15.3', '11.3', '11.9', '10.7', '15.7', '6.1', '2.6', '9.9',
       '11.6', '12.1', '4.2', '10.2', '6.3', '14.6', '16.1', '17.2',
       '16.8', '18.4', '20.4', '22.3', '20.9', '20.3', '13.7', '13.2',
       '19.9', '30.2', '5.9', '7.7', '9.7', '8.3', '0.7', '4.1', '1',
       '3.1', '1.9', '10', '16.7', '1.2', '5.3', '6.7', '9.5', '12',
       '6.4', '5.2', '3', '9.6', '4.7', 'fire   ', '14.1', '9.1', '13',
       '17.3', '30', '25.4', '16.3', '9', '14.5', '13.5', '19.5', '12.6',
       '12.7', '21.6', '18.8', '10.5', '5.5', '14.8', '24', '26.3',
       '12.2', '18.1', '24.5', '26.9', '31.1', '30.3', '26.1', '16',
       '19.4', '2.7', '3.7', '10.3', '5.7', '9.8', '19.3', '17.5', '15.4',

In [9]:
data["Classes"].value_counts()

fire             131
not fire         101
fire               4
fire               2
not fire           2
not fire           1
not fire           1
not fire           1
Name: Classes, dtype: int64

In [10]:
data["Classes"].unique()

array(['not fire   ', 'fire   ', 'fire', 'fire ', 'not fire', 'not fire ',
       'not fire     ', nan, 'not fire    '], dtype=object)

### 👉 Ohh, it's messy 😱 The same label appears with different amounts of trailing whitespace

In [11]:
# Assign back: inplace=True on a column selection may not update data in newer pandas.
data["Classes"] = data["Classes"].replace({
    'not fire   ': "not_fire", 'not fire': "not_fire", 'not fire ': "not_fire",
    'not fire     ': "not_fire", 'not fire    ': "not_fire",
    'fire   ': "fire", 'fire': "fire", 'fire ': "fire",
})

In [12]:
data["Classes"].unique()

array(['not_fire', 'fire', nan], dtype=object)
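Listing every whitespace variant by hand is easy to get wrong. A shorter sketch of the same normalization, assuming the only differences between labels are padding and the inner space (the sample values are copied from the `unique()` output above):

```python
import numpy as np
import pandas as pd

labels = pd.Series(
    ["not fire   ", "fire   ", "fire", "fire ", "not fire     ", np.nan]
)

# strip() drops the padding; the inner space then becomes an underscore.
# Missing values pass through the .str accessor unchanged.
cleaned = labels.str.strip().str.replace(" ", "_", regex=False)
print(cleaned.unique())
```

The missing label is left untouched here, so it can still be handled separately.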

In [13]:
data[data["Classes"].isnull()]

Unnamed: 0,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
165,14,7,2012,37,37,18,0.2,88.9,12.9,14.6 9,12.5,10.4,fire,


### 👉 Looking at this record, I believe it is a data-entry error in which the Classes value was shifted into the FWI column. I will change the Classes value from NaN to fire and put NaN in the FWI column, where the fire value currently sits.
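The repair described here can also be sketched as a single masked assignment. This is an illustrative variant on a two-row toy frame, not the cells that follow; the toy values are assumptions:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"FWI": ["0.5", "fire   "], "Classes": ["not_fire", np.nan]}
)

# Rows where the label is missing and FWI holds text instead of a number.
fwi_not_numeric = pd.to_numeric(toy["FWI"], errors="coerce").isna()
shifted = toy["Classes"].isna() & fwi_not_numeric

# Move the stripped text into Classes, then blank out FWI for later imputation.
toy.loc[shifted, "Classes"] = toy.loc[shifted, "FWI"].str.strip()
toy.loc[shifted, "FWI"] = np.nan
print(toy)
```

Selecting by mask rather than by a hard-coded row number keeps the fix valid if the row order changes.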

In [14]:
data_Classes_imputed = data["Classes"].fillna(value="fire")

In [15]:
data_copy = data.copy()
data_copy["Classes"]=data_Classes_imputed

In [16]:
data_copy["Classes"].unique()

array(['not_fire', 'fire'], dtype=object)

### 👉 Now, we need to work on FWI Column.

In [17]:
data_copy[data_copy["FWI"]=='fire   ']

Unnamed: 0,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
165,14,7,2012,37,37,18,0.2,88.9,12.9,14.6 9,12.5,10.4,fire,fire


In [18]:
# np.nan is used because the np.NaN alias was removed in NumPy 2.0
data_copy["FWI"] = data_copy["FWI"].replace(to_replace={"fire   ":np.nan})

In [19]:
data_copy["FWI"].isnull().sum()

1

In [20]:
data_copy[data_copy["FWI"].isnull()]

Unnamed: 0,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
165,14,7,2012,37,37,18,0.2,88.9,12.9,14.6 9,12.5,10.4,,fire


In [21]:
data_copy["DC"] = data_copy["DC"].replace(to_replace={"14.6 9":14.6})

In [22]:
data_copy.select_dtypes(include=np.number)

Unnamed: 0,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,ISI,BUI
0,1,6,2012,29,57,18,0.0,65.7,3.4,1.3,3.4
1,2,6,2012,29,61,13,1.3,64.4,4.1,1.0,3.9
2,3,6,2012,26,82,22,13.1,47.1,2.5,0.3,2.7
3,4,6,2012,25,89,13,2.5,28.6,1.3,0.0,1.7
4,5,6,2012,27,77,16,0.0,64.8,3.0,1.2,3.9
...,...,...,...,...,...,...,...,...,...,...,...
239,26,9,2012,30,65,14,0.0,85.4,16.0,4.5,16.9
240,27,9,2012,28,87,15,4.4,41.1,6.5,0.1,6.2
241,28,9,2012,27,87,29,0.5,45.9,3.5,0.4,3.4
242,29,9,2012,24,54,18,0.1,79.7,4.3,1.7,5.1


### 👉 Now, I will impute the missing FWI value using scikit-learn's KNNImputer

In [23]:
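from sklearn.impute import KNNImputer

# DC and FWI are still stored as strings, so cast the feature columns to float first
features = data_copy.columns[:13]
data_copy[features] = data_copy[features].astype(float)
my_imputer = KNNImputer(n_neighbors=5)
data_copy[features] = my_imputer.fit_transform(data_copy[features])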
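To make the imputation step less of a black box, here is a small sketch of what `KNNImputer` does on a made-up array (the numbers are assumptions chosen so the result is easy to verify by hand). The missing entry is filled with the mean of that column over the nearest rows, where distance is measured on the columns that are present:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Last row is missing its second value.
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [4.0, 40.0],
    [2.2, np.nan],
])

# With n_neighbors=2 the nearest rows by first column are 2.0 and 1.0,
# so the gap is filled with the mean of 20.0 and 10.0.
imputer = KNNImputer(n_neighbors=2)
filled = imputer.fit_transform(X)
print(filled[3, 1])
```

Note that the distance is computed on unscaled values, so columns with large ranges have more influence on which rows count as neighbors.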

In [24]:
data_copy.isnull().sum()

day            0
month          0
year           0
Temperature    0
RH             0
Ws             0
Rain           0
FFMC           0
DMC            0
DC             0
ISI            0
BUI            0
FWI            0
Classes        0
dtype: int64

In [25]:
data_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   day          244 non-null    float64
 1   month        244 non-null    float64
 2   year         244 non-null    float64
 3   Temperature  244 non-null    float64
 4   RH           244 non-null    float64
 5   Ws           244 non-null    float64
 6   Rain         244 non-null    float64
 7   FFMC         244 non-null    float64
 8   DMC          244 non-null    float64
 9   DC           244 non-null    float64
 10  ISI          244 non-null    float64
 11  BUI          244 non-null    float64
 12  FWI          244 non-null    float64
 13  Classes      244 non-null    object 
dtypes: float64(13), object(1)
memory usage: 26.8+ KB


In [26]:
data_copy

Unnamed: 0,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
0,1.0,6.0,2012.0,29.0,57.0,18.0,0.0,65.7,3.4,7.6,1.3,3.4,0.5,not_fire
1,2.0,6.0,2012.0,29.0,61.0,13.0,1.3,64.4,4.1,7.6,1.0,3.9,0.4,not_fire
2,3.0,6.0,2012.0,26.0,82.0,22.0,13.1,47.1,2.5,7.1,0.3,2.7,0.1,not_fire
3,4.0,6.0,2012.0,25.0,89.0,13.0,2.5,28.6,1.3,6.9,0.0,1.7,0.0,not_fire
4,5.0,6.0,2012.0,27.0,77.0,16.0,0.0,64.8,3.0,14.2,1.2,3.9,0.5,not_fire
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
239,26.0,9.0,2012.0,30.0,65.0,14.0,0.0,85.4,16.0,44.5,4.5,16.9,6.5,fire
240,27.0,9.0,2012.0,28.0,87.0,15.0,4.4,41.1,6.5,8.0,0.1,6.2,0.0,not_fire
241,28.0,9.0,2012.0,27.0,87.0,29.0,0.5,45.9,3.5,7.9,0.4,3.4,0.2,not_fire
242,29.0,9.0,2012.0,24.0,54.0,18.0,0.1,79.7,4.3,15.2,1.7,5.1,0.7,not_fire


In [27]:
data_copy.iloc[165,:]

day              14.0
month             7.0
year           2012.0
Temperature      37.0
RH               37.0
Ws               18.0
Rain              0.2
FFMC             88.9
DMC              12.9
DC               14.6
ISI              12.5
BUI              10.4
FWI               8.3
Classes          fire
Name: 165, dtype: object

### 👉 Finally, the FWI value is imputed

### 👉 Now, I have cleaned this dataset completely.

### 👉 I just want to store a copy of it.

In [28]:
# index=False keeps the row index from being written as an extra unnamed column
data_copy.to_csv("forest_fires.csv", index=False)
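Whether the row index is written matters when the file is read back. A quick sketch using in-memory buffers and a toy frame (both are assumptions for illustration, no file is touched):

```python
import io
import pandas as pd

toy = pd.DataFrame({"FWI": [0.5, 8.3], "Classes": ["not_fire", "fire"]})

# Default behaviour: the index is saved as a first column without a header.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()

# index=False: only the real columns are saved.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)
print(cols_without)
```

The first list contains an extra `Unnamed: 0` column on reload, while the second matches the original columns.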

# ✨✨ To be Continued... 😁😎