## Exploratory Data Analysis (EDA) for Algerian Forest Fire Prediction

#### Task given by my instructor, Krish Naik:

1. Import the dataset.
https://archive.ics.uci.edu/ml/datasets/Algerian+Forest+Fires+Dataset++

https://archive.ics.uci.edu/ml/machine-learning-databases/00547/

2. Perform a proper EDA of the dataset and create a report from it.

3. Then perform the necessary preprocessing steps.
4. Then create a classification model and a regression model for the dataset.
5. For regression, use linear regression, ridge regression, lasso regression, SVR, decision tree regressor and random forest regressor, along with cross-validation and hyperparameter tuning. Show the MSE for each model and identify the best model based on the R2 value.
6. For classification, use logistic regression, SVM, decision tree, naive Bayes and random forest, along with hyperparameter tuning and cross-validation. Print the classification report and identify the best model based on that report.

* API Testing:
1. Create a Flask API for testing your model (via Postman), or optionally create an HTML page.
2. The API must support single-value prediction as well as bulk prediction.
3. Load your data via MongoDB or MySQL (for bulk prediction).
4. Perform API testing in a modular way (modular coding with classes and objects).
5. Do proper logging for your application.
6. Handle exceptions at each and every step.


**Note:**
* The Fire Weather Index (FWI) is a meteorologically based index used worldwide to estimate fire danger. It consists of different components that account for the effects of fuel moisture and wind on fire behaviour and spread.
* For our regression modelling, we will predict fire based on FWI.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
data = pd.read_csv("Algerian_forest_fires_dataset_UPDATE.csv" , header=1)

In [3]:
data

Unnamed: 0,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
0,01,06,2012,29,57,18,0,65.7,3.4,7.6,1.3,3.4,0.5,not fire
1,02,06,2012,29,61,13,1.3,64.4,4.1,7.6,1,3.9,0.4,not fire
2,03,06,2012,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,not fire
3,04,06,2012,25,89,13,2.5,28.6,1.3,6.9,0,1.7,0,not fire
4,05,06,2012,27,77,16,0,64.8,3,14.2,1.2,3.9,0.5,not fire
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
241,26,09,2012,30,65,14,0,85.4,16,44.5,4.5,16.9,6.5,fire
242,27,09,2012,28,87,15,4.4,41.1,6.5,8,0.1,6.2,0,not fire
243,28,09,2012,27,87,29,0.5,45.9,3.5,7.9,0.4,3.4,0.2,not fire
244,29,09,2012,24,54,18,0.1,79.7,4.3,15.2,1.7,5.1,0.7,not fire
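
The `header=1` argument tells `read_csv` to take the column names from the second line of the file and to discard the line before it. A minimal sketch on an in-memory sample follows; the title line `Bejaia Region Dataset` is an assumption about the file layout (the notebook only shows a similar title for the second region), and the two data rows are copied from the output above.

```python
import io
import pandas as pd

# Assumed layout: a region title on line 0, the real column names on line 1.
sample = io.StringIO(
    "Bejaia Region Dataset\n"
    "day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes\n"
    "01,06,2012,29,57,18,0,65.7,3.4,7.6,1.3,3.4,0.5,not fire\n"
    "02,06,2012,29,61,13,1.3,64.4,4.1,7.6,1,3.9,0.4,not fire\n"
)

# header=1 uses line 1 as the header and skips everything before it.
df = pd.read_csv(sample, header=1)
print(df.columns.tolist())
print(df.shape)
```

Without `header=1`, the title line would be treated as the header and the real column names would end up as a data row.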


### Inserting and Loading Data via MySQL (for bulk prediction)

***MySQL DATABASE CONNECTOR***

In [4]:
!pip install mysql-connector-python   



#### Create a database for our dataframe

In [5]:
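# Import the connector before first use; it was originally imported only in a
# later cell, which made this cell fail with a NameError.
import mysql.connector as connection

connex = connection.connect(host="localhost", user="root", passwd="mysql", use_pure=True)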

In [None]:
#creating a database
curs = connex.cursor()
curs.execute("create database Algerian_forest_fires_data")

### Creating a table AFFDatasetDetails

In [None]:
import mysql.connector as connection

mydb = None  # so the cleanup below is safe even if connect() fails
try:
    mydb = connection.connect(host="localhost", database="Algerian_forest_fires_data",
                              user="root", passwd="mysql", use_pure=True)
    # check if the connection is established
    print(mydb.is_connected())

    query = ("CREATE TABLE AFFDatasetDetails (id INT(10) AUTO_INCREMENT PRIMARY KEY, month INT(4), "
             "Temperature INT(10), RH INT(5), Ws INT(10), Rain FLOAT(5), FFMC FLOAT(5), DMC FLOAT(5), "
             "DC FLOAT(5), ISI FLOAT(5), BUI FLOAT(5), FWI FLOAT(5), Classes INT(2), Region INT(2))")

    cursor = mydb.cursor()  # create a cursor to execute queries
    cursor.execute(query)
    print("Table Created!!")
except Exception as e:
    print(str(e))
finally:
    # close the connection whether or not the query succeeded
    if mydb is not None:
        mydb.close()

In [None]:
# importing sql engine
from sqlalchemy import create_engine

In [None]:
# create sqlalchemy engine
engine = create_engine('sqlite://', echo=False)

In [None]:
# Insert the whole DataFrame into the database
# (note: the engine above is an in-memory SQLite database, not MySQL)
data.to_sql('AFFDatasetDetails', con = engine, if_exists = 'append')

### Retrieve Records from a Database

In [17]:
result = engine.execute("SELECT * FROM AFFDatasetDetails").fetchall()

In [44]:
#Creating a Dataframe
data_retrieved = pd.DataFrame(result, columns=['id','day','month','year','Temperature','RH','Ws','Rain','FFMC','DMC','DC','ISI','BUI','FWI','Classes'])

In [66]:
data_retrieved

Unnamed: 0,id,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
0,0,01,06,2012,29,57,18,0,65.7,3.4,7.6,1.3,3.4,0.5,not fire
1,1,02,06,2012,29,61,13,1.3,64.4,4.1,7.6,1,3.9,0.4,not fire
2,2,03,06,2012,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,not fire
3,3,04,06,2012,25,89,13,2.5,28.6,1.3,6.9,0,1.7,0,not fire
4,4,05,06,2012,27,77,16,0,64.8,3,14.2,1.2,3.9,0.5,not fire
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
241,241,26,09,2012,30,65,14,0,85.4,16,44.5,4.5,16.9,6.5,fire
242,242,27,09,2012,28,87,15,4.4,41.1,6.5,8,0.1,6.2,0,not fire
243,243,28,09,2012,27,87,29,0.5,45.9,3.5,7.9,0.4,3.4,0.2,not fire
244,244,29,09,2012,24,54,18,0.1,79.7,4.3,15.2,1.7,5.1,0.7,not fire
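
The write-then-read round trip above can also be expressed with pandas alone. The sketch below is an alternative, not the code used in this notebook: it assumes the standard-library `sqlite3` module in place of SQLAlchemy, uses a tiny made-up frame (`frame`) for illustration, and reads the table back with `pd.read_sql_query`, which returns a DataFrame with column names already attached.

```python
import sqlite3
import pandas as pd

# Tiny illustrative frame; the values are taken from the first rows shown above.
frame = pd.DataFrame({
    "Temperature": [29, 26],
    "RH": [57, 82],
    "Classes": ["not fire", "not fire"],
})

conn = sqlite3.connect(":memory:")  # in-memory database, discarded on close

# Write the frame; the DataFrame index is stored as an "id" column.
frame.to_sql("AFFDatasetDetails", con=conn, if_exists="append",
             index=True, index_label="id")

# Read it back directly into a DataFrame, column names included.
restored = pd.read_sql_query("SELECT * FROM AFFDatasetDetails", conn)
conn.close()

print(restored)
```

Reading through `read_sql_query` avoids listing the column names by hand when building the DataFrame.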


### DATA CLEANING


In [141]:
data = data_retrieved.copy()
data.head(5)

Unnamed: 0,id,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
0,0,1,6,2012,29,57,18,0.0,65.7,3.4,7.6,1.3,3.4,0.5,not fire
1,1,2,6,2012,29,61,13,1.3,64.4,4.1,7.6,1.0,3.9,0.4,not fire
2,2,3,6,2012,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,not fire
3,3,4,6,2012,25,89,13,2.5,28.6,1.3,6.9,0.0,1.7,0.0,not fire
4,4,5,6,2012,27,77,16,0.0,64.8,3.0,14.2,1.2,3.9,0.5,not fire


In [142]:
data[data.isnull().any(axis=1)]

Unnamed: 0,id,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
122,122,Sidi-Bel Abbes Region Dataset,,,,,,,,,,,,,
167,167,14,7.0,2012.0,37.0,37.0,18.0,0.2,88.9,12.9,14.6 9,12.5,10.4,fire,


In [144]:
data = data.drop(index=[122], axis = 0)

Note: Dataset information:
The dataset includes 244 instances that regroup data from two regions of Algeria, namely the Bejaia region located in the northeast of Algeria and the Sidi Bel-abbes region located in the northwest of Algeria.

There are 122 instances for each region.

The data covers the period from June 2012 to September 2012. The dataset includes 11 attributes and 1 output attribute (class). The 244 instances have been classified into fire (138 instances) and not fire (106 instances) classes.

In [145]:
data = data.drop(['id','day','month','year'], axis=1 )

In [146]:
data

Unnamed: 0,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
0,29,57,18,0,65.7,3.4,7.6,1.3,3.4,0.5,not fire
1,29,61,13,1.3,64.4,4.1,7.6,1,3.9,0.4,not fire
2,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,not fire
3,25,89,13,2.5,28.6,1.3,6.9,0,1.7,0,not fire
4,27,77,16,0,64.8,3,14.2,1.2,3.9,0.5,not fire
...,...,...,...,...,...,...,...,...,...,...,...
241,30,65,14,0,85.4,16,44.5,4.5,16.9,6.5,fire
242,28,87,15,4.4,41.1,6.5,8,0.1,6.2,0,not fire
243,27,87,29,0.5,45.9,3.5,7.9,0.4,3.4,0.2,not fire
244,24,54,18,0.1,79.7,4.3,15.2,1.7,5.1,0.7,not fire


In [147]:
data.loc[:122, 'Region'] = "Bejaia" 

In [148]:
data.loc[122:, "Region"] = "Sidi Bel-abbes"  # no surrounding spaces in the label

In [149]:
data

Unnamed: 0,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes,Region
0,29,57,18,0,65.7,3.4,7.6,1.3,3.4,0.5,not fire,Bejaia
1,29,61,13,1.3,64.4,4.1,7.6,1,3.9,0.4,not fire,Bejaia
2,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,not fire,Bejaia
3,25,89,13,2.5,28.6,1.3,6.9,0,1.7,0,not fire,Bejaia
4,27,77,16,0,64.8,3,14.2,1.2,3.9,0.5,not fire,Bejaia
...,...,...,...,...,...,...,...,...,...,...,...,...
241,30,65,14,0,85.4,16,44.5,4.5,16.9,6.5,fire,Sidi Bel-abbes
242,28,87,15,4.4,41.1,6.5,8,0.1,6.2,0,not fire,Sidi Bel-abbes
243,27,87,29,0.5,45.9,3.5,7.9,0.4,3.4,0.2,not fire,Sidi Bel-abbes
244,24,54,18,0.1,79.7,4.3,15.2,1.7,5.1,0.7,not fire,Sidi Bel-abbes
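
The two `.loc` assignments rely on label-based slicing: the separator row (label 122) has already been dropped, so `:122` stops at the last Bejaia row and `122:` starts at the first row after the separator. A minimal sketch of the same idea on a five-row frame, where the separator position `separator = 2` and the frame itself are made up for illustration:

```python
import pandas as pd

# Five rows; the row labelled 2 stands in for the region-title row.
frame = pd.DataFrame({"Temperature": [29.0, 26.0, None, 30.0, 28.0]})
separator = 2

# Drop the separator row; the remaining labels are 0, 1, 3, 4.
frame = frame.drop(index=[separator])

# Label slices work even though label 2 no longer exists,
# because the index is still sorted.
frame.loc[:separator, "Region"] = "Bejaia"
frame.loc[separator:, "Region"] = "Sidi Bel-abbes"

print(frame)
```

Note that `.loc` slices include both endpoints, so if the separator row had not been dropped first, it would have been matched by both assignments.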


In [150]:
data.columns

Index(['Temperature', 'RH', 'Ws', 'Rain', 'FFMC', 'DMC', 'DC', 'ISI', 'BUI',
       'FWI', 'Classes', 'Region'],
      dtype='object')

In [151]:
data.Classes.value_counts()

fire             131
not fire         101
fire               4
fire               2
not fire           2
not fire           1
not fire           1
not fire           1
Classes            1
Name: Classes, dtype: int64

In [152]:
data['Classes']=data.Classes.str.strip()

In [153]:
data.Classes.value_counts()

fire        137
not fire    106
Classes       1
Name: Classes, dtype: int64
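
The repeated `fire` and `not fire` entries in the first count come from labels that differ only in surrounding whitespace. A small sketch of the effect, using a made-up series `labels`:

```python
import pandas as pd

# Labels that look identical when printed but differ in whitespace.
labels = pd.Series(["fire", "fire ", "not fire", "not fire   ", " fire"])

raw_counts = labels.value_counts()                # every variant counted separately
clean_counts = labels.str.strip().value_counts()  # variants merged after stripping

print(raw_counts)
print(clean_counts)
```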

In [154]:
# Remove rows that contain null/NA values
data = data.dropna().reset_index(drop=True)
data.isnull().sum()

Temperature    0
RH             0
Ws             0
Rain           0
FFMC           0
DMC            0
DC             0
ISI            0
BUI            0
FWI            0
Classes        0
Region         0
dtype: int64

#### Encode region and classes

In [155]:
data['Classes']= np.where(data['Classes']=='fire',1,0) # 1 if Fire otherwise 0

In [156]:
data['Region'] = np.where(data['Region']=="Bejaia",1,2)  # 1 if Bejaia region otherwise 2

In [157]:
data.head()

Unnamed: 0,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes,Region
0,29,57,18,0.0,65.7,3.4,7.6,1.3,3.4,0.5,0,1
1,29,61,13,1.3,64.4,4.1,7.6,1.0,3.9,0.4,0,1
2,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,0,1
3,25,89,13,2.5,28.6,1.3,6.9,0.0,1.7,0.0,0,1
4,27,77,16,0.0,64.8,3.0,14.2,1.2,3.9,0.5,0,1


In [158]:
data.Classes.unique()

array([0, 1])

In [159]:
data.Region.unique()

array([1, 2])
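
One caveat with `np.where(data['Classes']=='fire',1,0)` is that every value other than `'fire'` becomes 0, including the stray `Classes` label seen in the value counts earlier. As a hedged alternative, and not the approach used in this notebook, an explicit mapping leaves unexpected labels as `NaN` so they can be inspected. The series `classes` below is made up for illustration:

```python
import pandas as pd

# Includes one unexpected label, similar to the stray "Classes" entry.
classes = pd.Series(["fire", "not fire", "Classes", "fire"])

# Explicit mapping: anything not listed becomes NaN instead of silently 0.
encoded = classes.map({"fire": 1, "not fire": 0})

# Show which original labels were not covered by the mapping.
unexpected = classes[encoded.isna()]
print(unexpected)
```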

### Changing dtypes


In [160]:
data

Unnamed: 0,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes,Region
0,29,57,18,0,65.7,3.4,7.6,1.3,3.4,0.5,0,1
1,29,61,13,1.3,64.4,4.1,7.6,1,3.9,0.4,0,1
2,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,0,1
3,25,89,13,2.5,28.6,1.3,6.9,0,1.7,0,0,1
4,27,77,16,0,64.8,3,14.2,1.2,3.9,0.5,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...
239,30,65,14,0,85.4,16,44.5,4.5,16.9,6.5,1,2
240,28,87,15,4.4,41.1,6.5,8,0.1,6.2,0,0,2
241,27,87,29,0.5,45.9,3.5,7.9,0.4,3.4,0.2,0,2
242,24,54,18,0.1,79.7,4.3,15.2,1.7,5.1,0.7,0,2


In [161]:
data.columns = data.columns.str.strip()

In [162]:
data.dtypes

Temperature    object
RH             object
Ws             object
Rain           object
FFMC           object
DMC            object
DC             object
ISI            object
BUI            object
FWI            object
Classes         int32
Region          int32
dtype: object

In [167]:
data.columns

Index(['Temperature', 'RH', 'Ws', 'Rain', 'FFMC', 'DMC', 'DC', 'ISI', 'BUI',
       'FWI', 'Classes', 'Region'],
      dtype='object')

In [168]:
data.columns = data.columns.str.strip()  # remove surrounding spaces from the column names

In [171]:
data = data.astype({"Temperature": int, "RH": int,'Ws':int,'Rain':float,'FFMC':float,'DMC':float ,'DC':float,'ISI': float,'BUI': float,'FWI':float})
print(data.dtypes)

ValueError: invalid literal for int() with base 10: 'Temperature'
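
The error message suggests that at least one row still holds the literal column names, which would happen if the file repeats its header line before the second region's data (compare the stray `Classes` label in the value counts earlier). One possible way to handle this, sketched on a made-up three-row frame, is to drop any row whose `Temperature` cell equals the column name and then convert:

```python
import pandas as pd

# The middle row imitates a header line repeated inside the data.
frame = pd.DataFrame({
    "Temperature": ["29", "Temperature", "30"],
    "RH": ["57", "RH", "65"],
    "FWI": ["0.5", "FWI", "6.5"],
})

# A row whose cell repeats its own column name is an embedded header row.
is_header_row = frame["Temperature"].astype(str).str.strip() == "Temperature"
cleaned = frame[~is_header_row].reset_index(drop=True)

# With the header row gone, the numeric conversion succeeds.
cleaned = cleaned.astype({"Temperature": int, "RH": int, "FWI": float})
print(cleaned.dtypes)
```

In the real data the same filter would need to run before the `Classes` column is encoded, otherwise the header row is already indistinguishable from a `not fire` row in that column.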