Algerian Forest Fires Dataset Project EDA


UNDERSTANDING THE PROJECT

1. Import the dataset.
https://archive.ics.uci.edu/ml/datasets/Algerian+Forest+Fires+Dataset++# 
2. Perform a thorough EDA of the dataset and create a report.
3. Perform the necessary preprocessing steps.
4. Build classification and regression models for the dataset.
5. For regression, use linear, ridge, and lasso regression.
6. For classification, try logistic regression, SVM, decision tree, naive Bayes, and random forest, along with hyperparameter tuning.
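
As a rough illustration of step 5, the three regressors can be fitted and compared on a held-out split. This is a minimal sketch under stated assumptions, not the project's final model: the data below is synthetic (generated with NumPy, not taken from the forest fire dataset), and the `alpha` values are arbitrary placeholders that would normally be tuned.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 200 samples, 3 features, linear target plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# alpha values are illustrative placeholders, not tuned settings.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

# Fit each model and record its R^2 score on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

print(scores)
```

With the real dataset, `X` would hold the weather features and `y` the FWI column.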

API Testing:
  1. Create a Flask API for testing your model (via Postman), or create an HTML page.
  2. The API must support single-value prediction as well as bulk prediction.
  3. We will load our data from MongoDB or MySQL (for bulk prediction).
  4. We will perform API testing in a modular way.
  5. We will create a logging function for our application.
  6. We will handle exceptions at every step.
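
Items 5 and 6 above can be sketched with the standard library alone. The example below is one possible shape for request validation with logging and exception handling, not the project's actual API code: the logger name `forest_fire_app`, the helper names `parse_record` and `parse_bulk`, and the choice of required fields are all assumptions made for illustration.

```python
import logging

# Hypothetical logger name and format; adjust to the application's conventions.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("forest_fire_app")

# Assumed subset of the dataset's columns used as model inputs.
REQUIRED_FIELDS = ("Temperature", "RH", "Ws", "Rain")


def parse_record(payload):
    """Validate one prediction request and return its features as floats."""
    try:
        features = {name: float(payload[name]) for name in REQUIRED_FIELDS}
    except KeyError as exc:
        logger.error("Missing field in request: %s", exc)
        raise ValueError(f"missing field: {exc.args[0]}") from exc
    except (TypeError, ValueError) as exc:
        logger.error("Non-numeric value in request: %s", exc)
        raise ValueError("all fields must be numeric") from exc
    logger.info("Parsed record: %s", features)
    return features


def parse_bulk(payloads):
    """Parse many records; collect errors by position instead of stopping."""
    parsed, errors = [], []
    for position, payload in enumerate(payloads):
        try:
            parsed.append(parse_record(payload))
        except ValueError as exc:
            errors.append((position, str(exc)))
    return parsed, errors
```

A single-value endpoint would call `parse_record` on one request body, while a bulk endpoint would call `parse_bulk` and report the failed positions alongside the predictions.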

DATA COLLECTION AND UNDERSTANDING

    We will use the Algerian Forest Fires dataset from the UCI Machine Learning Repository. The dataset contains 244 instances of forest fire observations for two regions of Algeria: the Bejaia region and the Sidi Bel-Abbes region. The observations span June 2012 to September 2012. In this project, we examine whether certain weather features can predict forest fires in these regions using several classification techniques, and whether they can predict the Fire Weather Index (FWI) using regression techniques.

EXPLORATORY DATA ANALYSIS

  Exploratory Data Analysis (EDA) is used to draw insights from the data. Data scientists and analysts look for patterns, relationships, and anomalies in the data using statistical graphs and other visualization techniques. We will perform the following steps as part of EDA:

  1. Missing or null values
  2. Numerical and Categorical Variables
  3. Distribution of Numerical Variables
  4. Outliers
  5. Relationships between Independent and Dependent Features
  6. Correlation between Independent and Dependent Features
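
The six steps above map onto a handful of pandas calls. The sketch below is illustrative only: it uses six rows copied from the dataset previews in this notebook rather than the full data, keeps only a few columns, and picks the 1.5 x IQR rule for outliers as an assumption (other rules are equally valid).

```python
import pandas as pd

# Six rows copied from the previews in this notebook (not the full dataset).
sample = pd.DataFrame({
    "Temperature": [29, 29, 26, 25, 27, 30],
    "RH": [57, 61, 82, 89, 77, 65],
    "Ws": [18, 13, 22, 13, 16, 14],
    "Rain": [0.0, 1.3, 13.1, 2.5, 0.0, 0.0],
    "FWI": [0.5, 0.4, 0.1, 0.0, 0.5, 6.5],
    "Classes": ["not fire"] * 5 + ["fire"],
})

# 1. Missing or null values per column
missing = sample.isnull().sum()

# 2. Numerical and categorical variables
numerical = sample.select_dtypes(include="number").columns.tolist()
categorical = sample.select_dtypes(exclude="number").columns.tolist()

# 3. Distribution of numerical variables (summary statistics)
summary = sample[numerical].describe()

# 4. Outliers, counted per column with the 1.5 * IQR rule (assumed rule)
q1 = sample[numerical].quantile(0.25)
q3 = sample[numerical].quantile(0.75)
iqr = q3 - q1
below = sample[numerical].lt(q1 - 1.5 * iqr, axis="columns")
above = sample[numerical].gt(q3 + 1.5 * iqr, axis="columns")
outliers = (below | above).sum()

# 5. Relationship between features and the class label (group means)
class_means = sample.groupby("Classes")[numerical].mean()

# 6. Correlation of each numerical feature with FWI
fwi_corr = sample[numerical].corr()["FWI"].drop("FWI")

print(missing, numerical, categorical, outliers, class_means, fwi_corr, sep="\n\n")
```

On the full dataset the same calls apply unchanged; plots such as histograms and box plots would normally accompany steps 3 and 4.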

Importing Libraries

In [22]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from warnings import filterwarnings
filterwarnings('ignore')

Storing the Dataset into MongoDB Database

  1. Import the PyMongo library to connect to a MongoDB Atlas cluster

In [23]:
from pymongo import MongoClient

In [24]:
# Establish an authenticated connection to the MongoDB Atlas cluster.
# The user name and password are read from environment variables instead of
# being hard-coded; MONGO_USER and MONGO_PASSWORD are assumed variable names.
import os
client = MongoClient(
    f"mongodb+srv://{os.environ['MONGO_USER']}:{os.environ['MONGO_PASSWORD']}"
    "@algerian-forest-fire-db.qk9hooc.mongodb.net/?retryWrites=true&w=majority"
)
# Get the database by name (MongoDB creates it on first write)
db = client.get_database('Algerian-forest-fire-DB')
# Get the collection (also created on first write)
records = db.fire_records

In [25]:
# Create Dataframe and Read the dataset using Pandas
data = pd.read_csv('Algerian_forest_fires_dataset_UPDATE.csv', header=1)
data.head()

Unnamed: 0,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
0,1,6,2012,29,57,18,0.0,65.7,3.4,7.6,1.3,3.4,0.5,not fire
1,2,6,2012,29,61,13,1.3,64.4,4.1,7.6,1.0,3.9,0.4,not fire
2,3,6,2012,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,not fire
3,4,6,2012,25,89,13,2.5,28.6,1.3,6.9,0.0,1.7,0.0,not fire
4,5,6,2012,27,77,16,0.0,64.8,3.0,14.2,1.2,3.9,0.5,not fire


In [26]:
# Convert the DataFrame into a list of dictionaries (one per row), since MongoDB stores data as documents
data = data.to_dict(orient = 'records')

In [27]:
# Insert the records into the MongoDB collection "fire_records"
db.fire_records.insert_many(data)
print("All the Data has been Exported to MongoDB Successfully")

All the Data has been Exported to MongoDB Successfully
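
Note that `insert_many` only appends: running the notebook again from the read step inserts a second copy of every record. One way to make the export repeatable is to empty the collection before inserting. The helper below is a minimal sketch, assuming it is acceptable to discard whatever the collection already holds; `replace_collection` is a hypothetical name, not part of PyMongo.

```python
def replace_collection(collection, documents):
    """Empty `collection`, then insert `documents`; return the number inserted.

    `collection` is expected to behave like a pymongo Collection
    (only delete_many and insert_many are used).
    """
    collection.delete_many({})  # an empty filter matches every document
    if not documents:
        return 0  # insert_many rejects an empty list, so skip it
    result = collection.insert_many(documents)
    return len(result.inserted_ids)
```

In this notebook the call would be `replace_collection(db.fire_records, data)` in place of the plain `insert_many` call.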


Loading Data from MongoDB

In [28]:
# Load all records from MongoDB using find(), which returns a lazy cursor
all_records = records.find()
print(all_records)

<pymongo.cursor.Cursor object at 0x0000021935FA90F0>


In [29]:
# Convert the cursor into a list (this consumes the cursor)
list_cursor = list(all_records)

In [31]:
# Convert the list into a DataFrame and drop MongoDB's auto-generated _id column
dataframe = pd.DataFrame(list_cursor)
dataframe.drop('_id', axis=1, inplace=True)
dataframe


Unnamed: 0,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
0,01,06,2012,29,57,18,0,65.7,3.4,7.6,1.3,3.4,0.5,not fire
1,02,06,2012,29,61,13,1.3,64.4,4.1,7.6,1,3.9,0.4,not fire
2,03,06,2012,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,not fire
3,04,06,2012,25,89,13,2.5,28.6,1.3,6.9,0,1.7,0,not fire
4,05,06,2012,27,77,16,0,64.8,3,14.2,1.2,3.9,0.5,not fire
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
487,26,09,2012,30,65,14,0,85.4,16,44.5,4.5,16.9,6.5,fire
488,27,09,2012,28,87,15,4.4,41.1,6.5,8,0.1,6.2,0,not fire
489,28,09,2012,27,87,29,0.5,45.9,3.5,7.9,0.4,3.4,0.2,not fire
490,29,09,2012,24,54,18,0.1,79.7,4.3,15.2,1.7,5.1,0.7,not fire
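
The leading zeros in the `day` and `month` values of this preview (`01`, `06`) suggest that at least some columns came back as strings rather than numbers. If that is the case, they need converting before any numerical analysis. The following is a minimal sketch on two rows copied from the preview, with the string typing itself being an assumption; on the real frame, `dataframe.dtypes` shows which columns actually need it.

```python
import pandas as pd

# Two rows copied from the preview, held as strings (assumed typing).
loaded = pd.DataFrame({
    "day": ["01", "26"],
    "month": ["06", "09"],
    "Temperature": ["29", "30"],
    "FWI": ["0.5", "6.5"],
    "Classes": ["not fire", "fire"],
})

numeric_columns = ["day", "month", "Temperature", "FWI"]

# errors="coerce" turns unparseable entries into NaN instead of raising,
# so bad values surface later in the missing-value check.
for column in numeric_columns:
    loaded[column] = pd.to_numeric(loaded[column], errors="coerce")

print(loaded.dtypes)
```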
