# Wildfire Classification

## Aim 
- Practice implementing **supervised learning** algorithms from scikit-learn.

- Datasets used: 'wildfires_training.txt' and 'wildfires_test.txt'

- The goal is binary **classification**: predict whether a wildfire occurs (two classes: yes or no).

- Each observation has the following columns: fire (the class label), followed by the **attributes** year, temp, humidity, rainfall, drought_code, buildup_index, day, month, wind_speed


### Contents

1. Pre-processing steps. The datasets are converted from text files to csv files, loaded with pandas, and then split into features and labels.

2. Decision Tree Classifier

3. Naive Bayes Classifier


In [45]:
# Load libraries
import csv  
import pandas as pd
from sklearn.tree import DecisionTreeClassifier # Import Decision Tree Classifier
from sklearn.model_selection import train_test_split # Import train_test_split function
from sklearn import metrics #Import scikit-learn metrics module for accuracy calculation
import sklearn

### Pre-processing

In [46]:
# Read the raw lines of the training text file
with open('wildfires_training.txt', 'r') as textfile:
    lines = textfile.readlines()

# Read the raw lines of the test text file
with open('wildfires_test.txt', 'r') as textfile:
    lines2 = textfile.readlines()

# Split each line on whitespace into a tuple of field values
data1 = [tuple(line.split()) for line in lines]
data2 = [tuple(line.split()) for line in lines2]

#Convert this data to a csv file format
#Specify the column names of the csv
header = ['fire','year','temp','humidity','rainfall','drought_code','buildup_index','day','month','wind_speed']

# Write the training data to a csv file.
# newline='' stops the csv module from emitting blank rows on Windows.
with open('wildfires_training.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)

    # write the header
    writer.writerow(header)

    # write the data
    writer.writerows(data1)

# Write the test data to a csv file in the same way
with open('wildfires_test.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)

    # write the header
    writer.writerow(header)

    # write the data
    writer.writerows(data2)
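
The conversion above goes through the `csv` module by hand. As a possible shortcut, pandas can parse whitespace-delimited text directly. The sketch below assumes the text files have no header row and that fields are separated by whitespace (as `line.split()` implies). The in-memory `sample_text` is a hypothetical stand-in for the contents of 'wildfires_training.txt'.

```python
import io
import pandas as pd

header = ['fire', 'year', 'temp', 'humidity', 'rainfall',
          'drought_code', 'buildup_index', 'day', 'month', 'wind_speed']

# Stand-in for the file contents (assumed layout: no header row,
# whitespace-separated fields).
sample_text = """\
no 2015 28 59 0.0 8.06 3.47 1 6 19
no 2010 30 61 1.3 8.17 4.03 2 6 13
no 2009 26 83 13.1 8.08 3.59 3 6 22
"""

# sep=r'\s+' splits on any run of whitespace; names= supplies the column names.
df = pd.read_csv(io.StringIO(sample_text), sep=r'\s+', header=None, names=header)
print(df.dtypes)
```

With the real data, `io.StringIO(sample_text)` would be replaced by the path to the text file, which skips the intermediate csv file entirely.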

In [47]:
Training_data=pd.read_csv("wildfires_training.csv")
Test_data=pd.read_csv("wildfires_test.csv")

In [144]:
Training_data.head()

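|   | fire | year | temp | humidity | rainfall | drought_code | buildup_index | day | month | wind_speed |
|---|------|------|------|----------|----------|--------------|---------------|-----|-------|------------|
| 0 | no | 2015 | 28 | 59 | 0.0 | 8.06 | 3.47 | 1 | 6 | 19 |
| 1 | no | 2010 | 30 | 61 | 1.3 | 8.17 | 4.03 | 2 | 6 | 13 |
| 2 | no | 2009 | 26 | 83 | 13.1 | 8.08 | 3.59 | 3 | 6 | 22 |
| 3 | no | 2017 | 25 | 87 | 2.5 | 7.18 | 2.42 | 4 | 6 | 15 |
| 4 | no | 2014 | 28 | 77 | 0.0 | 14.98 | 4.63 | 5 | 6 | 18 |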


In [48]:
X_train = Training_data[(Training_data.columns)[1:]]
y_train = Training_data.fire
X_test = Test_data[(Test_data.columns)[1:]]
y_test = Test_data.fire
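
The split above relies on `fire` being the first column. A name-based split is a little more robust if the column order ever changes. The snippet below is a minimal sketch on a hypothetical two-row frame (`demo`, `X_demo` and `y_demo` are illustrative names, not part of the notebook).

```python
import pandas as pd

# Tiny illustrative frame with the label column first, as in the csv files
demo = pd.DataFrame({'fire': ['no', 'yes'],
                     'year': [2015, 2010],
                     'temp': [28, 30]})

# Select features by dropping the label column by name,
# and select the label column by name
X_demo = demo.drop(columns='fire')
y_demo = demo['fire']

print(list(X_demo.columns))
```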

In [143]:
X_train.head()

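|   | year | temp | humidity | rainfall | drought_code | buildup_index | day | month | wind_speed |
|---|------|------|----------|----------|--------------|---------------|-----|-------|------------|
| 0 | 2015 | 28 | 59 | 0.0 | 8.06 | 3.47 | 1 | 6 | 19 |
| 1 | 2010 | 30 | 61 | 1.3 | 8.17 | 4.03 | 2 | 6 | 13 |
| 2 | 2009 | 26 | 83 | 13.1 | 8.08 | 3.59 | 3 | 6 | 22 |
| 3 | 2017 | 25 | 87 | 2.5 | 7.18 | 2.42 | 4 | 6 | 15 |
| 4 | 2014 | 28 | 77 | 0.0 | 14.98 | 4.63 | 5 | 6 | 18 |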


### Decision Tree Classifier

In [136]:
# Create Decision Tree classifier object
clf = DecisionTreeClassifier(criterion='gini',max_depth=4)

# Train Decision Tree classifier
clf = clf.fit(X_train,y_train)

#Predict the response for test dataset
y_pred = clf.predict(X_test)
y_pred2 = clf.predict(X_train)

In [137]:
# Accuracy on the test data
acc1=sklearn.metrics.accuracy_score(y_test, y_pred)*100
print("Test data accuracy",round(acc1,2),"%")
# Accuracy on the training data
acc2=sklearn.metrics.accuracy_score(y_train, y_pred2)*100
print("Training data accuracy",round(acc2,2),"%")

Test data accuracy 88.0 %
Training data accuracy 95.45 %


In [138]:
# Per-class arrays follow the sorted label order ('no', 'yes'),
# so index 1 selects the scores for the 'yes' class
grid=sklearn.metrics.precision_recall_fscore_support(y_test,y_pred)
precision,recall,f_score=grid[0][1],grid[1][1],grid[2][1]
print("precision:",round(precision*100,2),"%")
print("recall:",round(recall*100,2),"%")
print("f-score:",round(f_score*100,2),"%")

precision: 95.83 %
recall: 82.14 %
f-score: 88.46 %
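
Indexing the per-class arrays with `[1]` works because scikit-learn orders the classes by sorted label, which puts 'yes' second. A possibly clearer alternative is to name the positive class explicitly. The sketch below uses made-up labels (`y_true_demo`, `y_pred_demo`) purely for illustration; the numbers are not results from the wildfire data.

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative
y_true_demo = ['yes', 'yes', 'yes', 'no', 'no', 'yes']
y_pred_demo = ['yes', 'no', 'yes', 'no', 'yes', 'yes']

# average='binary' with pos_label returns scalar scores for the named class
p, r, f, _ = precision_recall_fscore_support(
    y_true_demo, y_pred_demo, average='binary', pos_label='yes')

# Equivalent per-class call: arrays are ordered as given in labels=
per_class = precision_recall_fscore_support(
    y_true_demo, y_pred_demo, labels=['no', 'yes'])

print(p, r, f)  # precision = 3/4, recall = 3/4, F-score = 3/4
```

Here `per_class[0][1]` equals `p`, which is the same quantity the notebook reads out with `grid[0][1]`.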


In [139]:
#Visualizing the tree structure
from sklearn import tree
import graphviz 
# class_names must follow the order of clf.classes_, which is sorted: 'no' then 'yes'
dot_data = tree.export_graphviz(clf, out_file=None,feature_names=list(X_train.columns),class_names=["no","yes"],filled=True, rounded=True,special_characters=True) 
graph = graphviz.Source(dot_data) 
graph.render("tree") 

'tree.pdf'
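
When Graphviz is not installed, the fitted tree can also be inspected as plain text with `sklearn.tree.export_text`. The following is a minimal sketch on a hypothetical four-row dataset (`X_demo`, `y_demo` and `demo_clf` are illustrative names); it is not the tree learned from the wildfire data.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [temp, humidity] with a yes/no label
X_demo = [[20, 80], [35, 30], [22, 75], [38, 25]]
y_demo = ['no', 'yes', 'no', 'yes']

demo_clf = DecisionTreeClassifier(criterion='gini', max_depth=4,
                                  random_state=0).fit(X_demo, y_demo)

# Classes are stored in sorted order, which is the order
# export_graphviz expects for class_names
print(list(demo_clf.classes_))

# Text rendering of the decision rules, one line per node
rules = export_text(demo_clf, feature_names=['temp', 'humidity'])
print(rules)
```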

### Naive Bayes Classifier

In [140]:
from sklearn.naive_bayes import GaussianNB
# Fit a Gaussian Naive Bayes model and predict on the test data
gnb = GaussianNB()
y_pred3 = gnb.fit(X_train, y_train).predict(X_test)
acc4=sklearn.metrics.accuracy_score(y_test, y_pred3)*100
print("Accuracy on test data is:",round(acc4,2),"%")

Accuracy on test data is: 86.0 %


In [141]:
# gnb is already fitted above, so predict on the training data directly
y_pred4 = gnb.predict(X_train)
acc5=sklearn.metrics.accuracy_score(y_train, y_pred4)*100
print("Accuracy on training data is:",round(acc5,2),"%")

Accuracy on training data is: 90.91 %


In [142]:
# As before, index 1 selects the scores for the 'yes' class
grid=sklearn.metrics.precision_recall_fscore_support(y_test,y_pred3)
precision,recall,f_score=grid[0][1],grid[1][1],grid[2][1]
print("precision:",round(precision*100,2),"%")
print("recall:",round(recall*100,2),"%")
print("f-score:",round(f_score*100,2),"%")

precision: 86.21 %
recall: 89.29 %
f-score: 87.72 %
