## Forest fires in the northeast region of Portugal
Data set: https://archive.ics.uci.edu/ml/datasets/forest+fires

The task is classification with respect to a chosen target attribute, e.g. temperature or burned area; this notebook uses relative humidity as the target.

Download the data<br>
Load the data<br>
Rescale the data, and convert the month and day-of-week names to numeric values<br>
Split the data into training and test sets<br>
Choose the models under discussion<br>
Train the models<br>
Compare the models

In [1]:
import matplotlib.pyplot as plt
import numpy as np 
import pandas as pd 

## Reading the data
1. X - x-axis spatial coordinate within the Montesinho park map: 1 to 9<br>
2. Y - y-axis spatial coordinate within the Montesinho park map: 2 to 9<br>
3. month - month of the year: 'jan' to 'dec'<br>
4. day - day of the week: 'mon' to 'sun'<br>
5. FFMC - FFMC index from the FWI system: 18.7 to 96.20<br>
6. DMC - DMC index from the FWI system: 1.1 to 291.3<br>
7. DC - DC index from the FWI system: 7.9 to 860.6<br>
8. ISI - ISI index from the FWI system: 0.0 to 56.10<br>
9. temp - temperature in Celsius degrees: 2.2 to 33.30<br>
10. RH - relative humidity in %: 15.0 to 100<br>
11. wind - wind speed in km/h: 0.40 to 9.40<br>
12. rain - outside rain in mm/m2 : 0.0 to 6.4<br>
13. area - the burned area of the forest (in ha): 0.00 to 1090.84<br>
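The ranges listed above can serve as a quick sanity check after loading. This is a minimal sketch, assuming the column names assigned in the next cell (`x-axis`, `humidity`, etc.); the `RANGES` dict and the `out_of_range` helper are hypothetical, built by hand from the list above.

```python
import pandas as pd

# Documented (min, max) ranges from the attribute list, keyed by the column
# names assigned when the CSV is read (hypothetical helper dict).
RANGES = {
    'x-axis': (1, 9), 'y-axis': (2, 9),
    'FFMC': (18.7, 96.2), 'DMC': (1.1, 291.3), 'DC': (7.9, 860.6),
    'ISI': (0.0, 56.1), 'temp': (2.2, 33.3), 'humidity': (15.0, 100.0),
    'wind': (0.4, 9.4), 'rain': (0.0, 6.4), 'area': (0.0, 1090.84),
}

def out_of_range(df: pd.DataFrame) -> dict:
    """Return {column: number of values outside the documented range}."""
    counts = {}
    for col, (lo, hi) in RANGES.items():
        if col in df.columns:
            # Series.between() is inclusive on both ends by default
            counts[col] = int((~df[col].between(lo, hi)).sum())
    return counts

# x-axis = 10 is outside the documented 1..9 range
sample = pd.DataFrame({'x-axis': [7, 10], 'humidity': [51, 99], 'rain': [0.0, 0.2]})
print(out_of_range(sample))  # {'x-axis': 1, 'humidity': 0, 'rain': 0}
```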

In [2]:
path = 'forestfires.csv'
names = ['x-axis', 'y-axis', 'month', 'day', 'FFMC', 'DMC', 'DC', 'ISI', 'temp', 'humidity', 'wind', 'rain', 'area']
dataset = pd.read_csv(path, names=names, skiprows=[0])
dataset.head()


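|   | x-axis | y-axis | month | day | FFMC | DMC | DC | ISI | temp | humidity | wind | rain | area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 5 | mar | fri | 86.2 | 26.2 | 94.3 | 5.1 | 8.2 | 51 | 6.7 | 0.0 | 0.0 |
| 1 | 7 | 4 | oct | tue | 90.6 | 35.4 | 669.1 | 6.7 | 18.0 | 33 | 0.9 | 0.0 | 0.0 |
| 2 | 7 | 4 | oct | sat | 90.6 | 43.7 | 686.9 | 6.7 | 14.6 | 33 | 1.3 | 0.0 | 0.0 |
| 3 | 8 | 6 | mar | fri | 91.7 | 33.3 | 77.5 | 9.0 | 8.3 | 97 | 4.0 | 0.2 | 0.0 |
| 4 | 8 | 6 | mar | sun | 89.3 | 51.3 | 102.2 | 9.6 | 11.4 | 99 | 1.8 | 0.0 | 0.0 |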
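The cell above discards the file's own header with `skiprows=[0]` and supplies new names by position. An equivalent sketch keeps the header and renames only the three columns that change. It assumes the original header is `X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area` (matching the attribute list above); an in-memory CSV stands in for `forestfires.csv`.

```python
import io
import pandas as pd

# Two rows copied from the table above, with the (assumed) UCI header.
csv_text = """X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,0.0
7,4,oct,tue,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0,0.0
"""

# Read with the file's own header, then rename only the columns the notebook renames.
df = pd.read_csv(io.StringIO(csv_text))
df = df.rename(columns={'X': 'x-axis', 'Y': 'y-axis', 'RH': 'humidity'})
print(list(df.columns))
```

Renaming by name rather than by position avoids silently mislabelling columns if their order in the file ever differs from the hard-coded `names` list.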


## Changing the names of the months & days to their numeric values

In [3]:
months = {'jan' : 1, 'feb' : 2, 'mar' : 3, 'apr' : 4, 'may' : 5, 'jun' : 6, 'jul' : 7, 'aug' : 8, 'sep' : 9, 'oct' : 10, 
           'nov' : 11, 'dec' : 12}
days = {'mon' : 1, 'tue' : 2, 'wed' : 3, 'thu' : 4, 'fri' : 5, 'sat' : 6, 'sun' : 7}
# map() looks each label up in the dict; every label in the data is a key, so no NaNs appear
dataset['month'] = dataset['month'].map(months)
dataset['day'] = dataset['day'].map(days)
dataset.head()


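|   | x-axis | y-axis | month | day | FFMC | DMC | DC | ISI | temp | humidity | wind | rain | area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 5 | 3 | 5 | 86.2 | 26.2 | 94.3 | 5.1 | 8.2 | 51 | 6.7 | 0.0 | 0.0 |
| 1 | 7 | 4 | 10 | 2 | 90.6 | 35.4 | 669.1 | 6.7 | 18.0 | 33 | 0.9 | 0.0 | 0.0 |
| 2 | 7 | 4 | 10 | 6 | 90.6 | 43.7 | 686.9 | 6.7 | 14.6 | 33 | 1.3 | 0.0 | 0.0 |
| 3 | 8 | 6 | 3 | 5 | 91.7 | 33.3 | 77.5 | 9.0 | 8.3 | 97 | 4.0 | 0.2 | 0.0 |
| 4 | 8 | 6 | 3 | 7 | 89.3 | 51.3 | 102.2 | 9.6 | 11.4 | 99 | 1.8 | 0.0 | 0.0 |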
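When labels are converted through a dictionary, a label missing from it (for example an unexpected capitalisation) is easy to overlook: `Series.replace` leaves it unchanged as text, and `Series.map` turns it into `NaN`. A minimal sketch of a stricter conversion; the helper `encode` is hypothetical.

```python
import pandas as pd

months = {'jan': 1, 'feb': 2, 'mar': 3, 'apr': 4, 'may': 5, 'jun': 6,
          'jul': 7, 'aug': 8, 'sep': 9, 'oct': 10, 'nov': 11, 'dec': 12}
days = {'mon': 1, 'tue': 2, 'wed': 3, 'thu': 4, 'fri': 5, 'sat': 6, 'sun': 7}

def encode(series: pd.Series, mapping: dict) -> pd.Series:
    """Map text labels to integers and fail loudly on unknown labels."""
    encoded = series.map(mapping)              # unknown labels become NaN
    unknown = series[encoded.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f'unmapped labels: {list(unknown)}')
    return encoded.astype(int)                 # safe: no NaNs remain

print(encode(pd.Series(['mar', 'oct', 'dec']), months).tolist())  # [3, 10, 12]
```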


## Preprocessing
x = attributes (the 12 remaining columns), y = labels (relative humidity, the `humidity` column)

In [4]:
cols = list(dataset.columns.values)
cols.pop(cols.index('humidity'))
dataset = dataset[cols+['humidity']] # move the humidity column to the end for easier slicing
x = dataset.iloc[:, :12].values
y = dataset.iloc[:, -1].values
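Reordering the columns works, but `x` and `y` can also be selected by name, which does not depend on `humidity` ending up in the last position. A sketch on a small stand-in DataFrame (the real `dataset` has 13 columns, so its `x` would have 12); note that the attribute order in `x` then follows the original column order.

```python
import pandas as pd

# Tiny stand-in for `dataset` (values from the first two rows above).
dataset = pd.DataFrame({
    'temp': [8.2, 18.0], 'humidity': [51, 33], 'wind': [6.7, 0.9], 'area': [0.0, 0.0],
})

target = 'humidity'
x = dataset.drop(columns=target).values  # every column except the target
y = dataset[target].values               # the target column only
print(x.shape, y)
```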


## Dividing the Dataset into Training and Test Samples

In [5]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.20, random_state = 82)
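With `test_size=0.20`, scikit-learn rounds the test-set size up. For the 517 rows of the UCI file (the row count is taken from the dataset page and assumed here), that gives ceil(0.2 × 517) = 104 test samples, which agrees with the 10 + 94 predictions counted at the end. A sketch on random stand-in data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.normal(size=(517, 12))        # 517 rows x 12 attributes (stand-in data)
y = rng.integers(15, 101, size=517)   # humidity-like integer labels

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=82)
print(x_train.shape, x_test.shape)  # (413, 12) (104, 12)
```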

## Feature scaling

In [6]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
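`fit_transform` learns the mean and standard deviation from the training set only, and `transform` reuses them on the test set, so no test information leaks into the scaling. A sketch on random data showing the equivalent arithmetic z = (x − μ_train) / σ_train:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
x_train = rng.normal(loc=50, scale=10, size=(100, 3))
x_test = rng.normal(loc=50, scale=10, size=(20, 3))

sc = StandardScaler()
x_train_s = sc.fit_transform(x_train)
x_test_s = sc.transform(x_test)

# The same transform by hand: statistics come from the training set only
# (population std, ddof=0) and are reused unchanged on the test set.
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
manual_test = (x_test - mean) / std

print(np.allclose(x_test_s, manual_test))  # True
```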

## Naive Bayes classifier

In [7]:
from sklearn.naive_bayes import GaussianNB
nvclassifier = GaussianNB()
nvclassifier.fit(x_train, y_train)

GaussianNB()

## Prediction

In [8]:
y_pred = nvclassifier.predict(x_test)
print(y_pred)


[82 43 55 46 19 52 34 66 71 61 22 28 82 79 53 51 52 74 41 38 70 26 51 47
 41 52 28 38 19 56 74 21 41 38 39 26 82 30 21 43 47 74 38 38 47 51 39 90
 21 48 47 58 64 38 36 38 25 32 58 34 48 58 53 45 48 41 19 72 48 52 22 51
 38 51 48 77 52 48 48 48 24 66 65 48 38 55 21 28 56 51 51 39 34 52 74 32
 48 43 51 45 47 38 48 64]


In [9]:
y_compare = np.vstack((y_test,y_pred)).T
y_compare[:5,:]

array([[64, 82],
       [44, 43],
       [35, 55],
       [46, 46],
       [56, 19]], dtype=int64)
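Because the labels are humidity percentages, an exact-match comparison treats a near miss (44 vs 43) the same as a large one (56 vs 19). The five pairs printed above can also be summarised by absolute error; the ±5 percentage-point tolerance below is an arbitrary choice for illustration.

```python
import numpy as np

# The five (y_test, y_pred) pairs printed above.
y_compare = np.array([[64, 82], [44, 43], [35, 55], [46, 46], [56, 19]])

abs_err = np.abs(y_compare[:, 0] - y_compare[:, 1])
print(abs_err)                 # [18  1 20  0 37]
print(abs_err.mean())          # 15.2  (mean absolute error, percentage points)
print((abs_err <= 5).mean())   # 0.4   (share of predictions within 5 points)
```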

## Algorithm Evaluation

In [10]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[0 0 0 ... 0 0 0]
 [0 1 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 1 0 0]]
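`confusion_matrix` places true labels on the rows and predicted labels on the columns, ordered by the sorted union of the labels in both inputs. With humidity as the target there are dozens of distinct labels, which is why NumPy abbreviates the printout with `...`. A small example, plus `np.printoptions` to print a large matrix in full:

```python
import sys
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [1, 2, 2, 3]
y_pred = [1, 2, 1, 4]   # label 4 appears only in the predictions

cm = confusion_matrix(y_true, y_pred)
print(cm)
# Rows (true) and columns (predicted) are labels [1, 2, 3, 4]:
# [[1 0 0 0]
#  [1 1 0 0]
#  [0 0 0 1]
#  [0 0 0 0]]

# Print a large matrix without the "..." abbreviation.
with np.printoptions(threshold=sys.maxsize):
    print(cm)
```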


## Computing the recognition accuracy

In [12]:
# Diagonal entries of the confusion matrix are correct predictions;
# everything off the diagonal is a misclassification.
corr_pred = np.trace(cm)
false_pred = cm.sum() - corr_pred

print(f'Correct predictions: {corr_pred}')
print(f'False predictions: {false_pred}')
print(f'\nAccuracy of the Naive Bayes classification is: {corr_pred / cm.sum()}')

Correct predictions: 10
False predictions: 94

Accuracy of the Naive Bayes classification is: 0.09615384615384616
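The diagonal-over-total computation above is exactly what `sklearn.metrics.accuracy_score` returns, so the two can be cross-checked. A small example on made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 2, 2, 3]
y_pred = [1, 2, 1, 3]

cm = confusion_matrix(y_true, y_pred)
from_cm = np.trace(cm) / cm.sum()                # correct predictions / all predictions
print(from_cm, accuracy_score(y_true, y_pred))   # 0.75 0.75
```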
