## Brazil fire data by month

### Bi-Weekly Report ###
**What we did:**

We finalized preprocessing and merged our data into a single .csv file ('weather_with_fires.csv').

**What we are doing in this notebook:**

We created a daily scale for month-specific fire danger to use as the class label in our prediction. It is separated into 4 tiers.


**What we will do:**

With the preprocessed data, we will apply an SVM to classify each day into one of the 4 tiers we defined. We will also compare the SVM with other methods to see whether it yields the highest accuracy.
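
As a rough illustration of the planned comparison, the sketch below cross-validates an SVM against two other classifiers. It runs on synthetic data from `make_classification`, not on our weather data; the names `X_demo`, `y_demo`, `candidates`, and `mean_accuracy` are placeholders made up for this example, and the default hyperparameters are an assumption rather than a tuned choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 6 features and 4 classes, like our feature set and scale.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=42)

candidates = {
    "svm": SVC(),
    "knn": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(random_state=42),
}

# Mean 5-fold accuracy per model; labels stay 1-D (no one-hot encoding needed).
mean_accuracy = {
    name: cross_val_score(clf, X_demo, y_demo, cv=5, scoring="accuracy").mean()
    for name, clf in candidates.items()
}
for name, acc in sorted(mean_accuracy.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

Because `cross_val_score` clones the estimator for every fold, each fold starts from an untrained model, which keeps the fold scores independent of each other.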



In [2]:
from google.colab import drive
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tabulate import tabulate
import pickle

from sklearn.model_selection import train_test_split 
from sklearn.model_selection import cross_validate
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier 
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization

drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


- load the .csv file we created in Data_Preprocessing_Weather_2.ipynb
- drop 'Unnamed' column & create 'Scale' column. Scale determines the fire-danger of a state on a certain day.
- translate Date to datetime to make date-specific calculations easier

In [3]:
data = pd.read_csv('/content/drive/Shareddrives/BNCS411_Final_Project/weather_with_fires.csv')
data = data.drop(columns=['Unnamed: 0'])
data['Scale'] = np.nan  # placeholder column, filled in below (DataFrame.append was removed in pandas 2.0)
data['Date'] = pd.to_datetime(data['Date'])
mask = (data['Date'] >= '2000-11-01') & (data['Date'] <= '2018-12-31')
data_train = data.loc[mask]

In [4]:
data_train.State.unique()
statelist = ['AC', 'AL', 'AM', 'AP', 'BA', 'CE', 'DF', 'ES', 'GO', 'MA', 'MG','MS', 'MT', 'PA', 'PB', 'PE', 'PI', 'PR', 'RJ', 'RN', 'RR', 'RS','SC', 'SE', 'SP', 'TO']

As each month varies a lot in fire danger, the scale has to vary too. We chose a scale from 1 to 4:
1. covers the first 25% of days, which are mostly days with 0 fires.
2. covers the 50% of days between the 25% and the 75% quantile.
3. covers the next 10% of days, between the 75% and the 85% quantile.
4. covers all days above the 85% quantile.

We iterate over each state and month and apply the scale for that state and month to each row.

In [5]:
for h in statelist:
  stat = data.loc[data['State'] == h]
  for j in range(1,13):
      # rows of this state that fall in month j (across all years)
      mon = stat[stat['Date'].dt.month == j]
      # 25%, 75% and 85% quantiles of the daily fire counts
      temp = mon['Fires'].quantile([.25, .75, .85]).to_numpy()
      print(temp)
      for i in mon.index:
          if data.loc[i, 'Fires'] <= temp[0]:
              data.loc[i, 'Scale'] = 1
          elif data.loc[i, 'Fires'] <= temp[1]:
              data.loc[i, 'Scale'] = 2
          elif data.loc[i, 'Fires'] <= temp[2]:
              data.loc[i, 'Scale'] = 3
          else:
              data.loc[i, 'Scale'] = 4

[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 1.]
[0.   1.   2.65]
[ 0. 10. 19.]
[  7.  83. 152.]
[ 13.   178.   295.25]
[ 0.  20.  42.8]
[0. 1. 2.]
[0. 0. 0.]
[ 0. 13. 19.]
[ 0.  9. 14.]
[0. 5. 7.]
[0. 1. 2.]
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]
[0. 2. 4.]
[ 0.   7.  10.8]
[ 1. 12. 17.]
[ 1. 13. 19.]
[0. 3. 6.]
[0. 2. 4.]
[0. 1. 2.]
[0. 1. 1.]
[0. 1. 1.]
[0.   3.75 5.65]
[ 3. 29. 50.]
[ 43.  213.  284.6]
[ 39.  177.  264.6]
[  9.   77.  121.8]
[ 1.  25.  43.3]
[ 0.  7. 12.]
[0. 0. 1.]
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]
[0. 1. 2.]
[ 1. 10. 14.]
[ 4.   35.25 50.55]
[ 4.  57.5 88. ]
[ 0. 17. 35.]
[ 2. 18. 26.]
[ 2.   16.   25.75]
[ 2. 18. 26.]
[ 2. 14. 19.]
[ 4.  17.  22.8]
[ 6. 25. 31.]
[11.  34.  45.8]
[ 23.   83.  118.8]
[ 71.   305.75 419.  ]
[ 47.  268.  377.6]
[ 3.   50.   90.15]
[ 1. 25. 39.]
[ 0.   9.  17.8]
[0. 1. 2.]
[0. 0. 1.]
[0. 0. 0.]
[0. 0. 1.]
[0. 1. 2.]
[0. 2. 4.]
[ 0.   7.  11.8]
[ 2.   16.   22.65]
[ 5. 46. 68.]
[  6.   90.  1
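
The row-by-row loop above can be slow on a large frame. A minimal vectorized sketch of the same tiering idea is shown here, assuming a `Month` column already exists; the frame `demo` and its fire counts are made up for illustration and are not taken from our data.

```python
import numpy as np
import pandas as pd

# Made-up fire counts for a single state/month group.
demo = pd.DataFrame({
    "State": ["AM"] * 10,
    "Month": [9] * 10,
    "Fires": [0, 0, 1, 2, 3, 5, 8, 13, 21, 34],
})

# Per-group quantiles, broadcast back to every row of the group.
grouped = demo.groupby(["State", "Month"])["Fires"]
q25 = grouped.transform(lambda s: s.quantile(0.25))
q75 = grouped.transform(lambda s: s.quantile(0.75))
q85 = grouped.transform(lambda s: s.quantile(0.85))

# np.select takes the first condition that holds, mirroring the if/elif chain.
demo["Scale"] = np.select(
    [demo["Fires"] <= q25, demo["Fires"] <= q75, demo["Fires"] <= q85],
    [1, 2, 3],
    default=4,
)
print(demo)
```

With these ten values the quantiles are 1.25, 11.75 and 18.2, so the first three days land in tier 1, the next four in tier 2, the day with 13 fires in tier 3, and the last two in tier 4.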

In [6]:
def get_state_nr (row, Statelist):
  for i in Statelist:
    if row['State'] == i:
      return Statelist.index(i) + 1

In [7]:
data['Scale']=data['Scale'].astype('int')
data['Month'] = data['Date'].map(lambda x: x.month)
data['StateNr'] = data.apply(lambda row: get_state_nr(row, statelist), axis=1)
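
The state number and month can also be derived without a row-wise `apply`. The sketch below uses a dictionary lookup and the `.dt` accessor; `statelist_demo` is a shortened stand-in for `statelist`, and the dates are made up.

```python
import pandas as pd

statelist_demo = ["AC", "AL", "AM"]  # shortened stand-in for statelist
demo = pd.DataFrame({
    "State": ["AM", "AC", "AL", "AM"],
    "Date": pd.to_datetime(
        ["2001-09-01", "2001-09-02", "2002-01-15", "2002-12-31"]),
})

# 1-based position of each state in the list, as in get_state_nr.
state_to_nr = {state: nr for nr, state in enumerate(statelist_demo, start=1)}
demo["StateNr"] = demo["State"].map(state_to_nr)
demo["Month"] = demo["Date"].dt.month
print(demo)
```

A state that is missing from the list would map to NaN here, whereas `get_state_nr` would return `None`; either way it is worth checking for missing values before training.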

In [8]:
# FirPrev holds the fire count of the previous row of the same state (0 for the first row)
data['FirPrev'] = 0
data_temp = data.sort_values(["StateNr", "Date"])

for k in data_temp['StateNr'].unique():
    temp = data_temp[data_temp['StateNr']==k]
    for i in range(1, len(temp)):
        data.loc[temp.index[i], 'FirPrev'] = data.loc[temp.index[i-1], 'Fires']
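
The previous-row lookup above can be expressed with a grouped shift. This is a sketch on a made-up frame `demo`, assuming, like the loop, that "previous" means the preceding row of the same state after sorting by date (not necessarily the preceding calendar day).

```python
import pandas as pd

demo = pd.DataFrame({
    "StateNr": [1, 2, 1, 2, 1],
    "Date": pd.to_datetime(
        ["2001-01-02", "2001-01-01", "2001-01-01", "2001-01-02", "2001-01-03"]),
    "Fires": [5, 7, 3, 0, 9],
})

# Sort within each state by date, then shift the fire counts down by one row.
demo = demo.sort_values(["StateNr", "Date"])
demo["FirPrev"] = demo.groupby("StateNr")["Fires"].shift(1).fillna(0).astype(int)

# Restore the original row order.
demo = demo.sort_index()
print(demo)
```

The first row of every state has no predecessor, so it is filled with 0, matching the initial value used in the loop.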

In [9]:
mask = (data['Date'] >= '2000-11-01') & (data['Date'] <= '2018-12-31')
data_train = data.loc[mask]

In [10]:
knc = KNeighborsClassifier()
#cross_validate(knc, X_train, y_train, cv=5, n_jobs=-1,verbose=1)
X = data_train[['MaxTemp','MinTemp','RelHum', 'WindVel', 'Month', 'StateNr']]
y = data_train['Scale']
for train_index, test_index in KFold(n_splits=10, random_state=42, shuffle=True).split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    # KFold yields positional indices, so select rows with .iloc
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    # one-hot encode the 4 tiers
    y_train = pd.get_dummies(y_train)
    y_test = pd.get_dummies(y_test)
    knc.fit(X_train,y_train)
    print(knc.score(X_test, y_test))

# save the model fitted on the last fold
with open("/content/drive/Shareddrives/BNCS411_Final_Project/Group12_knc.pkl", "wb") as f:
    pickle.dump(knc, f)


#knc.fit(X_train, y_train)
#accuracy = knc.score(X_test, y_test) 
#print(accuracy)


TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [    11     12     22 ... 171867 171878 171879]
0.45296410495084066
TRAIN: [     1      2      3 ... 171884 171885 171886] TEST: [     0     20     24 ... 171849 171872 171874]
0.4504043283495259
TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [     4     23     39 ... 171856 171861 171881]
0.447960905230089
TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [    31     48     57 ... 171863 171876 171880]
0.45063703531328175
TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [     3     38     41 ... 171869 171875 171883]
0.4503461516085869
TRAIN: [     0      1      2 ... 171882 171883 171884] TEST: [     8     14     17 ... 171873 171885 171886]
0.44790272848915
TRAIN: [     0      2      3 ... 171883 171885 171886] TEST: [     1      6      7 ... 171841 171857 171884]
0.450113444644831
TRAIN: [     0      1      3 ... 171884 171885 171886] TEST: [     2      9     13 ... 171859 171864 17186
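
`KFold.split` returns positions (0 to n-1), not index labels. On a frame whose index is no longer 0 to n-1, for example after filtering by date, `.iloc` is the accessor that matches those positions. The sketch below shows this on a made-up frame; `full`, `subset`, and `seen_test_labels` are names invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# A filtered frame keeps its original labels, so labels and positions differ.
full = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) % 2})
subset = full.loc[full["x"] >= 4]  # index labels 4..9, positions 0..5

seen_test_labels = []
kf = KFold(n_splits=3, shuffle=True, random_state=42)
for train_pos, test_pos in kf.split(subset):
    train_rows = subset.iloc[train_pos]  # positional lookup
    test_rows = subset.iloc[test_pos]
    seen_test_labels.extend(test_rows.index)
    print(len(train_rows), len(test_rows), list(test_rows.index))
```

Every row of `subset` appears in exactly one test fold, which is what the final list of test labels confirms.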

In [11]:
dtc = DecisionTreeClassifier()
X = data_train[['MaxTemp','MinTemp','RelHum', 'WindVel', 'Month', 'StateNr']]
y = data_train['Scale']
for train_index, test_index in KFold(n_splits=10, random_state=42, shuffle=True).split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    # KFold yields positional indices, so select rows with .iloc
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    # one-hot encode the 4 tiers
    y_train = pd.get_dummies(y_train)
    y_test = pd.get_dummies(y_test)
    dtc.fit(X_train,y_train)
    print(dtc.score(X_test, y_test))

# save the model fitted on the last fold
with open("/content/drive/Shareddrives/BNCS411_Final_Project/Group12_dtc.pkl", "wb") as f:
    pickle.dump(dtc, f)

#dtc = DecisionTreeClassifier().fit(X_train, y_train)
#dtc.score(X_test, y_test)
#dtc.feature_importances_

TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [    11     12     22 ... 171867 171878 171879]
0.46355227180173364
TRAIN: [     1      2      3 ... 171884 171885 171886] TEST: [     0     20     24 ... 171849 171872 171874]
0.46634475536680436
TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [     4     23     39 ... 171856 171861 171881]
0.4595962534178835
TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [    31     48     57 ... 171863 171876 171880]
0.4654139275117808
TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [     3     38     41 ... 171869 171875 171883]
0.46721740648088894
TRAIN: [     0      1      2 ... 171882 171883 171884] TEST: [     8     14     17 ... 171873 171885 171886]
0.46081796497760197
TRAIN: [     0      2      3 ... 171883 171885 171886] TEST: [     1      6      7 ... 171841 171857 171884]
0.4632032113560998
TRAIN: [     0      1      3 ... 171884 171885 171886] TEST: [     2      9     13 ... 171859 171864

In [12]:
rfc = RandomForestClassifier(max_depth=15, random_state=42)
X = data_train[['MaxTemp','MinTemp','RelHum', 'WindVel', 'Month', 'StateNr']]
y = data_train['Scale']
for train_index, test_index in KFold(n_splits=10, random_state=42, shuffle=True).split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    # KFold yields positional indices, so select rows with .iloc
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    # one-hot encode the 4 tiers
    y_train = pd.get_dummies(y_train)
    y_test = pd.get_dummies(y_test)
    rfc.fit(X_train,y_train)
    print(rfc.score(X_test, y_test))

# save the model fitted on the last fold
with open("/content/drive/Shareddrives/BNCS411_Final_Project/Group12_rfc.pkl", "wb") as f:
    pickle.dump(rfc, f)

#clf.fit(X_train, y_train)
#y_pred = clf.predict(X_test)

#print(accuracy_score(y_test, y_pred))

TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [    11     12     22 ... 171867 171878 171879]
0.3915294665192856
TRAIN: [     1      2      3 ... 171884 171885 171886] TEST: [     0     20     24 ... 171849 171872 171874]
0.39222758741055325
TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [     4     23     39 ... 171856 171861 171881]
0.38693350398510673
TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [    31     48     57 ... 171863 171876 171880]
0.39123858281459073
TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [     3     38     41 ... 171869 171875 171883]
0.3947291872709291
TRAIN: [     0      1      2 ... 171882 171883 171884] TEST: [     8     14     17 ... 171873 171885 171886]
0.38675897376228985
TRAIN: [     0      2      3 ... 171883 171885 171886] TEST: [     1      6      7 ... 171841 171857 171884]
0.3834428995287684
TRAIN: [     0      1      3 ... 171884 171885 171886] TEST: [     2      9     13 ... 171859 171864

In [13]:
def ANN1():
    model = Sequential()
    model.add(Dense(128, input_shape=(6,), activation='tanh'))
    model.add(BatchNormalization())
    model.add(layers.Activation(tf.nn.tanh))
    model.add(Dense(256, activation='tanh'))
    model.add(Dropout(0.25))
    model.add(BatchNormalization())
    model.add(layers.Activation(tf.nn.tanh))
    model.add(Dense(512,  activation='tanh'))
    model.add(Dropout(0.25))
    model.add(BatchNormalization())
    model.add(layers.Activation(tf.nn.tanh))
    model.add(Dense(128, activation='sigmoid'))
    model.add(Dense(4, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [16]:
Mymodel = ANN1()
history = {'accuracy': [], 'loss': [], 'val_accuracy': [], 'val_loss': []}
X = data_train[['MaxTemp','MinTemp','RelHum', 'WindVel', 'Month', 'StateNr']]
y = data_train['Scale']
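for train_index, test_index in KFold(n_splits=5, random_state=42, shuffle=True).split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    # KFold yields positional indices, so select rows with .iloc
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    # one-hot encode the 4 tiers as floats for categorical_crossentropy
    y_train = pd.get_dummies(y_train).astype('float32')
    y_test = pd.get_dummies(y_test).astype('float32')
    # note: the same model keeps training across folds, so later folds
    # validate on rows it has already seen during earlier folds
    temp = Mymodel.fit(X_train,y_train, batch_size = 128, epochs=30, validation_data=(X_test, y_test))
    for key, values in temp.history.items():
        history[key].extend(values)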

Mymodel.save('/content/drive/Shareddrives/BNCS411_Final_Project/Group12_ANN.h5')
pd.DataFrame(history).to_csv('/content/drive/Shareddrives/BNCS411_Final_Project/Group12_ANN_history.csv')

TRAIN: [     1      2      3 ... 171884 171885 171886] TEST: [     0     11     12 ... 171874 171878 171879]
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
TRAIN: [     0      1      2 ... 171884 171885 171886] TEST: [     4     23     31 ... 171876 171880 171881]
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
TRAIN: [     0      1      2 ... 171881 171882 171884] TEST: [     3      8     

In [18]:
Mymodel.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_10 (Dense)             (None, 128)               896       
_________________________________________________________________
batch_normalization_6 (Batch (None, 128)               512       
_________________________________________________________________
activation_6 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 256)               33024     
_________________________________________________________________
dropout_4 (Dropout)          (None, 256)               0         
_________________________________________________________________
batch_normalization_7 (Batch (None, 256)               1024      
_________________________________________________________________
activation_7 (Activation)    (None, 256)              

In [None]:
# feature order: MaxTemp, MinTemp, RelHum, WindVel, Month, StateNr
testday2 = np.array([[31.566667, 20.633333, 80.000000, 3.555555, 9, 4]])
Mymodel.predict(testday2)
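
`predict` returns one row of four softmax probabilities per input, not a tier. A minimal sketch of turning such rows back into a scale value follows; the array `probs` is hypothetical (it was not produced by our model), and the mapping assumes that all four tiers were present in the training labels, so that `pd.get_dummies` produced the columns in the order 1, 2, 3, 4.

```python
import numpy as np

# Hypothetical softmax output for two input rows (not real model output).
probs = np.array([
    [0.10, 0.20, 0.60, 0.10],
    [0.70, 0.15, 0.10, 0.05],
])

# Column k of the one-hot labels corresponds to scale k + 1.
scale_labels = np.array([1, 2, 3, 4])
predicted_scale = scale_labels[probs.argmax(axis=1)]
print(predicted_scale)  # [3 1]
```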

In [None]:
check = data_train.loc[data_train['Month'] == 9]
check = check[check['State']=='AM']

In [None]:
history.to_

In [None]:
data.to_csv('/content/drive/Shareddrives/BNCS411_Final_Project/weather_with_fires_scales.csv')