<h3> Rain Prediction </h3>

We will train several models (mostly classifiers) on the training data and evaluate their predictions on the test data using a range of evaluation metrics.<br>
The dataset contains observations of weather metrics for each day from 2008 to 2017. 

In this notebook, we will use the following algorithms:
1.  Linear Regression
2.  KNN
3.  Decision Trees
4.  Logistic Regression
5.  SVM

We will evaluate our models using:
1.  Accuracy Score
2.  Jaccard Index
3.  F1-Score
4.  LogLoss
5.  Mean Absolute Error
6.  Mean Squared Error
7.  R2-Score

Finally, we will use our models to generate a report comparing their evaluation scores.
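As a quick reference for the classification metrics listed above, the sketch below computes accuracy, the Jaccard index and the F1-score by hand from a confusion matrix and compares them with scikit-learn. The label arrays are made up for illustration. For the positive class, Jaccard = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN).

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, jaccard_score

# Hypothetical true and predicted labels (1 = rain tomorrow, 0 = no rain)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary labels [0, 1], ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
jaccard = tp / (tp + fp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

# Hand-computed values next to scikit-learn's implementations
print(accuracy, accuracy_score(y_true, y_pred))
print(jaccard, jaccard_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
```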


<h3>Import the required libraries</h3>


In [1]:
# Suppress warnings by replacing warnings.warn with a no-op:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
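The cell above silences warnings by overwriting `warnings.warn` globally. The standard library's filter API does the same job without monkey-patching: `warnings.filterwarnings("ignore")` applies notebook-wide, and `warnings.catch_warnings()` scopes suppression to a single block, as in the sketch below. The `noisy` function is a hypothetical stand-in for any call that emits a warning.

```python
import warnings

def noisy():
    """Hypothetical stand-in for a library call that emits a warning."""
    warnings.warn("this option is deprecated", DeprecationWarning)
    return 42

# Suppress warnings only inside this block; the previous filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy()

# Notebook-wide alternative (affects every later cell):
# warnings.filterwarnings("ignore")
```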

In [15]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import svm

from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split

import sklearn.metrics as metrics
from sklearn.metrics import r2_score
from sklearn.metrics import jaccard_score
from sklearn.metrics import f1_score
from sklearn.metrics import log_loss
from sklearn.metrics import confusion_matrix, accuracy_score

#### Importing the Dataset

In [3]:
df = pd.read_csv('Weather_Data.csv')
df.head()

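|   | Date | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | WindDir9am | WindDir3pm | ... | Humidity9am | Humidity3pm | Pressure9am | Pressure3pm | Cloud9am | Cloud3pm | Temp9am | Temp3pm | RainToday | RainTomorrow |
|---|------|---------|---------|----------|-------------|----------|-------------|---------------|------------|------------|-----|-------------|-------------|-------------|-------------|----------|----------|---------|---------|-----------|--------------|
| 0 | 2/1/2008 | 19.5 | 22.4 | 15.6 | 6.2 | 0.0 | W | 41 | S | SSW | ... | 92 | 84 | 1017.6 | 1017.4 | 8 | 8 | 20.7 | 20.9 | Yes | Yes |
| 1 | 2/2/2008 | 19.5 | 25.6 | 6.0 | 3.4 | 2.7 | W | 41 | W | E | ... | 83 | 73 | 1017.9 | 1016.4 | 7 | 7 | 22.4 | 24.8 | Yes | Yes |
| 2 | 2/3/2008 | 21.6 | 24.5 | 6.6 | 2.4 | 0.1 | W | 41 | ESE | ESE | ... | 88 | 86 | 1016.7 | 1015.6 | 7 | 8 | 23.5 | 23.0 | Yes | Yes |
| 3 | 2/4/2008 | 20.2 | 22.8 | 18.8 | 2.2 | 0.0 | W | 41 | NNE | E | ... | 83 | 90 | 1014.2 | 1011.8 | 8 | 8 | 21.4 | 20.9 | Yes | Yes |
| 4 | 2/5/2008 | 19.7 | 25.7 | 77.4 | 4.8 | 0.0 | W | 41 | NNE | W | ... | 88 | 74 | 1008.3 | 1004.8 | 8 | 8 | 22.5 | 25.5 | Yes | Yes |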


<h3> Data Preprocessing</h3>

<b>Transforming Categorical Variables</b><br>
First, we need to convert the categorical variables into binary indicator (dummy) variables. We will use the pandas `get_dummies()` function for this.

In [4]:
df_sydney_processed = pd.get_dummies(data=df, columns=['RainToday', 'WindGustDir', 'WindDir9am', 'WindDir3pm'])

Next, we convert the values of 'RainTomorrow' into a single binary column (No = 0, Yes = 1).<br>
We do not use `get_dummies()` here because it would produce two columns for 'RainTomorrow', which we do not want, since 'RainTomorrow' is our target.

In [5]:
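# Map only the target column; 'RainToday' has already been one-hot encoded
df_sydney_processed['RainTomorrow'] = df_sydney_processed['RainTomorrow'].map({'No': 0, 'Yes': 1})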
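To see what the two encoding steps produce, here is a minimal sketch on a hypothetical three-row frame that borrows a few of the dataset's column names: `get_dummies()` expands each categorical predictor into one indicator column per category, while the target is mapped to a single 0/1 column.

```python
import pandas as pd

# Hypothetical miniature version of the weather data
toy = pd.DataFrame({
    "WindGustDir": ["W", "NNE", "W"],
    "RainToday": ["Yes", "No", "No"],
    "RainTomorrow": ["Yes", "No", "Yes"],
})

# One indicator column per category of each predictor
encoded = pd.get_dummies(toy, columns=["RainToday", "WindGustDir"])

# The target stays a single column, mapped to 0/1
encoded["RainTomorrow"] = encoded["RainTomorrow"].map({"No": 0, "Yes": 1})

print(sorted(encoded.columns))
print(encoded.astype(int))
```

Recent pandas versions return the dummy columns as booleans, which is why the notebook later casts everything with `astype(float)`.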

<b>Training Data and Test Data</b><br>
Now we define our features (the X values) and our target variable, Y.

In [6]:
# Drop the date column and cast every column (including the dummies) to float
df_sydney_processed.drop(columns='Date', inplace=True)
df_sydney_processed = df_sydney_processed.astype(float)

# Features (X) and target (Y)
features = df_sydney_processed.drop(columns='RainTomorrow')
Y = df_sydney_processed['RainTomorrow']

### Linear Regression


In [72]:
# Splitting the data into train and test sets
x_train, x_test, y_train, y_test = train_test_split(features, Y, test_size = 0.2, random_state=10)

# Training and Predictions
LinearReg = LinearRegression()
LinearReg.fit(x_train, y_train)
predictions = LinearReg.predict(x_test)

# Evaluation Metrics
LinearRegression_MAE = np.mean(np.absolute(predictions - y_test))
LinearRegression_MSE = np.mean((predictions - y_test) ** 2)
LinearRegression_R2 = r2_score(y_test , predictions)
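The MAE, MSE and R² computed above follow the definitions in this sketch, which checks the hand-written formulas against scikit-learn's `mean_absolute_error`, `mean_squared_error` and `r2_score` on made-up values. It also shows that when every error is exactly 0 or 1 (for example, when 0/1 class labels are scored against a 0/1 target), |e| = e², so MAE and MSE coincide.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical binary targets and continuous regression outputs
y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.8, 0.4, 0.3])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
# R2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae, mean_absolute_error(y_true, y_pred))
print(mse, mean_squared_error(y_true, y_pred))
print(r2, r2_score(y_true, y_pred))

# With 0/1 predictions every error is 0 or 1, so |e| == e**2 and MAE == MSE
hard_pred = np.array([0.0, 1.0, 0.0, 1.0])
print(mean_absolute_error(y_true, hard_pred), mean_squared_error(y_true, hard_pred))
```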

In [52]:
dict_LR = {'MAE': LinearRegression_MAE, 'MSE' : LinearRegression_MSE, 'R2': LinearRegression_R2}
Report = pd.DataFrame(dict_LR, index = ["Linear Regression"])
Report

|                   | MAE      | MSE      | R2        |
|-------------------|----------|----------|-----------|
| Linear Regression | 0.277863 | 0.277863 | -0.384778 |


### KNN


In [27]:
# Training and Predictions
KNN = KNeighborsClassifier(n_neighbors = 4).fit(x_train,y_train)
predictions = KNN.predict(x_test)

# Evaluation Metrics
KNN_Accuracy_Score = accuracy_score(y_test, predictions)
KNN_JaccardIndex = jaccard_score(y_test, predictions)
KNN_F1_Score = f1_score(y_test, predictions)
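The value `n_neighbors = 4` above is fixed by hand. `GridSearchCV`, which the notebook imports but does not use, can choose it by cross-validation on the training set instead. The sketch below is an illustration under assumptions: it runs on synthetic data from `make_classification` as a stand-in for `x_train`/`y_train`, scores candidates by F1 (one of this notebook's metrics), and searches the arbitrary range k = 1 to 15.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the weather features and the RainTomorrow target
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

# 5-fold cross-validated search over k, scored by F1 on the positive class
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 16))},
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)

best_k = search.best_params_["n_neighbors"]
# The best model is refit on the full training set and scored with F1 on held-out data
test_f1 = search.score(X_test, y_test)
print(best_k, test_f1)
```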

### Decision Tree


In [60]:
# Training and Predictions
Tree = DecisionTreeClassifier(criterion="entropy", max_depth = 4)
Tree.fit(x_train, y_train)
predictions = Tree.predict(x_test)

# Evaluation Metrics
Tree_Accuracy_Score = accuracy_score(y_test, predictions)
Tree_JaccardIndex = jaccard_score(y_test, predictions)
Tree_F1_Score = f1_score(y_test, predictions)
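With `criterion="entropy"`, the tree prefers splits with the largest information gain: the parent node's label entropy minus the size-weighted entropy of its children. The plain NumPy sketch below illustrates that calculation on hypothetical labels; it is not scikit-learn's internal code. `max_depth = 4` then caps how many such splits can be stacked along any path.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - children

parent = np.array([0, 0, 0, 1, 1, 1])

# A perfect split separates the classes completely
gain_perfect = information_gain(parent, parent[:3], parent[3:])
# A poor split leaves both children mixed
gain_poor = information_gain(parent, np.array([0, 1, 0]), np.array([0, 1, 1]))
print(gain_perfect, gain_poor)
```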

### Logistic Regression


In [61]:
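# Data split. Note: random_state=1 differs from the earlier split (random_state=10), so
# Logistic Regression and SVM are evaluated on a different test set than KNN and the Decision Tree.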
x_train, x_test, y_train, y_test = train_test_split(features, Y, test_size = 0.2, random_state=1)

# Training and predictions
LR = LogisticRegression(C=0.01, solver='liblinear').fit(x_train,y_train)
predictions = LR.predict(x_test)

# Evaluation Metrics
LR_Accuracy_Score = accuracy_score(y_test, predictions)
LR_JaccardIndex = jaccard_score(y_test, predictions)
LR_F1_Score = f1_score(y_test, predictions)
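# Note: log_loss receives hard 0/1 labels here; log loss is normally computed on
# predicted probabilities, e.g. log_loss(y_test, LR.predict_proba(x_test))
LR_Log_Loss = log_loss(y_test, predictions)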
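Log loss is defined on predicted probabilities: for a binary target it is the mean of -[y·log(p) + (1 - y)·log(1 - p)], where p is the predicted probability of class 1 (for the model above, `LR.predict_proba(x_test)[:, 1]`). Passing hard 0/1 labels, as the cell above does, assigns probability 0 to the true class of every misclassified sample, which produces a very large penalty. The sketch below compares the two on made-up probabilities.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical targets and predicted probabilities of class 1
y_true = np.array([0, 1, 1, 0])
proba = np.array([0.2, 0.7, 0.4, 0.1])

# Binary cross-entropy by hand, compared with scikit-learn
manual = -np.mean(y_true * np.log(proba) + (1 - y_true) * np.log(1 - proba))
prob_loss = log_loss(y_true, proba)

# Thresholding at 0.5 first throws away the confidence information:
# the misclassified sample (true 1, predicted 0) now gets probability 0 for its true class.
hard = (proba >= 0.5).astype(float)
hard_loss = log_loss(y_true, hard)

print(manual, prob_loss, hard_loss)
```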

### SVM


In [70]:
# Training and Predictions
SVM = svm.SVC(kernel='rbf')
SVM.fit(x_train, y_train)
predictions = SVM.predict(x_test)

# Evaluation Metrics
SVM_Accuracy_Score = accuracy_score(y_test, predictions)
SVM_JaccardIndex = jaccard_score(y_test, predictions)
SVM_F1_Score = f1_score(y_test, predictions)
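In the report at the end of this notebook, the SVM's Jaccard index and F1-score are both 0, meaning it produced no true positives. One plausible cause, not verified here, is that the RBF kernel depends on Euclidean distances while the features are unscaled: pressures are around 1000 while the dummy columns are 0 or 1. A common remedy is to standardise features inside a pipeline with `StandardScaler` from `sklearn.preprocessing`. The sketch below shows that pattern on synthetic data in which one feature is deliberately inflated by a factor of 1000; both the data and the factor are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data; feature 0 gets a much larger scale than the others
X, y = make_classification(n_samples=400, n_features=10, random_state=1)
X[:, 0] *= 1000
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Same RBF SVM with and without standardisation
unscaled = SVC(kernel="rbf").fit(X_train, y_train)
# The scaler is fitted on the training data only, then reused for the test data
scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

f1_unscaled = f1_score(y_test, unscaled.predict(X_test), zero_division=0)
f1_scaled = f1_score(y_test, scaled.predict(X_test), zero_division=0)
print(f1_unscaled, f1_scaled)
```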


### Report


In [71]:
# Collect the scores; LogLoss was only computed for Logistic Regression
report_dict = {"Accuracy": [KNN_Accuracy_Score, Tree_Accuracy_Score, LR_Accuracy_Score, SVM_Accuracy_Score],
               "Jaccard Index": [KNN_JaccardIndex, Tree_JaccardIndex, LR_JaccardIndex, SVM_JaccardIndex],
               "F1-Score": [KNN_F1_Score, Tree_F1_Score, LR_F1_Score, SVM_F1_Score],
               "LogLoss": ["-", "-", LR_Log_Loss, "-"]}

Report = pd.DataFrame(report_dict, index=["K-Nearest Neighbours", "Decision Tree", "Logistic Regression", "Support Vector Machine"])
Report

|                        | Accuracy | Jaccard Index | F1-Score | LogLoss  |
|------------------------|----------|---------------|----------|----------|
| K-Nearest Neighbours   | 0.818321 | 0.425121      | 0.59661  | -        |
| Decision Tree          | 0.818321 | 0.480349      | 0.648968 | -        |
| Logistic Regression    | 0.827481 | 0.484018      | 0.652308 | 6.218218 |
| Support Vector Machine | 0.719084 | 0.0           | 0.0      | -        |
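For a single positive class, the Jaccard index and F1-score are not independent. With J = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN), it follows that J = F1 / (2 - F1), so the Jaccard column is a monotone function of the F1 column and the two always rank the models identically. The short check below applies this identity to the values reported in the table.

```python
# F1-scores and Jaccard indices copied from the report table above
reported = {
    "K-Nearest Neighbours": (0.59661, 0.425121),
    "Decision Tree": (0.648968, 0.480349),
    "Logistic Regression": (0.652308, 0.484018),
    "Support Vector Machine": (0.0, 0.0),
}

for model, (f1, jaccard) in reported.items():
    derived = f1 / (2 - f1)  # Jaccard index implied by the F1-score
    print(f"{model}: reported J = {jaccard:.6f}, derived J = {derived:.6f}")
```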
