<h1>Python Project For Rainfall Prediction</h1>
This Project was a part of my final evaluation of a Coursera Course.

This dataset contains observations of weather metrics for each day from 2008 to 2017. The **weatherAUS.csv** dataset includes the following fields:

| Field         | Description                                           | Unit            | Type   |
| ------------- | ----------------------------------------------------- | --------------- | ------ |
| Date          | Date of the Observation in YYYY-MM-DD                 | Date            | object |
| Location      | Location of the Observation                           | Location        | object |
| MinTemp       | Minimum temperature                                   | Celsius         | float  |
| MaxTemp       | Maximum temperature                                   | Celsius         | float  |
| Rainfall      | Amount of rainfall                                    | Millimeters     | float  |
| Evaporation   | Amount of evaporation                                 | Millimeters     | float  |
| Sunshine      | Amount of bright sunshine                             | hours           | float  |
| WindGustDir   | Direction of the strongest gust                       | Compass Points  | object |
| WindGustSpeed | Speed of the strongest gust                           | Kilometers/Hour | object |
| WindDir9am    | Wind direction averaged of 10 minutes prior to 9am    | Compass Points  | object |
| WindDir3pm    | Wind direction averaged of 10 minutes prior to 3pm    | Compass Points  | object |
| WindSpeed9am  | Wind speed averaged of 10 minutes prior to 9am        | Kilometers/Hour | float  |
| WindSpeed3pm  | Wind speed averaged of 10 minutes prior to 3pm        | Kilometers/Hour | float  |
| Humidity9am   | Humidity at 9am                                       | Percent         | float  |
| Humidity3pm   | Humidity at 3pm                                       | Percent         | float  |
| Pressure9am   | Atmospheric pressure reduced to mean sea level at 9am | Hectopascal     | float  |
| Pressure3pm   | Atmospheric pressure reduced to mean sea level at 3pm | Hectopascal     | float  |
| Cloud9am      | Fraction of the sky obscured by cloud at 9am          | Eights          | float  |
| Cloud3pm      | Fraction of the sky obscured by cloud at 3pm          | Eights          | float  |
| Temp9am       | Temperature at 9am                                    | Celsius         | float  |
| Temp3pm       | Temperature at 3pm                                    | Celsius         | float  |
| RainToday     | If there was rain today                               | Yes/No          | object |
| RISK_MM       | Amount of rain tomorrow                               | Millimeters     | float  |
| RainTomorrow  | If there is rain tomorrow                             | Yes/No          | float  |

Column definitions were gathered from [http://www.bom.gov.au/climate/dwo/IDCJDW0000.shtml](http://www.bom.gov.au/climate/dwo/IDCJDW0000.shtml?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkML0101ENSkillsNetwork20718538-2022-01-01)



## **Import the required libraries**


In [1]:
# Surpress warnings:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn

In [2]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import jaccard_score
from sklearn.metrics import f1_score
from sklearn.metrics import log_loss
from sklearn.metrics import accuracy_score
import sklearn.metrics as metrics

### Importing the Dataset


In [3]:
df = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillUp/labs/ML-FinalAssignment/Weather_Data.csv')

df.head()

Unnamed: 0,Date,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,WindDir3pm,...,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
0,2/1/2008,19.5,22.4,15.6,6.2,0.0,W,41,S,SSW,...,92,84,1017.6,1017.4,8,8,20.7,20.9,Yes,Yes
1,2/2/2008,19.5,25.6,6.0,3.4,2.7,W,41,W,E,...,83,73,1017.9,1016.4,7,7,22.4,24.8,Yes,Yes
2,2/3/2008,21.6,24.5,6.6,2.4,0.1,W,41,ESE,ESE,...,88,86,1016.7,1015.6,7,8,23.5,23.0,Yes,Yes
3,2/4/2008,20.2,22.8,18.8,2.2,0.0,W,41,NNE,E,...,83,90,1014.2,1011.8,8,8,21.4,20.9,Yes,Yes
4,2/5/2008,19.7,25.7,77.4,4.8,0.0,W,41,NNE,W,...,88,74,1008.3,1004.8,8,8,22.5,25.5,Yes,Yes


### Data Preprocessing


#### Transforming Categorical Variables


In [4]:
df_sydney_processed = pd.get_dummies(data=df, columns=['RainToday', 'WindGustDir', 'WindDir9am', 'WindDir3pm'])

In [5]:
df_sydney_processed.replace(['No', 'Yes'], [0,1], inplace=True)

### Training Data and Test Data


Now, we set our 'features' or x values and our Y or target variable.


In [6]:
df_sydney_processed.drop('Date',axis=1,inplace=True)


In [7]:
df_sydney_processed = df_sydney_processed.astype(float)
df_sydney_processed.dropna(inplace=True)

In [8]:
features = df_sydney_processed.drop(columns='RainTomorrow', axis=1)
Y = df_sydney_processed['RainTomorrow']
print(features,Y)

      MinTemp  MaxTemp  Rainfall  Evaporation  Sunshine  WindGustSpeed   
0        19.5     22.4      15.6          6.2       0.0           41.0  \
1        19.5     25.6       6.0          3.4       2.7           41.0   
2        21.6     24.5       6.6          2.4       0.1           41.0   
3        20.2     22.8      18.8          2.2       0.0           41.0   
4        19.7     25.7      77.4          4.8       0.0           41.0   
...       ...      ...       ...          ...       ...            ...   
3266      8.6     19.6       0.0          2.0       7.8           37.0   
3267      9.3     19.2       0.0          2.0       9.2           30.0   
3268      9.4     17.7       0.0          2.4       2.7           24.0   
3269     10.1     19.3       0.0          1.4       9.3           43.0   
3270      7.6     19.3       0.0          3.4       9.4           35.0   

      WindSpeed9am  WindSpeed3pm  Humidity9am  Humidity3pm  ...   
0             17.0          20.0         92.

### Linear Regression


Using the `train_test_split` function to split the `features` and `Y` dataframes with a `test_size` of `0.2` and the `random_state` set to `20`.


In [9]:
x_train, x_test, y_train, y_test = train_test_split(features,Y,test_size=0.2,random_state=20)
print(x_train, x_test, y_train, y_test)

      MinTemp  MaxTemp  Rainfall  Evaporation  Sunshine  WindGustSpeed   
2058     22.1     26.3       0.2          6.0       4.3           33.0  \
1901     17.0     21.2       0.0          4.6       1.7           41.0   
2261      8.9     16.7       5.0          4.2       7.7           76.0   
1063     18.0     23.0       0.2          6.8       0.0           35.0   
768      17.0     26.5       0.0          6.6      10.2           41.0   
...       ...      ...       ...          ...       ...            ...   
1428     17.5     24.0       0.0          6.4      10.9           30.0   
2441     21.1     24.5       0.0          7.2       0.9           35.0   
2972      8.4     19.6       0.0          4.0       6.1           22.0   
271      16.9     35.0       0.2          3.2       5.0           41.0   
2522     11.1     18.3       0.2          2.4       1.1           28.0   

      WindSpeed9am  WindSpeed3pm  Humidity9am  Humidity3pm  ...   
2058          19.0          22.0         78.

Creating and training a Linear Regression model called LinearReg using the training data (`x_train`, `y_train`)


In [10]:
LinearReg = LinearRegression()
LinearReg.fit(x_train, y_train)


Now using the `predict` method on the testing data (`x_test`) and saving it to the array `predictions`


In [11]:
predictions = LinearReg.predict(x_test)

Using the `predictions` and the `y_test` dataframe calculate the value for each metrics.


In [12]:
R1=[]
LinearRegression_MAE = metrics.mean_absolute_error(y_test,predictions)
LinearRegression_MSE = 	metrics.mean_squared_error(y_test,predictions)
LinearRegression_R2 = 	metrics.r2_score(y_test,predictions)
R1.append([LinearRegression_MAE,LinearRegression_MSE,LinearRegression_R2])

Tabulating metrics using data frame for the linear model.


In [13]:
df = pd.DataFrame(R1,columns=['MAE','MSE',"R2"],index=['LinearReg'])
print(df)

                MAE       MSE        R2
LinearReg  0.268893  0.126689  0.335673


### KNN


Creating and training a KNN model using the training data (`x_train`, `y_train`) with the `n_neighbors` parameter set to `5`.


In [14]:
KNN = KNeighborsClassifier(n_neighbors = 5)
KNN.fit(x_train, y_train)

Predicting the results for testing data (`x_test`) and save it.

In [15]:
prediction = KNN.predict(x_test)

Analyzing the Metrics for the KNN model.


In [16]:
k1=[]
KNN_Accuracy_Score = accuracy_score(y_test,prediction)
KNN_JaccardIndex = jaccard_score(y_test,prediction)
KNN_F1_Score = f1_score(y_test,prediction)
k1.append([KNN_Accuracy_Score,KNN_JaccardIndex,KNN_F1_Score])

### Decision Tree


Decision Tree model called Tree using the training data (`x_train`, `y_train`).


In [17]:
tree = DecisionTreeClassifier()
tree.fit(x_train,y_train)

In [18]:
predictions = tree.predict(x_test)

In [19]:
Tree_Accuracy_Score = accuracy_score(y_test,prediction)
Tree_JaccardIndex = jaccard_score(y_test,prediction)
Tree_F1_Score = f1_score(y_test,prediction)
k1.append([Tree_Accuracy_Score,Tree_JaccardIndex,Tree_F1_Score])

### Logistic Regression


Using the `train_test_split` function to split the `features` and `Y` dataframes with a `test_size` of `0.2` and the `random_state` set to `11`.


In [20]:
x_train, x_test, y_train, y_test = train_test_split(features,Y,test_size=0.2,random_state=11)
print(x_train, x_test, y_train, y_test)

      MinTemp  MaxTemp  Rainfall  Evaporation  Sunshine  WindGustSpeed   
2771     19.1     27.4       0.0          7.2      10.1           30.0  \
1903     20.1     31.6       0.0          7.4       9.7           74.0   
2169     10.5     20.3       0.0          1.8       9.9           39.0   
1859      8.3     18.8       0.0          3.6       6.8           26.0   
579      14.7     21.2       0.0          6.4      10.2           41.0   
...       ...      ...       ...          ...       ...            ...   
332      19.2     31.5       0.0          7.4      13.1           41.0   
1293     18.7     19.5       0.8         11.8       0.3           59.0   
3163     17.9     25.1       1.0          5.4       2.7           43.0   
1104     18.8     23.5      99.4          4.8       3.3           59.0   
1945     19.5     27.0       0.0          9.4       0.6           50.0   

      WindSpeed9am  WindSpeed3pm  Humidity9am  Humidity3pm  ...   
2771           9.0          17.0         76.

LogisticRegression model called LR using the training data (`x_train`, `y_train`) with the `solver` parameter set to `liblinear`.


In [21]:
LR = LogisticRegression(solver='liblinear')
LR.fit(x_train, y_train)

In [22]:
predictions = LR.predict(x_test)

In [23]:
LR_Accuracy_Score = accuracy_score(y_test,predictions)
LR_JaccardIndex = jaccard_score(y_test,prediction)
LR_F1_Score =f1_score(y_test,prediction)
LR_Log_Loss = log_loss(y_test,prediction)
k1.append([LR_Accuracy_Score,LR_JaccardIndex,LR_F1_Score,LR_Log_Loss])


### SVM


Create and train a SVM model called SVM using the training data (`x_train`, `y_train`).


In [24]:
from sklearn import svm
s1=svm.LinearSVC()
s1.fit(x_train,y_train)

In [25]:
predictions = s1.predict(x_test)

In [26]:
SVM_Accuracy_Score = accuracy_score(y_test,predictions)
SVM_JaccardIndex = jaccard_score(y_test,prediction)
SVM_F1_Score =f1_score(y_test,prediction)
k1.append([SVM_Accuracy_Score,SVM_JaccardIndex,SVM_F1_Score])

### Report


Metrics are depicted in a tabular format using data frame for all of the above models.



In [27]:

report = pd.DataFrame(k1,columns=['Accuracy Score',"Jaccard Index","F1 Score", "Log Loss"],index=['KNN','Desition Tree','Logistic Regression','SVM'])
print(report)

                     Accuracy Score  Jaccard Index  F1 Score   Log Loss
KNN                        0.819847       0.415842  0.587413        NaN
Desition Tree              0.819847       0.415842  0.587413        NaN
Logistic Regression        0.848855       0.136546  0.240283  11.831123
SVM                        0.848855       0.136546  0.240283        NaN


Defining a Deep learning Model for Predicting Rainfall.

In [34]:
from keras import regularizers
from keras.optimizers import Adam
from keras.layers import Dropout
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

#defining the ann model
model=Sequential()
model.add(Dense(128,input_dim=66,kernel_initializer='normal',kernel_regularizer=regularizers.l2(.001),activation='relu'))
#adding neural layers
model.add(Dropout(0.25))
#diconnecting non essential connections of neurons
model.add(Dense(64,kernel_initializer='normal',kernel_regularizer=regularizers.l2(.001),activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(32,kernel_initializer='normal',kernel_regularizer=regularizers.l1_l2(.001),activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(16,kernel_initializer='normal',kernel_regularizer=regularizers.l2(.001),activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(4,kernel_initializer='normal',activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_14 (Dense)            (None, 128)               8576      
                                                                 
 dropout_12 (Dropout)        (None, 128)               0         
                                                                 
 dense_15 (Dense)            (None, 64)                8256      
                                                                 
 dropout_13 (Dropout)        (None, 64)                0         
                                                                 
 dense_16 (Dense)            (None, 32)                2080      
                                                                 
 dropout_14 (Dropout)        (None, 32)                0         
                                                                 
 dense_17 (Dense)            (None, 16)               

Fitting the model to the `train data`.

In [35]:
history=model.fit(x_train,y_train,validation_data=(x_test,y_test),epochs=150,batch_size=40)

Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 12/150
Epoch 13/150
Epoch 14/150
Epoch 15/150
Epoch 16/150
Epoch 17/150
Epoch 18/150
Epoch 19/150
Epoch 20/150
Epoch 21/150
Epoch 22/150
Epoch 23/150
Epoch 24/150
Epoch 25/150
Epoch 26/150
Epoch 27/150
Epoch 28/150
Epoch 29/150
Epoch 30/150
Epoch 31/150
Epoch 32/150
Epoch 33/150
Epoch 34/150
Epoch 35/150
Epoch 36/150
Epoch 37/150
Epoch 38/150
Epoch 39/150
Epoch 40/150
Epoch 41/150
Epoch 42/150
Epoch 43/150
Epoch 44/150
Epoch 45/150
Epoch 46/150
Epoch 47/150
Epoch 48/150
Epoch 49/150
Epoch 50/150
Epoch 51/150
Epoch 52/150
Epoch 53/150
Epoch 54/150
Epoch 55/150
Epoch 56/150
Epoch 57/150
Epoch 58/150
Epoch 59/150
Epoch 60/150
Epoch 61/150
Epoch 62/150
Epoch 63/150
Epoch 64/150
Epoch 65/150
Epoch 66/150
Epoch 67/150
Epoch 68/150
Epoch 69/150
Epoch 70/150
Epoch 71/150
Epoch 72/150
Epoch 73/150
Epoch 74/150
Epoch 75/150
Epoch 76/150
Epoch 77/150
Epoch 78

In [30]:
y_pred=(model.predict(x_test)>.5).astype('int32')



Assessing the Model on various Parameters.

In [31]:
from sklearn.metrics import confusion_matrix,precision_score,matthews_corrcoef,jaccard_score
from sklearn.metrics import f1_score
cm1=confusion_matrix(y_test,y_pred)
print('Confusion Matrix :\n',cm1)
total1=sum(sum(cm1))
accuracy1=(cm1[0,0]+cm1[1,1])/total1
sensitivity1=cm1[0,0]/(cm1[0,0]+cm1[0,1])
specificity1=cm1[1,1]/(cm1[1,0]+cm1[1,1])
jaccard=jaccard_score(y_test,y_pred)
mcc=matthews_corrcoef(y_test,y_pred)
f1_score=f1_score(y_test,y_pred)
prece=precision_score(y_test,y_pred)
print("Accuracy : " + str(accuracy1 ))
print("Sensitivity : "+str(sensitivity1))
print("Specificity : "+str(specificity1))
print("F1_Score : "+ str(f1_score))
print("Precision : " + str(prece))
print("Jaccard Score : " + str(jaccard))
print("Matthew's Corelation Coeff : " + str(mcc))

Confusion Matrix :
 [[490   0]
 [165   0]]
Accuracy : 0.7480916030534351
Sensitivity : 1.0
Specificity : 0.0
F1_Score : 0.0
Precision : 0.0
Jaccard Score : 0.0
Matthew's Corelation Coeff : 0.0
