In this exercise, we will try to use a neural network on a typical prediction task: predicting whether tomorrow will be a rainy day.

The dataset is in `weatherAUS.csv` available [here on kaggle](https://www.kaggle.com/datasets/gauravduttakiit/weather-in-aus). The target value is the column `'RainTomorrow'`.

In [1]:
import pandas as pd
import tensorflow as tf
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras import Sequential
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.metrics import Precision, Recall, AUC
from tensorflow.keras.optimizers import Adam
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score

## 1. Data exploration

In [2]:
weather_df = pd.read_csv("weatherAUS.csv", sep=',')
weather_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 142193 entries, 0 to 142192
Data columns (total 23 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           142193 non-null  object 
 1   Location       142193 non-null  object 
 2   MinTemp        141556 non-null  float64
 3   MaxTemp        141871 non-null  float64
 4   Rainfall       140787 non-null  float64
 5   Evaporation    81350 non-null   float64
 6   Sunshine       74377 non-null   float64
 7   WindGustDir    132863 non-null  object 
 8   WindGustSpeed  132923 non-null  float64
 9   WindDir9am     132180 non-null  object 
 10  WindDir3pm     138415 non-null  object 
 11  WindSpeed9am   140845 non-null  float64
 12  WindSpeed3pm   139563 non-null  float64
 13  Humidity9am    140419 non-null  float64
 14  Humidity3pm    138583 non-null  float64
 15  Pressure9am    128179 non-null  float64
 16  Pressure3pm    128212 non-null  float64
 17  Cloud9am       88536 non-null

In [3]:
weather_df.head(15)

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,...,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
0,2008-12-01,Albury,13.4,22.9,0.6,,,W,44.0,W,...,71.0,22.0,1007.7,1007.1,8.0,,16.9,21.8,No,No
1,2008-12-02,Albury,7.4,25.1,0.0,,,WNW,44.0,NNW,...,44.0,25.0,1010.6,1007.8,,,17.2,24.3,No,No
2,2008-12-03,Albury,12.9,25.7,0.0,,,WSW,46.0,W,...,38.0,30.0,1007.6,1008.7,,2.0,21.0,23.2,No,No
3,2008-12-04,Albury,9.2,28.0,0.0,,,NE,24.0,SE,...,45.0,16.0,1017.6,1012.8,,,18.1,26.5,No,No
4,2008-12-05,Albury,17.5,32.3,1.0,,,W,41.0,ENE,...,82.0,33.0,1010.8,1006.0,7.0,8.0,17.8,29.7,No,No
5,2008-12-06,Albury,14.6,29.7,0.2,,,WNW,56.0,W,...,55.0,23.0,1009.2,1005.4,,,20.6,28.9,No,No
6,2008-12-07,Albury,14.3,25.0,0.0,,,W,50.0,SW,...,49.0,19.0,1009.6,1008.2,1.0,,18.1,24.6,No,No
7,2008-12-08,Albury,7.7,26.7,0.0,,,W,35.0,SSE,...,48.0,19.0,1013.4,1010.1,,,16.3,25.5,No,No
8,2008-12-09,Albury,9.7,31.9,0.0,,,NNW,80.0,SE,...,42.0,9.0,1008.9,1003.6,,,18.3,30.2,No,Yes
9,2008-12-10,Albury,13.1,30.1,1.4,,,W,28.0,S,...,58.0,27.0,1007.0,1005.7,,,20.1,28.2,Yes,No


In [4]:
weather_df['Date'] = pd.to_datetime(weather_df['Date']).dt.date

In [5]:
weather_df['Date'].sample(10)

78441     2016-08-14
133460    2010-06-24
49274     2013-05-08
48126     2009-11-24
113543    2014-02-15
132876    2017-06-08
52233     2013-07-12
129721    2017-02-05
105777    2016-12-07
141120    2014-07-07
Name: Date, dtype: object

In [6]:
weather_df['Location'][0]

'Albury'

In [7]:
weather_df['Location'].unique()

array(['Albury', 'BadgerysCreek', 'Cobar', 'CoffsHarbour', 'Moree',
       'Newcastle', 'NorahHead', 'NorfolkIsland', 'Penrith', 'Richmond',
       'Sydney', 'SydneyAirport', 'WaggaWagga', 'Williamtown',
       'Wollongong', 'Canberra', 'Tuggeranong', 'MountGinini', 'Ballarat',
       'Bendigo', 'Sale', 'MelbourneAirport', 'Melbourne', 'Mildura',
       'Nhil', 'Portland', 'Watsonia', 'Dartmoor', 'Brisbane', 'Cairns',
       'GoldCoast', 'Townsville', 'Adelaide', 'MountGambier', 'Nuriootpa',
       'Woomera', 'Albany', 'Witchcliffe', 'PearceRAAF', 'PerthAirport',
       'Perth', 'SalmonGums', 'Walpole', 'Hobart', 'Launceston',
       'AliceSprings', 'Darwin', 'Katherine', 'Uluru'], dtype=object)

In [8]:
df_copy = weather_df.copy()
df_copy.sample(10)

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,...,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
117182,2016-01-18,PerthAirport,19.5,23.9,0.0,8.0,2.9,WNW,35.0,WNW,...,76.0,55.0,1006.2,1004.7,8.0,7.0,20.6,22.9,No,No
29563,2008-04-07,Sydney,15.3,22.0,12.0,3.2,4.7,,,SSW,...,83.0,70.0,1026.3,1024.2,5.0,6.0,18.2,19.8,Yes,Yes
46477,2013-12-04,Canberra,14.9,32.4,0.0,,,NW,74.0,NNW,...,29.0,18.0,1005.5,998.6,,2.0,24.2,31.3,No,Yes
81690,2017-06-22,Dartmoor,7.5,15.1,0.0,,,NW,26.0,ENE,...,97.0,72.0,1026.8,1022.9,,,8.9,14.8,No,No
128838,2014-09-03,Hobart,3.2,15.9,1.6,1.8,7.2,SW,37.0,NNW,...,59.0,71.0,1020.9,1018.6,3.0,3.0,10.4,12.4,Yes,Yes
99947,2017-05-26,MountGambier,3.9,15.6,4.6,,,NNE,30.0,NNE,...,99.0,66.0,1022.7,1018.8,5.0,3.0,9.4,14.1,Yes,No
105722,2016-10-13,Woomera,9.4,24.3,0.0,6.8,,E,30.0,SE,...,63.0,23.0,1024.2,1020.7,1.0,,13.8,23.0,No,No
2629,2016-06-04,Albury,9.8,14.9,11.6,,,SE,30.0,SE,...,95.0,92.0,1015.2,1008.3,8.0,8.0,11.6,13.5,Yes,Yes
94720,2010-12-09,Adelaide,16.1,24.5,0.0,4.4,11.1,NW,46.0,WSW,...,72.0,41.0,1012.0,1011.2,,,19.1,23.7,No,No
59750,2009-01-15,Sale,15.1,25.7,0.0,10.0,7.3,W,54.0,WSW,...,65.0,36.0,1013.0,1012.7,7.0,7.0,16.8,23.1,No,No


In [9]:
df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 142193 entries, 0 to 142192
Data columns (total 23 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           142193 non-null  object 
 1   Location       142193 non-null  object 
 2   MinTemp        141556 non-null  float64
 3   MaxTemp        141871 non-null  float64
 4   Rainfall       140787 non-null  float64
 5   Evaporation    81350 non-null   float64
 6   Sunshine       74377 non-null   float64
 7   WindGustDir    132863 non-null  object 
 8   WindGustSpeed  132923 non-null  float64
 9   WindDir9am     132180 non-null  object 
 10  WindDir3pm     138415 non-null  object 
 11  WindSpeed9am   140845 non-null  float64
 12  WindSpeed3pm   139563 non-null  float64
 13  Humidity9am    140419 non-null  float64
 14  Humidity3pm    138583 non-null  float64
 15  Pressure9am    128179 non-null  float64
 16  Pressure3pm    128212 non-null  float64
 17  Cloud9am       88536 non-null

In [10]:
locations = df_copy['Location'].unique()
locations

array(['Albury', 'BadgerysCreek', 'Cobar', 'CoffsHarbour', 'Moree',
       'Newcastle', 'NorahHead', 'NorfolkIsland', 'Penrith', 'Richmond',
       'Sydney', 'SydneyAirport', 'WaggaWagga', 'Williamtown',
       'Wollongong', 'Canberra', 'Tuggeranong', 'MountGinini', 'Ballarat',
       'Bendigo', 'Sale', 'MelbourneAirport', 'Melbourne', 'Mildura',
       'Nhil', 'Portland', 'Watsonia', 'Dartmoor', 'Brisbane', 'Cairns',
       'GoldCoast', 'Townsville', 'Adelaide', 'MountGambier', 'Nuriootpa',
       'Woomera', 'Albany', 'Witchcliffe', 'PearceRAAF', 'PerthAirport',
       'Perth', 'SalmonGums', 'Walpole', 'Hobart', 'Launceston',
       'AliceSprings', 'Darwin', 'Katherine', 'Uluru'], dtype=object)

In [11]:
for location in locations:
    location_filter = df_copy[df_copy['Location']==location]
    df_copy['MinTemp'] = df_copy['MinTemp'].fillna(np.mean(location_filter['MinTemp']))

In [12]:
df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 142193 entries, 0 to 142192
Data columns (total 23 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           142193 non-null  object 
 1   Location       142193 non-null  object 
 2   MinTemp        142193 non-null  float64
 3   MaxTemp        141871 non-null  float64
 4   Rainfall       140787 non-null  float64
 5   Evaporation    81350 non-null   float64
 6   Sunshine       74377 non-null   float64
 7   WindGustDir    132863 non-null  object 
 8   WindGustSpeed  132923 non-null  float64
 9   WindDir9am     132180 non-null  object 
 10  WindDir3pm     138415 non-null  object 
 11  WindSpeed9am   140845 non-null  float64
 12  WindSpeed3pm   139563 non-null  float64
 13  Humidity9am    140419 non-null  float64
 14  Humidity3pm    138583 non-null  float64
 15  Pressure9am    128179 non-null  float64
 16  Pressure3pm    128212 non-null  float64
 17  Cloud9am       88536 non-null

In [13]:
for location in locations:
    location_filter = df_copy[df_copy['Location']==location]
    df_copy['MaxTemp'] = df_copy['MaxTemp'].fillna(np.mean(location_filter['MaxTemp']))

In [14]:
df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 142193 entries, 0 to 142192
Data columns (total 23 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           142193 non-null  object 
 1   Location       142193 non-null  object 
 2   MinTemp        142193 non-null  float64
 3   MaxTemp        142193 non-null  float64
 4   Rainfall       140787 non-null  float64
 5   Evaporation    81350 non-null   float64
 6   Sunshine       74377 non-null   float64
 7   WindGustDir    132863 non-null  object 
 8   WindGustSpeed  132923 non-null  float64
 9   WindDir9am     132180 non-null  object 
 10  WindDir3pm     138415 non-null  object 
 11  WindSpeed9am   140845 non-null  float64
 12  WindSpeed3pm   139563 non-null  float64
 13  Humidity9am    140419 non-null  float64
 14  Humidity3pm    138583 non-null  float64
 15  Pressure9am    128179 non-null  float64
 16  Pressure3pm    128212 non-null  float64
 17  Cloud9am       88536 non-null

In [15]:
weather_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 142193 entries, 0 to 142192
Data columns (total 23 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           142193 non-null  object 
 1   Location       142193 non-null  object 
 2   MinTemp        141556 non-null  float64
 3   MaxTemp        141871 non-null  float64
 4   Rainfall       140787 non-null  float64
 5   Evaporation    81350 non-null   float64
 6   Sunshine       74377 non-null   float64
 7   WindGustDir    132863 non-null  object 
 8   WindGustSpeed  132923 non-null  float64
 9   WindDir9am     132180 non-null  object 
 10  WindDir3pm     138415 non-null  object 
 11  WindSpeed9am   140845 non-null  float64
 12  WindSpeed3pm   139563 non-null  float64
 13  Humidity9am    140419 non-null  float64
 14  Humidity3pm    138583 non-null  float64
 15  Pressure9am    128179 non-null  float64
 16  Pressure3pm    128212 non-null  float64
 17  Cloud9am       88536 non-null

In [16]:
weather_df.columns

Index(['Date', 'Location', 'MinTemp', 'MaxTemp', 'Rainfall', 'Evaporation',
       'Sunshine', 'WindGustDir', 'WindGustSpeed', 'WindDir9am', 'WindDir3pm',
       'WindSpeed9am', 'WindSpeed3pm', 'Humidity9am', 'Humidity3pm',
       'Pressure9am', 'Pressure3pm', 'Cloud9am', 'Cloud3pm', 'Temp9am',
       'Temp3pm', 'RainToday', 'RainTomorrow'],
      dtype='object')

In [17]:
numerical_cols = weather_df.select_dtypes(include=['number']).columns
numerical_cols

Index(['MinTemp', 'MaxTemp', 'Rainfall', 'Evaporation', 'Sunshine',
       'WindGustSpeed', 'WindSpeed9am', 'WindSpeed3pm', 'Humidity9am',
       'Humidity3pm', 'Pressure9am', 'Pressure3pm', 'Cloud9am', 'Cloud3pm',
       'Temp9am', 'Temp3pm'],
      dtype='object')

In [18]:
categorical_cols = weather_df.select_dtypes(include=['object', 'category']).columns
categorical_cols

Index(['Date', 'Location', 'WindGustDir', 'WindDir9am', 'WindDir3pm',
       'RainToday', 'RainTomorrow'],
      dtype='object')

In [19]:
for col in numerical_cols:
    for location in locations:
        location_filter = df_copy[df_copy['Location']==location]
        df_copy[col] = df_copy[col].fillna(np.mean(location_filter[col]))

In [20]:
df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 142193 entries, 0 to 142192
Data columns (total 23 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           142193 non-null  object 
 1   Location       142193 non-null  object 
 2   MinTemp        142193 non-null  float64
 3   MaxTemp        142193 non-null  float64
 4   Rainfall       142193 non-null  float64
 5   Evaporation    142193 non-null  float64
 6   Sunshine       142193 non-null  float64
 7   WindGustDir    132863 non-null  object 
 8   WindGustSpeed  142193 non-null  float64
 9   WindDir9am     132180 non-null  object 
 10  WindDir3pm     138415 non-null  object 
 11  WindSpeed9am   142193 non-null  float64
 12  WindSpeed3pm   142193 non-null  float64
 13  Humidity9am    142193 non-null  float64
 14  Humidity3pm    142193 non-null  float64
 15  Pressure9am    142193 non-null  float64
 16  Pressure3pm    142193 non-null  float64
 17  Cloud9am       142193 non-nul

In [21]:
cols_deleted = ['WindGustDir', 'WindDir9am', 'WindDir3pm']
df_copy = df_copy.drop(columns=cols_deleted)

In [22]:
df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 142193 entries, 0 to 142192
Data columns (total 20 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           142193 non-null  object 
 1   Location       142193 non-null  object 
 2   MinTemp        142193 non-null  float64
 3   MaxTemp        142193 non-null  float64
 4   Rainfall       142193 non-null  float64
 5   Evaporation    142193 non-null  float64
 6   Sunshine       142193 non-null  float64
 7   WindGustSpeed  142193 non-null  float64
 8   WindSpeed9am   142193 non-null  float64
 9   WindSpeed3pm   142193 non-null  float64
 10  Humidity9am    142193 non-null  float64
 11  Humidity3pm    142193 non-null  float64
 12  Pressure9am    142193 non-null  float64
 13  Pressure3pm    142193 non-null  float64
 14  Cloud9am       142193 non-null  float64
 15  Cloud3pm       142193 non-null  float64
 16  Temp9am        142193 non-null  float64
 17  Temp3pm        142193 non-nul

In [23]:
df_copy = df_copy.dropna(subset=['RainToday'])

In [24]:
df_copy.info()

<class 'pandas.core.frame.DataFrame'>
Index: 140787 entries, 0 to 142192
Data columns (total 20 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           140787 non-null  object 
 1   Location       140787 non-null  object 
 2   MinTemp        140787 non-null  float64
 3   MaxTemp        140787 non-null  float64
 4   Rainfall       140787 non-null  float64
 5   Evaporation    140787 non-null  float64
 6   Sunshine       140787 non-null  float64
 7   WindGustSpeed  140787 non-null  float64
 8   WindSpeed9am   140787 non-null  float64
 9   WindSpeed3pm   140787 non-null  float64
 10  Humidity9am    140787 non-null  float64
 11  Humidity3pm    140787 non-null  float64
 12  Pressure9am    140787 non-null  float64
 13  Pressure3pm    140787 non-null  float64
 14  Cloud9am       140787 non-null  float64
 15  Cloud3pm       140787 non-null  float64
 16  Temp9am        140787 non-null  float64
 17  Temp3pm        140787 non-null  fl

In [25]:
df_copy = df_copy.drop(columns=['Date', 'Location'])

In [26]:
df_copy['RainToday'] = df_copy['RainToday'].replace({'Yes':1, 'No':0})
df_copy['RainTomorrow'] = df_copy['RainTomorrow'].replace({'Yes':1, 'No':0})

  df_copy['RainToday'] = df_copy['RainToday'].replace({'Yes':1, 'No':0})
  df_copy['RainTomorrow'] = df_copy['RainTomorrow'].replace({'Yes':1, 'No':0})


In [27]:
df_copy.sample(10)

Unnamed: 0,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustSpeed,WindSpeed9am,WindSpeed3pm,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainToday,RainTomorrow
120993,8.4,22.1,0.0,6.720797,8.674364,50.0,17.0,17.0,62.0,40.0,1018.367253,1015.755504,6.392356,5.419788,14.9,20.5,0,0
21037,19.5,24.1,0.0,5.2,0.3,20.0,11.0,13.0,74.0,91.0,1015.9,1015.0,7.0,7.0,23.3,22.6,0,1
136876,24.4,30.5,49.4,7.4,2.8,52.0,7.0,9.0,89.0,87.0,1009.5,1005.8,7.0,7.0,26.3,26.0,1,1
97664,6.6,19.5,0.0,4.0,12.3,41.0,20.0,24.0,49.0,46.0,1026.9,1024.1,5.0,1.0,15.1,17.7,0,0
112563,12.6,22.0,8.6,6.720797,10.1,48.0,15.0,31.0,84.0,47.0,1014.7,1013.6,8.0,3.0,17.8,21.7,1,0
6775,4.0,20.5,0.0,2.2,8.674364,19.0,9.0,2.0,57.0,20.0,1030.5,1027.3,7.0,7.0,10.8,20.2,0,0
22305,16.7,22.0,0.0,5.4,6.7,37.0,15.0,17.0,68.0,70.0,1021.0,1019.7,7.0,6.0,19.2,20.1,0,0
92540,25.5,30.6,0.0,8.8,8.6,43.0,30.0,33.0,70.0,66.0,1013.0,1010.4,6.0,6.0,28.1,28.7,0,0
132702,7.6,21.1,0.0,6.720797,8.674364,31.0,11.0,17.0,73.0,62.0,1018.367253,1015.755504,6.392356,8.0,14.6,19.3,0,0
121137,2.9,19.7,0.0,6.720797,8.674364,22.0,6.0,6.0,77.0,50.0,1018.367253,1015.755504,6.392356,5.419788,12.2,19.3,0,1


Now build a MLP model. Begin with for example 2 hidden layers of 20 units.

In [28]:
X = df_copy.drop(columns='RainTomorrow')
y = df_copy['RainTomorrow']

PS : note that all the steps for data preparation can be easily done in a single function (pipeline).

## 2. Building a model

In [29]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=1)

In [30]:
scaler = StandardScaler()
Xtrain = scaler.fit_transform(Xtrain)
Xtest = scaler.transform(Xtest)

In [31]:
# MLP model architecture
def model(n_features, hl_activation):
    model = Sequential(
        [
            Input(shape=(n_features,)),
            Dense(20, activation=hl_activation[0]),
            Dense(20, activation=hl_activation[1]),
            Dense(1, activation='sigmoid')
        ]
    )
    return model

In [32]:
n_features = Xtrain.shape[1]
n_features

17

In [33]:
hl_activation = ['tanh', 'relu']

In [34]:
weather_model = model(n_features=n_features, hl_activation=hl_activation)
weather_model.summary()

Now compile and fit your model.

In [35]:
weather_model.compile(optimizer = Adam(learning_rate=0.0001), loss='binary_crossentropy', metrics=['accuracy', Precision(), Recall(), AUC()])

In [36]:
callbacks = [EarlyStopping(monitor='val_loss', patience=3)]

In [37]:
history = weather_model.fit(x=Xtrain, y=ytrain, validation_split=.25, epochs=20, batch_size=32, callbacks=callbacks)

Epoch 1/20
[1m2640/2640[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 654us/step - accuracy: 0.7339 - auc: 0.7003 - loss: 0.5391 - precision: 0.4398 - recall: 0.4138 - val_accuracy: 0.8350 - val_auc: 0.8485 - val_loss: 0.3768 - val_precision: 0.6671 - val_recall: 0.4909
Epoch 2/20
[1m2640/2640[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 560us/step - accuracy: 0.8366 - auc: 0.8533 - loss: 0.3729 - precision: 0.6737 - recall: 0.5087 - val_accuracy: 0.8425 - val_auc: 0.8599 - val_loss: 0.3626 - val_precision: 0.6794 - val_recall: 0.5304
Epoch 3/20
[1m2640/2640[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 545us/step - accuracy: 0.8424 - auc: 0.8624 - loss: 0.3622 - precision: 0.6895 - recall: 0.5291 - val_accuracy: 0.8462 - val_auc: 0.8631 - val_loss: 0.3582 - val_precision: 0.7016 - val_recall: 0.5169
Epoch 4/20
[1m2640/2640[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 540us/step - accuracy: 0.8453 - auc: 0.8668 - loss: 0.3579 - precision: 0.7073 - 

Now check the accuracy on the test dataset.

In [38]:
# Model Evaluation
loss, accuracy, precision, recall, AUC_coeff = weather_model.evaluate(Xtest, ytest, verbose=0)
print('loss is:', loss)
print('accuracy is:', accuracy)
print('Precision is:', precision)
print('Recall is:', recall)
print('AUC is:', AUC_coeff)


loss is: 0.34911754727363586
accuracy is: 0.8490659594535828
Precision is: 0.7152747511863708
Recall is: 0.5298076868057251
AUC is: 0.8715755343437195


---

Now let's try to use a classical machine learning classification method.

In [39]:
lr_model = LogisticRegression()

In [40]:
lr_model.get_params()

{'C': 1.0,
 'class_weight': None,
 'dual': False,
 'fit_intercept': True,
 'intercept_scaling': 1,
 'l1_ratio': None,
 'max_iter': 100,
 'multi_class': 'deprecated',
 'n_jobs': None,
 'penalty': 'l2',
 'random_state': None,
 'solver': 'lbfgs',
 'tol': 0.0001,
 'verbose': 0,
 'warm_start': False}

In [41]:
lr_model.fit(Xtrain, ytrain)

In [42]:
lr_pred = lr_model.predict(Xtest)

In [43]:
print("LR accuracy:", accuracy_score(lr_pred, ytest))
print("Classification report:\n", classification_report(lr_pred, ytest))

LR accuracy: 0.8441295546558705
Classification report:
               precision    recall  f1-score   support

           0       0.95      0.87      0.90     23913
           1       0.49      0.72      0.58      4245

    accuracy                           0.84     28158
   macro avg       0.72      0.79      0.74     28158
weighted avg       0.88      0.84      0.86     28158



The Deep neural network seems to outperform classical logistic regression