## Keras layered model: CNN

## Table of Contents

#### 1. Importing Libraries and Data
#### 2. Data Wrangling
#### 3. Reshaping for modeling
#### 4. Data Split
#### 5. Creating Keras Model
#### 6. Creating Confusion Matrix

## 1. Importing Libraries and Data

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import os
import operator
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from numpy import unique
from numpy import reshape
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Conv2D, Dense, BatchNormalization, Flatten, MaxPooling1D, Dropout
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

import warnings
warnings.filterwarnings("ignore")

In [2]:
# Define path
path = r'C:\Users\Lukman\Desktop\FEMINA CF\Machine Learning\Datasets'

In [3]:
# Import data
climate = pd.read_csv(os.path.join(path, 'Dataset-weather-prediction-dataset-processed.csv'))
climate

(index),DATE,MONTH,BASEL_cloud_cover,BASEL_wind_speed,BASEL_humidity,BASEL_pressure,BASEL_global_radiation,BASEL_precipitation,BASEL_snow_depth,BASEL_sunshine,...,VALENTIA_cloud_cover,VALENTIA_humidity,VALENTIA_pressure,VALENTIA_global_radiation,VALENTIA_precipitation,VALENTIA_snow_depth,VALENTIA_sunshine,VALENTIA_temp_mean,VALENTIA_temp_min,VALENTIA_temp_max
0,19600101,1,7,2.1,0.85,1.0180,0.32,0.09,0,0.7,...,5,0.88,1.0003,0.45,0.34,0,4.7,8.5,6.0,10.9
1,19600102,1,6,2.1,0.84,1.0180,0.36,1.05,0,1.1,...,7,0.91,1.0007,0.25,0.84,0,0.7,8.9,5.6,12.1
2,19600103,1,8,2.1,0.90,1.0180,0.18,0.30,0,0.0,...,7,0.91,1.0096,0.17,0.08,0,0.1,10.5,8.1,12.9
3,19600104,1,3,2.1,0.92,1.0180,0.58,0.00,0,4.1,...,7,0.86,1.0184,0.13,0.98,0,0.0,7.4,7.3,10.6
4,19600105,1,6,2.1,0.95,1.0180,0.65,0.14,0,5.4,...,3,0.80,1.0328,0.46,0.00,0,5.7,5.7,3.0,8.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22945,20221027,10,1,2.1,0.79,1.0248,1.34,0.22,0,7.7,...,5,0.82,1.0142,1.13,0.41,0,3.4,10.7,7.9,13.5
22946,20221028,10,6,2.1,0.77,1.0244,1.34,0.22,0,5.4,...,5,0.82,1.0142,1.13,0.41,0,3.4,10.7,7.9,13.5
22947,20221029,10,4,2.1,0.76,1.0227,1.34,0.22,0,6.1,...,5,0.82,1.0142,1.13,0.41,0,3.4,10.7,7.9,13.5
22948,20221030,10,5,2.1,0.80,1.0212,1.34,0.22,0,5.8,...,5,0.82,1.0142,1.13,0.41,0,3.4,10.7,7.9,13.5


In [4]:
# Read in the pleasant weather answers (the labels)
df = pd.read_csv(os.path.join(path, 'Dataset-Answers-Weather_Prediction_Pleasant_Weather.csv'))
df

(index),DATE,BASEL_pleasant_weather,BELGRADE_pleasant_weather,BUDAPEST_pleasant_weather,DEBILT_pleasant_weather,DUSSELDORF_pleasant_weather,HEATHROW_pleasant_weather,KASSEL_pleasant_weather,LJUBLJANA_pleasant_weather,MAASTRICHT_pleasant_weather,MADRID_pleasant_weather,MUNCHENB_pleasant_weather,OSLO_pleasant_weather,SONNBLICK_pleasant_weather,STOCKHOLM_pleasant_weather,VALENTIA_pleasant_weather
0,19600101,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,19600102,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,19600103,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,19600104,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,19600105,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22945,20221027,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
22946,20221028,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
22947,20221029,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
22948,20221030,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


## 2. Data Wrangling
Structure the data so it has the right shape for the deep learning model:
- Drop the 3 weather stations (Gdansk, Roma, Tours) that are not included in the answers.
- Remove the 2 observation types (wind_speed and snow_depth) that are missing for many stations.
- Fill in the 3 remaining missing observation columns by copying a nearby station, assuming nearby stations have similar weather.
- Drop DATE and MONTH from the observations and DATE from the answers data set.
- Export the result as the "cleaned" data set. X should have shape (22950, 135) and y should have shape (22950, 15).

In [5]:
# Drop the columns related to Tours, Gdansk and Rome from the unscaled dataset

climate = climate.drop(['GDANSK_cloud_cover', 'GDANSK_humidity', 'GDANSK_precipitation', 'GDANSK_snow_depth', 'GDANSK_temp_mean', 'GDANSK_temp_min', 'GDANSK_temp_max',
                        'ROMA_cloud_cover', 'ROMA_wind_speed', 'ROMA_humidity', 'ROMA_pressure', 'ROMA_sunshine', 'ROMA_temp_mean',
                        'TOURS_wind_speed', 'TOURS_humidity', 'TOURS_pressure', 'TOURS_global_radiation', 'TOURS_precipitation', 'TOURS_temp_mean', 'TOURS_temp_min', 'TOURS_temp_max'], axis=1)

In [6]:
climate.shape

(22950, 149)

In [7]:
# Extract the different observation types

observation_types = ['cloud_cover', 'wind_speed', 'humidity', 'pressure',
                     'global_radiation', 'precipitation', 'snow_depth', 
                     'sunshine', 'temp_mean', 'temp_min', 'temp_max']

In [8]:
# Create a dictionary to store the count of stations for each observation type
station_counts = {}

for obs in observation_types:
    # Select columns related to the current observation type
    columns = [col for col in climate.columns if col.endswith(obs)]
    
    # Count the number of stations (i.e., the number of columns) for the current observation type
    station_counts[obs] = len(columns)

# Print the count of stations for each observation type
print("Number of stations covered by each observation type:")
for obs, count in station_counts.items():
    print(f"{obs}: {count} stations")

Number of stations covered by each observation type:
cloud_cover: 14 stations
wind_speed: 9 stations
humidity: 14 stations
pressure: 14 stations
global_radiation: 15 stations
precipitation: 15 stations
snow_depth: 6 stations
sunshine: 15 stations
temp_mean: 15 stations
temp_min: 15 stations
temp_max: 15 stations


In [9]:
# The two observation types missing for many stations are wind_speed (9 of 15 stations) and snow_depth (6 of 15 stations).

# Select the columns that end with wind_speed or snow_depth so they can be dropped

columns_to_drop = climate.filter(regex='(_wind_speed|_snow_depth)$').columns
columns_to_drop

Index(['BASEL_wind_speed', 'BASEL_snow_depth', 'DEBILT_wind_speed',
       'DUSSELDORF_wind_speed', 'DUSSELDORF_snow_depth', 'HEATHROW_snow_depth',
       'KASSEL_wind_speed', 'LJUBLJANA_wind_speed', 'MAASTRICHT_wind_speed',
       'MADRID_wind_speed', 'MUNCHENB_snow_depth', 'OSLO_wind_speed',
       'OSLO_snow_depth', 'SONNBLICK_wind_speed', 'VALENTIA_snow_depth'],
      dtype='object')

In [10]:
climate_new = climate.drop(columns=columns_to_drop)

In [11]:
climate_new

(index),DATE,MONTH,BASEL_cloud_cover,BASEL_humidity,BASEL_pressure,BASEL_global_radiation,BASEL_precipitation,BASEL_sunshine,BASEL_temp_mean,BASEL_temp_min,...,STOCKHOLM_temp_max,VALENTIA_cloud_cover,VALENTIA_humidity,VALENTIA_pressure,VALENTIA_global_radiation,VALENTIA_precipitation,VALENTIA_sunshine,VALENTIA_temp_mean,VALENTIA_temp_min,VALENTIA_temp_max
0,19600101,1,7,0.85,1.0180,0.32,0.09,0.7,6.5,0.8,...,4.9,5,0.88,1.0003,0.45,0.34,4.7,8.5,6.0,10.9
1,19600102,1,6,0.84,1.0180,0.36,1.05,1.1,6.1,3.3,...,5.0,7,0.91,1.0007,0.25,0.84,0.7,8.9,5.6,12.1
2,19600103,1,8,0.90,1.0180,0.18,0.30,0.0,8.5,5.1,...,4.1,7,0.91,1.0096,0.17,0.08,0.1,10.5,8.1,12.9
3,19600104,1,3,0.92,1.0180,0.58,0.00,4.1,6.3,3.8,...,2.3,7,0.86,1.0184,0.13,0.98,0.0,7.4,7.3,10.6
4,19600105,1,6,0.95,1.0180,0.65,0.14,5.4,3.0,-0.7,...,4.3,3,0.80,1.0328,0.46,0.00,5.7,5.7,3.0,8.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22945,20221027,10,1,0.79,1.0248,1.34,0.22,7.7,15.9,11.4,...,14.2,5,0.82,1.0142,1.13,0.41,3.4,10.7,7.9,13.5
22946,20221028,10,6,0.77,1.0244,1.34,0.22,5.4,16.7,14.3,...,14.3,5,0.82,1.0142,1.13,0.41,3.4,10.7,7.9,13.5
22947,20221029,10,4,0.76,1.0227,1.34,0.22,6.1,16.7,13.1,...,14.4,5,0.82,1.0142,1.13,0.41,3.4,10.7,7.9,13.5
22948,20221030,10,5,0.80,1.0212,1.34,0.22,5.8,15.4,11.6,...,12.4,5,0.82,1.0142,1.13,0.41,3.4,10.7,7.9,13.5


In [12]:
# Collect all unique station names to find which station is missing from cloud_cover, humidity, and pressure
all_stations = {col.split('_')[0] for col in climate_new.columns if '_' in col}
all_stations

{'BASEL',
 'BELGRADE',
 'BUDAPEST',
 'DEBILT',
 'DUSSELDORF',
 'HEATHROW',
 'KASSEL',
 'LJUBLJANA',
 'MAASTRICHT',
 'MADRID',
 'MUNCHENB',
 'OSLO',
 'SONNBLICK',
 'STOCKHOLM',
 'VALENTIA'}

In [13]:
observation_types = ['cloud_cover', 'humidity', 'pressure']

missing_stations_by_observation = {}

for obs in observation_types:
    # Select columns related to the current observation type
    columns = [col for col in climate_new.columns if col.endswith(obs)]
    
    # Extract station names by removing the observation type from the column names
    station_names = set([col.replace(f'_{obs}', '') for col in columns])
    
    # Identify stations that are in all_stations but missing from the current observation type
    missing_stations = all_stations - station_names
    
    # Store the missing station names in the dictionary
    missing_stations_by_observation[obs] = missing_stations

# Print the missing station names for each observation type
for obs, missing_stations in missing_stations_by_observation.items():
    print(f"\nStations missing from {obs}:")
    if missing_stations:
        for station in missing_stations:
            print(station)
    else:
        print("None")


Stations missing from cloud_cover:
KASSEL

Stations missing from humidity:
STOCKHOLM

Stations missing from pressure:
MUNCHENB


In [14]:
# Get the position of HEATHROW_temp_max: the new KASSEL_cloud_cover column goes directly after it (position + 1)

climate_new.columns.get_loc('HEATHROW_temp_max')

55

In [15]:
climate_new.columns.get_loc('MUNCHENB_humidity')  # insert MUNCHENB_pressure at +2: +1 for the KASSEL column added before it, +1 to land after humidity

92

In [16]:
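climate_new.columns.get_loc('STOCKHOLM_cloud_cover')  # insert STOCKHOLM_humidity at +2: +1 for the KASSEL column added before it, +1 to land after cloud_cover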

117

In [17]:
# Insert the three missing columns into climate_new at specific positions.
# Each new column is a copy of the same observation from another station:
# KASSEL_cloud_cover from DUSSELDORF_cloud_cover
# STOCKHOLM_humidity from OSLO_humidity
# MUNCHENB_pressure from BASEL_pressure

climate_new.insert(56, 'KASSEL_cloud_cover', climate_new['DUSSELDORF_cloud_cover'])
climate_new.insert(119, 'STOCKHOLM_humidity', climate_new['OSLO_humidity'])
climate_new.insert(94, 'MUNCHENB_pressure', climate_new['BASEL_pressure'])
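The positions 56, 119 and 94 follow from how a positional insert shifts everything to its right. As a rough check, the sketch below replays the three inserts on a plain Python list. The `col{i}` placeholder names and the 134-entry length (149 columns minus the 15 dropped ones) are stand-ins for the real frame, not the actual data.

```python
# Stand-in for climate_new.columns before the inserts: 134 placeholder names,
# with the three anchor columns at the positions reported by get_loc.
cols = [f"col{i}" for i in range(134)]
cols[55] = "HEATHROW_temp_max"
cols[92] = "MUNCHENB_humidity"
cols[117] = "STOCKHOLM_cloud_cover"

# Replay the inserts in the same order as the notebook cell.
cols.insert(56, "KASSEL_cloud_cover")    # shifts every later column by one
cols.insert(119, "STOCKHOLM_humidity")   # 117 + 1 (shift) + 1 (after anchor)
cols.insert(94, "MUNCHENB_pressure")     # 92 + 1 (shift) + 1 (after anchor)


def follows(new_col, anchor):
    """Return True if new_col sits immediately after anchor in cols."""
    return cols.index(new_col) == cols.index(anchor) + 1


checks = {
    "KASSEL_cloud_cover": follows("KASSEL_cloud_cover", "HEATHROW_temp_max"),
    "MUNCHENB_pressure": follows("MUNCHENB_pressure", "MUNCHENB_humidity"),
    "STOCKHOLM_humidity": follows("STOCKHOLM_humidity", "STOCKHOLM_cloud_cover"),
}
print(checks)
```

Inserting from the highest position to the lowest would avoid the shift bookkeeping altogether, since an insert never moves columns to its left.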

In [18]:
climate_new.columns.tolist()

['DATE',
 'MONTH',
 'BASEL_cloud_cover',
 'BASEL_humidity',
 'BASEL_pressure',
 'BASEL_global_radiation',
 'BASEL_precipitation',
 'BASEL_sunshine',
 'BASEL_temp_mean',
 'BASEL_temp_min',
 'BASEL_temp_max',
 'BELGRADE_cloud_cover',
 'BELGRADE_humidity',
 'BELGRADE_pressure',
 'BELGRADE_global_radiation',
 'BELGRADE_precipitation',
 'BELGRADE_sunshine',
 'BELGRADE_temp_mean',
 'BELGRADE_temp_min',
 'BELGRADE_temp_max',
 'BUDAPEST_cloud_cover',
 'BUDAPEST_humidity',
 'BUDAPEST_pressure',
 'BUDAPEST_global_radiation',
 'BUDAPEST_precipitation',
 'BUDAPEST_sunshine',
 'BUDAPEST_temp_mean',
 'BUDAPEST_temp_min',
 'BUDAPEST_temp_max',
 'DEBILT_cloud_cover',
 'DEBILT_humidity',
 'DEBILT_pressure',
 'DEBILT_global_radiation',
 'DEBILT_precipitation',
 'DEBILT_sunshine',
 'DEBILT_temp_mean',
 'DEBILT_temp_min',
 'DEBILT_temp_max',
 'DUSSELDORF_cloud_cover',
 'DUSSELDORF_humidity',
 'DUSSELDORF_pressure',
 'DUSSELDORF_global_radiation',
 'DUSSELDORF_precipitation',
 'DUSSELDORF_sunshine',
 'DUSS

In [19]:
# Export cleaned dataset with date

climate_new.to_csv(os.path.join(path, 'climate_cleaned_with_date.csv'), index=False)

In [20]:
# Drop unnecessary columns

climate_new.drop(['DATE', 'MONTH'], axis=1, inplace=True)

In [21]:
climate_new.shape

(22950, 135)
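The 135 columns can be cross-checked by hand from the counts printed earlier. The sketch below redoes that bookkeeping in plain Python. All numbers are copied from the outputs above, and the variable names are only illustrative.

```python
# Columns left after dropping the three stations without answers.
after_station_drop = 149

# wind_speed (9 stations) and snow_depth (6 stations) are removed entirely.
dropped_observations = 9 + 6

# KASSEL_cloud_cover, STOCKHOLM_humidity and MUNCHENB_pressure are filled in.
filled_in = 3

# DATE and MONTH are not features.
non_feature = 2

n_features = after_station_drop - dropped_observations + filled_in - non_feature
stations, observation_types = 15, 9

print(n_features)                                  # 135
print(n_features == stations * observation_types)  # True
```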

In [22]:
df.drop(columns = 'DATE', inplace = True)

In [23]:
df.shape

(22950, 15)

In [24]:
# Export cleaned dataset without date

climate_new.to_csv(os.path.join(path, 'climate_cleaned.csv'), index=False)

## 3. Reshaping for modeling

In [25]:
X = pd.read_csv(os.path.join(path, 'climate_cleaned.csv'), index_col = False)

In [26]:
X

(index),BASEL_cloud_cover,BASEL_humidity,BASEL_pressure,BASEL_global_radiation,BASEL_precipitation,BASEL_sunshine,BASEL_temp_mean,BASEL_temp_min,BASEL_temp_max,BELGRADE_cloud_cover,...,STOCKHOLM_temp_max,VALENTIA_cloud_cover,VALENTIA_humidity,VALENTIA_pressure,VALENTIA_global_radiation,VALENTIA_precipitation,VALENTIA_sunshine,VALENTIA_temp_mean,VALENTIA_temp_min,VALENTIA_temp_max
0,7,0.85,1.0180,0.32,0.09,0.7,6.5,0.8,10.9,1,...,4.9,5,0.88,1.0003,0.45,0.34,4.7,8.5,6.0,10.9
1,6,0.84,1.0180,0.36,1.05,1.1,6.1,3.3,10.1,6,...,5.0,7,0.91,1.0007,0.25,0.84,0.7,8.9,5.6,12.1
2,8,0.90,1.0180,0.18,0.30,0.0,8.5,5.1,9.9,6,...,4.1,7,0.91,1.0096,0.17,0.08,0.1,10.5,8.1,12.9
3,3,0.92,1.0180,0.58,0.00,4.1,6.3,3.8,10.6,8,...,2.3,7,0.86,1.0184,0.13,0.98,0.0,7.4,7.3,10.6
4,6,0.95,1.0180,0.65,0.14,5.4,3.0,-0.7,6.0,8,...,4.3,3,0.80,1.0328,0.46,0.00,5.7,5.7,3.0,8.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22945,1,0.79,1.0248,1.34,0.22,7.7,15.9,11.4,21.4,2,...,14.2,5,0.82,1.0142,1.13,0.41,3.4,10.7,7.9,13.5
22946,6,0.77,1.0244,1.34,0.22,5.4,16.7,14.3,21.9,0,...,14.3,5,0.82,1.0142,1.13,0.41,3.4,10.7,7.9,13.5
22947,4,0.76,1.0227,1.34,0.22,6.1,16.7,13.1,22.4,2,...,14.4,5,0.82,1.0142,1.13,0.41,3.4,10.7,7.9,13.5
22948,5,0.80,1.0212,1.34,0.22,5.8,15.4,11.6,21.1,1,...,12.4,5,0.82,1.0142,1.13,0.41,3.4,10.7,7.9,13.5


In [27]:
y = df
y

(index),BASEL_pleasant_weather,BELGRADE_pleasant_weather,BUDAPEST_pleasant_weather,DEBILT_pleasant_weather,DUSSELDORF_pleasant_weather,HEATHROW_pleasant_weather,KASSEL_pleasant_weather,LJUBLJANA_pleasant_weather,MAASTRICHT_pleasant_weather,MADRID_pleasant_weather,MUNCHENB_pleasant_weather,OSLO_pleasant_weather,SONNBLICK_pleasant_weather,STOCKHOLM_pleasant_weather,VALENTIA_pleasant_weather
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22945,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
22946,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
22947,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
22948,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [28]:
# Convert X and y to NumPy arrays

X = np.array(X)
y = np.array(y)

In [29]:
# Reshape to (samples, 15 stations, 9 observation types)
X = X.reshape(-1, 15, 9)
X

array([[[  7.    ,   0.85  ,   1.018 , ...,   6.5   ,   0.8   ,
          10.9   ],
        [  1.    ,   0.81  ,   1.0195, ...,   3.7   ,  -0.9   ,
           7.9   ],
        [  4.    ,   0.67  ,   1.017 , ...,   2.4   ,  -0.4   ,
           5.1   ],
        ...,
        [  4.    ,   0.73  ,   1.0304, ...,  -5.9   ,  -8.5   ,
          -3.2   ],
        [  5.    ,   0.98  ,   1.0114, ...,   4.2   ,   2.2   ,
           4.9   ],
        [  5.    ,   0.88  ,   1.0003, ...,   8.5   ,   6.    ,
          10.9   ]],

       [[  6.    ,   0.84  ,   1.018 , ...,   6.1   ,   3.3   ,
          10.1   ],
        [  6.    ,   0.84  ,   1.0172, ...,   2.9   ,   2.2   ,
           4.4   ],
        [  4.    ,   0.67  ,   1.017 , ...,   2.3   ,   1.4   ,
           3.1   ],
        ...,
        [  6.    ,   0.97  ,   1.0292, ...,  -9.5   , -10.5   ,
          -8.5   ],
        [  5.    ,   0.62  ,   1.0114, ...,   4.    ,   3.    ,
           5.    ],
        [  7.    ,   0.91  ,   1.0007, ...,   8.
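`reshape(-1, 15, 9)` fills the new axes in row-major order, so each run of 9 consecutive columns becomes one station's row. A minimal pure-Python sketch of the same grouping, using the integer positions 0..134 as a stand-in for one day's 135 values:

```python
# One day's feature row, represented by its column positions 0..134.
flat_row = list(range(135))

n_stations, n_obs = 15, 9

# Split the flat row into 15 consecutive chunks of 9 values each,
# which mirrors what a row-major reshape to (15, 9) does.
day = [flat_row[s * n_obs:(s + 1) * n_obs] for s in range(n_stations)]

print(len(day), len(day[0]))  # 15 9
print(day[0])                 # positions of the first station's 9 columns
print(day[-1])                # positions of the last station's 9 columns
```

The grouping only lines up with stations because every station has the same 9 observation types in the same column order, which is why the three missing columns were inserted at exact positions earlier.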

## 4. Data Split

In [30]:
# Split data into train and test sets (no test_size given, so the default of 0.25 applies)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [31]:
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

(17212, 15, 9) (17212, 15)
(5738, 15, 9) (5738, 15)
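The split sizes above follow from scikit-learn's default `test_size` of 0.25, with the test count rounded up. The arithmetic below reproduces the printed shapes. It is a sketch of the rounding rule only and does not call scikit-learn.

```python
import math

n_samples = 22950
test_size = 0.25  # scikit-learn default when neither size is given

n_test = math.ceil(test_size * n_samples)   # test count is rounded up
n_train = n_samples - n_test                # the remainder goes to training

print(n_train, n_test)  # 17212 5738
```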


## 5. Creating Keras Model

In [32]:
epochs = 30
batch_size = 64
n_hidden = 64

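timesteps = len(X_train[0])     # 15 stations, used here as the sequence axis
input_dim = len(X_train[0][0])  # 9 observation types per station
n_classes = len(y_train[0])     # 15 outputs, one per station

model = Sequential()
# Convolve over neighbouring stations (kernel_size=2), producing n_hidden feature maps
model.add(Conv1D(n_hidden, kernel_size=2, activation='relu', input_shape=(timesteps, input_dim)))
# Dense layer applied independently at each position of the sequence axis
model.add(Dense(16, activation='relu'))
# Halve the sequence length (default pool size is 2), then flatten for the output layer
model.add(MaxPooling1D())
model.add(Flatten())
model.add(Dense(n_classes, activation='sigmoid')) # Options: sigmoid, tanh, softmax, relu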

In [33]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv1d (Conv1D)             (None, 14, 64)            1216      
                                                                 
 dense (Dense)               (None, 14, 16)            1040      
                                                                 
 max_pooling1d (MaxPooling1D  (None, 7, 16)            0         
 )                                                               
                                                                 
 flatten (Flatten)           (None, 112)               0         
                                                                 
 dense_1 (Dense)             (None, 15)                1695      
                                                                 
Total params: 3,951
Trainable params: 3,951
Non-trainable params: 0
______________________________________________________
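The output shapes and parameter counts in the summary can be reproduced by hand. The helper functions below are hypothetical (they are not Keras APIs) and assume the defaults used in this model: `'valid'` padding, stride 1, and a pooling size of 2.

```python
def conv1d_out_len(steps, kernel_size):
    """Output length of a stride-1 'valid' 1D convolution."""
    return steps - kernel_size + 1


def conv1d_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units


timesteps, input_dim, n_hidden, n_classes = 15, 9, 64, 15

steps = conv1d_out_len(timesteps, kernel_size=2)  # 14
conv = conv1d_params(2, input_dim, n_hidden)      # 1216
dense = dense_params(n_hidden, 16)                # 1040
pooled = steps // 2                               # 7
flat = pooled * 16                                # 112
out = dense_params(flat, n_classes)               # 1695

print(steps, pooled, flat)
print(conv, dense, out, conv + dense + out)       # total 3951
```

The same functions give 224, 144 and 1455 for a model with `n_hidden = 8` and `kernel_size=3`, where the flattened length is 6 * 16 = 96.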

In [34]:
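# Compile model
# Note: categorical_crossentropy expects one-hot targets (exactly one class per row).
# y here has one 0/1 column per station (several or none can be 1 on a given day),
# so binary_crossentropy would be the usual choice for this kind of target.
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])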

In [35]:
# Run Model
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=2)

Epoch 1/30
269/269 - 1s - loss: 220.8376 - accuracy: 0.1106 - 1s/epoch - 5ms/step
Epoch 2/30
269/269 - 0s - loss: 2127.2634 - accuracy: 0.1116 - 414ms/epoch - 2ms/step
Epoch 3/30
269/269 - 0s - loss: 7171.8506 - accuracy: 0.1150 - 386ms/epoch - 1ms/step
Epoch 4/30
269/269 - 0s - loss: 16120.4443 - accuracy: 0.1150 - 390ms/epoch - 1ms/step
Epoch 5/30
269/269 - 0s - loss: 29399.8750 - accuracy: 0.1163 - 391ms/epoch - 1ms/step
Epoch 6/30
269/269 - 1s - loss: 47561.5195 - accuracy: 0.1201 - 535ms/epoch - 2ms/step
Epoch 7/30
269/269 - 0s - loss: 71000.9844 - accuracy: 0.1183 - 478ms/epoch - 2ms/step
Epoch 8/30
269/269 - 0s - loss: 102889.3359 - accuracy: 0.1232 - 373ms/epoch - 1ms/step
Epoch 9/30
269/269 - 0s - loss: 138144.7188 - accuracy: 0.1238 - 403ms/epoch - 1ms/step
Epoch 10/30
269/269 - 0s - loss: 184437.7188 - accuracy: 0.1268 - 393ms/epoch - 1ms/step
Epoch 11/30
269/269 - 0s - loss: 231358.9531 - accuracy: 0.1231 - 455ms/epoch - 2ms/step
Epoch 12/30
269/269 - 0s - loss: 288602.6875

<keras.callbacks.History at 0x18869af7e50>
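The `269/269` in the log is the number of batches per epoch: the training set size divided by the batch size, rounded up because the last partial batch still counts. A quick check for the three batch sizes tried in this notebook (64, 50 and 20):

```python
import math

n_train = 17212  # rows in X_train

# Batches per epoch for each batch size used in the notebook.
steps_per_epoch = {b: math.ceil(n_train / b) for b in (64, 50, 20)}

print(steps_per_epoch)  # {64: 269, 50: 345, 20: 861}
```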

In [36]:
epochs = 12
batch_size = 50
n_hidden = 8

timesteps = len(X_train[0])
input_dim = len(X_train[0][0])
n_classes = len(y_train[0])

model = Sequential()
model.add(Conv1D(n_hidden, kernel_size=3, activation='relu', input_shape=(timesteps, input_dim)))
model.add(Dense(16, activation='relu'))
model.add(MaxPooling1D())
model.add(Flatten())
model.add(Dense(n_classes, activation='softmax')) # Options: sigmoid, tanh, softmax, relu

In [37]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv1d_1 (Conv1D)           (None, 13, 8)             224       
                                                                 
 dense_2 (Dense)             (None, 13, 16)            144       
                                                                 
 max_pooling1d_1 (MaxPooling  (None, 6, 16)            0         
 1D)                                                             
                                                                 
 flatten_1 (Flatten)         (None, 96)                0         
                                                                 
 dense_3 (Dense)             (None, 15)                1455      
                                                                 
Total params: 1,823
Trainable params: 1,823
Non-trainable params: 0
____________________________________________________

In [38]:
# Build Model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [39]:
# Run model
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=2)

Epoch 1/12
345/345 - 1s - loss: 152.1586 - accuracy: 0.1127 - 923ms/epoch - 3ms/step
Epoch 2/12
345/345 - 1s - loss: 1279.0787 - accuracy: 0.0950 - 532ms/epoch - 2ms/step
Epoch 3/12
345/345 - 0s - loss: 3969.7996 - accuracy: 0.0967 - 469ms/epoch - 1ms/step
Epoch 4/12
345/345 - 0s - loss: 8922.4678 - accuracy: 0.0990 - 467ms/epoch - 1ms/step
Epoch 5/12
345/345 - 0s - loss: 15389.9375 - accuracy: 0.1014 - 428ms/epoch - 1ms/step
Epoch 6/12
345/345 - 0s - loss: 25343.6230 - accuracy: 0.1036 - 469ms/epoch - 1ms/step
Epoch 7/12
345/345 - 0s - loss: 37980.8438 - accuracy: 0.1013 - 437ms/epoch - 1ms/step
Epoch 8/12
345/345 - 1s - loss: 51809.5977 - accuracy: 0.1029 - 509ms/epoch - 1ms/step
Epoch 9/12
345/345 - 0s - loss: 70460.1953 - accuracy: 0.1049 - 406ms/epoch - 1ms/step
Epoch 10/12
345/345 - 0s - loss: 91956.0312 - accuracy: 0.1073 - 468ms/epoch - 1ms/step
Epoch 11/12
345/345 - 0s - loss: 113912.0156 - accuracy: 0.1061 - 499ms/epoch - 1ms/step
Epoch 12/12
345/345 - 0s - loss: 141760.9375 

<keras.callbacks.History at 0x18867d2a370>

In [42]:
from tensorflow.keras.optimizers import Adam

epochs = 8
batch_size = 20
n_hidden = 32
optimizer = Adam(learning_rate=0.000001)

timesteps = len(X_train[0])
input_dim = len(X_train[0][0])
n_classes = len(y_train[0])

# Implement complex layers
model = Sequential()
model.add(Conv1D(n_hidden, kernel_size=2, activation='relu', input_shape=(timesteps, input_dim)))
model.add(Dense(16, activation='relu'))
model.add(MaxPooling1D())
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dense(n_classes, activation='tanh'))  # note: tanh outputs lie in (-1, 1), so they are not probabilities

# Build Model
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

In [43]:
# Run model
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=2)

Epoch 1/8
861/861 - 2s - loss: 23.4435 - accuracy: 0.0499 - 2s/epoch - 3ms/step
Epoch 2/8
861/861 - 1s - loss: 23.7023 - accuracy: 0.0474 - 929ms/epoch - 1ms/step
Epoch 3/8
861/861 - 1s - loss: 23.6263 - accuracy: 0.0497 - 910ms/epoch - 1ms/step
Epoch 4/8
861/861 - 1s - loss: 23.5849 - accuracy: 0.0483 - 920ms/epoch - 1ms/step
Epoch 5/8
861/861 - 1s - loss: 23.5026 - accuracy: 0.0509 - 1s/epoch - 1ms/step
Epoch 6/8
861/861 - 1s - loss: 23.7139 - accuracy: 0.0460 - 1s/epoch - 1ms/step
Epoch 7/8
861/861 - 1s - loss: 23.6597 - accuracy: 0.0468 - 1s/epoch - 1ms/step
Epoch 8/8
861/861 - 1s - loss: 23.4941 - accuracy: 0.0480 - 970ms/epoch - 1ms/step


<keras.callbacks.History at 0x1886a761970>

## 6. Creating Confusion Matrix

In [44]:
# Map each output index to its station name

stations = {
    0: 'BASEL',
    1: 'BELGRADE',
    2: 'BUDAPEST',
    3: 'DEBILT',
    4: 'DUSSELDORF',
    5: 'HEATHROW',
    6: 'KASSEL',
    7: 'LJUBLJANA',
    8: 'MAASTRICHT',
    9: 'MADRID',
    10: 'MUNCHENB',
    11: 'OSLO',
    12: 'SONNBLICK',
    13: 'STOCKHOLM',
    14: 'VALENTIA',
}

In [45]:
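def confusion_matrix(y_true, y_pred):
    # Reduce each row to a single station via argmax, then cross-tabulate.
    # Caveat: argmax returns index 0 for an all-zero row, so days with no
    # pleasant station are counted under BASEL in the "True" labels.
    y_true = pd.Series([stations[i] for i in np.argmax(y_true, axis=1)])
    y_pred = pd.Series([stations[i] for i in np.argmax(y_pred, axis=1)])

    return pd.crosstab(y_true, y_pred, rownames=['True'], colnames=['Pred'])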

In [46]:
# Evaluate

print(confusion_matrix(y_test, model.predict(X_test)))

Pred        BASEL  BELGRADE  DUSSELDORF  KASSEL  LJUBLJANA  MAASTRICHT  \
True                                                                     
BASEL          34         5          12     385          0        2094   
BELGRADE        0         0           0       0          1        1004   
BUDAPEST        0         0           0       0          0         199   
DEBILT          0         0           0       0          0          80   
DUSSELDORF      0         0           0       0          0          29   
HEATHROW        0         0           0       0          0          73   
KASSEL          0         0           0       0          0           9   
LJUBLJANA       0         0           0       0          0          60   
MAASTRICHT      0         0           0       0          0           6   
MADRID          0         0           0       6          0         303   
MUNCHENB        0         0           0       0          0           8   
OSLO            0         0           
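One thing to keep in mind when reading this matrix: `np.argmax` returns the first index of the maximum, so a label row with no pleasant station (all zeros) is mapped to index 0, and a row with several ones keeps only the first. The sketch below imitates that tie-breaking in plain Python. `first_argmax` is a hypothetical stand-in, not the NumPy function, and the three rows are made-up examples.

```python
def first_argmax(row):
    """Index of the first maximum, matching np.argmax's tie-breaking."""
    return max(range(len(row)), key=row.__getitem__)


station_names = ['BASEL', 'BELGRADE', 'BUDAPEST']  # shortened for illustration

rows = [
    [0, 0, 0],  # no station pleasant
    [0, 1, 1],  # two stations pleasant
    [0, 0, 1],  # exactly one station pleasant
]

labels = [station_names[first_argmax(r)] for r in rows]
print(labels)  # ['BASEL', 'BELGRADE', 'BUDAPEST']
```

This may be part of why the BASEL row dominates the "True" axis: it would also collect every day on which no station was labelled pleasant.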

In [47]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv1d_2 (Conv1D)           (None, 14, 32)            608       
                                                                 
 dense_4 (Dense)             (None, 14, 16)            528       
                                                                 
 max_pooling1d_2 (MaxPooling  (None, 7, 16)            0         
 1D)                                                             
                                                                 
 dropout (Dropout)           (None, 7, 16)             0         
                                                                 
 flatten_2 (Flatten)         (None, 112)               0         
                                                                 
 dense_5 (Dense)             (None, 15)                1695      
                                                      