# Workshop Notebook

This Jupyter notebook is the interactive part of the workshop.

## Step 1: Inspect the data

When working with a new type of data, the first step is usually to inspect it and build some intuition for it. Visualizing the data often suggests how to approach it and which features we can extract from it.

In [1]:
import pandas as pd

label_file = 'data/labels.csv'
df = pd.read_csv(label_file)
df


| Unnamed: 0 | filename | train_type_id |
|---|---|---|
| 0 | signal_000.bin | 0.0 |
| 1 | signal_001.bin | 0.0 |
| 2 | signal_002.bin | 0.0 |
| 3 | signal_003.bin | 0.0 |
| 4 | signal_004.bin | 0.0 |
| 5 | signal_005.bin | 0.0 |
| 6 | signal_006.bin | 0.0 |
| 7 | signal_007.bin | 0.0 |
| 8 | signal_008.bin | 0.0 |
| 9 | signal_009.bin | 0.0 |


The cell above loads a CSV file whose rows pair a filename with the corresponding train type. The signals themselves are stored as binary blobs, which can be found in data/signals. The table shown above is an excerpt of this list as it has been read into a DataFrame.

Let us explore a couple of the signatures we can find there. I also encourage you to look at more of them to get an even better idea of the data.
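
Before plotting individual signals, it can also help to check how many files each train type has. The sketch below uses a small made-up DataFrame with the same two columns as the labels file (the rows are illustrative, not the real data). It counts missing labels as well, on the assumption that some rows may be unlabeled, since the training cell further down maps an empty label to an extra class.

```python
import pandas as pd

# Toy stand-in for data/labels.csv (hypothetical rows, same column names).
labels = pd.DataFrame({
    'filename': ['signal_000.bin', 'signal_001.bin',
                 'signal_002.bin', 'signal_003.bin'],
    'train_type_id': [0.0, 0.0, 1.0, None],
})

# dropna=False keeps unlabeled rows in the count instead of hiding them.
counts = labels['train_type_id'].value_counts(dropna=False)
print(counts)

# Number of rows without a train type.
n_missing = int(labels['train_type_id'].isna().sum())
print('unlabeled rows:', n_missing)
```

Running the same two calls on `df` from the cell above should show whether the train types are roughly balanced.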


In [6]:
import matplotlib.pyplot as plt
from helpers import load_binary
from helpers import plot_size
%matplotlib inline

type_a = df.loc[df['train_type_id'] == 0]
type_b = df.loc[df['train_type_id'] == 1]
type_c = df.loc[df['train_type_id'] == 2]
type_d = df.loc[df['train_type_id'] == 3]

file_a = 'data/signals/' + type_a['filename'].iloc[0]
file_b = 'data/signals/' + type_b['filename'].iloc[0]
file_c = 'data/signals/' + type_c['filename'].iloc[0]
file_d = 'data/signals/' + type_d['filename'].iloc[0]

signal_a = load_binary(file_a)
signal_b = load_binary(file_b)
signal_c = load_binary(file_c)
signal_d = load_binary(file_d)

plot_size(16, 8)
plt.subplot(411)
plt.title('Train A')
plt.plot(signal_a)
plt.subplot(412)
plt.title('Train B')
plt.plot(signal_b)
plt.subplot(413)
plt.title('Train C')
plt.plot(signal_c)
plt.subplot(414)
plt.title('Train D')
plt.plot(signal_d)
plt.tight_layout()
plt.show()


'signal_000.bin'
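
To look at more signatures than the first file of each type, one option is to draw a few random filenames per train type. The following is a minimal sketch: `pick_files` is a hypothetical helper, not part of `helpers`, and the toy DataFrame only stands in for the real labels table.

```python
import pandas as pd

def pick_files(df, train_type_id, n=3, seed=0, signal_dir='data/signals/'):
    """Return up to n randomly chosen file paths for one train type."""
    names = df.loc[df['train_type_id'] == train_type_id, 'filename']
    if names.empty:
        return []
    # Never ask for more files than the train type actually has.
    n = min(n, len(names))
    chosen = names.sample(n=n, random_state=seed)
    return [signal_dir + name for name in chosen]

# Toy stand-in for the labels table (hypothetical rows).
toy = pd.DataFrame({
    'filename': ['signal_000.bin', 'signal_001.bin',
                 'signal_002.bin', 'signal_003.bin'],
    'train_type_id': [0.0, 0.0, 0.0, 1.0],
})

files_a = pick_files(toy, 0, n=2)
files_b = pick_files(toy, 1, n=5)
print(files_a)
print(files_b)
```

Each returned path can then be passed to `load_binary` and plotted in the same way as in the cell above. Changing `seed` gives a different selection.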

In [None]:
from keras.layers import Dense, Conv1D, MaxPool1D, Flatten
from keras.models import Sequential
from scipy.signal import periodogram

import numpy as np
import matplotlib.pyplot as plt
from helpers import load_binary
from keras.utils import to_categorical

%matplotlib inline

# model = Sequential()
# model.add(Dense(2, input_dim=1))
# model.add(Dense(1))
# model.compile(optimizer='adam', loss='mse')

with open('data/labels.csv', 'r') as fp:
    labels = [line.replace('\n', '').split(',') for line in fp.readlines()[1:]]

np.random.seed(1337)
np.random.shuffle(labels)

x_train, y_train = [], []
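for label in labels:
    # Columns in labels.csv: row index, filename, train_type_id
    filepath = 'data/signals/%s' % label[1]
    # Ids are stored as floats such as '0.0'; an empty id becomes the extra class 4
    label_id = 4 if label[2] == '' else int(float(label[2]))
    label_id = to_categorical(label_id, 5)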
    data = load_binary(filepath)
    _, data = periodogram(data, fs=2000)
    x_train.append(data)
    y_train.append(label_id)
x_train = np.array(x_train).reshape((len(x_train), -1, 1))
y_train = np.array(y_train)

model = Sequential()
model.add(Conv1D(32, 9, padding='valid', input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(MaxPool1D(2))
model.add(Conv1D(64, 9))
model.add(MaxPool1D(2))
model.add(Conv1D(128, 9))
model.add(Flatten())
model.add(Dense(5, activation='softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

logger = model.fit(x_train, y_train, epochs=200, batch_size=32, validation_split=0.25)
model.evaluate(x_train[-25:], y_train[-25:])

plt.plot(logger.history['accuracy'])
plt.plot(logger.history['val_accuracy'])
plt.tight_layout()
plt.show()

# data = np.array(x[1337]).reshape((1, 1))
# p = model.predict(data)

# print(p)

# plt.plot(x)
# plt.tight_layout()
# plt.show()
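
The cell above feeds the network a periodogram of each signal rather than the raw samples. As a rough illustration of what that transform does, the sketch below computes a simplified power spectrum with NumPy only. It leaves out the detrending and one-sided scaling that `scipy.signal.periodogram` applies by default, so the values will not match exactly. The 50 Hz sine is a made-up test signal, and `simple_periodogram` is a hypothetical name.

```python
import numpy as np

def simple_periodogram(signal, fs):
    """Return (frequencies in Hz, power) for a real 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    spectrum = np.fft.rfft(signal)            # FFT of a real signal
    power = (np.abs(spectrum) ** 2) / (fs * n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)    # frequency of each bin in Hz
    return freqs, power

fs = 2000                                     # same fs as in the cell above
t = np.arange(fs) / fs                        # one second of samples
test_signal = np.sin(2 * np.pi * 50 * t)      # hypothetical 50 Hz sine

freqs, power = simple_periodogram(test_signal, fs)
peak_hz = freqs[np.argmax(power)]
print(len(freqs), peak_hz)
```

A signal of n samples yields n // 2 + 1 frequency bins, so the input length of the first `Conv1D` layer depends on the length of the recorded signals.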


## Step 2: Do stuff
