# IBM Advanced Data Science Specialization
## Capstone Project
### Determining The Accuracy Of A Machine Learning Model
### Trained To Predict The Rate Of Mortality Per 100,000 Population On Roads, Globally
In this project we will train a machine learning model to predict the global rate of road mortality per 100,000 population. We will do that by feeding it the data for the previous Road Safety Decade of Action and projected data for the next Road Safety Decade of Action. Then we will train the model and evaluate its accuracy. Finally, we will predict the rate of mortality for the year 2030.


### Note: This notebook works best on a Python Spark machine!

## Importing The Data

1. We check that TensorFlow and Spark are already installed on the machine.

In [None]:
pip show tensorflow

In [None]:
pip show pyspark
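
The same check can also be done from Python, which avoids relying on notebook shell magics. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the status of the two packages this notebook depends on.
for name in ("tensorflow", "pyspark"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```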

2. We import all the dependencies.

In [None]:
import io
import json
import pickle
import sys
from queue import Queue

import numpy as np
from numpy import concatenate
import pandas
import pandas as pd
from pandas import read_csv, DataFrame, concat
import requests
import sklearn
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import tensorflow
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Activation, Dense, Dropout
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.optimizers import SGD
import matplotlib.pyplot as plt
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline

3. We get the data required for training the model: the rate of mortality per 100,000 population for the previous and next decades.

In [None]:
url = "https://raw.githubusercontent.com/IjA1020/IBM-Project/main/previous.csv"
download = requests.get(url).content
DataFrame1 = pd.read_csv(io.StringIO(download.decode('utf-8')))
dataset1 = DataFrame1.values
previous = dataset1
X1 = dataset1[:,0].astype(float)
Y1 = dataset1[:,1]
print(DataFrame1)

In [None]:
url = "https://raw.githubusercontent.com/IjA1020/IBM-Project/main/next.csv"
download = requests.get(url).content
DataFrame2 = pd.read_csv(io.StringIO(download.decode('utf-8')))
dataset2 = DataFrame2.values
next = dataset2
X2 = dataset2[:,0].astype(float)
Y2 = dataset2[:,1]
print(DataFrame2)

## Plotting The Data

1. Now we plot the data.

In [None]:
fig = plt.figure()
ax = fig.subplots()

ax.plot(X1, Y1, lw=2)
ax.set_xlabel("YEAR")
ax.set_ylabel("MORTALITY RATE PER 100,000")
ax.set_title("Previous Decade")

In [None]:
fig = plt.figure()
ax = fig.subplots()

ax.plot(X2, Y2, lw=2)
ax.set_xlabel("YEAR")
ax.set_ylabel("MORTALITY RATE PER 100,000")
ax.set_title("Next Decade")

2. Now we convert the data into the frequency domain using the Fast Fourier Transform (FFT).

In [None]:
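# axis=0 transforms each column down its rows (over time). The default
# axis=-1 would instead transform across the two columns of every row.
previous_fft = np.fft.fft(previous, axis=0).real
next_fft = np.fft.fft(next, axis=0).real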
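
How `np.fft.fft` treats a two-dimensional array depends on its `axis` argument, which is easy to overlook. The sketch below uses a small made-up table (the values in `toy` are illustrative, not taken from the project data) to show the difference between transforming each column and transforming each row. Note also that keeping only `.real` discards the imaginary part of the spectrum, and with it the phase information.

```python
import numpy as np

# Toy stand-in for a (year, rate) table: 4 rows, 2 columns. Values are made up.
toy = np.array([[2011.0, 18.0],
                [2012.0, 17.5],
                [2013.0, 17.0],
                [2014.0, 16.5]])

along_columns = np.fft.fft(toy, axis=0)  # one length-4 transform per column
along_rows = np.fft.fft(toy)             # default axis=-1: one length-2 transform per row

# The zero-frequency (DC) term is the sum of the transformed samples.
print(along_columns[0].real)  # column sums
print(along_rows[:, 0].real)  # row sums, i.e. year + rate
```

With only two columns, the row-wise transform reduces to the sum and the difference of the two values in each row, which is not a spectrum over time.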

3. We plot the frequencies and observe the trends.

In [None]:
fig, ax = plt.subplots(figsize=(13, 5), dpi=60, facecolor='w', edgecolor='r')
size = len(previous_fft)
# previous_fft is already real-valued, so no further .real is needed.
ax.plot(range(size), previous_fft[:, 0], '-', color='green', linewidth=3)
ax.plot(range(size), previous_fft[:, 1], '-', color='brown', linewidth=3)

In [None]:
fig, ax = plt.subplots(figsize=(13, 5), dpi=60, facecolor='w', edgecolor='r')
size = len(next_fft)
# next_fft is already real-valued, so no further .real is needed.
ax.plot(range(size), next_fft[:, 0], '-', color='green', linewidth=3)
ax.plot(range(size), next_fft[:, 1], '-', color='brown', linewidth=3)

## Converting The Data Into A Machine-Readable Format

1. In order to train the model, we need to scale the data into a range the network handles well, in this case values between 0 and 1.

In [None]:
def scaleData(data):
    scaler = MinMaxScaler(feature_range=(0, 1))
    return scaler.fit_transform(data)


In [None]:
previous_scaled = scaleData(previous_fft)
next_scaled = scaleData(next_fft)
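
Min-max scaling maps every column linearly onto the range 0 to 1 using `(x - min) / (max - min)`. The sketch below reproduces that formula in plain NumPy to make the step concrete; the function name `min_max_scale` and the values in `toy` are illustrative assumptions, and the notebook itself keeps using scikit-learn's `MinMaxScaler`.

```python
import numpy as np


def min_max_scale(data):
    """Scale each column of a 2-D array linearly into [0, 1]."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Constant columns have zero range; divide by 1 so they map to 0.
    col_range[col_range == 0] = 1.0
    return (data - col_min) / col_range


toy = np.array([[10.0, 200.0],
                [20.0, 400.0],
                [30.0, 300.0]])
print(min_max_scale(toy))
```

Each column is scaled independently, so the smallest value in a column becomes 0 and the largest becomes 1 regardless of the column's original units.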

In [None]:
previous_scaled = previous_scaled.T
next_scaled = next_scaled.T

2. Now we reshape the data into 2 rows and 10 columns, each row representing the frequency spectrum of one column of the originally loaded data file.

In [None]:
# reshape returns a new array, so the result has to be assigned.
previous_scaled = previous_scaled.reshape(2, 10)
next_scaled = next_scaled.reshape(2, 10)
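
A point worth keeping in mind here is that `ndarray.reshape` returns a new array object and leaves the shape of the original untouched. The sketch below shows this on a made-up array; all names in it are illustrative.

```python
import numpy as np

scaled = np.arange(20, dtype=float).reshape(10, 2)  # 10 rows, 2 columns
transposed = scaled.T                               # 2 rows, 10 columns

transposed.reshape(5, 4)             # result discarded, `transposed` is unchanged
reshaped = transposed.reshape(5, 4)  # result kept under a new name

print(transposed.shape, reshaped.shape)
```

Transposing a 10 by 2 array already yields 2 rows and 10 columns, so in this notebook the reshape mainly serves as a check on the expected shape.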


3. Now we prepare the training data by appending a label “1” to the rows of the previous decade and a label “0” to the rows of the next decade, and then we combine the two data sets.

In [None]:
label_previous = np.repeat(1,2)
label_previous.shape = (2,1)
label_next = np.repeat(0,2)
label_next.shape = (2,1)

train_previous = np.hstack((previous_scaled,label_previous))
train_next = np.hstack((next_scaled,label_next))
train_both = np.vstack((train_previous,train_next))
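
The stacking step can be checked on toy data. In this sketch, `np.hstack` appends the label as an extra column and `np.vstack` places the two labelled blocks on top of each other. The arrays `previous_toy` and `next_toy` are made-up stand-ins for the scaled spectra, and the label values follow the code cell above.

```python
import numpy as np

# Toy stand-ins: 2 rows of 10 scaled values per decade. Values are made up.
previous_toy = np.full((2, 10), 0.25)
next_toy = np.full((2, 10), 0.75)

label_previous = np.ones((2, 1))   # class 1 for the previous decade
label_next = np.zeros((2, 1))      # class 0 for the next decade

train_previous = np.hstack((previous_toy, label_previous))  # shape (2, 11)
train_next = np.hstack((next_toy, label_next))              # shape (2, 11)
train_both = np.vstack((train_previous, train_next))        # shape (4, 11)

print(train_both.shape)
print(train_both[:, -1])  # the label column
```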

4. Just to check that the data has been successfully converted, we display it as tables.

In [None]:
pd.DataFrame(train_previous)

In [None]:
pd.DataFrame(train_next)

In [None]:
pd.DataFrame(train_both)

## Creating The Model

Now that we have our data in a machine-readable form, we can create our model.

1. Firstly, we separate the input features from the labels using array slicing.

In [None]:
rate = train_both[:, 0:-1]   # the 10 scaled values in each row
labels = train_both[:, 10]   # the class label in the last column
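
The slicing can be verified on a small matrix with the same layout. This is a sketch on random toy data; the names `toy`, `toy_features` and `toy_labels` are illustrative only.

```python
import numpy as np

# Toy training matrix: 4 rows, 10 feature columns plus 1 label column.
rng = np.random.default_rng(0)
toy = np.hstack((rng.random((4, 10)),
                 np.array([[1.0], [1.0], [0.0], [0.0]])))

toy_features = toy[:, :-1]  # every column except the last
toy_labels = toy[:, -1]     # the last column only

print(toy_features.shape, toy_labels.shape)
```

Indexing the label column as `-1` rather than `10` keeps the split correct if the number of feature columns ever changes.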

2. Define a callback that records the loss after each batch.

In [None]:
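class LossHistory(Callback):
    """Record and print the training loss after every batch."""

    def on_train_begin(self, logs=None):
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        sys.stdout.write(str(logs.get('loss')) + ', ')
        sys.stdout.flush()
        self.losses.append(logs.get('loss'))


# Despite its name, this object holds the loss history, not a learning rate.
learning_rate = LossHistory()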

3. Now we can create our model.

In [None]:

model = Sequential()
model.add(Dense(500, input_shape=(10, ), activation='relu'))
model.add(Dense(1000, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

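def train(data, label):
    # The model is always fitted on the global rate/labels arrays; the arguments
    # are used only as validation data, so train(rate, labels) validates on the
    # training set itself.
    model.fit(rate, labels, epochs=50, batch_size=20, verbose=0, shuffle=True, validation_data=(data, label), callbacks=[learning_rate])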


def score(data):
    return model.predict(data)
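
The output layer and the loss chosen above can be illustrated numerically. The sigmoid activation turns a raw score into a probability, and binary cross-entropy penalises that probability according to the true label. This is a plain NumPy sketch of the standard formulas; the function names and the clipping constant `eps` are assumptions for illustration and are not Keras internals.

```python
import numpy as np


def sigmoid(z):
    """Map raw scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps keeps log() away from 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(y_prob)
                           + (1.0 - y_true) * np.log(1.0 - y_prob))))


probs = sigmoid([0.0, 2.0, -2.0])
print(probs)
print(binary_crossentropy([1, 1, 0], probs))
```

A prediction of 0.5 for a positive example costs ln 2, about 0.693, and the loss approaches 0 as the predicted probability approaches the true label.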


## Training The Model

1. Now we can train our model.

In [None]:
train(rate, labels)

2. See the model summary report.

In [None]:
# summary() prints the report itself and returns None, so no print() is needed.
model.summary()

3. Test our predictions.

In [None]:
score(previous_scaled)

In [None]:
score(next_scaled)

4. Evaluate our model and print accuracy.

In [None]:
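# evaluate returns [loss, accuracy] because the model was compiled with
# metrics=['accuracy']; the features are stored in `rate`.
loss, accuracy = model.evaluate(rate, labels)
print(accuracy)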
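
For a binary classifier with a sigmoid output, accuracy is the fraction of samples whose predicted probability falls on the correct side of a threshold. The sketch below assumes a threshold of 0.5; the function name `binary_accuracy` and the toy values are illustrative.

```python
import numpy as np


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    y_true = np.asarray(y_true)
    predicted = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(predicted == y_true))


labels_toy = [1, 1, 0, 0]
probs_toy = [0.9, 0.4, 0.2, 0.6]
print(binary_accuracy(labels_toy, probs_toy))
```

Because the notebook evaluates on the same four rows it trained on, the reported accuracy describes how well the model fits the training data rather than how well it generalises.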

5. Save our model.

In [None]:
model.save('road_accidents.h5')

6. Plot the losses.

In [None]:
fig, ax = plt.subplots(figsize=(13, 5), dpi=60, facecolor='w', edgecolor='k')
size = len(learning_rate.losses)
ax.plot(range(size), learning_rate.losses, '-', color='blue', linewidth=1)