## Learn the Number Plates of a Fleet of Company Vehicles

### Kenya Number Plate Extraction from Strings

According to the Wikipedia article at https://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Kenya,
vehicle registration plates in Kenya currently use black lettering on a white plate. The format is LLL NNNL, where L signifies a letter and N signifies a digit.

For the purposes of this exercise, we will assume that we are only dealing with non-special plates (excluding special number plates for the Kenyan Government, Army, Air Force, Navy, Diplomats, Motorcycles, Tricycles and NGOs).

Answer the following questions:

### Question 1

Write a function that takes in a string (sentence) and extracts a Kenyan vehicle number plate.


In [2]:
import re
import pandas as pd
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
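def num_plate(stringa):
    # Match three capital letters, a whitespace character, three digits and a final capital letter (LLL NNNL)
    return re.findall(r'[A-Z]{3}\s[0-9]{3}[A-Z]', stringa)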

num_plate('She has 3 cars with the following number plates: KBL 878K, KBT 667J, KML 700P')

['KBL 878K', 'KBT 667J', 'KML 700P']
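
The pattern above also matches inside longer runs of capitals and digits; for example, it would pick 'KBL 878K' out of 'XKBL 878KZ'. A stricter variant is sketched below. It assumes that the non-special plates in this exercise always start with K, as every example in this notebook does, and that a plate should stand alone as a word. The name `num_plate_strict` is hypothetical.

```python
import re

# Assumption: ordinary plates start with 'K' and are delimited by word boundaries.
STRICT_PLATE = re.compile(r'\bK[A-Z]{2}\s[0-9]{3}[A-Z]\b')

def num_plate_strict(sentence):
    """Return every substring of `sentence` that looks like a standalone LLL NNNL plate."""
    return STRICT_PLATE.findall(sentence)

# 'XKBL 878KZ' is skipped because the plate-like part is embedded in a longer token
print(num_plate_strict('Plates: KBL 878K, XKBL 878KZ and KML 700P.'))
# ['KBL 878K', 'KML 700P']
```

Whether embedded matches should be rejected depends on how noisy the input sentences are, so this is a design choice rather than a correction.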

### Question 2

Write a Python function that takes in two Kenyan number plates and calculates how many
cars have been bought in between the two number plates.

In [3]:
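def bought_cars(stringa, stringb):
    # Plate layout is 'LLL NNNL'; the first letter is ignored in the count
    c_1 = stringa[1]      # second letter
    c_2 = stringa[2]      # third letter
    c_3 = stringa[4:7]    # three-digit number
    c_4 = stringa[7]      # final letter
    d_1 = stringb[1]
    d_2 = stringb[2]
    d_3 = stringb[4:7]
    d_4 = stringb[7]
    # Position of each plate in the issuing sequence (A=0 ... Z=25):
    # 999 plates per final letter, 25974 per third letter, 675324 per second letter
    num_one = 675324*(ord(c_1)-65) + 25974*(ord(c_2)-65) + 999*(ord(c_4)-65) + int(c_3)
    num_two = 675324*(ord(d_1)-65) + 25974*(ord(d_2)-65) + 999*(ord(d_4)-65) + int(d_3)
    return abs(num_one - num_two)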
     
bought_cars('KAA 679T', 'KAA 888T')

209
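
The constants in `bought_cars` follow from counting how many plates each position spans. The sketch below derives them, assuming numbers run from 001 to 999 and every letter position uses all 26 letters (the real scheme may skip some letters or series, which this sketch ignores). The names `plate_position` and `cars_between` are illustrative only.

```python
NUMBERS_PER_LETTER = 999                       # 001..999 for one final letter
PER_THIRD_LETTER = NUMBERS_PER_LETTER * 26     # 25974: all final letters A..Z
PER_SECOND_LETTER = PER_THIRD_LETTER * 26      # 675324: all third letters A..Z

def plate_position(plate):
    """Map a plate such as 'KAA 679T' to its position in the assumed issuing order."""
    second, third, last = (ord(plate[i]) - ord('A') for i in (1, 2, 7))
    return (PER_SECOND_LETTER * second + PER_THIRD_LETTER * third
            + NUMBERS_PER_LETTER * last + int(plate[4:7]))

def cars_between(plate_a, plate_b):
    """Absolute difference between the positions of two plates."""
    return abs(plate_position(plate_a) - plate_position(plate_b))

print(PER_THIRD_LETTER, PER_SECOND_LETTER)   # 25974 675324
print(cars_between('KAA 679T', 'KAA 888T'))  # 209
print(cars_between('KAA 999Z', 'KAB 001A'))  # 1 (roll-over to the next series)
```

The roll-over check is a quick way to see that the three constants are mutually consistent: the last plate of one series and the first plate of the next are exactly one apart.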

### Question 3
Given the attached data (number plates on randomly selected KBS buses), implement a
recurrent neural network in Python to predict how many buses will be added to the fleet next
year. Explain the results.

We use pandas to read the fleet.csv dataset. We concatenate the two columns (Number Plate, Fleet Number) with a single space in between, matching the official format of the Kenyan number plate system.

In [4]:
fleet = pd.read_csv('fleet.csv')
fleet["final_plate"] = fleet["Number Plate"] + " " + fleet["Fleet Number"]
fleet.head()

| Unnamed: 0 | Number Plate | Fleet Number | final_plate |
|---|---|---|---|
| 0 | KBW | 548P | KBW 548P |
| 1 | KBU | 282P | KBU 282P |
| 2 | KBQ | 844U | KBQ 844U |
| 3 | KBX | 535W | KBX 535W |
| 4 | KBU | 838W | KBU 838W |


Extract the new column (final_plate) into a list called final_plate. The list contains 60 number plates.


In [5]:
final_plate = fleet['final_plate'].tolist()
len(final_plate)

60

We will frame the problem as a collection of 'one word' input to 'one word' output pairs, where each plate in the list is paired with the plate that follows it. We will also define an LSTM network with 32 units, followed by a dense output layer with a softmax activation function (one output per class) for making predictions. The model is then fit for 500 epochs with a batch size of 1.

In [6]:
char_to_int = dict((c, i) for i, c in enumerate(final_plate))
int_to_char = dict((i, c) for i, c in enumerate(final_plate))
seq_length = 1
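
Because `char_to_int` is keyed by plate, a plate that appeared twice in `final_plate` would keep only its last index, while `int_to_char` would still hold an entry for every position. One way to avoid that, assuming duplicates are possible in the data (the notebook does not say), is to build both mappings from the sorted set of unique plates. The toy list `plates` and the names `plate_to_int` and `int_to_plate` are made up for illustration.

```python
# Toy data with one repeated plate (hypothetical, not taken from fleet.csv)
plates = ['KBW 548P', 'KBU 282P', 'KBW 548P', 'KBQ 844U']

# Build the vocabulary from unique plates so both mappings stay consistent
vocab = sorted(set(plates))
plate_to_int = {plate: i for i, plate in enumerate(vocab)}
int_to_plate = {i: plate for plate, i in plate_to_int.items()}

# Encode the original sequence, repeats included
encoded = [plate_to_int[p] for p in plates]
print(encoded)   # [2, 1, 2, 0]
```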


In [7]:
dataX = []
dataY = []
for i in range(0, len(final_plate) - seq_length, 1):
    seq_in = final_plate[i:i + seq_length]
    seq_out = final_plate[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print (seq_in, '->', seq_out)

['KBW 548P'] -> KBU 282P
['KBU 282P'] -> KBQ 844U
['KBQ 844U'] -> KBX 535W
['KBX 535W'] -> KBU 838W
['KBU 838W'] -> KBK 364J
['KBK 364J'] -> KCC 768D
['KCC 768D'] -> KBK 221U
['KBK 221U'] -> KAZ 644L
['KAZ 644L'] -> KBX 956R
['KBX 956R'] -> KBT 973R
['KBT 973R'] -> KBP 267A
['KBP 267A'] -> KAY 618P
['KAY 618P'] -> KAX 693P
['KAX 693P'] -> KBP 326Y
['KBP 326Y'] -> KCB 399Q
['KCB 399Q'] -> KAV 625X
['KAV 625X'] -> KBA 047M
['KBA 047M'] -> KBS 980D
['KBS 980D'] -> KBR 143P
['KBR 143P'] -> KBW 437P
['KBW 437P'] -> KBX 449E
['KBX 449E'] -> KBW 905K
['KBW 905K'] -> KCF 586U
['KCF 586U'] -> KAW 624G
['KAW 624G'] -> KBW 510N
['KBW 510N'] -> KCE 322M
['KCE 322M'] -> KBT 411F
['KBT 411F'] -> KBP 189K
['KBP 189K'] -> KBA 383A
['KBA 383A'] -> KBT 980J
['KBT 980J'] -> KBR 061B
['KBR 061B'] -> KBL 038D
['KBL 038D'] -> KBE 347V
['KBE 347V'] -> KCA 590D
['KCA 590D'] -> KAT 575Q
['KAT 575Q'] -> KBY 872L
['KBY 872L'] -> KBZ 980G
['KBZ 980G'] -> KBA 184N
['KBA 184N'] -> KBH 980C
['KBH 980C'] -> KBC 036Y


In [8]:
X = numpy.reshape(dataX, (len(dataX), 1, seq_length))
X = X / float(len(final_plate))

In [9]:
Y = np_utils.to_categorical(dataY)
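
To make the two preparation steps concrete, here is a small plain-Python sketch of what they produce for a toy set of four classes: the integer inputs are scaled by the number of plates, and each integer target becomes a one-hot row, which is what `to_categorical` returns as a NumPy array. The helper `one_hot` and the toy values are illustrative and not part of the notebook.

```python
def one_hot(labels, num_classes=None):
    """Plain-Python stand-in for one-hot encoding integer class labels."""
    if num_classes is None:
        num_classes = max(labels) + 1
    return [[1.0 if j == label else 0.0 for j in range(num_classes)]
            for label in labels]

n_plates = 4            # toy vocabulary size (the notebook has 60 plates)
inputs = [0, 1, 2]      # integer-encoded input plates
targets = [1, 2, 3]     # integer-encoded next plates

# Nested lists mimic the (samples, time steps, features) = (3, 1, 1) layout
scaled = [[[i / float(n_plates)]] for i in inputs]
print(scaled)            # [[[0.0]], [[0.25]], [[0.5]]]
print(one_hot(targets))  # one row per target, with a single 1.0 in the target's column
```

In recent Keras releases `to_categorical` can be imported directly from `keras.utils`; whether the `np_utils` module is still available depends on the installed version.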

After model fitting, we summarize performance results on the whole training set. We also re-run the training dataset on the network so as to get an overview of how well the network performed in generating predictions.

In [10]:
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(Y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=500, batch_size=1, verbose=2)

Epoch 1/500
7s - loss: 4.1116 - acc: 0.0000e+00
Epoch 2/500
0s - loss: 4.0965 - acc: 0.0169
Epoch 3/500
0s - loss: 4.0924 - acc: 0.0169
Epoch 4/500
0s - loss: 4.0882 - acc: 0.0169
Epoch 5/500
0s - loss: 4.0838 - acc: 0.0000e+00
Epoch 6/500
0s - loss: 4.0787 - acc: 0.0000e+00
Epoch 7/500
0s - loss: 4.0725 - acc: 0.0169
Epoch 8/500
0s - loss: 4.0651 - acc: 0.0169
Epoch 9/500
0s - loss: 4.0567 - acc: 0.0339
Epoch 10/500
0s - loss: 4.0468 - acc: 0.0000e+00
Epoch 11/500
0s - loss: 4.0340 - acc: 0.0169
Epoch 12/500
0s - loss: 4.0200 - acc: 0.0339
Epoch 13/500
0s - loss: 4.0048 - acc: 0.0339
Epoch 14/500
0s - loss: 3.9852 - acc: 0.0339
Epoch 15/500
0s - loss: 3.9653 - acc: 0.0339
Epoch 16/500
0s - loss: 3.9432 - acc: 0.0339
Epoch 17/500
0s - loss: 3.9191 - acc: 0.0508
Epoch 18/500
0s - loss: 3.8943 - acc: 0.0339
Epoch 19/500
0s - loss: 3.8666 - acc: 0.0339
Epoch 20/500
0s - loss: 3.8395 - acc: 0.0169
Epoch 21/500
0s - loss: 3.8111 - acc: 0.0169
Epoch 22/500
0s - loss: 3.7828 - acc: 0.0169
...

<keras.callbacks.History at 0x2e6e1158e10>

In [11]:
scores = model.evaluate(X, Y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))

Model Accuracy: 64.41%
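
As a rough cross-check, with 60 plates and `seq_length = 1` the pairing loop should yield 59 input/output pairs, so each pair is worth about 1.69 percentage points, which matches the 0.0169 steps visible in the training log. The sketch below uses only figures reported in this notebook to recover how many pairs the model reproduced.

```python
n_plates = 60                      # len(final_plate) reported above
seq_length = 1
n_pairs = n_plates - seq_length    # 59 training pairs

reported_accuracy = 0.6441         # "Model Accuracy: 64.41%"
n_correct = round(reported_accuracy * n_pairs)

print(n_pairs, n_correct)                        # 59 38
print("%.2f%%" % (100.0 * n_correct / n_pairs))  # 64.41%
```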


Model predictions on the training dataset.
Several predictions simply repeat the input plate or the previous output, and at a training accuracy of 64.41% the model fails to reproduce roughly a third of the pairs it was trained on.

In [13]:
for pattern in dataX:
    x = numpy.reshape(pattern, (1, 1, len(pattern)))
    x = x / float(len(final_plate))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print (seq_in, "->", result)


['KBW 548P'] -> KBU 282P
['KBU 282P'] -> KBU 282P
['KBQ 844U'] -> KBU 282P
['KBX 535W'] -> KBU 838W
['KBU 838W'] -> KBU 838W
['KBK 364J'] -> KCC 768D
['KCC 768D'] -> KBK 221U
['KBK 221U'] -> KAZ 644L
['KAZ 644L'] -> KBX 956R
['KBX 956R'] -> KBT 973R
['KBT 973R'] -> KBP 267A
['KBP 267A'] -> KBP 267A
['KAY 618P'] -> KAX 693P
['KAX 693P'] -> KBP 326Y
['KBP 326Y'] -> KCB 399Q
['KCB 399Q'] -> KAV 625X
['KAV 625X'] -> KBA 047M
['KBA 047M'] -> KBA 047M
['KBS 980D'] -> KBR 143P
['KBR 143P'] -> KBW 437P
['KBW 437P'] -> KBW 437P
['KBX 449E'] -> KCF 586U
['KBW 905K'] -> KCF 586U
['KCF 586U'] -> KAW 624G
['KAW 624G'] -> KBW 510N
['KBW 510N'] -> KCE 322M
['KCE 322M'] -> KBT 411F
['KBT 411F'] -> KBP 189K
['KBP 189K'] -> KBP 189K
['KBA 383A'] -> KBT 980J
['KBT 980J'] -> KBT 980J
['KBR 061B'] -> KBL 038D
['KBL 038D'] -> KBE 347V
['KBE 347V'] -> KBE 347V
['KCA 590D'] -> KAT 575Q
['KAT 575Q'] -> KBY 872L
['KBY 872L'] -> KBZ 980G
['KBZ 980G'] -> KBZ 980G
['KBA 184N'] -> KBC 036Y
['KBH 980C'] -> KBV 686E
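
Each prediction in the loop above is a softmax vector with one probability per class; `argmax` picks the most likely class and `int_to_char` maps it back to a plate. A toy version in plain Python, with made-up probabilities and a three-plate vocabulary (the names `decode` and `int_to_plate` are hypothetical):

```python
int_to_plate = {0: 'KBW 548P', 1: 'KBU 282P', 2: 'KBQ 844U'}

def decode(probabilities, index_to_label):
    """Return the label with the highest probability, plus that probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return index_to_label[best], probabilities[best]

# Hypothetical softmax output for a single input plate
plate, confidence = decode([0.2, 0.7, 0.1], int_to_plate)
print(plate, confidence)   # KBU 282P 0.7
```

Printing the winning probability alongside the plate is one way to see how confident the network is in each prediction, which the accuracy figure alone does not show.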


Next, we will use our model to predict the successors of two randomly chosen training plates, which we treat as two buses belonging to next year's fleet. Using these two predicted plates, we can estimate the number of vehicles bought next year.

In [15]:
for i in range(0,2):
    pattern_index = numpy.random.randint(len(dataX))
    pattern = dataX[pattern_index]
    x = numpy.reshape(pattern, (1, 1, len(pattern)))
    x = x / float(len(final_plate))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print (seq_in, "->", result)

['KBL 980K'] -> KBR 826S
['KBA 047M'] -> KBA 047M


We use the function bought_cars to calculate the number of vehicles bought next year.

In [16]:
bought_cars('KBR 826S', 'KBA 047M')

448331