# Vehicle Prediction Test

Given a corpus of metal-content readings and controlled testing environments, build a classification engine that can then be used to predict the type of vehicle travelling on top of that metal. Assume a single metal strip and that all vehicles are travelling at around 15 km/hr. Assume that input files are provided, one set each for (a) a bus, (b) a car, and (c) a motorbike, with each category having distinct average metal readings.
 
For example, the corpus has the following readings for various time instances, where consecutive time instances are separated by 1 second. The first column is the time, the second is the ampere rating, and the third is the vehicle type.
 
## Input data sample for one metal reading: 

* 1, 0.0, -
* 2,-0.05, -
* 3,0.1, -
* 4, 0.5, Motorbike
* 5, 0.4, Motorbike
* 6, 0.0, -
* 7, 1.5, Bus
* 8, 1.55, Bus
* 9, 1.29, Bus
* 10, 1.62, Bus
* 11, -0.09, -
* 12, -0.01, -
* 13, 5.5, Car
* 14, 5.66, Car
* 15, 4.58, Car
* 16, 0.3, -
* 17, 0.2, -
 
### Question 3.1: Build a classification engine that, given a set of readings, correctly classifies the category they belong to. Note that, as per the data above, different vehicles induce readings of differing lengths on the metal.
 
### Question 3.2: Demonstrate how that classification engine can be used for prediction given unclassified data.
 
### Question 3.3: How would you change the model building approach if the problem setting is such that there are multiple metal detectors, each covering a width of about 1 foot? Note that while a motorcycle has a width of < 1 foot, a bus or a car has a width of at least 3 feet and so would span multiple metal detectors.
 
### Question 3.4: How would your predictive model (built in Question 2) perform when there are 1000s of vehicles moving at 60 km/hr every second on a national highway?

In [45]:
### IMPORTS

%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

Loaded the available data into a .csv file for convenience

In [46]:
## Labelled the columns with appropriate headings and converted it into a dataframe


columns = ['Time(s)', 'Ampere Rating(A)', 'Vehicle Type']
df = pd.read_csv("C:/Users/Lohith/Desktop/The Data Team DS Interview/file.csv", names=columns, usecols=range(3))
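The CSV path above is local to the author's machine. As a minimal sketch for reproducing the notebook without that file, the 17 sample rows from the problem statement can be inlined as text. The names `SAMPLE_CSV` and `sample_df` are hypothetical, and `skipinitialspace=True` is an assumption added here so that labels load as `Bus` rather than ` Bus`.

```python
import io

import pandas as pd

# The 17 sample rows from the problem statement, inlined as CSV text.
SAMPLE_CSV = """1, 0.0, -
2,-0.05, -
3,0.1, -
4, 0.5, Motorbike
5, 0.4, Motorbike
6, 0.0, -
7, 1.5, Bus
8, 1.55, Bus
9, 1.29, Bus
10, 1.62, Bus
11, -0.09, -
12, -0.01, -
13, 5.5, Car
14, 5.66, Car
15, 4.58, Car
16, 0.3, -
17, 0.2, -
"""

columns = ['Time(s)', 'Ampere Rating(A)', 'Vehicle Type']

# skipinitialspace strips the blank that follows each comma in the sample.
sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV), names=columns,
                        skipinitialspace=True)

print(sample_df.shape)  # (17, 3)
```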

In [47]:
df

Unnamed: 0,Time(s),Ampere Rating(A),Vehicle Type
0,1,0.0,-
1,2,-0.05,-
2,3,0.1,-
3,4,0.5,Motorbike
4,5,0.4,Motorbike
5,6,0.0,-
6,7,1.5,Bus
7,8,1.55,Bus
8,9,1.29,Bus
9,10,1.62,Bus


In [48]:
inp = ['Time(s)', 'Ampere Rating(A)']

X = df[inp]
y = df['Vehicle Type']

In [49]:
### I'm using a label encoder for the target variable to keep the output in the form of numbers.

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

In [50]:
y_enc = le.fit_transform(y)

In [51]:
le.classes_

array([' -', ' Bus', ' Car', ' Motorbike'], dtype=object)

In [52]:
y_enc

array([0, 0, 0, 3, 3, 0, 1, 1, 1, 1, 0, 0, 2, 2, 2, 0, 0], dtype=int64)

In [53]:
df['V_Type'] = y_enc

In [54]:
df

Unnamed: 0,Time(s),Ampere Rating(A),Vehicle Type,V_Type
0,1,0.0,-,0
1,2,-0.05,-,0
2,3,0.1,-,0
3,4,0.5,Motorbike,3
4,5,0.4,Motorbike,3
5,6,0.0,-,0
6,7,1.5,Bus,1
7,8,1.55,Bus,1
8,9,1.29,Bus,1
9,10,1.62,Bus,1


## Vehicle Type Notation:

* 0 - N/A
* 1 - Bus
* 2 - Car
* 3 - Motorbike
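The numbering above follows from how `LabelEncoder` works: it sorts the distinct labels and uses each label's position as its code. A small sketch of that behaviour, using a hypothetical `labels` list with whitespace-stripped names (the notebook's own classes carry a leading space):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical label list, in arbitrary order.
labels = ["-", "Motorbike", "Bus", "Car", "-"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ is sorted, so the index of each class is its integer code.
mapping = {str(cls): code for code, cls in enumerate(encoder.classes_)}
print(mapping)  # {'-': 0, 'Bus': 1, 'Car': 2, 'Motorbike': 3}

# inverse_transform turns integer predictions back into readable labels.
decoded = [str(name) for name in encoder.inverse_transform([1, 2, 3])]
print(decoded)  # ['Bus', 'Car', 'Motorbike']
```

Applying `inverse_transform` to a model's predictions avoids having to keep this table in sync by hand.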

In [55]:
X = df[inp]
y_new = df['V_Type']

In [56]:
### Trying out a Gaussian Naive Bayes classifier


from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X,y_new)

GaussianNB(priors=None)
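The model above treats every one-second reading as an independent sample and uses the time index as a feature. Since Question 3.1 notes that vehicles induce readings of differing lengths, one alternative is to first group consecutive readings into events and describe each event by its duration and mean reading. The sketch below is an assumption-laden illustration and not the approach used in this notebook: the `THRESHOLD` of 0.35 A is a hypothetical cut-off chosen only because it separates the labelled background readings from the vehicle readings in the sample.

```python
import pandas as pd

amps = [0.0, -0.05, 0.1, 0.5, 0.4, 0.0, 1.5, 1.55, 1.29, 1.62,
        -0.09, -0.01, 5.5, 5.66, 4.58, 0.3, 0.2]
readings = pd.DataFrame({"time_s": range(1, 18), "amps": amps})

THRESHOLD = 0.35  # assumed cut-off between background noise and a vehicle

active = readings["amps"] > THRESHOLD

# A new run starts whenever the active flag changes value.
run_id = (active != active.shift()).cumsum().rename("run_id")

# Keep only the active rows and summarise each contiguous run.
events = (
    readings[active]
    .groupby(run_id[active])
    .agg(start_s=("time_s", "first"),
         duration_s=("time_s", "count"),
         mean_amps=("amps", "mean"))
    .reset_index(drop=True)
)
print(events)
```

On the sample this yields three events lasting 2, 4 and 3 seconds, with mean readings of 0.45, 1.49 and about 5.25. A classifier could then be trained on one row per event instead of one row per second.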

In [57]:
### Using random input values to get an idea of how the outputs might show up.
### Note: y_test holds the model's own predictions on X_test, not ground-truth labels.

rng = np.random.RandomState(0)
X_test = [-1, 6] + [14, 18] * rng.rand(2000, 2)
y_test = model.predict(X_test)

In [58]:
y_test

array([2, 2, 2, ..., 2, 2, 2], dtype=int64)
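To see why every prediction comes out as class 2 (Car), it helps to look at the range the random points cover. `rng.rand` draws from [0, 1), so after scaling and shifting the time column lies in [-1, 13) and the ampere column in [6, 24). A quick check, regenerating the same points:

```python
import numpy as np

rng = np.random.RandomState(0)
X_test = [-1, 6] + [14, 18] * rng.rand(2000, 2)

print(X_test.shape)        # (2000, 2)
print(X_test.min(axis=0))  # per-column minimum: time, ampere
print(X_test.max(axis=0))  # per-column maximum: time, ampere
```

The largest ampere reading in the training sample is 5.66, so every synthetic point sits above the whole training range, closest to the Car readings. These points therefore say little about how the model separates the other three classes.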

In [59]:
## ML Models 

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier as knn
from sklearn.naive_bayes import GaussianNB as GB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

In [60]:
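def Classification_model(model, test_x, test_y):
    # Fit on the labelled training data (globals X and y_new),
    # then score the predictions against the supplied test set.
    model.fit(X, y_new.values.ravel())

    pred = model.predict(test_x)

    accuracy = accuracy_score(test_y, pred)
    return accuracy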

In [61]:
### Checking the accuracy with all the features

models=["RandomForestClassifier","Gaussian Naive Bayes","KNN","Logistic_Regression","Support_Vector"]
Classification_models = [RandomForestClassifier(n_estimators=100),GB(),knn(n_neighbors=7),LogisticRegression(),SVC()]
Model_Accuracy = []
for model in Classification_models:
    Accuracy=Classification_model(model,X_test,y_test)
    Model_Accuracy.append(Accuracy*100)

In [62]:
print("Model_Accuracy:")
print (Model_Accuracy)

Model_Accuracy:
[11.4, 100.0, 0.70000000000000007, 100.0, 0.14999999999999999]


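* Gaussian Naive Bayes and Logistic Regression show 100% accuracy, which is very likely misleading. `y_test` was itself produced by a Gaussian Naive Bayes model's predictions on `X_test`, so Naive Bayes agrees with itself by construction, and Logistic Regression simply predicted the same labels.

* The Support Vector Classifier shows 0.15% accuracy (the values in `Model_Accuracy` are already percentages). Like the other scores, this only indicates how often it agrees with the Naive Bayes predictions, so it says little about which model is actually correct.

* Also keep in mind that the training set is very small (only 17 rows), which limits how reliable any accuracy estimate can be.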
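One way to get an accuracy figure against the true labels, despite having only 17 rows, is leave-one-out cross-validation: each row is predicted by a model trained on the other 16. The sketch below is an illustration rather than the notebook's method, and it assumes that only the ampere reading is used as a feature, since the time index of a training row carries no information about vehicle type on new data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.naive_bayes import GaussianNB

# Ampere readings and true labels from the sample data.
amps = np.array([0.0, -0.05, 0.1, 0.5, 0.4, 0.0, 1.5, 1.55, 1.29, 1.62,
                 -0.09, -0.01, 5.5, 5.66, 4.58, 0.3, 0.2]).reshape(-1, 1)
labels = np.array(["-", "-", "-", "Motorbike", "Motorbike", "-",
                   "Bus", "Bus", "Bus", "Bus", "-", "-",
                   "Car", "Car", "Car", "-", "-"])

# Each prediction comes from a model that never saw that row.
pred = cross_val_predict(GaussianNB(), amps, labels, cv=LeaveOneOut())

loo_accuracy = accuracy_score(labels, pred)
print("Leave-one-out accuracy: {:.1%}".format(loo_accuracy))
```

With so few rows per class the estimate is still noisy, but it at least compares predictions with labels that did not come from a model.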