# Predicting 'GPS Speed' of Truck 2

#### Imports and global variables are defined here.

In [1]:
# Imports required for this notebook.
import csv
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy
import scipy.stats as stats
import seaborn as sns

# Local path to the CSV file containing the data for truck one (1).
truckOnePath = "../data/trucks/truck1.csv"

# Local path to the CSV file containing the data for truck two (2).
truckTwoPath = "../data/trucks/truck2.csv"

#### Functions are defined here.

In [2]:
def readCsv(truck, records = None, headerIdx = 0):
    """Read a CSV file with pandas's read_csv method and return it as a DataFrame.
    By default, this reads all rows and takes the header row from index 0."""
    return pd.read_csv(truck, nrows = records, header = [headerIdx])

def correctGps(truckDf, meanDiff):
    """Take a DataFrame for a truck with inaccurate GPS data and the mean
    (GPS speed - wheel-based speed) difference from a different truck with
    accurate data, and add a 'CorrectedGpsSpeed' column that predicts the GPS speed."""
    # The addition is vectorized over the whole column, so no loop is needed.
    # Note that the new column is added to truckDf in place.
    truckDf['CorrectedGpsSpeed'] = truckDf['WheelBasedVehicleSpeed'] + meanDiff

    return truckDf
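
As a quick sanity check, the correction can be tried on a small hand-made frame. This is only a sketch: the values are made up, and `correctGpsCopy` is a hypothetical variant that returns a new DataFrame through `DataFrame.assign` instead of modifying its input.

```python
import pandas as pd

def correctGpsCopy(truckDf, meanDiff):
    # Hypothetical non-mutating variant: returns a new DataFrame with the extra column.
    return truckDf.assign(CorrectedGpsSpeed = truckDf['WheelBasedVehicleSpeed'] + meanDiff)

# Made-up values, for illustration only.
toyDf = pd.DataFrame({'WheelBasedVehicleSpeed': [10.0, 20.0, 30.0]})
toyCorrected = correctGpsCopy(toyDf, -0.5)

print(toyCorrected['CorrectedGpsSpeed'].tolist())
print('CorrectedGpsSpeed' in toyDf.columns)
```

Returning a copy keeps the original frame unchanged, which can make it easier to re-run cells in any order.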

#### Predicting GPS Speed for Truck 2

Based on our previous statistics, it is clear that the GPS Speed for Truck 2 is incorrect. Perhaps the component was broken, or maybe it was set to a different unit. Either way, it is not consistent with the Wheel-Based Vehicle Speed (which appears to be working correctly) and it is not consistent with the trends found in Truck 1, where both components were measuring relatively similar speeds at any given time.

In an attempt to rectify this, we will use data from Truck 1 to predict the GPS Speed values for Truck 2 analytically from its Wheel-Based Vehicle Speed values.

In [3]:
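# Read in the data for Truck 1 and fill missing values:
# forward-fill first, then back-fill any gaps left at the start.
truckOneDf = readCsv(truckOnePath)
truckOneDf = truckOneDf.ffill().bfill()

# Read in the data for Truck 2 and fill missing values the same way.
truckTwoDf = readCsv(truckTwoPath)
truckTwoDf = truckTwoDf.ffill().bfill()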
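
Both frames are cleaned with `ffill().bfill()`. As a small illustration of what that chain does, the sketch below applies it to a made-up column (the values are invented, not taken from the truck data) with gaps at the start, in the middle, and at the end.

```python
import numpy as np
import pandas as pd

# Made-up column with missing values at the start, middle, and end.
toy = pd.DataFrame({'GPS speed': [np.nan, 6.1, np.nan, np.nan, 6.4, np.nan]})

# ffill carries the last seen value forward; bfill then fills the leading gap.
filled = toy.ffill().bfill()

print(filled['GPS speed'].tolist())
```

After the two passes no missing values remain, as long as the column has at least one real reading.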

To decide how to correct Truck 2's GPS Speed, we need to find the difference between the two components for Truck 1.

In [4]:
# Subtract the values of Wheel-Based Vehicle Speed from GPS Speed for Truck 1.
truckOneDiff = truckOneDf['GPS speed'].sub(truckOneDf['WheelBasedVehicleSpeed'])

# Calculate the mean of the per-row differences.
truckOneDiffMean = truckOneDiff.mean()

# Print the average difference of both components for Truck 1.
print(truckOneDiffMean)

-0.28539807628549124


Notice that the value is a negative number. Because the difference is computed as GPS Speed minus Wheel-Based Vehicle Speed, this indicates that the GPS Speed component is, on average, reading a slightly lower value than the Wheel-Based Vehicle Speed component. Therefore, to begin predicting, or "correcting", the GPS Speed for Truck 2, we can start by simply adding that (negative) average to any given Wheel-Based Vehicle Speed value for Truck 2.
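
The arithmetic behind the correction can be checked by hand for a single reading. The sketch below uses the mean printed above and one example Wheel-Based Vehicle Speed value from Truck 2; the variable names other than `truckOneDiffMean` are illustrative.

```python
# Mean of (GPS speed - WheelBasedVehicleSpeed) for Truck 1, as printed above.
truckOneDiffMean = -0.28539807628549124

# The sign of the mean difference tells us which component reads lower on average.
gpsReadsLower = truckOneDiffMean < 0

# An example Wheel-Based Vehicle Speed reading from Truck 2.
wheelSpeed = 106.886719

# Adding the mean difference shifts the wheel-based speed by that offset.
corrected = wheelSpeed + truckOneDiffMean

print(gpsReadsLower, round(corrected, 6))
```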

In [8]:
correctedTruckTwoDf = correctGps(truckTwoDf, truckOneDiffMean)
correctedTruckTwoDf[['GPS speed', 'WheelBasedVehicleSpeed', 'CorrectedGpsSpeed']].head(20)

| | GPS speed | WheelBasedVehicleSpeed | CorrectedGpsSpeed |
|---|---|---|---|
| 0 | 29.323334 | 106.886719 | 106.601321 |
| 1 | 29.323334 | 107.101562 | 106.816164 |
| 2 | 29.323334 | 107.089844 | 106.804446 |
| 3 | 29.323334 | 106.921875 | 106.636477 |
| 4 | 29.323334 | 107.109375 | 106.823977 |
| 5 | 29.323334 | 107.238281 | 106.952883 |
| 6 | 29.323334 | 107.300781 | 107.015383 |
| 7 | 29.323334 | 107.171875 | 106.886477 |
| 8 | 29.323334 | 107.316406 | 107.031008 |
| 9 | 29.323334 | 107.316406 | 107.031008 |


To see how well this worked, let's take a look at the first few records for Truck 1.

In [9]:
truckOneDf[['GPS speed', 'WheelBasedVehicleSpeed']].head(20)

| | GPS speed | WheelBasedVehicleSpeed |
|---|---|---|
| 0 | 6.1116 | 7.96875 |
| 1 | 6.1116 | 8.042969 |
| 2 | 6.1116 | 8.042969 |
| 3 | 6.1116 | 8.105469 |
| 4 | 6.1116 | 8.105469 |
| 5 | 6.1116 | 8.125 |
| 6 | 6.1116 | 8.125 |
| 7 | 6.1116 | 8.128906 |
| 8 | 6.1116 | 8.128906 |
| 9 | 6.1116 | 8.148438 |


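This doesn't look too bad: the corrected values for Truck 2 sit slightly below its Wheel-Based Vehicle Speed, which is the same direction we see in Truck 1. To check this more rigorously, we can test whether the mean difference between the two components differs significantly between the two trucks.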
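
For a rough sense of scale before testing, the gap in the ten Truck 1 rows shown above can be compared with the overall mean offset. The sketch below copies the values from the displayed output; `rowDiffs` and `sampleMeanDiff` are illustrative names.

```python
# Values copied from the ten Truck 1 rows displayed above.
gpsSpeed = 6.1116
wheelSpeeds = [7.96875, 8.042969, 8.042969, 8.105469, 8.105469,
               8.125, 8.125, 8.128906, 8.128906, 8.148438]

# Per-row difference, GPS speed minus wheel-based speed.
rowDiffs = [gpsSpeed - w for w in wheelSpeeds]
sampleMeanDiff = sum(rowDiffs) / len(rowDiffs)

# Overall Truck 1 mean difference, as printed earlier in the notebook.
truckOneDiffMean = -0.28539807628549124

print(round(sampleMeanDiff, 4), round(truckOneDiffMean, 4))
```

In these ten rows the gap is close to 2, several times larger than the overall mean of about 0.29, so a single constant offset will not reproduce every row. That is one reason a formal test is a useful next step.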

#### Verification Through Hypothesis Testing