Big G Express: Predicting Derates

In this project, you will work with fault code data and vehicle onboard diagnostic data to try to predict an upcoming full derate. A full derate is indicated by SPN 5246.

You have been provided with two files containing the data you will use to make these predictions (J1939Faults.csv and VehicleDiagnosticOnboardData.csv), as well as two files describing some of the contents (DataInfo.docx and Service Fault Codes_1_0_0_167.xlsx).

Note that in its raw form the data does not have "labels", so you must define what labels you are going to use and create those labels in your dataset. Also, you will likely need to perform some significant feature engineering in order to build an accurate predictor.
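One possible way to define labels is sketched below on toy data: a fault record is marked positive if a full derate (SPN 5246) is reported for the same truck within a look-ahead window. The helper name `label_upcoming_derate`, the 2-hour window, and the toy rows are all assumptions made for illustration; the brief does not prescribe a window.

```python
import pandas as pd

def label_upcoming_derate(faults, window="2h", derate_spn=5246):
    """Flag rows followed by a full derate on the same truck within `window`.

    Hypothetical helper: the window length is an assumption, not part of the brief.
    """
    faults = faults.sort_values("EventTimeStamp").reset_index(drop=True)
    derates = (
        faults.loc[faults["spn"] == derate_spn, ["EquipmentID", "EventTimeStamp"]]
        .rename(columns={"EventTimeStamp": "next_derate"})
    )
    # For each row, look up the first derate strictly after it on the same truck.
    merged = pd.merge_asof(
        faults, derates,
        left_on="EventTimeStamp", right_on="next_derate",
        by="EquipmentID", direction="forward",
        allow_exact_matches=False,
    )
    gap = merged["next_derate"] - merged["EventTimeStamp"]
    # Rows with no later derate have a missing gap, which compares as False.
    merged["target"] = (gap <= pd.Timedelta(window)).astype(int)
    return merged

# Toy data: truck 1439 has a derate at 11:00, truck 1369 has none.
toy = pd.DataFrame({
    "EquipmentID": [1439, 1439, 1439, 1369],
    "spn": [111, 5246, 629, 111],
    "EventTimeStamp": pd.to_datetime([
        "2015-02-21 10:00", "2015-02-21 11:00",
        "2015-02-21 12:00", "2015-02-21 10:30",
    ]),
})
labeled = label_upcoming_derate(toy)
print(labeled[["EquipmentID", "spn", "EventTimeStamp", "target"]])
```

Only the 10:00 record on truck 1439 is labeled positive here, since it is the only row with a derate in the following two hours. The derate row itself is not labeled positive, which keeps the target from leaking into the features.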

There are service locations at (36.0666667, -86.4347222), (35.5883333, -86.4438888), and (36.1950, -83.174722), so you should remove any records in the vicinity of these locations, as fault codes may be tripped when working on the vehicles.
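A minimal sketch of one way to drop records near the service locations, using the haversine great-circle distance. The 1 km radius and the function names are assumptions; the brief only says "in the vicinity", so the radius should be tuned against the data.

```python
import numpy as np
import pandas as pd

SERVICE_LOCATIONS = [
    (36.0666667, -86.4347222),
    (35.5883333, -86.4438888),
    (36.1950, -83.174722),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (Earth radius taken as 6371 km)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def drop_near_service(df, radius_km=1.0):
    """Remove rows within `radius_km` of any service location (radius is an assumption)."""
    near = np.zeros(len(df), dtype=bool)
    for lat, lon in SERVICE_LOCATIONS:
        dist = haversine_km(df["Latitude"], df["Longitude"], lat, lon)
        near |= (dist <= radius_km).to_numpy()
    return df.loc[~near]

# Toy data: the first row sits exactly on a service location, the second does not.
toy = pd.DataFrame({
    "Latitude": [36.0666667, 38.857638],
    "Longitude": [-86.4347222, -84.626851],
})
kept = drop_near_service(toy)
```

Rows with a missing latitude or longitude are kept by this sketch, because a comparison against NaN evaluates to False.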

When evaluating the performance of your model, assume that the cost associated with a missed full derate is approximately $4000 in towing and repairs, and the cost of a false positive prediction is about $500 due to having the truck off the road and serviced unnecessarily.
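The cost figures above can be turned into a simple evaluation metric. The sketch below counts false negatives and false positives and weights them by the stated costs; the function name `total_cost` and the toy labels are assumptions for illustration.

```python
import numpy as np

COST_MISSED_DERATE = 4000  # false negative: towing and repairs
COST_FALSE_ALARM = 500     # false positive: unnecessary service

def total_cost(y_true, y_pred):
    """Total dollar cost of a set of binary predictions under the stated costs."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    false_negatives = int(np.sum(y_true & ~y_pred))
    false_positives = int(np.sum(~y_true & y_pred))
    return COST_MISSED_DERATE * false_negatives + COST_FALSE_ALARM * false_positives

# One missed derate and two false alarms: 4000 + 2 * 500
print(total_cost([1, 1, 0, 0, 0], [1, 0, 1, 1, 0]))  # 5000
```

Because a missed derate costs eight times as much as a false alarm, a model tuned to minimise this cost will generally accept more false positives than one tuned for plain accuracy.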

QCVehDiagOnboardData –
 
Id – the record Id
Name – the name of the diagnostic
Value – the value for that diagnostic
FaultId – foreign key to the QCJ1939Fault record

 
These are the engine data parameters that are sent with the engine faults.
 
QCJ1939Fault –
 
ESS_Id – the event subscriber service event that contained the fault
EventTimeStamp – when the event took place
eventDescription – brief text of meaning of the code (not always present)
actionDescription – never seen this filled in
ecuSoftwareVersion – version string from the reporting vehicle computer system
ecuSerialNumber – Serial number of the reporting Engine Control Module (ECM)
ecuModel – Model of the reporting ECM
ecuMake – Manufacturer of the reporting ECM
ecuSource –
spn – Fault code being reported
fmi – Failure Mode associated with the Fault Code
active – whether the code is being set or being removed
activeTransitionCount – Number of times code has been set/unset
faultValue – never seen used
EquipmentID – Assigned truck number of the unit in question
MCTNumber – Communications Terminal assigned to the truck
Latitude – Latitude at time of event
Longitude – Longitude at time of event
LocationTimeStamp – Time latitude and longitude were obtained
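Since FaultId is described as a foreign key to the fault record, the two tables can be joined once the diagnostics are in one-row-per-fault form. The sketch below uses toy frames and assumes that FaultId refers to the RecordID column of the faults file; that mapping is an assumption and should be checked against DataInfo.docx.

```python
import pandas as pd

# Toy stand-ins for the faults table and the wide (one row per fault) diagnostics.
faults_toy = pd.DataFrame({"RecordID": [1, 2, 3], "spn": [111, 629, 1807]})
diag_wide_toy = pd.DataFrame({"FaultId": [1, 3], "EngineLoad": [11.0, 30.0]})

combined = faults_toy.merge(
    diag_wide_toy,
    how="left",               # keep faults that have no diagnostics
    left_on="RecordID",
    right_on="FaultId",       # assumed to reference RecordID
    validate="one_to_one",    # raise if either key is duplicated
)
print(combined)
```

A left join keeps every fault record, with NaN in the diagnostic columns where nothing was reported.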


In [1]:
import pandas as pd
import numpy as np

In [2]:
faults = pd.read_csv("../data/J1939Faults.csv", nrows = 100)
faults.head()

Unnamed: 0,RecordID,ESS_Id,EventTimeStamp,eventDescription,actionDescription,ecuSoftwareVersion,ecuSerialNumber,ecuModel,ecuMake,ecuSource,spn,fmi,active,activeTransitionCount,faultValue,EquipmentID,MCTNumber,Latitude,Longitude,LocationTimeStamp
0,1,990349,2015-02-21 10:47:13.000,Low (Severity Low) Engine Coolant Level,,unknown,unknown,unknown,unknown,0,111,17,True,2,,1439,105354361,38.857638,-84.626851,2015-02-21 11:34:25.000
1,2,990360,2015-02-21 11:34:34.000,,,unknown,unknown,unknown,unknown,11,629,12,True,127,,1439,105354361,38.857638,-84.626851,2015-02-21 11:35:10.000
2,3,990364,2015-02-21 11:35:31.000,Incorrect Data Steering Wheel Angle,,unknown,unknown,unknown,unknown,11,1807,2,False,127,,1369,105336226,41.42125,-87.767361,2015-02-21 11:35:26.000
3,4,990370,2015-02-21 11:35:33.000,Incorrect Data Steering Wheel Angle,,unknown,unknown,unknown,unknown,11,1807,2,True,127,,1369,105336226,41.421018,-87.767361,2015-02-21 11:36:08.000
4,5,990416,2015-02-21 11:39:41.000,,,22281684P01*22357957P01*22362082P01*,13063430,0USA13_13_0415_2238A,VOLVO,0,4364,17,False,2,,1674,105427130,38.416481,-89.442638,2015-02-21 11:39:37.000


In [3]:
diagnostics = pd.read_csv("../data/VehicleDiagnosticOnboardData.csv")

diagnostics.head(25)

Unnamed: 0,Id,Name,Value,FaultId
0,1,IgnStatus,False,1
1,2,EngineOilPressure,0,1
2,3,EngineOilTemperature,96.74375,1
3,4,TurboBoostPressure,0,1
4,5,EngineLoad,11,1
5,6,AcceleratorPedal,0,1
6,7,IntakeManifoldTemperature,78.8,1
7,8,FuelRate,0,1
8,9,FuelLtd,12300.907429328,1
9,10,EngineRpm,0,1


In [4]:
diagnostics.dtypes

Id          int64
Name       object
Value      object
FaultId     int64
dtype: object

Pivot diagnostics table to wide format so each FaultId is one row (and fill empty values with NaN)
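As a small illustration of the reshaping on toy data (the frame and its values are made up): each distinct Name becomes a column, and any (FaultId, Name) combination that was never reported comes out as NaN.

```python
import pandas as pd

long_toy = pd.DataFrame({
    "FaultId": [1, 1, 2],
    "Name": ["EngineLoad", "Speed", "EngineLoad"],
    "Value": ["11", "0", "30"],
})

# pivot raises a ValueError if a (FaultId, Name) pair occurs twice, so check first.
has_duplicates = long_toy.duplicated(subset=["FaultId", "Name"]).any()

wide_toy = long_toy.pivot(index="FaultId", columns="Name", values="Value")
print(wide_toy)
```

FaultId 2 has no Speed reading, so that cell is NaN in the wide table.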

In [5]:
# List the distinct diagnostic names; each one becomes a column after pivoting
diagnostics['Name'].unique()

array(['IgnStatus', 'EngineOilPressure', 'EngineOilTemperature',
       'TurboBoostPressure', 'EngineLoad', 'AcceleratorPedal',
       'IntakeManifoldTemperature', 'FuelRate', 'FuelLtd', 'EngineRpm',
       'LampStatus', 'BarometricPressure', 'FuelLevel', 'Speed',
       'EngineTimeLtd', 'CruiseControlSetSpeed', 'CruiseControlActive',
       'EngineCoolantTemperature', 'ParkingBrake',
       'SwitchedBatteryVoltage', 'DistanceLtd', 'Throttle',
       'FuelTemperature', 'ServiceDistance'], dtype=object)

In [6]:
# Scalar arguments give flat column labels, so no droplevel() call is needed
diagnostics_pvt = diagnostics.pivot(index='FaultId', columns='Name', values='Value')

In [7]:
diagnostics_pvt = diagnostics_pvt.rename_axis(None, axis=1)

In [8]:
diagnostics_pvt= diagnostics_pvt.reset_index()

In [9]:
pd.options.display.max_columns = None

In [10]:
diagnostics_pvt.head(10)

Unnamed: 0,FaultId,AcceleratorPedal,BarometricPressure,CruiseControlActive,CruiseControlSetSpeed,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLevel,FuelLtd,FuelRate,FuelTemperature,IgnStatus,IntakeManifoldTemperature,LampStatus,ParkingBrake,ServiceDistance,Speed,SwitchedBatteryVoltage,Throttle,TurboBoostPressure
0,1,0.0,14.21,False,66.48672,423178.7,100.4,11.0,0.0,96.74375,0.0,1632.2,43.2,12300.907429328,0.0,,False,78.8,1023,True,,0.0,3276.75,,0.0
1,2,,,,,,,,,,,,,,,,True,,1279,,,,,,
2,3,,,,,,,,,,,,,,,,,,1279,,,,,,
3,4,,,,,,,,,,,,,,,,True,,1279,,,,,,
4,5,,,,,,,,,,,,,,,,,,16639,,,,,,
5,6,48.0,14.4275,False,64.6226,470381.4,181.4,30.0,38.28,196.5313,1514.5,9480.0,44.0,70349.809963756,4.583399,,True,111.2,1023,,,13.6022,3276.75,,6.67
6,7,82.8,14.2825,False,64.6226,278736.7,188.6,80.0,39.44,210.0313,1711.375,6292.2,64.8,40961.065436834,14.29175,,True,78.8,1023,,,41.53478,3276.75,,20.59
7,8,,,,,,,,,,,,,,,,True,,1023,,,,,,
8,9,,,,,,,,,,,,,,,,True,,1023,,,,,,
9,10,,,,,,,,,,,,,,,,,,1023,,,,,,


In [11]:
# Inspect the column; note that .unique needs parentheses (.unique()) to list distinct values
diagnostics_pvt['Throttle']

0           NaN
1           NaN
2           NaN
3           NaN
4           NaN
           ... 
1187330     NaN
1187331    73.2
1187332     100
1187333     100
1187334     NaN
Name: Throttle, Length: 1187335, dtype: object

In [12]:
print(diagnostics.shape)
print(diagnostics_pvt.shape)

(12821626, 4)
(1187335, 25)


In [13]:
diagnostics.dtypes

Id          int64
Name       object
Value      object
FaultId     int64
dtype: object

In [14]:
diagnostics_pvt.dtypes

FaultId                       int64
AcceleratorPedal             object
BarometricPressure           object
CruiseControlActive          object
CruiseControlSetSpeed        object
DistanceLtd                  object
EngineCoolantTemperature     object
EngineLoad                   object
EngineOilPressure            object
EngineOilTemperature         object
EngineRpm                    object
EngineTimeLtd                object
FuelLevel                    object
FuelLtd                      object
FuelRate                     object
FuelTemperature              object
IgnStatus                    object
IntakeManifoldTemperature    object
LampStatus                   object
ParkingBrake                 object
ServiceDistance              object
Speed                        object
SwitchedBatteryVoltage       object
Throttle                     object
TurboBoostPressure           object
dtype: object

In [15]:
diagnostics_pvt.columns

Index(['FaultId', 'AcceleratorPedal', 'BarometricPressure',
       'CruiseControlActive', 'CruiseControlSetSpeed', 'DistanceLtd',
       'EngineCoolantTemperature', 'EngineLoad', 'EngineOilPressure',
       'EngineOilTemperature', 'EngineRpm', 'EngineTimeLtd', 'FuelLevel',
       'FuelLtd', 'FuelRate', 'FuelTemperature', 'IgnStatus',
       'IntakeManifoldTemperature', 'LampStatus', 'ParkingBrake',
       'ServiceDistance', 'Speed', 'SwitchedBatteryVoltage', 'Throttle',
       'TurboBoostPressure'],
      dtype='object')

In [16]:
# Target dtypes for the pivoted columns.
# EngineLoad is cast to float rather than int because the column contains NaN,
# and a NumPy int column cannot hold missing values.
convert_dict = {'FaultId': object, 'AcceleratorPedal':float, 'BarometricPressure':float,
       'CruiseControlActive':bool, 'CruiseControlSetSpeed':float, 'DistanceLtd':float,
       'EngineCoolantTemperature':float, 'EngineLoad':float, 'EngineOilPressure':float,
       'EngineOilTemperature':float, 'EngineRpm':float, 'EngineTimeLtd':float, 'FuelLevel':float,
       'FuelLtd':float, 'FuelRate':float, 'FuelTemperature':float, 'IgnStatus':bool,
       'IntakeManifoldTemperature':float, 'LampStatus':object, 'ParkingBrake':bool,
       'ServiceDistance':float, 'Speed':float, 'SwitchedBatteryVoltage':float, 'Throttle':float,
       'TurboBoostPressure':float
                }
 



In [18]:
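# Assign the result back; astype returns a new DataFrame rather than converting in place
diagnostics_pvt = diagnostics_pvt.astype(convert_dict)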

ValueError: could not convert string to float: '4,8': Error while type casting for column 'AcceleratorPedal'
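The error shows that some values use a comma as the decimal separator ('4,8'). One possible way around this, sketched on toy data, is to normalise the separator and parse with `pd.to_numeric`. The helper names `to_float` and `to_bool` are hypothetical, and the sketch assumes a comma is always a decimal separator and never a thousands separator. The boolean helper is included because `astype(bool)` treats any non-empty string, including 'False', as True.

```python
import pandas as pd

def to_float(series):
    """Parse values such as '4,8' or '73.2' as floats; unparseable values become NaN.

    Assumes a comma is a decimal separator, never a thousands separator.
    """
    cleaned = series.map(lambda v: v.replace(",", ".") if isinstance(v, str) else v)
    return pd.to_numeric(cleaned, errors="coerce")

def to_bool(series):
    """Map 'True'/'False' strings (or real booleans) to booleans; anything else becomes None."""
    lookup = {"True": True, "False": False, True: True, False: False}
    return series.map(lambda v: lookup.get(v))

# Toy columns mixing comma decimals, plain numbers, and missing values.
toy_numeric = pd.Series(["4,8", "73.2", 100, None], dtype=object)
toy_flags = pd.Series(["True", "False", None], dtype=object)

parsed = to_float(toy_numeric)
flags = to_bool(toy_flags)
print(parsed)
print(flags)
```

With `errors="coerce"`, genuinely malformed entries are silently turned into NaN, so it is worth counting the missing values before and after parsing to see how many were affected.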

In [None]:
diagnostics_pvt.dtypes