In [1]:
import pandas as pd
import numpy as np

pd.set_option("display.max_rows", 500)
pd.set_option("display.max_columns", 500)

Read in the J1939 Faults (Cummins' Connected Diagnostics file)

In [2]:
j1939faults = pd.read_csv('data/J1939Faults.csv', low_memory = False)
j1939faults.head()


Unnamed: 0,RecordID,ESS_Id,EventTimeStamp,eventDescription,actionDescription,ecuSoftwareVersion,ecuSerialNumber,ecuModel,ecuMake,ecuSource,spn,fmi,active,activeTransitionCount,faultValue,EquipmentID,MCTNumber,Latitude,Longitude,LocationTimeStamp
0,1,990349,2015-02-21 10:47:13.000,Low (Severity Low) Engine Coolant Level,,unknown,unknown,unknown,unknown,0,111,17,True,2,,1439,105354361,38.857638,-84.626851,2015-02-21 11:34:25.000
1,2,990360,2015-02-21 11:34:34.000,,,unknown,unknown,unknown,unknown,11,629,12,True,127,,1439,105354361,38.857638,-84.626851,2015-02-21 11:35:10.000
2,3,990364,2015-02-21 11:35:31.000,Incorrect Data Steering Wheel Angle,,unknown,unknown,unknown,unknown,11,1807,2,False,127,,1369,105336226,41.42125,-87.767361,2015-02-21 11:35:26.000
3,4,990370,2015-02-21 11:35:33.000,Incorrect Data Steering Wheel Angle,,unknown,unknown,unknown,unknown,11,1807,2,True,127,,1369,105336226,41.421018,-87.767361,2015-02-21 11:36:08.000
4,5,990416,2015-02-21 11:39:41.000,,,22281684P01*22357957P01*22362082P01*,13063430,0USA13_13_0415_2238A,VOLVO,0,4364,17,False,2,,1674,105427130,38.416481,-89.442638,2015-02-21 11:39:37.000


In [3]:
j1939faults.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1187335 entries, 0 to 1187334
Data columns (total 20 columns):
 #   Column                 Non-Null Count    Dtype  
---  ------                 --------------    -----  
 0   RecordID               1187335 non-null  int64  
 1   ESS_Id                 1187335 non-null  int64  
 2   EventTimeStamp         1187335 non-null  object 
 3   eventDescription       1126490 non-null  object 
 4   actionDescription      0 non-null        float64
 5   ecuSoftwareVersion     891285 non-null   object 
 6   ecuSerialNumber        844318 non-null   object 
 7   ecuModel               1122577 non-null  object 
 8   ecuMake                1122577 non-null  object 
 9   ecuSource              1187335 non-null  int64  
 10  spn                    1187335 non-null  int64  
 11  fmi                    1187335 non-null  int64  
 12  active                 1187335 non-null  bool   
 13  activeTransitionCount  1187335 non-null  int64  
 14  faultValue        

In [4]:
j1939faults.ecuSerialNumber.nunique()

1989

In [5]:
j1939faults.ecuSource.unique()

array([ 0, 11, 49, 61,  3], dtype=int64)

In [6]:
j1939faults.ecuMake.unique()

array(['unknown', 'VOLVO', 'CMMNS', '?????', 'PCAR', nan, '?CAR', '?MMNS',
       '???R', '?????MX', '??MNS', 'BNDWS', 'PACCR', '?ACCR', '????S',
       '?NDWS', '????R', 'EATON', '?????MX16U13D13', '?ATON', '??DWS',
       '???CR', '5516014'], dtype=object)

In [7]:
#drop these columns because they are all NaN (columns= is used, so axis=1 is not needed)
j1939faults = j1939faults.drop(columns=['actionDescription', 'faultValue'])
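
Rather than hard-coding the two column names, the all-NaN columns could also be found programmatically. The sketch below uses a small made-up frame (only the column names come from the file; the values are illustrative) and is one possible way to do it, not what the notebook ran:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for j1939faults; the values are illustrative only.
toy = pd.DataFrame({
    "spn": [111, 629, 1807],
    "actionDescription": [np.nan, np.nan, np.nan],
    "faultValue": [np.nan, np.nan, np.nan],
})

# Names of the columns in which every value is missing.
empty_cols = toy.columns[toy.isna().all()].tolist()

# Drop them; toy.dropna(axis=1, how="all") would give the same result.
cleaned = toy.drop(columns=empty_cols)
```

Listing `empty_cols` before dropping makes it easy to confirm that only the expected columns are removed.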

In [8]:
#convert to lowercase 
j1939faults.columns=j1939faults.columns.str.lower()

In [9]:
j1939faults.head()

Unnamed: 0,recordid,ess_id,eventtimestamp,eventdescription,ecusoftwareversion,ecuserialnumber,ecumodel,ecumake,ecusource,spn,fmi,active,activetransitioncount,equipmentid,mctnumber,latitude,longitude,locationtimestamp
0,1,990349,2015-02-21 10:47:13.000,Low (Severity Low) Engine Coolant Level,unknown,unknown,unknown,unknown,0,111,17,True,2,1439,105354361,38.857638,-84.626851,2015-02-21 11:34:25.000
1,2,990360,2015-02-21 11:34:34.000,,unknown,unknown,unknown,unknown,11,629,12,True,127,1439,105354361,38.857638,-84.626851,2015-02-21 11:35:10.000
2,3,990364,2015-02-21 11:35:31.000,Incorrect Data Steering Wheel Angle,unknown,unknown,unknown,unknown,11,1807,2,False,127,1369,105336226,41.42125,-87.767361,2015-02-21 11:35:26.000
3,4,990370,2015-02-21 11:35:33.000,Incorrect Data Steering Wheel Angle,unknown,unknown,unknown,unknown,11,1807,2,True,127,1369,105336226,41.421018,-87.767361,2015-02-21 11:36:08.000
4,5,990416,2015-02-21 11:39:41.000,,22281684P01*22357957P01*22362082P01*,13063430,0USA13_13_0415_2238A,VOLVO,0,4364,17,False,2,1674,105427130,38.416481,-89.442638,2015-02-21 11:39:37.000


In [10]:
#convert dates/times to datetime format
j1939faults[['eventtimestamp', 'locationtimestamp']]=j1939faults[["eventtimestamp", "locationtimestamp"]].apply(pd.to_datetime, format='%Y-%m-%d %H:%M:%S.%f')

In [11]:
#convert the ID, code, and description columns to string; "active" (boolean), ecumake, latitude, and longitude are left unchanged
str_cols = ['recordid', 'ess_id', 'eventdescription', 'ecusoftwareversion', 'ecuserialnumber', 'ecumodel', 'ecusource', 'spn', 
            'fmi', 'activetransitioncount', 'equipmentid', 'mctnumber']
j1939faults[str_cols] = j1939faults[str_cols].astype("str")
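
One side effect of this conversion: in the pandas version used here, `astype("str")` appears to turn missing values into the literal string `'nan'`. That would explain why `eventdescription` goes from 1126490 non-null values in the first `info()` output to 1187335 in the next one. If missing values should stay missing, a conversion along these lines is one option. The helper name `to_str_keep_na` is hypothetical and the sample values are only illustrative:

```python
import numpy as np
import pandas as pd


def to_str_keep_na(series: pd.Series) -> pd.Series:
    """Convert non-missing values to str and leave missing values untouched."""
    return series.map(lambda v: v if pd.isna(v) else str(v))


descr = pd.Series(["Low (Severity Low) Engine Coolant Level", np.nan])
spn = pd.Series([111, 629])

descr_str = to_str_keep_na(descr)  # the NaN stays a real missing value
spn_str = to_str_keep_na(spn)      # integers become digit strings
```

With this approach `isna()` and `dropna()` keep working on the description column after the conversion.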

In [12]:
j1939faults.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1187335 entries, 0 to 1187334
Data columns (total 18 columns):
 #   Column                 Non-Null Count    Dtype         
---  ------                 --------------    -----         
 0   recordid               1187335 non-null  object        
 1   ess_id                 1187335 non-null  object        
 2   eventtimestamp         1187335 non-null  datetime64[ns]
 3   eventdescription       1187335 non-null  object        
 4   ecusoftwareversion     1187335 non-null  object        
 5   ecuserialnumber        1187335 non-null  object        
 6   ecumodel               1187335 non-null  object        
 7   ecumake                1122577 non-null  object        
 8   ecusource              1187335 non-null  object        
 9   spn                    1187335 non-null  object        
 10  fmi                    1187335 non-null  object        
 11  active                 1187335 non-null  bool          
 12  activetransitioncount  11873

In [13]:
#finally, rename columns that need a _
j1939faults = j1939faults.rename(columns = {'recordid':'record_id', 'eventtimestamp':'event_timestamp', 'eventdescription':'event_descr', 'ecusoftwareversion':'ecu_software', 'ecuserialnumber': 'ecu_serial', 
                                            'ecumodel':'ecu_model', 'ecumake':'ecu_make', 'ecusource':'ecu_source', 'activetransitioncount':'active_trans_count', 'equipmentid':'equipment_id', 'mctnumber':'mct_number',
                                           'locationtimestamp':'location_timestamp'})

In [14]:
j1939faults.tail(3)

Unnamed: 0,record_id,ess_id,event_timestamp,event_descr,ecu_software,ecu_serial,ecu_model,ecu_make,ecu_source,spn,fmi,active,active_trans_count,equipment_id,mct_number,latitude,longitude,location_timestamp
1187332,1248456,123905996,2020-03-06 14:13:38,Abnormal Rate of Change Aftertreatment 1 Intak...,05317106*05100987*050719120655*09401585*G1*BDR*,79880653.0,6X1u13D1500000000,CMMNS,0,3216,10,True,1,1850,105336308,34.43037,-84.920509,2020-03-06 14:14:14
1187333,1248457,123906113,2020-03-06 14:14:13,Low (Severity Medium) Engine Coolant Level,04384413*22544852*090619141107*60701756*G1*BGT*,,,,0,111,18,True,8,2377,108605700,35.030925,-85.321527,2020-03-06 14:14:49
1187334,1248458,123906131,2020-03-06 14:15:34,Low (Severity Medium) Engine Coolant Level,04384413*22544852*090619141107*60701756*G1*BGT*,,,,0,111,18,False,8,2377,108605700,35.027314,-85.323472,2020-03-06 14:15:30


Read in the Vehicle Diagnostics Onboard Data file

In [15]:
veh_diag = pd.read_csv('data/VehicleDiagnosticOnboardData.csv')
veh_diag.head()

Unnamed: 0,Id,Name,Value,FaultId
0,1,IgnStatus,False,1
1,2,EngineOilPressure,0,1
2,3,EngineOilTemperature,96.74375,1
3,4,TurboBoostPressure,0,1
4,5,EngineLoad,11,1


In [16]:
veh_diag.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12821626 entries, 0 to 12821625
Data columns (total 4 columns):
 #   Column   Dtype 
---  ------   ----- 
 0   Id       int64 
 1   Name     object
 2   Value    object
 3   FaultId  int64 
dtypes: int64(2), object(2)
memory usage: 391.3+ MB


In [17]:
veh_diag.Name.value_counts()

LampStatus                   1187335
IgnStatus                     608454
EngineRpm                     586921
IntakeManifoldTemperature     586291
EngineOilPressure             586244
EngineCoolantTemperature      586071
BarometricPressure            585976
DistanceLtd                   585819
EngineLoad                    585621
FuelRate                      585237
FuelLtd                       585195
Speed                         583916
EngineOilTemperature          583912
TurboBoostPressure            583351
EngineTimeLtd                 581366
CruiseControlSetSpeed         576458
CruiseControlActive           574916
AcceleratorPedal              531889
FuelLevel                     502795
Throttle                      420503
ParkingBrake                  399972
FuelTemperature               299110
SwitchedBatteryVoltage        114059
ServiceDistance                  215
Name: Name, dtype: int64

In [18]:
diagnostics = veh_diag.pivot(index = 'FaultId', columns='Name', values='Value').reset_index()
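
The pivot turns the long table (one row per reading) into a wide one (one row per `FaultId`, one column per reading name). A minimal sketch on made-up rows shows the reshaping; it also adds a duplicate check, since `pivot` raises a `ValueError` when the same (`FaultId`, `Name`) pair occurs more than once. The toy values are assumptions for illustration:

```python
import pandas as pd

# Long format: one row per (FaultId, Name) reading; values are illustrative.
long_df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Name": ["IgnStatus", "EngineLoad", "IgnStatus", "LampStatus"],
    "Value": ["False", "11", "True", "1279"],
    "FaultId": [1, 1, 2, 2],
})

# Guard: pivot fails if any (FaultId, Name) pair is duplicated.
has_duplicates = long_df.duplicated(subset=["FaultId", "Name"]).any()

# Wide format: one row per FaultId; readings absent for a fault become NaN.
wide_df = long_df.pivot(index="FaultId", columns="Name", values="Value").reset_index()
```

Readings that a fault never reported show up as NaN in the wide table, which is why many columns in the real output are mostly empty.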

In [19]:
diagnostics.tail()

Name,FaultId,AcceleratorPedal,BarometricPressure,CruiseControlActive,CruiseControlSetSpeed,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLevel,FuelLtd,FuelRate,FuelTemperature,IgnStatus,IntakeManifoldTemperature,LampStatus,ParkingBrake,ServiceDistance,Speed,SwitchedBatteryVoltage,Throttle,TurboBoostPressure
1187330,1248454,,,,,,,,,,,,,,,,,,1023,,,,,,
1187331,1248455,100.0,14.5,True,64.6226,423937.9,185.0,51.0,37.12,211.4937,1310.25,10722.7,96.4,58979.184415546,7.647805,32.0,True,98.6,18431,False,,65.01096,,73.2,7.83
1187332,1248456,0.0,14.355,True,66.48672,465925.4,186.8,62.0,41.18,212.8438,1340.75,9326.75,100.0,65080.10587046,8.995086,,True,91.4,17407,,,66.5741,,100.0,6.96
1187333,1248457,1.6,14.4275,False,67.72946,28606.65625,181.4,0.0,27.26,221.7312,863.25,586.75,23.6,4042.49282573,0.0,,True,100.4,1023,False,,11.84489,14.1,100.0,1.74
1187334,1248458,,,,,,,,,,,,,,,,,,1023,,,,,,


In [106]:
#this raised an error ("no numeric data to plot") because the pivoted columns have object dtype
#diagnostics.DistanceLtd.plot(kind='hist')
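
The histogram attempt fails because `Value` is read as `object`, so every pivoted column holds strings rather than numbers. Converting the measurement columns with `pd.to_numeric` before plotting is one way around this. The sketch below uses two toy columns with values copied from the output above; which columns to convert in the real frame is left as an assumption:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the pivoted diagnostics frame (string values, some missing).
diag = pd.DataFrame({
    "DistanceLtd": ["423178.7", np.nan, "465925.4"],
    "EngineRpm": ["0.0", "1310.25", np.nan],
})

numeric_cols = ["DistanceLtd", "EngineRpm"]

# errors="coerce" turns anything unparseable into NaN instead of raising.
diag[numeric_cols] = diag[numeric_cols].apply(pd.to_numeric, errors="coerce")

# diag["DistanceLtd"].plot(kind="hist") now has numeric data to work with.
```

Boolean-like columns such as `IgnStatus` hold the strings `True`/`False` and would be coerced to NaN by this call, so they need separate handling.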

In [20]:
#diagnostics = diagnostics.drop(columns=['Name'])

In [21]:
# rename the FaultId column to record_id so it can be merged with the j1939 dataset
diagnostics = diagnostics.rename(columns={'FaultId':'record_id'})

In [22]:
diagnostics.head(3)

Name,record_id,AcceleratorPedal,BarometricPressure,CruiseControlActive,CruiseControlSetSpeed,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLevel,FuelLtd,FuelRate,FuelTemperature,IgnStatus,IntakeManifoldTemperature,LampStatus,ParkingBrake,ServiceDistance,Speed,SwitchedBatteryVoltage,Throttle,TurboBoostPressure
0,1,0.0,14.21,False,66.48672,423178.7,100.4,11.0,0.0,96.74375,0.0,1632.2,43.2,12300.907429328,0.0,,False,78.8,1023,True,,0.0,3276.75,,0.0
1,2,,,,,,,,,,,,,,,,True,,1279,,,,,,
2,3,,,,,,,,,,,,,,,,,,1279,,,,,,


In [23]:
diagnostics.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1187335 entries, 0 to 1187334
Data columns (total 25 columns):
 #   Column                     Non-Null Count    Dtype 
---  ------                     --------------    ----- 
 0   record_id                  1187335 non-null  int64 
 1   AcceleratorPedal           531889 non-null   object
 2   BarometricPressure         585976 non-null   object
 3   CruiseControlActive        574916 non-null   object
 4   CruiseControlSetSpeed      576458 non-null   object
 5   DistanceLtd                585819 non-null   object
 6   EngineCoolantTemperature   586071 non-null   object
 7   EngineLoad                 585621 non-null   object
 8   EngineOilPressure          586244 non-null   object
 9   EngineOilTemperature       583912 non-null   object
 10  EngineRpm                  586921 non-null   object
 11  EngineTimeLtd              581366 non-null   object
 12  FuelLevel                  502795 non-null   object
 13  FuelLtd                    

In [24]:
#convert record_id to string in order to merge
diagnostics[['record_id']] = diagnostics[['record_id']].astype("str")

## Join the j1939faults with the diagnostics dataset

In [25]:
join_j1939_diag = j1939faults.merge(diagnostics, on = 'record_id', suffixes = ('_j1939', '_diag'))
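
`merge` defaults to an inner join, so any record without a match in the other table would be dropped silently. Here the resulting row count (1187335) equals the row count of both inputs, so nothing was lost. A sketch of how that check could be made explicit, using toy frames and the `validate` and `indicator` options of `merge`:

```python
import pandas as pd

# Toy frames; record "3" deliberately has no diagnostics row.
faults = pd.DataFrame({"record_id": ["1", "2", "3"], "spn": ["111", "629", "1807"]})
diag = pd.DataFrame({"record_id": ["1", "2"], "LampStatus": ["1023", "1279"]})

merged = faults.merge(
    diag,
    on="record_id",
    how="left",             # keep every fault, matched or not
    validate="one_to_one",  # raise if record_id repeats on either side
    indicator=True,         # adds a _merge column describing each row's origin
)

# Faults that found no diagnostics row.
unmatched = merged[merged["_merge"] == "left_only"]
```

An empty `unmatched` frame confirms that an inner join would not have discarded any faults.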

In [26]:
join_j1939_diag.shape

(1187335, 42)

In [27]:
join_j1939_diag.head()

Unnamed: 0,record_id,ess_id,event_timestamp,event_descr,ecu_software,ecu_serial,ecu_model,ecu_make,ecu_source,spn,fmi,active,active_trans_count,equipment_id,mct_number,latitude,longitude,location_timestamp,AcceleratorPedal,BarometricPressure,CruiseControlActive,CruiseControlSetSpeed,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLevel,FuelLtd,FuelRate,FuelTemperature,IgnStatus,IntakeManifoldTemperature,LampStatus,ParkingBrake,ServiceDistance,Speed,SwitchedBatteryVoltage,Throttle,TurboBoostPressure
0,1,990349,2015-02-21 10:47:13,Low (Severity Low) Engine Coolant Level,unknown,unknown,unknown,unknown,0,111,17,True,2,1439,105354361,38.857638,-84.626851,2015-02-21 11:34:25,0.0,14.21,False,66.48672,423178.7,100.4,11.0,0.0,96.74375,0.0,1632.2,43.2,12300.907429328,0.0,,False,78.8,1023,True,,0.0,3276.75,,0.0
1,2,990360,2015-02-21 11:34:34,,unknown,unknown,unknown,unknown,11,629,12,True,127,1439,105354361,38.857638,-84.626851,2015-02-21 11:35:10,,,,,,,,,,,,,,,,True,,1279,,,,,,
2,3,990364,2015-02-21 11:35:31,Incorrect Data Steering Wheel Angle,unknown,unknown,unknown,unknown,11,1807,2,False,127,1369,105336226,41.42125,-87.767361,2015-02-21 11:35:26,,,,,,,,,,,,,,,,,,1279,,,,,,
3,4,990370,2015-02-21 11:35:33,Incorrect Data Steering Wheel Angle,unknown,unknown,unknown,unknown,11,1807,2,True,127,1369,105336226,41.421018,-87.767361,2015-02-21 11:36:08,,,,,,,,,,,,,,,,True,,1279,,,,,,
4,5,990416,2015-02-21 11:39:41,,22281684P01*22357957P01*22362082P01*,13063430,0USA13_13_0415_2238A,VOLVO,0,4364,17,False,2,1674,105427130,38.416481,-89.442638,2015-02-21 11:39:37,,,,,,,,,,,,,,,,,,16639,,,,,,


In [28]:
join_j1939_diag.IgnStatus.value_counts()

True     605818
False      2636
Name: IgnStatus, dtype: int64

Before further cleaning, apply three filters:

Remove faults that occurred in the vicinity of the service locations at (36.0666667, -86.4347222), (35.5883333, -86.4438888), and (36.1950, -83.174722).

Remove faults where the EquipmentID has more than 5 characters.

Remove faults where 'active' is set to False.

In [29]:
join_j1939_diag.columns

Index(['record_id', 'ess_id', 'event_timestamp', 'event_descr', 'ecu_software',
       'ecu_serial', 'ecu_model', 'ecu_make', 'ecu_source', 'spn', 'fmi',
       'active', 'active_trans_count', 'equipment_id', 'mct_number',
       'latitude', 'longitude', 'location_timestamp', 'AcceleratorPedal',
       'BarometricPressure', 'CruiseControlActive', 'CruiseControlSetSpeed',
       'DistanceLtd', 'EngineCoolantTemperature', 'EngineLoad',
       'EngineOilPressure', 'EngineOilTemperature', 'EngineRpm',
       'EngineTimeLtd', 'FuelLevel', 'FuelLtd', 'FuelRate', 'FuelTemperature',
       'IgnStatus', 'IntakeManifoldTemperature', 'LampStatus', 'ParkingBrake',
       'ServiceDistance', 'Speed', 'SwitchedBatteryVoltage', 'Throttle',
       'TurboBoostPressure'],
      dtype='object')

In [30]:
join_j1939_diag.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1187335 entries, 0 to 1187334
Data columns (total 42 columns):
 #   Column                     Non-Null Count    Dtype         
---  ------                     --------------    -----         
 0   record_id                  1187335 non-null  object        
 1   ess_id                     1187335 non-null  object        
 2   event_timestamp            1187335 non-null  datetime64[ns]
 3   event_descr                1187335 non-null  object        
 4   ecu_software               1187335 non-null  object        
 5   ecu_serial                 1187335 non-null  object        
 6   ecu_model                  1187335 non-null  object        
 7   ecu_make                   1122577 non-null  object        
 8   ecu_source                 1187335 non-null  object        
 9   spn                        1187335 non-null  object        
 10  fmi                        1187335 non-null  object        
 11  active                     1187335 no

In [31]:
#delete all rows whose lat/long falls in a service area. To handle "within the vicinity",
#truncate each service-center coordinate to two decimal places to broaden the coordinate zones

#allow a +/- .01 swing around each truncated lat/long. **NOTE**: I widened the range for the last latitude (18-21 vs 19-20)
#between() is inclusive on both ends by default, so no inclusive argument is needed
service_ctrs = join_j1939_diag[ 
    ((join_j1939_diag['latitude'].between(36.05, 36.07)) & (join_j1939_diag['longitude'].between(-86.44, -86.42))) |
    ((join_j1939_diag['latitude'].between(35.57, 35.59)) & (join_j1939_diag['longitude'].between(-86.45, -86.43))) |
    ((join_j1939_diag['latitude'].between(36.18, 36.21)) & (join_j1939_diag['longitude'].between(-83.18, -83.16)))
    ].index
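
The rectangular boxes are a stand-in for "in the vicinity". An alternative is to measure the great-circle distance from each fault to each service location and apply a radius. The sketch below is only an illustration of that idea, not what the notebook ran: the three coordinates are the ones given above, while the 1 km default radius and the function names are assumptions:

```python
import math

# Service locations taken from the cleaning notes above.
SERVICE_CENTERS = [
    (36.0666667, -86.4347222),
    (35.5883333, -86.4438888),
    (36.1950, -83.174722),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def near_service_center(lat, lon, radius_km=1.0):
    """True if the point lies within radius_km (assumed value) of any service location."""
    return any(
        haversine_km(lat, lon, c_lat, c_lon) <= radius_km
        for c_lat, c_lon in SERVICE_CENTERS
    )
```

Applied to the data, this could be evaluated per row, for example with a list comprehension over `zip(df['latitude'], df['longitude'])`, and the resulting boolean mask used in place of the box conditions.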

In [32]:
type(service_ctrs)
#test

pandas.core.indexes.numeric.Int64Index

In [33]:
join_j1939_diag = join_j1939_diag.drop(service_ctrs, axis=0)


In [34]:
#type(join_j1939_diag)

In [35]:
join_j1939_diag.shape
# 1187260 rows remain when matching the full original lat/long coordinates exactly
# a swing of +/- .0001 changed the row count to 1,175,137
# a swing of +/- .001 changed the row count to 1,083,798
# a swing of +/- .01 with lat/long truncated to two decimal places gives a row count of 1,057,486


(1057486, 42)

In [37]:
#remove rows where equipment_id has more than 5 characters
join_j1939_diag = join_j1939_diag[join_j1939_diag['equipment_id'].map(len) < 6]

In [38]:
join_j1939_diag.shape

(1055687, 42)

In [39]:
#keep only rows where 'active' is True (the column is already boolean)
join_j1939_diag = join_j1939_diag[join_j1939_diag['active']]


In [40]:
join_j1939_diag.shape

(547766, 42)

In [116]:
join_j1939_diag[join_j1939_diag['equipment_id']=='1490']

Unnamed: 0,record_id,event_timestamp,event_descr,spn,fmi,active_trans_count,equipment_id,latitude,longitude,location_timestamp,BarometricPressure,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLtd,FuelRate,FuelTemperature,IntakeManifoldTemperature,LampStatus,Speed,Throttle,TurboBoostPressure
2865,2866,2015-02-23 14:43:04,,4334,18,1,1490,40.391388,-86.792777,2015-02-23 14:43:40,14.5725,451075.3,181.4,67.0,35.96,212.9563,1424.125,8608.8,65063.859289262,11.37264,,95.0,17407,66.97216,,17.11
4361,4362,2015-02-24 14:52:39,,4334,18,1,1490,37.590833,-85.869675,2015-02-24 14:36:00,14.4275,451614.3,183.2,60.0,35.96,211.55,1425.0,8618.8,65158.432883878,11.34622,,93.2,17407,67.07896,,15.66
39378,40490,2015-05-01 15:18:01,,4334,18,1,1490,38.50949,-78.784166,2015-05-01 15:18:37,14.065,477667.6,183.2,48.0,34.8,215.9937,1294.875,9108.25,68937.282001712,7.053416,,113.0,17407,60.72932,,13.63
54742,56812,2015-05-17 16:53:22,,4334,18,1,1490,38.295509,-85.628611,2015-05-17 16:53:58,14.4275,484629.3,181.4,0.0,37.12,214.3062,1499.25,9234.95,69973.89313376,0.0,,102.2,17407,70.50621,,1.16
58293,60363,2015-05-20 15:00:15,,4334,18,1,1490,34.054351,-84.593842,2015-05-20 15:00:51,14.065,485953.0,199.4,77.0,33.64,227.8625,1309.5,9258.1,70167.927505954,12.38971,,141.8,17407,61.65167,,20.88
59889,61959,2015-05-21 17:05:51,,4334,18,1,1490,38.412129,-82.59912,2015-05-21 17:06:27,14.4275,486413.2,185.0,33.0,35.38,210.5375,1244.625,9267.6,70243.612798852,4.239975,,89.6,17407,32.03945,,1.74
73184,75570,2015-06-03 18:21:44,Not Reporting Data Engine Injector Cylinder #02,652,7,1,1490,33.596111,-86.382037,2015-06-03 18:22:21,14.2825,490079.7,183.2,0.0,37.7,213.575,1524.875,9339.3,70756.50283781,0.0,,107.6,17407,71.85575,,1.45
85454,87840,2015-06-15 13:30:38,Not Reporting Data Engine Injector Cylinder #02,652,7,1,1490,36.437824,-86.705231,2015-06-15 13:31:14,14.2825,493865.5,199.4,0.0,35.96,227.1313,1491.375,9408.45,71290.262468876,0.0,,127.4,17407,70.26349,,1.74
86964,89350,2015-06-16 17:00:49,Not Reporting Data Engine Injector Cylinder #01,651,7,1,1490,35.036851,-90.754861,2015-06-16 17:01:25,14.5,494597.8,197.6,0.0,35.96,221.7312,1425.625,9421.9,71400.686386612,0.0,,127.4,17407,67.04984,,1.74
87047,89433,2015-06-16 18:12:34,,4334,18,1,1490,34.78824,-92.098564,2015-06-16 18:13:10,14.5725,494678.0,186.8,57.0,35.96,215.825,1423.5,9423.1,71412.3099569,8.929043,,141.8,17407,67.011,,10.44


In [41]:
join_j1939_diag.tail(3)

Unnamed: 0,record_id,ess_id,event_timestamp,event_descr,ecu_software,ecu_serial,ecu_model,ecu_make,ecu_source,spn,fmi,active,active_trans_count,equipment_id,mct_number,latitude,longitude,location_timestamp,AcceleratorPedal,BarometricPressure,CruiseControlActive,CruiseControlSetSpeed,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLevel,FuelLtd,FuelRate,FuelTemperature,IgnStatus,IntakeManifoldTemperature,LampStatus,ParkingBrake,ServiceDistance,Speed,SwitchedBatteryVoltage,Throttle,TurboBoostPressure
1187331,1248455,123905139,2020-03-06 14:04:23,Condition Exists Engine Protection Torque Derate,04358814*06099720*030816202706*09400153*G1*BDR*,79932020.0,6X1u13D1500000000,CMMNS,0,1569,31,True,5,1994,105354084,34.39074,-79.461805,2020-03-06 14:04:59,100.0,14.5,True,64.6226,423937.9,185.0,51,37.12,211.4937,1310.25,10722.7,96.4,58979.184415546,7.647805,32.0,True,98.6,18431,False,,65.01096,,73.2,7.83
1187332,1248456,123905996,2020-03-06 14:13:38,Abnormal Rate of Change Aftertreatment 1 Intak...,05317106*05100987*050719120655*09401585*G1*BDR*,79880653.0,6X1u13D1500000000,CMMNS,0,3216,10,True,1,1850,105336308,34.43037,-84.920509,2020-03-06 14:14:14,0.0,14.355,True,66.48672,465925.4,186.8,62,41.18,212.8438,1340.75,9326.75,100.0,65080.10587046,8.995086,,True,91.4,17407,,,66.5741,,100.0,6.96
1187333,1248457,123906113,2020-03-06 14:14:13,Low (Severity Medium) Engine Coolant Level,04384413*22544852*090619141107*60701756*G1*BGT*,,,,0,111,18,True,8,2377,108605700,35.030925,-85.321527,2020-03-06 14:14:49,1.6,14.4275,False,67.72946,28606.65625,181.4,0,27.26,221.7312,863.25,586.75,23.6,4042.49282573,0.0,,True,100.4,1023,False,,11.84489,14.1,100.0,1.74


In [44]:
join_j1939_diag = join_j1939_diag.drop(columns=['ess_id', 'ecu_software','ecu_serial', 'ecu_model','ecu_make','ecu_source','active', 
                           'mct_number', 'AcceleratorPedal', 'ServiceDistance', 'CruiseControlActive', 'CruiseControlSetSpeed', 'IgnStatus', 
                            'SwitchedBatteryVoltage','FuelLevel','ParkingBrake'])

In [104]:
join_j1939_diag.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 547766 entries, 0 to 1187333
Data columns (total 26 columns):
 #   Column                     Non-Null Count   Dtype         
---  ------                     --------------   -----         
 0   record_id                  547766 non-null  object        
 1   event_timestamp            547766 non-null  datetime64[ns]
 2   event_descr                547766 non-null  object        
 3   spn                        547766 non-null  object        
 4   fmi                        547766 non-null  object        
 5   active_trans_count         547766 non-null  object        
 6   equipment_id               547766 non-null  object        
 7   latitude                   547766 non-null  float64       
 8   longitude                  547766 non-null  float64       
 9   location_timestamp         547766 non-null  datetime64[ns]
 10  BarometricPressure         530056 non-null  object        
 11  DistanceLtd                529914 non-null  object 

## 1569 events

In [45]:
derate1569 = join_j1939_diag[join_j1939_diag['spn']=='1569']
derate1569.head(4)

Unnamed: 0,record_id,event_timestamp,event_descr,spn,fmi,active_trans_count,equipment_id,latitude,longitude,location_timestamp,BarometricPressure,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLtd,FuelRate,FuelTemperature,IntakeManifoldTemperature,LampStatus,Speed,Throttle,TurboBoostPressure
40,41,2015-02-21 12:06:22,Condition Exists Engine Protection Torque Derate,1569,31,5,1721,39.051805,-84.560509,2015-02-21 12:06:57,14.21,121095.5,174.2,0,35.96,220.4375,1048.125,2319.65,15620.097176682,0.0,32,51.8,18431,51.97187,0,0.58
290,291,2015-02-21 15:35:45,Condition Exists Engine Protection Torque Derate,1569,31,6,1721,37.735185,-85.808101,2015-02-21 15:36:21,14.2825,121233.4,181.4,77,39.44,221.0,1561.125,2322.35,15638.32504827,13.24827,32,91.4,18431,40.73865,0,15.66
340,341,2015-02-21 16:22:24,Condition Exists Engine Protection Torque Derate,1569,31,7,1721,37.166666,-85.964027,2015-02-21 16:23:00,14.2825,121274.7,181.4,0,37.7,213.575,1122.375,2323.15,15643.476403284,0.02641729,32,60.8,18431,55.39912,0,0.87
378,379,2015-02-21 17:08:02,Condition Exists Engine Protection Torque Derate,1569,31,10,1721,36.770324,-86.48287,2015-02-21 17:08:37,14.355,121319.2,177.8,55,38.28,218.75,1274.125,2323.9,15649.420274454,6.960955,32,53.6,18431,62.89442,0,2.32


In [46]:
#Note for spn1569, all event descriptions the same. 'Condition exists engine protection torque derate'
derate1569.event_descr.unique()

array(['Condition Exists Engine Protection Torque Derate'], dtype=object)

In [47]:
derate1569.fmi.unique()

array(['31'], dtype=object)

Note for SPN 1569: all event descriptions are the same ('Condition Exists Engine Protection Torque Derate'), and every FMI is 31.


## 5246 events
Look at the SPN 5246 codes (derates) for any trends.

In [48]:
derate5246 = join_j1939_diag[join_j1939_diag.spn == '5246']
derate5246.head(7)

Unnamed: 0,record_id,event_timestamp,event_descr,spn,fmi,active_trans_count,equipment_id,latitude,longitude,location_timestamp,BarometricPressure,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLtd,FuelRate,FuelTemperature,IntakeManifoldTemperature,LampStatus,Speed,Throttle,TurboBoostPressure
2089,2090,2015-02-23 05:05:44,,5246,0,1,1630,40.733009,-74.087777,2015-02-23 05:08:23,,,,,,,,4645.45,33470.466902374,,,,22527,,,
2971,2972,2015-02-23 15:54:22,,5246,0,1,1487,28.077361,-81.897083,2015-02-23 15:54:58,,,,,,,,,,,,,22527,,,
5713,5714,2015-02-25 13:53:08,,5246,0,1,1329,39.399583,-82.974768,2015-02-25 13:56:31,,,,,,,,,,,,,22527,,,
6534,6535,2015-02-26 22:24:29,,5246,0,1,1419,37.596805,-85.865555,2015-02-26 22:25:05,14.5,441699.6,185.0,10.0,20.3,198.1625,648.125,9087.95,69605.769379298,0.6340149,,140.0,22527,0.0,,0.58
6628,6629,2015-02-27 09:09:56,,5246,0,1,1486,40.534259,-76.431805,2015-02-27 09:10:33,,,,,,,,,,,,,22527,,,
6665,6666,2015-02-27 12:45:34,,5246,0,1,1486,41.225879,-77.074907,2015-02-27 12:46:11,,,,,,,,,,,,,22527,,,
6684,6685,2015-02-27 16:52:12,,5246,0,1,1486,41.033333,-77.515648,2015-02-27 16:52:49,14.5,413001.4,165.2,12.0,25.52,170.4312,649.375,9368.75,63017.054230366,0.7661014,,86.0,22527,0.0,,0.58


In [49]:
derate5246.shape 


(496, 26)

In [50]:
#derate5246.info()

Make sure the Big G Express shops no longer show up in the report: (36.0666667, -86.4347222), (35.5883333, -86.4438888), and (36.1950, -83.174722).

In [52]:
#derate5246.latitude.unique()

In [53]:
derate5246.event_descr.unique()

array(['nan'], dtype=object)

Run value counts on the columns to see whether anything interesting stands out.

In [54]:
derate5246.groupby('equipment_id')['active_trans_count'].value_counts()

equipment_id  active_trans_count
1329          1                      1
1339          1                      1
1366          1                      4
1373          1                      1
1375          1                      1
1378          1                      1
1383          1                      4
1384          1                      1
1389          1                      1
1391          1                      1
1395          1                      1
1396          1                      1
1399          1                      4
1401          1                      2
1403          1                      1
1407          1                      2
1417          1                      1
1418          1                      1
1419          1                      2
1431          1                      1
1437          1                      2
1440          1                      2
1443          1                      1
1444          1                      1
1452          1                

In [55]:
derate5246_a = derate5246[derate5246['equipment_id'].isin(['1746', '1748', '1749', '302', '305'])]

In [57]:
derate5246_a.head(5)

Unnamed: 0,record_id,event_timestamp,event_descr,spn,fmi,active_trans_count,equipment_id,latitude,longitude,location_timestamp,BarometricPressure,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLtd,FuelRate,FuelTemperature,IntakeManifoldTemperature,LampStatus,Speed,Throttle,TurboBoostPressure
306901,311407,2015-12-14 18:25:46,,5246,19,3,305,35.196203,-85.814722,2015-12-14 18:26:23,13.775,191889.2,197.6,0,130.5,222.4063,1626.5,4195.9,29591.23240478,0.0,104.0,113.0,255,46.49119,0.0,28.71
327279,332336,2016-01-03 17:49:35,,5246,19,4,305,34.793564,-84.823333,2016-01-03 17:50:10,14.355,194822.0,185.0,6,145.0,201.2,1323.25,4265.4,30089.328808826,1.452951,98.6,131.0,255,0.0,13.2,1.74
359437,366938,2016-02-01 06:15:40,,5246,19,1,305,34.354722,-82.941342,2016-02-01 06:16:16,14.2825,201406.2,186.8,24,93.38,208.0063,727.75,4423.25,31265.686956382,2.443599,116.6,120.2,255,0.0,6.8,1.16
820964,842328,2017-08-03 13:13:15,,5246,19,1,1749,36.172453,-86.77162,2017-08-03 13:13:50,14.5725,373084.1,186.8,33,29.0,202.775,709.75,7421.55,51085.987673838,2.53606,154.4,163.4,255,0.0,8.400001,1.45
823867,845231,2017-08-08 10:23:27,,5246,19,16,1749,36.141944,-86.719861,2017-08-08 10:24:03,14.4275,373089.4,75.2,0,1.16,75.70625,0.0,7422.75,51087.836878202,0.0,75.2,77.0,255,0.0,,0.0


## 1569 and 5246 

In [58]:
derate1569_5246 = join_j1939_diag[
    (join_j1939_diag['spn']=='1569') |
    (join_j1939_diag['spn']=='5246')
]
     
derate1569_5246.head(5)

Unnamed: 0,record_id,event_timestamp,event_descr,spn,fmi,active_trans_count,equipment_id,latitude,longitude,location_timestamp,BarometricPressure,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLtd,FuelRate,FuelTemperature,IntakeManifoldTemperature,LampStatus,Speed,Throttle,TurboBoostPressure
40,41,2015-02-21 12:06:22,Condition Exists Engine Protection Torque Derate,1569,31,5,1721,39.051805,-84.560509,2015-02-21 12:06:57,14.21,121095.5,174.2,0.0,35.96,220.4375,1048.125,2319.65,15620.097176682,0.0,32.0,51.8,18431,51.97187,0.0,0.58
290,291,2015-02-21 15:35:45,Condition Exists Engine Protection Torque Derate,1569,31,6,1721,37.735185,-85.808101,2015-02-21 15:36:21,14.2825,121233.4,181.4,77.0,39.44,221.0,1561.125,2322.35,15638.32504827,13.24827,32.0,91.4,18431,40.73865,0.0,15.66
340,341,2015-02-21 16:22:24,Condition Exists Engine Protection Torque Derate,1569,31,7,1721,37.166666,-85.964027,2015-02-21 16:23:00,14.2825,121274.7,181.4,0.0,37.7,213.575,1122.375,2323.15,15643.476403284,0.02641729,32.0,60.8,18431,55.39912,0.0,0.87
378,379,2015-02-21 17:08:02,Condition Exists Engine Protection Torque Derate,1569,31,10,1721,36.770324,-86.48287,2015-02-21 17:08:37,14.355,121319.2,177.8,55.0,38.28,218.75,1274.125,2323.9,15649.420274454,6.960955,32.0,53.6,18431,62.89442,0.0,2.32
1580,1581,2015-02-22 11:14:23,Condition Exists Engine Protection Torque Derate,1569,31,1,1515,30.376435,-83.299444,2015-02-22 11:14:58,,,,,,,,,,,,,18431,,,


In [61]:
derate1569_5246.spn.value_counts()

1569    5052
5246     496
Name: spn, dtype: int64

In [62]:
derate1569_5246.equipment_id.sort_values(ascending=False)

48291      310
115565     310
129562     310
853540     310
401648     310
          ... 
83425     1339
82873     1339
5714      1329
5713      1329
9897      1328
Name: equipment_id, Length: 5548, dtype: object
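
`equipment_id` is stored as `object` (string), so `sort_values` orders it lexicographically. That is why `310` appears above `1339` in the descending output. A minimal sketch of a numeric sort on toy data, assuming IDs that contain letters (such as `R1764`, which appears in this dataset) should be coerced to NaN and placed last:

```python
import pandas as pd

# Toy stand-in for derate1569_5246.equipment_id (string dtype, as in the notebook)
equipment_id = pd.Series(['310', '1339', '1328', 'R1764', '305'])

# Lexicographic sort: strings are compared character by character
lexicographic = equipment_id.sort_values(ascending=False).tolist()

# Numeric sort: coerce to numbers first; IDs with letters become NaN and sort last
numeric_key = pd.to_numeric(equipment_id, errors='coerce')
numeric_order = equipment_id.loc[
    numeric_key.sort_values(ascending=False).index
].tolist()

print(lexicographic)
print(numeric_order)
```

Whether the letter-prefixed IDs should be dropped, kept as strings, or mapped to numbers is a decision about the data that this sketch does not settle.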

In [76]:
derate1569_5246.groupby('equipment_id')['active_trans_count'].value_counts()

equipment_id  active_trans_count
1328          1                      1
1329          1                      2
1339          1                      2
1340          1                     19
1341          1                      1
                                    ..
308           2                      1
309           1                      1
              2                      1
310           126                    5
              1                      1
Name: active_trans_count, Length: 2199, dtype: int64

In [63]:
derate1569_5246.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5548 entries, 40 to 1187331
Data columns (total 26 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   record_id                  5548 non-null   object        
 1   event_timestamp            5548 non-null   datetime64[ns]
 2   event_descr                5548 non-null   object        
 3   spn                        5548 non-null   object        
 4   fmi                        5548 non-null   object        
 5   active_trans_count         5548 non-null   object        
 6   equipment_id               5548 non-null   object        
 7   latitude                   5548 non-null   float64       
 8   longitude                  5548 non-null   float64       
 9   location_timestamp         5548 non-null   datetime64[ns]
 10  BarometricPressure         5296 non-null   object        
 11  DistanceLtd                5281 non-null   object        
 12  En

In [64]:
EI1490 = derate1569_5246[derate1569_5246['equipment_id']=='1490']

In [66]:
EI1490

Unnamed: 0,record_id,event_timestamp,event_descr,spn,fmi,active_trans_count,equipment_id,latitude,longitude,location_timestamp,BarometricPressure,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLtd,FuelRate,FuelTemperature,IntakeManifoldTemperature,LampStatus,Speed,Throttle,TurboBoostPressure
104110,106496,2015-07-01 07:01:31,Condition Exists Engine Protection Torque Derate,1569,31,2,1490,33.923564,-84.48574,2015-07-01 07:02:08,14.21,500492.6,194.0,93.0,34.22,227.075,1380.0,9531.35,72216.185511136,15.69187,,107.6,18431,65.07893,,11.02
111877,114263,2015-07-08 11:09:22,Condition Exists Engine Protection Torque Derate,1569,31,2,1490,38.561203,-81.282962,2015-07-08 11:09:57,14.2825,502249.4,186.8,97.0,34.22,215.9937,1288.25,9565.1,72479.432960954,19.58842,,123.8,18431,61.02059,,27.84
111961,114347,2015-07-08 12:11:10,Condition Exists Engine Protection Torque Derate,1569,31,2,1490,38.943333,-80.540648,2015-07-08 12:11:45,13.9925,502313.1,206.6,98.0,34.8,223.9812,1532.25,9566.15,72490.396101112,18.98082,,131.0,18431,53.82628,,26.1
112031,114417,2015-07-08 13:09:55,Condition Exists Engine Protection Torque Derate,1569,31,2,1490,39.590787,-79.933611,2015-07-08 13:10:30,14.1375,502375.4,203.0,0.0,35.96,222.9125,1451.875,9567.1,72500.434639088,0.0,,116.6,18431,68.27316,,31.03
112108,114494,2015-07-08 13:41:58,Condition Exists Engine Protection Torque Derate,1569,31,2,1490,39.681296,-79.366851,2015-07-08 13:50:30,13.5575,502404.4,210.2,100.0,33.06,228.425,1294.25,9567.65,72507.30311244,20.35452,,141.8,18431,33.56376,,27.84
132826,135212,2015-07-24 09:49:38,Condition Exists Engine Protection Torque Derate,1569,31,2,1490,36.243564,-80.348425,2015-07-24 09:50:15,14.1375,508714.5,179.6,0.0,38.28,218.2437,1618.0,9685.0,73425.829337244,0.0,,102.2,18431,75.37038,,2.32
133513,135899,2015-07-24 16:39:41,,5246,0,2,1490,35.297731,-82.405462,2015-07-24 16:40:17,,,,,,,,,,,,,22527,,,
258501,262321,2015-10-31 17:30:25,Condition Exists Engine Protection Torque Derate,1569,31,1,1490,35.233379,-85.824305,2015-10-31 17:31:01,13.7025,543068.6,204.8,0.0,24.94,224.8813,899.375,10396.55,78331.240170832,0.0,,91.4,18431,12.77695,,0.87
270200,274020,2015-11-11 11:11:14,Condition Exists Engine Protection Torque Derate,1569,31,1,1490,36.901851,-86.591574,2015-11-11 11:11:51,14.355,546180.3,183.2,37.0,35.38,212.8438,1371.5,10465.6,78777.690938712,5.626883,,95.0,18431,64.60319,,4.06
271531,275351,2015-11-12 09:28:47,Condition Exists Engine Protection Torque Derate,1569,31,1,1490,36.107777,-86.927314,2015-11-12 09:29:22,14.355,546694.8,183.2,0.0,35.96,211.55,1453.25,10475.8,78842.1489194,1.915253,,73.4,18431,68.29258,,1.16


In [77]:
EI1340 = derate1569_5246[derate1569_5246['equipment_id']=='1340']
EI1340

Unnamed: 0,record_id,event_timestamp,event_descr,spn,fmi,active_trans_count,equipment_id,latitude,longitude,location_timestamp,BarometricPressure,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLtd,FuelRate,FuelTemperature,IntakeManifoldTemperature,LampStatus,Speed,Throttle,TurboBoostPressure
8014,8015,2015-03-09 11:04:29,Condition Exists Engine Protection Torque Derate,1569,31,1,1340,39.970925,-77.574027,2015-03-09 11:05:05,14.4275,487651.2,192.2,52,35.96,215.2625,1370.0,9390.6,74220.987213764,8.030855,,89.6,2047,65.39931,,12.18
9556,9557,2015-03-17 19:04:39,Condition Exists Engine Protection Torque Derate,1569,31,1,1340,36.182962,-86.319537,2015-03-17 19:05:15,14.4275,490835.6,190.4,98,35.96,213.575,1344.25,9448.75,74718.291101654,16.74856,,78.8,18431,64.11774,,
9697,9698,2015-03-18 12:55:11,Condition Exists Engine Protection Torque Derate,1569,31,1,1340,34.927361,-91.122222,2015-03-18 12:55:46,14.645,491164.3,190.4,52,35.96,213.7437,1357.5,9455.05,74769.012135638,8.955461,,64.4,18431,64.95271,,
9725,9726,2015-03-18 14:40:17,Condition Exists Engine Protection Torque Derate,1569,31,1,1340,34.715648,-92.192129,2015-03-18 14:40:52,14.5725,491231.6,188.6,67,35.38,211.55,1251.875,9456.3,74779.446931692,11.42548,,62.6,18431,60.03028,,
9746,9747,2015-03-18 16:21:40,Condition Exists Engine Protection Torque Derate,1569,31,1,1340,33.725648,-93.54574,2015-03-18 16:22:15,14.5,491338.3,192.2,37,36.54,214.5313,1366.625,9458.0,74794.504738656,5.666508,,66.2,18431,65.28281,,
9764,9765,2015-03-18 16:41:03,Condition Exists Engine Protection Torque Derate,1569,31,1,1340,33.51375,-93.949305,2015-03-18 17:02:28,14.5725,491357.5,188.6,34,37.7,214.5313,1528.0,9458.35,74797.410631228,6.062768,,64.4,18431,53.17578,,
9822,9823,2015-03-19 03:00:15,Condition Exists Engine Protection Torque Derate,1569,31,1,1340,33.481388,-93.982916,2015-03-19 03:00:51,14.5725,491370.1,145.4,65,41.18,142.475,1268.75,9458.7,74799.524007644,11.12168,,68.0,18431,60.54486,,
9859,9860,2015-03-19 06:22:19,Condition Exists Engine Protection Torque Derate,1569,31,1,1340,32.951759,-96.918888,2015-03-19 06:22:55,14.4275,491563.8,190.4,69,36.54,205.9812,1387.125,9462.0,74827.39415913,12.25762,,87.8,18431,12.19441,,
24440,25323,2015-04-17 08:58:40,Condition Exists Engine Protection Torque Derate,1569,31,1,1340,31.472268,-85.678425,2015-04-17 08:59:16,14.5725,501482.9,190.4,99,33.64,215.2625,1218.375,9645.65,76405.029653674,18.84873,,93.2,18431,58.07879,,
30214,31326,2015-04-22 18:26:10,Condition Exists Engine Protection Torque Derate,1569,31,1,1340,36.280138,-84.208472,2015-04-22 18:26:46,13.9925,502929.9,190.4,93,35.38,212.8438,1295.125,9673.5,76657.710221412,15.96925,,87.8,18431,45.10767,,


In [83]:
EI1692 = derate1569_5246[derate1569_5246['equipment_id']=='1692']
EI1692

Unnamed: 0,record_id,event_timestamp,event_descr,spn,fmi,active_trans_count,equipment_id,latitude,longitude,location_timestamp,BarometricPressure,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLtd,FuelRate,FuelTemperature,IntakeManifoldTemperature,LampStatus,Speed,Throttle,TurboBoostPressure
41610,42722,2015-05-04 16:17:00,Condition Exists Engine Protection Torque Derate,1569,31,37,1692,34.949351,-78.904351,2015-05-04 16:17:36,14.79,210495.1,181.4,27.0,40.02,226.9063,1361.375,3706.95,25362.762454442,4.213557,32.0,105.8,2047,67.09838,0.0,4.93
167069,169455,2015-08-17 08:24:10,Condition Exists Engine Protection Torque Derate,1569,31,1,1692,40.231944,-77.006759,2015-08-17 08:24:06,14.5725,250401.8,185.0,8.0,42.34,203.7875,1455.125,4470.35,30381.899356416,0.9378137,32.0,120.2,18431,37.91335,0.0,4.64
167538,169924,2015-08-17 10:21:07,Condition Exists Engine Protection Torque Derate,1569,31,2,1692,39.412685,-76.942824,2015-08-17 10:21:03,14.4275,250478.3,172.4,28.0,35.96,200.9187,919.5,4472.1,30390.220776054,2.192635,32.0,145.4,18431,2.805879,0.0,0.87
167544,169930,2015-08-17 10:22:57,Condition Exists Engine Protection Torque Derate,1569,31,3,1692,39.412777,-76.94287,2015-08-17 10:22:53,14.4275,250478.4,185.0,15.0,23.78,201.0313,614.375,4472.1,30390.220776054,0.6075976,32.0,140.0,18431,0.0,0.0,0.87
168110,170496,2015-08-17 16:14:21,Condition Exists Engine Protection Torque Derate,1569,31,4,1692,39.396666,-77.329629,2015-08-17 16:14:17,14.5,250508.5,179.6,79.0,38.28,225.6125,1355.125,4472.85,30395.107959016,14.08041,32.0,132.8,18431,66.64206,0.0,20.88
168635,171021,2015-08-17 20:37:14,,5246,0,1,1692,37.830694,-79.377453,2015-08-18 06:05:45,,,,,,,,4477.05,30422.846024476,,,,22527,,,
170940,173326,2015-08-19 12:26:58,Condition Exists Engine Protection Torque Derate,1569,31,1,1692,37.93074,-79.232824,2015-08-19 12:26:54,,,,,,,,,,,,,2047,,,
171007,173393,2015-08-19 13:04:56,Condition Exists Engine Protection Torque Derate,1569,31,2,1692,37.930277,-79.233055,2015-08-19 13:04:53,,,,,,,,,,,,,2047,,,
171389,173775,2015-08-19 17:46:24,Condition Exists Engine Protection Torque Derate,1569,31,5,1692,36.10949,-83.333333,2015-08-19 17:46:21,,,,,,,,,,,,,2047,,,
171401,173787,2015-08-19 18:04:35,Condition Exists Engine Protection Torque Derate,1569,31,6,1692,36.109537,-83.333657,2015-08-19 18:04:31,,,,,,,,,,,,,2047,,,


In [68]:
derate1569_5246.groupby('spn')['equipment_id'].value_counts().sort_values(ascending=False)

spn   equipment_id
1569  1490            190
      1692            139
      1505            133
      1445             99
      1444             95
                     ... 
      1900              1
      1906              1
      1914              1
      1921              1
5246  306               1
Name: equipment_id, Length: 682, dtype: int64

In [69]:
derate1569_5246[derate1569_5246['equipment_id']=='305']
# Equipment 305 has no SPN 1569 entries. J1939 FMI 19 means "Received Network Data in Error"; FMI 14 means "Special Instructions".

Unnamed: 0,record_id,event_timestamp,event_descr,spn,fmi,active_trans_count,equipment_id,latitude,longitude,location_timestamp,BarometricPressure,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,EngineTimeLtd,FuelLtd,FuelRate,FuelTemperature,IntakeManifoldTemperature,LampStatus,Speed,Throttle,TurboBoostPressure
306901,311407,2015-12-14 18:25:46,,5246,19,3,305,35.196203,-85.814722,2015-12-14 18:26:23,13.775,191889.2,197.6,0,130.5,222.4063,1626.5,4195.9,29591.23240478,0.0,104.0,113.0,255,46.49119,0.0,28.71
327279,332336,2016-01-03 17:49:35,,5246,19,4,305,34.793564,-84.823333,2016-01-03 17:50:10,14.355,194822.0,185.0,6,145.0,201.2,1323.25,4265.4,30089.328808826,1.452951,98.6,131.0,255,0.0,13.2,1.74
359437,366938,2016-02-01 06:15:40,,5246,19,1,305,34.354722,-82.941342,2016-02-01 06:16:16,14.2825,201406.2,186.8,24,93.38,208.0063,727.75,4423.25,31265.686956382,2.443599,116.6,120.2,255,0.0,6.8,1.16
993600,1029758,2018-06-25 14:48:16,,5246,19,2,305,37.644166,-85.348148,2018-06-25 14:48:54,14.355,272189.4,188.6,15,34.8,191.6375,655.75,10131.0,43387.353648428,0.9378137,120.2,156.2,255,0.0,0.0,0.29
1021723,1060718,2018-09-26 12:37:10,,5246,19,24,305,38.192083,-85.707407,2018-09-26 12:38:01,14.5,279009.9,71.6,0,1.16,71.76875,0.0,10761.2,44599.771281082,0.0,71.6,73.4,255,0.0,0.0,0.0
1034642,1075522,2018-11-05 10:33:01,,5246,14,1,305,38.192453,-85.707824,2018-11-05 10:33:38,14.5,279339.8,86.0,0,1.74,80.375,0.0,10793.65,44660.398767016,0.0,84.2,87.8,18431,0.0,0.0,0.0
1034729,1075609,2018-11-05 14:11:23,,5246,19,26,305,38.192546,-85.707824,2018-11-05 14:12:51,14.4275,279339.8,77.0,19,83.52,76.26875,859.75,10793.7,44660.398767016,1.902045,69.8,68.0,255,0.0,10.0,1.45
1036165,1077045,2018-11-08 21:55:05,,5246,19,30,305,38.192314,-85.707592,2018-11-08 21:55:41,14.5,279339.8,66.2,0,1.74,67.1,0.0,10793.8,44660.530853042,0.0,66.2,69.8,255,0.0,0.0,0.0
1036168,1077048,2018-11-08 21:55:05,,5246,14,2,305,38.192314,-85.707592,2018-11-08 21:55:42,14.5,279339.8,66.2,0,1.74,67.1,0.0,10793.8,44660.530853042,0.0,66.2,69.8,22527,0.0,0.0,0.0
1036596,1077476,2018-11-08 22:13:34,,5246,19,36,305,38.192314,-85.707685,2018-11-08 23:45:38,14.5,279339.8,89.6,18,68.43999,83.46875,652.5,10793.85,44660.662939068,2.034131,66.2,73.4,255,0.0,0.0,0.29


In [70]:
derate1569_5246.equipment_id.value_counts()

1490    201
1692    142
1505    133
1758     99
1445     99
       ... 
1588      1
304       1
2005      1
1900      1
1660      1
Name: equipment_id, Length: 520, dtype: int64

In [None]:
#Try 1340, 1535, 1549 next

Read in the Service Fault Codes file

In [71]:
svc_fault_code = pd.read_csv('data/Service Fault Codes_1_0_0_167.csv')
svc_fault_code.head()

Unnamed: 0,Published in CES 14602,Cummins Fault Code,Revision,PID,SID,MID,J1587 FMI,SPN,J1939 FMI,J2012 Pcode,Lamp Color,Lamp Device,Cummins Description,Algorithm Description
0,Y,111,167,Not Mapped,254,0,12,629,12,P0606,Red,Stop / Shutdown,Engine Control Module Critical Internal Failur...,Error internal to the ECM related to memory ha...
1,Y,112,167,Not Mapped,20,128,7,635,7,Not Mapped,Red,Stop / Shutdown,Engine Timing Actuator Driver Circuit - Mechan...,Mechanical failure in the engine timing actuat...
2,Y,113,167,Not Mapped,20,128,3,635,3,Not Mapped,Amber,Warning,Engine Timing Actuator Driver Circuit - Voltag...,High signal voltage detected at the engine tim...
3,Y,114,167,Not Mapped,20,128,4,635,4,Not Mapped,Amber,Warning,Engine Timing Actuator Driver Circuit - Voltag...,Low voltage detected at the engine timing actu...
4,Y,115,167,190,Not Mapped,Not Mapped,2,612,2,P0008,Red,Stop / Shutdown,Engine Magnetic Speed/Position Lost Both of Tw...,The ECM has detected that the primary and back...


In [72]:
svc_fault_code.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7124 entries, 0 to 7123
Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Published in CES 14602  7124 non-null   object
 1   Cummins Fault Code      7124 non-null   int64 
 2   Revision                7124 non-null   int64 
 3   PID                     7124 non-null   object
 4   SID                     7124 non-null   object
 5   MID                     7124 non-null   object
 6   J1587 FMI               7124 non-null   int64 
 7   SPN                     7124 non-null   int64 
 8   J1939 FMI               7124 non-null   int64 
 9   J2012 Pcode             7124 non-null   object
 10  Lamp Color              7124 non-null   object
 11  Lamp Device             7124 non-null   object
 12  Cummins Description     7124 non-null   object
 13  Algorithm Description   2005 non-null   object
dtypes: int64(5), object(9)
memory usage: 779.3+ KB


In [73]:
svc_fault_code[(svc_fault_code['J1939 FMI']==31) & (svc_fault_code['SPN'] == 1569)]

Unnamed: 0,Published in CES 14602,Cummins Fault Code,Revision,PID,SID,MID,J1587 FMI,SPN,J1939 FMI,J2012 Pcode,Lamp Color,Lamp Device,Cummins Description,Algorithm Description
2520,Y,3714,167,Not Mapped,Not Mapped,Not Mapped,11,1569,31,Not Mapped,Amber,Warning,Engine Protection Torque Derate - Condition Ex...,


In [74]:
svc_fault_code.loc[svc_fault_code['SPN'] == 1569, 'Cummins Description'].loc[5095]

'Engine Protection Torque Derate - Special Instructions'
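
The lookups above are done one SPN/FMI pair at a time. A possible next step is to attach the Cummins description to every fault event with a merge. The sketch below uses toy frames, not the real files. It assumes the event table stores `spn` and `fmi` as strings (as `.info()` shows for the joined data) while the service table stores `SPN` and `J1939 FMI` as integers, so the keys are cast before joining. The names `events`, `codes`, `labelled`, and `unmatched` are hypothetical.

```python
import pandas as pd

# Toy stand-ins: fault events keep spn/fmi as strings, the lookup table as integers
events = pd.DataFrame({
    'equipment_id': ['1490', '305', '305'],
    'spn': ['1569', '5246', '5246'],
    'fmi': ['31', '19', '14'],
})
codes = pd.DataFrame({
    'SPN': [1569],
    'J1939 FMI': [31],
    # Description kept truncated, exactly as displayed in the notebook output
    'Cummins Description': ['Engine Protection Torque Derate - Condition Ex...'],
})

# Align dtypes before merging; string keys would not match integer keys
events = events.astype({'spn': 'int64', 'fmi': 'int64'})

labelled = events.merge(
    codes,
    how='left',                          # keep every event, matched or not
    left_on=['spn', 'fmi'],
    right_on=['SPN', 'J1939 FMI'],
    indicator=True,                      # adds a _merge column
)

# Events with no row in the lookup table
unmatched = labelled[labelled['_merge'] == 'left_only']
print(labelled[['equipment_id', 'spn', 'fmi', 'Cummins Description', '_merge']])
```

If the service table holds more than one row per (SPN, FMI) pair, a left merge duplicates events. Passing `validate='many_to_one'` to `merge` would raise in that case instead of silently inflating the row count.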

# Creating a function to fill in data using forward fill and backward fill

In [85]:
# Test on a single equipment id first. Source is the original joined dataset (filtered to active=yes, equipment id < 6, lat/long svc ctrs dropped)
EI1692_2 = join_j1939_diag[join_j1939_diag['equipment_id'] == '1692']

In [88]:
EI1692_2.shape

(10002, 26)

In [89]:
#dataset prior to conducting a fillna
EI1692_2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10002 entries, 6857 to 1046644
Data columns (total 26 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   record_id                  10002 non-null  object        
 1   event_timestamp            10002 non-null  datetime64[ns]
 2   event_descr                10002 non-null  object        
 3   spn                        10002 non-null  object        
 4   fmi                        10002 non-null  object        
 5   active_trans_count         10002 non-null  object        
 6   equipment_id               10002 non-null  object        
 7   latitude                   10002 non-null  float64       
 8   longitude                  10002 non-null  float64       
 9   location_timestamp         10002 non-null  datetime64[ns]
 10  BarometricPressure         9849 non-null   object        
 11  DistanceLtd                9849 non-null   object        
 12 

In [94]:
# Forward fill, then backward fill, the NaNs.
# .info() returns None, so call it separately rather than chaining it onto the assignment.
EI1692_2_simple_fill = EI1692_2.ffill().bfill()
EI1692_2_simple_fill.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10002 entries, 6857 to 1046644
Data columns (total 26 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   record_id                  10002 non-null  object        
 1   event_timestamp            10002 non-null  datetime64[ns]
 2   event_descr                10002 non-null  object        
 3   spn                        10002 non-null  object        
 4   fmi                        10002 non-null  object        
 5   active_trans_count         10002 non-null  object        
 6   equipment_id               10002 non-null  object        
 7   latitude                   10002 non-null  float64       
 8   longitude                  10002 non-null  float64       
 9   location_timestamp         10002 non-null  datetime64[ns]
 10  BarometricPressure         10002 non-null  object        
 11  DistanceLtd                10002 non-null  object        
 12 

In [105]:
# To fill only certain columns, select them with a list of column names
# (df.loc['Throttle', 'FuelTemperature'] would look up a row label 'Throttle' instead)
#df[['Throttle', 'FuelTemperature']].ffill().bfill()
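
A minimal sketch of the column-limited fill on toy data. The frame `df` and its values are hypothetical; only the column names come from the notebook. Selecting with a list of names and assigning back leaves every other column untouched.

```python
import numpy as np
import pandas as pd

# Toy frame; values are made up for illustration
df = pd.DataFrame({
    'Throttle': [np.nan, 10.0, np.nan],
    'FuelTemperature': [32.0, np.nan, np.nan],
    'Speed': [np.nan, 55.0, np.nan],
})

cols = ['Throttle', 'FuelTemperature']

# Fill only the listed columns; Speed keeps its NaNs
df[cols] = df[cols].ffill().bfill()
print(df)
```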

In [98]:
# Sort by event timestamp, then forward fill and backward fill the NaNs
def input_fill(df):
    return df.sort_values(by='event_timestamp').ffill().bfill()

In [99]:
#the function worked on the single equipment id 1692
input_fill(EI1692).info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 142 entries, 41610 to 946737
Data columns (total 26 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   record_id                  142 non-null    object        
 1   event_timestamp            142 non-null    datetime64[ns]
 2   event_descr                142 non-null    object        
 3   spn                        142 non-null    object        
 4   fmi                        142 non-null    object        
 5   active_trans_count         142 non-null    object        
 6   equipment_id               142 non-null    object        
 7   latitude                   142 non-null    float64       
 8   longitude                  142 non-null    float64       
 9   location_timestamp         142 non-null    datetime64[ns]
 10  BarometricPressure         142 non-null    object        
 11  DistanceLtd                142 non-null    object        
 12  E

In [101]:
# Now apply the function to the original joined dataset (filtered to active=y, svc ctrs dropped, equipment id < 6)
joined_fill = join_j1939_diag.groupby('equipment_id').apply(input_fill)

In [103]:
joined_fill.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 547766 entries, ('1327', 41577) to ('R1764', 4952)
Data columns (total 26 columns):
 #   Column                     Non-Null Count   Dtype         
---  ------                     --------------   -----         
 0   record_id                  547766 non-null  object        
 1   event_timestamp            547766 non-null  datetime64[ns]
 2   event_descr                547766 non-null  object        
 3   spn                        547766 non-null  object        
 4   fmi                        547766 non-null  object        
 5   active_trans_count         547766 non-null  object        
 6   equipment_id               547766 non-null  object        
 7   latitude                   547766 non-null  float64       
 8   longitude                  547766 non-null  float64       
 9   location_timestamp         547766 non-null  datetime64[ns]
 10  BarometricPressure         547766 non-null  object        
 11  DistanceLtd                54

Note that two columns still have NaNs after the group-wise fill. The likely cause is that some equipment has only NaNs in those columns, so there is no value within that equipment's group to fill from. These instances need further investigation.
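
One way to investigate the leftover NaNs is to count non-null values per equipment and column before filling. Any (equipment, column) pair with a count of zero can never be filled from within its own group. The sketch below uses a toy frame in place of `join_j1939_diag`; the names `toy`, `value_cols`, and `never_reported` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for join_j1939_diag: truck 'B' never reports FuelTemperature
toy = pd.DataFrame({
    'equipment_id': ['A', 'A', 'B', 'B'],
    'event_timestamp': pd.to_datetime(
        ['2015-01-02', '2015-01-01', '2015-01-01', '2015-01-02']
    ),
    'FuelTemperature': [np.nan, 32.0, np.nan, np.nan],
    'Throttle': [5.0, np.nan, np.nan, 7.0],
})
value_cols = ['FuelTemperature', 'Throttle']

# Per-equipment fill in time order, so values never cross from one truck to another
toy = toy.sort_values(['equipment_id', 'event_timestamp'])
filled = toy.copy()
filled[value_cols] = toy.groupby('equipment_id')[value_cols].transform(
    lambda s: s.ffill().bfill()
)

# count() ignores NaN, so a zero means the truck never reported that column
all_missing = toy.groupby('equipment_id')[value_cols].count().eq(0)
never_reported = [
    (equipment, col)
    for equipment, row in all_missing.iterrows()
    for col in value_cols
    if row[col]
]
print(never_reported)
```

On the real data, the same count table would show whether the remaining NaNs are concentrated in a few trucks or spread across the fleet.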

Other options for filling in missing data:

Linear interpolation with .interpolate()

scikit-learn imputers (sklearn.impute)

Nearest-neighbors imputation

A new boolean column that flags the gap, e.g. "Fuel Temperature missing"
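
Two of these options can be sketched in pandas alone: a missing-value flag and time-weighted interpolation. The readings and the column names `FuelTemperature_missing` and `FuelTemperature_interp` are hypothetical. The sketch also assumes the column has already been converted to a numeric dtype, which is not yet true of the `object` telemetry columns in the joined data. For the scikit-learn route, `SimpleImputer` and `KNNImputer` in `sklearn.impute` are the usual starting points.

```python
import numpy as np
import pandas as pd

# Toy hourly readings with a two-hour gap
readings = pd.DataFrame(
    {'FuelTemperature': [100.0, np.nan, np.nan, 130.0]},
    index=pd.to_datetime([
        '2015-01-01 00:00', '2015-01-01 01:00',
        '2015-01-01 02:00', '2015-01-01 03:00',
    ]),
)

# Record which readings were missing before any fill, so a model can use that signal
readings['FuelTemperature_missing'] = readings['FuelTemperature'].isna()

# Linear interpolation weighted by elapsed time (needs a DatetimeIndex)
readings['FuelTemperature_interp'] = (
    readings['FuelTemperature'].interpolate(method='time')
)
print(readings)
```

Unlike forward fill, interpolation assumes the quantity changes smoothly between observations. That may be reasonable for temperatures over short gaps and less so for counters or status fields.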

In [None]:
#Taylor's code to show where all the commas are in a column, but do not show NaNs
#diagnostic[diagnostic['AcceleratorPedal'].str.contains(',', na=False)]
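
A small sketch of that check on toy data, followed by one possible numeric conversion. The values are hypothetical, since the real `AcceleratorPedal` column is not shown here. What a comma means in the real file (decimal separator, thousands separator, or a parsing fault) is not established, so the conversion simply marks such values as NaN instead of guessing.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the real diagnostic column
diagnostic = pd.DataFrame({'AcceleratorPedal': ['12.5', '3,2', np.nan, '0']})

# na=False treats NaN as "no comma", so missing rows are excluded from the mask
has_comma = diagnostic['AcceleratorPedal'].str.contains(',', na=False, regex=False)
comma_rows = diagnostic[has_comma]

# Coerce to numbers; anything unparseable (including comma strings) becomes NaN
as_number = pd.to_numeric(diagnostic['AcceleratorPedal'], errors='coerce')

print(comma_rows)
print(as_number)
```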


Reference for dealing with NaNs and logistic regression: https://www.kaggle.com/cemsarier/preprocessing-filling-nan-and-logistic-regression

Lifelines: https://lifelines.readthedocs.io/en/latest/Quickstart.html

KDnuggets timeline examples: https://www.kdnuggets.com/2020/07/guide-survival-analysis-python-part-3.html

DataCamp Survival Analysis in R: https://learn.datacamp.com/courses/survival-analysis-in-r



# Proposed age brackets 




## Age of truck resources ** assumes diesel engines 

https://www.coopskw.com/avoid-risk-buying-used-semi-trucks-sale-hot-tips/  Around 750,000 miles is when a well-made truck will probably need a major overhaul if it’s to keep running for heavy-duty work.  Now, sometimes you can get a great deal on a “fixer-upper,” but be aware if the mileage has climbed to high levels.

https://www.commerceexpressinc.com/2020/01/09/5-quick-facts-about-semi-trucks/
Average fuel economy: 4-8 miles per gallon; uphill, 2.9 miles per gallon
Trucks average 45,000 miles/year
Trucks last upwards of 750,000 miles; some can reach 1,000,000

https://usspecial.com/how-many-miles-do-semi-trucks-last/   Large logistics companies trade their trucks in at 500,000 miles before other components start to fail 

350,000 - 450,000:  moderately low mileage 


https://bigrigpros.com/why-do-semi-trucks-last-so-long/
Gasoline engines: 2000-3500 RPM, running about twice as fast and working twice as hard as a diesel engine
Diesel engines: 1300-1600 RPM, a much slower rate than a gasoline engine

This RPM difference helps explain why gasoline engines wear out much faster than diesel engines.
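
The mileage figures collected above could be turned into age brackets with `pd.cut`. The sketch below is only a starting point. The bin edges and labels are hypothetical choices based on the quoted thresholds (350,000, 450,000, 500,000, and 750,000 miles), and it assumes `DistanceLtd` is recorded in miles, which has not been confirmed.

```python
import numpy as np
import pandas as pd

# Hypothetical bracket edges drawn from the mileage figures quoted above
edges = [0, 350_000, 450_000, 500_000, 750_000, np.inf]
labels = ['under 350k', '350k-450k', '450k-500k', '500k-750k', 'over 750k']

# Toy odometer readings standing in for DistanceLtd (assumed to be miles)
distance_ltd = pd.Series([121095.5, 373084.1, 500492.6, 760000.0])

# right=False makes each bracket include its lower edge and exclude its upper edge
age_bracket = pd.cut(distance_ltd, bins=edges, labels=labels, right=False)

# Rough age in years, using the quoted average of 45,000 miles per year
approx_years = distance_ltd / 45_000

print(pd.DataFrame({
    'DistanceLtd': distance_ltd,
    'age_bracket': age_bracket,
    'approx_years': approx_years.round(1),
}))
```

At 45,000 miles per year, the 750,000-mile overhaul point corresponds to roughly 16 to 17 years of service. That is a useful sanity check when comparing mileage brackets against calendar age.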