<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Looking-at-engine_faults" data-toc-modified-id="Looking-at-engine_faults-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Looking at <code>engine_faults</code></a></span><ul class="toc-item"><li><span><a href="#Data-Cleanups-engine_Faults" data-toc-modified-id="Data-Cleanups-engine_Faults-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Data Cleanups <code>engine_Faults</code></a></span></li></ul></li><li><span><a href="#Looking-at-diagnostics" data-toc-modified-id="Looking-at-diagnostics-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Looking at <code>diagnostics</code></a></span><ul class="toc-item"><li><span><a href="#Data-Cleanups:-diagnostics" data-toc-modified-id="Data-Cleanups:-diagnostics-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Data Cleanups: <code>diagnostics</code></a></span></li></ul></li><li><span><a href="#Data-Exploration" data-toc-modified-id="Data-Exploration-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Data Exploration</a></span></li></ul></div>

**Libraries**

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

**Datasets**

In [2]:
engine_faults = pd.read_csv("../data/J1939Faults.csv", low_memory=False)
diagnostics = pd.read_csv("../data/VehicleDiagnosticOnboardData.csv", low_memory=False)

## Looking at `engine_faults`

In [3]:
display(engine_faults.shape)
display(engine_faults.head(2))
display(engine_faults.info())

(1187335, 20)

Unnamed: 0,RecordID,ESS_Id,EventTimeStamp,eventDescription,actionDescription,ecuSoftwareVersion,ecuSerialNumber,ecuModel,ecuMake,ecuSource,spn,fmi,active,activeTransitionCount,faultValue,EquipmentID,MCTNumber,Latitude,Longitude,LocationTimeStamp
0,1,990349,2015-02-21 10:47:13.000,Low (Severity Low) Engine Coolant Level,,unknown,unknown,unknown,unknown,0,111,17,True,2,,1439,105354361,38.857638,-84.626851,2015-02-21 11:34:25.000
1,2,990360,2015-02-21 11:34:34.000,,,unknown,unknown,unknown,unknown,11,629,12,True,127,,1439,105354361,38.857638,-84.626851,2015-02-21 11:35:10.000


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1187335 entries, 0 to 1187334
Data columns (total 20 columns):
 #   Column                 Non-Null Count    Dtype  
---  ------                 --------------    -----  
 0   RecordID               1187335 non-null  int64  
 1   ESS_Id                 1187335 non-null  int64  
 2   EventTimeStamp         1187335 non-null  object 
 3   eventDescription       1126490 non-null  object 
 4   actionDescription      0 non-null        float64
 5   ecuSoftwareVersion     891285 non-null   object 
 6   ecuSerialNumber        844318 non-null   object 
 7   ecuModel               1122577 non-null  object 
 8   ecuMake                1122577 non-null  object 
 9   ecuSource              1187335 non-null  int64  
 10  spn                    1187335 non-null  int64  
 11  fmi                    1187335 non-null  int64  
 12  active                 1187335 non-null  bool   
 13  activeTransitionCount  1187335 non-null  int64  
 14  faultValue        

None

- `ESS_Id` – the event subscriber service event that contained the fault
- `EventTimeStamp` – when the event took place
- `eventDescription` – brief text of meaning of the code (not always present)
- `actionDescription` – always empty in this dataset (0 non-null values)
- `ecuSoftwareVersion` – version string from the reporting vehicle computer system
- `ecuSerialNumber` – Serial number of the reporting Engine Control Module (ECM)
- `ecuModel` – Model of the reporting ECM
- `ecuMake` – Manufacturer of the reporting ECM
- `ecuSource` –
- `spn` – Fault code being reported
- `fmi` – Failure Mode associated with the Fault Code
- `active` – whether the code is being set or being removed
- `activeTransitionCount` – Number of times code has been set/unset
- `faultValue` – never populated in this dataset
- `EquipmentID` – Assigned truck number of the unit in question
- `MCTNumber` – Communications Terminal assigned to the truck
- `Latitude` – Latitude at time of event
- `Longitude` – Longitude at time of event
- `LocationTimeStamp` – Time latitude and longitude were obtained
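
Several of the descriptions above mention columns that are rarely or never filled in. A minimal sketch of how that can be checked is shown here. The helper name `summarize_missing` and the tiny `sample` frame are made up for illustration; on the real data the same function could be called on `engine_faults`.

```python
import numpy as np
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column non-null counts and the share of missing values."""
    summary = pd.DataFrame({
        "non_null": df.notna().sum(),
        "missing_share": df.isna().mean(),
    })
    return summary.sort_values("missing_share", ascending=False)


# Made-up frame that mimics a few columns of the fault data
sample = pd.DataFrame({
    "spn": [111, 629, 1807],
    "eventDescription": [
        "Low (Severity Low) Engine Coolant Level",
        None,
        "Incorrect Data Steering Wheel Angle",
    ],
    "actionDescription": [np.nan, np.nan, np.nan],
})

missing_summary = summarize_missing(sample)

# Columns without a single value are candidates for dropping
empty_columns = missing_summary.index[missing_summary["non_null"] == 0].tolist()

print(missing_summary)
print(empty_columns)
```

Columns that end up in `empty_columns` carry no information and can be dropped before any further cleanup.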


### Data Cleanups `engine_Faults`

In [4]:
# Drop columns that contain no data (every value is NaN)
engine_faults = engine_faults.drop(columns=["actionDescription", "faultValue"])

# Rename the remaining columns to snake_case
engine_faults = engine_faults.rename(columns={
    "RecordID": "record_id",
    "ESS_Id": "ess_id",
    "EventTimeStamp": "event_timestamp",
    "eventDescription": "event_description",
    "ecuSoftwareVersion": "ecu_software_version",
    "ecuSerialNumber": "ecu_serial_number",
    "ecuModel": "ecu_model",
    "ecuMake": "ecu_make",
    "ecuSource": "ecu_source",
    "activeTransitionCount": "active_transition_count",
    "EquipmentID": "equipment_id",
    "MCTNumber": "mct_number",
    "Latitude": "latitude",
    "Longitude": "longitude",
    "LocationTimeStamp": "location_timestamp"
})

# Columns to string: identifiers, codes, and free text.
# Caution: depending on the pandas version, astype("str") can turn NaN into
# the text "nan", so info() may report these columns as fully non-null.
string_columns = [
    "record_id", "ess_id", "event_description", "ecu_software_version",
    "ecu_serial_number", "ecu_model", "ecu_make", "ecu_source",
    "spn", "fmi", "equipment_id", "mct_number",
]
engine_faults = engine_faults.astype({column: "str" for column in string_columns})

# Columns to float and integer
engine_faults = engine_faults.astype({
    "latitude": "float64",
    "longitude": "float64",
    "active_transition_count": "int64",
})

# Columns to boolean
engine_faults["active"] = engine_faults["active"].astype("bool")

# Columns to datetime. pd.to_datetime is used because a unit-less
# astype("datetime64") raises a TypeError in pandas 2.x.
engine_faults["event_timestamp"] = pd.to_datetime(engine_faults["event_timestamp"])
engine_faults["location_timestamp"] = pd.to_datetime(engine_faults["location_timestamp"])
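
One caveat about the string casts above: `eventDescription` had 1126490 non-null values before the cleanup, yet `event_description` reports 1187335 afterwards, which suggests the missing entries were converted to text rather than preserved. The sketch below shows an alternative, assuming the nullable `"string"` dtype is acceptable for these columns. The `descriptions` series is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up series with one real description and one missing entry
descriptions = pd.Series(["Low (Severity Low) Engine Coolant Level", np.nan])

# Plain "str" cast: how the missing entry is treated depends on the pandas version
as_str = descriptions.astype("str")

# Nullable string dtype: the missing entry stays missing (pd.NA)
as_string = descriptions.astype("string")

print("missing after astype('str'):   ", as_str.isna().sum())
print("missing after astype('string'):", as_string.isna().sum())
```

With the nullable dtype, `isna()`, `info()`, and `dropna()` still see the gaps, so later steps can tell "no description" apart from real text.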

In [5]:
display(engine_faults.shape)
display(engine_faults.head(2))
display(engine_faults.info())

(1187335, 18)

Unnamed: 0,record_id,ess_id,event_timestamp,event_description,ecu_software_version,ecu_serial_number,ecu_model,ecu_make,ecu_source,spn,fmi,active,active_transition_count,equipment_id,mct_number,latitude,longitude,location_timestamp
0,1,990349,2015-02-21 10:47:13,Low (Severity Low) Engine Coolant Level,unknown,unknown,unknown,unknown,0,111,17,True,2,1439,105354361,38.857638,-84.626851,2015-02-21 11:34:25
1,2,990360,2015-02-21 11:34:34,,unknown,unknown,unknown,unknown,11,629,12,True,127,1439,105354361,38.857638,-84.626851,2015-02-21 11:35:10


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1187335 entries, 0 to 1187334
Data columns (total 18 columns):
 #   Column                   Non-Null Count    Dtype         
---  ------                   --------------    -----         
 0   record_id                1187335 non-null  object        
 1   ess_id                   1187335 non-null  object        
 2   event_timestamp          1187335 non-null  datetime64[ns]
 3   event_description        1187335 non-null  object        
 4   ecu_software_version     1187335 non-null  object        
 5   ecu_serial_number        1187335 non-null  object        
 6   ecu_model                1187335 non-null  object        
 7   ecu_make                 1187335 non-null  object        
 8   ecu_source               1187335 non-null  object        
 9   spn                      1187335 non-null  object        
 10  fmi                      1187335 non-null  object        
 11  active                   1187335 non-null  bool          
 12  

None

## Looking at `diagnostics`

In [6]:
display(diagnostics.shape)
display(diagnostics.head(10))
display(diagnostics.info())

(12821626, 4)

Unnamed: 0,Id,Name,Value,FaultId
0,1,IgnStatus,False,1
1,2,EngineOilPressure,0,1
2,3,EngineOilTemperature,96.74375,1
3,4,TurboBoostPressure,0,1
4,5,EngineLoad,11,1
5,6,AcceleratorPedal,0,1
6,7,IntakeManifoldTemperature,78.8,1
7,8,FuelRate,0,1
8,9,FuelLtd,12300.907429328,1
9,10,EngineRpm,0,1


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12821626 entries, 0 to 12821625
Data columns (total 4 columns):
 #   Column   Dtype 
---  ------   ----- 
 0   Id       int64 
 1   Name     object
 2   Value    object
 3   FaultId  int64 
dtypes: int64(2), object(2)
memory usage: 391.3+ MB


None

### Data Cleanups: `diagnostics`

In [7]:
# Pivot from long format (one row per reading) to wide format (one row per FaultId)
diagnostics = diagnostics.pivot(index="FaultId", columns="Name", values="Value").reset_index()
display(diagnostics.shape)
display(diagnostics.head(10))
display(diagnostics.info())

(1187335, 24)

Name,AcceleratorPedal,BarometricPressure,CruiseControlActive,CruiseControlSetSpeed,DistanceLtd,EngineCoolantTemperature,EngineLoad,EngineOilPressure,EngineOilTemperature,EngineRpm,...,FuelTemperature,IgnStatus,IntakeManifoldTemperature,LampStatus,ParkingBrake,ServiceDistance,Speed,SwitchedBatteryVoltage,Throttle,TurboBoostPressure
0,0.0,14.21,False,66.48672,423178.7,100.4,11.0,0.0,96.74375,0.0,...,,False,78.8,1023,True,,0.0,3276.75,,0.0
1,,,,,,,,,,,...,,True,,1279,,,,,,
2,,,,,,,,,,,...,,,,1279,,,,,,
3,,,,,,,,,,,...,,True,,1279,,,,,,
4,,,,,,,,,,,...,,,,16639,,,,,,
5,48.0,14.4275,False,64.6226,470381.4,181.4,30.0,38.28,196.5313,1514.5,...,,True,111.2,1023,,,13.6022,3276.75,,6.67
6,82.8,14.2825,False,64.6226,278736.7,188.6,80.0,39.44,210.0313,1711.375,...,,True,78.8,1023,,,41.53478,3276.75,,20.59
7,,,,,,,,,,,...,,True,,1023,,,,,,
8,,,,,,,,,,,...,,True,,1023,,,,,,
9,,,,,,,,,,,...,,,,1023,,,,,,


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1187335 entries, 0 to 1187334
Data columns (total 24 columns):
 #   Column                     Non-Null Count    Dtype 
---  ------                     --------------    ----- 
 0   AcceleratorPedal           531889 non-null   object
 1   BarometricPressure         585976 non-null   object
 2   CruiseControlActive        574916 non-null   object
 3   CruiseControlSetSpeed      576458 non-null   object
 4   DistanceLtd                585819 non-null   object
 5   EngineCoolantTemperature   586071 non-null   object
 6   EngineLoad                 585621 non-null   object
 7   EngineOilPressure          586244 non-null   object
 8   EngineOilTemperature       583912 non-null   object
 9   EngineRpm                  586921 non-null   object
 10  EngineTimeLtd              581366 non-null   object
 11  FuelLevel                  502795 non-null   object
 12  FuelLtd                    585195 non-null   object
 13  FuelRate                   

None
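
After the pivot every reading is still stored as `object`, because the original `Value` column mixes numbers and booleans as text. The following is a minimal sketch of the pivot plus a numeric conversion on made-up data. Which readings are treated as numeric (`numeric_columns`) is an assumption for illustration, not something the dataset documents.

```python
import pandas as pd

# Made-up long-format readings shaped like the diagnostics file
long_df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Name": ["EngineRpm", "EngineLoad", "EngineRpm", "IgnStatus"],
    "Value": ["0", "11", "1514.5", "True"],
    "FaultId": [1, 1, 2, 2],
})

# One row per FaultId, one column per reading name; absent readings become NaN.
# pivot() raises a ValueError if a (FaultId, Name) pair occurs more than once.
wide_df = long_df.pivot(index="FaultId", columns="Name", values="Value").reset_index()
wide_df.columns.name = None  # drop the leftover "Name" label on the column axis

# Hypothetical list of readings that should be numeric
numeric_columns = ["EngineRpm", "EngineLoad"]
wide_df[numeric_columns] = wide_df[numeric_columns].apply(pd.to_numeric, errors="coerce")

print(wide_df)
print(wide_df.dtypes)
```

`errors="coerce"` turns anything that cannot be parsed into `NaN` instead of raising, which is convenient for a first pass but can hide malformed values, so the resulting null counts are worth comparing with the ones before conversion.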

## Data Exploration

In [8]:
engine_faults.head()

Unnamed: 0,record_id,ess_id,event_timestamp,event_description,ecu_software_version,ecu_serial_number,ecu_model,ecu_make,ecu_source,spn,fmi,active,active_transition_count,equipment_id,mct_number,latitude,longitude,location_timestamp
0,1,990349,2015-02-21 10:47:13,Low (Severity Low) Engine Coolant Level,unknown,unknown,unknown,unknown,0,111,17,True,2,1439,105354361,38.857638,-84.626851,2015-02-21 11:34:25
1,2,990360,2015-02-21 11:34:34,,unknown,unknown,unknown,unknown,11,629,12,True,127,1439,105354361,38.857638,-84.626851,2015-02-21 11:35:10
2,3,990364,2015-02-21 11:35:31,Incorrect Data Steering Wheel Angle,unknown,unknown,unknown,unknown,11,1807,2,False,127,1369,105336226,41.42125,-87.767361,2015-02-21 11:35:26
3,4,990370,2015-02-21 11:35:33,Incorrect Data Steering Wheel Angle,unknown,unknown,unknown,unknown,11,1807,2,True,127,1369,105336226,41.421018,-87.767361,2015-02-21 11:36:08
4,5,990416,2015-02-21 11:39:41,,22281684P01*22357957P01*22362082P01*,13063430,0USA13_13_0415_2238A,VOLVO,0,4364,17,False,2,1674,105427130,38.416481,-89.442638,2015-02-21 11:39:37
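
Both tables have 1187335 rows after cleanup, which hints that `FaultId` in `diagnostics` refers to `record_id` in `engine_faults`. That link is an assumption, not something confirmed by the data dictionary above. Under that assumption, a join could be sketched as follows, using made-up frames `faults` and `readings` in place of the real ones.

```python
import pandas as pd

# Made-up stand-ins for the cleaned tables
faults = pd.DataFrame({"record_id": ["1", "2"], "spn": ["111", "629"]})
readings = pd.DataFrame({"FaultId": [1, 2], "EngineRpm": [0.0, None]})

# record_id was cast to string during cleanup, so the key on the other
# side has to be renamed and cast to match before merging
readings = readings.rename(columns={"FaultId": "record_id"})
readings["record_id"] = readings["record_id"].astype(str)

# validate="one_to_one" raises a MergeError if either key has duplicates
combined = faults.merge(readings, on="record_id", how="left", validate="one_to_one")

print(combined)
```

A left join keeps every fault even when no readings were recorded for it, which matches the many all-`NaN` rows seen in the pivoted `diagnostics` table.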
