## Convert object values to numeric values

To use a classification algorithm, the dataset values have to be converted to numeric values.

The object values in the dataset are converted to numeric values in three steps:

- 1 Identify all data types in the dataset
- 2 Convert non-categorical feature values to numeric
- 3 Convert categorical feature values to numeric


### 1 Identify all data types in the dataset

In [10]:
#check data types for each feature

combined.dtypes

AltitudeVariation             object
VehicleSpeedInstantaneous     object
VehicleSpeedAverage           object
VehicleSpeedVariance          object
VehicleSpeedVariation         object
LongitudinalAcceleration      object
EngineLoad                    object
EngineCoolantTemperature     float64
ManifoldAbsolutePressure     float64
EngineRPM                     object
MassAirFlow                   object
IntakeAirTemperature         float64
VerticalAcceleration          object
FuelConsumptionAverage        object
drivingStyle                  object
Car                           object
Journey                        int64
dtype: object
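
As a complement to reading the `dtypes` listing by eye, the columns that still need converting can be selected programmatically. The following is a minimal sketch on a small stand-in frame (`sample` and its values are assumptions for illustration, not the real `combined` dataset); it relies on `DataFrame.select_dtypes` to exclude columns that are already numeric.

```python
import pandas as pd

# Toy stand-in for `combined`; the rows are illustrative only.
sample = pd.DataFrame({
    "EngineRPM": ["1755,5", "736,5", "1254"],
    "EngineCoolantTemperature": [81.0, 81.0, 80.0],
    "drivingStyle": ["EvenPaceStyle", "EvenPaceStyle", "EvenPaceStyle"],
    "Journey": [4, 4, 4],
})

# Columns that are not numeric yet and therefore still need converting.
non_numeric_columns = sample.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_columns)
```

On the real dataset the same call would be made on `combined` instead of `sample`.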

In [11]:
#The EngineRPM column uses commas instead of decimal points, so the commas have to be replaced
#with decimal points before converting to float

combined.tail()

Unnamed: 0,AltitudeVariation,VehicleSpeedInstantaneous,VehicleSpeedAverage,VehicleSpeedVariance,VehicleSpeedVariation,LongitudinalAcceleration,EngineLoad,EngineCoolantTemperature,ManifoldAbsolutePressure,EngineRPM,MassAirFlow,IntakeAirTemperature,VerticalAcceleration,FuelConsumptionAverage,drivingStyle,Car,Journey
23765,1.0,28.79999924,28.55999908,57.19057079,3.600000381,-0.0292,25.88235283,81.0,115.0,"1755,5",20.46999931,25.0,-0.1661,14.57800293,EvenPaceStyle,peugeot,4
23766,1.699996948,30.59999847,28.5299991,57.01026584,1.799999237,-0.0304,11.76470566,81.0,106.0,"736,5",17.73999977,25.0,-0.1987,14.58564186,EvenPaceStyle,peugeot,4
23767,1.800003052,29.69999886,28.49999908,56.8830454,-0.899999619,-0.1684,98.03921509,81.0,106.0,1254,9.520000458,24.0,-0.1156,14.54729366,EvenPaceStyle,peugeot,4
23768,2.100006104,29.69999886,28.40999908,56.16090993,0.0,-0.0644,79.60784149,80.0,112.0,1254,14.90999985,23.0,-0.076,14.54682827,EvenPaceStyle,peugeot,4
23769,1.5,33.29999924,28.34999911,55.34084256,3.600000381,-0.1817,80.0,80.0,113.0,"1363,5",15.32999992,23.0,-0.0605,14.55406761,EvenPaceStyle,peugeot,4


In [12]:
#replace the comma decimal separator in EngineRPM with a decimal point

combined['EngineRPM'] = combined['EngineRPM'].astype(str)
combined['EngineRPM'] = combined['EngineRPM'].str.replace(',', '.', regex=False)

In [13]:
#check that EngineRPM is now showing decimal separators

combined.tail()

Unnamed: 0,AltitudeVariation,VehicleSpeedInstantaneous,VehicleSpeedAverage,VehicleSpeedVariance,VehicleSpeedVariation,LongitudinalAcceleration,EngineLoad,EngineCoolantTemperature,ManifoldAbsolutePressure,EngineRPM,MassAirFlow,IntakeAirTemperature,VerticalAcceleration,FuelConsumptionAverage,drivingStyle,Car,Journey
23765,1.0,28.79999924,28.55999908,57.19057079,3.600000381,-0.0292,25.88235283,81.0,115.0,1755.5,20.46999931,25.0,-0.1661,14.57800293,EvenPaceStyle,peugeot,4
23766,1.699996948,30.59999847,28.5299991,57.01026584,1.799999237,-0.0304,11.76470566,81.0,106.0,736.5,17.73999977,25.0,-0.1987,14.58564186,EvenPaceStyle,peugeot,4
23767,1.800003052,29.69999886,28.49999908,56.8830454,-0.899999619,-0.1684,98.03921509,81.0,106.0,1254.0,9.520000458,24.0,-0.1156,14.54729366,EvenPaceStyle,peugeot,4
23768,2.100006104,29.69999886,28.40999908,56.16090993,0.0,-0.0644,79.60784149,80.0,112.0,1254.0,14.90999985,23.0,-0.076,14.54682827,EvenPaceStyle,peugeot,4
23769,1.5,33.29999924,28.34999911,55.34084256,3.600000381,-0.1817,80.0,80.0,113.0,1363.5,15.32999992,23.0,-0.0605,14.55406761,EvenPaceStyle,peugeot,4
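
The separator fix and the later numeric conversion can be tried out together on a few values before touching the full column. This is a minimal sketch, assuming a hypothetical series `rpm_text` of comma-decimal strings that stands in for `combined['EngineRPM']`.

```python
import pandas as pd

# Hypothetical comma-decimal strings standing in for combined['EngineRPM'].
rpm_text = pd.Series(["1755,5", "736,5", "1254", "1363,5"])

# Replace the decimal comma literally (no regex), then parse the strings as numbers.
rpm_numeric = pd.to_numeric(rpm_text.str.replace(",", ".", regex=False))

print(rpm_numeric.tolist())
print(rpm_numeric.dtype)
```

If the commas come from the source CSV files, `pd.read_csv` also accepts a `decimal=","` argument, which may remove the need for the string step. Whether that suits these particular files is not confirmed here.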


### 2 Convert non-categorical feature values to numeric

In [14]:
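#give the dataset a second name, new_numeric_dataset, to indicate the change in data type
#note: this is a reference to combined, not a copy, so changes to one are visible through the other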

new_numeric_dataset = combined

new_numeric_dataset

Unnamed: 0,AltitudeVariation,VehicleSpeedInstantaneous,VehicleSpeedAverage,VehicleSpeedVariance,VehicleSpeedVariation,LongitudinalAcceleration,EngineLoad,EngineCoolantTemperature,ManifoldAbsolutePressure,EngineRPM,MassAirFlow,IntakeAirTemperature,VerticalAcceleration,FuelConsumptionAverage,drivingStyle,Car,Journey
0,-2.2999878,25.67051888,13.22350089,121.5926897,-2.4769802,0.3555,4.705882549,68.0,106.0,1796,15.81000042,24.0,-0.1133,19.49733543,EvenPaceStyle,opel,1
1,-2.0999756,24.09425926,13.63891915,120.4225707,-1.57625962,0.4492,10.58823586,68.0,103.0,1689,14.64999962,22.0,-0.1289,19.51572227,EvenPaceStyle,opel,1
2,-1.5,22.74317932,14.03104293,118.4567689,-1.35107994,0.4258,27.45098114,68.0,103.0,1599,11.85000038,21.0,-0.1328,19.44176483,EvenPaceStyle,opel,1
3,0.1000366,22.29281998,14.17107305,117.5713084,-0.45035934,0.414,24.31372643,69.0,104.0,1620,12.21000004,20.0,-0.0859,19.38876915,EvenPaceStyle,opel,1
4,0.0999756,23.64389992,14.3289535,117.0741485,1.35107994,0.3945,20,69.0,104.0,1708,11.90999985,21.0,-0.0664,19.30163765,EvenPaceStyle,opel,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23765,1,28.79999924,28.55999908,57.19057079,3.600000381,-0.0292,25.88235283,81.0,115.0,1755.5,20.46999931,25.0,-0.1661,14.57800293,EvenPaceStyle,peugeot,4
23766,1.699996948,30.59999847,28.5299991,57.01026584,1.799999237,-0.0304,11.76470566,81.0,106.0,736.5,17.73999977,25.0,-0.1987,14.58564186,EvenPaceStyle,peugeot,4
23767,1.800003052,29.69999886,28.49999908,56.8830454,-0.899999619,-0.1684,98.03921509,81.0,106.0,1254,9.520000458,24.0,-0.1156,14.54729366,EvenPaceStyle,peugeot,4
23768,2.100006104,29.69999886,28.40999908,56.16090993,0,-0.0644,79.60784149,80.0,112.0,1254,14.90999985,23.0,-0.076,14.54682827,EvenPaceStyle,peugeot,4
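
Plain assignment of a DataFrame binds a second name to the same object, whereas `DataFrame.copy()` produces an independent one. The sketch below illustrates the difference on a tiny frame; `original`, `alias` and `independent` are names invented for this example.

```python
import pandas as pd

original = pd.DataFrame({"EngineRPM": ["1796", "1689"]})

alias = original               # second name for the same DataFrame
independent = original.copy()  # separate DataFrame with its own data

# Converting through the alias also changes what `original` shows.
alias["EngineRPM"] = pd.to_numeric(alias["EngineRPM"])

print(alias is original)
print(original["EngineRPM"].dtype)
print(independent["EngineRPM"].tolist())  # still the original strings
```

Using `.copy()` would be the safer choice if the unconverted `combined` data needs to stay available for comparison.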


In [15]:
#validate data types 

new_numeric_dataset.dtypes

AltitudeVariation             object
VehicleSpeedInstantaneous     object
VehicleSpeedAverage           object
VehicleSpeedVariance          object
VehicleSpeedVariation         object
LongitudinalAcceleration      object
EngineLoad                    object
EngineCoolantTemperature     float64
ManifoldAbsolutePressure     float64
EngineRPM                     object
MassAirFlow                   object
IntakeAirTemperature         float64
VerticalAcceleration          object
FuelConsumptionAverage        object
drivingStyle                  object
Car                           object
Journey                        int64
dtype: object

In [16]:
#convert numeric object columns to float or integer; errors='coerce' turns values that cannot be parsed into NaN
#Note: converting EngineRPM this way creates new NaNs, so it is converted separately

columns_to_convert = [
    'AltitudeVariation', 'VehicleSpeedInstantaneous', 'VehicleSpeedAverage',
    'VehicleSpeedVariance', 'VehicleSpeedVariation', 'LongitudinalAcceleration',
    'EngineLoad', 'MassAirFlow', 'VerticalAcceleration', 'FuelConsumptionAverage',
]

for col in columns_to_convert:
    new_numeric_dataset[col] = pd.to_numeric(new_numeric_dataset[col], errors='coerce')
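
Because `errors='coerce'` replaces unparseable values with NaN without raising, it can help to list which values were lost. A minimal sketch, assuming a hypothetical column `raw` with one unparseable entry and one entry that was already missing:

```python
import pandas as pd

# Hypothetical raw column: "n/a" cannot be parsed, None is already missing.
raw = pd.Series(["20.5", "17.7", "n/a", None])

converted = pd.to_numeric(raw, errors="coerce")

# Values that were present before the conversion but are NaN afterwards.
newly_missing = raw.notna() & converted.isna()
print(raw[newly_missing].tolist())
print(int(converted.isna().sum()))
```

Running the same comparison on a column such as EngineRPM would show exactly which strings caused the new NaNs.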

In [17]:
#check non-categorical data types after the conversion

new_numeric_dataset.dtypes

AltitudeVariation            float64
VehicleSpeedInstantaneous    float64
VehicleSpeedAverage          float64
VehicleSpeedVariance         float64
VehicleSpeedVariation        float64
LongitudinalAcceleration     float64
EngineLoad                   float64
EngineCoolantTemperature     float64
ManifoldAbsolutePressure     float64
EngineRPM                     object
MassAirFlow                  float64
IntakeAirTemperature         float64
VerticalAcceleration         float64
FuelConsumptionAverage       float64
drivingStyle                  object
Car                           object
Journey                        int64
dtype: object

In [18]:
#convert EngineRPM into numeric data type

new_numeric_dataset['EngineRPM'] = pd.to_numeric(new_numeric_dataset['EngineRPM'])

In [19]:
#Validate all non-categorical data types are now numeric after change to EngineRPM

new_numeric_dataset.dtypes

AltitudeVariation            float64
VehicleSpeedInstantaneous    float64
VehicleSpeedAverage          float64
VehicleSpeedVariance         float64
VehicleSpeedVariation        float64
LongitudinalAcceleration     float64
EngineLoad                   float64
EngineCoolantTemperature     float64
ManifoldAbsolutePressure     float64
EngineRPM                    float64
MassAirFlow                  float64
IntakeAirTemperature         float64
VerticalAcceleration         float64
FuelConsumptionAverage       float64
drivingStyle                  object
Car                           object
Journey                        int64
dtype: object

### 3 Convert categorical feature values to numeric

In [20]:
#convert the drivingStyle and Car features to numeric category codes

new_numeric_dataset['drivingStyle'] = pd.Categorical(new_numeric_dataset['drivingStyle']).codes
new_numeric_dataset['Car'] = pd.Categorical(new_numeric_dataset['Car']).codes

In [21]:
#Validate all data types are now numeric

new_numeric_dataset.dtypes

AltitudeVariation            float64
VehicleSpeedInstantaneous    float64
VehicleSpeedAverage          float64
VehicleSpeedVariance         float64
VehicleSpeedVariation        float64
LongitudinalAcceleration     float64
EngineLoad                   float64
EngineCoolantTemperature     float64
ManifoldAbsolutePressure     float64
EngineRPM                    float64
MassAirFlow                  float64
IntakeAirTemperature         float64
VerticalAcceleration         float64
FuelConsumptionAverage       float64
drivingStyle                    int8
Car                             int8
Journey                        int64
dtype: object
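
Replacing labels with `.codes` discards the label names, so it may be worth keeping the code-to-label mapping for interpreting results later. The sketch below assumes a hypothetical series `styles`; the label `AggressiveStyle` is invented for illustration, since only `EvenPaceStyle` appears in the excerpts above.

```python
import pandas as pd

# Hypothetical labels; 'AggressiveStyle' is an assumed second category.
styles = pd.Series(["EvenPaceStyle", "AggressiveStyle", "EvenPaceStyle"])

categorical = pd.Categorical(styles)
codes = categorical.codes  # one integer code per row; -1 marks missing values

# Keep the code -> label mapping so encoded values can be decoded later.
code_to_label = dict(enumerate(categorical.categories))
print(list(codes))
print(code_to_label)
```

Codes are assigned in the sorted order of the labels, so the numbers carry no meaning beyond identifying the category.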