## Predicting CO2 Emissions by Vehicles

The dataset captures how a vehicle's CO2 emissions vary with its features. It comes from the Government of Canada's official open data website; this is a compiled version taken from [Kaggle datasets](https://www.kaggle.com/code/ahmetcanertekn/co2-emission-by-vehicles-eda-visualization/input). It contains data covering a period of 7 years.

### Model

4WD/4X4 = Four-wheel drive

AWD = All-wheel drive

FFV = Flexible-fuel vehicle

SWB = Short wheelbase

LWB = Long wheelbase

EWB = Extended wheelbase

### Transmission

A = Automatic

M = Manual

### Fuel type

X = Regular gasoline (petrol)

Z = Premium gasoline (petrol)

D = Diesel

E = Ethanol (E85)

N = Natural gas

### Fuel Consumption

City and highway fuel consumption ratings are shown in litres per 100 kilometres (L/100 km). The combined rating (55% city, 45% highway) is shown in both L/100 km and miles per gallon (mpg).
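
The 55/45 weighting can be checked against the data. The sketch below uses a hypothetical helper, `combined_rating`, which is not part of the notebook, and applies it to city and highway values copied from the first rows of the dataset preview further down. Rounded to one decimal, the results match the combined ratings listed for those rows (8.5, 9.6 and 11.1 L/100 km).

```python
def combined_rating(city_l_per_100km: float, hwy_l_per_100km: float) -> float:
    """Weighted combined rating: 55% city, 45% highway (both in L/100 km)."""
    return 0.55 * city_l_per_100km + 0.45 * hwy_l_per_100km


# (city, highway) pairs copied from preview rows 0, 1 and 3.
preview_rows = [(9.9, 6.7), (11.2, 7.7), (12.7, 9.1)]

for city, hwy in preview_rows:
    print(city, hwy, round(combined_rating(city, hwy), 1))
```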

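**To convert mpg to kmpl**, divide the fuel economy value by **2.352**. This divisor applies to US gallons; for imperial gallons, which the sample rows are consistent with (8.5 L/100 km is listed as 33 mpg), the divisor is about 2.825. The code below uses 2.352, so the derived kmpl values differ from true km/L by a constant factor.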
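
The divisor follows from the unit definitions. The sketch below, with illustrative constant and function names that are not part of the notebook, derives it for both US and imperial gallons and compares each conversion with the L/100 km rating of the first preview row (8.5 L/100 km, 33 mpg).

```python
KM_PER_MILE = 1.609344
LITRES_PER_US_GALLON = 3.785411784
LITRES_PER_IMPERIAL_GALLON = 4.54609


def mpg_to_kmpl(mpg: float, litres_per_gallon: float) -> float:
    """Convert miles per gallon to kilometres per litre for a given gallon size."""
    return mpg * KM_PER_MILE / litres_per_gallon


def l_per_100km_to_kmpl(l_per_100km: float) -> float:
    """Convert a consumption rating in L/100 km to kilometres per litre."""
    return 100.0 / l_per_100km


# Divisors: kmpl = mpg / divisor
us_divisor = LITRES_PER_US_GALLON / KM_PER_MILE
imperial_divisor = LITRES_PER_IMPERIAL_GALLON / KM_PER_MILE
print(f"US divisor:       {us_divisor:.3f}")
print(f"Imperial divisor: {imperial_divisor:.3f}")

# First preview row: 8.5 L/100 km combined, listed as 33 mpg.
print(f"from L/100 km:    {l_per_100km_to_kmpl(8.5):.2f} km/L")
print(f"33 mpg, US:       {mpg_to_kmpl(33, LITRES_PER_US_GALLON):.2f} km/L")
print(f"33 mpg, imperial: {mpg_to_kmpl(33, LITRES_PER_IMPERIAL_GALLON):.2f} km/L")
```

Because the two conversions differ only by a constant factor, the choice changes the scale of the fitted coefficient for this feature but not the quality of a linear fit.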

### CO2 Emissions

The tailpipe emissions of carbon dioxide (in grams per kilometre) for combined city and highway driving.


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

In [3]:
dataset = pd.read_csv("CO2_Emissions.csv")

In [4]:
dataset.head(10)

,Make,Model,Vehicle Class,Engine Size(L),Cylinders,Transmission,Fuel Type,Fuel Consumption City (L/100 km),Fuel Consumption Hwy (L/100 km),Fuel Consumption Comb (L/100 km),Fuel Consumption Comb (mpg),CO2 Emissions(g/km)
0,ACURA,ILX,COMPACT,2.0,4,A,Z,9.9,6.7,8.5,33,196
1,ACURA,ILX,COMPACT,2.4,4,M,Z,11.2,7.7,9.6,29,221
2,ACURA,ILX HYBRID,COMPACT,1.5,4,A,Z,6.0,5.8,5.9,48,136
3,ACURA,MDX 4WD,SUV - SMALL,3.5,6,A,Z,12.7,9.1,11.1,25,255
4,ACURA,RDX AWD,SUV - SMALL,3.5,6,A,Z,12.1,8.7,10.6,27,244
5,ACURA,RLX,MID-SIZE,3.5,6,A,Z,11.9,7.7,10.0,28,230
6,ACURA,TL,MID-SIZE,3.5,6,A,Z,11.8,8.1,10.1,28,232
7,ACURA,TL AWD,MID-SIZE,3.7,6,A,Z,12.8,9.0,11.1,25,255
8,ACURA,TL AWD,MID-SIZE,3.7,6,M,Z,13.4,9.5,11.6,24,267
9,ACURA,TSX,COMPACT,2.4,4,A,Z,10.6,7.5,9.2,31,212


### Extracting required features for training

In [5]:
dataset = dataset[["Vehicle Class", "Engine Size(L)", "Cylinders", "Transmission", "Fuel Type", "Fuel Consumption Comb (mpg)", "CO2 Emissions(g/km)"]]

In [6]:
dataset.head(5)

,Vehicle Class,Engine Size(L),Cylinders,Transmission,Fuel Type,Fuel Consumption Comb (mpg),CO2 Emissions(g/km)
0,COMPACT,2.0,4,A,Z,33,196
1,COMPACT,2.4,4,M,Z,29,221
2,COMPACT,1.5,4,A,Z,48,136
3,SUV - SMALL,3.5,6,A,Z,25,255
4,SUV - SMALL,3.5,6,A,Z,27,244


### Dropping rows with Fuel Type "E" and "N"

In [7]:
dataset = dataset[dataset["Fuel Type"] != "E"]

In [8]:
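# .copy() gives an independent DataFrame, so later column assignments do not raise SettingWithCopyWarning
dataset = dataset[dataset["Fuel Type"] != "N"].copy()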

In [9]:
dataset.head(5)

,Vehicle Class,Engine Size(L),Cylinders,Transmission,Fuel Type,Fuel Consumption Comb (mpg),CO2 Emissions(g/km)
0,COMPACT,2.0,4,A,Z,33,196
1,COMPACT,2.4,4,M,Z,29,221
2,COMPACT,1.5,4,A,Z,48,136
3,SUV - SMALL,3.5,6,A,Z,25,255
4,SUV - SMALL,3.5,6,A,Z,27,244
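
The two filters above can also be written as a single step with `isin`. The following is a minimal sketch on a toy DataFrame; the values in `toy` and the names `excluded` and `filtered` are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy frame standing in for the real dataset (illustrative values only).
toy = pd.DataFrame(
    {
        "Fuel Type": ["X", "Z", "E", "D", "N", "X"],
        "CO2 Emissions(g/km)": [196, 221, 150, 240, 213, 205],
    }
)

excluded = ["E", "N"]

# ~isin(...) keeps the rows whose fuel type is NOT in the excluded list.
# .copy() makes the result independent of the original frame.
filtered = toy[~toy["Fuel Type"].isin(excluded)].copy()

print(filtered)
```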


In [10]:
# Number of rows with regular gasoline (X)
dataset["Fuel Type"].value_counts()["X"]

3637

In [11]:
fuel_type_mapping = {'X': 0, 'Z': 1, 'D': 2}

In [12]:
# Map fuel-type codes to integers (any code missing from the mapping would become NaN)
dataset['Fuel Type'] = dataset['Fuel Type'].map(fuel_type_mapping)

In [13]:
dataset.tail(10)

,Vehicle Class,Engine Size(L),Cylinders,Transmission,Fuel Type,Fuel Consumption Comb (mpg),CO2 Emissions(g/km)
7375,MID-SIZE,2.0,4,A,1,29,223
7376,STATION WAGON - SMALL,2.0,4,A,1,32,208
7377,STATION WAGON - SMALL,2.0,4,A,1,30,219
7378,STATION WAGON - SMALL,2.0,4,A,1,30,220
7379,SUV - SMALL,2.0,4,A,0,31,210
7380,SUV - SMALL,2.0,4,A,1,30,219
7381,SUV - SMALL,2.0,4,A,1,29,232
7382,SUV - SMALL,2.0,4,A,1,27,240
7383,SUV - STANDARD,2.0,4,A,1,29,232
7384,SUV - STANDARD,2.0,4,A,1,26,248


In [14]:
transmission_mapping = {'A': 0, 'M': 1}

In [15]:
# Map transmission codes to integers (any code missing from the mapping would become NaN)
dataset["Transmission"] = dataset["Transmission"].map(transmission_mapping)

In [16]:
dataset.head(10)

,Vehicle Class,Engine Size(L),Cylinders,Transmission,Fuel Type,Fuel Consumption Comb (mpg),CO2 Emissions(g/km)
0,COMPACT,2.0,4,0,1,33,196
1,COMPACT,2.4,4,1,1,29,221
2,COMPACT,1.5,4,0,1,48,136
3,SUV - SMALL,3.5,6,0,1,25,255
4,SUV - SMALL,3.5,6,0,1,27,244
5,MID-SIZE,3.5,6,0,1,28,230
6,MID-SIZE,3.5,6,0,1,28,232
7,MID-SIZE,3.7,6,0,1,25,255
8,MID-SIZE,3.7,6,1,1,24,267
9,COMPACT,2.4,4,0,1,31,212


In [17]:
dataset.tail(10)

,Vehicle Class,Engine Size(L),Cylinders,Transmission,Fuel Type,Fuel Consumption Comb (mpg),CO2 Emissions(g/km)
7375,MID-SIZE,2.0,4,0,1,29,223
7376,STATION WAGON - SMALL,2.0,4,0,1,32,208
7377,STATION WAGON - SMALL,2.0,4,0,1,30,219
7378,STATION WAGON - SMALL,2.0,4,0,1,30,220
7379,SUV - SMALL,2.0,4,0,0,31,210
7380,SUV - SMALL,2.0,4,0,1,30,219
7381,SUV - SMALL,2.0,4,0,1,29,232
7382,SUV - SMALL,2.0,4,0,1,27,240
7383,SUV - STANDARD,2.0,4,0,1,29,232
7384,SUV - STANDARD,2.0,4,0,1,26,248
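
The dictionary-based encoding used for `Fuel Type` and `Transmission` can be wrapped in a small helper that fails loudly when a category is missing from the mapping. This is a sketch under assumptions: `encode` is a hypothetical helper and `toy` is made-up data; only the two mapping dictionaries come from the notebook.

```python
import pandas as pd

fuel_type_mapping = {"X": 0, "Z": 1, "D": 2}
transmission_mapping = {"A": 0, "M": 1}

# Toy frame standing in for the real dataset (illustrative values only).
toy = pd.DataFrame(
    {"Transmission": ["A", "M", "A"], "Fuel Type": ["Z", "X", "D"]}
)


def encode(df: pd.DataFrame, column: str, mapping: dict) -> pd.Series:
    """Map df[column] through `mapping`, raising if any value is unmapped."""
    encoded = df[column].map(mapping)  # unmapped values become NaN
    if encoded.isna().any():
        unknown = sorted(df.loc[encoded.isna(), column].unique())
        raise ValueError(f"unmapped values in {column!r}: {unknown}")
    return encoded.astype("int64")


toy["Transmission"] = encode(toy, "Transmission", transmission_mapping)
toy["Fuel Type"] = encode(toy, "Fuel Type", fuel_type_mapping)
print(toy)
```

Integer codes such as 0, 1, 2 imply an ordering (X < Z < D) that a linear model treats as numeric, which is worth keeping in mind when reading the fitted coefficient for `Fuel Type`.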


### Convert mpg to kmpl and drop the original mpg column, which is no longer needed

In [18]:
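# Convert combined fuel economy from mpg to km per litre using the 2.352 divisor
dataset["Fuel Consumption Comb (kmpl)"] = dataset["Fuel Consumption Comb (mpg)"] / 2.352

# The mpg column is now redundant, so drop it
dataset = dataset.drop(columns=["Fuel Consumption Comb (mpg)"])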

In [19]:
dataset = dataset[['Vehicle Class', 'Engine Size(L)', 'Cylinders', 'Transmission', 'Fuel Type', 'Fuel Consumption Comb (kmpl)', 'CO2 Emissions(g/km)']]

In [20]:
dataset.head(5)

,Vehicle Class,Engine Size(L),Cylinders,Transmission,Fuel Type,Fuel Consumption Comb (kmpl),CO2 Emissions(g/km)
0,COMPACT,2.0,4,0,1,14.030612,196
1,COMPACT,2.4,4,1,1,12.329932,221
2,COMPACT,1.5,4,0,1,20.408163,136
3,SUV - SMALL,3.5,6,0,1,10.629252,255
4,SUV - SMALL,3.5,6,0,1,11.479592,244


In [21]:
dataset["Transmission"].unique()

array([0, 1], dtype=int64)

In [22]:
dataset["Vehicle Class"].unique()

array(['COMPACT', 'SUV - SMALL', 'MID-SIZE', 'TWO-SEATER', 'MINICOMPACT',
       'SUBCOMPACT', 'FULL-SIZE', 'STATION WAGON - SMALL',
       'SUV - STANDARD', 'VAN - CARGO', 'VAN - PASSENGER',
       'PICKUP TRUCK - STANDARD', 'MINIVAN', 'SPECIAL PURPOSE VEHICLE',
       'STATION WAGON - MID-SIZE', 'PICKUP TRUCK - SMALL'], dtype=object)

In [23]:
dataset["Fuel Type"].unique()

array([1, 2, 0], dtype=int64)

In [24]:
dataset.head(10)

,Vehicle Class,Engine Size(L),Cylinders,Transmission,Fuel Type,Fuel Consumption Comb (kmpl),CO2 Emissions(g/km)
0,COMPACT,2.0,4,0,1,14.030612,196
1,COMPACT,2.4,4,1,1,12.329932,221
2,COMPACT,1.5,4,0,1,20.408163,136
3,SUV - SMALL,3.5,6,0,1,10.629252,255
4,SUV - SMALL,3.5,6,0,1,11.479592,244
5,MID-SIZE,3.5,6,0,1,11.904762,230
6,MID-SIZE,3.5,6,0,1,11.904762,232
7,MID-SIZE,3.7,6,0,1,10.629252,255
8,MID-SIZE,3.7,6,1,1,10.204082,267
9,COMPACT,2.4,4,0,1,13.180272,212


### Feature Scaling and Training the model

In [25]:
from sklearn.model_selection import train_test_split

# Feature scaling is not applied in this notebook; the scaler below is left disabled.
# from sklearn.preprocessing import MinMaxScaler
# sc = MinMaxScaler()

In [26]:
X = dataset.drop(columns=["Vehicle Class", "CO2 Emissions(g/km)"])

In [27]:
X.head(10)

,Engine Size(L),Cylinders,Transmission,Fuel Type,Fuel Consumption Comb (kmpl)
0,2.0,4,0,1,14.030612
1,2.4,4,1,1,12.329932
2,1.5,4,0,1,20.408163
3,3.5,6,0,1,10.629252
4,3.5,6,0,1,11.479592
5,3.5,6,0,1,11.904762
6,3.5,6,0,1,11.904762
7,3.7,6,0,1,10.629252
8,3.7,6,1,1,10.204082
9,2.4,4,0,1,13.180272


In [28]:
y = dataset["CO2 Emissions(g/km)"]

In [29]:
y.head(10)

0    196
1    221
2    136
3    255
4    244
5    230
6    232
7    255
8    267
9    212
Name: CO2 Emissions(g/km), dtype: int64

In [30]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=36, test_size=0.2)

In [31]:
X_test

,Engine Size(L),Cylinders,Transmission,Fuel Type,Fuel Consumption Comb (kmpl)
6898,5.3,8,0,0,8.928571
6780,2.0,4,0,0,11.904762
3354,5.2,10,0,1,8.503401
6503,2.4,4,0,1,14.030612
6470,1.4,4,0,0,16.156463
...,...,...,...,...,...
4022,2.0,4,0,0,16.156463
1440,1.4,4,0,0,14.030612
1990,3.8,6,1,1,10.629252
4417,5.2,10,0,1,8.078231


In [32]:
lr = LinearRegression()

In [33]:
y_train

1497    214
4172    230
185     262
4737    277
5953    197
       ... 
1385    251
3198    177
1018    163
697     301
5049    240
Name: CO2 Emissions(g/km), Length: 5611, dtype: int64

In [56]:
model = lr.fit(X_train, y_train)

In [57]:
# Predict CO2 emissions for the first row of the test set
model.predict(X_test.iloc[:1])

array([319.40366355])

In [58]:
model.score(X_test, y_test)

0.9271337066838352
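
For a scikit-learn regressor, `score` returns the coefficient of determination, R². The sketch below computes R² by hand to show what the number measures. The true values are CO2 figures copied from the preview rows; the predictions are made up for illustration and are not output of the model above. `r2_manual` is a hypothetical helper.

```python
def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, the coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after using the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [196, 221, 136, 255, 244]            # CO2 values from the preview rows
y_pred = [200.0, 218.0, 140.0, 250.0, 247.0]  # made-up predictions

print(r2_manual(y_true, y_pred))
```

An R² of 1 means the predictions match exactly, while 0 means they do no better than predicting the mean of `y_true` for every row.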

### Saving the model using Pickle

In [59]:
import pickle

In [60]:
# Use a context manager so the file handle is closed after writing
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
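
Loading the saved file back mirrors the dump step. The sketch below shows the round trip with a stand-in object, so that it runs without the trained model: `stand_in_model` and its numbers are made up, and the temporary path is an assumption. A fitted `LinearRegression` would be saved and restored the same way.

```python
import os
import pickle
import tempfile

# Stand-in for the fitted model; any picklable object round-trips the same way.
stand_in_model = {"coef": [1.5, -2.0], "intercept": 10.0}

# Write to a temporary directory so the example leaves the working directory untouched.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Context managers close the file handles even if an error occurs.
with open(path, "wb") as f:
    pickle.dump(stand_in_model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == stand_in_model)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.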