## Concept: Predicting Daily CO₂ Emissions from Car Travel Based on Distance Driven
### Description:
This dataset models the relationship between the daily distance a person drives (in kilometers) and the resulting CO₂ emissions (in kilograms). This is a useful sustainability metric for understanding the environmental impact of personal transportation and for encouraging more sustainable commuting habits.

#### Columns:

- `DistanceKm`: distance driven per day, in kilometers
- `CO2Kg`: CO₂ emissions per day, in kilograms

In [1]:
import pandas as pd

# Load the dataset

dataset = pd.read_csv("car_travel_co2_emissions.csv")

dataset

Unnamed: 0,DistanceKm,CO2Kg
0,4.7,1.04
1,5.2,1.11
2,5.5,1.23
3,6.1,1.29
4,6.6,1.39
...,...,...
98,55.3,12.67
99,55.8,12.82
100,56.2,12.96
101,56.8,13.09


In [2]:
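# Separate the input feature (independent variable) from the target (dependent variable).
# The double brackets keep each selection as a one-column DataFrame rather than a Series.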
independent = dataset[["DistanceKm"]]
dependent = dataset[["CO2Kg"]]

display(independent)
display(dependent)

Unnamed: 0,DistanceKm
0,4.7
1,5.2
2,5.5
3,6.1
4,6.6
...,...
98,55.3
99,55.8
100,56.2
101,56.8


Unnamed: 0,CO2Kg
0,1.04
1,1.11
2,1.23
3,1.29
4,1.39
...,...
98,12.67
99,12.82
100,12.96
101,13.09


In [4]:
# Train-test split
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(independent, dependent, test_size=0.3, random_state=0)
display(X_train, X_test, Y_train, Y_test)

Unnamed: 0,DistanceKm
82,47.1
89,50.7
92,52.1
61,36.9
27,18.9
...,...
99,55.8
67,39.8
64,38.2
47,29.7


Unnamed: 0,DistanceKm
26,18.5
60,36.5
2,5.5
51,31.7
71,41.7
76,44.3
16,13.1
66,39.2
56,34.2
48,30.1


Unnamed: 0,CO2Kg
82,10.69
89,11.51
92,11.88
61,8.14
27,4.19
...,...
99,12.82
67,8.86
64,8.49
47,6.62


Unnamed: 0,CO2Kg
26,4.07
60,8.19
2,1.23
51,7.08
71,9.32
76,9.99
16,2.82
66,8.71
56,7.61
48,6.66
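As a rough check on the split sizes, the sketch below reproduces the arithmetic by hand. It assumes the dataset has 102 rows (indices 0 to 101 in the first output) and that, for a fractional `test_size`, the test count is rounded up with the remainder going to training. That rounding rule is an assumption about scikit-learn's behaviour, not something shown in this notebook, although the 31 predictions printed later are consistent with it.

```python
import math

n_samples = 102   # assumed: rows 0..101, as shown in the dataset output
test_size = 0.3   # same fraction passed to train_test_split

# Assumed rounding rule: test rows are rounded up, the rest are used for training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print("Train rows:", n_train)
print("Test rows:", n_test)
```

In the notebook itself, `X_train.shape` and `X_test.shape` report the actual sizes directly.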


In [5]:
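# Model creation and training
from sklearn.linear_model import LinearRegression
# Note: this rebinds the name LinearRegression from the class to the fitted
# instance, so the class itself is no longer reachable under this name.
LinearRegression = LinearRegression()
LinearRegression.fit(X_train, Y_train)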

In [8]:
# The weight is the coefficient of the linear regression model, which represents the change in the dependent variable (CO2Kg) for a one-unit change in the independent variable (DistanceKm).
weight = LinearRegression.coef_
print("Weight: ", weight)

# The bias is the intercept of the linear regression model, which represents the value of the dependent variable (CO2Kg) when the independent variable (DistanceKm) is zero.
bias = LinearRegression.intercept_
print("Bias: ", bias)

Weight:  [[0.23137392]]
Bias:  [-0.22893816]
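With one feature, the fitted model reduces to the line `CO2Kg = weight * DistanceKm + bias`. The sketch below applies that formula by hand using the rounded values printed above; `predict_co2` and `zero_crossing_km` are illustrative names, not part of the notebook. Because the printed coefficients are rounded, the results should agree with the model's own `predict` output only to several decimal places.

```python
# Rounded values copied from the printed output above
weight = 0.23137392   # kg of CO2 per additional km driven
bias = -0.22893816    # predicted kg of CO2 at 0 km

def predict_co2(distance_km):
    """Apply the fitted line by hand: y = weight * x + bias."""
    return weight * distance_km + bias

print("18.5 km:", predict_co2(18.5))
print("50 km:", predict_co2(50))

# The intercept is negative, so the line predicts negative emissions for very
# short distances. This is the distance at which the prediction crosses zero.
zero_crossing_km = -bias / weight
print("Prediction is zero at about", round(zero_crossing_km, 2), "km")
```

The negative intercept is a reminder that the line is only meaningful within the range of distances seen in the data (roughly 4.7 to 56.8 km); the value at 0 km is an extrapolation, not a physical quantity.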


In [9]:
# Model prediction on the test set (the model was already trained by fit above)

y_predict = LinearRegression.predict(X_test)
print("Predicted CO2Kg: ", y_predict)

Predicted CO2Kg:  [[ 4.05147934]
 [ 8.21620988]
 [ 1.0436184 ]
 [ 7.10561507]
 [ 9.41935426]
 [10.02092645]
 [ 2.80206018]
 [ 8.84091946]
 [ 7.68404987]
 [ 6.7354168 ]
 [ 7.42953856]
 [12.19584129]
 [12.56603956]
 [ 2.52441148]
 [ 1.66832798]
 [ 4.51422718]
 [ 3.56559411]
 [ 3.77383064]
 [ 4.97697502]
 [ 1.80715233]
 [ 6.22639418]
 [ 9.92837688]
 [ 1.18244275]
 [ 9.62759079]
 [11.03897169]
 [ 9.78955253]
 [ 1.50636623]
 [ 6.41149331]
 [12.28839085]
 [10.22916298]
 [ 8.35503423]]


In [10]:
# Model evaluation
from sklearn.metrics import r2_score
r2score = r2_score(Y_test, y_predict)
print("R2 Score: ", r2score)
# The R2 score is nearly 1, which indicates that the model fits the test data well.

R2 Score:  0.9995181118568254
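The R² score compares the model's squared errors with those of a baseline that always predicts the mean: `R2 = 1 - SS_res / SS_tot`. The sketch below computes it by hand on the ten test targets displayed earlier, paired with the first ten printed predictions. Treating those as matching rows is an assumption (they are consistent with the fitted line), and because this is only a subset of the 31 test rows, the result will not equal the full score reported above.

```python
# Ten test targets shown in the Y_test output (a subset of the 31 test rows)
y_true = [4.07, 8.19, 1.23, 7.08, 9.32, 9.99, 2.82, 8.71, 7.61, 6.66]
# First ten printed predictions, assumed to match those rows in order
y_pred = [4.05147934, 8.21620988, 1.0436184, 7.10561507, 9.41935426,
          10.02092645, 2.80206018, 8.84091946, 7.68404987, 6.7354168]

mean_true = sum(y_true) / len(y_true)

# Residual sum of squares: error of the model's predictions
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
# Total sum of squares: error of always predicting the mean
ss_tot = sum((t - mean_true) ** 2 for t in y_true)

r2 = 1 - ss_res / ss_tot
print("R2 on the displayed subset:", r2)
```

A value close to 1 means the residual error is small relative to the spread of the targets, which is what the notebook's `r2_score` call measures on the full test set.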


In [11]:
# Save the trained model to disk, then reload it and make a prediction
import pickle

good_fit_model_name = 'CO2Kg_linear_r_model.pkl'

# Save the model (the with-block closes the file handle when done)
with open(good_fit_model_name, 'wb') as model_file:
    pickle.dump(LinearRegression, model_file)

# Load the model
with open(good_fit_model_name, 'rb') as model_file:
    load_good_fit_model = pickle.load(model_file)

# Predict emissions for a 50 km day using the reloaded model
Result = load_good_fit_model.predict([[50]])
print("Predicted CO2Kg for 50 Km: ", Result)

Predicted CO2Kg for 50 Km:  [[11.33975779]]


