In [None]:
import kagglehub
debajyotipodder_co2_emission_by_vehicles_path = kagglehub.dataset_download('debajyotipodder/co2-emission-by-vehicles')

print('Data source import complete.')


<h1 style="
  display: inline-block;
  padding: 15px 30px;
  background: linear-gradient(to right, #4A90E2, #357ABD);
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 50px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
">
  Predicting CO₂ Emissions of Cars Based on Engine Characteristics
</h1>

#  Objective
The goal of this project is to build a regression model that predicts the CO₂ emissions (g/km) of car models from their engine and fuel characteristics. Understanding which characteristics drive higher greenhouse gas emissions can help manufacturers, policymakers, and consumers make more eco-friendly choices.

#  Dataset Overview
This dataset contains information about the carbon dioxide (CO₂) emissions and fuel consumption of various vehicles.
In machine learning (ML), it is often used to predict CO₂ emissions from vehicle characteristics or to analyze fuel efficiency.
Here, the goal is to predict CO₂ emissions from the features of the vehicles.
There are a total of 7385 rows and 12 columns.

| Column                            | Description                                      |
| --------------------------------- | ------------------------------------------------ |
| `Make`                            | Car manufacturer                                 |
| `Model`                           | Car model                                        |
| `Vehicle Class`                   | SUV, Sedan, etc.                                 |
| `Engine Size (L)`                 | Engine displacement                              |
| `Cylinders`                       | Number of engine cylinders                       |
| `Transmission`                    | Type of transmission                             |
| `Fuel Type`                       | Fuel used (Regular, Diesel, etc.)                |
| `Fuel Consumption City (L/100km)` | Liters per 100 km in city                        |
| `Fuel Consumption Hwy (L/100km)`  | On highway                                       |
| `Fuel Consumption Comb (L/100km)` | Combined consumption                             |
| `Fuel Consumption Comb (mpg)`     | Combined consumption in miles per gallon         |
| `CO2 Emissions(g/km)`             | (y) **Target variable**: carbon dioxide emissions |



<div style="font-family: Arial; font-size: 15px; line-height: 1.6; background-color: #f8f9fa; border-left: 5px solid #0d6efd; padding: 15px;">

<h3 style="color: #0d6efd;">🚗 Vehicle Attributes Reference</h3>

<b>🔧 Model Abbreviations:</b><br>
<ul>
  <li><b>4WD / 4X4</b> = Four-wheel drive</li>
  <li><b>AWD</b> = All-wheel drive</li>
  <li><b>FFV</b> = Flexible-fuel vehicle</li>
  <li><b>SWB</b> = Short wheelbase</li>
  <li><b>LWB</b> = Long wheelbase</li>
  <li><b>EWB</b> = Extended wheelbase</li>
</ul>

<b>⚙️ Transmission Types:</b><br>
<ul>
  <li><b>A</b> = Automatic</li>
  <li><b>AM</b> = Automated manual</li>
  <li><b>AS</b> = Automatic with select shift</li>
  <li><b>AV</b> = Continuously variable</li>
  <li><b>M</b> = Manual</li>
  <li><b>3 - 10</b> = Number of gears</li>
</ul>

<b>⛽ Fuel Type Codes:</b><br>
<ul>
  <li><b>X</b> = Regular gasoline</li>
  <li><b>Z</b> = Premium gasoline</li>
  <li><b>D</b> = Diesel</li>
  <li><b>E</b> = Ethanol (E85)</li>
  <li><b>N</b> = Natural gas</li>
</ul>

<b>📉 Fuel Consumption:</b><br>
City and highway fuel consumption ratings are shown in <b>litres per 100 km (L/100 km)</b>.  
The <b>combined rating</b> (55% city, 45% highway) is given in both:
<ul>
  <li><b>L/100 km</b> — lower is better (less fuel used)</li>
  <li><b>MPG (miles per gallon)</b> — higher is better (more efficient)</li>
</ul>

<b>🌿 CO₂ Emissions:</b><br>
Tailpipe emissions of <b>carbon dioxide</b> (in grams per kilometre) during combined city & highway driving.
</div>
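
The combined rating and the two units described above are related by simple arithmetic. The sketch below is illustrative only: the function names are made up for this example, and the conversion assumes imperial gallons (4.54609 L), which appears to be what the dataset's mpg column uses. For US gallons the constant would be about 235.21 instead of about 282.48.

```python
# Illustrative helpers; the names are hypothetical, not part of the dataset or notebook.
KM_PER_MILE = 1.609344
LITRES_PER_IMPERIAL_GALLON = 4.54609  # assumption: mpg is in imperial gallons


def combined_l_per_100km(city, hwy):
    """Weighted combined rating: 55% city, 45% highway."""
    return 0.55 * city + 0.45 * hwy


def l_per_100km_to_mpg(l_per_100km):
    """Convert L/100 km (lower is better) to miles per gallon (higher is better)."""
    return 100 * LITRES_PER_IMPERIAL_GALLON / (KM_PER_MILE * l_per_100km)


comb = combined_l_per_100km(city=9.9, hwy=6.7)
print(round(comb, 2))                  # 8.46
print(round(l_per_100km_to_mpg(8.5)))  # 33
```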


# Importing Necessary Libraries

<h1 style="
  display: inline-block;
  width: 90%;
  padding: 30px 30px;
  background: linear-gradient(to right, #6EC6FF, #4A90E2); /* Soft blue gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
">
  📦 Importing Necessary Libraries
</h1>

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

import scipy.stats as stats
%matplotlib inline

from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, MinMaxScaler

from sklearn.linear_model import LinearRegression, Ridge, RidgeCV, Lasso, LassoCV, ElasticNet, ElasticNetCV

from sklearn.model_selection import cross_val_score,cross_validate
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

from yellowbrick.regressor import ResidualsPlot, PredictionError

import warnings
warnings.filterwarnings("ignore")


# Data Loading and Exploration

<h1 style="
  display: inline-block;
  width: 90%;
  padding: 30px 30px;
  background: linear-gradient(to right, #6EC6FF, #4A90E2); /* Soft blue gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
">
 ⚙️ Data Loading and Exploration
</h1>

In [None]:
carbon = pd.read_csv("/kaggle/input/co2-emission-by-vehicles/CO2 Emissions_Canada.csv")
print("Successfully loaded the dataset")

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  How does our data look...?
</h2>

In [None]:
carbon.head()

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Shape and Size?
</h2>

In [None]:
print(carbon.shape)
print(carbon.size)

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Dataset Columns and their info
</h2>

In [None]:
carbon.info()

**We don't have any null values, so no missing-value cleaning is needed.**

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Check for duplicates
</h2>

In [None]:
carbon.duplicated().sum()


**Since we are modeling the general relationship between the independent variables (x) and the target (y), we will keep the duplicate rows.**

**They will help our model fit that relationship.**
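
For readers who want to look at the duplicates before deciding, the sketch below shows one way to count them and, as an alternative to keeping them, drop them. It runs on a tiny made-up frame (`toy` is not the real dataset), and the notebook itself keeps the duplicate rows.

```python
import pandas as pd

# Toy data for illustration only; the values are invented.
toy = pd.DataFrame({
    "Engine Size(L)": [2.0, 2.0, 3.5],
    "Fuel Type": ["X", "X", "Z"],
    "CO2 Emissions(g/km)": [196, 196, 255],
})

n_dupes = int(toy.duplicated().sum())  # rows identical to an earlier row
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dupes)       # 1
print(len(deduped))  # 2
```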


<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Unique values and count for categorical columns.
</h2>

In [None]:
cat_col = carbon.select_dtypes("object")
for col in cat_col.columns:
    print(f"name:{col}\n{carbon[col].unique()}")
    print(f"Count: {carbon[col].nunique()}")
    print("======"*15)

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Summary Stats
</h2>

In [None]:
carbon.describe()

## EDA (Exploratory Data Analysis)

<h1 style="
  display: inline-block;
  width: 90%;
  padding: 30px 30px;
  background: linear-gradient(to right, #6EC6FF, #4A90E2); /* Soft blue gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
">
 🔎 EDA (Exploratory Data Analysis)
</h1>

Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods.

In [None]:
df = carbon.copy()
df.head()

**Created a copy of the dataframe for EDA.**

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Univariate Analysis
</h2>

**Distributions of numerical columns.**

In [None]:
num_col = df.select_dtypes("number")

for col in num_col:
    plt.figure(figsize=(8, 4))
    sns.histplot(df[col], kde=True)
    plt.title(f'Distribution of {col}')
    plt.xlabel(col)
    plt.ylabel('Frequency')
    plt.tight_layout()
    plt.show()


**Distributions of Categorical columns**

In [None]:
cat_col = df.drop("Model", axis=1).select_dtypes("object")
sns.set_style("whitegrid")
for col in cat_col:
    plt.figure(figsize=(12,5))
    sns.countplot(data=df, x=col,order=df[col].value_counts().index)
    plt.xlabel(col,fontsize=14)
    plt.ylabel("count")
    plt.xticks(rotation=45, ha='right', fontsize=10)
    plt.show()

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Bivariate & Multivariate Analysis
</h2>

In [None]:
sns.set(style="whitegrid", context="notebook", palette="muted")

sns.pairplot(
    df.select_dtypes(include="number"),
    diag_kind="kde",
    kind="reg",
    plot_kws={'line_kws':{'color':'crimson'}, 'scatter_kws': {'alpha': 0.6}},
    diag_kws={"fill": True},  # "fill" replaces the deprecated "shade" option in recent seaborn releases
    corner=True,
    height=2.8
)

plt.suptitle("📈 Relationship Matrix of Numeric Features", fontsize=12, y=1.02)
plt.show()


**Pair plot between numerical columns**

## Correlation and Heatmap

In [None]:
corr_df  = num_col.corr()
corr_df.style.background_gradient("coolwarm")

In [None]:
plt.figure(figsize=(15,7))
sns.heatmap(corr_df,annot=True, cmap='coolwarm')

In [None]:
corr = pd.DataFrame(df[num_col.columns].corr()['CO2 Emissions(g/km)'].sort_values(ascending=False)).reset_index()
corr.columns = ['Feature', 'Correlation']
corr

**As we can see, every numeric feature is strongly positively correlated with our target variable except Fuel Consumption Comb (mpg), which is negatively correlated. That is expected because mpg has an inverse relationship with emissions: the lower the mpg, the higher the CO₂ emissions, and vice versa.**
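
A small synthetic example may make the sign of that correlation concrete. It assumes, purely for illustration, that CO₂ is proportional to fuel consumption (the factor 23 is a rough placeholder, not a value taken from the dataset) and converts L/100 km to mpg with an imperial-gallon constant.

```python
import numpy as np

# Synthetic fuel consumption values in L/100 km (invented for illustration).
l_per_100km = np.array([5.0, 7.0, 9.0, 11.0, 13.0, 15.0])
co2 = 23.0 * l_per_100km    # assumed proportionality, illustrative only
mpg = 282.48 / l_per_100km  # reciprocal transform (imperial gallons assumed)

corr_consumption = np.corrcoef(l_per_100km, co2)[0, 1]
corr_mpg = np.corrcoef(mpg, co2)[0, 1]

print(round(corr_consumption, 2))  # 1.0
print(corr_mpg < 0)                # True
```

Because mpg is a reciprocal transform of consumption rather than a linear one, its correlation with CO₂ is strongly negative but not exactly -1.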

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 20px 40px;
  background: linear-gradient(to right, rgba(57, 62, 70, 0.3), rgba(34, 40, 49, 0.3)); /* Softer transparency */
  color: white;
  font-size: 28px;
  font-weight: bold;
  border-radius: 16px;
  box-shadow: 0px 6px 15px rgba(0, 0, 0, 0.25);
  text-align: center;
  text-shadow: 0 0 7px #8B5DFF, 0 0 12px #8B5DFF, 0 0 15px #8B5DFF;
  backdrop-filter: blur(15px);
  letter-spacing: 1.5px;
  border: 2px solid rgba(255, 255, 255, 0.3);
">
   Mean of CO₂ Emissions (g/km) by Categorical Columns.
</h2>


In [None]:
for col in cat_col:
    data_grp = df.groupby(col)['CO2 Emissions(g/km)'].mean().round(1).reset_index()
    grouped_ = data_grp.sort_values(by='CO2 Emissions(g/km)',ascending=False)
    plt.figure(figsize=(18,8))
    ax = sns.barplot(data=grouped_,x=col, y='CO2 Emissions(g/km)')
    ax.bar_label(ax.containers[0], rotation=90)
    plt.xlabel(col)
    plt.ylabel('target_mean CO₂')
    plt.xticks(rotation=45)
    sns.despine()
    plt.tight_layout()
    plt.show()

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 20px 40px;
  background: linear-gradient(to right, rgba(57, 62, 70, 0.3), rgba(34, 40, 49, 0.3)); /* Softer transparency */
  color: white;
  font-size: 28px;
  font-weight: bold;
  border-radius: 16px;
  box-shadow: 0px 6px 15px rgba(0, 0, 0, 0.25);
  text-align: center;
  text-shadow: 0 0 7px #8B5DFF, 0 0 12px #8B5DFF, 0 0 15px #8B5DFF;
  backdrop-filter: blur(15px);
  letter-spacing: 1.5px;
  border: 2px solid rgba(255, 255, 255, 0.3);
">
   Crosstab between categorical columns and CO₂.
</h2>

In [None]:
df['CO2_Category'] = pd.qcut(df['CO2 Emissions(g/km)'], q=4, labels=['Low', 'Medium', 'High', 'Very High'])
for col in cat_col.columns:
    print(f"\n Crosstab between {col} and CO2_Category:\n")
    ct = pd.crosstab(df[col], df['CO2_Category'], margins=True)
    print(ct)

## Feature Engineering (Preprocessing)

<h1 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Feature Engineering (Preprocessing)
</h1>

In [None]:
x = carbon.drop(columns= ["CO2 Emissions(g/km)","Model","Make"])
y = carbon["CO2 Emissions(g/km)"]

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Train-Test Split
</h2>

In [None]:
# Hold out 20% of the rows as a test set (fixed seed for reproducibility)
X_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [None]:
# Collect the numeric and categorical column names for the preprocessing pipeline
from sklearn.compose import make_column_selector as selector
num_features = selector(dtype_include="number")(x)
cat_features = selector(dtype_include="object")(x)

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Building Preprocessing Pipeline
</h2>

In [None]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

num_trans = StandardScaler()
cat_trans = OneHotEncoder(handle_unknown='ignore', sparse_output=False)

preprocessor = ColumnTransformer([
    ('num_f',num_trans,num_features),
    ('cat_f',cat_trans,cat_features)
])

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Building Model Pipeline with Cross-Validation
</h2>

In [None]:
models = {'linear':LinearRegression(),
          'Ridge' :Ridge(),
          'Lasso' :Lasso(),
          'Elastic_Net': ElasticNet()}
result = []
for name, model in models.items():
        pipe = Pipeline([
            ('preprocessor',preprocessor),
            ('model',model)
        ])
        scores = cross_val_score(pipe,X_train,y_train,scoring= 'neg_root_mean_squared_error', cv=5)
        result.append({'Name': name,'RMSE_CV_Avg': -scores.mean()})

rmse_avg_cv  = pd.DataFrame(result)
rmse_avg_cv

**RMSE is expressed in the same units as the target, so it can be read directly as a typical prediction error.**

**Example: an RMSE of 4.83 means the predictions are off by roughly 4.83 g of CO₂ per km.**
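
To make the units concrete, here is RMSE computed by hand on three invented predictions. The numbers are illustrative and unrelated to the cross-validation scores above.

```python
import numpy as np

y_true = np.array([200.0, 250.0, 300.0])  # actual CO2 in g/km (made up)
y_hat = np.array([205.0, 245.0, 300.0])   # predicted CO2 in g/km (made up)

errors = y_hat - y_true                # still in g/km
rmse = np.sqrt(np.mean(errors ** 2))   # square, average, square root: back to g/km

print(round(rmse, 2))  # 4.08
```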


## Modeling (choosing best and final model)

<h1 style="
  display: inline-block;
  width: 90%;
  padding: 30px 30px;
  background: linear-gradient(to right, #6EC6FF, #4A90E2); /* Soft blue gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
">
  🎯 Modeling (choosing best and final model)
</h1>

In [None]:
final = Pipeline([
    ('preprocessor',preprocessor),
    ('model', Ridge())
])

final.fit(X_train,y_train)
y_pred = final.predict(x_test)

print("Test RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))  # avoids the "squared" argument, which newer scikit-learn releases no longer accept
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R2 Score:", r2_score(y_test, y_pred))

**I have chosen Ridge() because its L2 penalty handles multicollinearity well, and the fuel-consumption features here are strongly correlated with one another.**
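
The effect can be sketched on synthetic data. In the example below, three made-up fuel-consumption columns are nearly linear combinations of one another, loosely mimicking the city, highway, and combined ratings. All values and the `alpha=1.0` setting are assumptions for illustration, not the notebook's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200
city = rng.uniform(6, 20, size=n)
hwy = 0.7 * city + rng.normal(0, 0.3, size=n)
comb = 0.55 * city + 0.45 * hwy + rng.normal(0, 0.05, size=n)  # almost collinear
X = np.column_stack([city, hwy, comb])
y = 23 * comb + rng.normal(0, 5, size=n)  # synthetic target

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# The L2 penalty shrinks the coefficient vector, which damps the large,
# offsetting weights that collinear columns can produce under plain least squares.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print(ols_norm, ridge_norm)
```

The ridge coefficient norm comes out no larger than the least-squares one, which is the shrinkage that makes the fit less sensitive to collinear inputs.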

<div style="font-family: 'Segoe UI', sans-serif; max-width:600px; margin:20px auto; border-radius:8px; overflow:hidden; box-shadow: 0 4px 12px rgba(0,0,0,0.1);">
  <table style="border-collapse: collapse; width:100%; text-align:left;">
    <thead style="background:#005b96; color:#fff;">
      <tr>
        <th style="padding:12px;">Metric</th>
        <th style="padding:12px;">Value</th>
        <th style="padding:12px;">What It Means</th>
      </tr>
    </thead>
    <tbody style="font-size:15px;">
      <tr style="background:#f9f9f9;">
        <td style="padding:10px;"><strong>RMSE</strong></td>
        <td style="padding:10px;">5.18 g/km</td>
        <td style="padding:10px;">Avg. deviation from actual CO₂ values — smaller means more accurate</td>
      </tr>
      <tr>
        <td style="padding:10px;"><strong>MAE</strong></td>
        <td style="padding:10px;">3.08 g/km</td>
        <td style="padding:10px;">Typical absolute error — model is usually off by ~3 g/km</td>
      </tr>
      <tr style="background:#f9f9f9;">
        <td style="padding:10px;"><strong>R² Score</strong></td>
        <td style="padding:10px;">0.9922</td>
        <td style="padding:10px;">Explains 99.2% of the variance — near-perfect predictive power</td>
      </tr>
    </tbody>
  </table>
</div>
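
For reference, the three metrics in the table follow the standard definitions, written here with $y_i$ for the actual value, $\hat{y}_i$ for the prediction, $\bar{y}$ for the mean of the actual values, and $n$ for the number of test rows (this notation is introduced here, not taken from the notebook):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2},\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i-\hat{y}_i\right\rvert,\qquad
R^2 = 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}
$$

Because RMSE squares the errors before averaging, it is never smaller than MAE, which is consistent with the 5.18 g/km and 3.08 g/km reported above.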


## Visualizations

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Visualizations of Predicted Values and Residuals
</h2>

In [None]:
plt.figure(figsize=(8,6))
sns.scatterplot(x=y_test, y=y_pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], '--r')
plt.xlabel("Actual CO₂ Emissions (g/km)")
plt.ylabel("Predicted CO₂ Emissions (g/km)")
plt.title("Actual vs Predicted CO₂ Emissions")
plt.grid(True)
plt.show()

residual = y_test-y_pred
sns.histplot(residual, kde=True)
plt.title("Distribution of Residuals")
plt.xlabel("Residual (g/km)")
plt.show()


In [None]:
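# Plot the model's own residuals with a LOWESS trend line; sns.residplot would
# instead regress y on x and plot the residuals of that extra fit
sns.regplot(x=y_pred, y=y_test - y_pred, lowess=True, line_kws={"color": "red"})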
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.title("Residual Plot")
plt.axhline(0, color='gray', linestyle='--')
plt.show()

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Feature Contributions and Importance
</h2>

In [None]:
ohe = final.named_steps['preprocessor'].named_transformers_['cat_f']
cat_feature_names = ohe.get_feature_names_out(cat_features)

# Combine all feature names in correct order
all_features = num_features + list(cat_feature_names)
all_features

In [None]:
model = final.named_steps['model']
coef = pd.DataFrame({
    'Feature': all_features,
    'Coefficient': model.coef_
})
coef['abs_c'] = coef['Coefficient'].abs()
coef  = coef.sort_values(by='abs_c',ascending=False).drop('abs_c',axis=1).reset_index(drop=True)
coef.head(10)

In [None]:
plt.figure(figsize=(10,6))
sns.barplot(data=coef.head(10), x='Coefficient', y='Feature')
plt.title("Top 10 Features by Absolute Coefficient")
plt.grid(True)
plt.show()

<h2 style="
  display: inline-block;
  width: 90%;
  padding: 15px 30px;
  background: linear-gradient(to right, #393E46, #222831); /* Dark gradient */
  color: white;
  font-size: 24px;
  font-weight: bold;
  border-radius: 12px;
  box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.2);
  text-align: center;
  text-shadow: 0 0 5px #8B5DFF, 0 0 5px #8B5DFF, 0 0 7px #8B5DFF;
">
  Model Exporting.
</h2>

In [None]:
import joblib
joblib.dump(final, 'co2_pipeline_model.pkl')


**The saved pipeline now bundles encoding, scaling, and the model. At deployment time, with `model` being the pipeline loaded via `joblib.load`, prediction takes three steps:**

**`input_dict` = the raw feature values in dictionary form**

**`user_input = pd.DataFrame([input_dict])`**
**`prediction = model.predict(user_input)`**

**Users never have to supply the encoded or scaled features; the pipeline applies those transformations automatically.**
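
Put together, the deployment steps could look like the sketch below. To stay self-contained it trains a throwaway pipeline on invented rows and saves it to a temporary file. In practice the file would be the `co2_pipeline_model.pkl` exported above, and `input_dict` would carry every column the pipeline was trained on.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Stand-in training data (invented) so the example runs on its own.
train = pd.DataFrame({
    "Engine Size(L)": [1.6, 2.0, 3.5, 5.0],
    "Fuel Type": ["X", "X", "Z", "Z"],
})
target = [150, 190, 260, 340]

demo_pipe = Pipeline([
    ("preprocessor", ColumnTransformer([
        ("num_f", StandardScaler(), ["Engine Size(L)"]),
        ("cat_f", OneHotEncoder(handle_unknown="ignore"), ["Fuel Type"]),
    ])),
    ("model", Ridge()),
]).fit(train, target)

path = os.path.join(tempfile.mkdtemp(), "demo_pipeline.pkl")
joblib.dump(demo_pipe, path)

# Deployment side: load the pipeline and predict from raw, unencoded inputs.
model = joblib.load(path)
input_dict = {"Engine Size(L)": 2.4, "Fuel Type": "X"}
user_input = pd.DataFrame([input_dict])
prediction = model.predict(user_input)
print(prediction.shape)  # (1,)
```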

## End Message