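# Carbon Emission Prediction Example
Code reference: [Predicting Carbon Emission with 98.9% accuracy score](https://www.kaggle.com/code/pavankumarmantha/predicting-carbon-emission-with-98-9-accuracy/notebook). This notebook will be taken down immediately in case of any infringement.

This notebook is a carbon emission prediction example that shows how to estimate likely carbon emissions from existing data.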

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import TruncatedSVD

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor

## Data Loading
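[Source of the example dataset](https://www.kaggle.com/datasets/rinichristy/2022-fuel-consumption-ratings)

The example dataset provides model-specific fuel consumption ratings and estimated carbon dioxide emissions for light-duty vehicles sold at retail in Canada.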

Model:
* 4WD/4X4 = Four-wheel drive
* AWD = All-wheel drive
* FFV = Flexible-fuel vehicle
* SWB = Short wheelbase
* LWB = Long wheelbase
* EWB = Extended wheelbase

Transmission:
* A = Automatic
* AM = Automated manual
* AS = Automatic with select shift
* AV = Continuously variable
* M = Manual
* 3 - 10 = Number of gears
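
The `Transmission` column combines a type code with an optional gear count (for example `AM8` or `AS10`). A minimal sketch of splitting the two parts is shown here; the helper name `split_transmission` and the regular expression are illustrative assumptions, not part of the original notebook.

```python
import re

# Type codes copied from the list above; the parsing logic is an assumption.
TRANSMISSION_TYPES = {
    "A": "Automatic",
    "AM": "Automated manual",
    "AS": "Automatic with select shift",
    "AV": "Continuously variable",
    "M": "Manual",
}

# Two-letter codes are listed first so that "AM8" is not read as "A" + "M8".
_CODE_PATTERN = re.compile(r"^(AM|AS|AV|A|M)(\d{1,2})?$")


def split_transmission(code):
    """Return (type_name, gears); gears is None when no digits follow the code."""
    match = _CODE_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized transmission code: {code!r}")
    letters, digits = match.groups()
    gears = int(digits) if digits else None
    return TRANSMISSION_TYPES[letters], gears


print(split_transmission("AM8"))   # ('Automated manual', 8)
print(split_transmission("AS10"))  # ('Automatic with select shift', 10)
print(split_transmission("AV"))    # ('Continuously variable', None)
```

Splitting the code this way would let the gear count be treated as a numeric feature instead of being folded into a one-hot category.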

Fuel type:
* X = Regular gasoline
* Z = Premium gasoline
* D = Diesel
* E = Ethanol (E85)
* N = Natural gas

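Fuel consumption: city and highway fuel consumption ratings are given in litres per 100 kilometres (L/100 km). The combined rating (55% city, 45% highway) is given in both L/100 km and miles per imperial gallon (mpg).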
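
The combined rating and the mpg conversion can be checked by hand. The sketch below applies the 55/45 weighting and the imperial-gallon conversion to the city and highway values of the first row of the dataset (9.9 and 7.0 L/100 km). The function names are illustrative, and the assumption that the published figures are rounded this way is mine, not the dataset's.

```python
# Exact unit definitions: litres per imperial gallon and kilometres per mile.
LITRES_PER_IMP_GALLON = 4.54609
KM_PER_MILE = 1.609344


def combined_l_per_100km(city, hwy):
    """Weighted combined rating: 55% city, 45% highway."""
    return 0.55 * city + 0.45 * hwy


def l_per_100km_to_imperial_mpg(l_per_100km):
    """Convert L/100 km to miles per imperial gallon."""
    return 100.0 * LITRES_PER_IMP_GALLON / (KM_PER_MILE * l_per_100km)


combined = combined_l_per_100km(9.9, 7.0)
print(round(combined, 1))                       # 8.6
print(round(l_per_100km_to_imperial_mpg(8.6)))  # 33
```

Both results agree with the `Fuel Consumption(Comb (L/100 km))` and `Fuel Consumption(Comb (mpg))` values listed for that row, which also shows that these columns carry largely the same information.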

CO2 emissions: tailpipe carbon dioxide emissions for combined city and highway driving, in grams per kilometre (g/km).

CO2 rating: tailpipe carbon dioxide emissions rated on a scale from 1 (worst) to 10 (best).

Smog rating: tailpipe emissions of smog-forming pollutants rated on a scale from 1 (worst) to 10 (best).

In [2]:
project_data = pd.read_csv("MY2022 Fuel Consumption Ratings.csv")

## Data Viewing

In [3]:
project_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 946 entries, 0 to 945
Data columns (total 15 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Model Year                         946 non-null    int64  
 1   Make                               946 non-null    object 
 2   Model                              946 non-null    object 
 3   Vehicle Class                      946 non-null    object 
 4   Engine Size(L)                     946 non-null    float64
 5   Cylinders                          946 non-null    int64  
 6   Transmission                       946 non-null    object 
 7   Fuel Type                          946 non-null    object 
 8   Fuel Consumption (City (L/100 km)  946 non-null    float64
 9   Fuel Consumption(Hwy (L/100 km))   946 non-null    float64
 10  Fuel Consumption(Comb (L/100 km))  946 non-null    float64
 11  Fuel Consumption(Comb (mpg))       946 non-null    int64  

In [4]:
project_data.describe()

| | Model Year | Engine Size(L) | Cylinders | Fuel Consumption (City (L/100 km) | Fuel Consumption(Hwy (L/100 km)) | Fuel Consumption(Comb (L/100 km)) | Fuel Consumption(Comb (mpg)) | CO2 Emissions(g/km) | CO2 Rating | Smog Rating |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 946.0 | 946.0 | 946.0 | 946.0 | 946.0 | 946.0 | 946.0 | 946.0 | 946.0 | 946.0 |
| mean | 2022.0 | 3.198732 | 5.668076 | 12.506448 | 9.363319 | 11.092072 | 27.247357 | 259.172304 | 4.539112 | 4.950317 |
| std | 0.0 | 1.374814 | 1.93267 | 3.452043 | 2.285125 | 2.876276 | 7.685217 | 64.443149 | 1.471799 | 1.679842 |
| min | 2022.0 | 1.2 | 3.0 | 4.0 | 3.9 | 4.0 | 11.0 | 94.0 | 1.0 | 1.0 |
| 25% | 2022.0 | 2.0 | 4.0 | 10.2 | 7.7 | 9.1 | 22.0 | 213.25 | 3.0 | 3.0 |
| 50% | 2022.0 | 3.0 | 6.0 | 12.2 | 9.2 | 10.8 | 26.0 | 257.0 | 5.0 | 5.0 |
| 75% | 2022.0 | 3.8 | 6.0 | 14.7 | 10.7 | 12.9 | 31.0 | 300.75 | 5.0 | 6.0 |
| max | 2022.0 | 8.0 | 16.0 | 30.3 | 20.9 | 26.1 | 71.0 | 608.0 | 10.0 | 7.0 |


In [5]:
project_data.columns

Index(['Model Year', 'Make', 'Model', 'Vehicle Class', 'Engine Size(L)',
       'Cylinders', 'Transmission', 'Fuel Type',
       'Fuel Consumption (City (L/100 km)', 'Fuel Consumption(Hwy (L/100 km))',
       'Fuel Consumption(Comb (L/100 km))', 'Fuel Consumption(Comb (mpg))',
       'CO2 Emissions(g/km)', 'CO2 Rating', 'Smog Rating'],
      dtype='object')

In [6]:
project_data.head()

| | Model Year | Make | Model | Vehicle Class | Engine Size(L) | Cylinders | Transmission | Fuel Type | Fuel Consumption (City (L/100 km) | Fuel Consumption(Hwy (L/100 km)) | Fuel Consumption(Comb (L/100 km)) | Fuel Consumption(Comb (mpg)) | CO2 Emissions(g/km) | CO2 Rating | Smog Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022 | Acura | ILX | Compact | 2.4 | 4 | AM8 | Z | 9.9 | 7.0 | 8.6 | 33 | 200 | 6 | 3 |
| 1 | 2022 | Acura | MDX SH-AWD | SUV: Small | 3.5 | 6 | AS10 | Z | 12.6 | 9.4 | 11.2 | 25 | 263 | 4 | 5 |
| 2 | 2022 | Acura | RDX SH-AWD | SUV: Small | 2.0 | 4 | AS10 | Z | 11.0 | 8.6 | 9.9 | 29 | 232 | 5 | 6 |
| 3 | 2022 | Acura | RDX SH-AWD A-SPEC | SUV: Small | 2.0 | 4 | AS10 | Z | 11.3 | 9.1 | 10.3 | 27 | 242 | 5 | 6 |
| 4 | 2022 | Acura | TLX SH-AWD | Compact | 2.0 | 4 | AS10 | Z | 11.2 | 8.0 | 9.8 | 29 | 230 | 5 | 7 |


In [7]:
project_data.tail()

| | Model Year | Make | Model | Vehicle Class | Engine Size(L) | Cylinders | Transmission | Fuel Type | Fuel Consumption (City (L/100 km) | Fuel Consumption(Hwy (L/100 km)) | Fuel Consumption(Comb (L/100 km)) | Fuel Consumption(Comb (mpg)) | CO2 Emissions(g/km) | CO2 Rating | Smog Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 941 | 2022 | Volvo | XC40 T5 AWD | SUV: Small | 2.0 | 4 | AS8 | Z | 10.7 | 7.7 | 9.4 | 30 | 219 | 5 | 5 |
| 942 | 2022 | Volvo | XC60 B5 AWD | SUV: Small | 2.0 | 4 | AS8 | Z | 10.5 | 8.1 | 9.4 | 30 | 219 | 5 | 5 |
| 943 | 2022 | Volvo | XC60 B6 AWD | SUV: Small | 2.0 | 4 | AS8 | Z | 11.0 | 8.7 | 9.9 | 29 | 232 | 5 | 7 |
| 944 | 2022 | Volvo | XC90 T5 AWD | SUV: Standard | 2.0 | 4 | AS8 | Z | 11.5 | 8.4 | 10.1 | 28 | 236 | 5 | 5 |
| 945 | 2022 | Volvo | XC90 T6 AWD | SUV: Standard | 2.0 | 4 | AS8 | Z | 12.4 | 8.9 | 10.8 | 26 | 252 | 5 | 7 |


## Missing data analysis

In [8]:
project_data.isna().sum()

Model Year                           0
Make                                 0
Model                                0
Vehicle Class                        0
Engine Size(L)                       0
Cylinders                            0
Transmission                         0
Fuel Type                            0
Fuel Consumption (City (L/100 km)    0
Fuel Consumption(Hwy (L/100 km))     0
Fuel Consumption(Comb (L/100 km))    0
Fuel Consumption(Comb (mpg))         0
CO2 Emissions(g/km)                  0
CO2 Rating                           0
Smog Rating                          0
dtype: int64

In [9]:
print("size:",project_data.size)
print("shape:",project_data.shape)

size: 14190
shape: (946, 15)


## Data Preprocessing pipeline

In [10]:
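# Target is CO2 emissions (g/km). Note: "CO2 Rating" stays in the features and is
# itself a rating of tailpipe CO2 emissions, i.e. a close proxy for the target.
x_train=project_data.drop(columns="CO2 Emissions(g/km)")
y_train=project_data["CO2 Emissions(g/km)"]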

### Classify into Numeric data & Categorical data 

In [11]:
numric_columns=x_train.select_dtypes(exclude='object').columns
print(numric_columns)
print('-'*100)
categorical_columns = x_train.select_dtypes(include='object').columns
print(categorical_columns)

Index(['Model Year', 'Engine Size(L)', 'Cylinders',
       'Fuel Consumption (City (L/100 km)', 'Fuel Consumption(Hwy (L/100 km))',
       'Fuel Consumption(Comb (L/100 km))', 'Fuel Consumption(Comb (mpg))',
       'CO2 Rating', 'Smog Rating'],
      dtype='object')
----------------------------------------------------------------------------------------------------
Index(['Make', 'Model', 'Vehicle Class', 'Transmission', 'Fuel Type'], dtype='object')


In [12]:
numeric_feature = Pipeline(steps=[('handlingmissing',SimpleImputer(strategy='median')),('scaling',StandardScaler(with_mean=False))])
print(numeric_feature)

categorical_feature = Pipeline(steps=[('handlingmissing',SimpleImputer(strategy='most_frequent')),('encoding',OneHotEncoder()),('scaling',StandardScaler(with_mean=False))])
print(categorical_feature)

processing = ColumnTransformer([('numeric',numeric_feature,numric_columns), ('cat',categorical_feature,categorical_columns)])
processing

Pipeline(steps=[('handlingmissing', SimpleImputer(strategy='median')),
                ('scaling', StandardScaler(with_mean=False))])
Pipeline(steps=[('handlingmissing', SimpleImputer(strategy='most_frequent')),
                ('encoding', OneHotEncoder()),
                ('scaling', StandardScaler(with_mean=False))])
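
To make the effect of the preprocessing step concrete, here is a minimal sketch on a four-row toy frame (the rows are invented, not taken from the dataset). It keeps the median imputation and `with_mean=False` scaling for the numeric column, but simplifies the categorical branch to a bare `OneHotEncoder`. The `handle_unknown="ignore"` setting is an addition of mine: without it, transforming a category that was absent during fitting raises an error, which matters as soon as the data is split into training and test sets.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows for illustration only; one numeric value is missing on purpose.
toy = pd.DataFrame({
    "Engine Size(L)": [2.0, 3.5, None, 2.4],
    "Fuel Type": ["Z", "X", "Z", "D"],
})

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler(with_mean=False)),
])
prep = ColumnTransformer([
    ("num", numeric, ["Engine Size(L)"]),
    # Assumption: ignore categories unseen during fit instead of raising.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Fuel Type"]),
])

out = prep.fit_transform(toy)
print(out.shape)  # (4, 4): one numeric column plus one-hot columns for D, X, Z

# "E" was never seen during fit; its one-hot columns are all zero.
unseen = pd.DataFrame({"Engine Size(L)": [1.6], "Fuel Type": ["E"]})
unseen_out = prep.transform(unseen)
print(unseen_out.shape)  # (1, 4)
```

On the real data the one-hot step produces far more columns, because `Model` alone contributes one column per distinct model name.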


## Model Creation

### Models used: Linear Regression, AdaBoost Regression, Gradient Boosting Regression

In [13]:
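# The step is named "pca" but is a TruncatedSVD, which skips centering and accepts the sparse one-hot output.
model1=Pipeline(steps = [('processing',processing),("pca",TruncatedSVD(n_components=210,random_state=0)),('modeling',LinearRegression())])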
model1.fit(x_train,y_train)


In [14]:
model2=Pipeline(steps = [('processing',processing),("pca",TruncatedSVD(n_components=210,random_state=0)),('modeling',AdaBoostRegressor())])
model2.fit(x_train,y_train)


In [15]:
model3=Pipeline(steps = [('processing',processing),("pca",TruncatedSVD(n_components=210,random_state=0)),('modeling',GradientBoostingRegressor())])
model3.fit(x_train,y_train)
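
All three pipelines reduce the preprocessed matrix to 210 components with `TruncatedSVD`. One way to judge whether a given `n_components` is reasonable is to look at the share of variance the retained components explain. The sketch below does this on a random matrix, which is an assumption for illustration; on the notebook's data, the fitted step would be reachable through `model1.named_steps["pca"]`.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Random stand-in matrix (NOT the notebook's data): 50 samples, 20 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))

svd = TruncatedSVD(n_components=5, random_state=0)
Z = svd.fit_transform(X)

print(Z.shape)  # (50, 5)
explained = svd.explained_variance_ratio_.sum()
print(f"variance explained by 5 components: {explained:.2%}")
```

If a small number of components already explains nearly all the variance, a lower `n_components` may give a simpler model with similar scores.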


## Model Validation and Score Computation

In [16]:
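# score() on a regressor returns R^2, not classification accuracy.
# These values are computed on the same rows the models were fitted on.
print(model1.score(x_train,y_train))
print(model2.score(x_train,y_train))
print(model3.score(x_train,y_train))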

0.9891207726777261
0.9562022094561187
0.9943987033840603
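
Because the scores above are measured on the training rows, they say little about how the models behave on unseen vehicles. A hold-out split is one common way to check this. The sketch below uses synthetic stand-in data so that it runs without the CSV file; the column names, the coefficient 23.0, and the noise level are arbitrary assumptions, and the resulting numbers are not results for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data: emissions roughly proportional to fuel use, plus noise.
rng = np.random.default_rng(0)
n = 200
fuel = rng.uniform(5.0, 20.0, size=n)
X = pd.DataFrame({
    "comb_l_per_100km": fuel,
    "fuel_type": rng.choice(["X", "Z", "D"], size=n),
})
y = 23.0 * fuel + rng.normal(0.0, 5.0, size=n)

# Hold out 20% of the rows; the model never sees them during fitting.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

prep = ColumnTransformer([
    ("num", StandardScaler(), ["comb_l_per_100km"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["fuel_type"]),
])
model = Pipeline([("prep", prep), ("reg", LinearRegression())])
model.fit(X_tr, y_tr)

train_r2 = model.score(X_tr, y_tr)
test_r2 = model.score(X_te, y_te)
print(f"train R^2: {train_r2:.3f}")
print(f"test  R^2: {test_r2:.3f}")
```

Applied to the notebook's data, the same pattern would mean splitting `project_data` before fitting `model1`, `model2`, and `model3`, and reporting the score on the held-out part.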
