<img src="https://driving-tests.org/wp-content/uploads/2013/12/gas-pump.jpg" style="height: 400px; width: auto;"/>


**1. Importing Required Libraries**

In [18]:
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.preprocessing import LabelEncoder
import pandas as pd


**2. Loading Dataset**

In [2]:
df = pd.read_csv('FuelConsumption.csv')


**3. Exploring Dataset**

In [19]:
# Display the column names in the dataset
df.columns

Index(['MODELYEAR', 'MAKE', 'MODEL', 'VEHICLECLASS', 'ENGINESIZE', 'CYLINDERS',
       'TRANSMISSION', 'FUELTYPE', 'FUELCONSUMPTION_CITY',
       'FUELCONSUMPTION_HWY', 'FUELCONSUMPTION_COMB',
       'FUELCONSUMPTION_COMB_MPG', 'CO2EMISSIONS'],
      dtype='object')

In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1067 entries, 0 to 1066
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   MODELYEAR                 1067 non-null   int64  
 1   MAKE                      1067 non-null   object 
 2   MODEL                     1067 non-null   object 
 3   VEHICLECLASS              1067 non-null   object 
 4   ENGINESIZE                1067 non-null   float64
 5   CYLINDERS                 1067 non-null   int64  
 6   TRANSMISSION              1067 non-null   object 
 7   FUELTYPE                  1067 non-null   int64  
 8   FUELCONSUMPTION_CITY      1067 non-null   float64
 9   FUELCONSUMPTION_HWY       1067 non-null   float64
 10  FUELCONSUMPTION_COMB      1067 non-null   float64
 11  FUELCONSUMPTION_COMB_MPG  1067 non-null   int64  
 12  CO2EMISSIONS              1067 non-null   int64  
dtypes: float64(4), int64(5), object(4)
memory usage: 108.5+ KB


**4. Preparing the Dataset for the Model**

In [21]:
# One-hot encode categorical columns
df_encoded = pd.get_dummies(df, columns=['MAKE', 'MODEL', 'VEHICLECLASS', 'TRANSMISSION'])

# Encode the target column 'FUELTYPE' as integer class labels
label_encoder = LabelEncoder()
df_encoded['FUELTYPE'] = label_encoder.fit_transform(df_encoded['FUELTYPE'])

# Selecting features and target variable
features = df_encoded.drop(['FUELTYPE'], axis=1)
target = df_encoded['FUELTYPE']
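To make the two encoding steps concrete, here is a minimal sketch on a toy frame. The rows and the letter codes for `FUELTYPE` are hypothetical stand-ins for `FuelConsumption.csv`. In the `df.info()` output above `FUELTYPE` is already `int64`, in which case `LabelEncoder` simply remaps the existing codes to `0..n-1`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows standing in for FuelConsumption.csv (values are hypothetical)
toy = pd.DataFrame({
    "ENGINESIZE": [2.0, 3.5, 1.8],
    "TRANSMISSION": ["AS5", "M6", "AS5"],
    "FUELTYPE": ["Z", "X", "Z"],
})

# One-hot encode the categorical feature: one boolean column per distinct value,
# appended after the untouched columns
toy_encoded = pd.get_dummies(toy, columns=["TRANSMISSION"])

# Encode the target as integer labels; classes are sorted, so "X" -> 0 and "Z" -> 1
le = LabelEncoder()
toy_encoded["FUELTYPE"] = le.fit_transform(toy_encoded["FUELTYPE"])

print(toy_encoded.columns.tolist())
print(le.inverse_transform([0, 1]))  # map integer labels back to fuel codes
```

The same mechanism explains the width of `X_train` further down: every distinct `MAKE`, `MODEL`, `VEHICLECLASS`, and `TRANSMISSION` value becomes its own boolean column.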

**5. Splitting the Dataset into Training and Test Sets**

In [None]:
# Splitting data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)
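The split above draws a purely random 20% test set. If some fuel types are rare, passing `stratify=` keeps each class's share the same in both splits; in this notebook that would be `stratify=target`. The class balance of `FUELTYPE` isn't shown here, so the labels below are a hypothetical imbalanced example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 50% class 0, 30% class 1, 20% class 2
y = np.array([0] * 50 + [1] * 30 + [2] * 20)
X = np.arange(len(y)).reshape(-1, 1)  # dummy single-feature matrix

# stratify=y preserves the class proportions in both the train and test splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("test class counts:", np.bincount(y_te))
print("train class counts:", np.bincount(y_tr))
```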



In [23]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Index: 853 entries, 333 to 860
Columns: 748 entries, MODELYEAR to TRANSMISSION_M7
dtypes: bool(740), float64(4), int64(4)
memory usage: 676.4 KB


In [25]:
y_test.info()

<class 'pandas.core.series.Series'>
Index: 214 entries, 732 to 941
Series name: FUELTYPE
Non-Null Count  Dtype
--------------  -----
214 non-null    int64
dtypes: int64(1)
memory usage: 3.3 KB


**6. Defining the Kernels to Compare**

In [26]:
# Kernels to compare, and a dict to collect each kernel's F1 score
kernels = ['rbf', 'sigmoid', 'poly', 'linear']
f1_scores = {}

**7. Training and Evaluating an SVM for Each Kernel**

In [27]:
for kernel in kernels:
    clf = svm.SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    f1 = f1_score(y_test, y_pred, average='weighted')
    f1_scores[kernel] = f1
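The loop above passes unscaled features to every kernel. scikit-learn's SVMs are not scale invariant, and the RBF, sigmoid, and polynomial kernels in particular depend on distances or dot products that large-valued columns can dominate. A minimal sketch of the same loop with a `StandardScaler` inside a pipeline follows; it uses synthetic data from `make_classification` as a hypothetical stand-in for the encoded fuel dataset, so its scores say nothing about the real results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the encoded fuel dataset (hypothetical data)
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, n_classes=3, random_state=0
)
X[:, 0] *= 1000  # exaggerate one feature's scale to mimic very large-valued columns

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaled_scores = {}
for kernel in ["rbf", "sigmoid", "poly", "linear"]:
    # The scaler is fitted on the training split only (inside the pipeline),
    # so test-set statistics never leak into training
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scaled_scores[kernel] = f1_score(y_test, y_pred, average="weighted")

for kernel, score in scaled_scores.items():
    print(f"{kernel.capitalize()} Kernel (scaled): {score:.4f}")
```

On the real data, the pipeline would be fitted on `X_train`/`y_train` from step 5 in place of the synthetic arrays.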

**8. Displaying F1 Scores for Each Kernel**

In [28]:
print("F1 Scores:")
for kernel, score in f1_scores.items():
    print(f"{kernel.capitalize()} Kernel: {score}")


F1 Scores:
Rbf Kernel: 0.3333333333333333
Sigmoid Kernel: 0.3333333333333333
Poly Kernel: 0.33125
Linear Kernel: 0.9532710280373832
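Rather than reading the winner off the printout, the kernels can be ranked programmatically. This small sketch hard-codes the scores printed above; in the notebook itself, the existing `f1_scores` dict would be used directly.

```python
# F1 scores copied from the printed output above
f1_scores = {
    "rbf": 0.3333333333333333,
    "sigmoid": 0.3333333333333333,
    "poly": 0.33125,
    "linear": 0.9532710280373832,
}

# Rank kernels from best to worst weighted F1
ranked = sorted(f1_scores.items(), key=lambda item: item[1], reverse=True)
best_kernel, best_score = ranked[0]
print(f"Best kernel: {best_kernel} (F1 = {best_score:.4f})")
```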


## Kernel Selection for SVM Classifier

### Overview

An SVM classifier was trained to predict fuel type using four kernels: RBF, Sigmoid, Polynomial, and Linear. Each model used scikit-learn's default hyperparameters and was evaluated on a held-out test set (20% of the data) using the weighted-average F1 score.

### Results

- **RBF Kernel**: F1 Score = 0.3333
- **Sigmoid Kernel**: F1 Score = 0.3333
- **Poly Kernel**: F1 Score = 0.3312
- **Linear Kernel**: F1 Score = 0.9533
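The RBF and Sigmoid kernels tie at exactly 0.3333, which is worth checking against a trivial baseline. A classifier that predicts one class for every sample gets precision p, recall 1, and F1 = 2p / (1 + p) on that class (where p is the class's share of the test set) and F1 = 0 on the rest, so its weighted F1 is 2p^2 / (1 + p). That equals 1/3 when p = 0.5. The sketch below confirms this with scikit-learn's `DummyClassifier` on hypothetical labels where the majority class is exactly half the samples. Whether the two kernels actually collapsed to a single class here is an assumption that would need checking, for example by comparing `clf.predict(X_test)` with a `DummyClassifier` fitted on `X_train`/`y_train`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score

# Hypothetical labels where class 0 makes up exactly half the samples
y_demo = np.array([0] * 50 + [1] * 30 + [2] * 20)
X_demo = np.zeros((len(y_demo), 1))  # features are ignored by this baseline

# Always predict the most frequent class seen during fit
# (fit and predict on the same toy data purely for illustration)
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_demo, y_demo)
y_pred = baseline.predict(X_demo)

# Classes that are never predicted get F1 = 0; zero_division=0 silences the warning
score = f1_score(y_demo, y_pred, average="weighted", zero_division=0)

p = np.mean(y_demo == 0)  # majority-class share of the labels
closed_form = 2 * p**2 / (1 + p)
print(score, closed_form)  # both are approximately 1/3
```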

### Conclusion

The F1 scores show a large gap between kernels. With default hyperparameters, the Linear kernel achieved a weighted F1 score of 0.9533, while the RBF, Sigmoid, and Polynomial kernels all scored about 0.33. This suggests that a linear decision boundary separates the fuel types well given the one-hot-encoded features. The comparison has a caveat, however: the features were not standardized, and the RBF, Sigmoid, and Polynomial kernels are sensitive to feature scale (columns such as MODELYEAR and CO2EMISSIONS take far larger values than the 0/1 dummy columns). Their low scores may therefore reflect the untuned, unscaled setup rather than an inherent inability to capture the patterns in the data.
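A fairer kernel comparison would scale the features and tune each kernel's hyperparameters with cross-validation before looking at the test set. The sketch below does this with `GridSearchCV` over the kernel, `C`, and `gamma`. The grid values are illustrative choices, and the data again comes from `make_classification` as a hypothetical stand-in for the encoded fuel dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the encoded features (hypothetical data)
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Search kernel, C and gamma together; gamma is ignored by the linear kernel
param_grid = {
    "svc__kernel": ["rbf", "sigmoid", "poly", "linear"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid, scoring="f1_weighted", cv=5)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print(f"CV weighted F1: {search.best_score_:.4f}")
print(f"Test weighted F1: {search.score(X_test, y_test):.4f}")
```

Because the test set is touched only once, after the search, its score is a less optimistic estimate than picking the best of several untuned kernels on the same test split.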