#  Importing Required Libraries

In this section, we import the core Python libraries for data handling and visualization. The scikit-learn components for encoding, splitting, training, and evaluation are imported in the cells where they are first used.


In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")



# Loading the Dataset

Here, we read the Car Evaluation dataset from `data/car.csv` into a `pandas` DataFrame.


In [3]:
df = pd.read_csv("data/car.csv")
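
To inspect the structure of a freshly loaded frame, a sketch like the following may help. It uses a hypothetical three-row in-memory sample instead of `data/car.csv`; apart from `acceptability`, the column names and values are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical sample: only the `acceptability` column name comes from the
# notebook; the other column names and all values are assumed.
sample_csv = io.StringIO(
    "buying,maint,safety,acceptability\n"
    "vhigh,vhigh,low,unacc\n"
    "low,med,high,vgood\n"
    "med,low,med,acc\n"
)
sample_df = pd.read_csv(sample_csv)

print(sample_df.head())    # first rows of the frame
print(sample_df.shape)     # (number of rows, number of columns)
print(sample_df.dtypes)    # data type of each column

# Count missing values per column
missing_per_column = sample_df.isnull().sum()
print(missing_per_column)
```

The same calls can be run on `df` directly once the CSV has been read.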

#  Feature and Target Separation

We now separate the input features from the target label (`acceptability`) for further processing.


In [4]:
# `columns=` already selects the column axis, so `axis=1` is not needed
X = df.drop(columns=['acceptability'])

In [9]:
Y=df['acceptability']
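
The support column of the classification report further down shows that the classes are far from balanced (for example, 975 of the 1382 training rows are `unacc`). A minimal sketch of checking the class shares right after the separation is given below; the toy frame and the `safety` feature name are assumptions, not the real data.

```python
import pandas as pd

# Toy frame standing in for the real data (feature name and values are assumed)
toy = pd.DataFrame({
    "safety": ["low", "high", "med", "low"],
    "acceptability": ["unacc", "vgood", "acc", "unacc"],
})

features = toy.drop(columns=["acceptability"])   # everything except the label
target = toy["acceptability"]                    # the label column only

# Relative frequency of each class in the target
class_share = target.value_counts(normalize=True)
print(features.shape, target.shape)
print(class_share)
```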

#  Encoding Categorical Features and Preprocessing Pipeline

All input features are categorical. `OneHotEncoder` converts each category into its own binary (0/1) column, which gives scikit-learn estimators the numeric input they require.

We wrap the encoder in a `ColumnTransformer` so that the transformation is applied to the categorical columns in a structured and repeatable way.

In [6]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

oh_transformer = OneHotEncoder()

# One-hot encode every object-dtype (categorical) column
cat_features = X.select_dtypes(include="object").columns
preprocessor = ColumnTransformer([
    ("OneHotEncoder", oh_transformer, cat_features)
])


In [7]:
# Note: the encoder is fitted on the full dataset, before the train/test split
X = preprocessor.fit_transform(X)


#  Splitting Data into Train and Test Sets

We split the dataset into training and testing subsets (80% train, 20% test) to evaluate the model’s generalization performance.


In [14]:
from sklearn.model_selection import train_test_split
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.2, random_state=42)
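
With imbalanced classes, a purely random split can leave very few rare-class rows in the test set. One option, not used in the notebook, is the `stratify` argument of `train_test_split`, which keeps the class proportions equal in both subsets. The sketch below uses hypothetical labels to show the effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 "unacc" rows and 20 "acc" rows
labels = np.array(["unacc"] * 80 + ["acc"] * 20)
inputs = np.arange(len(labels)).reshape(-1, 1)   # placeholder features

X_tr, X_te, y_tr, y_te = train_test_split(
    inputs, labels, test_size=0.2, random_state=42, stratify=labels
)

# The test set keeps the 80/20 class ratio of the full data
test_share_acc = np.mean(y_te == "acc")
print(len(y_tr), len(y_te), test_share_acc)
```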


#  Model Evaluation

We evaluate the model using the accuracy score and a classification report (per-class precision, recall, and F1-score) to check how well it performs on unseen data.


In [12]:
from sklearn.metrics import accuracy_score, classification_report

def evaluate(model, X, y_true):
    y_pred = model.predict(X)
    print("Accuracy:", accuracy_score(y_true, y_pred))
    print("Classification Report:\n", classification_report(y_true, y_pred))
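
A confusion matrix can complement the report by showing which classes get mistaken for which. The sketch below uses `sklearn.metrics.confusion_matrix` on hypothetical predictions; the class names match the dataset, but the label lists are made up for illustration.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels (not the notebook's results)
y_true_demo = ["acc", "acc", "good", "unacc", "unacc", "vgood"]
y_pred_demo = ["acc", "good", "good", "unacc", "unacc", "acc"]
class_names = ["acc", "good", "unacc", "vgood"]

# Rows are true classes, columns are predicted classes, in `class_names` order
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=class_names)
print(cm)
```

Diagonal entries count correct predictions; every off-diagonal entry is a specific kind of confusion, such as a true `vgood` predicted as `acc`.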


#  Training the Machine Learning Model

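We train a `RandomForestClassifier` on the one-hot encoded training data. Random forests usually perform well on tabular data with little tuning; scikit-learn's implementation requires numeric input, which is why the categorical features were encoded first.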


In [13]:
from sklearn.ensemble import RandomForestClassifier
models = {
    "Random Forest Classifier": RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
}
model_list = []

for model in models.values():
    model.fit(X_Train, Y_Train)

    # First report: training set; second report: held-out test set
    evaluate(model, X_Train, Y_Train)
    evaluate(model, X_Test, Y_Test)


Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

         acc       1.00      1.00      1.00       301
        good       1.00      1.00      1.00        58
       unacc       1.00      1.00      1.00       975
       vgood       1.00      1.00      1.00        48

    accuracy                           1.00      1382
   macro avg       1.00      1.00      1.00      1382
weighted avg       1.00      1.00      1.00      1382

Accuracy: 0.953757225433526
Classification Report:
               precision    recall  f1-score   support

         acc       0.94      0.88      0.91        83
        good       0.60      0.82      0.69        11
       unacc       0.98      1.00      0.99       235
       vgood       0.93      0.76      0.84        17

    accuracy                           0.95       346
   macro avg       0.86      0.87      0.86       346
weighted avg       0.96      0.95      0.95       346
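
The gap between training accuracy (1.00) and test accuracy (about 0.95) comes from a single split. One way to get a more stable estimate is cross-validation with the encoder and the model combined in a `Pipeline`, so that the encoder is refitted on the training part of each fold. The sketch below is an assumption-laden illustration, not the notebook's method: the data is synthetic, and the column names and the rule that generates the label are invented.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data: the label depends only on `safety` (assumed rule)
demo = pd.DataFrame({
    "safety": ["low", "med", "high"] * 20,
    "doors": ["2", "4", "2", "4"] * 15,
})
demo_y = demo["safety"].map({"low": "unacc", "med": "acc", "high": "acc"})

# Encoder and model in one estimator, so each fold fits its own encoder
pipeline = Pipeline([
    ("encode", ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["safety", "doors"])
    ])),
    ("forest", RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)),
])

# Stratified folds keep the class ratio similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, demo, demo_y, cv=cv, scoring="accuracy")
print(scores.mean(), scores.std())
```

On the real data, `demo` and `demo_y` would be replaced by the raw feature frame and the `acceptability` column taken before encoding.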

