# **Practical No. 16: Confusion Matrix**

>  **Title:** *Write a program to create a confusion matrix and calculate different measures that quantify the quality of the model*

> **Domain:** Machine Learning
 <br> **Technology:** Python

> **Libraries:**
* kagglehub
* Pandas
* Scikit-Learn

> **Description:** A confusion matrix compares the true labels with the model’s predictions, helping to evaluate classification performance by showing true positives, false positives, true negatives, and false negatives. The program trains a classifier, predicts on test data, creates the confusion matrix, and calculates metrics like accuracy, precision, recall, and F1-score to quantify model quality. This gives a clear picture of how well the model distinguishes between classes.
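
To make the four counts concrete, here is a minimal sketch that tallies them by hand for a binary problem. The lists `y_true` and `y_pred` are made-up illustration data, not taken from the car dataset, and the class coded `1` is assumed to be the positive class.

```python
# Toy labels (illustrative only): 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # actual positive, predicted positive
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # actual negative, predicted negative
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # actual negative, predicted positive
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # actual positive, predicted negative

# Same layout scikit-learn uses: rows = true class, columns = predicted class
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)
```

The diagonal holds the correct predictions; everything off the diagonal is an error of one kind or the other.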

> **Resource:**
  
---


### Import Library

In [16]:
import kagglehub
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

### Import Dataset

In [2]:
path = kagglehub.dataset_download("ganeshborkar31/car-dekho")

Downloading from https://www.kaggle.com/api/v1/datasets/download/ganeshborkar31/car-dekho?dataset_version_number=1...
100%|██████████| 55.8k/55.8k [00:00<00:00, 20.0MB/s]
Extracting files...

In [3]:
df = pd.read_csv(f"{path}/cardekho.csv")

In [4]:
df.head()

| Unnamed: 0 | name | year | selling_price | km_driven | fuel | seller_type | transmission | owner |
|---|---|---|---|---|---|---|---|---|
| 0 | Maruti 800 AC | 2007 | 60000 | 70000 | Petrol | Individual | Manual | First Owner |
| 1 | Maruti Wagon R LXI Minor | 2007 | 135000 | 50000 | Petrol | Individual | Manual | First Owner |
| 2 | Hyundai Verna 1.6 SX | 2012 | 600000 | 100000 | Diesel | Individual | Manual | First Owner |
| 3 | Datsun RediGO T Option | 2017 | 250000 | 46000 | Petrol | Individual | Manual | First Owner |
| 4 | Honda Amaze VX i-DTEC | 2014 | 450000 | 141000 | Diesel | Individual | Manual | Second Owner |


### Label encoding

In [5]:
label_encoder = LabelEncoder()

In [6]:
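# Replace each categorical column with integer codes.
# Note: refitting the same encoder overwrites its learned classes,
# so after this cell it only remembers the mapping for 'owner'.
df['fuel'] = label_encoder.fit_transform(df['fuel'])
df['seller_type'] = label_encoder.fit_transform(df['seller_type'])
df['transmission'] = label_encoder.fit_transform(df['transmission'])
df['owner'] = label_encoder.fit_transform(df['owner'])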
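
If the integer codes need to be translated back to their original names later, one option is to keep a separate encoder per column. The sketch below is an assumption-laden illustration, not the notebook's method: `df_demo` is a small stand-in frame with invented values, and `encoders` is a hypothetical helper dictionary.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small stand-in frame (values are illustrative, not the real dataset)
df_demo = pd.DataFrame({
    "fuel": ["Petrol", "Diesel", "Petrol", "CNG"],
    "transmission": ["Manual", "Manual", "Automatic", "Manual"],
})

encoders = {}  # hypothetical helper: column name -> fitted LabelEncoder
for col in ["fuel", "transmission"]:
    enc = LabelEncoder()
    df_demo[col] = enc.fit_transform(df_demo[col])
    encoders[col] = enc

# Each encoder keeps its own sorted class list, so codes can be mapped back
print(list(encoders["fuel"].classes_))
print(list(encoders["fuel"].inverse_transform(df_demo["fuel"])))
```

`LabelEncoder` assigns codes in sorted order of the class names, which is why the code for a given name depends on which other categories are present in the column.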

In [7]:
df.head()

| Unnamed: 0 | name | year | selling_price | km_driven | fuel | seller_type | transmission | owner |
|---|---|---|---|---|---|---|---|---|
| 0 | Maruti 800 AC | 2007 | 60000 | 70000 | 4 | 1 | 1 | 0 |
| 1 | Maruti Wagon R LXI Minor | 2007 | 135000 | 50000 | 4 | 1 | 1 | 0 |
| 2 | Hyundai Verna 1.6 SX | 2012 | 600000 | 100000 | 1 | 1 | 1 | 0 |
| 3 | Datsun RediGO T Option | 2017 | 250000 | 46000 | 4 | 1 | 1 | 0 |
| 4 | Honda Amaze VX i-DTEC | 2014 | 450000 | 141000 | 1 | 1 | 1 | 2 |


### Categorical target for classification

In [8]:
# Split selling_price into 3 equal-width price ranges labelled Low / Medium / High
df['price_category'] = pd.cut(df['selling_price'], bins=3, labels=['Low', 'Medium', 'High'])
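
`pd.cut` with `bins=3` divides the price *range* into three equal-width intervals, so when prices are right-skewed most rows can end up in the lowest bin. The sketch below contrasts this with `pd.qcut`, which splits by quantiles instead. The `prices` series is invented for illustration and is not the real dataset; whether quantile bins are appropriate for this practical is a judgment call.

```python
import pandas as pd

# Illustrative right-skewed prices (not the real dataset)
prices = pd.Series([50_000, 60_000, 80_000, 100_000, 120_000,
                    150_000, 200_000, 300_000, 900_000])
labels = ["Low", "Medium", "High"]

equal_width = pd.cut(prices, bins=3, labels=labels)   # equal price ranges
equal_count = pd.qcut(prices, q=3, labels=labels)     # roughly equal row counts

# Count how many rows fall into each category under both schemes
width_counts = {name: int((equal_width == name).sum()) for name in labels}
count_counts = {name: int((equal_count == name).sum()) for name in labels}
print("cut :", width_counts)
print("qcut:", count_counts)
```

On this toy data the equal-width bins put almost every row in `Low`, while the quantile bins are balanced. Checking `df['price_category'].value_counts()` in the notebook would show how the real target is distributed.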

### Define X and Y

In [10]:
features = ['year', 'km_driven', 'fuel', 'seller_type', 'transmission', 'owner']
target = 'price_category'

In [11]:
X = df[features]
y = df[target]

> Shape of X and y

In [12]:
print("Shape of X:", X.shape)
print("Shape of Y:", y.shape)

Shape of X: (4340, 6)
Shape of Y: (4340,)


### Split into train and test sets
> Test size: 20%

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=50)

> Shape of the train and test sets

In [14]:
print('x_train:', X_train.shape, 'y_train:', y_train.shape)
print('x_test: ', X_test.shape,  'y_test: ', y_test.shape)

x_train: (3472, 6) y_train: (3472,)
x_test:  (868, 6) y_test:  (868,)
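
When one class is much rarer than the others, a plain random split can leave very few (or no) examples of it in the test set. A possible refinement is the `stratify` argument of `train_test_split`, which keeps class proportions the same in both parts. The sketch below uses invented data (`X_demo`, `y_demo`) rather than the car dataset.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Illustrative imbalanced labels: 90 "Low" rows and 10 "Medium" rows
X_demo = [[i] for i in range(100)]
y_demo = ["Low"] * 90 + ["Medium"] * 10

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=50, stratify=y_demo
)

# The 9:1 class ratio is preserved in the 20-row test set
test_counts = Counter(y_te)
print(test_counts)
```

Note that stratification raises an error if some class has fewer than two rows, so it may not work directly when a category is almost empty.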


### Define Algorithm

> Decision Tree Classifier

In [18]:
model = DecisionTreeClassifier(random_state=42)

### Fit algorithm

In [19]:
model.fit(X_train, y_train)

### Predict result

In [20]:
y_pred = model.predict(X_test)
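
With predictions available, the helpers imported at the top (`accuracy_score` and `classification_report`) can turn them into summary metrics. The sketch below shows the calls on a small invented pair of label lists; in the notebook the same calls would presumably take `y_test` and `y_pred`.

```python
from sklearn.metrics import accuracy_score, classification_report

# Illustrative labels (not the notebook's real predictions)
y_true_demo = ["Low"] * 6 + ["Medium"] * 2
y_pred_demo = ["Low"] * 5 + ["Medium"] * 3   # one "Low" car mislabelled as "Medium"

acc = accuracy_score(y_true_demo, y_pred_demo)

# output_dict=True returns the per-class metrics as a nested dictionary
report = classification_report(y_true_demo, y_pred_demo, output_dict=True)

print("accuracy:", acc)
print("Medium precision:", report["Medium"]["precision"])
print("Medium recall   :", report["Medium"]["recall"])
```

Calling `classification_report` without `output_dict=True` returns a formatted text table that can be printed directly.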

### Check score

In [22]:
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[852  11]
 [  0   5]]
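
The matrix above can be turned into the usual quality measures by hand. The sketch below copies the four printed counts and assumes the second row and column (index 1) is treated as the positive class; the output does not show which two price categories these rows correspond to.

```python
# Counts copied from the printed matrix; rows = true class, columns = predicted class
cm = [[852, 11],
      [0, 5]]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total        # correct predictions / all predictions

# Assumption: class at index 1 is the "positive" class
tp, fp, fn = cm[1][1], cm[0][1], cm[1][0]
precision = tp / (tp + fp)                      # of predicted positives, how many are right
recall = tp / (tp + fn)                         # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy : {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall   : {recall:.4f}")
print(f"f1-score : {f1:.4f}")
```

Accuracy is about 0.987, yet precision for the second class is only 0.3125, because 11 of the 16 rows predicted as that class actually belong to the first one. This is why accuracy alone can be misleading on imbalanced data. The matrix is 2×2 even though three price categories were defined, which suggests that one category appears in neither `y_test` nor `y_pred`; passing `labels=['Low', 'Medium', 'High']` to `confusion_matrix` would force a full 3×3 matrix.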
