<a href="https://colab.research.google.com/github/Mohammadi-Nilofer/ML-assignments/blob/Logistic-Regression/Car_LR.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Car Acceptability Classification Using a Logistic Regression Model


**Context:**

The automotive industry is highly competitive, and understanding customer preferences is crucial for success. Car manufacturers need to identify the key features that influence a car's acceptability to potential buyers.

**Objective:**

To build a Logistic Regression model that predicts a car's acceptability (unacceptable, acceptable, good, or very good) from its buying price, maintenance cost, number of doors, passenger capacity, luggage size, and safety rating. The model is meant to help manufacturers understand customer preferences and design cars that are more likely to be accepted by the target market.





**Data Dictionary:**

Buying_Price - Categorical Data [vhigh, high, med, low]

Maintenance_Price - Categorical Data [vhigh, high, med, low]

No_of_Doors - Categorical Data [2, 3, 4, 5more]

Person_Capacity - Categorical Data [2, 4, more]

Size_of_Luggage - Categorical Data [small, med, big]

Safety - Categorical Data [low, med, high]

Car_Acceptability - Categorical Data [unacc, acc, good, vgood]

In [1]:
#Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
import warnings
warnings.filterwarnings('ignore')

In [2]:
#Loading the dataset
df=pd.read_csv('/content/car.csv')

In [3]:
df.head()

|   | Buying_Price | Maintenance_Price | No_of_Doors | Person_Capacity | Size_of_Luggage | Safety | Car_Acceptability |
|---|---|---|---|---|---|---|---|
| 0 | vhigh | vhigh | 2 | 2 | small | low | unacc |
| 1 | vhigh | vhigh | 2 | 2 | small | med | unacc |
| 2 | vhigh | vhigh | 2 | 2 | small | high | unacc |
| 3 | vhigh | vhigh | 2 | 2 | med | low | unacc |
| 4 | vhigh | vhigh | 2 | 2 | med | med | unacc |


In [4]:
#checking info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1728 entries, 0 to 1727
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Buying_Price       1728 non-null   object
 1   Maintenance_Price  1728 non-null   object
 2   No_of_Doors        1728 non-null   object
 3   Person_Capacity    1728 non-null   object
 4   Size_of_Luggage    1728 non-null   object
 5   Safety             1728 non-null   object
 6   Car_Acceptability  1728 non-null   object
dtypes: object(7)
memory usage: 94.6+ KB


In [5]:
#Converting No_of_Doors and Person_Capacity to integers
# Note: errors='coerce' turns the non-numeric values '5more' and 'more' into NaN,
# and fillna(0) then encodes them as 0
df['No_of_Doors'] = pd.to_numeric(df['No_of_Doors'], errors='coerce').fillna(0).astype(int)
df['Person_Capacity'] = pd.to_numeric(df['Person_Capacity'], errors='coerce').fillna(0).astype(int)
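The `errors='coerce'` route in In [5] maps the open-ended values `5more` and `more` to 0, which places the largest door count and passenger capacity below the smallest ones. A minimal sketch of an alternative, assuming the raw CSV stores these columns as the strings listed in the data dictionary, maps each value explicitly. Encoding `5more` and `more` as 5 is a hypothetical choice; any value above the other levels would preserve the order. The sketch uses a small toy frame rather than `df`.

```python
import pandas as pd

# Toy frame with the raw string values listed in the data dictionary
raw = pd.DataFrame({
    "No_of_Doors": ["2", "3", "4", "5more"],
    "Person_Capacity": ["2", "4", "more", "more"],
})

# What the coerce-and-fill approach produces: '5more' collapses to 0
coerced = pd.to_numeric(raw["No_of_Doors"], errors="coerce").fillna(0).astype(int)
print(coerced.tolist())  # [2, 3, 4, 0]

# Hypothetical explicit mappings that keep the open-ended level on top
door_map = {"2": 2, "3": 3, "4": 4, "5more": 5}
capacity_map = {"2": 2, "4": 4, "more": 5}

raw["No_of_Doors"] = raw["No_of_Doors"].map(door_map)
raw["Person_Capacity"] = raw["Person_Capacity"].map(capacity_map)
print(raw)
```

Using `.map` with a complete dictionary also surfaces surprises: any unexpected raw value becomes NaN instead of silently turning into 0.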

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1728 entries, 0 to 1727
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Buying_Price       1728 non-null   object
 1   Maintenance_Price  1728 non-null   object
 2   No_of_Doors        1728 non-null   int64 
 3   Person_Capacity    1728 non-null   int64 
 4   Size_of_Luggage    1728 non-null   object
 5   Safety             1728 non-null   object
 6   Car_Acceptability  1728 non-null   object
dtypes: int64(2), object(5)
memory usage: 94.6+ KB


In [7]:
#Checking for null values
df.isnull().sum()

| Column | Null count |
|---|---|
| Buying_Price | 0 |
| Maintenance_Price | 0 |
| No_of_Doors | 0 |
| Person_Capacity | 0 |
| Size_of_Luggage | 0 |
| Safety | 0 |
| Car_Acceptability | 0 |


In [8]:
df.columns

Index(['Buying_Price', 'Maintenance_Price', 'No_of_Doors', 'Person_Capacity',
       'Size_of_Luggage', 'Safety', 'Car_Acceptability'],
      dtype='object')

In [9]:
from sklearn.preprocessing import LabelEncoder

categorical_cols = ['Buying_Price', 'Maintenance_Price','Size_of_Luggage', 'Safety', 'Car_Acceptability']

# Creating a LabelEncoder object
le = LabelEncoder()

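# Applying label encoding to each categorical column
# (LabelEncoder assigns codes in alphabetical order, e.g. high=0, low=1, med=2, vhigh=3;
# because the same `le` is refit in every iteration, afterwards it holds only the Car_Acceptability mapping)
for col in categorical_cols:
    df[col] = le.fit_transform(df[col])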

# Displaying the updated DataFrame to verify changes
df.head()

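|   | Buying_Price | Maintenance_Price | No_of_Doors | Person_Capacity | Size_of_Luggage | Safety | Car_Acceptability |
|---|---|---|---|---|---|---|---|
| 0 | 3 | 3 | 2 | 2 | 2 | 1 | 2 |
| 1 | 3 | 3 | 2 | 2 | 2 | 2 | 2 |
| 2 | 3 | 3 | 2 | 2 | 2 | 0 | 2 |
| 3 | 3 | 3 | 2 | 2 | 1 | 1 | 2 |
| 4 | 3 | 3 | 2 | 2 | 1 | 2 | 2 |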
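`LabelEncoder` assigns integer codes in alphabetical order, so `Buying_Price` becomes high=0, low=1, med=2, vhigh=3 (the `3` for `vhigh` in the output above), and the codes do not follow the actual price order. A sketch, assuming the intended orders are low < med < high < vhigh for prices and low < med < high for safety, uses scikit-learn's `OrdinalEncoder` with explicit category lists on a toy frame. The same idea would extend to the other feature columns; the target can keep `LabelEncoder`, since a multiclass logistic regression treats class codes as unordered labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Toy frame using values from the data dictionary
frame = pd.DataFrame({
    "Buying_Price": ["vhigh", "high", "med", "low"],
    "Safety": ["low", "med", "high", "high"],
})

# LabelEncoder sorts labels alphabetically: high=0, low=1, med=2, vhigh=3
label_codes = LabelEncoder().fit_transform(frame["Buying_Price"])
print(label_codes)  # [3 0 2 1]

# OrdinalEncoder with explicit category lists keeps the natural order
price_order = ["low", "med", "high", "vhigh"]
safety_order = ["low", "med", "high"]
enc = OrdinalEncoder(categories=[price_order, safety_order])
encoded = enc.fit_transform(frame[["Buying_Price", "Safety"]])
print(encoded)  # columns: Buying_Price code, Safety code
```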


In [10]:
# Separating the data into input features (X) and the target (y)
X=df.drop('Car_Acceptability',axis=1)
y=df['Car_Acceptability']

In [11]:
#Feature scaling (note: the scaler is fit on all rows, including those that later form the test set)
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
X=sc.fit_transform(X)

In [12]:
#Splitting the data into training and test sets (75% / 25%)
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=5)
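In [11] fits `StandardScaler` on every row before In [12] splits the data, so the scaling statistics include the test rows. A minimal sketch of the usual alternative splits first and wraps the scaler and classifier in a `Pipeline`, so scaling is learned from the training rows only. It runs on synthetic data from `make_classification` as a stand-in for the encoded car features; `stratify=y` and `max_iter=1000` are added choices, not settings from the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded car features (6 columns, 4 classes)
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=4, random_state=5)

# Split first; stratify keeps class proportions similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=5, stratify=y)

# The pipeline fits StandardScaler on X_train only, then reuses those
# means and standard deviations to transform X_test at predict time
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)
test_acc = pipe.score(X_test, y_test)
print("Testing Accuracy:", test_acc)
```

The same pipeline object can be passed to `GridSearchCV`, which then refits the scaler inside each cross-validation fold as well.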

In [13]:
#Creating the Logistic Regression Model
model=LogisticRegression()

#Fitting the data
model.fit(X_train,y_train)
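Since the stated objective is to understand which features drive acceptability, the fitted coefficients are a natural thing to inspect. A sketch on synthetic stand-in data (the feature names are borrowed from this dataset, but the values are random) shows how `coef_`, which has one row per class and one column per feature for a multiclass model, can be labeled for reading. Because In [11] standardizes the features, coefficient magnitudes are roughly comparable, but with alphabetical label codes their signs are hard to read as "higher price" or "higher safety".

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

feature_names = ["Buying_Price", "Maintenance_Price", "No_of_Doors",
                 "Person_Capacity", "Size_of_Luggage", "Safety"]

# Random stand-in data with the same shape of problem (6 features, 4 classes)
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=4, random_state=5)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# coef_ has shape (n_classes, n_features): one row of weights per class
coefs = pd.DataFrame(clf.coef_, index=clf.classes_, columns=feature_names)
print(coefs.round(2))
```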

In [14]:
#Taking the Predictions from the model
y_train_pred=model.predict(X_train)
y_test_pred=model.predict(X_test)

In [15]:
#model Evaluation
print("Training Accuracy:",metrics.accuracy_score(y_train,y_train_pred))
print("Testing Accuracy:",metrics.accuracy_score(y_test,y_test_pred))

Training Accuracy: 0.6936728395061729
Testing Accuracy: 0.7175925925925926
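Accuracy alone can flatter a model when one class dominates. If this CSV matches the UCI Car Evaluation data (same 1,728 rows and attributes), roughly 70% of its rows are `unacc`, so always predicting `unacc` would already score about 0.70. A sketch on hypothetical labels with that kind of imbalance shows how `DummyClassifier` gives the majority-class baseline and how `classification_report` exposes per-class recall; in the notebook the same calls would take `X_train`, `y_train`, `X_test`, and `y_test`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical imbalanced labels: 70% class 2 (the code LabelEncoder gives "unacc")
rng = np.random.default_rng(0)
y = np.array([2] * 70 + [0] * 22 + [1] * 4 + [3] * 4)
X = rng.normal(size=(len(y), 6))

# A baseline that always predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)
baseline_acc = accuracy_score(y, y_pred)
print("Majority-class accuracy:", baseline_acc)  # 0.7

# Per-class precision and recall reveal classes the model never predicts
print(classification_report(y, y_pred, zero_division=0))
```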


In [16]:
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    'penalty': ['l1', 'l2'],
    'C': [0.001, 0.01, 0.1, 1, 10, 100],
    'solver': ['liblinear', 'saga'],
    'max_iter': [100, 200, 300]
}

# Create GridSearchCV object
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='accuracy')

# Fit the grid search to the data
grid_search.fit(X_train, y_train)

# Get the best parameters and best score
print("Best Parameters:", grid_search.best_params_)
print("Best Score:", grid_search.best_score_)

# Get the best model
best_model = grid_search.best_estimator_

Best Parameters: {'C': 10, 'max_iter': 100, 'penalty': 'l1', 'solver': 'saga'}
Best Score: 0.6936738936738938
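In [1] silences all warnings, which also hides scikit-learn's `ConvergenceWarning`. The winning combination uses `solver='saga'` with `max_iter=100`, and SAGA may need more iterations than that, so it is worth checking whether the fit stopped at the iteration cap. A sketch on synthetic data re-enables that warning locally and compares `n_iter_` with `max_iter`; whether the cap is actually reached on the car data is not shown in the original output.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled training features
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=4, random_state=0)
X = StandardScaler().fit_transform(X)

# Record ConvergenceWarning even if a global "ignore" filter is active
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    clf = LogisticRegression(penalty="l1", C=10, solver="saga", max_iter=100)
    clf.fit(X, y)

# n_iter_ holds the iterations actually run; reaching max_iter means the
# solver stopped at the cap rather than at its tolerance
hit_cap = bool((clf.n_iter_ >= clf.max_iter).any())
warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("Stopped at max_iter:", hit_cap)
print("ConvergenceWarning raised:", warned)
```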


In [17]:
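#Model evaluation (note: y_train_pred and y_test_pred were computed in In [14] from the untuned `model`,
# so these figures repeat In [15] and do not measure `best_model`)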
print("Training Accuracy:",metrics.accuracy_score(y_train,y_train_pred))
print("Testing Accuracy:",metrics.accuracy_score(y_test,y_test_pred))

Training Accuracy: 0.6936728395061729
Testing Accuracy: 0.7175925925925926
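In [17] reprints the predictions made by the untuned `model` in In [14], so the tuned estimator is never scored on the test set. A sketch of the missing step on synthetic stand-in data: with the default `refit=True`, `GridSearchCV` refits the best configuration on the whole training set and exposes it as `best_estimator_`, which is the object to predict with. The small `C` grid here is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the scaled car features and encoded target
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=4, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=5)

# Illustrative grid; refit=True (the default) refits the best setting on X_train
grid_search = GridSearchCV(LogisticRegression(max_iter=1000),
                           param_grid={"C": [0.1, 1, 10]},
                           cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)
best_model = grid_search.best_estimator_

# Predict with the tuned estimator, not the original model
y_train_pred_best = best_model.predict(X_train)
y_test_pred_best = best_model.predict(X_test)
train_acc = accuracy_score(y_train, y_train_pred_best)
test_acc = accuracy_score(y_test, y_test_pred_best)
print("Tuned training accuracy:", train_acc)
print("Tuned testing accuracy:", test_acc)
```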


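**Observations:**

* Testing accuracy (0.718) is slightly higher than training accuracy (0.694). On a 432-row test set, a gap of about 2 percentage points is more likely split-to-split variation than evidence that the model generalizes especially well.
* Both scores sit around 70%, which suggests the linear model is underfitting; the alphabetical label codes give it little ordinal structure to work with. Before calling this performance good, it should also be compared with the accuracy of always predicting the most frequent class.
* Grid search did not improve on the untuned model: the best cross-validated accuracy (0.694) essentially matches the untuned training accuracy, and In [17] reprints the predictions from In [14], so the tuned `best_model` has not yet been scored on the test set.
* Overall, the model's performance can still be improved.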