**Problem Statement:**

This project uses ensemble learning, specifically the Random Forest Classifier, to predict the safety of a car from a given dataset. The objective is to build a reliable model that assesses a car's safety level from its features and attributes.

**Problem Description:**

Car safety is a critical consideration for both manufacturers and consumers. Assessing the safety of a car can be a complex task, as it depends on multiple factors such as design, engineering, and technological features. In this project, we aim to create a predictive model using Random Forest, a popular ensemble learning method, to evaluate car safety.

**Dataset:**

The dataset for this project is sourced from Kaggle and includes a collection of features and labels related to car safety. The dataset can be accessed at the following link: [Car Evaluation Data Set](https://www.kaggle.com/datasets/elikplim/car-evaluation-data-set).

In [1]:
# Importing the libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import LabelEncoder
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
# Load the dataset (adjust the path if car_evaluation.csv is stored elsewhere)
data = pd.read_csv('car_evaluation.csv')
data.head()

   vhigh vhigh.1  2 2.1 small   low  unacc
0  vhigh   vhigh  2   2 small   med  unacc
1  vhigh   vhigh  2   2 small  high  unacc
2  vhigh   vhigh  2   2   med   low  unacc
3  vhigh   vhigh  2   2   med   med  unacc
4  vhigh   vhigh  2   2   med  high  unacc


In [3]:
data.shape

(1727, 7)

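The DataFrame has 1727 rows and 7 columns. The dataset itself contains 1728 instances; the CSV file has no header row, so `read_csv` used the first record as the column names (visible in the `vhigh`, `vhigh.1`, `2`, ... headers above), which leaves one row fewer.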
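One way to keep the first record is to pass `header=None` together with explicit column names. The following is a minimal sketch, not the notebook's original loading step: it reads a three-line in-memory stand-in (`sample_csv`, a hypothetical name) instead of the real file, and it borrows the column names from the renaming cell In [5].

```python
import io

import pandas as pd

# Column names taken from the renaming step in cell In [5].
col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

# In-memory stand-in for the first three lines of a header-less CSV file.
sample_csv = io.StringIO(
    "vhigh,vhigh,2,2,small,low,unacc\n"
    "vhigh,vhigh,2,2,small,med,unacc\n"
    "vhigh,vhigh,2,2,small,high,unacc\n"
)

# header=None stops pandas from treating the first record as column names;
# names= supplies the column labels instead.
sample = pd.read_csv(sample_csv, header=None, names=col_names)
print(sample.shape)
```

Applied to the real file, the same two arguments should keep every record as a data row and would make the later renaming step unnecessary.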

In [4]:
data.isnull().sum()

vhigh      0
vhigh.1    0
2          0
2.1        0
small      0
low        0
unacc      0
dtype: int64

In [5]:
# Renaming column names
col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
data.columns = col_names

In [6]:
# Initializing the LabelEncoder
label_encoder = LabelEncoder()

In [7]:
# Iterate through the columns and encode each categorical variable as integers
for column in data.columns:
    if data[column].dtype == 'object':
        data[column] = label_encoder.fit_transform(data[column])
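Because a single `LabelEncoder` is refitted on every column, only the mapping of the last column survives the loop. If the integer codes need to be translated back to category names later, one option is to keep a separate encoder per column. The sketch below illustrates this on a tiny made-up frame (`df` and `encoders` are hypothetical names, and the rows are invented for the example); it also shows that `LabelEncoder` assigns codes in sorted order of the category names.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny illustrative frame: the category names occur in the dataset,
# but these rows are made up for the example.
df = pd.DataFrame({
    'safety': ['low', 'med', 'high', 'low'],
    'class': ['unacc', 'unacc', 'acc', 'unacc'],
})

# Keep one fitted encoder per column so each mapping can be inspected later.
encoders = {}
for column in df.columns:
    encoder = LabelEncoder()
    df[column] = encoder.fit_transform(df[column])
    encoders[column] = encoder

# classes_ lists the categories in code order: index 0 is code 0, and so on.
print(list(encoders['safety'].classes_))

# inverse_transform maps integer codes back to the original category names.
print(encoders['class'].inverse_transform([1]))
```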

In [8]:
data.head()

   buying  maint  doors  persons  lug_boot  safety  class
0       3      3      0        0         2       2      2
1       3      3      0        0         2       0      2
2       3      3      0        0         1       1      2
3       3      3      0        0         1       2      2
4       3      3      0        0         1       0      2


In [9]:
# Splitting the data into features (X) and the target variable (y)
X = data.drop('class', axis=1)
y = data['class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
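When the classes are unevenly represented, a plain random split can leave a rare class under-represented in the test set. Passing `stratify=y` to `train_test_split` keeps the class proportions the same in both parts. The sketch below demonstrates the option on synthetic data (`X_demo` and `y_demo` are hypothetical stand-ins, not the car data).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic imbalanced target: 80 samples of class 0 and 20 of class 1.
y_demo = pd.Series([0] * 80 + [1] * 20)
X_demo = pd.DataFrame({'feature': np.arange(100)})

# stratify=y_demo preserves the 80/20 class ratio in both train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Count how many samples of each class ended up in the test set.
print(y_te.value_counts().to_dict())
```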

In [10]:
# Creating the Random Forest Classifier
classifier = RandomForestClassifier(n_estimators=100, random_state=42)

In [11]:
# Training the model on the training data and making predictions on the test data
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
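A single train/test split yields one accuracy estimate. k-fold cross-validation is a common way to check how stable that estimate is across different splits. The sketch below shows the call pattern on synthetic data from `make_classification` (an assumption made so the example is self-contained); its scores say nothing about the car dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 200 samples, 6 features (not the car data).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=4, random_state=42
)

# Same hyperparameters as the classifier in this notebook.
demo_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# cv=5 trains and evaluates the model on five different train/validation folds.
scores = cross_val_score(demo_classifier, X_demo, y_demo, cv=5)
print(scores.mean(), scores.std())
```

On the real data, the same call would take `X` and `y` from cell In [9] in place of the synthetic arrays.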


In [12]:
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Generate confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)

# Generate classification report
class_report = classification_report(y_test, y_pred)

In [13]:
# Print the results
print(f'Accuracy: {accuracy}')
print('\n')
print('Confusion Matrix:')
print(conf_matrix)
print('\n')
print('Classification Report:')
print(class_report)


Accuracy: 0.9624277456647399


Confusion Matrix:
[[ 72   1   3   1]
 [  2  10   0   3]
 [  1   0 236   0]
 [  2   0   0  15]]


Classification Report:
              precision    recall  f1-score   support

           0       0.94      0.94      0.94        77
           1       0.91      0.67      0.77        15
           2       0.99      1.00      0.99       237
           3       0.79      0.88      0.83        17

    accuracy                           0.96       346
   macro avg       0.91      0.87      0.88       346
weighted avg       0.96      0.96      0.96       346
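The figures in the classification report follow directly from the confusion matrix, where rows are true classes and columns are predicted classes. Precision for a class is its diagonal entry divided by the column sum, recall is the diagonal entry divided by the row sum, and accuracy is the diagonal total divided by the number of samples. The sketch below recomputes them from the matrix printed above.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true, columns: predicted).
conf_matrix = np.array([
    [72, 1, 3, 1],
    [2, 10, 0, 3],
    [1, 0, 236, 0],
    [2, 0, 0, 15],
])

# Correct predictions per class sit on the diagonal.
true_positives = np.diag(conf_matrix)

# Precision: correct predictions / all predictions of that class (column sums).
precision = true_positives / conf_matrix.sum(axis=0)

# Recall: correct predictions / all true members of that class (row sums).
recall = true_positives / conf_matrix.sum(axis=1)

# Accuracy: all correct predictions / all samples.
accuracy = true_positives.sum() / conf_matrix.sum()

print(np.round(precision, 2))
print(np.round(recall, 2))
print(round(float(accuracy), 4))
```

For class 1, for example, recall is 10 / 15 ≈ 0.67: five of the fifteen true class-1 cars were assigned to other classes, which explains the weakest row in the report.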



In [14]:
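# New sample to classify; values must be the label-encoded integers,
# with columns in the same order as the training features
new_input = pd.DataFrame({
    'buying': [2],
    'maint': [1],
    'doors': [3],
    'persons': [2],
    'lug_boot': [2],
    'safety': [1],
})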

# Make predictions
predictions = classifier.predict(new_input)

# Print the predictions
print(predictions)

[2]
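The prediction is an integer code rather than a class name. `LabelEncoder` numbers categories in sorted order, so the code can be translated back by fitting an encoder on the class labels. The sketch below assumes the four class labels of the Car Evaluation data set (`unacc`, `acc`, `good`, `vgood`); `class_encoder` is a hypothetical name, not a variable from the notebook.

```python
from sklearn.preprocessing import LabelEncoder

# Assumed class labels of the Car Evaluation data set; LabelEncoder sorts them,
# giving acc=0, good=1, unacc=2, vgood=3.
class_encoder = LabelEncoder().fit(['unacc', 'acc', 'good', 'vgood'])

# Translate the predicted code back to its class name.
predicted_label = class_encoder.inverse_transform([2])
print(predicted_label)
```

Under this mapping the predicted code 2 corresponds to `unacc`, which matches the encoded `class` column shown earlier, where the `unacc` rows appear as 2.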
