<a href="https://colab.research.google.com/github/2303A52144/ExplainableAI_Assignment/blob/main/Assignment3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Car Evaluation using Decision Tree & LIME**

**Introduction:**

This report predicts car acceptability from categorical automotive attributes using a Decision Tree Classifier.
The objective is twofold:

1.	Train a Decision Tree model on the Car Evaluation dataset.
2.	Apply LIME (Local Interpretable Model-agnostic Explanations) to interpret feature contributions for specific predictions.

**Dataset Description:**

Source: UCI Car Evaluation Dataset

Size:

•	Samples: 1,728 cars
•	Features: 6 categorical features

Features:

•	Buying price (buying)
•	Maintenance cost (maint)
•	Doors
•	Persons (capacity)
•	Luggage Boot Size (lug_boot)
•	Safety

Target Variable:

•	Class (unacc, acc, good, vgood)

**Preprocessing Steps:**

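Categorical Encoding: Applied Label Encoding to all six categorical features and to the target. LabelEncoder assigns integer codes in sorted (alphabetical) order of the category names, so the target classes map to acc = 0, good = 1, unacc = 2, vgood = 3.

Data Splitting: 80% training and 20% testing using train_test_split(test_size=0.2, random_state=42).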
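
The class indices in the classification report only make sense once the encoder's mapping is known. The following is a minimal sketch, using a hand-written list of the four target labels rather than the real dataset, of how that mapping can be read back from a fitted `LabelEncoder`. The variable names are illustrative and not taken from the notebook.

```python
from sklearn.preprocessing import LabelEncoder

# The four target labels of the Car Evaluation dataset, in arbitrary order
# (a hand-written stand-in for the real 'class' column).
labels = ["unacc", "acc", "vgood", "good", "unacc"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ is sorted, so the position of each label is its integer code.
code_to_label = {i: str(name) for i, name in enumerate(encoder.classes_)}
print(code_to_label)  # {0: 'acc', 1: 'good', 2: 'unacc', 3: 'vgood'}
```

In the notebook itself the same lookup would be `label_encoders['class'].classes_`, since every fitted encoder is stored in the `label_encoders` dictionary.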




**Model & Performance:**

Algorithm: Decision Tree Classifier
Parameters: Default (random_state=42)
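Classification Report (class indices follow the alphabetical label encoding: 0 = acc, 1 = good, 2 = unacc, 3 = vgood)

                  precision    recall  f1-score   support

               0       0.97      0.92      0.94        83
               1       0.62      0.91      0.74        11
               2       1.00      1.00      1.00       235
               3       1.00      0.94      0.97        17

        accuracy                           0.97       346
       macro avg       0.90      0.94      0.91       346
    weighted avg       0.98      0.97      0.98       346

Accuracy: 97%

Class 2 (unacc) was predicted perfectly.

Class 1 (good) had noticeably lower precision (0.62) despite high recall (0.91). With only 11 test samples in this class, a handful of false positives is enough to produce that gap.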
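
To see what a precision of 0.62 means in absolute terms, the rounded report values can be turned back into approximate counts. The sketch below is an inference from the rounded figures, not a value printed by the notebook, and `confusion_counts` is a hypothetical helper introduced only for this illustration.

```python
def confusion_counts(precision, recall, support):
    """Recover approximate (TP, FP, FN) from rounded report values."""
    tp = round(recall * support)       # true positives among the real members
    predicted = round(tp / precision)  # how many samples got this label
    return tp, predicted - tp, support - tp

tp, fp, fn = confusion_counts(precision=0.62, recall=0.91, support=11)
print(tp, fp, fn)  # 10 6 1
```

Under these assumptions, class 1 has about 10 correct predictions, 6 samples from other classes wrongly assigned to it, and 1 missed sample, which is consistent with 10/16 ≈ 0.62 and 10/11 ≈ 0.91.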

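**LIME Analysis:**

Instance Explained: Test sample #5 (row index 5 of the test split)

Predicted Class: acc

Feature Contributions (values from the recorded cell output; LIME samples perturbations at random and no random_state is set on the explainer, so the numbers change slightly between runs):

    maint=vhigh   → -0.0626
    safety=med    → +0.0441
    persons=4     → +0.0309
    buying=med    → +0.0282
    lug_boot=med  → +0.0111
    doors=4       → +0.0008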
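
The weights listed above come from a weighted linear model fitted around the instance. The following is a simplified sketch of that idea for all-categorical data, not LIME's actual implementation: the function name, the kernel, and the toy model are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate_weights(instance, training_data, predict_proba, label,
                            num_samples=2000, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance."""
    rng = np.random.default_rng(seed)
    n_features = training_data.shape[1]
    kernel_width = 0.75 * np.sqrt(n_features)  # assumed default-style width

    # Perturb: draw each feature independently from its training column.
    rows = rng.integers(0, training_data.shape[0], size=(num_samples, n_features))
    samples = training_data[rows, np.arange(n_features)]
    samples[0] = instance  # keep the original row in the neighbourhood

    # Binary view: 1 where a sample shares the instance's category.
    same = (samples == instance).astype(float)

    # Proximity: samples that agree with the instance on more features count more.
    distance_sq = ((same - 1.0) ** 2).sum(axis=1)
    proximity = np.exp(-distance_sq / kernel_width ** 2)

    # Linear surrogate of the model's probability for the chosen class.
    target = predict_proba(samples)[:, label]
    surrogate = Ridge(alpha=1.0).fit(same, target, sample_weight=proximity)
    return surrogate.coef_

# Toy check: a "model" whose class-1 probability depends only on feature 0.
toy_rng = np.random.default_rng(1)
toy_train = toy_rng.integers(0, 3, size=(500, 3))

def toy_predict_proba(x):
    p = (x[:, 0] == 1).astype(float)
    return np.column_stack([1.0 - p, p])

weights = local_surrogate_weights(np.array([1, 0, 2]), toy_train,
                                  toy_predict_proba, label=1)
print(np.round(weights, 3))
```

In the toy case the first weight dominates, because only feature 0 affects the toy model's output. Each weight is tied to the class passed as `label`, which is why the class being explained matters when reading the numbers.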



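**Interpretation**

LIME weights are always relative to one class. The code calls explain_instance and as_list without a labels/label argument, and both default to class index 1, which under the alphabetical encoding is "good" rather than the predicted "acc". The weights above should therefore be read as local contributions to the probability of "good" for this car.

•	Positive Influences: Medium safety and seating for 4 persons are the largest positive contributions. Medium buying price, medium luggage space, and 4 doors add smaller positive amounts.

•	Negative Influence: Very high maintenance cost (maint=vhigh) is the only negative contribution and the largest single effect.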
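
To obtain weights for the class the model actually predicted, the class index can be passed explicitly. The sketch below assumes an already constructed `LimeTabularExplainer` and a fitted scikit-learn classifier; `explain_predicted_class` is a hypothetical helper name, while the `labels` argument of `explain_instance` and the `label` argument of `as_list` are part of the LIME API.

```python
import numpy as np

def explain_predicted_class(explainer, model, instance, num_features=6):
    """Return (predicted class index, LIME weights for that class)."""
    instance = np.asarray(instance)
    predicted = int(model.predict(instance.reshape(1, -1))[0])
    explanation = explainer.explain_instance(
        data_row=instance,
        predict_fn=model.predict_proba,
        labels=(predicted,),   # explain the predicted class, not index 1
        num_features=num_features,
    )
    # as_list also defaults to label=1, so the index is passed again here.
    return predicted, explanation.as_list(label=predicted)
```

With the notebook's objects this would be called as `explain_predicted_class(explainer, clf, X_test.values[5])`. The resulting weights would differ from the ones reported above, because they would describe the "acc" class.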

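**Conclusion:**

The Decision Tree achieved 97% accuracy on the 346-sample test set.
LIME provided a local explanation of the model's behaviour around a single test instance.

**Key Takeaway:**

For the explained instance, safety and passenger capacity contributed most positively to the explained class, while very high maintenance cost contributed most negatively. Because this is one local explanation, it suggests rather than establishes which attributes drive car acceptability overall.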


In [None]:
try:
    import lime
except ModuleNotFoundError:
    import sys
    !{sys.executable} -m pip install -q lime
    import lime

import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
import lime.lime_tabular

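# Load the UCI Car Evaluation data (expects car.data in the working directory;
# the file has no header row, so column names are supplied here)
columns = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
df = pd.read_csv('car.data', names=columns)

# Label-encode every column, keeping each fitted encoder for decoding later
label_encoders = {}
for col in df.columns:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le

X = df.drop('class', axis=1)
y = df['class']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a Decision Tree with default hyperparameters
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Evaluate on the held-out rows
y_pred = clf.predict(X_test)

print(" Classification Report:\n")
print(classification_report(y_test, y_pred))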

# Map each feature index to its original category names so LIME can print
# readable terms such as "maint=vhigh"
categorical_names = {i: label_encoders[col].classes_ for i, col in enumerate(X.columns)}

explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X.columns.tolist(),
    class_names=label_encoders['class'].classes_,
    categorical_features=list(range(X.shape[1])),
    categorical_names=categorical_names,
    mode='classification'
)

# Test-set row to explain
i = 5
instance = X_test.values[i]

# NOTE: labels defaults to (1,), so this explains class index 1 ("good"),
# which is not necessarily the predicted class
exp = explainer.explain_instance(
    data_row=instance,
    predict_fn=clf.predict_proba,
    num_features=6
)

print(f"\n LIME Explanation for test instance #{i}:")
predicted_class = clf.predict([instance])[0]
decoded_class = label_encoders['class'].inverse_transform([predicted_class])[0]
print(f"Predicted class: {decoded_class}")
print("Feature contributions:")
# as_list() also defaults to label=1
for feature, weight in exp.as_list():
    print(f"{feature}: {weight:.4f}")


 Classification Report:

              precision    recall  f1-score   support

           0       0.97      0.92      0.94        83
           1       0.62      0.91      0.74        11
           2       1.00      1.00      1.00       235
           3       1.00      0.94      0.97        17

    accuracy                           0.97       346
   macro avg       0.90      0.94      0.91       346
weighted avg       0.98      0.97      0.98       346


 LIME Explanation for test instance #5:
Predicted class: acc
Feature contributions:
maint=vhigh: -0.0626
safety=med: 0.0441
persons=4: 0.0309
buying=med: 0.0282
lug_boot=med: 0.0111
doors=4: 0.0008


## **Mushroom Classification using Random Forest & LIME**
**Introduction:**

This project predicts whether a mushroom is edible or poisonous from its categorical physical attributes, using the Mushroom dataset from the UCI Machine Learning Repository.
The goal of this study is twofold:

1.	Train a Random Forest Classifier to distinguish between edible and poisonous mushrooms.
2.	Apply LIME (Local Interpretable Model-agnostic Explanations) to interpret the feature contributions behind individual predictions.

**Dataset Description:**

Source: UCI Machine Learning Repository – Agaricus and Lepiota Mushroom dataset

•	Samples: 8,124 mushrooms

•	Features: 22 categorical attributes

Target Variable:

•	class: edible (e) or poisonous (p)

**Tasks**
1.	Load dataset
2.	Train Random Forest
3.	Apply LIME
4.	Interpret results

**Preprocessing Steps**

•	Categorical Encoding: Applied Label Encoding to convert features into numeric form.

•	Data Splitting: 80% training and 20% testing (train_test_split(random_state=42)).
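
The support column of a classification report should add up to the size of the test split, which gives a quick sanity check on the 80/20 split. The sketch below reproduces scikit-learn's rounding for a fractional `test_size` (the test count is rounded up); `split_sizes` is a hypothetical helper, not part of the notebook.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)  # test count is rounded up
    return n_samples - n_test, n_test

print(split_sizes(8124, 0.2))  # (6499, 1625) for the Mushroom dataset
print(split_sizes(1728, 0.2))  # (1382, 346) for the Car Evaluation dataset
```

The Mushroom test size of 1,625 matches the report's supports (843 + 782), and 346 matches the Car Evaluation report.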

**Model & Performance**

Algorithm: Random Forest Classifier

•	Parameters: n_estimators=100, random_state=42

**Classification Report**

    Class            Precision   Recall   F1-Score   Support
    0 (Edible)            1.00     1.00       1.00       843
    1 (Poisonous)         1.00     1.00       1.00       782

Accuracy: 1.00

Macro Avg F1: 1.00

Weighted Avg F1: 1.00

The Random Forest model classified all 1,625 test samples correctly, which indicates that the two classes are highly separable given these attributes. The LIME explanation that follows points to odor, gill properties, and stalk surface as influential for the explained prediction.


**LIME Analysis**

Instance Explained: Test sample #5

Predicted Class: poisonous (p)

Feature Contributions (weights toward class "p"):

    odor=y                      → +0.1129
    gill-color=b                → +0.1062
    gill-size=n                 → +0.1044
    stalk-surface-above-ring=k  → +0.0873
    gill-spacing=c              → +0.0584
    bruises=f                   → +0.0468

**Interpretation:**

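•	The odor (y = fishy) is the strongest contributor to the poisonous classification for this instance.

•	Gill color (b = buff), gill size (n = narrow), a silky stalk surface above the ring (k), close gill spacing (c), and the absence of bruising (f) also contribute positively towards predicting "poisonous".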
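
The single-letter codes in the LIME output are easy to misread, so a small lookup helps when writing up an explanation. The sketch below covers only the six features that appear in this explanation. The code meanings follow the attribute description distributed with the UCI dataset, while `CODE_NAMES` and `decode` are names made up for this illustration.

```python
# Category codes for the features in this explanation, per the UCI
# attribute description of the Mushroom dataset.
CODE_NAMES = {
    "odor": {"a": "almond", "l": "anise", "c": "creosote", "y": "fishy",
             "f": "foul", "m": "musty", "n": "none", "p": "pungent",
             "s": "spicy"},
    "gill-color": {"k": "black", "n": "brown", "b": "buff", "h": "chocolate",
                   "g": "gray", "r": "green", "o": "orange", "p": "pink",
                   "u": "purple", "e": "red", "w": "white", "y": "yellow"},
    "gill-size": {"b": "broad", "n": "narrow"},
    "gill-spacing": {"c": "close", "w": "crowded", "d": "distant"},
    "stalk-surface-above-ring": {"f": "fibrous", "y": "scaly", "k": "silky",
                                 "s": "smooth"},
    "bruises": {"t": "bruises", "f": "no"},
}

def decode(term):
    """Turn a LIME term such as 'odor=y' into 'odor=fishy'."""
    feature, code = term.split("=", 1)
    # Unknown features or codes are returned unchanged.
    return f"{feature}={CODE_NAMES.get(feature, {}).get(code, code)}"

for term in ["odor=y", "gill-color=b", "gill-size=n",
             "stalk-surface-above-ring=k", "gill-spacing=c", "bruises=f"]:
    print(decode(term))
```

Applied to the output of `exp.as_list()`, the same function could be called on the first element of each `(feature, weight)` pair.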

**Conclusion:**

•	Key Insights:

The Random Forest reaches 100% accuracy on the held-out test set, so the Mushroom dataset is highly predictable from these attributes.

For the explained instance, LIME identifies odor, gill color, and gill size as the strongest contributors to the poisonous prediction.

•	Limitations:

All features are categorical, so continuous real-world measurements are not represented.

The model might not generalize to mushroom species that are absent from the dataset.

•	Improvements:

Compare with other interpretable models such as Decision Trees.

Experiment with SHAP for global and local interpretability.

Validate with real-world mushroom identification cases.


In [None]:
try:
    import lime
except ModuleNotFoundError:
    import sys
    !{sys.executable} -m pip install -q lime
    import lime

import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import lime.lime_tabular

# Load dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data"
columns = [
    'class','cap-shape','cap-surface','cap-color','bruises','odor',
    'gill-attachment','gill-spacing','gill-size','gill-color',
    'stalk-shape','stalk-root','stalk-surface-above-ring','stalk-surface-below-ring',
    'stalk-color-above-ring','stalk-color-below-ring','veil-type','veil-color',
    'ring-number','ring-type','spore-print-color','population','habitat'
]
df = pd.read_csv(url, names=columns)

# Encode categorical features
label_encoders = {}
for col in df.columns:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le

X = df.drop('class', axis=1)
y = df['class']

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Random Forest
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Evaluate
y_pred = clf.predict(X_test)
print(" Classification Report:\n")
print(classification_report(y_test, y_pred))

# LIME Explanation
categorical_names = {i: label_encoders[col].classes_ for i, col in enumerate(X.columns)}

explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X.columns.tolist(),
    class_names=label_encoders['class'].classes_,
    categorical_features=list(range(X.shape[1])),
    categorical_names=categorical_names,
    mode='classification'
)

i = 5
instance = X_test.values[i]

# labels defaults to (1,); here class index 1 is 'p' (poisonous)
exp = explainer.explain_instance(
    data_row=instance,
    predict_fn=clf.predict_proba,
    num_features=6
)

print(f"\n LIME Explanation for test instance #{i}:")
predicted_class = clf.predict([instance])[0]
decoded_class = label_encoders['class'].inverse_transform([predicted_class])[0]
print(f"Predicted class: {decoded_class}")
print("Feature contributions:")
for feature, weight in exp.as_list():
    print(f"{feature}: {weight:.4f}")


 Classification Report:

              precision    recall  f1-score   support

           0       1.00      1.00      1.00       843
           1       1.00      1.00      1.00       782

    accuracy                           1.00      1625
   macro avg       1.00      1.00      1.00      1625
weighted avg       1.00      1.00      1.00      1625


 LIME Explanation for test instance #5:
Predicted class: p
Feature contributions:
odor=y: 0.1129
gill-color=b: 0.1062
gill-size=n: 0.1044
stalk-surface-above-ring=k: 0.0873
gill-spacing=c: 0.0584
bruises=f: 0.0468
