<a href="https://colab.research.google.com/github/Antara999333/Assignment-4---Interpretable-ML-II/blob/main/Assignment_4_Interpretable_ML_II.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [39]:
# Install the necessary libraries
!pip install imodels
!pip install seaborn

import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from imodels import BoostedRulesClassifier, SkopeRulesClassifier, GreedyRuleListClassifier

# Load the Titanic dataset from seaborn
titanic = sns.load_dataset('titanic')

# Data Preprocessing
# Dropping irrelevant columns and handling missing values
titanic = titanic.drop(columns=['deck', 'embark_town', 'alive', 'who', 'adult_male'], errors='ignore')
titanic = titanic.dropna(subset=['age', 'embarked', 'fare', 'sex'])

# Convert categorical variables to numerical
titanic['sex'] = titanic['sex'].map({'male': 0, 'female': 1})
titanic['embarked'] = titanic['embarked'].map({'C': 0, 'Q': 1, 'S': 2})

# Define features and target (using selected features)
X = titanic[['pclass', 'sex', 'age', 'fare']]
y = titanic['survived']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Function to evaluate the model
def evaluate_model(model, X_train, y_train, X_test, y_test, model_name):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print(f"--- {model_name} ---")
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("\nClassification Report:\n", classification_report(y_test, y_pred))








In [40]:
# Check the number of rows and columns after transformations
num_rows, num_columns = titanic.shape
print(f"Number of rows: {num_rows}, Number of columns: {num_columns}")


Number of rows: 712, Number of columns: 10




# The three models I used are the Boosted Rules Classifier, the Skope Rules Classifier, and the Greedy Rule List Classifier

# Boosted Rules Classifier

In [36]:

# 1. Boosted Rules Classifier
boosted_model = BoostedRulesClassifier()
evaluate_model(boosted_model, X_train, y_train, X_test, y_test, "Boosted Rules Classifier")

--- Boosted Rules Classifier ---
Accuracy: 0.7552447552447552

Classification Report:
               precision    recall  f1-score   support

           0       0.79      0.76      0.78        80
           1       0.71      0.75      0.73        63

    accuracy                           0.76       143
   macro avg       0.75      0.75      0.75       143
weighted avg       0.76      0.76      0.76       143





## Explanation:

The Boosted Rules Classifier is an ensemble learning method that sequentially combines weak classifiers to improve overall accuracy. In my evaluation, the model achieved an accuracy of approximately 75.5% on the test set (108 of 143 passengers), with recall of 0.76 for non-survivors and 0.75 for survivors.
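
The figures in the classification report can be cross-checked by hand. The sketch below assumes confusion-matrix counts of 61 true negatives, 19 false positives, 16 false negatives, and 47 true positives. The notebook does not print these counts; they are inferred from the rounded report (support of 80 and 63, accuracy of 108/143), and `report_from_counts` is a hypothetical helper name.

```python
def report_from_counts(tn, fp, fn, tp):
    """Recompute accuracy and per-class precision/recall from confusion counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision_0": tn / (tn + fn),  # of predicted non-survivors, share that is correct
        "recall_0": tn / (tn + fp),     # of actual non-survivors, share that is found
        "precision_1": tp / (tp + fp),  # of predicted survivors, share that is correct
        "recall_1": tp / (tp + fn),     # of actual survivors, share that is found
    }

# Assumed counts, inferred from the rounded report (not notebook output).
boosted = report_from_counts(tn=61, fp=19, fn=16, tp=47)
for name, value in boosted.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these values agree with the report above, which suggests the inferred counts are the ones behind it.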

## The Algorithm
The classifier builds a sequence of weak learners (often simple decision trees, each of which reads as a short rule) and combines them into a stronger predictive model. Each learner focuses on the instances its predecessors misclassified.

## Initialization:

The algorithm starts by fitting a weak learner, such as a shallow decision tree, to the training data (X_train and y_train), with every passenger weighted equally. Because it is so simple, this first model may miss more complex patterns.

## Fitting the Model:

For my dataset, the model is fitted to these predictors:

- pclass: Integer (1, 2, or 3) representing the passenger class.
- sex: Binary (0 for male, 1 for female).
- age: Float (the age of the passenger).
- fare: Float (the ticket fare).

The model generates initial predictions from these features, classifying each passenger as survived (1) or not survived (0).

## Error Calculation:

The algorithm calculates the error by comparing its predictions against the actual survival labels (y_train). Instances that were misclassified are noted for increased attention in the next iteration.

## Weight Adjustment:

The weights of the misclassified instances are increased, so the next weak learner prioritizes them. For example, if a first-class female passenger was misclassified, the next learner places more emphasis on similar instances.
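
To make the reweighting concrete, here is a minimal sketch of one round of the classic discrete AdaBoost update. It is an assumption that this matches the textbook scheme closely enough to illustrate the idea; the imodels implementation may differ in detail, and `adaboost_reweight` and the toy labels are hypothetical.

```python
import math

def adaboost_reweight(weights, y_true, y_pred):
    """One round of the classic discrete AdaBoost weight update.

    Assumes the weighted error is strictly between 0 and 1.
    """
    # Weighted error of the current weak learner.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    # Vote weight of this learner: larger when the weighted error is smaller.
    alpha = 0.5 * math.log((1 - err) / err)
    # Raise the weights of mistakes, lower the weights of correct predictions.
    new = [w * math.exp(alpha if t != p else -alpha)
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return [w / total for w in new], alpha

# Five toy passengers; the second one is misclassified.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]
weights = [0.2] * 5
new_weights, alpha = adaboost_reweight(weights, y_true, y_pred)
print(new_weights, alpha)
```

In this toy round the single misclassified passenger ends up carrying half of the total weight, so the next learner is pushed hard toward getting that case right.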

## Iterative Process:

The fit, evaluate, and reweight steps are repeated for a set number of rounds. Each new model learns from the mistakes of the previous ones, refining the ensemble's predictions.

## Final Prediction:

The final output combines the predictions of all weak learners in a weighted vote. Learners with lower weighted error on the training data contribute more heavily to the final classification.
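
The combination step can be sketched as a weighted majority vote. This is an illustrative version only: `weighted_vote`, the three learners, and their vote weights are hypothetical, not values taken from the fitted model.

```python
def weighted_vote(predictions, alphas):
    """Combine 0/1 predictions from several weak learners by weighted majority vote."""
    # Map labels {0, 1} to {-1, +1} so the votes can be summed with their weights.
    score = sum(a * (1 if p == 1 else -1) for p, a in zip(predictions, alphas))
    return 1 if score > 0 else 0

# Three hypothetical weak learners voting on one passenger.
final = weighted_vote(predictions=[1, 0, 0], alphas=[0.9, 0.3, 0.4])
print(final)
```

Here one strong learner outvotes two weaker ones, which is the sense in which better learners contribute more heavily.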

# Skope Rules Classifier

In [37]:
# 2. Skope Rules Classifier
skope_model = SkopeRulesClassifier()
evaluate_model(skope_model, X_train, y_train, X_test, y_test, "Skope Rules Classifier")



--- Skope Rules Classifier ---
Accuracy: 0.5594405594405595

Classification Report:
               precision    recall  f1-score   support

           0       0.56      1.00      0.72        80
           1       0.00      0.00      0.00        63

    accuracy                           0.56       143
   macro avg       0.28      0.50      0.36       143
weighted avg       0.31      0.56      0.40       143





## Explanation:
The Skope Rules Classifier is a rule-based algorithm that aims to produce interpretable rules for predicting outcomes from feature values. On my dataset, the model achieved an accuracy of approximately 55.9% on the test set, but the classification report shows what lies behind that number: the model predicted class 0 (did not survive) for all 143 test passengers. Recall is 1.00 for class 0 and 0.00 for class 1 (survival), so the accuracy simply equals the share of non-survivors in the test set (80 of 143), and the model has no predictive capability for the positive class.
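
The reported accuracy can be reproduced from the class counts alone, which is a quick way to confirm that the model behaves like a constant predictor. A minimal check, using only the support values from the report:

```python
support = {0: 80, 1: 63}  # test-set class counts from the report above

# A model that predicts class 0 for everyone is right on exactly the class-0 rows.
total = sum(support.values())
accuracy_all_zero = support[0] / total
recall_1 = 0 / support[1]  # no survivor is ever predicted as a survivor

print(f"accuracy: {accuracy_all_zero:.4f}, recall for class 1: {recall_1:.2f}")
```

The result matches the reported accuracy of 0.5594, so the score reflects the class balance of the test set rather than anything the rules learned.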

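## The Algorithm

## Feature Selection:

The algorithm works with the same four predictors:

- pclass: Integer
- sex: Binary
- age: Float
- fare: Float

As far as I understand the method, there is no separate statistical feature-selection step (such as correlation or p-value screening). Features are selected implicitly: the decision trees from which the rules are extracted choose which predictors to split on.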

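## Rule Generation:

The algorithm fits an ensemble of shallow decision trees on random subsamples of the training data and turns paths through those trees into conditional rules. For instance, it might produce a rule equivalent to "If pclass = 1 and sex = 1, then survived = 1," suggesting that a first-class female passenger is likely to survive. With numeric encodings, such a rule appears as threshold conditions, for example pclass <= 1.5 and sex > 0.5.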

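## Rule Evaluation:

Each generated rule is scored to measure how reliable it is. Skope-rules scores a rule by its precision (of the passengers the rule covers, how many actually survived) and its recall (of all survivors, how many the rule covers). As I understand it, these are measured on samples that were left out when the corresponding tree was fitted.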
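
One way to score a single rule is by its precision and recall for the survived class. The sketch below uses a handful of made-up rows and a hypothetical rule; it illustrates the scoring idea only and is not the library's internal code.

```python
# Hypothetical toy rows (pclass, sex, survived); not taken from the Titanic data.
rows = [
    (1, 1, 1), (1, 1, 1), (1, 1, 0),
    (3, 0, 0), (2, 1, 1), (3, 1, 0),
]

def rule(pclass, sex):
    """Example rule: first-class female passengers are predicted to survive."""
    return pclass == 1 and sex == 1

def rule_precision_recall(rows):
    """Precision and recall of `rule` for the survived (1) class."""
    covered = [survived for pclass, sex, survived in rows if rule(pclass, sex)]
    positives = sum(survived for _, _, survived in rows)
    hits = sum(covered)  # covered rows that really survived
    precision = hits / len(covered) if covered else 0.0
    recall = hits / positives if positives else 0.0
    return precision, recall

precision, recall = rule_precision_recall(rows)
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```

A rule can be precise but cover few survivors, or cover many survivors with low precision, which is why both numbers are tracked.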

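## Rule Pruning:

Rules whose precision or recall falls below a minimum threshold are discarded, and near-duplicate rules are removed, so that only the most effective rules remain. If no rule passes the thresholds, the model has nothing with which to predict the positive class, which would be consistent with the all-zero predictions reported above.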

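## Final Prediction:

For new instances in the test set (X_test), the model applies the retained rules to predict survival. As I understand skope-rules, the rules that fire for a passenger are combined into a score rather than only the first applicable rule being used, and a passenger matched by no rule is predicted as class 0.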

# Greedy Rule List Classifier

In [38]:
# 3. Greedy Rule List Classifier
greedy_model = GreedyRuleListClassifier()
evaluate_model(greedy_model, X_train, y_train, X_test, y_test, "Greedy Rule List Classifier")

--- Greedy Rule List Classifier ---
Accuracy: 0.7622377622377622

Classification Report:
               precision    recall  f1-score   support

           0       0.77      0.82      0.80        80
           1       0.75      0.68      0.72        63

    accuracy                           0.76       143
   macro avg       0.76      0.75      0.76       143
weighted avg       0.76      0.76      0.76       143





## Explanation:

The Greedy Rule List Classifier is a rule-based algorithm that greedily builds an ordered list of rules, adding the best available rule at each step. On my dataset, the classifier achieved an accuracy of 76.2% on the test set (109 of 143 passengers), the highest of the three models, with recall of 0.82 for non-survivors and 0.68 for survivors.

## The Algorithm
The classifier constructs its rules one at a time, each time keeping only the best available rule and never revisiting earlier choices.

## Rule Initialization:

The classifier starts with an empty list of rules.

## Rule Creation:

For each predictor (pclass, sex, age, fare), candidate rules are identified. For instance, a rule might look like "If sex = 1, then survived = 1," indicating that females are more likely to survive.

## Rule Evaluation:

Each candidate rule is scored on the training data to measure how well it predicts survival, for example by counting how often the rule's prediction is correct.

## Greedy Selection:

The best-scoring rule is added to the end of the rule list. The algorithm then removes from the training set the instances this rule covers, since later rules never see those instances at prediction time.
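
A single greedy selection step can be sketched as follows. This is a simplified illustration under stated assumptions: the toy rows and candidate rules are hypothetical, and scoring a rule by its accuracy on the rows it covers is a choice made here for clarity, which may not be the criterion imodels uses internally.

```python
# Hypothetical toy rows: (features, survived label).
data = [
    ({"sex": 1, "pclass": 1}, 1),
    ({"sex": 1, "pclass": 3}, 1),
    ({"sex": 0, "pclass": 1}, 0),
    ({"sex": 0, "pclass": 3}, 0),
    ({"sex": 1, "pclass": 2}, 0),
]

# Candidate rules: name -> (condition, predicted label).
candidates = {
    "sex == 1": (lambda x: x["sex"] == 1, 1),
    "pclass == 1": (lambda x: x["pclass"] == 1, 1),
}

def rule_accuracy(condition, label, rows):
    """Share of covered rows whose true label matches the rule's prediction."""
    covered = [y for x, y in rows if condition(x)]
    return sum(y == label for y in covered) / len(covered) if covered else 0.0

def pick_best_rule(candidates, rows):
    """Pick the highest-scoring rule and drop the rows it covers."""
    best = max(candidates, key=lambda name: rule_accuracy(*candidates[name], rows))
    condition, _ = candidates[best]
    remaining = [(x, y) for x, y in rows if not condition(x)]
    return best, remaining

best, remaining = pick_best_rule(candidates, data)
print(best, len(remaining))
```

The rows left in `remaining` are the ones the next candidate rules would be scored on.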

## Iteration:

Rule creation, rule evaluation, and greedy selection are repeated on the remaining instances until no new rule improves accuracy. This keeps the model focused on the most predictive rules without unnecessary complexity.

## Final Prediction:

When predicting for the test set, the model walks down the rule list and, for each instance, applies the first rule whose condition matches. Instances matched by no rule receive a default prediction.
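
First-match prediction over an ordered rule list can be sketched in a few lines. The rules, the default prediction, and the passengers below are all hypothetical and chosen only to show the traversal order.

```python
# A hypothetical rule list: (condition, prediction) pairs checked in order.
rule_list = [
    (lambda p: p["sex"] == 1 and p["pclass"] <= 2, 1),
    (lambda p: p["age"] < 10, 1),
]
DEFAULT_PREDICTION = 0  # used when no rule matches

def predict_one(passenger, rule_list, default=DEFAULT_PREDICTION):
    """Return the prediction of the first rule whose condition matches."""
    for condition, prediction in rule_list:
        if condition(passenger):
            return prediction
    return default

passengers = [
    {"sex": 1, "pclass": 1, "age": 30.0},  # matches the first rule
    {"sex": 0, "pclass": 3, "age": 5.0},   # falls through to the second rule
    {"sex": 0, "pclass": 3, "age": 40.0},  # matches nothing, gets the default
]
predictions = [predict_one(p, rule_list) for p in passengers]
print(predictions)
```

Because only the first matching rule is used, the order of the list is part of the model: swapping two rules can change predictions.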