
# 5: Model Evaluation


In [None]:
# To evaluate a classification model in Python, you can use metrics such as
# accuracy, precision, recall, and F1 score, together with the confusion matrix.

## 5.1 How to Perform Model Evaluation Using R/Python



In [6]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Load the dataset (replace '/content/bank.csv' with the actual path to your dataset)
bank_data = pd.read_csv('/content/bank.csv')

# Replace these columns with your actual predictor and response variable names
predictor_columns = ["age", "balance", "day", "duration", "campaign", "pdays", "previous"]
response_column = "deposit"

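# The predictors listed above are all numeric, so no encoding is needed here.
# To use categorical predictors as well, one-hot encode them first and add the resulting columns to predictor_columns, e.g.
# bank_data = pd.get_dummies(bank_data, columns=["job", "marital", "education", "default", "housing", "loan", "contact", "month", "poutcome"])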

# Convert 'yes' and 'no' to 1 and 0
bank_data[response_column] = bank_data[response_column].map({'yes': 1, 'no': 0})

# Split the data into training and test sets
bank_train, bank_test = train_test_split(bank_data, test_size=0.2, random_state=42)

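# Create and train the decision tree model
# (no random_state is set, so results can vary slightly between runs; pass random_state=... for reproducibility)
model = DecisionTreeClassifier()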
model.fit(bank_train[predictor_columns], bank_train[response_column])

# Make predictions on the test set
ypred = model.predict(bank_test[predictor_columns])

# Evaluate the model
accuracy = accuracy_score(bank_test[response_column], ypred)
precision = precision_score(bank_test[response_column], ypred, pos_label=1)  # pos_label=1 (the default) treats 'yes' as the positive class
recall = recall_score(bank_test[response_column], ypred)
f1 = f1_score(bank_test[response_column], ypred)
conf_matrix = confusion_matrix(bank_test[response_column], ypred)

print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")
print("Confusion Matrix:")
print(conf_matrix)


Accuracy: 0.7026421854008061
Precision: 0.692822966507177
Recall: 0.6785379568884724
F1 Score: 0.6856060606060607
Confusion Matrix:
[[845 321]
 [343 724]]
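
The four scores printed above can be recomputed by hand from the confusion matrix, which is a useful sanity check. The sketch below assumes scikit-learn's layout, where rows are actual classes and columns are predicted classes, so the matrix reads `[[TN, FP], [FN, TP]]`. The variable names are illustrative.

```python
# Confusion matrix from the output above, in scikit-learn's layout:
# rows = actual class (0, 1), columns = predicted class (0, 1).
conf_matrix = [[845, 321],
               [343, 724]]

(tn, fp), (fn, tp) = conf_matrix

total = tn + fp + fn + tp
accuracy = (tp + tn) / total   # share of all predictions that are correct
precision = tp / (tp + fp)     # share of predicted positives that are truly positive
recall = tp / (tp + fn)        # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1 Score:  {f1:.4f}")
```

Rounded to four decimals, these values agree with the scikit-learn results printed above.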


## 5.2 Accounting for Unequal Error Costs Using R/Python
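
Unequal error costs mean that a false negative and a false positive are not equally bad, so a model is judged by a weighted count of its errors rather than a plain error count. A minimal sketch, assuming a hypothetical cost of 1 per false positive and 4 per false negative; these cost values and the `total_cost` helper are illustrative and not part of the tutorial's dataset or of scikit-learn.

```python
def total_cost(conf_matrix, cost_fp, cost_fn):
    """Sum the cost of all errors in a 2x2 confusion matrix [[TN, FP], [FN, TP]]."""
    (_, fp), (fn, _) = conf_matrix
    return fp * cost_fp + fn * cost_fn

# Confusion matrix of the unweighted tree from section 5.1.
baseline = [[845, 321],
            [343, 724]]

# Hypothetical costs: a false negative is four times as costly as a false positive.
cost = total_cost(baseline, cost_fp=1, cost_fn=4)
print(f"Total cost: {cost}")  # 321 * 1 + 343 * 4 = 1693
```

Comparing this total across models, rather than comparing accuracy, shows which model is cheaper under the chosen costs.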

In [7]:
# In Python, you can use the class_weight parameter in the DecisionTreeClassifier to account for unequal error costs.
# The class_weight parameter allows you to specify weights for each class,
# which is useful when dealing with imbalanced datasets or when you want to assign different costs to different types of errors.

# Here's how you can achieve this in Python using the Bank dataset:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

# Load the dataset (replace '/content/bank.csv' with the actual path to your dataset)
bank_data = pd.read_csv('/content/bank.csv')

# Replace these columns with your actual predictor and response variable names
predictor_columns = ["age", "balance", "day", "duration", "campaign", "pdays", "previous"]
response_column = "deposit"

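# The predictors listed above are all numeric, so no encoding is needed here.
# To use categorical predictors as well, one-hot encode them first and add the resulting columns to predictor_columns, e.g.
# bank_data = pd.get_dummies(bank_data, columns=["job", "marital", "education", "default", "housing", "loan", "contact", "month", "poutcome"])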

# Convert 'yes' and 'no' to 1 and 0
bank_data[response_column] = bank_data[response_column].map({'yes': 1, 'no': 0})

# Split the data into training and test sets
bank_train, bank_test = train_test_split(bank_data, test_size=0.2, random_state=42)

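# Define the class weights based on your cost matrix.
# Caution: a weight of 0 gives class 0 no influence on training at all, so the tree
# predicts class 1 for every row (see the confusion matrix in the output).
# To make errors on class 1 four times as costly as errors on class 0,
# nonzero weights such as {0: 1, 1: 4} would express that ratio.
class_weights = {0: 0, 1: 4}  # Replace with weights derived from your actual cost matrix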

# Create and train the decision tree model with class weights
model_cost = DecisionTreeClassifier(class_weight=class_weights)
model_cost.fit(bank_train[predictor_columns], bank_train[response_column])

# Make predictions on the test set
ypred_cost = model_cost.predict(bank_test[predictor_columns])

# Evaluate the model with confusion matrix
conf_matrix_cost = confusion_matrix(bank_test[response_column], ypred_cost)

print("Confusion Matrix with Class Weights:")
print(conf_matrix_cost)


# In this example, class_weight assigns a different weight to each class.
# The class_weights dictionary maps each class label (0 and 1) to its weight.
# Replace it with the weights that reflect your own cost matrix.


Confusion Matrix with Class Weights:
[[   0 1166]
 [   0 1067]]
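
The all-zero first column follows from how class weights enter the split criterion. The sketch below is a simplified illustration of Gini impurity computed on weighted class counts, not scikit-learn's internal code; `weighted_gini` and the example node counts are hypothetical. With a weight of 0 for class 0, every node looks pure, so the tree has no reason to split and always predicts class 1.

```python
def weighted_gini(counts, weights):
    """Gini impurity of a node, with each class count scaled by its class weight."""
    weighted = {label: counts[label] * weights[label] for label in counts}
    total = sum(weighted.values())
    if total == 0:
        return 0.0  # no weighted samples in the node
    return 1.0 - sum((w / total) ** 2 for w in weighted.values())

node_counts = {0: 3, 1: 1}  # hypothetical node: three class-0 rows, one class-1 row

print(weighted_gini(node_counts, {0: 1, 1: 1}))  # unweighted: 1 - (0.75**2 + 0.25**2) = 0.375
print(weighted_gini(node_counts, {0: 1, 1: 4}))  # class 1 counts four times: 1 - ((3/7)**2 + (4/7)**2)
print(weighted_gini(node_counts, {0: 0, 1: 4}))  # class 0 ignored: impurity is 0.0
```

Nonzero weights keep both classes in play while still shifting the tree toward the more costly class.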
