# IS4487 Week 10 - Practice Code

This notebook is designed to help you follow along with the **Week 10 Lecture and Reading**, which introduce classification.

The practice code demos give you a chance to see working code and can serve as a source for your lab and assignment work. Each section contains short explanations and annotated code that reflect the steps in the reading.

### Topics for this demo:
- Create a classification tree
- Visualize the tree output

<a href="https://colab.research.google.com/github/Stan-Pugsley/is_4487_base/blob/main/Demos/demo_10_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


### Context: Financial Services Marketing
We will use a classic UCI banking dataset. The variables used in this demo are:

| Feature     | Description                                          | Type        |
| ----------- | ---------------------------------------------------- | ----------- |
| `age`       | Age of the client                                    | Numeric     |
| `default`   | Has credit in default? (yes/no)                      | Categorical |
| `balance`   | Average yearly account balance in euros              | Numeric     |
| `housing`   | Has housing loan? (yes/no)                           | Categorical |
| `loan`      | Has personal loan? (yes/no)                          | Categorical |
| `y`         | Target: Subscribed to term deposit? (yes/no)         | Categorical |

Your task is to predict whether a client will subscribe to a term deposit (yes/no) based on attributes collected during a bank marketing campaign. A *term deposit* is money invested for a fixed period, similar to a CD or a bond.

### Classification Tree

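This model evaluates every variable to find the split (a variable plus a cutoff value) that best separates "yes" from "no" for the target variable. It then repeats the search inside each resulting group, so the order of the splits forms the tree from the root down to the leaves.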
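To make "best split" concrete, here is a simplified sketch that scores every candidate cutoff on a single numeric feature using Gini impurity, which is the default `criterion` of scikit-learn's `DecisionTreeClassifier`. It rests on stated assumptions: the `rows` data are made up, only one feature is searched, and the helper names `gini`, `weighted_gini`, and `best_threshold` are hypothetical, not part of scikit-learn.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split: each child's Gini weighted by its share of rows."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(rows):
    """Try midpoints between consecutive distinct values; keep the lowest impurity."""
    values = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [label for x, label in rows if x <= t]
        right = [label for x, label in rows if x > t]
        score = weighted_gini(left, right)
        if best is None or score < best[1]:
            best = (t, score)
    return best

# Hypothetical (balance, subscribed) pairs, not taken from the bank dataset
rows = [(100, 0), (250, 0), (400, 0), (900, 1), (1500, 1), (2000, 0), (3000, 1)]

threshold, score = best_threshold(rows)
print(f"parent Gini = {gini([label for _, label in rows]):.3f}")
print(f"best split: balance <= {threshold} (weighted Gini = {score:.3f})")
```

A real tree runs this kind of search over every feature at every node and keeps the split that lowers impurity the most.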

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Load the dataset; the file uses semicolons (;) as the column separator
url = "https://raw.githubusercontent.com/Stan-Pugsley/is_4487_base/refs/heads/main/DataSets/bank_subscription.csv"
df = pd.read_csv(url, sep=';')
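Passing `sep=';'` matters because `pd.read_csv` splits on commas by default. This small sketch reads a hypothetical two-row string (not the real file) through `io.StringIO` to show the difference:

```python
import io

import pandas as pd

# Hypothetical two-row extract in the same semicolon-delimited layout
raw = "age;default;balance;housing;loan;y\n35;no;1200;yes;no;no\n52;no;-50;no;yes;yes\n"

# Default sep=',' finds no commas, so each whole line lands in a single column
wrong = pd.read_csv(io.StringIO(raw))

# sep=';' splits the fields as intended
right = pd.read_csv(io.StringIO(raw), sep=";")

print(wrong.shape)
print(right.shape)
print(right.dtypes)
```

If a loaded DataFrame unexpectedly has only one column, a wrong separator is a likely cause.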

Prepare Data, Split Dataset

In [None]:
# Columns to use
features = ['age', 'default', 'balance', 'housing', 'loan']
target = 'y'

# Encode 'yes' as 1 and 'no' as 0 in the binary features and the target
# (assumes these columns contain only 'yes'/'no' values)
binary_cols = ['default', 'housing', 'loan', 'y']
df[binary_cols] = (df[binary_cols] == 'yes').astype(int)

# Features and label
X = df[features]
y = df[target]

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
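With `random_state=42` the split is repeatable, but a plain random split does not guarantee that the test set keeps the same share of "yes" labels. If subscribers are rare, passing `stratify=y` to `train_test_split` keeps the class proportions in the training and test sets close to the original. This is an optional variation, not what the cell above does, and the labels below are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 90 "no" (0) and 10 "yes" (1)
X_demo = pd.DataFrame({"age": range(100)})
y_demo = pd.Series([0] * 90 + [1] * 10)

# Plain random split, like the cell above
_, _, _, y_test_plain = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Stratified split: both pieces keep the 10% "yes" share
_, _, y_train_strat, y_test_strat = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("plain test 'yes' share:", y_test_plain.mean())
print("stratified test 'yes' share:", y_test_strat.mean())
print("stratified train 'yes' share:", y_train_strat.mean())
```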

Create Model

In [None]:
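# Train a decision tree; max_depth=4 limits it to four levels of splits so it stays readable and is less prone to overfitting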
clf = DecisionTreeClassifier(max_depth=4, random_state=42)
clf.fit(X_train, y_train)
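After fitting, a scikit-learn tree exposes a few members that help explain what it learned: `get_depth()`, `get_n_leaves()`, `feature_importances_`, and `predict_proba()`. The sketch below demonstrates them on synthetic data that reuses the notebook's feature names. The data and the rule that generates `y_demo` are invented for illustration, so the resulting importances say nothing about the real bank data.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data using the notebook's feature names (not the bank data)
rng = np.random.default_rng(0)
n = 500
X_demo = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "default": rng.integers(0, 2, n),
    "balance": rng.normal(1500, 3000, n).round(),
    "housing": rng.integers(0, 2, n),
    "loan": rng.integers(0, 2, n),
})

# Made-up rule: balance above 2000 and no housing loan -> subscribes
y_demo = ((X_demo["balance"] > 2000) & (X_demo["housing"] == 0)).astype(int)

demo_clf = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_demo, y_demo)

# max_depth is an upper bound; the tree stops early once its leaves are pure
print("depth:", demo_clf.get_depth(), "leaves:", demo_clf.get_n_leaves())

# Each feature's share of the total impurity reduction (the values sum to 1)
importances = pd.Series(demo_clf.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))

# Class probabilities for three rows; columns follow demo_clf.classes_ ([0, 1])
proba = demo_clf.predict_proba(X_demo.head(3))
print(proba)
```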

Create Visualization

In [None]:
# Plot decision tree
plt.figure(figsize=(12, 6))
plot_tree(clf, filled=True, feature_names=X.columns, class_names=["No", "Yes"])
plt.title("Decision Tree: Predicting Term Deposit Subscription")
plt.show()
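When the plotted tree is too crowded to read, `sklearn.tree.export_text` prints the same split rules as indented text. Here is a minimal sketch on a hypothetical six-row dataset with two features; with the notebook's model you would pass `clf` and `feature_names=list(X.columns)` instead.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: columns are [age, balance]; label 1 = subscribed
X_demo = np.array([[25, 100], [40, 300], [35, 2500], [60, 4000], [50, 200], [30, 3500]])
y_demo = np.array([0, 0, 1, 1, 0, 1])

demo_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X_demo, y_demo)

# Text version of the rules that plot_tree draws graphically
rules = export_text(demo_clf, feature_names=["age", "balance"])
print(rules)
```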

Evaluate the Tree

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

# Predict on the held-out test set (this must run before the metrics below)
y_pred = clf.predict(X_test)

# Calculate precision, recall, and F1-score for the positive class (1 = subscribed)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Display the metrics
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-score: {f1:.2f}")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
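Precision, recall, and F1 can all be derived from the four counts in a confusion matrix. This sketch uses hypothetical labels and predictions (not this model's output) to compute the metrics by hand so you can compare them with scikit-learn's functions:

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels and predictions (1 = subscribed), not from the notebook's model
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_hat = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# For 0/1 labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # of the predicted "yes", how many were correct
recall = tp / (tp + fn)     # of the actual "yes", how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"By hand -> Precision: {precision:.2f}  Recall: {recall:.2f}  F1: {f1:.2f}")
print(f"sklearn -> Precision: {precision_score(y_true, y_hat):.2f}  "
      f"Recall: {recall_score(y_true, y_hat):.2f}  F1: {f1_score(y_true, y_hat):.2f}")
```

Accuracy alone can look good when one class dominates the data, which is why precision and recall for the "yes" class are worth checking alongside it.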