# Week 3: Classification - Support Vector Machines (SVM) and Decision Trees (Exercise)

This notebook is a brief introduction to **Support Vector Machines** (SVMs) and **Decision Trees**.

For each method we show two examples. The first trains on the data after PCA has reduced it to two dimensions. The second trains on only the first two features of the original data (sepal length and sepal width).
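
The two ways of getting 2D input can be compared on a different dataset, so the Iris exercise below stays open. This is a minimal sketch that assumes scikit-learn's wine dataset as a stand-in; it is not used anywhere else in this notebook. The wine features are in very different units, so the sketch standardizes them before PCA. The Iris measurements, by contrast, are all in centimetres. The names `X_wine`, `pca_wine`, `X_wine_pca` and `X_wine_two` are illustrative only.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Wine: 178 samples, 13 numeric features, 3 classes (a stand-in dataset for illustration)
X_wine, y_wine = load_wine(return_X_y=True)

# Option 1: standardize, then project all 13 features onto the 2 directions of largest variance
X_wine_scaled = StandardScaler().fit_transform(X_wine)
pca_wine = PCA(n_components=2)
X_wine_pca = pca_wine.fit_transform(X_wine_scaled)

# Option 2: keep only the first two original columns and discard the rest
X_wine_two = X_wine[:, :2]

print("PCA output shape:", X_wine_pca.shape)
print("Two-feature shape:", X_wine_two.shape)
print("Share of variance kept by 2 components:", pca_wine.explained_variance_ratio_.sum())
```

PCA mixes information from every original feature into its two components, while slicing keeps two raw features and ignores the others. `explained_variance_ratio_` reports how much of the (scaled) variance the two components retain.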

In [None]:
# Load data
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris
data = load_iris()
X = data.data # Shape: (150, 4)
y = data.target # Shape: (150,). Note there are three classes: 0, 1, 2
labels = data.feature_names

# PCA
from sklearn.decomposition import PCA

### Implement a PCA that reduces X to 2 dimensions. Store the model in pca and the transformed data in pca_2d.
pca = None
pca_2d = None
###

# Two features
X_two_features = X[:, :2]

## SVMs using Scikit-Learn

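This section shows how to use Support Vector Machines (SVMs) with Scikit-Learn. An SVM finds the hyperplane that separates the classes with the largest margin. With a nonlinear kernel, such as the RBF kernel that `SVC` uses by default, that hyperplane lives in a transformed feature space, so the decision boundary in the original space can be curved. For the three Iris classes, `SVC` combines one-vs-one binary classifiers internally. We will create and train an SVM classifier and visualize its decision boundary.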
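
To make the kernel choice concrete, and to show the evaluation step the plots below leave out, here is a minimal sketch on a synthetic two-moons dataset. The dataset, the 70/30 split, and `C=1.0` are assumptions for illustration. Each SVM is scored on held-out data. `random_state` only affects `SVC` when `probability=True`, but it is set here to mirror the exercise instructions.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-circles: not separable by a straight line
X_moons, y_moons = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X_moons, y_moons, test_size=0.3, random_state=0, stratify=y_moons
)

scores = {}
for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel, C=1.0, random_state=42)
    clf.fit(X_train, y_train)
    # Accuracy on data the model has not seen during training
    scores[kernel] = clf.score(X_test, y_test)
    # n_support_ holds the number of support vectors per class
    print(f"{kernel}: test accuracy = {scores[kernel]:.3f}, "
          f"support vectors = {clf.n_support_.sum()}")
```

On data like this, the linear kernel can only draw a straight boundary, while the RBF kernel can bend around the moons. The held-out accuracy is what tells you whether that extra flexibility actually helps.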

In [None]:
from sklearn.svm import SVC

# Train SVM

### Implement and train your SVM (SVC) on pca_2d with labels y, using random_state=42.
svm = None
###

from sklearn.inspection import DecisionBoundaryDisplay
fig, ax = plt.subplots(figsize=(10, 10))
display = DecisionBoundaryDisplay.from_estimator(
    svm,
    pca_2d,
    response_method="predict",
    ax=ax,
    alpha=0.8,
    cmap=plt.cm.coolwarm,
    xlabel="PCA1",
    ylabel="PCA2"
)
ax.scatter(pca_2d[:, 0], pca_2d[:, 1], c=y, s=75, cmap=plt.cm.coolwarm, edgecolor="k")
plt.show()


In [None]:
from sklearn.svm import SVC

# Train SVM

### Implement and train your SVM (SVC) on X_two_features with labels y, using random_state=42.
svm2f = None
###

from sklearn.inspection import DecisionBoundaryDisplay
fig, ax = plt.subplots(figsize=(10, 10))
display = DecisionBoundaryDisplay.from_estimator(
    svm2f,
    X_two_features,
    response_method="predict",
    ax=ax,
    alpha=0.8,
    cmap=plt.cm.coolwarm,
    xlabel=labels[0],
    ylabel=labels[1]
)
ax.scatter(X_two_features[:, 0], X_two_features[:, 1], c=y, s=75, cmap=plt.cm.coolwarm, edgecolor="k")
plt.show()


## Decision Trees using Scikit-Learn

This section shows how to build and visualize a decision tree with Scikit-Learn's `DecisionTreeClassifier`.
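
Besides `plot_tree`, scikit-learn can print a fitted tree as indented text with `sklearn.tree.export_text`, which is handy when a plotted tree gets too large to read. This is a minimal sketch. It assumes a depth-2 tree trained on all four Iris features, which differs from the two-feature setups in the exercise. The name `iris_tree` is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree on all four features keeps the printed rules short
iris_tree = DecisionTreeClassifier(max_depth=2, random_state=42)
iris_tree.fit(iris.data, iris.target)

# Each line is one test ("feature <= threshold") or a leaf ("class: ...")
report = export_text(iris_tree, feature_names=list(iris.feature_names))
print(report)
```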

`DecisionTreeClassifier` recursively partitions the dataset into subsets based on the values of its input features. At each node it chooses the feature and threshold whose split most reduces the impurity of the resulting subsets (Gini impurity by default). Splitting continues until a stopping criterion is met, such as reaching a maximum depth or producing pure nodes that contain only one class. To make a prediction, a sample follows the branches from the root according to its feature values until it reaches a leaf, and the leaf's majority class is the prediction.
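
The split rule can be written out by hand. This is a minimal sketch, not scikit-learn's implementation. It assumes Gini impurity, $G = 1 - \sum_k p_k^2$, and exhaustively tries midpoints between consecutive distinct values of every feature to find the split with the lowest size-weighted child impurity. For a fixed parent node, that is the same as the largest impurity decrease. The helper names `gini` and `best_split` are hypothetical. On Iris, petal length and petal width separate setosa equally well. Because scikit-learn randomly permutes features at each split, it may report either one at the root.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split(X, y):
    """Return (feature, threshold, weighted child impurity) of the best single split."""
    n = len(y)
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            weighted = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if weighted < best[2]:
                best = (j, t, weighted)
    return best


X_iris, y_iris = load_iris(return_X_y=True)
feature, threshold, impurity = best_split(X_iris, y_iris)
print(f"hand-computed root: feature {feature} <= {threshold:.3f}, weighted Gini {impurity:.3f}")

# Compare with the root node chosen by scikit-learn
root_tree = DecisionTreeClassifier(random_state=42).fit(X_iris, y_iris)
print("scikit-learn root:", root_tree.tree_.feature[0], root_tree.tree_.threshold[0])
```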

In [None]:
# Decision Tree in sklearn
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.inspection import DecisionBoundaryDisplay

# Train Decision Tree

### Implement and train your decision tree on pca_2d with labels y, using random_state=42.
dt = None
###

fig, ax = plt.subplots(figsize=(10, 10))
display = DecisionBoundaryDisplay.from_estimator(
    dt,
    pca_2d,
    response_method="predict",
    ax=ax,
    alpha=0.8,
    cmap=plt.cm.coolwarm,
    xlabel="PCA1",
    ylabel="PCA2"
)
ax.scatter(pca_2d[:, 0], pca_2d[:, 1], c=y, s=75, cmap=plt.cm.coolwarm, edgecolor="k")
plt.show()

# Plot tree
fig, ax = plt.subplots(figsize=(10, 10))
plot_tree(dt, ax=ax, filled=True, rounded=True, fontsize=10, feature_names=["PCA1", "PCA2"])
plt.show()


In [None]:
# Decision Tree in sklearn
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.inspection import DecisionBoundaryDisplay

# Train Decision Tree

### Implement and train your decision tree on X_two_features with labels y, using random_state=42.
dt2f = None
###

fig, ax = plt.subplots(figsize=(10, 10))
display = DecisionBoundaryDisplay.from_estimator(
    dt2f,
    X_two_features,
    response_method="predict",
    ax=ax,
    alpha=0.8,
    cmap=plt.cm.coolwarm,
    xlabel=labels[0],
    ylabel=labels[1]
)
ax.scatter(X_two_features[:, 0], X_two_features[:, 1], c=y, s=75, cmap=plt.cm.coolwarm, edgecolor="k")
plt.show()

# Plot tree
# NOTE - this will produce a big image with very small font. You can zoom in to see the tree.
fig, ax = plt.subplots(figsize=(50, 50))
plot_tree(dt2f, ax=ax, filled=True, rounded=True, fontsize=10, feature_names=[labels[0], labels[1]])
plt.show()
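
When the full tree on the two sepal features is too large to read comfortably, one common remedy is to impose a stopping criterion such as `max_depth`. This is a minimal sketch under assumed settings (depth 3, 5-fold cross-validation) and is not part of the exercise. It compares an unconstrained tree with a depth-limited one in terms of size and cross-validated accuracy. The names `X_sepal` and `results` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = load_iris(return_X_y=True)
X_sepal = X_iris[:, :2]  # sepal length and sepal width, the same columns as X_two_features

results = {}
for depth in [None, 3]:  # None lets the tree grow until leaves are pure or cannot be split
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    fitted = model.fit(X_sepal, y_iris)
    # cross_val_score clones the estimator, so each fold trains a fresh tree
    cv_acc = cross_val_score(model, X_sepal, y_iris, cv=5).mean()
    results[depth] = (fitted.get_depth(), fitted.get_n_leaves(), cv_acc)
    print(f"max_depth={depth}: depth={fitted.get_depth()}, "
          f"leaves={fitted.get_n_leaves()}, 5-fold CV accuracy={cv_acc:.3f}")
```

A shallower tree is far easier to plot and read. Comparing the cross-validated accuracies shows whether the extra leaves of the full tree generalize or mostly fit noise in the overlapping sepal measurements.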
