# Classifiers
In this lab session, we use the scikit-learn library to apply the following classification algorithms:
1. Support Vector Machines (SVM)
2. K-Nearest Neighbors (KNN)
3. Naive Bayes
4. Decision Trees

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, svm, tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.inspection import DecisionBoundaryDisplay

We load the Iris dataset from scikit-learn. It describes iris plants from three classes: Setosa, Versicolor, and Virginica. Each sample has four features: sepal length, sepal width, petal length, and petal width, all in centimeters.

In [None]:
iris = datasets.load_iris()
X = iris.data
C = iris.target
print(f'Iris types : {iris.target_names}')
print(f'Features : {iris.feature_names}')
print("Number of datapoints:", len(C))

### Question 1
Make a plot to visualize the data. To do so, choose two of the four features and plot one against the other, coloring each point by its class.

In [None]:
feature1 = 0
feature2 = 1

plt.figure()

# ...

plt.xlabel(iris.feature_names[feature1])
plt.ylabel(iris.feature_names[feature2])

plt.show()
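One possible way to fill in the plot is sketched below. It is an illustration rather than the expected answer: it assumes one `scatter` call per class so that each class gets its own legend entry, and it reloads the data so that the cell runs on its own.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X, C = iris.data, iris.target

feature1, feature2 = 0, 1  # sepal length, sepal width

fig, ax = plt.subplots()
# One scatter call per class, so each class appears in the legend
for label, name in enumerate(iris.target_names):
    mask = C == label
    ax.scatter(X[mask, feature1], X[mask, feature2], label=name, edgecolors='k')

ax.set_xlabel(iris.feature_names[feature1])
ax.set_ylabel(iris.feature_names[feature2])
ax.legend()
plt.show()
```

Changing `feature1` and `feature2` (for example to 2 and 3, the petal measurements) shows how well different feature pairs separate the classes.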

### Test/Train Split

### Question 2
Perform a train/test split that holds out 25 data samples for testing. To do so, use the function `np.random.permutation()` to shuffle the sample indices.

In [None]:
# Xtrain =
# Ctrain =
# Xtest = 
# Ctest = 
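A minimal sketch of such a split is given here for reference. The fixed seed and the helper names `n_test`, `train_idx`, and `test_idx` are assumptions made for illustration; the lab does not prescribe them.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, C = iris.data, iris.target

np.random.seed(0)  # assumption: fixed seed, only to make the split reproducible
n_test = 25

# Shuffle the indices 0..len(C)-1, then cut the shuffled list in two
indices = np.random.permutation(len(C))
test_idx, train_idx = indices[:n_test], indices[n_test:]

Xtrain, Ctrain = X[train_idx], C[train_idx]
Xtest, Ctest = X[test_idx], C[test_idx]

print(Xtrain.shape, Xtest.shape)  # (125, 4) (25, 4)
```

Shuffling matters here because the samples in `iris.data` are ordered by class, so taking the last 25 rows without shuffling would put a single class in the test set.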

### Support Vector Machines (SVM)
The function below plots the decision regions predicted by a fitted model, together with the data points, in the 2D plane (sepal length, sepal width). It uses only the first two columns of `X`, so the model must have been fitted on those two features.

In [None]:
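def plot_decision_boundaries(model, X, C):
    """Plot the decision regions of a model fitted on the first two features of X, with the points colored by C."""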
    _, ax = plt.subplots()
    DecisionBoundaryDisplay.from_estimator(
        model,
        X[:, :2],
        ax=ax,
        cmap='cool',
        response_method="predict",
        plot_method="pcolormesh",
        xlabel=iris.feature_names[0],
        ylabel=iris.feature_names[1],
        shading="auto",
    )

    # Plot also the training points
    plt.scatter(
        X[:, 0],
        X[:, 1],
        c=C,
        cmap='cool',
        edgecolors='k'
    )

### Question 3
Use scikit-learn to fit an SVM model on the first two features only, and plot the decision boundaries. <br>
Then change the kernel (for example to `'rbf'` or `'poly'`) to see how it affects the decision boundary.

In [None]:
# Example for a linear kernel
svc1 = svm.SVC(kernel='linear')
svc1.fit(Xtrain[:, :2], Ctrain)
plot_decision_boundaries(svc1, Xtrain, Ctrain)
plt.title("3-Class classification SVM linear")

### Question 4
Use the trained model to predict the labels of the test set, and evaluate the predictions, for example by computing the accuracy.
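As an illustration, the sketch below evaluates a linear SVM with two common measures: the accuracy and the confusion matrix. The choice of these two metrics, the fixed seed, and the variable name `Cpred` are assumptions; the cell repeats the loading and splitting steps so that it runs on its own.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score, confusion_matrix

iris = datasets.load_iris()
X, C = iris.data, iris.target

# Hold out 25 shuffled samples for testing (seed fixed only for reproducibility)
np.random.seed(0)
idx = np.random.permutation(len(C))
test_idx, train_idx = idx[:25], idx[25:]

# Fit on the first two features of the training set
svc = svm.SVC(kernel='linear')
svc.fit(X[train_idx, :2], C[train_idx])

# Predict on the held-out samples, using the same two features
Ctest = C[test_idx]
Cpred = svc.predict(X[test_idx, :2])

# Fraction of correct predictions; equivalent to np.mean(Cpred == Ctest)
accuracy = accuracy_score(Ctest, Cpred)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(Ctest, Cpred, labels=[0, 1, 2])

print(f"Test accuracy: {accuracy:.2f}")
print(cm)
```

The confusion matrix shows which classes are mistaken for one another, which a single accuracy number hides.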

### K-Nearest Neighbors

### Question 5
We now use the K-nearest neighbors (KNN) model. Fit it on the first two features of the data and plot the decision boundaries again.
This time, change the `n_neighbors` parameter to see how it affects the decision boundary.

In [None]:
# Create and fit a nearest-neighbor classifier
n_neighbors = 3
knn = KNeighborsClassifier(n_neighbors, weights='uniform')

# ...

### Naive Bayes Classifier

### Question 6
Repeat the previous question with a Gaussian naive Bayes model: fit it on the first two features and plot the decision boundaries.

In [None]:
model = GaussianNB()

# ...
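Besides the decision boundaries, it can help to look at what the model has learned. Gaussian naive Bayes fits one Gaussian per class and per feature; the sketch below prints the fitted class means and the class probabilities for a few samples. Fitting on the full dataset and inspecting the first three samples are assumptions made for brevity.

```python
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X2, C = iris.data[:, :2], iris.target  # first two features only

model = GaussianNB()
model.fit(X2, C)

# Per-class mean of each feature, shape (n_classes, n_features)
print(model.theta_)

# Posterior probability of each class for the first three samples
proba = model.predict_proba(X2[:3])
print(proba)
```

Each row of `proba` sums to 1, and `model.predict` returns the class with the highest probability in that row.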

### Decision Trees

### Question 7
Repeat the exercise with a decision tree model, varying the parameter `max_depth`.
Use all the features this time, and plot the resulting decision tree.

In [None]:
dectree = tree.DecisionTreeClassifier(max_depth=2)
dectree.fit(Xtrain, Ctrain)
tree.plot_tree(dectree);

# ...
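To see the effect of `max_depth` numerically, one option is to compare the training and test accuracy for several depths, as sketched below. The depth values, the fixed seeds, and the use of `tree.export_text` as a text alternative to `tree.plot_tree` are assumptions made for illustration.

```python
import numpy as np
from sklearn import datasets, tree

iris = datasets.load_iris()
X, C = iris.data, iris.target

# Hold out 25 shuffled samples for testing (seed fixed only for reproducibility)
np.random.seed(0)
idx = np.random.permutation(len(C))
test_idx, train_idx = idx[:25], idx[25:]

# max_depth=None lets the tree grow until all leaves are pure
for depth in (1, 2, 3, None):
    dectree = tree.DecisionTreeClassifier(max_depth=depth, random_state=0)
    dectree.fit(X[train_idx], C[train_idx])
    train_acc = dectree.score(X[train_idx], C[train_idx])
    test_acc = dectree.score(X[test_idx], C[test_idx])
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")

# Text rendering of a shallow tree, with readable feature names
shallow = tree.DecisionTreeClassifier(max_depth=2, random_state=0)
shallow.fit(X[train_idx], C[train_idx])
rules = tree.export_text(shallow, feature_names=iris.feature_names)
print(rules)
```

A gap between training and test accuracy that widens as the depth grows would suggest that the deeper trees are overfitting the training set.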