# Learning models & methods

In this notebook, you'll get familiar with the notions of classification and regression.

---

## Task 1: Classification & Regression

### What is Classification?
Classification involves predicting a discrete category or class label. For example:
- Predicting whether an email is **spam** or **not spam**.
- Predicting the species of a flower.

### What is Regression?
Regression involves predicting a continuous value. For example:
- Predicting house prices.
- Estimating the temperature for tomorrow.

In this section, you will:
1. Implement two classification methods: Decision Trees and k-Nearest Neighbors (k-NN) using the Iris dataset.
2. Implement one regression method: Linear Regression using the Diabetes dataset.

For each model:
- Split the dataset into training and testing sets.
- Train the model on the training set.
- Evaluate the model on the testing set using appropriate metrics.
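The three steps above can be sketched as follows. This is only an illustration: the synthetic data from `make_classification`, the `LogisticRegression` model, and the `_demo` variable names are assumptions chosen so that the sketch does not give away the exercises.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-class data (an assumption, not one of the exercise datasets)
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)

# 1. Split: hold out 25% of the samples for testing
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# 2. Train on the training set only
model_demo = LogisticRegression()
model_demo.fit(X_train_demo, y_train_demo)

# 3. Evaluate on the held-out test set
y_pred_demo = model_demo.predict(X_test_demo)
acc_demo = accuracy_score(y_test_demo, y_pred_demo)
print("Test accuracy:", acc_demo)
```

Fixing `random_state` makes the split reproducible, which helps when comparing models.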

In [None]:
!pip install numpy
!pip install matplotlib
!pip install scikit-learn

In [2]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

---

### Classification Tasks
#### Exercise 1: Decision Tree Classifier
Load the Iris dataset and keep only its first two features (sepal length and sepal width), then split it into training and testing sets.
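As a hint, column selection and splitting might look like the sketch below. It uses a toy array rather than Iris; the array values, the `_toy` names, and the choice of `test_size=0.25` with `stratify` are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 8 samples, 4 features, 2 balanced classes (hypothetical values)
X_toy = np.arange(32, dtype=float).reshape(8, 4)
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Keep every row but only the first two columns
X_two = X_toy[:, :2]
print(X_two.shape)  # (8, 2)

# stratify=y_toy keeps the class proportions the same in both subsets
X_tr_toy, X_te_toy, y_tr_toy, y_te_toy = train_test_split(
    X_two, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)
print(X_tr_toy.shape, X_te_toy.shape)  # (6, 2) (2, 2)
```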

In [3]:
# Load the Iris dataset

# Split the Iris dataset


Create a Decision Tree Classifier with `sklearn.tree.DecisionTreeClassifier()` and train it using the `fit()` function. You can try different values for the maximum depth.
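To see how the maximum depth can affect a tree, here is a minimal sketch on a different dataset. Using `load_wine`, a 30% test split, and the depth values below are assumptions for illustration, not part of the exercise.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Wine dataset used here only as a stand-in for Iris
X_wine, y_wine = load_wine(return_X_y=True)
X_tr_w, X_te_w, y_tr_w, y_te_w = train_test_split(
    X_wine, y_wine, test_size=0.3, random_state=0
)

tree_results = {}
for depth in (1, 3, None):  # None lets the tree grow until the leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr_w, y_tr_w)
    tree_results[depth] = (
        tree.get_depth(),           # depth actually reached
        tree.score(X_tr_w, y_tr_w),  # training accuracy
        tree.score(X_te_w, y_te_w),  # test accuracy
    )
    print(depth, tree_results[depth])
```

Comparing training and test accuracy across depths is one way to spot overfitting: a deep tree can fit the training set very closely without improving on the test set.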

In [None]:
from sklearn.tree import DecisionTreeClassifier

# Train a Decision Tree Classifier


Make predictions using the `predict()` function and compare the results with the target values.

In [None]:
# Make predictions and compare


Here is a function to visualize decision boundaries. Try to understand what it does, then use it to plot the boundaries of your decision tree model.

You can also use the `sklearn.tree.plot_tree` function to visualize the tree structure and understand what each node is doing.
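The core of `plot_decision_boundary()` is the grid construction: it builds a mesh of points, predicts a class for each one, and reshapes the predictions back into the mesh. The sketch below reproduces that logic on a tiny grid; the threshold rule standing in for `model.predict` is a hypothetical placeholder.

```python
import numpy as np

# A 3 x 2 grid: x takes the values 0, 1, 2 and y takes the values 0, 1
xx_small, yy_small = np.meshgrid(np.arange(0, 3, 1), np.arange(0, 2, 1))
print(xx_small.shape)  # (2, 3): one row per y value, one column per x value

# Flatten both coordinate arrays and stack them as columns: one row per grid point
grid_points = np.c_[xx_small.ravel(), yy_small.ravel()]
print(grid_points.shape)  # (6, 2)

# Stand-in for model.predict (hypothetical rule): class 1 when x > 1, else class 0
fake_predictions = (grid_points[:, 0] > 1).astype(int)

# Reshape to the mesh so that contourf can color each grid cell
Z_small = fake_predictions.reshape(xx_small.shape)
print(Z_small)
# [[0 0 1]
#  [0 0 1]]
```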

In [None]:
from sklearn.tree import plot_tree

def plot_decision_boundary(model, X, y, title):
    h = 0.02  # Step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)
    plt.xlabel('Sepal Length (cm)')
    plt.ylabel('Sepal Width (cm)')
    plt.title(title)
    plt.show()

# Visualize the boundaries and the tree architecture
    

#### Exercise 2: k-Nearest Neighbors Classifier
Replicate the steps above with a k-NN classifier from `sklearn.neighbors.KNeighborsClassifier`. You can try different numbers of neighbors.
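As a rough illustration of how the number of neighbors can be varied, here is a sketch on synthetic data. The `make_blobs` dataset, its `cluster_std`, and the values of `k` are assumptions chosen for the example.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Three overlapping 2D clusters (synthetic stand-in for the Iris features)
X_blobs, y_blobs = make_blobs(
    n_samples=300, centers=3, cluster_std=2.0, random_state=0
)
X_tr_b, X_te_b, y_tr_b, y_te_b = train_test_split(
    X_blobs, y_blobs, test_size=0.3, random_state=0
)

knn_scores = {}
for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_tr_b, y_tr_b)
    knn_scores[k] = knn.score(X_te_b, y_te_b)  # accuracy on the test set
    print(f"k={k}: test accuracy = {knn_scores[k]:.3f}")
```

Small values of `k` tend to give jagged boundaries that follow individual points, while larger values smooth them out.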

In [None]:
from sklearn.neighbors import KNeighborsClassifier

# Train the Classifier

# Make predictions and compare


Use once again the `plot_decision_boundary()` function to plot the boundaries of your KNN model.

In [None]:
# Visualize the boundaries


### Regression Task

#### Exercise 3: Linear regression
Load the Diabetes dataset from sklearn and split it into training and testing sets. Use only the first feature (check the dataset's feature names to see which one it is).
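One detail to watch when keeping a single feature: scikit-learn estimators expect `X` to be two-dimensional, with shape `(n_samples, n_features)`. The sketch below shows two ways of selecting a column on a toy array (the array itself is hypothetical).

```python
import numpy as np

# Toy feature matrix: 4 samples, 3 features
X_small = np.arange(12, dtype=float).reshape(4, 3)

first_1d = X_small[:, 0]    # plain index: drops the feature axis
first_2d = X_small[:, [0]]  # list index: keeps a single-column 2D array

print(first_1d.shape)  # (4,)
print(first_2d.shape)  # (4, 1)
```

The second form can be passed directly to `fit()`; the first would need a `reshape(-1, 1)` beforehand.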

In [None]:
# Load the Diabetes dataset

# Split the Diabetes dataset


Create a Linear Regression Model with `sklearn.linear_model.LinearRegression()` and train it using the `fit()` function.
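To get a feel for what `fit()` learns, here is a minimal sketch on synthetic data generated from a known line. The slope of 3, intercept of 2, and noise level are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# One feature, 100 samples; the target follows y = 3x + 2 plus Gaussian noise
x_syn = rng.uniform(0, 10, size=(100, 1))
y_syn = 3.0 * x_syn[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)

reg = LinearRegression()
reg.fit(x_syn, y_syn)

# The fitted parameters should land close to the true slope and intercept
print("slope:", reg.coef_[0])
print("intercept:", reg.intercept_)
```

With a single feature, the model is a straight line, so plotting `reg.predict(x_syn)` against `x_syn` over a scatter of the data is a quick visual check.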

In [None]:
from sklearn.linear_model import LinearRegression

# Train a Linear Regression model


Make predictions using the `predict()` function and compare the results with the target values.

In [None]:
# Make predictions and compare


## Task 2: Understanding Metrics

### Classification Metrics
- **Accuracy**: The proportion of correct predictions out of all predictions.
- **Precision**: The proportion of true positives among the predicted positives.
- **Recall**: The proportion of true positives among the actual positives.
- **F1-Score**: The harmonic mean of precision and recall, useful for imbalanced datasets.
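These definitions can be checked by hand on a small binary example. The labels below are made up for illustration; the sketch counts true positives, false positives, and false negatives, then compares the results with the scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 4 actual positives, 6 actual negatives
y_true_bin = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred_bin = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

tp = np.sum((y_true_bin == 1) & (y_pred_bin == 1))  # 3
fp = np.sum((y_true_bin == 0) & (y_pred_bin == 1))  # 2
fn = np.sum((y_true_bin == 1) & (y_pred_bin == 0))  # 1

accuracy = np.mean(y_true_bin == y_pred_bin)        # 7 / 10 = 0.7
precision = tp / (tp + fp)                          # 3 / 5 = 0.6
recall = tp / (tp + fn)                             # 3 / 4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # 2 / 3

# The library functions should agree with the manual computation
print(accuracy, accuracy_score(y_true_bin, y_pred_bin))
print(precision, precision_score(y_true_bin, y_pred_bin))
print(recall, recall_score(y_true_bin, y_pred_bin))
print(f1, f1_score(y_true_bin, y_pred_bin))
```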

### Regression Metrics
- **Mean Squared Error (MSE)**: The average squared difference between predicted and actual values. Penalizes larger errors more.
- **Mean Absolute Error (MAE)**: The average absolute difference between predicted and actual values.
- **R² Score (Coefficient of Determination)**: Indicates how well the model explains the variance in the target. Values closer to 1 mean better fit.
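The same kind of check works for the regression metrics. The four target values and predictions below are made up for illustration; the formulas are written out with NumPy and compared with scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical targets and predictions
y_true_reg = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_reg = np.array([2.0, 5.0, 8.0, 12.0])

errors = y_pred_reg - y_true_reg  # [-1, 0, 1, 3]

mse = np.mean(errors ** 2)        # (1 + 0 + 1 + 9) / 4 = 2.75
mae = np.mean(np.abs(errors))     # (1 + 0 + 1 + 3) / 4 = 1.25

# R² = 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum(errors ** 2)                             # 11
ss_tot = np.sum((y_true_reg - y_true_reg.mean()) ** 2)   # 20
r2 = 1 - ss_res / ss_tot                                 # 0.45

print(mse, mean_squared_error(y_true_reg, y_pred_reg))
print(mae, mean_absolute_error(y_true_reg, y_pred_reg))
print(r2, r2_score(y_true_reg, y_pred_reg))
```

The single error of 3 accounts for 9 of the 11 units of squared error, which illustrates how MSE weights large errors more heavily than MAE.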

---

### Exercise 4: Classification metrics

Calculate and interpret the following metrics: Accuracy, Precision, Recall, and F1-Score using the Iris dataset with a k-NN model.
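Since Iris has three classes, precision, recall, and F1 need an averaging strategy: the default `average='binary'` only applies to two-class targets. The sketch below uses made-up three-class labels to show per-class and macro-averaged precision; the same `average` parameter exists on `recall_score` and `f1_score`.

```python
import numpy as np
from sklearn.metrics import precision_score

# Hypothetical three-class labels
y_true_mc = np.array([0, 0, 1, 1, 2, 2])
y_pred_mc = np.array([0, 1, 1, 1, 2, 0])

# With multiclass targets, the default average='binary' raises a ValueError
try:
    precision_score(y_true_mc, y_pred_mc)
except ValueError as err:
    print("Default averaging fails:", err)

# average=None returns one precision value per class
per_class = precision_score(y_true_mc, y_pred_mc, average=None)
print(per_class)

# average='macro' is the unweighted mean of the per-class values
macro = precision_score(y_true_mc, y_pred_mc, average="macro")
print(macro)
```

Other options include `average='micro'` (pooled over all samples) and `average='weighted'` (per-class values weighted by class frequency).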


In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

# Load the Iris dataset

# Split the Iris dataset

# Train the Classifier

# Make predictions

# Calculate and print Accuracy, Precision, Recall and F1-Score

# Calculate all of these using Classification Report


### Exercise 5: Regression metrics

Calculate and interpret MSE, MAE, and R² Score using the Diabetes dataset with Linear Regression.

In [None]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Load the Diabetes dataset

# Split the Diabetes dataset

# Train a Linear Regression model

# Make predictions

# Calculate the Mean Squared Error (MSE)

# Calculate the Mean Absolute Error (MAE)

# Calculate the R^2 Score
