
# Machine Learning with Python

Welcome to the **Machine Learning** course! This course is designed to give you hands-on experience with the foundational concepts and advanced techniques in machine learning. You will explore:

1. **Supervised Learning**
    - Regression algorithms
    - Classification algorithms
2. **Unsupervised Learning**
    - Clustering algorithms
    - Dimensionality reduction
3. **Fairness and Interpretability**
    - Interpretable methods
    - Bias evaluation
    
Throughout the course, you'll engage in projects to solidify your understanding and gain practical skills in implementing machine learning algorithms.  

Instructor: Dr. Adrien Dorise  
Contact: adrien.dorise@hotmail.com  

---


## Part 1.1: Supervised learning - Regression on a synthetic dataset
In this project, you will compare multiple regression models on a synthetic dataset. The tasks include:

1. **Import and Understand a Dataset**: Learn how to load, preprocess, and explore a dataset to prepare it for training.
2. **Train a regression model**: Select and train a regression model using scikit-learn.
3. **Evaluate and plot the model performance**: Select a criterion for evaluating the model, and plot the results.
4. **Compare multiple regression models and get the best performance**: Compare multiple models, and find the one that best fits the data.

By the end of this project, you'll have a solid understanding of the different regression methods.

---

## Dataset

This exercise will start by importing a simple 2D dataset. The dataset can be found in this repo under:  
 `part1_supervised_learning/1_regression/regression_dataset.arff`  

The code snippet below allows you to load the dataset.

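In [3]:
# This cell expects the liac-arff package (pip install liac-arff), whose arff.load returns a dict.
# If a different package named "arff" is installed instead, arff.load may return a generator,
# and the DataFrame conversion fails with: TypeError: 'generator' object is not subscriptable
import arff
import pandas as pd


# Load dataset from ARFF
with open('regression_dataset.arff', 'r') as f:
    dataset = arff.load(f)

# Convert to DataFrame, taking the column names from the ARFF attribute list
df_loaded = pd.DataFrame(dataset['data'], columns=[attr[0] for attr in dataset['attributes']])
df_loaded = df_loaded.astype(float)  # convert object to float if needed


# Feature matrix of shape (n_samples, 1) and target vector of shape (n_samples,)
X = df_loaded[['x']].values
y = df_loaded['y'].values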

## Data visualisation

**Your job**:
- Print the first five samples of the dataset.
- Plot the dataset using matplotlib's `plt.scatter` method.
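
As a starting point, the two steps could look like the sketch below. It runs on a hypothetical stand-in DataFrame (`df_demo`, a noisy quadratic invented for illustration) rather than the course dataset; the column names `x` and `y` are assumed to match the loading cell.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data; in the exercise, use df_loaded from the loading cell.
rng = np.random.default_rng(0)
x_demo = np.linspace(-3, 3, 50)
df_demo = pd.DataFrame({"x": x_demo, "y": x_demo**2 + rng.normal(0, 0.5, 50)})

# Print the first five samples
print(df_demo.head())

# Scatter plot of the raw samples
plt.figure()
plt.scatter(df_demo["x"], df_demo["y"], s=15, label="samples")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Stand-in dataset (illustration only)")
plt.legend()
plt.show()
```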


In [None]:
import matplotlib.pyplot as plt



## Data preparation

**Your job**:
- Normalise the dataset to the range [0, 1] using **MinMaxScaler** (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html).
    - Normalisation scales all features to a common range, so that features with larger values do not dominate the model. It also helps gradient-based algorithms converge faster and often improves model performance.
- Plot the normalised dataset using matplotlib.
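
To make the scaling concrete, here is a minimal sketch on four hand-picked values (`X_demo` is a hypothetical stand-in for `X`). With the default settings, each value is mapped to `(x - min) / (max - min)`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature column with an arbitrary range (stand-in for X).
X_demo = np.array([[-4.0], [0.0], [2.0], [6.0]])

# Fit the scaler (learns min and max) and rescale to [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X_demo)
print(X_scaled.ravel())

# inverse_transform maps scaled values back to the original units
X_restored = scaler.inverse_transform(X_scaled)
print(X_restored.ravel())
```

The same approach can be applied to the target `y` after reshaping it to a column with `y.reshape(-1, 1)`, since the scaler expects 2D input.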

In [None]:
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt





## Training regression models

**Your job**:
- Using the sklearn library (https://scikit-learn.org/stable/supervised_learning.html):
    - Train a linear model (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html).
    - Train a polynomial model by combining **PolynomialFeatures** (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html) with a linear regression.
- Print the prediction on the first 10 samples for the two models alongside the target value.
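
One possible way to set up both models is sketched below. The data (`X_demo`, `y_demo`) is a hypothetical noisy quadratic, and `degree=2` is an arbitrary choice for illustration, not a recommended setting for the course dataset. The polynomial model is built as a pipeline: `PolynomialFeatures` expands the input, then `LinearRegression` fits the expanded features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in data: a noisy quadratic on [0, 1].
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 1, 100).reshape(-1, 1)
y_demo = 3 * X_demo.ravel() ** 2 - X_demo.ravel() + rng.normal(0, 0.05, 100)

# Plain linear model
linear_model = LinearRegression().fit(X_demo, y_demo)

# Polynomial model: feature expansion followed by a linear fit (degree is an assumption)
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X_demo, y_demo)

# Predictions on the first 10 samples, next to the target values
pred_lin = linear_model.predict(X_demo[:10])
pred_poly = poly_model.predict(X_demo[:10])
for target, p_lin, p_poly in zip(y_demo[:10], pred_lin, pred_poly):
    print(f"target={target:.3f}  linear={p_lin:.3f}  poly={p_poly:.3f}")
```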

In [1]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures



## Evaluating regression models

**Your job**:
- Using sklearn metrics (https://scikit-learn.org/stable/api/sklearn.metrics.html):
    - Select at least two metrics.
    - Evaluate your trained models on these metrics.
    - Compare the models and draw a conclusion about their performance.
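
The small example below shows how three common regression metrics are computed. The arrays `y_true` and `y_pred` are made-up values chosen so the results are easy to check by hand; `r2_score` is included as one possible third metric.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions (illustration only).
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 5.0])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors, penalises large errors
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors, in the units of y
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot, 1.0 is a perfect fit

print(f"MSE = {mse:.3f}, MAE = {mae:.3f}, R2 = {r2:.3f}")
# → MSE = 0.375, MAE = 0.500, R2 = 0.700
```

Lower is better for MSE and MAE, while higher is better for R2, so check the direction of each metric before ranking models.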


In [None]:
from sklearn.metrics import mean_squared_error, mean_absolute_error


## Improving regression models

**Your job**:
- Modify the regression models and parameters.
    - Train at least **three** more models.
- Apply the training and evaluation pipeline created in the previous sections.
- Try to achieve the best performance with the lightest model.
- Display your different architectures.
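
A comparison loop along these lines is one way to organise the experiments. Everything in it is an assumption made for illustration: the stand-in sine data, the held-out split with `train_test_split`, the candidate models, and their hyperparameters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in data: one period of a noisy sine wave.
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 1, 200).reshape(-1, 1)
y_demo = np.sin(2 * np.pi * X_demo.ravel()) + rng.normal(0, 0.1, 200)

# Hold out a test set so models are compared on unseen samples (an assumption of this sketch)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Candidate models; names and hyperparameters are arbitrary examples
candidates = {
    "linear": LinearRegression(),
    "poly3": make_pipeline(PolynomialFeatures(degree=3), LinearRegression()),
    "poly3_ridge": make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=1e-3)),
    "tree_depth4": DecisionTreeRegressor(max_depth=4, random_state=0),
}

# Fit each candidate and record its test error
results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    results[name] = mean_squared_error(y_test, model.predict(X_test))
    print(model)  # displays the model configuration

# Report from best to worst
for name, mse in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:12s} test MSE = {mse:.4f}")
```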

## Plot regression models

**Your job**:
- On the same plot:
    - Plot the dataset using plt.scatter.
    - Plot at least three regression lines using plt.plot.
- *Don't forget to add a legend, axis names and title.*
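
A sketch of such a figure is given below, assuming stand-in sine data and three polynomial degrees picked arbitrarily. Predictions are made on a sorted grid (`X_grid`) so that `plt.plot` draws each curve from left to right.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in data: one period of a noisy sine wave.
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 1, 80).reshape(-1, 1)
y_demo = np.sin(2 * np.pi * X_demo.ravel()) + rng.normal(0, 0.1, 80)

# Evenly spaced, sorted inputs for drawing smooth regression curves
X_grid = np.linspace(0, 1, 200).reshape(-1, 1)

plt.figure()
plt.scatter(X_demo.ravel(), y_demo, s=12, color="grey", label="data")

# Fit and draw one curve per polynomial degree (degrees are arbitrary examples)
curves = {}
for degree in (1, 3, 5):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_demo, y_demo)
    curves[degree] = model.predict(X_grid)
    plt.plot(X_grid.ravel(), curves[degree], label=f"polynomial, degree {degree}")

plt.xlabel("x")
plt.ylabel("y")
plt.title("Regression models on stand-in data")
plt.legend()
plt.show()
```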