# Introduction to scikit-learn

scikit-learn is a popular Python library for machine learning, providing tools for classification, regression, clustering, dimensionality reduction, and more. In this notebook, you will learn to load datasets, perform supervised and unsupervised learning, and explore self-supervised learning.

---

## Part 1: Loading and Visualizing Datasets

In this part, you'll learn how to load and visualize datasets using scikit-learn.

In [None]:
!pip install numpy
!pip install matplotlib
!pip install scikit-learn




In [None]:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
import numpy as np

### Exercise 1: Load the Iris dataset

- Load the Iris dataset using `datasets.load_iris()`.
- Visualize the first two features using a scatter plot (`plt.scatter()`), with each species represented by a different color.
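
One possible approach to the steps above is sketched here; it is a hedged illustration, not the only valid solution. It plots one scatter series per species so that each gets its own color and legend entry. The names `fig`, `ax`, and `mask` are choices made for this sketch.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

# Load the Iris dataset: 150 samples, 4 features, 3 species
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Draw one scatter series per species so each gets its own color
fig, ax = plt.subplots()
for label, name in enumerate(iris.target_names):
    mask = y == label
    ax.scatter(X[mask, 0], X[mask, 1], label=name)

ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.set_title("Iris: first two features")
ax.legend()
plt.show()
```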

In [None]:
# (Write your code below)

# Load the Iris dataset
iris = datasets.load_iris()
X = 
y = 

# Visualize the first two features

### Exercise 2: Load the Wine dataset

* Load the Wine dataset using `datasets.load_wine()`.
* Create a 2D scatter plot of the first two features of the dataset, using different colors for the three wine classes.
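
A minimal sketch of the steps above, assuming a single `scatter()` call colored by class is acceptable. It passes the labels through `c=y` and builds the legend from the scatter's own color mapping; the `viridis` colormap is an arbitrary choice for this sketch.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

# Load the Wine dataset: 178 samples, 13 features, 3 classes
wine = datasets.load_wine()
X = wine.data
y = wine.target

# Color every point by its class label in a single scatter call
fig, ax = plt.subplots()
scatter = ax.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis")

# legend_elements() returns one handle per distinct color value
handles, _ = scatter.legend_elements()
ax.legend(handles, list(wine.target_names), title="Class")

ax.set_xlabel(wine.feature_names[0])
ax.set_ylabel(wine.feature_names[1])
ax.set_title("Wine: first two features")
plt.show()
```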

In [None]:
# (Write your code below)

# Load the Wine dataset
wine = 
X = 
y = 

# Visualize the first two features

## Part 2: Supervised Learning

In this part, you'll perform classification and regression using scikit-learn models.

The first two exercises cover classification with a Decision Tree and with Logistic Regression; the third covers regression.

### Exercise 3: Classification with Decision Tree

* Load the Iris dataset.
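* Train a Decision Tree classifier (`DecisionTreeClassifier()`) on the first two features of the dataset. `visualize_DB()` predicts on a 2-D grid, so the model must be trained on exactly two features.
* Visualize the decision boundary using the given `visualize_DB()` function.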

In [None]:
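def visualize_DB(X, y, model, dataset):
    """Plot the decision regions of a fitted model over two features.

    X must have exactly two columns, and `model` must have been trained on
    those same two columns, because predictions are made on a 2-D grid.
    """
    # Build a 100 x 100 grid spanning the range of both features
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 100),
                         np.linspace(X[:, 1].min(), X[:, 1].max(), 100))
    # Predict a class for every grid point, then reshape back to the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.3, cmap='viridis')
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
    # Title reflects the model passed in rather than a fixed string
    plt.title(f'Decision boundary ({type(model).__name__})')
    plt.xlabel(dataset.feature_names[0])
    plt.ylabel(dataset.feature_names[1])
    plt.show()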

# (Write your code below)
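
As a hedged sketch of the training step, the block below fits a decision tree on the first two Iris features and evaluates it on the same kind of grid that `visualize_DB()` builds. Restricting to two features and fixing `random_state=0` are assumptions made here so the result is reproducible and compatible with a 2-D plot.

```python
import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X = iris.data[:, :2]   # keep only the first two features
y = iris.target

# random_state is fixed only to make the sketch reproducible
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Evaluate the model on a 100 x 100 grid over the two features
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 100),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 100))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

print("Training accuracy:", model.score(X, y))
```

In the notebook, with the helper defined in the cell above, the plot would then be produced by calling `visualize_DB(X, y, model, iris)`.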

### Exercise 4: Classification with Logistic Regression

* Load the Wine dataset.
* Train a Logistic Regression model (`LogisticRegression()`) on the first two features of the dataset, so that the decision boundary can be drawn in 2D.
* Plot the decision boundary together with the data points, colored by class.
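
One way to approach these steps is sketched below. It assumes the model is trained on the first two Wine features only, and it adds a `StandardScaler` step in a pipeline, which the exercise does not ask for but which usually helps the solver converge on unscaled data. The plotting code mirrors the grid-based approach of `visualize_DB()` so that the block is self-contained.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = datasets.load_wine()
X = wine.data[:, :2]   # first two features only
y = wine.target

# Scaling is an added assumption, not part of the exercise statement
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Predict on a grid covering the feature range to get decision regions
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 100),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 100))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, Z, alpha=0.3, cmap="viridis")
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis")
ax.set_xlabel(wine.feature_names[0])
ax.set_ylabel(wine.feature_names[1])
ax.set_title("Logistic Regression on two Wine features")
plt.show()
```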

In [None]:
# (Write your code below)

### Exercise 5: Linear Regression on the Diabetes dataset

- Load the Diabetes dataset using `datasets.load_diabetes()`.
- Train a Linear Regression model (`LinearRegression()`) to predict the dataset's target, a quantitative measure of disease progression one year after baseline.
- Plot the predicted values vs. the true values using `plt.scatter()`.
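
A minimal sketch of these steps follows. It assumes the model is fitted and evaluated on the full dataset, with no train/test split, since the exercise does not mention one. The dashed diagonal is an addition for this sketch: points on it would be perfect predictions.

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# 442 samples, 10 features; target is a disease progression measure
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

# R^2 on the training data (optimistic, since no data is held out)
r2 = model.score(X, y)
print("R^2 on training data:", r2)

fig, ax = plt.subplots()
ax.scatter(y, y_pred, alpha=0.5)
lims = [y.min(), y.max()]
ax.plot(lims, lims, "k--", label="perfect prediction")
ax.set_xlabel("True value")
ax.set_ylabel("Predicted value")
ax.set_title("Linear Regression on the Diabetes dataset")
ax.legend()
plt.show()
```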

In [None]:
# (Write your code below)

## Part 3: Unsupervised Learning (PCA)

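Principal Component Analysis (PCA) reduces the dimensionality of data by projecting it onto the directions of greatest variance, called the principal components. It is unsupervised: it uses only the features, not the labels.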

### Exercise 6: PCA on the Iris dataset

* Load the Iris dataset.
* Apply PCA (`PCA()`) to reduce the dataset to 2 components.
* Plot the two principal components, coloring the points by species.
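
The steps above can be sketched as follows. This is one possible solution; the species labels are used only to color the points, not to fit the PCA.

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Project the 4 original features onto 2 principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Fraction of the total variance captured by each component
print("Explained variance ratio:", pca.explained_variance_ratio_)

fig, ax = plt.subplots()
for label, name in enumerate(iris.target_names):
    mask = y == label
    ax.scatter(X_pca[mask, 0], X_pca[mask, 1], label=name)
ax.set_xlabel("Principal component 1")
ax.set_ylabel("Principal component 2")
ax.set_title("PCA of the Iris dataset")
ax.legend()
plt.show()
```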

In [None]:
# (Write your code below)

## Part 4: Self-Supervised Learning (Time Series Prediction)

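In this exercise, you will work with a synthetic time series signal and train a regression model that predicts the next value from the previous five. Applying the model repeatedly, with each prediction fed back in as input, yields a forecast of the next five values. This is an example of self-supervised learning: the targets are derived directly from the input signal, so no separate labels are needed.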

### Exercise 7: Self-Supervised Learning with Time Series Prediction

- Create a synthetic sine wave as the signal using `np.sin()`.
- Create the input features (`X`) from the signal by taking sliding windows of 5 consecutive values.
- The target (`y`) is the value immediately following each window, i.e., the 6th value counting from the window's start.
- Train a regression model (`LinearRegression()`) to predict the next value in the sequence from the sliding window of previous values.
- Forecast the next 5 steps by applying the model iteratively, appending each prediction to the window, and plot the true signal and the predicted values on the same plot.
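
A hedged sketch of this exercise is given below. The signal length (200 samples over two periods), the decision to hold out the last 5 samples for comparison, and the names `window`, `n_future`, `history`, and `preds` are all assumptions made for illustration. The forecast is recursive: each predicted value is appended to the window and used as input for the next step.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

window = 5     # number of past values used as features
n_future = 5   # number of steps to forecast

# Synthetic signal: two periods of a sine wave
t = np.linspace(0, 4 * np.pi, 200)
signal = np.sin(t)

# Row i of X holds signal[i : i + window]; y[i] is the value right after it
X = np.array([signal[i:i + window] for i in range(len(signal) - window)])
y = signal[window:]

# Hold out the last n_future targets so the forecast can be checked
model = LinearRegression()
model.fit(X[:-n_future], y[:-n_future])

# Recursive forecast: start from the last window of known values
history = list(signal[:-n_future][-window:])
preds = []
for _ in range(n_future):
    next_value = model.predict(np.array(history[-window:]).reshape(1, -1))[0]
    preds.append(next_value)
    history.append(next_value)
preds = np.array(preds)

fig, ax = plt.subplots()
ax.plot(t, signal, label="true signal")
ax.scatter(t[-n_future:], preds, color="red", label="predicted (next 5)")
ax.set_xlabel("t")
ax.set_ylabel("signal")
ax.legend()
plt.show()
```

A noise-free sine wave is an easy case because each sample is a linear combination of earlier samples, so the predictions should lie almost exactly on the true curve. Adding noise to `signal` would make the task more realistic.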

In [None]:
# (Write your code below)