# Introduction to scikit-learn

scikit-learn is a popular Python library for machine learning, providing tools for classification, regression, clustering, dimensionality reduction, and more. In this notebook, you will learn to load datasets, perform supervised and unsupervised learning, and explore self-supervised learning.

---

## Part 1: Loading and Visualizing Datasets

In this part, you'll learn how to load and visualize datasets using scikit-learn.

In [None]:
%pip install numpy matplotlib scikit-learn

In [None]:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
import numpy as np

### Exercise 1: Load the Iris dataset

- Load the Iris dataset using `datasets.load_iris()`.
- Visualize the first two features using a scatter plot (`plt.scatter()`), with each species represented by a different color.
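
One possible approach is sketched below. It draws one scatter series per species so that each gets its own color and legend entry. The variable names (`iris`, `mask`, `ax`) are illustrative choices, not part of the exercise.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target  # X: (150, 4) feature matrix; y: species labels 0, 1, 2

fig, ax = plt.subplots()
# One scatter call per species: matplotlib assigns a new color to each call.
for label, name in enumerate(iris.target_names):
    mask = y == label
    ax.scatter(X[mask, 0], X[mask, 1], label=name)

ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.legend(title="Species")
plt.show()
```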

In [None]:
from sklearn import datasets
import matplotlib.pyplot as plt

# (Write your code below)

### Exercise 2: Load the Wine dataset

* Load the Wine dataset using `datasets.load_wine()`.
* Create a 2D scatter plot of the first two features of the dataset, using different colors for the three wine classes.
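
A minimal sketch of one way to do this follows. Unlike a per-class loop, it passes the class labels to a single `scatter()` call via `c=` and builds the legend from the returned collection. The colormap choice (`viridis`) is an assumption made for illustration.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

wine = datasets.load_wine()
X, y = wine.data, wine.target  # X: (178, 13) feature matrix; y: class labels 0, 1, 2

fig, ax = plt.subplots()
# A single scatter call; the colormap maps each class label to a color.
points = ax.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis")

# legend_elements() returns one legend handle per distinct value passed to c=.
handles, _ = points.legend_elements()
ax.legend(handles, wine.target_names, title="Wine class")
ax.set_xlabel(wine.feature_names[0])
ax.set_ylabel(wine.feature_names[1])
plt.show()
```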

In [None]:
# (Write your code below)

## Part 2: Supervised Learning

In this part, you'll perform classification and regression using scikit-learn models.

Exercises 3 and 4 cover classification with a Decision Tree and Logistic Regression; Exercise 5 covers regression with Linear Regression.

### Exercise 3: Classification with Decision Tree

* Load the Iris dataset.
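* Train a Decision Tree classifier (`DecisionTreeClassifier()`) on the first two features, so that the decision boundary can be drawn in 2D.
* Visualize the decision boundary by coloring the model's predictions over a grid of points, then overlay a scatter plot of the data.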
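
The sketch below shows one way to draw such a boundary, assuming the model is trained on the first two features only. The grid resolution (200 points per axis), the 0.5 margin, and `max_depth=3` are arbitrary illustrative settings.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X = iris.data[:, :2]  # keep two features so the boundary can be drawn in 2D
y = iris.target

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Build a grid covering the data range and predict a class at every grid point.
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, zz, alpha=0.3)                 # predicted class regions
ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")   # training points, colored by true class
ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
plt.show()
```

Because a decision tree splits on one feature at a time, the regions it produces are axis-aligned rectangles.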

In [None]:
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# (Write your code below)

### Exercise 4: Classification with Logistic Regression

* Load the Wine dataset.
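* Train a Logistic Regression model (`LogisticRegression()`) on the first two features, so that the decision boundary can be drawn in 2D. Standardizing the features first helps the solver converge.
* Plot the decision boundary and visualize the predictions.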

In [None]:
# (Write your code below)

### Exercise 5: Linear Regression on the California Housing dataset

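- Load the California Housing dataset using `datasets.fetch_california_housing()`. The data is downloaded on first use, so this step needs an internet connection.
- Train a Linear Regression model (`LinearRegression()`) to predict the target, the median house value of each district (in units of $100,000).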
- Plot the predicted values vs. the true values using `plt.scatter()`.
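
One way to structure this is sketched below as a helper function. The name `fit_and_plot` is hypothetical, and evaluating on a held-out 20% split is an added choice rather than something the exercise asks for.

```python
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_and_plot(X, y, ax=None):
    """Fit LinearRegression on a training split and scatter predicted vs. true test values."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)

    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(y_test, y_pred, s=5, alpha=0.3)
    # Dashed diagonal: points on this line are predicted exactly.
    lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
    ax.plot(lims, lims, "k--")
    ax.set_xlabel("True value")
    ax.set_ylabel("Predicted value")
    return model, y_test, y_pred
```

For the exercise, the helper could be called as `fit_and_plot(*datasets.fetch_california_housing(return_X_y=True))`, followed by `plt.show()`.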

In [None]:
# (Write your code below)

## Part 3: Unsupervised Learning (PCA)

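Principal Component Analysis (PCA) reduces the dimensionality of data by projecting it onto the orthogonal directions along which it varies most. It is unsupervised: the projection is computed from the features alone, without using labels.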

### Exercise 6: PCA on the Iris dataset

* Load the Iris dataset.
* Apply PCA (`PCA(n_components=2)`) to reduce the dataset to 2 components.
* Plot the two principal components, coloring the points by species.
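
A minimal sketch of these steps follows. The axis labels and the printout of `explained_variance_ratio_` are illustrative additions.

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()

# Project the four original features onto the two directions of greatest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(iris.data)

fig, ax = plt.subplots()
for label, name in enumerate(iris.target_names):
    mask = iris.target == label
    ax.scatter(X_2d[mask, 0], X_2d[mask, 1], label=name)

ax.set_xlabel("Principal component 1")
ax.set_ylabel("Principal component 2")
ax.legend(title="Species")
plt.show()

# Fraction of the total variance captured by each component.
print(pca.explained_variance_ratio_)
```

The species labels are used only to color the plot; PCA itself never sees them.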

In [None]:
# (Write your code below)

## Part 4: Self-Supervised Learning (Regression)

In self-supervised learning, the target is extracted directly from the data rather than being provided externally. In this exercise, you'll create a synthetic dataset and extract a target signal from the data itself. You'll then train a regression model to predict this target signal.

### Exercise 7: Self-Supervised Learning with Regression

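- Create a synthetic dataset with `datasets.make_regression()` that contains multiple features. Discard the `y` that it returns; only the feature matrix is needed here.
- Use one feature (e.g., the first feature) as the target signal (`y`), and use the other features as inputs (`X`). By default, `make_regression()` draws the features independently, so expect a weak fit unless you make them correlated (for example, with the `effective_rank` parameter).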
- Train a regression model (`LinearRegression()`) to predict the target signal.
- Plot the predicted vs. actual values for the target signal using `plt.scatter()`.
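
The steps above can be sketched as follows. The `effective_rank=2` and `tail_strength=0.1` settings are assumptions chosen to make the features correlated, so that one column is at least partly predictable from the others. The held-out split and the R² printout are likewise illustrative additions.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Generate correlated features; the y returned by make_regression is discarded.
data, _ = datasets.make_regression(
    n_samples=200, n_features=5, effective_rank=2, tail_strength=0.1, random_state=0
)

y = data[:, 0]                  # target signal: the first feature
X = np.delete(data, 0, axis=1)  # inputs: the remaining four features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
print("R^2 on held-out data:", r2_score(y_test, y_pred))

fig, ax = plt.subplots()
ax.scatter(y_test, y_pred)
ax.set_xlabel("Actual value of feature 0")
ax.set_ylabel("Predicted value of feature 0")
plt.show()
```

Removing the `effective_rank` argument is a useful comparison: the features then carry no information about each other, and the scatter plot should show little or no trend.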

In [None]:
# (Write your code below)
