# Day 13: Introduction to Machine Learning with `scikit-learn`

Today we cover the basics of supervised machine learning with the `scikit-learn` library.

📌 Objectives:
- Understand the steps of an ML pipeline
- Implement a linear regression and a classification model
- Evaluate performance with MSE, accuracy, and a confusion matrix

In [None]:
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, accuracy_score, confusion_matrix
import matplotlib.pyplot as plt

## 📥 Loading the `tips` dataset

In [None]:
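# Load seaborn's example `tips` dataset
df = sns.load_dataset('tips')
# Encode `sex` as 0/1; any label missing from the mapping would become NaN
df['sex'] = df['sex'].map({'Male': 1, 'Female': 0})
# Drop rows that contain missing values
df.dropna(inplace=True)
df.head()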
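
The `map` call above replaces each label with its dictionary value, and a label with no dictionary entry is encoded as `NaN`. The following is a minimal sketch of that behavior on a hypothetical toy Series; the `'Other'` label is invented for illustration and is not claimed to occur in `tips`.

```python
import pandas as pd

# Hypothetical toy data; 'Other' is deliberately absent from the mapping.
labels = pd.Series(['Male', 'Female', 'Other'])
encoded = labels.map({'Male': 1, 'Female': 0})

# Labels without a dictionary entry are encoded as NaN.
n_unmapped = int(encoded.isna().sum())
print(encoded.tolist())
print('Unmapped labels:', n_unmapped)
```

Checking `isna().sum()` right after an encoding step is a cheap way to catch unexpected labels before they are silently dropped by `dropna`.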

## ✂️ Splitting the data into train and test sets

In [None]:
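# Features: bill amount and party size; target: tip amount
X = df[['total_bill', 'size']]
y = df['tip']
# Hold out 20% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)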
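
To make the effect of `test_size` and `random_state` concrete, here is a small sketch on hypothetical toy arrays (`X_toy` and `y_toy` are invented names, not part of the notebook). It shows the resulting split sizes, that features and targets stay aligned row by row, and that the same seed yields the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features; row i is [2i, 2i + 1], target is i.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
print(X_tr.shape, X_te.shape)

# Each feature row is still paired with its own target after shuffling.
aligned = np.array_equal(X_te[:, 0] // 2, y_te)

# Repeating the call with the same seed reproduces the same test rows.
_, X_te_again, _, _ = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
reproducible = np.array_equal(X_te, X_te_again)
print('Aligned:', aligned, '| Reproducible:', reproducible)
```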

## 📈 Linear Regression

In [None]:
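# Fit an ordinary least squares model on the training set
model_lr = LinearRegression()
model_lr.fit(X_train, y_train)
# Predict tips for the held-out rows and compute the mean squared error
y_pred = model_lr.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f'MSE: {mse:.2f}')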
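
The MSE printed above is the average of the squared differences between true and predicted values, so it is expressed in squared units of the target. As a sketch on hypothetical toy values (not taken from `tips`), the computation can be reproduced by hand and compared with `mean_squared_error`; the square root (RMSE) brings the error back to the target's own units.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical toy values chosen for easy mental arithmetic.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# Squared errors: 0.25, 0.25, 0.0, 1.0 -> mean = 1.5 / 4
mse_manual = np.mean((y_true - y_hat) ** 2)
mse_sklearn = mean_squared_error(y_true, y_hat)

# RMSE is in the same units as the target.
rmse = np.sqrt(mse_manual)
print(f'MSE (manual): {mse_manual:.3f}')
print(f'MSE (sklearn): {mse_sklearn:.3f}')
print(f'RMSE: {rmse:.3f}')
```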

## 🧠 Classification: Smoker or Not?

In [None]:
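# Encode the target as integers: 1 = smoker, 0 = non-smoker
df['smoker'] = df['smoker'].map({'Yes': 1, 'No': 0}).astype(int)
# Features: bill amount, tip amount, and party size
X = df[['total_bill', 'tip', 'size']]
y = df['smoker']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Fit a logistic regression classifier and predict on the held-out rows
clf = LogisticRegression()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))
# Rows of the confusion matrix are true classes; columns are predicted classes
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))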
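
Accuracy alone can hide how the errors are distributed between the two classes, which is what the confusion matrix exposes. The sketch below uses hypothetical toy labels (not the model's actual predictions) to show how the four cells of a binary confusion matrix relate to accuracy, precision, and recall.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical toy labels: 5 true negatives-or-false-positives, 3 actual positives.
y_true = [0, 0, 0, 0, 0, 1, 1, 1]
y_hat = [0, 0, 0, 0, 1, 1, 0, 0]

# For binary labels 0/1, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = accuracy_score(y_true, y_hat)  # (tp + tn) / total
precision = tp / (tp + fp)                # share of predicted positives that are correct
recall = tp / (tp + fn)                   # share of actual positives that are found

print(f'tn={tn}, fp={fp}, fn={fn}, tp={tp}')
print(f'accuracy={accuracy:.3f}, precision={precision:.3f}, recall={recall:.3f}')
```

In this toy case the accuracy looks moderate while recall is low, because most of the positive class is missed; the same kind of check applies to the smoker classifier when one class is less frequent than the other.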