### In this tutorial, you will perform both regression and classification tasks using [scikit-learn](https://scikit-learn.org/stable/getting_started.html). Before starting, make sure you have set up your conda environment according to the instructions provided.

In [None]:
# Numerical and plotting libraries
import numpy as np
import matplotlib.pyplot as plt
import matplotlib

# Scikit-learn: datasets
from sklearn.datasets import make_moons, make_classification, fetch_openml

# Scikit-learn: models
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neural_network import MLPClassifier

# Scikit-learn: model evaluation and preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score,
    accuracy_score
)

# Set plot font size for better visibility
font = {'size': 16}
matplotlib.rc('font', **font)

## Exercise 1: Predicting Fuel Efficiency with Linear Regression

**Problem Description**

You are given real-world data on automobile fuel efficiency. Each record contains information about a car’s technical specifications and its fuel efficiency measured in miles per gallon (MPG).

In this exercise, you will build a linear regression model that predicts a car’s fuel efficiency (MPG) from its engine displacement (the `displacement` column).

#### Your Tasks
1. Load the dataset using `fetch_openml`:

    ```python
    from sklearn.datasets import fetch_openml
    data = fetch_openml("autoMpg", version=1, as_frame=True)
    df = data.frame
    ```
    
2. Extract:
    - Features (`X`): use only the `displacement` column.
    - Target (`y`): use the target values provided by `data.target`.

3. Visualize the data:
    Create a scatter plot of `displacement` (engine size) vs. MPG (fuel efficiency).

4. Split the data into train and test sets:
    Use 80% of the data for training and 20% for testing.

5. Train a Linear Regression model to predict MPG from displacement.

6. Evaluate your model using:
    - Mean Squared Error (MSE): $\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$. Penalizes large errors more heavily.
    - Mean Absolute Error (MAE): $\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$. More robust to outliers.
    - R² score: the proportion of variance in the target explained by the model. Values close to 1 are better.

7. Visualize the model predictions:
    - Plot the test data points.
    - Plot the predicted values from your model.

### Step 1: Load and Inspect the Data

In [None]:
# Load the dataset
data = fetch_openml("autoMpg", version=1, as_frame=True)
df = data.frame

# Print the first few rows
print(df.head())

### Step 2: Extract Features and Target
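scikit-learn estimators expect the feature matrix to be two-dimensional (samples × features), even when there is only one feature, while the target can stay one-dimensional. The sketch below illustrates the difference on a small made-up table; the column names `feature_a` and `label` and the variables `toy`, `X_toy`, and `y_toy` are hypothetical and are not part of the exercise dataset.

```python
import pandas as pd

# Hypothetical toy table, used only to illustrate the shapes involved
toy = pd.DataFrame({"feature_a": [1.0, 2.0, 3.0],
                    "label": [10.0, 20.0, 30.0]})

# Double brackets select a list of columns and return a 2-D DataFrame
X_toy = toy[["feature_a"]]

# Single brackets select one column and return a 1-D Series
y_toy = toy["label"]

print(X_toy.shape)  # (3, 1)
print(y_toy.shape)  # (3,)
```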

In [None]:
# TODO: Select displacement column as feature X
X = ...

# TODO: Use the target provided in the dataset
y = ...

### Step 3: Visualize the Data

In [None]:
# TODO: Create a scatter plot of displacement vs MPG
plt.figure(figsize=(8,6))
...
plt.xlabel("Engine Displacement (cubic inches)")
plt.ylabel("Fuel Efficiency (MPG)")
plt.title("Engine Size vs Fuel Efficiency")
plt.show()

### Step 4: Split into Training and Test Sets
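As a minimal sketch of how an 80/20 split behaves, the example below applies `train_test_split` to synthetic arrays rather than to the exercise data. The names `X_demo`, `y_demo`, and the `random_state` value are illustrative assumptions; fixing `random_state` simply makes the shuffle reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples, one feature (illustration only)
X_demo = np.arange(100, dtype=float).reshape(-1, 1)
y_demo = 2.0 * X_demo.ravel()

# test_size=0.2 holds out 20% of the samples for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(X_tr.shape, X_te.shape)  # (80, 1) (20, 1)
```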

In [None]:
# TODO: Use 80% for training and 20% for testing
X_train, X_test, y_train, y_test = ...

### Step 5: Train Linear Regression
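The fit/inspect pattern for `LinearRegression` can be sketched on noise-free synthetic data, where the true slope and intercept are known in advance. The data, the line y = 3x + 2, and the names `x_line`, `y_line`, and `demo_model` are assumptions made for illustration, not part of the exercise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic points lying exactly on the line y = 3x + 2 (illustration only)
x_line = np.linspace(0.0, 10.0, 50).reshape(-1, 1)
y_line = 3.0 * x_line.ravel() + 2.0

# Create the estimator, then fit it to the (features, target) pair
demo_model = LinearRegression()
demo_model.fit(x_line, y_line)

# The learned slope and intercept should recover the generating line
print(demo_model.coef_[0], demo_model.intercept_)
```

On real data such as the fuel-efficiency set, the fitted coefficients will not match any known ground truth, so the evaluation metrics in the next step are what indicate fit quality.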

In [None]:
# TODO: Create and train the model
model = ...
model.fit(...)

### Step 6: Evaluate the Model
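To make the three metrics concrete, the sketch below computes them by hand with NumPy on four made-up values and checks the results against the scikit-learn functions. The arrays `y_true_demo` and `y_hat_demo` are hypothetical, and the R² line uses the standard definition 1 − SS_res / SS_tot, which is not spelled out in the task list above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Made-up true values and predictions (illustration only)
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_hat_demo = np.array([2.5, 5.0, 8.0, 9.5])

residuals = y_true_demo - y_hat_demo

# MSE: mean of squared residuals
mse_manual = np.mean(residuals ** 2)

# MAE: mean of absolute residuals
mae_manual = np.mean(np.abs(residuals))

# R^2: 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mse_manual, mae_manual, r2_manual)  # 0.375 0.5 0.925

# The manual values agree with scikit-learn's implementations
print(np.isclose(mse_manual, mean_squared_error(y_true_demo, y_hat_demo)))
print(np.isclose(mae_manual, mean_absolute_error(y_true_demo, y_hat_demo)))
print(np.isclose(r2_manual, r2_score(y_true_demo, y_hat_demo)))
```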

In [None]:
# TODO: Predict on the test set
y_pred = ...

# TODO: Compute MSE, MAE, R²
mse = ...
mae = ...
r2 = ...

print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"R² Score: {r2:.3f}")

### Step 7: Visualize Predictions

In [None]:
# Plot test data and predicted values (requires y_pred from Step 6)
plt.figure(figsize=(8,6))
plt.scatter(X_test, y_test, label="Test data", alpha=0.7)
plt.scatter(X_test, y_pred, label="Predictions", color='red')
plt.xlabel("Engine Displacement (cubic inches)")
plt.ylabel("Fuel Efficiency (MPG)")
plt.title("Model Predictions vs Actual Values")
plt.legend()
plt.show()

## Exercise 2: Non-linear Classification using Logistic Regression and Neural Network

**Problem Description**

You are given a binary classification problem in which the two classes are shaped like interleaving moons, so the data are not linearly separable.

Your goal is to:
- Train two classifiers:
    - Logistic Regression
    - Neural Network (`MLPClassifier`)
- Compare their performance and visualize their decision boundaries.
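
The comparison described above can be sketched on a different toy dataset, so that the exercise itself stays open. The example below swaps in `make_circles` (concentric rings, also not linearly separable) and trains both model types on the full data; the dataset choice, its parameters, and the names `X_c`, `y_c`, `linear_clf`, and `mlp_clf` are assumptions made for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Concentric rings: a stand-in non-linear problem (not the exercise data)
X_c, y_c = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)

# A linear decision boundary cannot separate an inner ring from an outer ring
linear_clf = LogisticRegression().fit(X_c, y_c)

# A small multi-layer perceptron can learn a curved boundary
mlp_clf = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=2000,
                        random_state=0).fit(X_c, y_c)

# Accuracy on the training data itself (no held-out test set here)
acc_linear = accuracy_score(y_c, linear_clf.predict(X_c))
acc_mlp = accuracy_score(y_c, mlp_clf.predict(X_c))

print(f"Logistic Regression accuracy: {acc_linear:.3f}")
print(f"MLP accuracy:                 {acc_mlp:.3f}")
```

Because both scores are measured on the training data, they indicate how well each model can fit the shape of the classes, not how well it generalizes.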

#### Your Tasks
1. Generate the dataset using the `make_moons()` function from `sklearn.datasets`:
    Use `n_samples=500`, `noise=0.2`, and `random_state=42`.

2. Visualize the dataset using a scatter plot, color-coded by class.

3. Train a Logistic Regression model on the full dataset and compute its accuracy on that same data (training accuracy).

4. Train a Neural Network (`MLPClassifier`) on the same data, configured as `MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=2000, random_state=42)`, and compute its accuracy.

5. Build a meshgrid covering the input space and visualize the decision boundaries of both models side by side.
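
The meshgrid step can be sketched as follows: build a regular grid over the feature ranges, predict a class for every grid point, and reshape the predictions back to the grid shape. This example uses `make_blobs` and a single `LogisticRegression` as stand-ins; the dataset, the padding value, the grid resolution, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Stand-in two-class data with two features (not the exercise data)
X_b, y_b = make_blobs(n_samples=100, centers=2, random_state=0)
grid_clf = LogisticRegression().fit(X_b, y_b)

# Cover the data range with a small margin on every side
pad = 0.5
x_min, x_max = X_b[:, 0].min() - pad, X_b[:, 0].max() + pad
y_min, y_max = X_b[:, 1].min() - pad, X_b[:, 1].max() + pad

# xx and yy have shape (150, 200): rows follow the y axis, columns the x axis
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 150))

# Flatten the grid into an (n_points, 2) array so the model can predict on it
grid_points = np.c_[xx.ravel(), yy.ravel()]

# Predict a class for each grid point and restore the grid shape
Z = grid_clf.predict(grid_points).reshape(xx.shape)

print(grid_points.shape, Z.shape)  # (30000, 2) (150, 200)
```

The arrays `xx`, `yy`, and `Z` can then be passed to `plt.contourf(xx, yy, Z, alpha=0.3)` with the data points overlaid via `plt.scatter`. For a side-by-side comparison, one option is to create two axes with `plt.subplots(1, 2)` and repeat the prediction and plotting once per model.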


In [None]:
# Your code here