### Question 1

In [3]:
from sklearn.datasets import fetch_california_housing
ds = fetch_california_housing()
X, y = ds.data, ds.target

### Question 2

In [49]:
print(ds.DESCR)

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

:Number of Instances: 20640

:Number of Attributes: 8 numeric, predictive attributes and the target

:Attribute Information:
    - MedInc        median income in block group
    - HouseAge      median house age in block group
    - AveRooms      average number of rooms per household
    - AveBedrms     average number of bedrooms per household
    - Population    block group population
    - AveOccup      average number of household members
    - Latitude      block group latitude
    - Longitude     block group longitude

:Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per ce

In [18]:
print(f"Shape of X: {X.shape}")
print(f"Size of data (n): {X.shape[0]}")

Shape of X: (20640, 8)
Size of data (n): 20640


X has shape (20640, 8): 20,640 samples (one per block group) and 8 features, so the size of the data is n = 20,640. X contains the feature values and y contains the target, the median house value in units of $100,000. Based on print(ds.DESCR), the 8 dimensions of X are: MedInc (median income), HouseAge (median house age), AveRooms (average rooms per household), AveBedrms (average bedrooms per household), Population (block group population), AveOccup (average number of household members), Latitude, and Longitude.

### Question 3

In [22]:
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

# Fit each model; the CV variants pick the regularization strength (alpha) by cross-validation
ridge = RidgeCV().fit(X, y)
lasso = LassoCV().fit(X, y)
elastic = ElasticNetCV().fit(X, y)

### Question 4

In [25]:
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Make predictions on the same data used for fitting (in-sample)
y_pred_ridge = ridge.predict(X)
y_pred_lasso = lasso.predict(X)
y_pred_elastic = elastic.predict(X)

# Calculate metrics for each model
print("Ridge - R²:", r2_score(y, y_pred_ridge), 
      "MSE:", mean_squared_error(y, y_pred_ridge),
      "MAE:", mean_absolute_error(y, y_pred_ridge))

print("Lasso - R²:", r2_score(y, y_pred_lasso),
      "MSE:", mean_squared_error(y, y_pred_lasso),
      "MAE:", mean_absolute_error(y, y_pred_lasso))

print("ElasticNet - R²:", r2_score(y, y_pred_elastic),
      "MSE:", mean_squared_error(y, y_pred_elastic),
      "MAE:", mean_absolute_error(y, y_pred_elastic))

Ridge - R²: 0.6062301017653273 MSE: 0.5243244261575613 MAE: 0.5312451161715875
Lasso - R²: 0.5902088471774516 MSE: 0.5456575325117343 MAE: 0.545236349411631
ElasticNet - R²: 0.5863093677864052 MSE: 0.5508498854650449 MAE: 0.5477715057550929
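
For reference, the three metrics printed above can be reproduced by hand. The following is a minimal sketch on made-up numbers (the arrays `y_true` and `y_hat` are illustrative and not taken from the housing data), assuming the standard definitions that `r2_score`, `mean_squared_error`, and `mean_absolute_error` use with their default arguments.

```python
import numpy as np

# Illustrative values only; not taken from the California Housing data.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 2.0, 2.5, 4.0])

residuals = y_true - y_hat

# Mean squared error: average of the squared residuals.
mse = np.mean(residuals ** 2)

# Mean absolute error: average of the absolute residuals.
mae = np.mean(np.abs(residuals))

# R^2: 1 minus the ratio of residual to total sum of squares.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"R²: {r2:.3f}  MSE: {mse:.3f}  MAE: {mae:.3f}")
# R²: 0.900  MSE: 0.125  MAE: 0.250
```

Because the housing target is expressed in units of $100,000, the Ridge MAE of about 0.53 corresponds to an average absolute error of roughly $53,000.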


### Question 5

In [28]:
from sklearn.datasets import fetch_openml
import numpy as np

mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X = mnist['data'].astype('float32') / 255.0 # shape: (70000, 784)
y = mnist['target'].astype('int64')


### Question 6

In [31]:
print(mnist.DESCR)

**Author**: Yann LeCun, Corinna Cortes, Christopher J.C. Burges  
**Source**: [MNIST Website](http://yann.lecun.com/exdb/mnist/) - Date unknown  
**Please cite**:  

The MNIST database of handwritten digits with 784 features, raw data available at: http://yann.lecun.com/exdb/mnist/. It can be split in a training set of the first 60,000 examples, and a test set of 10,000 examples  

It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting. The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image b

In [33]:
print(f"Shape of X: {X.shape}")
print(f"Size of data (n): {X.shape[0]}")

Shape of X: (70000, 784)
Size of data (n): 70000


X has shape (70000, 784): 70,000 samples and 784 features, so the size of the data is n = 70,000. Each of the 784 features is one pixel of a 28×28 grayscale image of a handwritten digit, with the pixels flattened into a 1D array. The values in X are pixel intensities scaled to the range 0.0 to 1.0, and the values in y are the digit labels 0 to 9.
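
To make the flattening concrete, here is a minimal sketch of how one 784-length row maps back to a 28×28 grid. It uses a synthetic vector (`row` is a stand-in, not an actual MNIST sample) and assumes row-major pixel order, which is also NumPy's default for `reshape`.

```python
import numpy as np

# Synthetic stand-in for one row of X; a real sample would be X[i].
row = np.linspace(0.0, 1.0, 784, dtype="float32")

# Undo the flattening: 784 values -> 28 rows of 28 pixels (row-major).
image = row.reshape(28, 28)

# The pixel at grid position (r, c) sits at flat index r * 28 + c.
r, c = 2, 3
flat_index = r * 28 + c
same_pixel = bool(image[r, c] == row[flat_index])

print(image.shape)  # (28, 28)
```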

### Question 7

In [37]:
# Fit each model; the CV variants pick the regularization strength (alpha) by cross-validation
ridge = RidgeCV().fit(X, y)
lasso = LassoCV().fit(X, y)
elastic = ElasticNetCV().fit(X, y)

### Question 8

In [43]:
# Make predictions on the same data used for fitting (in-sample)
y_pred_ridge = ridge.predict(X)
y_pred_lasso = lasso.predict(X)
y_pred_elastic = elastic.predict(X)

# Calculate metrics for each model
print("Ridge - R²:", r2_score(y, y_pred_ridge), 
      "MSE:", mean_squared_error(y, y_pred_ridge),
      "MAE:", mean_absolute_error(y, y_pred_ridge))

print("Lasso - R²:", r2_score(y, y_pred_lasso),
      "MSE:", mean_squared_error(y, y_pred_lasso),
      "MAE:", mean_absolute_error(y, y_pred_lasso))

print("ElasticNet - R²:", r2_score(y, y_pred_elastic),
      "MSE:", mean_squared_error(y, y_pred_elastic),
      "MAE:", mean_absolute_error(y, y_pred_elastic))

Ridge - R²: 0.6173716585915862 MSE: 3.196136008011445 MAE: 1.4003080465201567
Lasso - R²: 0.6154890060424805 MSE: 3.211861941007048 MAE: 1.4059022361448832
ElasticNet - R²: 0.6149933338165283 MSE: 3.216002492568397 MAE: 1.407977912558828


### Question 9

No, linear regression is not appropriate for the MNIST dataset, because MNIST is a classification problem, not a regression problem. Linear regression predicts a continuous value, but the target variable y contains categorical labels (the digits 0-9). The model treats these labels as ordered quantities, so it assumes that 9 is "bigger" than 1 when the two are simply different categories. In addition, the relationship between pixel values and the digit class is nonlinear, which a linear model may not be able to capture.
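
The ordering issue can be illustrated with a small sketch. The numbers below are made up for illustration: for a true label of 9, squared error treats a prediction of 8 as far better than a prediction of 1, although as class labels both are equally wrong.

```python
# Illustrative only: one true digit and two hypothetical wrong predictions.
true_label = 9
predictions = [8, 1]

# Squared error, the loss behind least-squares regression, depends on distance.
squared_errors = [(true_label - p) ** 2 for p in predictions]

# 0-1 loss, the natural classification loss, only asks "right or wrong?".
zero_one_losses = [int(p != true_label) for p in predictions]

for p, se, zo in zip(predictions, squared_errors, zero_one_losses):
    print(f"pred={p}: squared error={se}, misclassified={zo}")
# pred=8: squared error=1, misclassified=1
# pred=1: squared error=64, misclassified=1
```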