# **Question 1: Linear Regression**
a) Load the "Boston Housing" dataset from scikit-learn's built-in datasets.

b) Split the data into training and testing sets.

If your roll number is even, use 80% training and 20% testing.

If your roll number is odd, use 70% training and 30% testing.

c) Train a linear regression model on the training data and make predictions on the testing data.

d) Calculate the mean squared error (MSE) between the predicted and actual values.
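
The roll-number rule in part b) maps directly to the `test_size` argument of `train_test_split`. The sketch below shows one way to encode it; `pick_split_fraction` is a hypothetical helper name introduced here for illustration, not part of scikit-learn.

```python
def pick_split_fraction(roll_number: int) -> float:
    """Return the test fraction for the even/odd roll-number rule (hypothetical helper)."""
    # Even roll numbers use an 80/20 split, odd roll numbers a 70/30 split.
    return 0.2 if roll_number % 2 == 0 else 0.3


print(pick_split_fraction(14))  # even roll number
print(pick_split_fraction(27))  # odd roll number
```

The returned value can be passed as `test_size=pick_split_fraction(roll_number)`.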

In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

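# load_boston was removed in scikit-learn 1.2, so read the raw file from its original source
data_url = "http://lib.stat.cmu.edu/datasets/boston"
# A raw string avoids an invalid-escape warning for the whitespace separator
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
# Each record spans two physical rows; stitch them back into one feature row
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]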

X = data
y = target

# Split the data into training and testing sets (70% training, 30% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = model.predict(X_test)
# Calculate the mean squared error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)


Mean Squared Error: 21.51744423117753
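
The value printed above is the average squared difference between `y_test` and `y_pred`. As a small illustration of that definition, the sketch below computes the same quantity by hand on made-up toy values (they are not taken from the Boston data); `mean_squared_error_manual` is a hypothetical name.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of squared residuals; mirrors the definition of MSE."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Toy values chosen for illustration only.
toy_actual = [3.0, 5.0, 2.0, 7.0]
toy_predicted = [2.0, 5.0, 4.0, 8.0]
toy_mse = mean_squared_error_manual(toy_actual, toy_predicted)
print("Toy MSE:", toy_mse)
```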


# **Question 2: L1 Regularization (Lasso)**
a) Load the "Diabetes" dataset from scikit-learn's built-in datasets.

b) Split the data into training and testing sets.

If your roll number is even, use 80% training and 20% testing.

If your roll number is odd, use 70% training and 30% testing.

c) Train a Lasso regression model on the training data with an alpha value of 0.1.

***Model name should be your first name***

d) Evaluate the model's performance using the mean squared error (MSE) on the testing data.

e) Identify the features that were selected (non-zero coefficients) by the Lasso model.

In [None]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

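# Load the built-in Diabetes dataset
diabetes_data = load_diabetes()

# Split the data (70% training, 30% testing)
X_train, X_test, y_train, y_test = train_test_split(diabetes_data.data, diabetes_data.target, test_size=0.3, random_state=42)

# Train a Lasso model with alpha=0.1; the model is named after the author, as required
arvind = Lasso(alpha=0.1)
arvind.fit(X_train, y_train)

# Evaluate on the testing data
y_pred = arvind.predict(X_test)
MSE = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", MSE)

# Features with non-zero coefficients are the ones the Lasso model kept
selected_features = [feature for feature, coef in zip(diabetes_data.feature_names, arvind.coef_) if coef != 0]
print("Selected Features:", selected_features)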



Mean Squared Error: 2775.165076183445
Selected Features: ['sex', 'bmi', 'bp', 's1', 's3', 's5', 's6']
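
How many features survive depends on `alpha`: a stronger L1 penalty pushes more coefficients to exactly zero. The sketch below repeats the fit above for a few `alpha` values to make that visible. The alpha grid and the raised `max_iter` are arbitrary choices made for this illustration, not part of the assignment.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.3, random_state=42
)

# Assumed alpha grid, chosen only to span weak to very strong regularization.
nonzero_counts = {}
for alpha in [0.01, 0.1, 1.0, 100.0]:
    lasso = Lasso(alpha=alpha, max_iter=10_000)
    lasso.fit(X_train, y_train)
    # Keep the names of features whose coefficient was not shrunk to zero.
    kept = [name for name, coef in zip(diabetes.feature_names, lasso.coef_) if coef != 0]
    nonzero_counts[alpha] = len(kept)
    print(f"alpha={alpha}: {len(kept)} features kept -> {kept}")
```

With a very large `alpha` the penalty dominates and every coefficient is set to zero, so the model predicts a constant.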


# **Question 3: L2 Regularization (Ridge)**
a) Load the "California Housing" dataset from an online source (e.g., Kaggle).
*housing.csv* written

b) Perform any necessary preprocessing steps, such as handling missing values or scaling the features.

c) Split the data into training and testing sets.

If the last two digits of your roll number form a prime number, use 85% training and 15% testing.

If the last two digits of your roll number do not form a prime number, use 75% training and 25% testing.

d) Train a Ridge regression model on the training data with an alpha value of 0.01.

e) Calculate the mean squared error (MSE) on the testing data to assess the model's performance.
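
The prime-number rule in part c) can be checked in code rather than by hand. The sketch below is one possible way to do it; `is_prime` and `split_fraction_for_prime_rule` are hypothetical helper names. It assumes the non-prime case means a 75/25 split, since the training and testing shares must sum to 100%.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check; adequate for two-digit numbers."""
    if n < 2:
        return False
    for divisor in range(2, int(n ** 0.5) + 1):
        if n % divisor == 0:
            return False
    return True


def split_fraction_for_prime_rule(roll_number: int) -> float:
    """Hypothetical helper: map the last two digits of a roll number to a test fraction."""
    last_two_digits = roll_number % 100
    # Assumption: the non-prime case is a 75/25 split.
    return 0.15 if is_prime(last_two_digits) else 0.25


print(split_fraction_for_prime_rule(2113))  # last two digits 13 (prime)
print(split_fraction_for_prime_rule(2112))  # last two digits 12 (not prime)
```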

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

# Load the California Housing dataset from an online source (e.g., Kaggle)
data = pd.read_csv("/content/Housing.csv")

# Drop rows that contain missing values
data.dropna(inplace=True)

# Separate the features from the target column
X = data.drop('price', axis=1)
y = data['price']

numeric_features = X.select_dtypes(include=['float64', 'int64']).columns
numeric_transformer = StandardScaler()

categorical_features = X.select_dtypes(include=['object']).columns
categorical_transformer = OneHotEncoder(drop='first')

preprocessor = ColumnTransformer(
    transformers=[
        ('numeric', numeric_transformer, numeric_features),
        ('categorical', categorical_transformer, categorical_features)
    ])

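# Split the data (85% training, 15% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)

# Fit the preprocessor on the training data only, then apply it to the testing data
X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)

# Train a Ridge model with alpha=0.01
ridge_model = Ridge(alpha=0.01)
ridge_model.fit(X_train_processed, y_train)

# Evaluate on the testing data
y_pred = ridge_model.predict(X_test_processed)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)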


Mean Squared Error: 1349747569880.33
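
An MSE of this size is expressed in squared price units, which makes it hard to read. Taking the square root gives the root mean squared error (RMSE), which is in the same units as `price`. The sketch below only reuses the MSE value printed above; it does not refit the model.

```python
import math

# MSE value copied from the output of the Ridge cell.
mse = 1349747569880.33

# RMSE is the square root of MSE and shares the units of the target.
rmse = math.sqrt(mse)
print(f"Root Mean Squared Error: {rmse:,.2f}")
```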
