## Part-I: Linear Regression

### Task 1: Simple Linear Regression

In this task, we use the California Housing dataset to predict the median house value of a district (expressed in units of 100,000 USD) from a single feature, `AveRooms`, the average number of rooms per household. We build and train a simple linear regression model and visualize the fitted line against the test data.

In [1]:
from sklearn.datasets import fetch_california_housing
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load dataset
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target

# We'll use only one feature: 'AveRooms'
X_simple = X[['AveRooms']]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X_simple, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Plotting
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.xlabel('Average Rooms per Household')
plt.ylabel('Median House Value (100,000 USD)')
plt.title('Simple Linear Regression')
plt.legend()
plt.show()
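
For a single feature, the least-squares line has a closed form, so the fit can be checked by hand. The following is a minimal sketch on toy data (not the California Housing data); the helper name `fit_simple_ols` is hypothetical and only meant to illustrate the slope and intercept formulas.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form least-squares slope and intercept for one feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = covariance(x, y) / variance(x)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # the fitted line always passes through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Toy data that lies exactly on y = 2x + 1
x_toy = [1, 2, 3, 4, 5]
y_toy = [3, 5, 7, 9, 11]

slope, intercept = fit_simple_ols(x_toy, y_toy)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

On real data the values should agree with `model.coef_[0]` and `model.intercept_` from the cell above, up to floating-point error.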

### Task 2: Multiple Linear Regression

Now, we’ll use multiple features from the dataset to predict the house price and evaluate our model using R², MSE, and RMSE.

In [None]:
from sklearn.metrics import mean_squared_error, r2_score
from math import sqrt

# Using all features for multiple linear regression
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

multi_model = LinearRegression()
multi_model.fit(X_train, y_train)

y_pred_multi = multi_model.predict(X_test)

# Evaluation
r2 = r2_score(y_test, y_pred_multi)
mse = mean_squared_error(y_test, y_pred_multi)
rmse = sqrt(mse)

print(f"R-squared: {r2:.4f}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")

# Coefficients
coef_df = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': multi_model.coef_
})
print(coef_df)
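
To make the three metrics concrete, here is a small sketch that computes them from their definitions on toy values. The function name `regression_metrics` and the numbers are illustrative assumptions, not part of the notebook.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Compute R-squared, MSE, and RMSE from their definitions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # residual sum of squares: error left after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {"r2": 1 - ss_res / ss_tot, "mse": mse, "rmse": sqrt(mse)}

# Toy values chosen so the arithmetic is easy to follow
y_true_toy = [3.0, 5.0, 7.0, 9.0]
y_pred_toy = [2.5, 5.5, 7.0, 9.0]

metrics = regression_metrics(y_true_toy, y_pred_toy)
print(metrics)
```

RMSE is in the same units as the target, which usually makes it easier to interpret than MSE.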

### Task 3: Feature Scaling and Normalization

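Here we’ll scale our features using StandardScaler and compare model performance before and after scaling. For ordinary least squares, standardizing the features should leave the predictions, and therefore R², MSE, and RMSE, essentially unchanged; what changes is the scale of the coefficients, which become comparable across features.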

In [None]:
from sklearn.preprocessing import StandardScaler

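# The unscaled baseline was already fit in Task 2

# Split first, then fit the scaler on the training data only,
# so that no test-set statistics leak into the transform
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)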

scaled_model = LinearRegression()
scaled_model.fit(X_train_scaled, y_train)

y_pred_scaled = scaled_model.predict(X_test_scaled)

r2_scaled = r2_score(y_test, y_pred_scaled)
mse_scaled = mean_squared_error(y_test, y_pred_scaled)
rmse_scaled = sqrt(mse_scaled)

print(f"After Scaling -> R-squared: {r2_scaled:.4f}, MSE: {mse_scaled:.4f}, RMSE: {rmse_scaled:.4f}")
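
The effect of standardization on a plain least-squares fit can be sketched without scikit-learn. The example below uses synthetic data and a hypothetical helper `ols_fit_predict`; it is an illustration under those assumptions, not a reproduction of the notebook's results.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic features on very different scales
X_toy = rng.normal(size=(50, 2)) * np.array([1.0, 100.0])
y_toy = 3.0 * X_toy[:, 0] + 0.05 * X_toy[:, 1] + rng.normal(scale=0.1, size=50)

def ols_fit_predict(X_fit, y_fit, X_new):
    """Least-squares fit with an intercept, then predict on X_new."""
    A = np.column_stack([np.ones(len(X_fit)), X_fit])
    beta, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return beta, np.column_stack([np.ones(len(X_new)), X_new]) @ beta

# Standardize: zero mean and unit variance per column
mean, std = X_toy.mean(axis=0), X_toy.std(axis=0)
X_std = (X_toy - mean) / std

beta_raw, pred_raw = ols_fit_predict(X_toy, y_toy, X_toy)
beta_std, pred_std = ols_fit_predict(X_std, y_toy, X_std)

print("max prediction difference:", np.max(np.abs(pred_raw - pred_std)))
print("raw coefficients:         ", beta_raw[1:])
print("standardized coefficients:", beta_std[1:])
```

The predictions match up to floating-point error, while each standardized coefficient equals the raw coefficient multiplied by that feature's standard deviation. Scaling matters more for regularized or gradient-based models than for ordinary least squares.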

### Task 4: Model Interpretation

We’ll visualize the correlation matrix using a heatmap and discuss feature relationships and multicollinearity.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Correlation Matrix
plt.figure(figsize=(10, 8))
sns.heatmap(X.corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

# Top correlated features with target
correlation_with_target = X.corrwith(pd.Series(y)).sort_values(ascending=False)
print("Feature correlations with target (descending):", correlation_with_target, sep="\n")
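
The heatmap shows pairwise correlations only. One common complementary diagnostic for multicollinearity, not used in the original cell, is the variance inflation factor (VIF). The sketch below relies on the fact that the VIFs are the diagonal of the inverse correlation matrix; the function name `variance_inflation_factors` and the synthetic columns are assumptions for illustration.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF per column, taken from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)              # independent of a
c = a + 0.1 * rng.normal(size=200)    # nearly a copy of a

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
for name, vif in zip(["a", "b", "c"], vifs):
    print(f"VIF({name}) = {vif:.1f}")
```

A VIF near 1 suggests a feature is not well explained by the others, while large values (a frequently quoted rule of thumb is above 5 or 10) point to multicollinearity. On the notebook's data this could be called as `variance_inflation_factors(X.to_numpy())`, assuming `X` is still the housing DataFrame.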

## Part-II: Logistic Regression

### Task 5: Binary Classification with Logistic Regression

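We will use the Breast Cancer dataset to build a binary classification model that predicts whether a tumor is benign or malignant. In scikit-learn's copy of this dataset the label 1 means benign and 0 means malignant, so the "positive" class in the metrics below is benign.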

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
clf = LogisticRegression(max_iter=10000)
clf.fit(X_train, y_train)

# Predict
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)[:, 1]

# Evaluation
print("Classification Report:", classification_report(y_test, y_pred), sep="\n")
print("Confusion Matrix:", confusion_matrix(y_test, y_pred), sep="\n")
print(f"ROC-AUC Score: {roc_auc_score(y_test, y_proba):.4f}")

# ROC Curve
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
plt.plot(fpr, tpr, label='ROC Curve')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve - Logistic Regression")
plt.legend()
plt.grid()
plt.show()
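
Each point on a ROC curve corresponds to one decision threshold. As a rough sketch of how such a point is obtained, the snippet below counts true and false positives on toy labels and scores; `roc_point` is a hypothetical helper, not a scikit-learn function.

```python
def roc_point(y_true, scores, threshold):
    """Return (fpr, tpr) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return fp / (fp + tn), tp / (tp + fn)

# Toy labels and predicted probabilities for the positive class
y_true_toy = [0, 0, 1, 1]
scores_toy = [0.1, 0.4, 0.35, 0.8]

for t in (0.8, 0.5, 0.35, 0.1):
    fpr_t, tpr_t = roc_point(y_true_toy, scores_toy, t)
    print(f"threshold={t:.2f}  FPR={fpr_t:.2f}  TPR={tpr_t:.2f}")
```

Lowering the threshold moves the point up and to the right: more positives are caught, at the cost of more false alarms.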

### Task 6: Threshold Tuning and Probability Interpretation

Let’s change the threshold value to see how it impacts performance metrics like F1-score and the confusion matrix.

In [None]:
from sklearn.metrics import f1_score

thresholds_to_test = [0.3, 0.5, 0.7]

for threshold in thresholds_to_test:
    y_thresh = (y_proba >= threshold).astype(int)
    print(f"--- Threshold: {threshold} ---")
    print("Confusion Matrix:", confusion_matrix(y_test, y_thresh), sep="\n")
    print("F1-Score:", f1_score(y_test, y_thresh))
    print()

# ROC curve with the chance-level diagonal for reference
plt.plot(fpr, tpr, label='ROC Curve')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve with Diagonal")
plt.legend()
plt.grid()
plt.show()
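
The cell above plots the ROC curve but does not pick a threshold from it. One common choice, assumed here rather than taken from the notebook, is Youden's J statistic, which selects the threshold that maximizes TPR minus FPR. The sketch below uses toy data and a hypothetical helper `youden_threshold`.

```python
import numpy as np

def youden_threshold(y_true, scores):
    """Return the threshold maximizing J = TPR - FPR, and that J value."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    best_threshold, best_j = None, -np.inf
    # Every distinct score is a candidate threshold
    for threshold in np.unique(scores):
        predicted = scores >= threshold
        tpr = np.sum(predicted & (y_true == 1)) / n_pos
        fpr = np.sum(predicted & (y_true == 0)) / n_neg
        j = tpr - fpr
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j

# Toy labels and predicted probabilities for the positive class
y_true_toy = [0, 0, 0, 1, 1, 0, 1, 1]
scores_toy = [0.1, 0.2, 0.3, 0.4, 0.55, 0.6, 0.8, 0.9]

best_t, best_j = youden_threshold(y_true_toy, scores_toy)
print(f"best threshold={best_t:.2f}, J={best_j:.2f}")
```

With the arrays returned by `roc_curve` in Task 5, the same idea can be written as `thresholds[np.argmax(tpr - fpr)]`. Whether this is the right criterion depends on the relative cost of false positives and false negatives.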

### Task 7 (Optional): Multiclass Classification

We use the Iris dataset to perform multiclass classification with one-vs-rest logistic regression and evaluate the model using a classification report (which includes accuracy) and a confusion matrix.

In [None]:
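from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# One-vs-rest logistic regression; the multi_class='ovr' argument is
# deprecated in recent scikit-learn releases, so wrap the estimator instead
multi_clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))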
multi_clf.fit(X_train, y_train)
y_pred_multi = multi_clf.predict(X_test)

print("Classification Report:", classification_report(y_test, y_pred_multi), sep="\n")
print("Confusion Matrix:", confusion_matrix(y_test, y_pred_multi), sep="\n")
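
In a one-vs-rest scheme, one binary classifier per class produces a "this class vs. the rest" probability, and the final label is the class with the highest score. The following sketch shows that combination step on made-up probabilities; `ovr_predict` is a hypothetical helper and the numbers are not model outputs.

```python
import numpy as np

def ovr_predict(binary_scores):
    """Combine per-class one-vs-rest probabilities into labels.

    binary_scores has shape (n_samples, n_classes); column k holds the
    probability from the "class k vs. rest" classifier.
    """
    scores = np.asarray(binary_scores, dtype=float)
    # Rows need not sum to 1, so normalize them before reporting probabilities
    normalized = scores / scores.sum(axis=1, keepdims=True)
    return normalized.argmax(axis=1), normalized

# Made-up outputs of three binary classifiers for three samples
binary_scores_toy = [
    [0.90, 0.20, 0.10],
    [0.10, 0.60, 0.50],
    [0.05, 0.30, 0.70],
]

labels, probabilities = ovr_predict(binary_scores_toy)
print("predicted labels:", labels)
print("normalized probabilities:")
print(probabilities.round(3))
```

Normalizing does not change which class wins; it only turns the independent binary scores into values that sum to 1 per sample.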