# Q27: Regression and Classification Model Comparison

- Select one regression and one classification problem (from separate datasets).
- For each, implement two different models (e.g., Linear vs. Non-linear Regression; KNN vs. Logistic Regression).
- Compare their performance (R² for regression, Accuracy for classification).
- Analyze strengths and weaknesses of each model based on dataset characteristics.

In [1]:
# Install required packages (uncomment if needed)
# !pip install numpy pandas scikit-learn



In [2]:
import pandas as pd
from sklearn.datasets import fetch_california_housing, load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

In [3]:
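# --- Regression Problem: California Housing ---
housing = fetch_california_housing()
# Use median income (MedInc) as the only predictor, so both models see the same single feature
X_reg = pd.DataFrame(housing.data, columns=housing.feature_names)[['MedInc']]
y_reg = housing.target  # median house value, in units of $100,000
# Hold out 20% of the rows for testing
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)
# Fit the scaler on the training split only, then apply the same transform to the test split
scaler_reg = StandardScaler()
X_train_reg_scaled = scaler_reg.fit_transform(X_train_reg)
X_test_reg_scaled = scaler_reg.transform(X_test_reg)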

In [4]:
# Linear Regression
lin_reg = LinearRegression()
lin_reg.fit(X_train_reg_scaled, y_train_reg)
y_pred_lin = lin_reg.predict(X_test_reg_scaled)

In [5]:
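# Polynomial Regression (degree=2)
# Expand the single scaled feature x into the columns [1, x, x^2];
# the constant column is redundant with LinearRegression's own intercept but harmless
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train_reg_scaled)
X_test_poly = poly.transform(X_test_reg_scaled)
# Ordinary least squares on the expanded features gives a quadratic fit in MedInc
poly_reg = LinearRegression()
poly_reg.fit(X_train_poly, y_train_reg)
y_pred_poly = poly_reg.predict(X_test_poly)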
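
To make the degree-2 expansion concrete, here is a minimal sketch on a toy input. The array `X_toy` and its values are hypothetical and unrelated to the housing data; the only point is to show which columns `PolynomialFeatures` produces, with and without the constant column.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy single-feature input (hypothetical values, not taken from the housing data)
X_toy = np.array([[2.0], [3.0]])

# Default settings prepend a constant column of ones
expanded_default = PolynomialFeatures(degree=2).fit_transform(X_toy)

# include_bias=False drops the constant column and keeps only x and x^2
expanded_no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_toy)

print(expanded_default)   # columns: 1, x, x^2
print(expanded_no_bias)   # columns: x, x^2
```

Because `LinearRegression` fits its own intercept by default, either variant should lead to the same fitted curve; dropping the constant column only removes a redundant feature.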

In [6]:
print('Regression Results:')
print('Linear Regression R²:', r2_score(y_test_reg, y_pred_lin))
print('Polynomial Regression R²:', r2_score(y_test_reg, y_pred_poly))

Regression Results:
Linear Regression R²: 0.45885918903846656
Polynomial Regression R²: 0.46331772769346224
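
An R² near 0.46 means each model explains a little under half of the variance in the test targets. As a reminder of what the metric measures, the sketch below computes R² by hand as 1 - SS_res / SS_tot and compares it with `r2_score`. The arrays `y_true` and `y_hat` are made-up values for illustration, not results from this notebook.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up targets and predictions (hypothetical, for illustration only)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2_manual = 1.0 - ss_res / ss_tot

# Both values should agree up to floating-point error
print(r2_manual, r2_score(y_true, y_hat))
```

A model that always predicts the mean of the targets scores 0, and a model that does worse than that scores below 0.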


In [7]:
# --- Classification Problem: Iris Dataset ---
iris = load_iris()
X_clf = iris.data
y_clf = iris.target
X_train_clf, X_test_clf, y_train_clf, y_test_clf = train_test_split(X_clf, y_clf, test_size=0.2, random_state=42, stratify=y_clf)
scaler_clf = StandardScaler()
X_train_clf_scaled = scaler_clf.fit_transform(X_train_clf)
X_test_clf_scaled = scaler_clf.transform(X_test_clf)


In [8]:
# Logistic Regression
log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train_clf_scaled, y_train_clf)
y_pred_log = log_reg.predict(X_test_clf_scaled)


In [9]:
# KNN Classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_clf_scaled, y_train_clf)
y_pred_knn = knn.predict(X_test_clf_scaled)

In [10]:
print('Classification Results:')
print('Logistic Regression Accuracy:', accuracy_score(y_test_clf, y_pred_log))
print('KNN Accuracy:', accuracy_score(y_test_clf, y_pred_knn))

Classification Results:
Logistic Regression Accuracy: 0.9333333333333333
KNN Accuracy: 0.9333333333333333
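
With only 30 test samples, one misclassified flower moves accuracy by about 0.033, so an exact tie on a single split says little about which model is better. One possible follow-up, sketched here and not part of the original task, is stratified 5-fold cross-validation on the full Iris data. The fold count, the seed, and the names `models` and `cv_scores` are choices made for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Keeping the scaler inside each pipeline refits it on every training fold,
# so no statistics from the held-out fold leak into preprocessing
models = {
    'Logistic Regression': make_pipeline(StandardScaler(), LogisticRegression(max_iter=200)),
    'KNN (k=5)': make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Stratified folds keep the three classes balanced in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring='accuracy')
    for name, model in models.items()
}

for name, scores in cv_scores.items():
    print(f'{name}: mean={scores.mean():.3f}, std={scores.std():.3f}')
```

If the two means differ by less than their fold-to-fold standard deviation, the models are probably indistinguishable on this dataset.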


In [11]:
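# Strengths and Weaknesses Discussion
print('Regression: with MedInc as the only feature, the degree-2 polynomial barely improves on the linear fit (R² 0.463 vs. 0.459). This suggests the relationship is close to linear and that the low R² comes mainly from missing features, not missing curvature. Linear regression is simple and interpretable but underfits non-linear data; polynomial regression can fit curvature but may overfit at higher degrees.')
print('Classification: logistic regression and KNN tie at 0.933 accuracy (28 of 30 test samples), so this split cannot separate them. Logistic regression works well when classes are close to linearly separable; KNN can capture complex boundaries but is sensitive to noise, feature scaling, and the choice of k.')

Regression: with MedInc as the only feature, the degree-2 polynomial barely improves on the linear fit (R² 0.463 vs. 0.459). This suggests the relationship is close to linear and that the low R² comes mainly from missing features, not missing curvature. Linear regression is simple and interpretable but underfits non-linear data; polynomial regression can fit curvature but may overfit at higher degrees.
Classification: logistic regression and KNN tie at 0.933 accuracy (28 of 30 test samples), so this split cannot separate them. Logistic regression works well when classes are close to linearly separable; KNN can capture complex boundaries but is sensitive to noise, feature scaling, and the choice of k.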
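
The overfitting risk of polynomial regression can be illustrated with a small experiment. The sketch below uses synthetic data (a noisy quadratic, an assumption made purely for illustration, not the housing data) and fits degrees 1, 2, and 8. Because each lower-degree model is a special case of the higher-degree one, training R² should not decrease as the degree grows; test R² carries no such guarantee.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption for illustration): a noisy quadratic in one feature
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(60, 1))
y = 0.5 * x[:, 0] ** 2 + x[:, 0] + rng.normal(scale=1.0, size=60)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=0)

degree_scores = {}
for degree in (1, 2, 8):
    # The same least-squares fit, with a progressively richer feature expansion
    model = make_pipeline(PolynomialFeatures(degree=degree, include_bias=False), LinearRegression())
    model.fit(x_train, y_train)
    degree_scores[degree] = (
        r2_score(y_train, model.predict(x_train)),  # fit quality on seen data
        r2_score(y_test, model.predict(x_test)),    # fit quality on unseen data
    )

for degree, (train_r2, test_r2) in degree_scores.items():
    print(f'degree={degree}: train R²={train_r2:.3f}, test R²={test_r2:.3f}')
```

A widening gap between training and test R² at the higher degree would be the usual sign of overfitting, though the exact numbers depend on the seed and the noise level.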
