### 1. Q&A 

#### 1. Analytical Solution to the Regression Problem (Vector Form)

**The loss function** in linear regression is the squared norm of the residuals between the predicted and actual values. Strictly, this is the sum of squared errors; the MSE divides it by n, which does not change the minimizer:

$$L(w) = ||Xw - y||^2$$

Where:
- X is the design matrix of input features of shape n x m with n being the number of samples and m the number of features.
- w is the vector of weights of shape m x 1.
- y is the target vector of shape n x 1.

The norm $||\cdot||^2$ is the **squared Euclidean norm**, which is equivalent to the dot product of the vector with itself, hence:

$$L(w) = (Xw - y)^T(Xw -y)$$

Apply the distributive property:

$$L(w)= (Xw)^T (Xw) - (Xw)^T y - y^T (Xw) + y^T y$$

Since $(Xw)^T y$ is a scalar (a single number), it equals its own transpose, $(Xw)^T y = y^T X w$, so the two middle terms combine:

$$L(w)= w^T X^T X w - 2 y^T X w + y^T y$$

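To find the optimal weights, take the gradient of the loss with respect to w, using $\frac{\partial}{\partial w}(w^T A w) = 2Aw$ for the symmetric matrix $A = X^T X$ and $\frac{\partial}{\partial w}(y^T X w) = X^T y$: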

$$\frac{\partial L(w)}{\partial w} = 2 X^T X w - 2 X^T y$$

Set the derivative to zero:

$$2 X^T X w - 2 X^T y = 0$$

Simplify:

$$X^T X w = X^T y$$

Solve for w, assuming $X^T X$ is invertible:

$$w = (X^T X)^{-1} X^T y$$

This is the **Normal Equation** for linear regression.
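
As a quick numerical check of the derivation, here is a minimal sketch that solves $X^T X w = X^T y$ on synthetic data. The helper name `solve_normal_equation` and the toy arrays are illustrative assumptions; `np.linalg.solve` is used instead of an explicit inverse, which gives the same result when $X^T X$ is invertible.

```python
import numpy as np

def solve_normal_equation(X, y):
    """Return w satisfying X^T X w = X^T y (assumes X^T X is invertible)."""
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic, noise-free data (hypothetical values for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ w_true

w_hat = solve_normal_equation(X_demo, y_demo)
print(w_hat)  # should recover w_true up to floating-point error
```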

---

#### 2. Changes When L1 and L2 Regularizations Are Added

**L1 Regularization (Lasso Regression)**

Adds the sum of the absolute values of the weights to the loss function:

$$L(w) = ||Xw - y||^2 + \lambda ||w||_1$$

Where $||w||_1$ is the L1 norm:

$$||w||_1 = \sum_{i=1}^{m} |w_i|$$

L1 regularization encourages sparsity in the model weights, i.e., many weights become exactly zero.

**L2 Regularization (Ridge Regression)**

Adds the sum of the squared values of the weights to the loss function:

$$L(w) = ||Xw - y||^2 + \lambda ||w||_2^2$$

Where $||w||_2^2$ is the squared L2 norm:

$$||w||_2^2 = \sum_{i=1}^{m} w_i^2$$

L2 regularization penalizes large weight values but does not set them exactly to zero.
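
Unlike the L1 case, the L2-penalized loss keeps a closed-form minimizer: setting its gradient $2X^TXw - 2X^Ty + 2\lambda w$ to zero gives $w = (X^TX + \lambda I)^{-1}X^Ty$. The sketch below illustrates this under simplifying assumptions (no intercept, synthetic data, a hypothetical helper name `ridge_closed_form`).

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||_2^2 (no intercept term)."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (hypothetical values for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 0.0, 3.0]) + rng.normal(scale=0.1, size=50)

w_ols = ridge_closed_form(X_demo, y_demo, lam=0.0)     # plain least squares
w_ridge = ridge_closed_form(X_demo, y_demo, lam=10.0)  # shrunk toward zero

print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The ridge weights have a smaller norm than the unregularized ones, but none of them is exactly zero.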

---

#### 3. Why is L1 Regularization Often Used for Feature Selection?

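L1 regularization encourages sparsity in the weight vector because the gradient of its penalty has constant magnitude $\lambda$ for every nonzero weight, no matter how small that weight is. Small weights are therefore pushed all the way to zero rather than merely shrunk, as happens with L2, whose penalty gradient $2\lambda w_i$ vanishes as $w_i \to 0$. The optimization process often drives some weights to exactly **0**, effectively excluding the corresponding features from the model.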

**Why Many Weights are 0 After Model Fitting**
- L1 regularization creates a sharp corner at zero in the loss function landscape, leading to many coefficients being driven precisely to zero.
- This results in a sparse model that selects only the most relevant features.
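
The effect can be seen in a one-dimensional toy problem. For a single weight, the minimizer of $(w - z)^2 + \lambda |w|$ is the soft-thresholding rule $\text{sign}(z)\max(|z| - \lambda/2, 0)$, while the minimizer of $(w - z)^2 + \lambda w^2$ is $z / (1 + \lambda)$. The sketch below compares the two; the function names and the values in `z` are illustrative assumptions.

```python
import numpy as np

def l1_shrink(z, lam):
    # Minimizer of (w - z)**2 + lam * |w|: soft-thresholding
    return np.sign(z) * np.maximum(np.abs(z) - lam / 2.0, 0.0)

def l2_shrink(z, lam):
    # Minimizer of (w - z)**2 + lam * w**2: proportional shrinkage
    return z / (1.0 + lam)

z = np.array([3.0, 0.4, -0.2, -2.0])  # hypothetical unpenalized solutions
w_l1 = l1_shrink(z, lam=1.0)
w_l2 = l2_shrink(z, lam=1.0)

print(w_l1)  # small entries are set exactly to zero
print(w_l2)  # every entry is scaled down but stays nonzero
```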


---

#### 4. Using Linear Models for Nonlinear Dependencies

Linear models such as Linear Regression, Ridge, and Lasso can handle nonlinear relationships by transforming the input features. Some common techniques include:

**Feature Engineering with Polynomial Features**

Transform the original features into polynomial terms:

$$x_1, x_1^2, x_2, x_1 \cdot x_2, \dots$$

This can be done with the `PolynomialFeatures` transformer from `sklearn.preprocessing`.
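
A minimal sketch of the idea on synthetic data: a target that is quadratic in `x` is fitted exactly by an ordinary linear model once the squared term is added as a feature. The data and variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free quadratic relationship (hypothetical values)
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 0] ** 2

# Expand x into the columns [x, x^2]; the intercept is left to the model
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)

model = LinearRegression().fit(x_poly, y)
print(model.intercept_, model.coef_)
```

The model stays linear in its weights; only the inputs are transformed.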

### 2. Introduction - preprocessing 

In [80]:
import pandas as pd
import numpy as np
import sklearn as scikit

In [81]:
df_train = pd.read_json('./data/train.json')
df_test = pd.read_json('./data/test.json')

In [82]:
df_train.shape

(49352, 15)

In [83]:
df_test.shape

(74659, 14)

In [84]:
missing_cols = set(df_train.columns) - set(df_test.columns)
print('Missing columns in test dataset:', missing_cols)

Missing columns in test dataset: {'interest_level'}


In [85]:
df_train['interest_level'].head(5)

4     medium
6        low
9     medium
10    medium
15       low
Name: interest_level, dtype: object

In [86]:
mapping = {'low': 0, 'medium': 1, 'high': 2}
df_train['interest_level'] = df_train['interest_level'].map(mapping)

In [87]:
df_train['interest_level'].head(5)

4     1
6     0
9     1
10    1
15    0
Name: interest_level, dtype: int64

In [88]:
df_test['interest_level'] = np.zeros(len(df_test)).astype(int)

In [89]:
df_test['interest_level'].head(5)

0    0
1    0
2    0
3    0
5    0
Name: interest_level, dtype: int64

In [90]:
df_test.shape

(74659, 15)

### 3. Data analysis part 2

In [91]:
df_train['features']

4         [Dining Room, Pre-War, Laundry in Building, Di...
6         [Doorman, Elevator, Laundry in Building, Dishw...
9         [Doorman, Elevator, Laundry in Building, Laund...
10                                                       []
15        [Doorman, Elevator, Fitness Center, Laundry in...
                                ...                        
124000              [Elevator, Dishwasher, Hardwood Floors]
124002    [Common Outdoor Space, Cats Allowed, Dogs Allo...
124004    [Dining Room, Elevator, Pre-War, Laundry in Bu...
124008    [Pre-War, Laundry in Unit, Dishwasher, No Fee,...
124009    [Dining Room, Elevator, Laundry in Building, D...
Name: features, Length: 49352, dtype: object

In [92]:
df_train.shape

(49352, 15)

In [93]:
df_test.shape

(74659, 15)

Join each list of features into a single comma-separated string and strip any stray brackets and quotes from the column.

In [94]:
df_train['features'] = df_train['features'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)
df_train['features'] = df_train['features'].replace(r"[\[\]'\"]", "", regex=True)

df_test['features'] = df_test['features'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)
df_test['features'] = df_test['features'].replace(r"[\[\]'\"]", "", regex=True)

Split values in each row with the separator "," and collect the result in one huge list for the whole dataset.

In [95]:
all_features = []
for _, row in df_train.iterrows():
    features = row['features'].split(',') if isinstance(row['features'], str) else []
    all_features.extend([feature.strip() for feature in features if feature.strip()])

In [96]:
len(all_features)

267906

How many unique values does the resulting list contain?

In [97]:
unique_features = set(all_features)
len(unique_features)

1553

Count the most popular features in the combined list and keep the top 20 for now.

In [98]:
from collections import Counter

feature_counts = Counter(all_features)
top_20_features = feature_counts.most_common(20)

In [99]:
top_20_features

[('Elevator', 25915),
 ('Cats Allowed', 23540),
 ('Hardwood Floors', 23527),
 ('Dogs Allowed', 22035),
 ('Doorman', 20898),
 ('Dishwasher', 20426),
 ('No Fee', 18062),
 ('Laundry in Building', 16344),
 ('Fitness Center', 13252),
 ('Pre-War', 9148),
 ('Laundry in Unit', 8738),
 ('Roof Deck', 6542),
 ('Outdoor Space', 5268),
 ('Dining Room', 5136),
 ('High Speed Internet', 4299),
 ('Balcony', 2992),
 ('Swimming Pool', 2730),
 ('Laundry In Building', 2593),
 ('New Construction', 2559),
 ('Terrace', 2283)]

In [100]:
for feature, _ in top_20_features:
  df_train[feature] = df_train['features'].apply(lambda x: 1 if feature in x else 0) 
  df_test[feature] = df_test['features'].apply(lambda y: 1 if feature in y else 0) 

top_20_feature_names = [feature_name for feature_name, _ in top_20_features]
additional_features = ['bathrooms', 'bedrooms', 'interest_level']

feature_list = top_20_feature_names + additional_features
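
Note that `feature in x` in the loop above is a substring test on the joined string, so a listing tagged only `Common Outdoor Space` also sets the `Outdoor Space` column. A sketch of an exact-match alternative follows; the helper `has_exact_feature` and the toy frame `demo` are hypothetical, and this variant is not what produced the results reported in this notebook.

```python
import pandas as pd

def has_exact_feature(joined, feature):
    """True if `feature` is one of the comma-separated items in `joined`."""
    items = [item.strip() for item in joined.split(',')]
    return feature in items

# Toy data (hypothetical listings)
demo = pd.DataFrame({'features': ['Common Outdoor Space, Elevator', 'Outdoor Space', '']})

# Substring test, as in the loop above
demo['substring'] = demo['features'].apply(lambda s: int('Outdoor Space' in s))
# Exact item match
demo['exact'] = demo['features'].apply(lambda s: int(has_exact_feature(s, 'Outdoor Space')))

print(demo)
```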

In [101]:
target = 'price'

X_train = df_train[feature_list]
y_train = df_train[target]

X_test = df_test[feature_list]
y_test = df_test[target]

### 4. Models implementation — Linear regression

In [102]:
class LinearRegressionSGD():
  def __init__(self, learning_rate=0.01, n_epochs=10):
    self.learning_rate = learning_rate
    self.n_epochs = n_epochs
    self.weights = None
    self.bias = 0

  def fit(self, X, y):
    X = np.array(X)
    y = np.array(y)

    n_samples, n_features = X.shape
    self.weights = np.zeros(n_features)

    # Stochastic gradient descent
    for epoch in range(self.n_epochs):
      for i in range(n_samples):
        # Prediction for current sample
        y_pred = np.dot(X[i], self.weights) + self.bias

        # Calculate gradients
        error = y_pred - y[i]
        dW = 2 * X[i] * error
        db = 2 * error

        # Update weights and bias
        self.weights -= self.learning_rate * dW
        self.bias -= self.learning_rate * db

  def predict(self, X):
    X = np.array(X)
    return np.dot(X, self.weights) + self.bias
  

class LinearRegressionAnalytical():
    def __init__(self):
      self.weights = None
      self.bias = 0

    def fit(self, X, y):
      X = np.array(X)
      y = np.array(y)

      # Analytical solution using Normal Equation
      X_b = np.c_[np.ones((X.shape[0], 1)), X]  # Add bias column
      self.weights = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

    def predict(self, X):
      X = np.array(X)
      X_b = np.c_[np.ones((X.shape[0], 1)), X]  # Add bias column
      return X_b.dot(self.weights)

In [103]:
from sklearn.metrics import mean_absolute_error, root_mean_squared_error

def r2_score(y_true, y_pred):
  ss_total = np.sum((y_true - np.mean(y_true)) ** 2)
  ss_residual = np.sum((y_true - y_pred) ** 2)
  r2 = (1 - (ss_residual / ss_total))
  return r2

def train_predict_and_evaluate(model, X_train, y_train, X_test, y_test):
    model.fit(X_train, y_train)
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)

    metrics = {
        "MAE": (mean_absolute_error(y_train, y_train_pred), mean_absolute_error(y_test, y_test_pred)),
        "RMSE": (root_mean_squared_error(y_train, y_train_pred), root_mean_squared_error(y_test, y_test_pred)),
        "R2": (r2_score(y_train, y_train_pred), r2_score(y_test, y_test_pred))
    }
    
    return metrics

In [104]:
from sklearn.linear_model import LinearRegression as SklearnLinearRegression

models = [
    ('SGD', LinearRegressionSGD()),
    ('Analytical', LinearRegressionAnalytical()),
    ('Sklearn', SklearnLinearRegression())
]

results_1 = {metric: {'Model': [], 'Train': [], 'Test': []} for metric in ["MAE", "RMSE", "R2"]}

# Train models and store results
for model_name, model in models:
    metrics = train_predict_and_evaluate(model, X_train, y_train, X_test, y_test)

    for metric, values in metrics.items():
        results_1[metric]['Model'].append(model_name)
        results_1[metric]['Train'].append(values[0])
        results_1[metric]['Test'].append(values[1])

    print(f'{model_name}: Training and evaluation completed')

SGD: Training and evaluation completed
Analytical: Training and evaluation completed
Sklearn: Training and evaluation completed


In [105]:
results_MAE = pd.DataFrame(results_1["MAE"])
results_RMSE = pd.DataFrame(results_1["RMSE"])
results_R2Score = pd.DataFrame(results_1["R2"])

print("\nMean Absolute Error (MAE):\n", results_MAE)
print("\nRoot Mean Squared Error (RMSE):\n", results_RMSE)
print("\nR2 Score:\n", results_R2Score)


Mean Absolute Error (MAE):
         Model        Train         Test
0         SGD  1660.872537  1617.860103
1  Analytical  1135.890034  1123.607781
2     Sklearn  1135.890034  1123.607781

Root Mean Squared Error (RMSE):
         Model         Train         Test
0         SGD  22037.194417  9705.770805
1  Analytical  21992.794926  9618.787306
2     Sklearn  21992.794926  9618.787306

R2 Score:
         Model     Train      Test
0         SGD  0.002667  0.001493
1  Analytical  0.006682  0.019311
2     Sklearn  0.006682  0.019311


### 5. Regularized models implementation — Ridge, Lasso, ElasticNet


In [106]:
class RidgeRegression:
    def __init__(self, alpha=1.0, learning_rate=0.01, epochs=1000):
        self.alpha = alpha  # Regularization strength (L2)
        self.learning_rate = learning_rate  # Learning rate
        self.epochs = epochs
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.epochs):
            y_pred = np.dot(X, self.weights) + self.bias
            dw = (1 / n_samples) * (np.dot(X.T, (y_pred - y)) + self.alpha * self.weights)
            db = (1 / n_samples) * np.sum(y_pred - y)

            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias


class LassoRegression:
    def __init__(self, alpha=1.0, learning_rate=0.01, epochs=1000):
        self.alpha = alpha  # Regularization strength (L1)
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.epochs):
            y_pred = np.dot(X, self.weights) + self.bias
            dw = (1 / n_samples) * (np.dot(X.T, (y_pred - y))) + self.alpha * np.sign(self.weights)  # L1 term
            db = (1 / n_samples) * np.sum(y_pred - y)

            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias


class ElasticNetRegression:
    def __init__(self, alpha=1.0, l1_ratio=0.5, learning_rate=0.01, epochs=1000):
        self.alpha = alpha  # Regularization strength
        self.l1_ratio = l1_ratio  # L1 vs L2 ratio
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.epochs):
            y_pred = np.dot(X, self.weights) + self.bias
            l1_term = self.l1_ratio * np.sign(self.weights)  # L1
            l2_term = (1 - self.l1_ratio) * self.weights  # L2
            dw = (1 / n_samples) * np.dot(X.T, (y_pred - y)) + self.alpha * (l1_term + l2_term)
            db = (1 / n_samples) * np.sum(y_pred - y)

            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias


In [107]:
from sklearn.linear_model import Ridge, Lasso, ElasticNet

models = [
    ('Ridge (Custom)', RidgeRegression(alpha=1.0)),
    ('Lasso (Custom)', LassoRegression(alpha=1.0)),
    ('ElasticNet (Custom)', ElasticNetRegression(alpha=1.0, l1_ratio=0.5)),
    ('Ridge (Sklearn)', Ridge(alpha=1.0)),
    ('Lasso (Sklearn)', Lasso(alpha=1.0)),
    ('ElasticNet (Sklearn)', ElasticNet(alpha=1.0, l1_ratio=0.5))
]

results_2 = {metric: {'Model': [], 'Train': [], 'Test': []} for metric in ["MAE", "RMSE", "R2"]}

# Train models and store results
for model_name, model in models:
    metrics = train_predict_and_evaluate(model, X_train, y_train, X_test, y_test)

    for metric, values in metrics.items():
        results_2[metric]['Model'].append(model_name)
        results_2[metric]['Train'].append(values[0])
        results_2[metric]['Test'].append(values[1])

    print(f'{model_name}: Training and evaluation completed')


Ridge (Custom): Training and evaluation completed
Lasso (Custom): Training and evaluation completed
ElasticNet (Custom): Training and evaluation completed
Ridge (Sklearn): Training and evaluation completed
Lasso (Sklearn): Training and evaluation completed
ElasticNet (Sklearn): Training and evaluation completed


In [108]:
df_MAE = pd.DataFrame(results_2["MAE"])
df_RMSE = pd.DataFrame(results_2["RMSE"])
df_R2 = pd.DataFrame(results_2["R2"])

print("\nMean Absolute Error (MAE):\n", df_MAE)
print("\nRoot Mean Squared Error (RMSE):\n", df_RMSE)
print("\nR2 Score:\n", df_R2)


Mean Absolute Error (MAE):
                   Model        Train         Test
0        Ridge (Custom)  1097.540267  1112.070263
1        Lasso (Custom)  1095.980136  1111.063318
2   ElasticNet (Custom)  1066.756122  1075.869141
3       Ridge (Sklearn)  1135.844098  1123.574955
4       Lasso (Sklearn)  1132.050014  1120.776753
5  ElasticNet (Sklearn)  1090.801634  1108.878278

Root Mean Squared Error (RMSE):
                   Model         Train         Test
0        Ridge (Custom)  21994.203113  9611.089481
1        Lasso (Custom)  21994.249651  9611.049927
2   ElasticNet (Custom)  22010.835837  9610.589277
3       Ridge (Sklearn)  21992.794927  9618.778133
4       Lasso (Sklearn)  21992.804021  9618.658296
5  ElasticNet (Sklearn)  22012.208427  9613.895725

R2 Score:
                   Model     Train      Test
0        Ridge (Custom)  0.006555  0.020880
1        Lasso (Custom)  0.006551  0.020888
2   ElasticNet (Custom)  0.005052  0.020982
3       Ridge (Sklearn)  0.006682  0.01931

### 6. Feature normalization

When is Feature Normalization Mandatory?

1. Gradient Descent Optimization

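   - If features are on different scales, the loss surface becomes elongated, so a single learning rate is too large for some directions and too small for others; gradient descent then converges slowly or oscillates.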
   - Example: Predicting house prices with area (sq ft) ranging from 500–5000 and number of bedrooms ranging from 1–5.

2. Distance-Based Algorithms (e.g., KNN, K-Means)

   - Features with larger scales dominate distance computations, leading to biased results.
   - Example: In K-Nearest Neighbors (KNN), Euclidean distance depends on feature magnitude.


When is Feature Normalization NOT Needed?

1. Tree-Based Models (e.g., Decision Trees, Random Forest, XGBoost)

   - Tree models split on feature values rather than using distances or gradients, so scaling doesn’t impact performance.

2. Already Normalized Data

   - If the data is already on a similar scale (e.g., percentages ranging from 0 to 100), normalization may not be necessary.

**Min-Max Scaling**

The **MinMaxScaler** rescales features to a range of **[0,1]** using the following formula:

$$X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

- $X_{\min}$ : Minimum value of the feature  
- $X_{\max}$ : Maximum value of the feature  

In [109]:
from sklearn.preprocessing import MinMaxScaler

class MinMax_Scaler():
    def __init__(self):
        self.X_min = None
        self.X_max = None
    
    def fit_transform(self, X_train):
        self.X_min = np.min(X_train, axis = 0)
        self.X_max = np.max(X_train, axis = 0)
        X_scaled = (X_train - self.X_min) / (self.X_max - self.X_min)
        return X_scaled
    
    def transform(self, X_test):
        X_scaled = (X_test - self.X_min) / (self.X_max - self.X_min)
        return X_scaled

# Custom
custom_scaler = MinMax_Scaler()
X_train_scaled_custom = custom_scaler.fit_transform(X_train)
X_test_scaled_custom = custom_scaler.transform(X_test)

# Sklearn method
scaler = MinMaxScaler()
X_train_scaled_sklearn = scaler.fit_transform(X_train)
X_test_scaled_sklearn = scaler.transform(X_test)

print(np.allclose(X_train_scaled_custom, X_train_scaled_sklearn))
print(np.allclose(X_test_scaled_custom, X_test_scaled_sklearn))

True
True
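
**Standardization**

For reference, the standardization (z-score) transform compared in the next cell can be written in the same notation as the min-max formula. This is the standard definition, stated here on the assumption that the statistics come from the training set only:

$$X_{\text{scaled}} = \frac{X - \mu}{\sigma}$$

- $\mu$ : Mean of the feature  
- $\sigma$ : Standard deviation of the feature (population form, which is what `np.std` computes by default)  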


In [110]:
from sklearn.preprocessing import StandardScaler

class Standard_Scaler:
    def __init__(self):
        self.mean = None
        self.std = None

    def fit_transform(self, X_train):
        self.mean = np.mean(X_train, axis=0)
        self.std = np.std(X_train, axis=0)
        X_scaled = (X_train - self.mean) / self.std
        return X_scaled

    def transform(self, X_test):
        X_scaled = (X_test - self.mean) / self.std
        return X_scaled

# Custom
scaler_custom = Standard_Scaler()
X_train_standardized_custom = scaler_custom.fit_transform(X_train)
X_test_standardized_custom = scaler_custom.transform(X_test)

# Sklearn method
scaler_std = StandardScaler()
X_train_standardized_sklearn = scaler_std.fit_transform(X_train)
X_test_standardized_sklearn = scaler_std.transform(X_test)

print(np.allclose(X_train_standardized_custom, X_train_standardized_sklearn))
print(np.allclose(X_test_standardized_custom, X_test_standardized_sklearn))

True
True


### 7. Fit models with normalization

In [111]:
def train_predict_and_evaluate(model, X_train, y_train, X_test, y_test):
    model.fit(X_train, y_train)
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)

    metrics = {
        "MAE": (mean_absolute_error(y_train, y_train_pred), mean_absolute_error(y_test, y_test_pred)),
        "RMSE": (root_mean_squared_error(y_train, y_train_pred), root_mean_squared_error(y_test, y_test_pred)),
        "R2": (r2_score(y_train, y_train_pred), r2_score(y_test, y_test_pred))
    }
    
    return metrics

def machine_learning(scalers, models, results):
    # Note: relies on the notebook-level X_train, y_train, X_test, y_test.
    poly_cols = ['bathrooms', 'bedrooms', 'interest_level']

    for scaler_name, scaler in scalers.items():
        for model_name, model in models:

            if scaler_name == 'Default':
                metrics = train_predict_and_evaluate(model, X_train, y_train, X_test, y_test)

            elif scaler_name in ('MinMaxScaler', 'StandardScaler'):
                # Fit the scaler on the training data only, then reuse it for the test data
                X_train_scaled = scaler.fit_transform(X_train)
                X_test_scaled = scaler.transform(X_test)
                metrics = train_predict_and_evaluate(model, X_train_scaled, y_train, X_test_scaled, y_test)

            elif scaler_name == 'Polynomial':
                # The polynomial expansion uses only the three non-binary columns
                X_train_poly = scaler.fit_transform(X_train[poly_cols])
                X_test_poly = scaler.transform(X_test[poly_cols])
                metrics = train_predict_and_evaluate(model, X_train_poly, y_train, X_test_poly, y_test)

            for metric, values in metrics.items():
                results[metric]['Model'].append(model_name + f' {scaler_name}')
                results[metric]['Train'].append(values[0])
                results[metric]['Test'].append(values[1])

            print(f'{model_name} {scaler_name}: Training and evaluation completed')

In [112]:
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.preprocessing import MinMaxScaler, StandardScaler

models = [
    ('Linreg', LinearRegression()),
    ('Ridge', Ridge()),
    ('Lasso', Lasso()),
    ('ElasticNet', ElasticNet())
]

scalers = {
    'Default': None,
    'MinMaxScaler': MinMaxScaler(),
    'StandardScaler': StandardScaler()
}

results = {metric: {'Model': [], 'Train': [], 'Test': []} for metric in ["MAE", "RMSE", "R2"]}

machine_learning(scalers, models, results)

Linreg Default: Training and evaluation completed
Ridge Default: Training and evaluation completed
Lasso Default: Training and evaluation completed
ElasticNet Default: Training and evaluation completed
Linreg MinMaxScaler: Training and evaluation completed
Ridge MinMaxScaler: Training and evaluation completed
Lasso MinMaxScaler: Training and evaluation completed
ElasticNet MinMaxScaler: Training and evaluation completed
Linreg StandardScaler: Training and evaluation completed
Ridge StandardScaler: Training and evaluation completed
Lasso StandardScaler: Training and evaluation completed
ElasticNet StandardScaler: Training and evaluation completed


In [113]:
result_MAE = pd.DataFrame(results["MAE"])
result_RMSE = pd.DataFrame(results["RMSE"])
result_R2 = pd.DataFrame(results["R2"])

print("\nMAE Results:\n", result_MAE)
print("\nRMSE Results:\n", result_RMSE)
print("\nR2 Results:\n", result_R2)


MAE Results:
                         Model        Train         Test
0              Linreg Default  1135.890034  1123.607781
1               Ridge Default  1135.844098  1123.574955
2               Lasso Default  1132.050014  1120.776753
3          ElasticNet Default  1090.801634  1108.878278
4         Linreg MinMaxScaler  1135.890034  1123.607781
5          Ridge MinMaxScaler  1135.838563  1123.883187
6          Lasso MinMaxScaler  1131.372213  1120.726491
7     ElasticNet MinMaxScaler  1436.094703  1389.342907
8       Linreg StandardScaler  1135.890034  1123.607781
9        Ridge StandardScaler  1135.879137  1123.599364
10       Lasso StandardScaler  1134.175075  1122.270001
11  ElasticNet StandardScaler  1051.474456  1066.145360

RMSE Results:
                         Model         Train         Test
0              Linreg Default  21992.794926  9618.787306
1               Ridge Default  21992.794927  9618.778133
2               Lasso Default  21992.804021  9618.658296
3          El

### 8. Overfit models



In [114]:
from sklearn.preprocessing import PolynomialFeatures

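# Note: tol=1e+7 is an extremely loose tolerance, so the iterative solvers for
# Lasso and ElasticNet may stop after very few iterations; 1e-7 may have been
# intended. The value is kept as-is so the outputs below remain reproducible.
models = [
    ('Linreg', LinearRegression()),
    ('Ridge', Ridge(alpha=1.0, max_iter=5000, tol=1e+7)),
    ('Lasso', Lasso(alpha=0.01, max_iter=5000, tol=1e+7)),
    ('ElasticNet', ElasticNet(alpha=0.01, max_iter=5000, tol=1e+7))
]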

scalers = {
    'Default': None,
    'MinMaxScaler': MinMaxScaler(),
    'StandardScaler': StandardScaler(),
    'Polynomial' : PolynomialFeatures(degree=3)
}

new_results = {metric: {'Model': [], 'Train': [], 'Test': []} for metric in ["MAE", "RMSE", "R2"]}

machine_learning(scalers, models, new_results)

Linreg Default: Training and evaluation completed
Ridge Default: Training and evaluation completed
Lasso Default: Training and evaluation completed
ElasticNet Default: Training and evaluation completed
Linreg MinMaxScaler: Training and evaluation completed
Ridge MinMaxScaler: Training and evaluation completed
Lasso MinMaxScaler: Training and evaluation completed
ElasticNet MinMaxScaler: Training and evaluation completed
Linreg StandardScaler: Training and evaluation completed
Ridge StandardScaler: Training and evaluation completed
Lasso StandardScaler: Training and evaluation completed
ElasticNet StandardScaler: Training and evaluation completed
Linreg Polynomial: Training and evaluation completed
Ridge Polynomial: Training and evaluation completed
Lasso Polynomial: Training and evaluation completed
ElasticNet Polynomial: Training and evaluation completed


In [115]:
new_result_MAE = pd.DataFrame(new_results["MAE"])
new_result_RMSE = pd.DataFrame(new_results["RMSE"])
new_result_R2 = pd.DataFrame(new_results["R2"])

print("\nMAE Results:\n", new_result_MAE)
print("\nRMSE Results:\n", new_result_RMSE)
print("\nR2 Results:\n", new_result_R2)


MAE Results:
                         Model        Train         Test
0              Linreg Default  1135.890034  1123.607781
1               Ridge Default  1135.844098  1123.574955
2               Lasso Default  1215.067096  1190.027368
3          ElasticNet Default  1203.315777  1181.054646
4         Linreg MinMaxScaler  1135.890034  1123.607781
5          Ridge MinMaxScaler  1135.838563  1123.883187
6          Lasso MinMaxScaler  1215.038720  1190.006705
7     ElasticNet MinMaxScaler  1178.110858  1190.503046
8       Linreg StandardScaler  1135.890034  1123.607781
9        Ridge StandardScaler  1135.879137  1123.599364
10       Lasso StandardScaler  1215.087176  1190.043330
11  ElasticNet StandardScaler  1212.664431  1188.129744
12          Linreg Polynomial  1031.797352  4133.626544
13           Ridge Polynomial  1031.725328  4124.360428
14           Lasso Polynomial  1096.174484  1446.850512
15      ElasticNet Polynomial  1094.056284  1452.540141

RMSE Results:
                  

### 9. Native models

In [116]:
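# Note: the 'Native mean' and 'Native median' rows are the mean and median of the
# metric values across the models above, not predictions of a constant model.
for metric in results: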
    mean_train = np.mean(results[metric]['Train'])
    median_train = np.median(results[metric]['Train'])

    mean_test = np.mean(results[metric]['Test'])
    median_test = np.median(results[metric]['Test'])

    results[metric]['Model'].append('Native mean')
    results[metric]['Train'].append(mean_train)
    results[metric]['Test'].append(mean_test)

    results[metric]['Model'].append('Native median')
    results[metric]['Train'].append(median_train)
    results[metric]['Test'].append(median_test)
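
For comparison, a naive baseline is often understood as a model that always predicts the training-set mean or median of the target. A minimal sketch of that reading, using scikit-learn's `DummyRegressor` on made-up numbers (the arrays below are hypothetical, not the rental data):

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error

# Toy targets (hypothetical values)
y_tr = np.array([1000.0, 2000.0, 3000.0, 10000.0])
y_te = np.array([1500.0, 2500.0])
X_tr = np.zeros((len(y_tr), 1))  # DummyRegressor ignores the features
X_te = np.zeros((len(y_te), 1))

baseline_mean = DummyRegressor(strategy='mean').fit(X_tr, y_tr)
baseline_median = DummyRegressor(strategy='median').fit(X_tr, y_tr)

mae_mean = mean_absolute_error(y_te, baseline_mean.predict(X_te))
mae_median = mean_absolute_error(y_te, baseline_median.predict(X_te))
print(mae_mean, mae_median)
```

With a skewed target such as price, the median baseline is typically less affected by a few very large values than the mean baseline.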

In [117]:
result_MAE = pd.DataFrame(results["MAE"])
result_RMSE = pd.DataFrame(results["RMSE"])
result_R2 = pd.DataFrame(results["R2"])

print("\nMAE Results:\n", result_MAE)
print("\nRMSE Results:\n", result_RMSE)
print("\nR2 Results:\n", result_R2)


MAE Results:
                         Model        Train         Test
0              Linreg Default  1135.890034  1123.607781
1               Ridge Default  1135.844098  1123.574955
2               Lasso Default  1132.050014  1120.776753
3          ElasticNet Default  1090.801634  1108.878278
4         Linreg MinMaxScaler  1135.890034  1123.607781
5          Ridge MinMaxScaler  1135.838563  1123.883187
6          Lasso MinMaxScaler  1131.372213  1120.726491
7     ElasticNet MinMaxScaler  1436.094703  1389.342907
8       Linreg StandardScaler  1135.890034  1123.607781
9        Ridge StandardScaler  1135.879137  1123.599364
10       Lasso StandardScaler  1134.175075  1122.270001
11  ElasticNet StandardScaler  1051.474456  1066.145360
12                Native mean  1149.266666  1139.168387
13              Native median  1135.841331  1123.587160

RMSE Results:
                         Model         Train         Test
0              Linreg Default  21992.794926  9618.787306
1              

### 10. Compare results

In [118]:

# Determine the best model (lowest MAE and RMSE, highest R2)
best_mae_model = result_MAE.loc[result_MAE['Test'].idxmin()]
best_rmse_model = result_RMSE.loc[result_RMSE['Test'].idxmin()]
best_r2_model = result_R2.loc[result_R2['Test'].idxmax()]

# Calculate the overall ranking of models based on MAE, RMSE, and R2
mae_rank = result_MAE['Test'].rank()
rmse_rank = result_RMSE['Test'].rank()
r2_rank = result_R2['Test'].rank(ascending=False)  # Higher R2 is better, so rank in descending order

# Add ranks to the DataFrames for comparison
result_MAE['Rank'] = mae_rank
result_RMSE['Rank'] = rmse_rank
result_R2['Rank'] = r2_rank

# Combine ranks for the overall model rank
combined_rank = (result_MAE['Rank'] + result_RMSE['Rank'] + result_R2['Rank']) / 3

# Find the model with the lowest combined rank
best_model_overall = result_MAE.loc[combined_rank.idxmin()]  # idxmin returns an index label, so use .loc

print(f"\nBest Model (Overall): {best_model_overall['Model']}")

# Determine the most stable model (smallest difference between train and test)
stability = result_MAE['Train'] - result_MAE['Test']
most_stable_model = result_MAE.loc[stability.abs().idxmin()]  # idxmin returns an index label, so use .loc

print(f"Most Stable Model: {most_stable_model['Model']}")



Best Model (Overall): ElasticNet StandardScaler
Most Stable Model: Native mean


### 11. Additional Task

#### Log Transformation

**Why Apply Log Transformation?**

**Reduces Skewness:**

- If the target variable has a right-skewed (long-tailed) distribution, taking the logarithm makes it closer to a normal distribution, which many machine learning models handle better.

**Stabilizes Variance:**

- Reduces the impact of outliers by compressing large values.

**Improves Model Performance:**

- Many regression models (like Linear Regression) assume homoscedasticity (constant variance). Log transformation helps satisfy this assumption.

In [124]:
from sklearn.metrics import mean_squared_error 

# Apply log transformation to target
y_train_log = np.log(y_train)
y_test_log = np.log(y_test)

model_log = LinearRegression()
model_log.fit(X_train, y_train_log)

# Make predictions on the test set (in the log-transformed space)
y_pred_log = model_log.predict(X_test)

# Inverse transformation to get predictions back to the original scale
y_pred = np.exp(y_pred_log)

# Metrics
mae_log = mean_absolute_error(y_test, y_pred)
rmse_log = np.sqrt(mean_squared_error(y_test, y_pred))
r2_log = r2_score(y_test, y_pred)

print(f"MAE (Log-transformed): {mae_log}")
print(f"RMSE (Log-transformed): {rmse_log}")
print(f"R² (Log-transformed): {r2_log}")


MAE (Log-transformed): 20906901046043.13
RMSE (Log-transformed): 5712559650092251.0
R² (Log-transformed): -3.4590141858145414e+23
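
Errors in log space are amplified by the inverse transform: a prediction that is too high by a few units in log space becomes too high by orders of magnitude after `np.exp`, so a handful of extreme predictions can dominate the metrics on the original scale. One way to keep the transform and its inverse paired is scikit-learn's `TransformedTargetRegressor`. The sketch below uses synthetic data and `np.log1p`/`np.expm1`; both are assumptions for illustration, not what the cell above ran.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic right-skewed target that is exactly log-linear in the features
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 3, size=(300, 2))
y_demo = np.expm1(1.0 + 0.8 * X_demo[:, 0] + 0.3 * X_demo[:, 1])

model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log1p,          # applied to y before fitting
    inverse_func=np.expm1,  # applied to the regressor's predictions
)
model.fit(X_demo, y_demo)

# Predictions are returned on the original scale
y_pred_demo = model.predict(X_demo)
print(model.regressor_.coef_)
```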


#### Remove outliers from Training Data

**Why Remove Outliers Only from Training Data?**


**Prevent Model Bias**

- Outliers pull the regression line towards them, leading to a biased model that does not generalize well.
- By removing outliers from training data, we allow the model to learn a more stable, representative relationship between features and target variables.

**Maintain Real-World Testing Conditions**

- If you remove outliers from the test set, you create a false sense of model performance.
- The model should be evaluated on real-world data, which includes outliers, since they may appear when making future predictions.

**Avoid Data Leakage**

- Filtering the test set by the target value relies on information that is not available at prediction time, so removing outliers from both training and test sets makes the reported performance artificially optimistic.

In [126]:
# Calculate IQR for the target variable
Q1 = np.percentile(y_train, 25)
Q3 = np.percentile(y_train, 75)
IQR = Q3 - Q1

# Define bounds for outliers (outside 1.5*IQR range)
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Remove outliers from training data
mask = (y_train >= lower_bound) & (y_train <= upper_bound)
X_train_clean, y_train_clean = X_train[mask], y_train[mask]

# Train model on the cleaned training data
model_clean = LinearRegression()
model_clean.fit(X_train_clean, y_train_clean)

# Make predictions on the test set (no outliers removed from test data)
y_pred_clean = model_clean.predict(X_test)

# Calculate evaluation metrics
mae_clean = mean_absolute_error(y_test, y_pred_clean)
rmse_clean = np.sqrt(mean_squared_error(y_test, y_pred_clean))
r2_clean = r2_score(y_test, y_pred_clean)

print(f"MAE (Cleaned Training Data): {mae_clean}")
print(f"RMSE (Cleaned Training Data): {rmse_clean}")
print(f"R² (Cleaned Training Data): {r2_clean}")


MAE (Cleaned Training Data): 953.758551133676
RMSE (Cleaned Training Data): 9614.606682368274
R² (Cleaned Training Data): 0.020162870226281715


#### Implementing Linear Regression with Batch Training (Gradient Descent)

In **batch training**, the model updates its weights once per pass, using the gradient averaged over the **entire dataset** (or a large subset). Each update is a single vectorized operation with low-variance gradients, in contrast to stochastic gradient descent, which updates after every single data point; the trade-off is that every update requires a full pass over the data.

The **linear regression equation** is:

$$
y = Xw + b
$$

where:

- $X$ is the feature matrix,
- $w$ is the weight vector (coefficients),
- $b$ is the bias (intercept),
- $y$ is the target variable.

To find the best parameters $w$ and $b$, we minimize the **Mean Squared Error (MSE)**:

$$\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2$$

where $m$ is the number of samples.

**Implement Batch Gradient Descent**

Update rules (the constant factor 2 from differentiating the square is absorbed into the learning rate):

$$w = w - \alpha \frac{1}{m} X^T (Xw + b - y)$$

$$b = b - \alpha \frac{1}{m} \sum (Xw + b - y)$$

where $\alpha$ (alpha) is the learning rate.

In [130]:
class BatchLinearRegression:
    def __init__(self, learning_rate=0.01, epochs=1000):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        m, n = X.shape
        self.weights = np.zeros(n)
        self.bias = 0

        for epoch in range(self.epochs):
            y_pred = np.dot(X, self.weights) + self.bias

            # Compute gradients
            dw = (1 / m) * np.dot(X.T, (y_pred - y))
            db = (1 / m) * np.sum(y_pred - y)

            # Update weights and bias
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

# Train model using batch gradient descent
model_batch = BatchLinearRegression(learning_rate=0.01, epochs=1000)
model_batch.fit(X_train, y_train)

y_pred_batch = model_batch.predict(X_test)

# Metrics
mae_batch = mean_absolute_error(y_test, y_pred_batch)
rmse_batch = np.sqrt(mean_squared_error(y_test, y_pred_batch))
r2_batch = r2_score(y_test, y_pred_batch)

print(f"MAE (Batch Gradient Descent): {mae_batch}")
print(f"RMSE (Batch Gradient Descent): {rmse_batch}")
print(f"R² (Batch Gradient Descent): {r2_batch}")


MAE (Batch Gradient Descent): 1112.0820351168907
RMSE (Batch Gradient Descent): 9611.092557891541
R² (Batch Gradient Descent): 0.0208789973648551
