# Imports
<hr>

In [14]:
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error

import numpy as np
import pandas as pd

import warnings
from sklearn.exceptions import ConvergenceWarning

from joblib import dump

# Ignore both user warnings and convergence warnings
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Load Data
<hr>

In [2]:
df = pd.read_csv('DATA/final_df.csv')

# Splitting Data
<hr>

In [3]:
X = df.drop('SalePrice', axis=1)
y = df.SalePrice

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=101)

# Evaluation Metric
<hr>

In [5]:
def custom_mean_absolute_error(model, X_test, y_test):
    """
    Calculate the Mean Absolute Error (MAE) for a regression model.

    Parameters:
    - model: A trained regression model.
    - X_test: The test input data.
    - y_test: The true target values for the test data.

    Returns:
    - The MAE for the model's predictions.
    """
    preds = model.predict(X_test)

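    # Convert predictions and true values back from the log scale.
    # np.expm1(x) computes exp(x) - 1, the inverse of np.log1p.
    preds_exp = np.expm1(preds)
    y_test_exp = np.expm1(y_test)

    # Calculate the MAE on the original price scale
    mae = mean_absolute_error(y_test_exp, preds_exp)

    # round() works whether mae is a Python float or a NumPy scalar
    return round(float(mae), 3)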
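
The back-transformation in the function above can be illustrated on a few made-up numbers. This is a minimal sketch that assumes the target was created with `np.log1p` (which the `exp(x) - 1` inverse suggests). The prices and prediction offsets are hypothetical and do not come from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical sale prices in dollars (not taken from the dataset)
prices = np.array([100_000.0, 200_000.0, 300_000.0])
y_log = np.log1p(prices)  # assumed forward transform: log(1 + price)

# Pretend predictions on the log scale, each off by a small amount
preds_log = y_log + np.array([0.05, -0.02, 0.0])

# np.expm1 undoes np.log1p, so the error is measured in dollars
mae_dollars = mean_absolute_error(np.expm1(y_log), np.expm1(preds_log))

# The same error measured directly on the log scale, for comparison
mae_log = mean_absolute_error(y_log, preds_log)

print(round(float(mae_dollars), 3))
print(round(float(mae_log), 3))
```

Reporting the error in dollars rather than in log units makes it easier to interpret, which is presumably why the metric converts back before computing the MAE.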

# ElasticNet
<hr>

### Set Up the Model

In [6]:
linear_model = ElasticNet(max_iter=10000,
                          warm_start=True,
                          random_state=101)

# Note: the alpha grid starts at 0, which disables the penalty entirely
params = {'l1_ratio': np.linspace(0, 1, 20),
          'alpha': np.arange(0, 100, 5)}

grid_search = GridSearchCV(estimator=linear_model,
                           param_grid=params,
                           scoring="neg_mean_squared_error", 
                           n_jobs=-1,
                           cv=5,
                           verbose=2)

### Fitting the Model

In [7]:
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 400 candidates, totalling 2000 fits


### Models Ranking - Top 25

In [9]:
pd.DataFrame(grid_search.cv_results_).sort_values('rank_test_score')[:25]

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_alpha,param_l1_ratio,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,26.448099,1.946356,0.014793,0.002305,0,0.0,"{'alpha': 0, 'l1_ratio': 0.0}",-0.015659,-0.017701,-0.013111,-0.022526,-0.021818,-0.018163,0.003589,1
19,22.801443,1.939693,0.019244,0.007451,0,1.0,"{'alpha': 0, 'l1_ratio': 1.0}",-0.015659,-0.017701,-0.013111,-0.022526,-0.021818,-0.018163,0.003589,1
18,22.948343,1.859966,0.018142,0.002982,0,0.947368,"{'alpha': 0, 'l1_ratio': 0.9473684210526315}",-0.015659,-0.017701,-0.013111,-0.022526,-0.021818,-0.018163,0.003589,1
17,23.988342,1.791202,0.015548,0.001017,0,0.894737,"{'alpha': 0, 'l1_ratio': 0.894736842105263}",-0.015659,-0.017701,-0.013111,-0.022526,-0.021818,-0.018163,0.003589,1
16,24.288048,2.166158,0.016078,0.001572,0,0.842105,"{'alpha': 0, 'l1_ratio': 0.8421052631578947}",-0.015659,-0.017701,-0.013111,-0.022526,-0.021818,-0.018163,0.003589,1
15,24.63072,2.196367,0.015748,0.000747,0,0.789474,"{'alpha': 0, 'l1_ratio': 0.7894736842105263}",-0.015659,-0.017701,-0.013111,-0.022526,-0.021818,-0.018163,0.003589,1
14,24.110188,1.791248,0.017159,0.004244,0,0.736842,"{'alpha': 0, 'l1_ratio': 0.7368421052631579}",-0.015659,-0.017701,-0.013111,-0.022526,-0.021818,-0.018163,0.003589,1
12,24.470965,1.963144,0.023459,0.014187,0,0.631579,"{'alpha': 0, 'l1_ratio': 0.631578947368421}",-0.015659,-0.017701,-0.013111,-0.022526,-0.021818,-0.018163,0.003589,1
11,24.385789,1.963482,0.015657,0.001529,0,0.578947,"{'alpha': 0, 'l1_ratio': 0.5789473684210527}",-0.015659,-0.017701,-0.013111,-0.022526,-0.021818,-0.018163,0.003589,1
10,23.827146,2.30884,0.015349,0.001491,0,0.526316,"{'alpha': 0, 'l1_ratio': 0.5263157894736842}",-0.015659,-0.017701,-0.013111,-0.022526,-0.021818,-0.018163,0.003589,1


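From the results, the top-ranked rows all have alpha = 0 and identical scores for every l1_ratio value. This is expected: the ElasticNet penalty is alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2, so with alpha = 0 the penalty vanishes, the model reduces to ordinary least squares, and l1_ratio has no effect. The reported best configuration (alpha = 0, l1_ratio = 0) is therefore just one of several tied candidates, and it says nothing about whether L2 (Ridge) or L1 (Lasso) regularization suits the data better. What the results do show is that no candidate with alpha of 5 or more scored better than the unregularized fit on this grid.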
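
The tie between l1_ratio values at alpha = 0 can be reproduced on synthetic data. The sketch below is illustrative only: the data comes from `make_regression`, not from the housing dataset, and the names `X_demo` and `y_demo` are made up for this example.

```python
import warnings

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

# Synthetic data, used only for illustration
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

coefs = []
with warnings.catch_warnings():
    # scikit-learn warns that alpha=0 is discouraged for this solver
    warnings.simplefilter("ignore")
    for l1_ratio in (0.0, 0.5, 1.0):
        model = ElasticNet(alpha=0, l1_ratio=l1_ratio, max_iter=10000)
        model.fit(X_demo, y_demo)
        coefs.append(model.coef_)

# With alpha=0 both penalty terms are zero, so l1_ratio should not matter
same = all(np.allclose(coefs[0], c) for c in coefs[1:])
print(same)
```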

### Best Model

In [10]:
grid_search.best_params_

{'alpha': 0, 'l1_ratio': 0.0}

In [11]:
best_model = grid_search.best_estimator_

### Mean Absolute Error

In [15]:
MAE = custom_mean_absolute_error(best_model, X_test, y_test)

In [16]:
MAE

13254.906

# Saving The Model
<hr>

In [17]:
dump(best_model, "Model/0-alpha 0-l1_ratio elasticnet")

['Model/0-alpha 0-l1_ratio elasticnet']
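
A model saved with `dump` can be restored later with `joblib.load`. The following is a minimal round-trip sketch. It uses a small stand-in model and a hypothetical file name in a temporary directory, so it does not touch the file saved above.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import ElasticNet

# Small stand-in model; the notebook's best_model would take its place
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([0.0, 1.0, 2.0, 3.0])
model = ElasticNet(alpha=0.1).fit(X_demo, y_demo)

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "elasticnet_demo.joblib")
dump(model, path)

# Reload the model and check that it predicts the same values
restored = load(path)
roundtrip_ok = np.allclose(restored.predict(X_demo), model.predict(X_demo))
print(roundtrip_ok)
```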

# Conclusion
<hr>

In this analysis of the Ames Housing dataset, we developed and evaluated a regression model to predict housing prices. After data preprocessing, feature engineering, and a grid search over ElasticNet hyperparameters, the selected model (alpha = 0, which is effectively an unregularized linear regression) achieved a Mean Absolute Error (MAE) of 13,254.906 on the test data.

In other words, the model's predictions deviate from the true sale prices by about $13,255 on average. Whether this level of accuracy is satisfactory depends on domain-specific factors and project requirements.

This analysis underscores the importance of fine-tuning and further model improvement. Future work may involve experimenting with different features, hyperparameter tuning, and exploring more advanced regression techniques to reduce prediction errors.
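
As one possible direction for the hyperparameter tuning mentioned above, the search could use strictly positive, log-spaced alpha values instead of steps of 5. The sketch below is an assumption about what such a grid might look like, not a result from this analysis. It runs on synthetic data, and the grid values are hypothetical starting points.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train / y_train
X_demo, y_demo = make_regression(n_samples=150, n_features=20,
                                 n_informative=5, noise=5.0,
                                 random_state=101)

# Assumed grid: positive, log-spaced alphas and l1_ratio values above 0
param_grid = {
    "alpha": np.logspace(-4, 1, 6),
    "l1_ratio": [0.1, 0.5, 0.9, 1.0],
}

search = GridSearchCV(
    ElasticNet(max_iter=10000, random_state=101),
    param_grid=param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_demo, y_demo)
print(search.best_params_)
```

A log-spaced grid covers several orders of magnitude, which tends to matter when the target is on a log scale and small penalties already have a noticeable effect.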

Overall, this project serves as a foundation for housing price prediction, and there is room for further refinement to enhance the model's accuracy and utility.

Thank you for joining this journey through the Ames Housing dataset analysis!
