The main goal of this notebook is to home in on which predictive models should be explored further.

**Brief Descriptions of the ML Models that will be used**

* *Linear Regression* - A supervised learning model used for predicting a continuous target variable. It fits a line (or, with multiple input variables, a hyperplane) to the data by minimizing the sum of squared differences between the predicted and actual values (the errors). It works best when the relationship between the input features and the target variable is approximately linear.
* *Support Vector Machine Regression* - A supervised learning model used for predicting a continuous target variable. According to Géron in "Hands-On Machine Learning with Scikit-Learn and TensorFlow", this method tries to fit as many data points as possible within a margin of specified width, while limiting the number of data points that fall outside that margin (Géron, 2017).
* *Decision Tree Regressor* - A supervised learning model used for predicting a continuous target variable. According to Géron, it works by repeatedly splitting the data based on yes/no questions about the features (Géron, 2017). Once no further splits can be made, the prediction for a data point is the average target value of the training data points in the leaf it falls into (Géron, 2017).
* *Random Forest Regressor* - A supervised learning model used for predicting a continuous target variable. As described by Aurélien Géron, it combines multiple decision trees using a technique called bagging (Géron, 2017). With bagging, each tree is trained on a random sample of the training data drawn with replacement, so the same training data point can be sampled several times for the same tree (Géron, 2017). The final prediction is the average of the predictions from all the decision trees (Géron, 2017).
* *AdaBoost Regressor* - A supervised learning model used for predicting a continuous target variable. According to the scikit-learn documentation, it is a meta-estimator that first fits a regressor on the original training data (AdaBoostRegressor). It then fits additional copies of the regressor on the same data, with the weights of the data points adjusted according to the error of the current predictions, so that later regressors focus more on the harder-to-predict points (AdaBoostRegressor).
* *Gradient Boosting Regressor* - A supervised learning model used for predicting a continuous target variable. According to Masui in "All You Need to Know about Gradient Boosting Algorithm − Part 1. Regression", this is achieved by building models iteratively, with each new model fitted to reduce the errors (residuals) left by the previous iterations (Masui, 2024).
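The mechanics described above for the tree-based models can be checked on a small synthetic data set. This is only an illustration under stated assumptions: the data and the names `X`, `y`, and `X_new` are hypothetical, the bootstrap draw merely mimics what bagging does internally, and the boosting loop is a simplified hand-written version of squared-error gradient boosting, not scikit-learn's exact `GradientBoostingRegressor` implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic data, used only to illustrate the mechanics.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = 5 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=300)
X_new = rng.uniform(0, 10, size=(20, 3))

# Decision tree: a prediction is the mean target of the training points in the same leaf.
tree = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X, y)
train_leaves = tree.apply(X)      # leaf index of every training row
new_leaves = tree.apply(X_new)    # leaf index of every new row
leaf_means = np.array([y[train_leaves == leaf].mean() for leaf in new_leaves])

# Random forest: the prediction is the average of the individual trees' predictions.
forest = RandomForestRegressor(n_estimators=25, random_state=42).fit(X, y)
tree_avg = np.mean([est.predict(X_new) for est in forest.estimators_], axis=0)

# Bagging (illustration): a bootstrap sample is drawn with replacement, so rows repeat.
boot_idx = rng.integers(0, len(X), size=len(X))
n_unique = len(np.unique(boot_idx))

# Simplified gradient boosting with squared error: start from the mean,
# then repeatedly fit a small tree to the current residuals.
learning_rate = 0.1
pred = np.full_like(y, y.mean())
initial_mae = np.abs(y - pred).mean()
for _ in range(50):
    residuals = y - pred
    stage = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X, residuals)
    pred += learning_rate * stage.predict(X)
final_mae = np.abs(y - pred).mean()

print(np.allclose(leaf_means, tree.predict(X_new)))
print(np.allclose(tree_avg, forest.predict(X_new)))
print(n_unique, "unique rows out of", len(X))
print(initial_mae, final_mae)
```

Each check compares a hand computation with the fitted estimator; the boosting loop only shows that fitting successive trees to the residuals drives the training error down.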

## Loading in the training and testing data frames

In [18]:
# Importing pandas library and reading in the training and testing CSV files.
import pandas as pd
train = pd.read_csv('train_df.csv')
test = pd.read_csv('test_df.csv')

## Verifying the training and testing data frames look correct

In [19]:
# Viewing first five rows of the train data frame.
train.head()

,Unnamed: 0,Price,Beds,Baths,Square feet,Lot size,Year built,Hoa/month,City_Gardner,City_Leawood,...,Zip or postal code_66215.0,Zip or postal code_66216.0,Zip or postal code_66218.0,Zip or postal code_66219.0,Zip or postal code_66220.0,Zip or postal code_66221.0,Zip or postal code_66223.0,Zip or postal code_66224.0,Zip or postal code_66225.0,Zip or postal code_66226.0
0,1048,350000.0,3.0,3.0,1876.0,11761.0,1995.0,13.0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,406,297000.0,2.0,2.5,1559.0,15001.0,1990.0,0.0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1027,279950.0,3.0,2.0,1344.0,7405.0,1982.0,0.0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,486,299000.0,3.0,1.0,1404.0,11353.0,1950.0,0.0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1485,531000.0,4.0,3.5,3542.0,12553.0,1994.0,20.0,0,0,...,1,0,0,0,0,0,0,0,0,0


In [20]:
# Viewing first five rows of the test data frame.
test.head()

,Unnamed: 0,Price,Beds,Baths,Square feet,Lot size,Year built,Hoa/month,City_Gardner,City_Leawood,...,Zip or postal code_66215.0,Zip or postal code_66216.0,Zip or postal code_66218.0,Zip or postal code_66219.0,Zip or postal code_66220.0,Zip or postal code_66221.0,Zip or postal code_66223.0,Zip or postal code_66224.0,Zip or postal code_66225.0,Zip or postal code_66226.0
0,917,749500.0,6.0,4.0,2884.0,9587.0,2023.0,156.0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,134,939000.0,4.0,4.5,3880.0,11944.0,2016.0,131.0,0,0,...,0,0,0,0,0,0,0,1,0,0
2,390,265000.0,3.0,2.0,1136.0,9877.0,1962.0,0.0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,790,308000.0,3.0,2.0,1808.0,12207.0,1964.0,0.0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1498,345000.0,3.0,2.5,1766.0,14249.0,1972.0,0.0,0,0,...,1,0,0,0,0,0,0,0,0,0


The index of the previously saved data frames was written to the CSV files as an extra column, which now appears as 'Unnamed: 0' in both the training and testing data frames.

In [21]:
# Deleting the leftover index column from both the train and test data frames.
del train['Unnamed: 0']
del test['Unnamed: 0']
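Instead of deleting the column after loading, the extra column can be avoided when the CSV is read or written. This minimal sketch uses a small hypothetical data frame and an in-memory buffer in place of the real files; it shows both `index_col=0` in `pd.read_csv` and `index=False` in `DataFrame.to_csv`. Which option fits depends on how the CSV files were written earlier, which is not shown here.

```python
import io
import pandas as pd

# A small hypothetical frame standing in for the original data.
df = pd.DataFrame({'Price': [350000.0, 297000.0], 'Beds': [3.0, 2.0]}, index=[1048, 406])

# to_csv writes the index by default, so it comes back as an 'Unnamed: 0' column.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_unnamed = pd.read_csv(buf)

# Option 1: tell read_csv that the first column is the index.
buf.seek(0)
as_index = pd.read_csv(buf, index_col=0)

# Option 2: do not write the index in the first place.
buf_no_index = io.StringIO()
df.to_csv(buf_no_index, index=False)
buf_no_index.seek(0)
no_index = pd.read_csv(buf_no_index)

print(list(with_unnamed.columns))  # ['Unnamed: 0', 'Price', 'Beds']
print(list(as_index.columns))      # ['Price', 'Beds']
print(list(no_index.columns))      # ['Price', 'Beds']
```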

In [22]:
# Viewing the first five rows of the train data frame.
train.head()

,Price,Beds,Baths,Square feet,Lot size,Year built,Hoa/month,City_Gardner,City_Leawood,City_Lenexa,...,Zip or postal code_66215.0,Zip or postal code_66216.0,Zip or postal code_66218.0,Zip or postal code_66219.0,Zip or postal code_66220.0,Zip or postal code_66221.0,Zip or postal code_66223.0,Zip or postal code_66224.0,Zip or postal code_66225.0,Zip or postal code_66226.0
0,350000.0,3.0,3.0,1876.0,11761.0,1995.0,13.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,297000.0,2.0,2.5,1559.0,15001.0,1990.0,0.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,279950.0,3.0,2.0,1344.0,7405.0,1982.0,0.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,299000.0,3.0,1.0,1404.0,11353.0,1950.0,0.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,531000.0,4.0,3.5,3542.0,12553.0,1994.0,20.0,0,0,1,...,1,0,0,0,0,0,0,0,0,0


In [23]:
# Viewing the first five rows of the test data frame.
test.head()

,Price,Beds,Baths,Square feet,Lot size,Year built,Hoa/month,City_Gardner,City_Leawood,City_Lenexa,...,Zip or postal code_66215.0,Zip or postal code_66216.0,Zip or postal code_66218.0,Zip or postal code_66219.0,Zip or postal code_66220.0,Zip or postal code_66221.0,Zip or postal code_66223.0,Zip or postal code_66224.0,Zip or postal code_66225.0,Zip or postal code_66226.0
0,749500.0,6.0,4.0,2884.0,9587.0,2023.0,156.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,939000.0,4.0,4.5,3880.0,11944.0,2016.0,131.0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
2,265000.0,3.0,2.0,1136.0,9877.0,1962.0,0.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,308000.0,3.0,2.0,1808.0,12207.0,1964.0,0.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,345000.0,3.0,2.5,1766.0,14249.0,1972.0,0.0,0,0,1,...,1,0,0,0,0,0,0,0,0,0


## Importing libraries/functions for the regressor models.

In [24]:
# Importing predictive models and metrics from scikit-learn.
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score

## Separating target variable from train and test data frames.

In [25]:
# Creating data frame to allow easy comparison for each predictive model.
df_scoring = pd.DataFrame(columns=['method', 'mae', 'r2'])

# Isolating the target variables.
train_target = train['Price']
test_target = test['Price']

# Deleting target variable from train and test dataframes.
del train['Price']
del test['Price']
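The isolate-then-delete pattern above can also be written with `DataFrame.pop`, which removes a column from the frame and returns it as a Series. A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical miniature version of the train data frame.
train = pd.DataFrame({'Price': [350000.0, 297000.0, 279950.0],
                      'Beds': [3.0, 2.0, 3.0],
                      'Baths': [3.0, 2.5, 2.0]})

# pop removes 'Price' from `train` and returns it, combining both steps in one call.
train_target = train.pop('Price')

print(list(train.columns))     # ['Beds', 'Baths']
print(train_target.tolist())   # [350000.0, 297000.0, 279950.0]
```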

## Creating First Models

In [26]:
# Linear Regression
lr1 = LinearRegression()
# Fitting the LinearRegression model.
lr1.fit(train, train_target)
# Predicting the target values for the test data frame.
pred_target = lr1.predict(test)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['lr1', mae, r2]


# Support Vector Regressor
svr1 = SVR()
# Fitting the Support Vector Regressor model.
svr1.fit(train, train_target)
# Predicting the target values for the test data frame.
pred_target = svr1.predict(test)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['svr1', mae, r2]


# Decision Tree Regressor
dt1 = DecisionTreeRegressor(random_state=42)
# Fitting the Decision Tree Regressor model.
dt1.fit(train, train_target)
# Predicting the target values for the test data frame.
pred_target = dt1.predict(test)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['dt1', mae, r2]


# Random Forest Regressor
rf1 = RandomForestRegressor(random_state=42)
# Fitting the Random Forest Regressor model.
rf1.fit(train, train_target)
# Predicting the target values for the test data frame.
pred_target = rf1.predict(test)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['rf1', mae, r2]


# AdaBoost Regressor
abr1 = AdaBoostRegressor(random_state=42)
# Fitting the AdaBoost Regressor model.
abr1.fit(train, train_target)
# Predicting the target values for the test data frame.
pred_target = abr1.predict(test)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['abr1', mae, r2]


# Gradient Boosting Regressor
gb1 = GradientBoostingRegressor(random_state=42)
# Fitting the Gradient Boosting Regressor model.
gb1.fit(train, train_target)
# Predicting the target values for the test data frame.
pred_target = gb1.predict(test)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['gb1', mae, r2]

The first six regressors were fit on the untransformed data using their default hyperparameters. The only setting changed was random_state=42 for the models that accept one (the Decision Tree, Random Forest, AdaBoost, and Gradient Boosting regressors); fixing the random seed makes each model produce the same results every time it is retrained on the same data.
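The six model blocks above repeat the same fit, predict, and score steps. One way to reduce that repetition is to wrap the steps in a function; this sketch uses a hypothetical `evaluate` helper and synthetic data, neither of which comes from the original notebook. It also fits the same Random Forest configuration twice with random_state=42 to show that the scores repeat exactly.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score

def evaluate(name, model, X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit `model`, predict the test set, return [name, MAE, R^2]."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return [name, mean_absolute_error(y_test, pred), r2_score(y_test, pred)]

# Hypothetical synthetic data standing in for the housing data frames.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 300_000 + 50_000 * X[:, 0] + 10_000 * rng.normal(size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# Two fits with the same random_state; collecting rows in a list and building
# the data frame once avoids growing it row by row.
rows = [
    evaluate(name, RandomForestRegressor(n_estimators=50, random_state=42),
             X_train, y_train, X_test, y_test)
    for name in ['rf_a', 'rf_b']
]
df_scoring = pd.DataFrame(rows, columns=['method', 'mae', 'r2'])
print(df_scoring)
```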

In [27]:
# Sorting through scoring data frame by 'mae' to view the best model
df_scoring.sort_values(by='mae')

,method,mae,r2
3,rf1,73200.433223,0.650028
5,gb1,74790.902238,0.652966
2,dt1,104059.510417,0.475586
0,lr1,108526.958143,0.589164
1,svr1,187945.811692,-0.046302
4,abr1,189474.587733,0.401013


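After an initial fit of each of the six regressor models, the Random Forest regressor had the lowest mean absolute error (MAE), followed by the Gradient Boosting regressor, although the Gradient Boosting regressor had a slightly higher R² (0.653 vs. 0.650). Every other model had an MAE greater than 100,000 dollars, with the AdaBoost and Support Vector Machine regressors performing the worst. The Support Vector Machine regressor's R² was negative, meaning it predicted worse than a constant prediction equal to the mean test price.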
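Two details in this table are worth unpacking: a model can rank first by MAE but not by R², and R² can be negative. The worked example below uses five hypothetical prices (not from the data set) and two constant "models". Predicting the mean for every house gives R² = 0 by definition, while predicting the median gives a lower MAE (150,000 vs. 184,000 dollars) but a negative R², because R² compares squared errors against those of the mean.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical prices in dollars, not taken from the data set.
y_true = np.array([250_000, 300_000, 350_000, 400_000, 900_000], dtype=float)

# Constant prediction equal to the mean (440,000): R^2 is 0 by definition.
mean_pred = np.full_like(y_true, y_true.mean())
# Constant prediction equal to the median (350,000): lower MAE, but R^2 < 0.
median_pred = np.full_like(y_true, np.median(y_true))

mae_mean = mean_absolute_error(y_true, mean_pred)      # 184000.0
r2_mean = r2_score(y_true, mean_pred)                  # 0.0
mae_median = mean_absolute_error(y_true, median_pred)  # 150000.0
r2_median = r2_score(y_true, median_pred)              # about -0.146

print(mae_mean, r2_mean, mae_median, r2_median)
```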

## Transforming the predictor variables (version two)

In [28]:
# Importing packages/libraries for transformations.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# Creating a transformation pipeline: a log(1 + x) transformation first, followed by standard scaling.
pipe1 = Pipeline([
    ('log', FunctionTransformer(func=np.log1p)),
    ('scaler', StandardScaler()),
    ])

# Retrieving the column names.
c_names = train.columns

# Fitting the pipeline with the train data frame, transforming the train dataframe using the fitted pipeline, and transforming the test data frame using the fitted pipeline.
train = pipe1.fit_transform(train)
test = pipe1.transform(test)

# Creating new train and test data frames with column names the same as before.
train = pd.DataFrame(train, columns=c_names)
test = pd.DataFrame(test, columns=c_names)

## Creating second iteration regressors

In [29]:
# Linear Regression Model 2
lr2 = LinearRegression()
# Fitting the LinearRegression model.
lr2.fit(train, train_target)
# Predicting the target values for the test data frame.
pred_target = lr2.predict(test)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['lr2', mae, r2]


# Support Vector Regressor Model 2
svr2 = SVR()
# Fitting the Support Vector Regressor model.
svr2.fit(train, train_target)
# Predicting the target values for the test data frame.
pred_target = svr2.predict(test)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['svr2', mae, r2]


# Decision Tree Regressor Model 2
dt2 = DecisionTreeRegressor(random_state=42)
# Fitting the Decision Tree Regressor model.
dt2.fit(train, train_target)
# Predicting the target values for the test data frame.
pred_target = dt2.predict(test)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['dt2', mae, r2]


# Random Forest Regressor Model 2
rf2 = RandomForestRegressor(random_state=42)
# Fitting the Random Forest Regressor model.
rf2.fit(train, train_target)
# Predicting the target values for the test data frame.
pred_target = rf2.predict(test)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['rf2', mae, r2]


# AdaBoost Regressor Model 2
abr2 = AdaBoostRegressor(random_state=42)
# Fitting the AdaBoost Regressor model.
abr2.fit(train, train_target)
# Predicting the target values for the test data frame.
pred_target = abr2.predict(test)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['abr2', mae, r2]


# Gradient Boosting Regressor Model 2
gb2 = GradientBoostingRegressor(random_state=42)
# Fitting the Gradient Boosting Regressor model.
gb2.fit(train, train_target)
# Predicting the target values for the test data frame.
pred_target = gb2.predict(test)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['gb2', mae, r2]


For the next six regressors, each predictor variable was first transformed with NumPy's np.log1p, which computes log(1 + x) and therefore also handles the many zero values (for example in Hoa/month and the 0/1 dummy columns), and then standardized with scikit-learn's StandardScaler. Standard scaling subtracts the variable's mean from each value and divides the result by the variable's standard deviation, z = (x - μ) / σ. Because μ and σ are computed on the training data only, each non-constant training variable ends up with a mean of zero and a standard deviation of one, while the test variables are only approximately standardized. The goal was first to make each variable's distribution closer to normal with the logarithmic transformation, and then to put all variables on the same scale, in the hope of improving the performance of each regressor.
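The two steps can be checked against the formula directly. In this sketch the small training and test arrays are hypothetical stand-ins for two predictor columns; it confirms that the pipeline output equals (log1p(x) - μ) / σ, with μ and the population standard deviation σ taken from the training data and reused for the test data.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical training and test columns (e.g. square feet and HOA per month).
train_X = np.array([[1876.0, 13.0], [1559.0, 0.0], [1344.0, 0.0], [3542.0, 20.0]])
test_X = np.array([[2884.0, 156.0], [1136.0, 0.0]])

pipe = Pipeline([
    ('log', FunctionTransformer(func=np.log1p)),  # log(1 + x), safe for zeros
    ('scaler', StandardScaler()),
])
train_scaled = pipe.fit_transform(train_X)
test_scaled = pipe.transform(test_X)

# Manual version: training mean and population (ddof=0) std, reused for the test set.
log_train = np.log1p(train_X)
mu = log_train.mean(axis=0)
sigma = log_train.std(axis=0)
manual_train = (log_train - mu) / sigma
manual_test = (np.log1p(test_X) - mu) / sigma

print(np.allclose(train_scaled, manual_train), np.allclose(test_scaled, manual_test))
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```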

In [30]:
#Sorting the scoring data frame by 'mae'.
df_scoring.sort_values(by='mae')

,method,mae,r2
9,rf2,73133.843056,0.650328
3,rf1,73200.433223,0.650028
5,gb1,74790.902238,0.652966
11,gb2,75122.177386,0.648561
2,dt1,104059.510417,0.475586
8,dt2,105455.625,0.469367
0,lr1,108526.958143,0.589164
6,lr2,117268.574522,0.526956
7,svr2,187921.613031,-0.046173
1,svr1,187945.811692,-0.046302
4,abr1,189474.587733,0.401013
10,abr2,201291.427928,0.357117


Once again, the Random Forest and Gradient Boosting models performed the best, while the remaining models had mean absolute errors greater than 100,000 dollars. The second AdaBoost model (abr2) performed worse than the first, with a mean absolute error above 200,000 dollars.

In [35]:
# Specifying list of potential first letters for ml model names.
m_types = ['r','g','d','l','s','a']

# Creating for loop to go through every entry in the m_types list.
for i in m_types:
    # Locating all entries with method that starts with the value inside i.
    df_m = df_scoring[df_scoring['method'].str.startswith(i)]
    # Printing all the entries sorted by 'mae'.
    print(df_m.sort_values(by='mae'))
    # Printing out blank line
    print()
    

   method           mae        r2
15    rf3  72499.832648  0.625106
9     rf2  73133.843056  0.650328
3     rf1  73200.433223  0.650028

   method           mae        r2
17    gb3  69954.638629  0.648150
5     gb1  74790.902238  0.652966
11    gb2  75122.177386  0.648561

   method            mae        r2
14    dt3  101483.406561  0.506332
2     dt1  104059.510417  0.475586
8     dt2  105455.625000  0.469367

   method            mae        r2
12    lr3   81196.452519  0.632296
0     lr1  108526.958143  0.589164
6     lr2  117268.574522  0.526956

   method            mae        r2
13   svr3   73780.775082  0.656028
7    svr2  187921.613031 -0.046173
1    svr1  187945.811692 -0.046302

   method            mae        r2
16   abr3   97565.528133  0.595593
4    abr1  189474.587733  0.401013
10   abr2  201291.427928  0.357117



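Contrary to what I expected, not every model performed better with the transformed predictor variables. Comparing the first and second iterations, only the Random Forest and Support Vector Machine regressors had a lower mean absolute error, and only by about 67 and 24 dollars respectively; the remaining regressors all had a higher mean absolute error with the transformed predictors. The small changes for the tree-based models are less surprising in hindsight: they split on feature thresholds, so a transformation that preserves the order of each predictor's values, such as log1p followed by standard scaling, mostly leaves them unchanged. (The output above also includes the third-iteration models because this cell, In [35], was run again after the third-iteration cells below.)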
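The point about tree-based models can be checked with a small synthetic experiment. This sketch assumes hypothetical integer-valued features and a made-up target; it fits the same DecisionTreeRegressor on the raw features and on log1p-plus-scaled features, then compares the predictions.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Hypothetical integer-valued features (think square feet) and a noisy target.
rng = np.random.default_rng(1)
X_train = rng.integers(500, 5000, size=(300, 3)).astype(float)
y_train = 1000 * np.log(X_train[:, 0]) + rng.normal(scale=100, size=300)
X_test = rng.integers(500, 5000, size=(100, 3)).astype(float)

# Same preprocessing as the second iteration: log(1 + x), then standard scaling.
pipe = Pipeline([('log', FunctionTransformer(func=np.log1p)), ('scaler', StandardScaler())])
X_train_t = pipe.fit_transform(X_train)
X_test_t = pipe.transform(X_test)

# Identically configured trees, one per feature space.
tree_raw = DecisionTreeRegressor(max_depth=4, random_state=42).fit(X_train, y_train)
tree_t = DecisionTreeRegressor(max_depth=4, random_state=42).fit(X_train_t, y_train)

# The transformation keeps each feature's ordering, so the training rows are
# partitioned the same way and the training predictions match.
same_train = np.allclose(tree_raw.predict(X_train), tree_t.predict(X_train_t))

# Thresholds sit midway between neighboring training values in each space,
# so an unseen row close to a threshold can land in a different leaf.
n_test_diff = int(np.sum(~np.isclose(tree_raw.predict(X_test), tree_t.predict(X_test_t))))
print(same_train, n_test_diff)
```

On the training rows the two trees agree; any disagreement is confined to unseen rows near a split threshold, which is consistent with the small shifts seen for the tree-based models between the first two iterations.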

## Creating third iteration regressors

In [32]:
# Applying log transformation to the target variable
train_target = np.log(train_target)

# Linear Regression Model 3
lr3 = LinearRegression()
# Fitting the LinearRegression model.
lr3.fit(train, train_target)
# Predicting the target values (log-transformed) for the test data frame.
pred_target_log = lr3.predict(test)
# Reversing the log transformation to get the actual predictions.
pred_target = np.exp(pred_target_log)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['lr3', mae, r2]


# Support Vector Regressor Model 3
svr3 = SVR()
# Fitting the Support Vector Regressor model.
svr3.fit(train, train_target)
# Predicting the target values (log-transformed) for the test data frame.
pred_target_log = svr3.predict(test)
# Reversing the log transformation to get the actual predictions.
pred_target = np.exp(pred_target_log)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['svr3', mae, r2]


# Decision Tree Regressor Model 3
dt3 = DecisionTreeRegressor(random_state=42)
# Fitting the Decision Tree Regressor model.
dt3.fit(train, train_target)
# Predicting the target values (log-transformed) for the test data frame.
pred_target_log = dt3.predict(test)
# Reversing the log transformation to get the actual predictions.
pred_target = np.exp(pred_target_log)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['dt3', mae, r2]


# Random Forest Regressor Model 3
rf3 = RandomForestRegressor(random_state=42)
# Fitting the Random Forest Regressor model.
rf3.fit(train, train_target)
# Predicting the target values (log-transformed) for the test data frame.
pred_target_log = rf3.predict(test)
# Reversing the log transformation to get the actual predictions.
pred_target = np.exp(pred_target_log)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['rf3', mae, r2]


# AdaBoost Regressor Model 3
abr3 = AdaBoostRegressor(random_state=42)
# Fitting the AdaBoost Regressor model.
abr3.fit(train, train_target)
# Predicting the target values (log-transformed) for the test data frame.
pred_target_log = abr3.predict(test)
# Reversing the log transformation to get the actual predictions.
pred_target = np.exp(pred_target_log)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['abr3', mae, r2]


# Gradient Boosting Regressor Model 3
gb3 = GradientBoostingRegressor(random_state=42)
# Fitting the Gradient Boosting Regressor model.
gb3.fit(train, train_target)
# Predicting the target values (log-transformed) for the test data frame.
pred_target_log = gb3.predict(test)
# Reversing the log transformation to get the actual predictions.
pred_target = np.exp(pred_target_log)
# Calculating the mean absolute error.
mae = mean_absolute_error(test_target, pred_target)
# Calculating the r2 score.
r2 = r2_score(test_target, pred_target)
# Updating our scoring dataframe.
df_scoring.loc[len(df_scoring.index)] = ['gb3', mae, r2]

In [33]:
# Sorting the scoring data frame by 'mae'.
df_scoring.sort_values(by='mae')

,method,mae,r2
17,gb3,69954.638629,0.64815
15,rf3,72499.832648,0.625106
9,rf2,73133.843056,0.650328
3,rf1,73200.433223,0.650028
13,svr3,73780.775082,0.656028
5,gb1,74790.902238,0.652966
11,gb2,75122.177386,0.648561
12,lr3,81196.452519,0.632296
16,abr3,97565.528133,0.595593
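14,dt3,101483.406561,0.506332
2,dt1,104059.510417,0.475586
8,dt2,105455.625,0.469367
0,lr1,108526.958143,0.589164
6,lr2,117268.574522,0.526956
7,svr2,187921.613031,-0.046173
1,svr1,187945.811692,-0.046302
4,abr1,189474.587733,0.401013
10,abr2,201291.427928,0.357117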


The difference in this iteration is that the target variable, Price, was also log-transformed before fitting (the predictors kept the transformation from the second iteration), and the predictions were converted back to dollars with np.exp before scoring. As the table shows, every third-iteration model except the Decision Tree regressor now has a mean absolute error under 100,000 dollars. In addition, the third Gradient Boosting regressor (gb3) achieved the lowest mean absolute error overall, ahead of the third Random Forest regressor (rf3). Even so, the Random Forest model stands out because it holds three of the four lowest mean absolute errors, and svr3 has the highest R² of all the models (0.656).
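scikit-learn's `TransformedTargetRegressor` packages the log/exp steps of this iteration into a single estimator. The sketch below uses hypothetical synthetic data and LinearRegression as the inner model, and checks that the wrapper reproduces the manual fit-on-log, predict-with-exp approach. A practical benefit in a notebook is that `train_target` is never overwritten, so re-running a cell cannot log-transform the target twice.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Hypothetical features and positive prices, for illustration only.
rng = np.random.default_rng(2)
X_train = rng.normal(size=(200, 3))
y_train = np.exp(12.5 + 0.4 * X_train[:, 0] + rng.normal(scale=0.1, size=200))
X_test = rng.normal(size=(50, 3))

# Manual approach used in the third iteration: fit on log(y), then exponentiate.
manual = LinearRegression().fit(X_train, np.log(y_train))
pred_manual = np.exp(manual.predict(X_test))

# Wrapped approach: log is applied to y during fit, exp to the predictions.
wrapped = TransformedTargetRegressor(regressor=LinearRegression(),
                                     func=np.log, inverse_func=np.exp)
wrapped.fit(X_train, y_train)
pred_wrapped = wrapped.predict(X_test)

print(np.allclose(pred_manual, pred_wrapped))
```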

In [34]:
# Creating for loop to go through every entry in the m_types list.
for i in m_types:
    # Locating all entries with method that starts with the value inside i.
    df_m = df_scoring[df_scoring['method'].str.startswith(i)]
    # Printing all the entries sorted by 'mae'.
    print(df_m.sort_values(by='mae'))

   method           mae        r2
15    rf3  72499.832648  0.625106
9     rf2  73133.843056  0.650328
3     rf1  73200.433223  0.650028
   method           mae        r2
17    gb3  69954.638629  0.648150
5     gb1  74790.902238  0.652966
11    gb2  75122.177386  0.648561
   method            mae        r2
14    dt3  101483.406561  0.506332
2     dt1  104059.510417  0.475586
8     dt2  105455.625000  0.469367
   method            mae        r2
12    lr3   81196.452519  0.632296
0     lr1  108526.958143  0.589164
6     lr2  117268.574522  0.526956
   method            mae        r2
13   svr3   73780.775082  0.656028
7    svr2  187921.613031 -0.046173
1    svr1  187945.811692 -0.046302
   method            mae        r2
16   abr3   97565.528133  0.595593
4    abr1  189474.587733  0.401013
10   abr2  201291.427928  0.357117


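The third iteration of every type of regressor model also had a lower mean absolute error than the first two, although R² did not always follow: rf3 and gb3 have slightly lower R² values than their earlier versions. The biggest surprise was the Support Vector Machine regressor, whose mean absolute error dropped by over 100,000 dollars in the third iteration. A likely explanation is that SVR's default hyperparameters (C=1.0, epsilon=0.1) depend on the scale of the target: with prices in dollars, the model can barely move away from a near-constant prediction, which matches the negative R² of svr1 and svr2, while log prices lie on a scale where those defaults work reasonably well.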
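The scale explanation for the SVR can be illustrated on synthetic data. In this sketch the features and "prices" are hypothetical, generated so that log price is roughly linear in two features. With default settings, the SVR fitted to dollar prices produces an almost flat prediction (each support vector's dual coefficient is capped at C=1.0, so with 300 training rows the fitted function can move at most a few hundred dollars away from its intercept), while the same defaults fitted to log prices track the data.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical housing-like data: prices in dollars with multiplicative effects.
rng = np.random.default_rng(3)
X = StandardScaler().fit_transform(rng.normal(size=(300, 3)))
price = np.exp(12.5 + 0.4 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.05, size=300))

# Default SVR (C=1.0, epsilon=0.1) fitted directly on dollar prices.
svr_dollars = SVR().fit(X, price)
pred_dollars = svr_dollars.predict(X)

# Same defaults, but fitted on log prices and mapped back to dollars with exp.
svr_log = SVR().fit(X, np.log(price))
pred_from_log = np.exp(svr_log.predict(X))

r2_dollars = r2_score(price, pred_dollars)
r2_log = r2_score(price, pred_from_log)
spread_dollars = pred_dollars.std()
print(f"spread of dollar-target predictions: {spread_dollars:.1f} (price std {price.std():.1f})")
print(f"R^2 dollar target: {r2_dollars:.3f}, R^2 log target: {r2_log:.3f}")
```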

## Conclusion

Based on these results, the Support Vector Machine, Gradient Boosting, and Random Forest regressors will be explored further in the next notebook.

## References
Géron, A. (2017). Hands-On Machine Learning with Scikit-Learn and TensorFlow: Techniques and Tools to Build Learning Machines. O’Reilly Media.

AdaBoostRegressor. (n.d.). scikit-learn. https://scikit-learn.org/dev/modules/generated/sklearn.ensemble.AdaBoostRegressor.html

Masui, T. (2024, February 18). All You Need to Know about Gradient Boosting Algorithm − Part 1. Regression. Medium. https://towardsdatascience.com/all-you-need-to-know-about-gradient-boosting-algorithm-part-1-regression-2520a34a502
