<a href="https://colab.research.google.com/github/GabrielSBotelho/Linear-Regression/blob/main/car_Price.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Author: Gabriel de Sousa Botelho

Education: Computer Science degree from UFC

Contact:
  
  📧 gabrielsbotelho12@gmail.com
  
  👉 [LinkedIn](https://www.linkedin.com/in/gabriel-botelhoo/)
  
  👉 [Github](https://github.com/GabrielSBotelho)


<h1>Linear Regression Project</h1>

Linear regression is one of the most widely used algorithms in data science, and this project aims to deepen knowledge of the linear regression model. The project builds a multivariate linear regression model, that is, one that uses more than one independent variable to predict the value of the *target* variable. To go further into *machine learning* practice, it also applies *cross validation*, which helps assess the quality of the fitted model. With *cross validation* the linear regression model can be tested while varying
  * the train and test split, to check whether the model stays consistent when different parts of the data are used for training and testing;
  * the regression model used; this project also tests LassoLars;
  * the *features* used.

In the end, the goal is a model with satisfactory performance and an understanding of which model and *features* work best for the problem.
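
For reference, a multivariate linear model of the kind described here can be written as follows. The symbols are notation introduced for illustration only: $\hat{y}$ is the predicted price, $x_1, \dots, x_p$ are the features, $\beta_0$ is the intercept, and $\beta_1, \dots, \beta_p$ are the coefficients. Ordinary least squares chooses the coefficients that minimize the squared error over the $n$ training rows.

$$
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j,
\qquad
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
$$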

--- 

## *Dataset* 

The dataset used in this project is [*Car Price Prediction*](https://www.kaggle.com/hellbuoy/car-price-prediction), which contains information about car characteristics and their prices.


## Imports

In [55]:
# Importing libraries
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

In [56]:
path = '/content/drive/MyDrive/Data Science/Dados/Car Price/CarPrice_Assignment.csv'
dataframe = pd.read_csv(path)

In [57]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

## Data Preparation

In [58]:
dataframe.head()

Unnamed: 0,car_ID,symboling,CarName,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,carlength,carwidth,carheight,curbweight,enginetype,cylindernumber,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
0,1,3,alfa-romero giulia,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,13495.0
1,2,3,alfa-romero stelvio,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,16500.0
2,3,1,alfa-romero Quadrifoglio,gas,std,two,hatchback,rwd,front,94.5,171.2,65.5,52.4,2823,ohcv,six,152,mpfi,2.68,3.47,9.0,154,5000,19,26,16500.0
3,4,2,audi 100 ls,gas,std,four,sedan,fwd,front,99.8,176.6,66.2,54.3,2337,ohc,four,109,mpfi,3.19,3.4,10.0,102,5500,24,30,13950.0
4,5,2,audi 100ls,gas,std,four,sedan,4wd,front,99.4,176.6,66.4,54.3,2824,ohc,five,136,mpfi,3.19,3.4,8.0,115,5500,18,22,17450.0


In [59]:
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   car_ID            205 non-null    int64  
 1   symboling         205 non-null    int64  
 2   CarName           205 non-null    object 
 3   fueltype          205 non-null    object 
 4   aspiration        205 non-null    object 
 5   doornumber        205 non-null    object 
 6   carbody           205 non-null    object 
 7   drivewheel        205 non-null    object 
 8   enginelocation    205 non-null    object 
 9   wheelbase         205 non-null    float64
 10  carlength         205 non-null    float64
 11  carwidth          205 non-null    float64
 12  carheight         205 non-null    float64
 13  curbweight        205 non-null    int64  
 14  enginetype        205 non-null    object 
 15  cylindernumber    205 non-null    object 
 16  enginesize        205 non-null    int64  
 1

In [60]:
dataframe.describe()

Unnamed: 0,car_ID,symboling,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
count,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0
mean,103.0,0.834146,98.756585,174.049268,65.907805,53.724878,2555.565854,126.907317,3.329756,3.255415,10.142537,104.117073,5125.121951,25.219512,30.75122,13276.710571
std,59.322565,1.245307,6.021776,12.337289,2.145204,2.443522,520.680204,41.642693,0.270844,0.313597,3.97204,39.544167,476.985643,6.542142,6.886443,7988.852332
min,1.0,-2.0,86.6,141.1,60.3,47.8,1488.0,61.0,2.54,2.07,7.0,48.0,4150.0,13.0,16.0,5118.0
25%,52.0,0.0,94.5,166.3,64.1,52.0,2145.0,97.0,3.15,3.11,8.6,70.0,4800.0,19.0,25.0,7788.0
50%,103.0,1.0,97.0,173.2,65.5,54.1,2414.0,120.0,3.31,3.29,9.0,95.0,5200.0,24.0,30.0,10295.0
75%,154.0,2.0,102.4,183.1,66.9,55.5,2935.0,141.0,3.58,3.41,9.4,116.0,5500.0,30.0,34.0,16503.0
max,205.0,3.0,120.9,208.1,72.3,59.8,4066.0,326.0,3.94,4.17,23.0,288.0,6600.0,49.0,54.0,45400.0


## Exploratory Data Analysis

In [61]:
dataframe.columns

Index(['car_ID', 'symboling', 'CarName', 'fueltype', 'aspiration',
       'doornumber', 'carbody', 'drivewheel', 'enginelocation', 'wheelbase',
       'carlength', 'carwidth', 'carheight', 'curbweight', 'enginetype',
       'cylindernumber', 'enginesize', 'fuelsystem', 'boreratio', 'stroke',
       'compressionratio', 'horsepower', 'peakrpm', 'citympg', 'highwaympg',
       'price'],
      dtype='object')

In [62]:
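# Let Plotly count the rows per category; pairing unique() with value_counts()
# by position can mislabel bars because the two use different orderings.
px.histogram(dataframe, x='fueltype',
             title='Cars fuel type').update_layout(xaxis_title='Fuel Type', yaxis_title='Quantity per type')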

In [63]:
# Count rows per car body type directly from the column
px.histogram(dataframe, x='carbody',
             title='Car Body Type').update_layout(xaxis_title='Car Body Type', yaxis_title='Quantity per type')

In [64]:
# Count rows per engine type directly from the column
px.histogram(dataframe, x='enginetype',
             title='Car Engine Type').update_layout(xaxis_title='Car Engine Type', yaxis_title='Quantity per type')
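
When plotting category counts, note that `Series.unique()` returns values in order of first appearance, while `Series.value_counts()` sorts by frequency. Pairing the two by position can therefore attach a count to the wrong label. The sketch below uses a made-up toy column (not the car dataset) to show the two orderings and one safe way to pair labels with counts.

```python
import pandas as pd

# Made-up toy column for illustration, not taken from the car dataset
body = pd.Series(["convertible", "sedan", "sedan", "hatchback", "sedan", "hatchback"])

labels_by_appearance = list(body.unique())   # order of first appearance
counts = body.value_counts()                 # sorted by frequency, descending
labels_by_frequency = list(counts.index)

# Safe pairing: take labels and counts from the same object
paired = dict(zip(counts.index, counts.values))

print(labels_by_appearance)
print(labels_by_frequency)
print(paired)
```

Taking both the labels and the counts from `value_counts()` (or passing the column name to Plotly and letting it count) keeps each bar aligned with its category.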

In [65]:
# Selecting only the continuous variables
data = dataframe.drop(columns=['car_ID', 'symboling', 'CarName', 'aspiration', 'drivewheel', 'enginelocation', 'enginetype', 'cylindernumber', 'fuelsystem', 'fueltype', 'doornumber', 'carbody'])
data.head()

Unnamed: 0,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
0,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,111,5000,21,27,13495.0
1,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,111,5000,21,27,16500.0
2,94.5,171.2,65.5,52.4,2823,152,2.68,3.47,9.0,154,5000,19,26,16500.0
3,99.8,176.6,66.2,54.3,2337,109,3.19,3.4,10.0,102,5500,24,30,13950.0
4,99.4,176.6,66.4,54.3,2824,136,3.19,3.4,8.0,115,5500,18,22,17450.0


In [66]:
data.shape

(205, 14)

In [67]:
data.describe()

Unnamed: 0,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
count,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0
mean,98.756585,174.049268,65.907805,53.724878,2555.565854,126.907317,3.329756,3.255415,10.142537,104.117073,5125.121951,25.219512,30.75122,13276.710571
std,6.021776,12.337289,2.145204,2.443522,520.680204,41.642693,0.270844,0.313597,3.97204,39.544167,476.985643,6.542142,6.886443,7988.852332
min,86.6,141.1,60.3,47.8,1488.0,61.0,2.54,2.07,7.0,48.0,4150.0,13.0,16.0,5118.0
25%,94.5,166.3,64.1,52.0,2145.0,97.0,3.15,3.11,8.6,70.0,4800.0,19.0,25.0,7788.0
50%,97.0,173.2,65.5,54.1,2414.0,120.0,3.31,3.29,9.0,95.0,5200.0,24.0,30.0,10295.0
75%,102.4,183.1,66.9,55.5,2935.0,141.0,3.58,3.41,9.4,116.0,5500.0,30.0,34.0,16503.0
max,120.9,208.1,72.3,59.8,4066.0,326.0,3.94,4.17,23.0,288.0,6600.0,49.0,54.0,45400.0


## Linear Regression Model

In [68]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [69]:
# Selecting features
x = data.drop(columns=['price']).values

In [70]:
# Selecting target
y = data['price'].values

In [71]:
# Splitting the data into train and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, random_state=42)

In [72]:
# Standardizing the data (the scaler is fit on the training set only)
scaler = StandardScaler()

x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)
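
To make the standardization step concrete, here is a minimal sketch with made-up numbers (the arrays `x_train_demo` and `x_test_demo` are hypothetical, not rows from the dataset). It checks that `StandardScaler` computes z = (x - mean) / std using statistics learned from the training set alone, and reuses them for the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up numbers for illustration, not rows from the car dataset
x_train_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
x_test_demo = np.array([[4.0, 30.0]])

scaler_demo = StandardScaler().fit(x_train_demo)   # learns mean and std from train only
scaled_test = scaler_demo.transform(x_test_demo)   # reuses the train statistics

# Same computation by hand: z = (x - train_mean) / train_std (population std, ddof=0)
manual_test = (x_test_demo - x_train_demo.mean(axis=0)) / x_train_demo.std(axis=0)

print(np.allclose(scaled_test, manual_test))
```

Fitting the scaler on the training rows only avoids leaking information about the test rows into the model.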

In [73]:
# Creating the linear regression model
model = LinearRegression()
model.fit(x_train_scaled, y_train)

LinearRegression()

In [74]:
model.coef_

array([  512.81361609,  -951.69482247,  1612.78858662,   270.18690345,
         445.04183548,  4788.55444886,  -165.42871907,  -948.11075244,
        1070.89750016,   485.21461048,  1042.41652368, -2178.73411873,
         660.00007323])

In [75]:
model.intercept_

13408.5034965035
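
A note on reading these numbers: when the features are standardized, the intercept of an ordinary least squares fit equals the mean of the training target, and each coefficient is the change in the prediction for a one standard deviation change in that feature. The sketch below illustrates both points on synthetic data; the feature names `f0`, `f1`, `f2` and all values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the training data (50 rows, 3 features on very different scales)
X_demo = rng.normal(loc=[100.0, 5.0, 2500.0], scale=[10.0, 1.0, 500.0], size=(50, 3))
y_demo = 3.0 * X_demo[:, 0] + 200.0 * X_demo[:, 1] + 2.0 * X_demo[:, 2] + rng.normal(0, 50, 50)

X_scaled = StandardScaler().fit_transform(X_demo)
demo_model = LinearRegression().fit(X_scaled, y_demo)

# With zero-mean features the intercept is the mean of the training target
intercept_matches_mean = np.isclose(demo_model.intercept_, y_demo.mean())

# Rank features by absolute coefficient (effect per one standard deviation)
feature_names = ["f0", "f1", "f2"]  # hypothetical names
ranked = sorted(zip(feature_names, demo_model.coef_), key=lambda t: abs(t[1]), reverse=True)

print(intercept_matches_mean)
print(ranked)
```

Because the coefficients share a common scale after standardization, their magnitudes can be compared with each other, which is not true for coefficients fitted on raw units.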

In [76]:
# Values predicted by the model
pred = model.predict(x_test_scaled)
pred

array([26306.39481293, 17869.05337976, 10315.49253762, 14184.60705675,
       25249.43237642,  6021.60477112,  8494.54215476,  7280.15399672,
       11912.05495491,  8834.26133415, 16527.38809285,  6360.29944744,
       16728.32596198,  9224.13556698, 40241.32895073,  5960.02255313,
       -3554.67545146, 15427.70058575, 10452.99255046, 11628.15814509,
       10869.36951186, 22493.82427958,  6301.65949576,   -84.93706776,
        6486.16027088, 26588.26356153, 13432.51631821, 17060.7984111 ,
        5697.08372496, 16814.36906232, 24807.99873453,  6467.35106126,
        6692.05074053, 24762.329656  ,  8608.90171314, 24598.08724223,
       11346.97491596,  9791.27734775,  4743.84135756, 15501.11653283,
        9764.78057974, 13334.32246758, 18745.30329835,  5736.49713057,
        6127.15518938, 10348.496847  ,  6467.35106126,  8861.34424835,
       17765.98698874, 15272.90825593,  4713.91761498, 21005.26779571,
        5169.14820467,  8731.58517361,  5685.96919201, 16010.683779  ,
      

<h4>Evaluation metrics</h4>

In [77]:
mae = mean_absolute_error(y_test, pred)
mae

2616.768519801948

In [78]:
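# Root mean squared error (RMSE): square root of the MSE, in the same units as price.
# Taking the root explicitly avoids relying on the `squared` argument.
rmse = mean_squared_error(y_test, pred) ** 0.5
rmse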

3745.7074478716495

In [79]:
r2 = r2_score(y_test, pred)
r2

0.7974965704208908
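
The three metrics above can be reproduced by hand. The following sketch uses four made-up values (not the notebook's predictions) to show what each one measures and checks the manual results against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up values for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 11.0])

residuals = y_true - y_hat

# MAE: average size of the error, in the units of the target
mae_manual = np.mean(np.abs(residuals))

# RMSE: like MAE, but large errors weigh more because they are squared first
rmse_manual = np.sqrt(np.mean(residuals ** 2))

# R2: share of the target's variance explained by the predictions
r2_manual = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae_manual, rmse_manual, r2_manual)
print(np.isclose(mae_manual, mean_absolute_error(y_true, y_hat)),
      np.isclose(r2_manual, r2_score(y_true, y_hat)))
```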

In [80]:
px.scatter(x=y_test, 
           y=pred, 
           trendline='ols',
           trendline_color_override = 'red',
           title='Regression line visualization').update_layout(xaxis_title='Actual test values', yaxis_title='Values predicted by the model')

## Cross Validation

In [81]:
from sklearn.model_selection import cross_validate

<h3>Evaluating the data split</h3>

* Goal: Understand how the model performs on different train and test partitions of the data.

In [82]:
# Running cross validation with the data split into 4 folds
scores = cross_validate(model, x, y, cv=4,
                        scoring=('r2', 'neg_mean_absolute_error'),
                        return_train_score=True)

In [83]:
print("Mean absolute Errors:\n", -scores['test_neg_mean_absolute_error'])
# Note: 'train_r2' is the R2 on the training folds; scores['test_r2'] holds the held-out R2
print("\nScores:\n", scores['train_r2'])

Mean absolute Errors:
 [2991.74505393 2570.64165405 3623.90427332 2617.26058303]

Scores:
 [0.86373834 0.83248781 0.87568143 0.86938376]


In [84]:
print("Mean of all scores of the mean absolute errors metric: \n", -scores['test_neg_mean_absolute_error'].mean(), "\n")
print("Mean of all scores of the r2 score metric: \n", scores['train_r2'].mean())

Mean of all scores of the mean absolute errors metric: 
 2950.8878910831972 

Mean of all scores of the r2 score metric: 
 0.8603228323666832


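> Splitting the dataset into 4 parts and rotating which part is used for testing, the model reached a mean *r2 score* of about 0.86 (computed on the training folds, since the code reads `train_r2`) and a mean absolute error of about 2,950 on the held-out folds, meaning predictions fall roughly that far above or below the actual price on average.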
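
A possible variation on this experiment, sketched under assumptions: wrapping the scaler and the model in a `Pipeline` makes the scaling step be refit inside each fold, and reading `test_r2` reports the R2 on the held-out folds next to the training value. The data here is synthetic (`make_regression`), used only as a stand-in with the same shape as the notebook's numeric features.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the car data (205 rows, 13 numeric features)
X_demo, y_demo = make_regression(n_samples=205, n_features=13, noise=20.0, random_state=42)

# The scaler is refit on the training part of every fold
pipeline = make_pipeline(StandardScaler(), LinearRegression())
cv_scores = cross_validate(pipeline, X_demo, y_demo, cv=4,
                           scoring=('r2', 'neg_mean_absolute_error'),
                           return_train_score=True)

train_r2_mean = cv_scores['train_r2'].mean()
test_r2_mean = cv_scores['test_r2'].mean()
test_mae_mean = -cv_scores['test_neg_mean_absolute_error'].mean()

print(f"train R2: {train_r2_mean:.3f}  test R2: {test_r2_mean:.3f}  test MAE: {test_mae_mean:.1f}")
```

Comparing the train and test values side by side gives a quick sense of how much the model overfits.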

<h3>Model Selection</h3>

* Goal: Understand which model performs better.

In [85]:
from sklearn.linear_model import LassoLars

In [86]:
lasso = LassoLars()
lasso.fit(x_train_scaled, y_train)


The default of 'normalize' will be set to False in version 1.2 and deprecated in version 1.4.
If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), LassoLars())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * np.sqrt(n_samples). 



LassoLars()

In [87]:
pred_lasso = lasso.predict(x_test_scaled)
pred_lasso

array([26224.78349641, 17814.60661217, 10341.32438608, 14245.1610016 ,
       24976.36614907,  6096.59575652,  8080.46378629,  7272.63008492,
       11909.81420557,  8409.68410861, 16521.86090967,  6550.14113752,
       16274.03231577,  9397.59894921, 40136.50515891,  5697.29369725,
       -3525.04606543, 15661.41678633, 10452.8978583 , 11637.44438529,
       10986.57495413, 22418.7046493 ,  6281.89552454,   249.73149793,
        6282.97816276, 26619.45789412, 13567.53267699, 17108.54487793,
        5883.19827723, 16913.54330237, 24629.66920738,  6270.10872137,
        6689.82673152, 24530.65543502,  8504.51838095, 24467.38245946,
       11466.30371303,  9739.02581112,  4797.79482218, 15710.94354106,
       10055.98010633, 12780.24243068, 18427.7230028 ,  5711.49101045,
        6192.87758088, 10285.60179729,  6270.10872137,  8779.27526904,
       17539.52273207, 15521.05579036,  4777.32071088, 21161.36511325,
        5301.99900823,  8950.01920328,  5875.59360732, 15922.59038666,
      

In [88]:
px.scatter(x=y_test, 
           y=pred_lasso, 
           trendline='ols',
           trendline_color_override = 'red',
           title='Regression line visualization').update_layout(xaxis_title='Actual test values', yaxis_title='Values predicted by the model')

In [None]:
scores = cross_validate(lasso, x, y, cv=4,
                        scoring=('r2', 'neg_mean_absolute_error'),
                        return_train_score=True)

In [90]:
print("Mean absolute Errors:\n", -scores['test_neg_mean_absolute_error'], "\n")
print("R2 Score:\n", scores['train_r2'])

Mean absolute Errors:
 [2912.27265083 2603.01339454 3635.24418169 2622.53372006] 

R2 Score:
 [0.86324948 0.83207759 0.87525057 0.86900503]


In [91]:
print("Mean of all scores of the mean absolute errors metric: \n", -scores['test_neg_mean_absolute_error'].mean(), "\n")
print("Mean of all scores of the r2 score metric: \n", scores['train_r2'].mean())

Mean of all scores of the mean absolute errors metric: 
 2943.265986779795 

Mean of all scores of the r2 score metric: 
 0.8598956660121839


> Switching the model to LassoLars gives performance very close to that of Linear Regression: the mean *r2 score* (on the training folds) differs by only about 0.0004 (0.8599 versus 0.8603). The mean absolute error of LassoLars is slightly lower than that of Linear Regression, by about 7.6 units (2,943.3 versus 2,950.9).

> Both models therefore perform satisfactorily on this problem, and either one could be chosen.
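
To illustrate where the two models differ, the sketch below fits both on synthetic data (an assumption, not the car dataset) and compares their coefficients. LassoLars adds an L1 penalty controlled by `alpha`, which shrinks coefficients and can set some of them exactly to zero; the value `alpha=1.0` is the library default and is not tuned here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoLars, LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 8 features, only 3 of which actually drive the target
X_demo, y_demo = make_regression(n_samples=200, n_features=8, n_informative=3,
                                 noise=10.0, random_state=0)
X_scaled = StandardScaler().fit_transform(X_demo)

ols = LinearRegression().fit(X_scaled, y_demo)
lasso_demo = LassoLars(alpha=1.0).fit(X_scaled, y_demo)

# The L1 penalty shrinks the total coefficient magnitude relative to plain least squares
ols_l1 = np.abs(ols.coef_).sum()
lasso_l1 = np.abs(lasso_demo.coef_).sum()
n_zero = int(np.sum(lasso_demo.coef_ == 0))

print(f"sum |coef| OLS: {ols_l1:.2f}  LassoLars: {lasso_l1:.2f}  zeroed coefficients: {n_zero}")
```

When both models score almost the same, as in this notebook, the penalty is having little effect at the chosen `alpha`.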

<h3>Feature Selection</h3>

* Goal: Understand how the model performs when categorical features, converted to dummy variables, are added.

In [92]:
data_new_features = dataframe.drop(columns=['car_ID', 'symboling', 'CarName', 'aspiration', 'drivewheel', 'enginelocation', 'enginetype', 'cylindernumber', 'fuelsystem'])
data_new_features.head()

Unnamed: 0,fueltype,doornumber,carbody,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
0,gas,two,convertible,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,111,5000,21,27,13495.0
1,gas,two,convertible,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,111,5000,21,27,16500.0
2,gas,two,hatchback,94.5,171.2,65.5,52.4,2823,152,2.68,3.47,9.0,154,5000,19,26,16500.0
3,gas,four,sedan,99.8,176.6,66.2,54.3,2337,109,3.19,3.4,10.0,102,5500,24,30,13950.0
4,gas,four,sedan,99.4,176.6,66.4,54.3,2824,136,3.19,3.4,8.0,115,5500,18,22,17450.0


In [93]:
# Creating dummies for the categorical variables
data_new_features = pd.get_dummies(data_new_features, prefix_sep='_')
data_new_features.head()

Unnamed: 0,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price,fueltype_diesel,fueltype_gas,doornumber_four,doornumber_two,carbody_convertible,carbody_hardtop,carbody_hatchback,carbody_sedan,carbody_wagon
0,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,111,5000,21,27,13495.0,0,1,0,1,1,0,0,0,0
1,88.6,168.8,64.1,48.8,2548,130,3.47,2.68,9.0,111,5000,21,27,16500.0,0,1,0,1,1,0,0,0,0
2,94.5,171.2,65.5,52.4,2823,152,2.68,3.47,9.0,154,5000,19,26,16500.0,0,1,0,1,0,0,1,0,0
3,99.8,176.6,66.2,54.3,2337,109,3.19,3.4,10.0,102,5500,24,30,13950.0,0,1,1,0,0,0,0,1,0
4,99.4,176.6,66.4,54.3,2824,136,3.19,3.4,8.0,115,5500,18,22,17450.0,0,1,1,0,0,0,0,1,0
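
One thing to notice in the table above is that each categorical variable produces a full set of columns, so `fueltype_diesel` and `fueltype_gas` always sum to 1 and one of them is redundant. A common alternative, shown here as a sketch on made-up rows, is `drop_first=True`, which keeps one column fewer per variable. The `dtype=int` argument is used only so the toy output shows 0/1 values.

```python
import pandas as pd

# Toy rows for illustration only, not taken from the car dataset
toy = pd.DataFrame({
    "fueltype": ["gas", "diesel", "gas"],
    "doornumber": ["two", "four", "four"],
    "price": [10000.0, 20000.0, 15000.0],
})

full = pd.get_dummies(toy, prefix_sep="_", dtype=int)
reduced = pd.get_dummies(toy, prefix_sep="_", drop_first=True, dtype=int)

# In `full`, fueltype_diesel + fueltype_gas equals 1 on every row,
# so one column of each pair carries no extra information.
print(list(full.columns))
print(list(reduced.columns))
```

Whether to drop the first level is a modeling choice; it mainly matters for interpreting coefficients in a model with an intercept.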


In [94]:
x = data_new_features.drop(columns=['price']).values
y = data_new_features['price'].values

In [95]:
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, random_state=42)

In [96]:
scaler_features = StandardScaler()

x_train_scaled = scaler_features.fit_transform(x_train)
x_test_scaled = scaler_features.transform(x_test)

In [97]:
linear_reg = LinearRegression()
linear_reg.fit(x_train_scaled, y_train)

LinearRegression()

In [98]:
predict = linear_reg.predict(x_test_scaled)
predict

array([25001.14781483, 18499.38072612, 10516.31156271, 13032.47066773,
       23060.88231895,  5232.36051622,  6893.66533911,  7994.01877965,
        8988.78705121,  8549.12960822, 16825.32915885,  5890.89388392,
       16769.03247515,  9616.44710505, 37864.15090084,  6086.42429427,
       -3082.89889623, 14335.48437258, 10629.96147753, 11613.70646462,
       11331.66754325, 22053.32915801, 13717.96105399,   263.03750964,
        7265.02756397, 25244.54063829, 19820.53217842, 14586.94715991,
        2922.40099812, 17071.23121   , 25007.81952329,  6188.42345687,
        5063.25756189, 24311.69561944,  8223.23015182, 31432.66607763,
       12222.03698848,  9780.32346928,  5906.4508931 , 14654.70702404,
        6796.24858477, 13281.63870549, 20483.46841227,  4269.84899317,
        6503.30014984,  9654.95544131,  6188.42345687,  7524.32530188,
       18396.05632027, 14118.94444985,  4509.45039335, 19955.52709543,
        4734.57729546, 10034.51165565,  2875.41349661, 15515.70840478,
      

In [99]:
px.scatter(x=y_test, 
           y=predict, 
           trendline='ols',
           trendline_color_override = 'red',
           title='Regression line visualization').update_layout(xaxis_title='Actual test values', yaxis_title='Values predicted by the model')

In [100]:
mae = mean_absolute_error(y_test, predict)
mae

2961.0231378450853

In [101]:
mse = mean_squared_error(y_test, predict)
mse

18773528.064407237

In [102]:
r2 = r2_score(y_test, predict)
r2

0.7290366394171358

In [103]:
scores = cross_validate(linear_reg, x, y, cv=4,
                        scoring=('r2', 'neg_mean_absolute_error'),
                        return_train_score=True)

In [104]:
print("Mean absolute Errors:\n", -scores['test_neg_mean_absolute_error'], "\n")
print("R2 Score:\n", scores['train_r2'])

Mean absolute Errors:
 [3147.0764751  2803.79097208 3848.83801368 3049.77399641] 

R2 Score:
 [0.88711609 0.85898904 0.89260757 0.89915534]


In [105]:
print("Mean of all scores of the mean absolute errors metric: \n", -scores['test_neg_mean_absolute_error'].mean(), "\n")
print("Mean of all scores of the r2 score metric: \n", scores['train_r2'].mean())

Mean of all scores of the mean absolute errors metric: 
 3212.369864318553 

Mean of all scores of the r2 score metric: 
 0.884467011681027


> This cross validation experiment added categorical variables to test how the model performs with new *features*. The model tested was Linear Regression. The mean absolute error was about 3,212, higher than in the previous experiments. The mean *r2 score* was about 0.88, also higher than in the previous experiments, but it is computed on the training folds (`train_r2`), where adding features to a least squares model cannot lower it. Since the gain in *r2 score* is small and the mean absolute error on the held-out folds got worse, the new *features* did not improve performance on this problem.
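
The reason a higher training *r2 score* is weak evidence on its own can be shown with a small sketch on synthetic data (all names and values below are hypothetical). Adding columns of pure noise to a least squares model never lowers the R2 measured on the training rows, while the R2 on held-out rows may well drop.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X_base = rng.normal(size=(60, 3))
y_demo = X_base @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=60)

# Append 10 columns of pure noise, unrelated to the target
X_more = np.hstack([X_base, rng.normal(size=(60, 10))])

train_idx, test_idx = np.arange(40), np.arange(40, 60)

def fit_and_score(X):
    """Fit on the first 40 rows; return (train R2, held-out R2)."""
    m = LinearRegression().fit(X[train_idx], y_demo[train_idx])
    return m.score(X[train_idx], y_demo[train_idx]), m.score(X[test_idx], y_demo[test_idx])

r2_train_base, r2_test_base = fit_and_score(X_base)
r2_train_more, r2_test_more = fit_and_score(X_more)

print(f"train R2: {r2_train_base:.3f} -> {r2_train_more:.3f}")
print(f"test  R2: {r2_test_base:.3f} -> {r2_test_more:.3f}")
```

For this reason, comparisons between feature sets are more informative when based on held-out metrics such as `test_r2` or the test mean absolute error.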

In summary, this project carried out the following steps
  
  * Building the linear regression model
  * Predicting values and evaluating the model
  * *Cross Validation*, varying
    * Train and test data
    * Regression model used
    * Use of new *features*

At the end of the analysis, both the linear regression model and *LassoLars* gave satisfactory results for this problem, while adding new *features* did not improve performance. The project thus applied linear regression and techniques for validating its performance, giving a better understanding of the algorithm, its variations, and the ways to validate the solution found.