<a href="https://colab.research.google.com/github/David-Gentil/Regressao-Linear_DNC/blob/main/Machine_Learning_Car_Price.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Introduction

Mission: develop a machine learning model that predicts car prices from a set of explanatory variables.

*   We have a car dataset with several features (explanatory variables), such as body dimensions, engine size, horsepower, and fuel economy.
*   We want to predict the price of each car (the target variable) from those features.
*   Because the target (price) is continuous, the task calls for a regression model.

##Data Import

In [149]:
#Importing libraries

import pandas as pd
import numpy as np
import seaborn as sns
from scipy import stats
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn.preprocessing import LabelEncoder #Converts categorical labels to integer codes
from sklearn.preprocessing import MinMaxScaler #Normalizes features to a common range (columns have very different scales)
from sklearn.preprocessing import StandardScaler #Standardizes features by removing the mean and scaling to unit variance
import statsmodels.formula.api as smf

In [150]:
#Mounting Google Drive

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [151]:
#Loading and reading the dataset

preço_carros = pd.read_csv('/content/drive/MyDrive/RID187211_Módulo_06/CarPrice_Assignment.csv')

preço_carros.head()

Unnamed: 0,car_ID,symboling,CarName,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,...,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
0,1,3,alfa-romero giulia,gas,std,two,convertible,rwd,front,88.6,...,130,mpfi,3.47,2.68,9.0,111,5000,21,27,13495.0
1,2,3,alfa-romero stelvio,gas,std,two,convertible,rwd,front,88.6,...,130,mpfi,3.47,2.68,9.0,111,5000,21,27,16500.0
2,3,1,alfa-romero Quadrifoglio,gas,std,two,hatchback,rwd,front,94.5,...,152,mpfi,2.68,3.47,9.0,154,5000,19,26,16500.0
3,4,2,audi 100 ls,gas,std,four,sedan,fwd,front,99.8,...,109,mpfi,3.19,3.4,10.0,102,5500,24,30,13950.0
4,5,2,audi 100ls,gas,std,four,sedan,4wd,front,99.4,...,136,mpfi,3.19,3.4,8.0,115,5500,18,22,17450.0


##Exploratory Analysis

In [152]:
#Summary statistics of the data
preço_carros.describe()

Unnamed: 0,car_ID,symboling,wheelbase,carlength,carwidth,carheight,curbweight,enginesize,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
count,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0
mean,103.0,0.834146,98.756585,174.049268,65.907805,53.724878,2555.565854,126.907317,3.329756,3.255415,10.142537,104.117073,5125.121951,25.219512,30.75122,13276.710571
std,59.322565,1.245307,6.021776,12.337289,2.145204,2.443522,520.680204,41.642693,0.270844,0.313597,3.97204,39.544167,476.985643,6.542142,6.886443,7988.852332
min,1.0,-2.0,86.6,141.1,60.3,47.8,1488.0,61.0,2.54,2.07,7.0,48.0,4150.0,13.0,16.0,5118.0
25%,52.0,0.0,94.5,166.3,64.1,52.0,2145.0,97.0,3.15,3.11,8.6,70.0,4800.0,19.0,25.0,7788.0
50%,103.0,1.0,97.0,173.2,65.5,54.1,2414.0,120.0,3.31,3.29,9.0,95.0,5200.0,24.0,30.0,10295.0
75%,154.0,2.0,102.4,183.1,66.9,55.5,2935.0,141.0,3.58,3.41,9.4,116.0,5500.0,30.0,34.0,16503.0
max,205.0,3.0,120.9,208.1,72.3,59.8,4066.0,326.0,3.94,4.17,23.0,288.0,6600.0,49.0,54.0,45400.0


In [153]:
#Dataset information
preço_carros.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   car_ID            205 non-null    int64  
 1   symboling         205 non-null    int64  
 2   CarName           205 non-null    object 
 3   fueltype          205 non-null    object 
 4   aspiration        205 non-null    object 
 5   doornumber        205 non-null    object 
 6   carbody           205 non-null    object 
 7   drivewheel        205 non-null    object 
 8   enginelocation    205 non-null    object 
 9   wheelbase         205 non-null    float64
 10  carlength         205 non-null    float64
 11  carwidth          205 non-null    float64
 12  carheight         205 non-null    float64
 13  curbweight        205 non-null    int64  
 14  enginetype        205 non-null    object 
 15  cylindernumber    205 non-null    object 
 16  enginesize        205 non-null    int64  
 1

In [154]:
#Identifying outliers with the Z-score method

def detecta_outliers_zscore(dados, threshold=3):
    """Detect outliers using the Z-score method."""
    #Absolute Z-score of each value; pairing by position avoids depending on the index labels
    z_scores = np.abs(stats.zscore(dados))
    outliers = [x for x, z in zip(dados, z_scores) if z > threshold]
    return outliers

#Example usage
dados = preço_carros['price']
outliers = detecta_outliers_zscore(dados)
print("Outliers:", outliers)


Outliers: [41315.0, 40960.0, 45400.0]
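
To make the rule concrete: the Z-score of a value is its distance from the mean measured in standard deviations, and the function above flags values whose absolute Z-score exceeds the threshold. A minimal sketch on made-up numbers (the `values` array is hypothetical and not taken from the dataset):

```python
import numpy as np

# Hypothetical sample: 20 ordinary values plus one extreme value.
values = np.array([10.0] * 10 + [12.0] * 10 + [100.0])

# Z-score using the population standard deviation (ddof=0),
# which is also the default convention of scipy.stats.zscore.
z_scores = (values - values.mean()) / values.std()

threshold = 3
outliers = values[np.abs(z_scores) > threshold].tolist()
print("Outliers:", outliers)
```

Only the extreme value ends up in `outliers`; the remaining values sit well within one standard deviation of the mean.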


In [155]:
#Dropping the car_ID and CarName columns
preço_carros = preço_carros.drop(['car_ID', 'CarName'], axis=1)

###Initial analysis

*   Some columns are categorical and need to be converted to numeric values;
*   The numeric columns are on very different scales;
*   The dataset has 205 rows and no null values;
*   Three prices are flagged as outliers (more than 3 standard deviations from the mean); whether they matter for the model is assessed later.
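
The first three observations can be checked programmatically. The sketch below uses a hypothetical three-row frame (`df`) with the same kinds of columns as the dataset; the same calls should apply to `preço_carros`.

```python
import pandas as pd

# Hypothetical miniature frame; values are for illustration only.
df = pd.DataFrame({
    "fueltype": ["gas", "diesel", "gas"],
    "horsepower": [111, 154, 102],
    "price": [13495.0, 16500.0, 13950.0],
})

# Non-numeric columns must be encoded before fitting a regression.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Total number of missing cells in the frame.
n_missing = int(df.isna().sum().sum())

# Range of each numeric column, a quick view of how different the scales are.
numeric = df.select_dtypes(include="number")
ranges = numeric.max() - numeric.min()

print(categorical_cols, n_missing)
print(ranges)
```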

##Preprocessing

In [156]:
#Converting categorical columns to numeric codes
le = LabelEncoder()

colunas_categoricas = ['fueltype', 'aspiration', 'doornumber', 'carbody', 'drivewheel',
                       'enginelocation', 'enginetype', 'cylindernumber', 'fuelsystem']

#The encoder is refit on each column, so every column gets its own label-to-code mapping
for coluna in colunas_categoricas:
    preço_carros[coluna] = le.fit_transform(preço_carros[coluna])
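
`LabelEncoder` assigns integer codes by sorting the distinct labels, so the codes carry an order that the categories themselves may not have (for `carbody`, `convertible` becomes 0 and `sedan` becomes 3 in the output shown further down). A linear model reads those codes as magnitudes. The sketch below uses a hypothetical handful of values to print the mapping, and shows one-hot encoding with `pd.get_dummies` as one commonly used alternative; it is not what this notebook does.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the carbody column.
carbody = pd.Series(
    ["sedan", "convertible", "hatchback", "hardtop", "wagon"], name="carbody"
)

le = LabelEncoder()
codes = le.fit_transform(carbody)

# classes_ holds the labels in sorted order; the position is the code.
mapping = {str(label): int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# One-hot encoding: one 0/1 column per category, dropping the first
# category so the columns are not perfectly collinear with an intercept.
one_hot = pd.get_dummies(carbody, prefix="carbody", drop_first=True, dtype=int)
print(one_hot)
```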

In [157]:
#Displaying the transformed dataset
preço_carros.head()

Unnamed: 0,symboling,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,carlength,carwidth,...,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
0,3,1,0,1,0,2,0,88.6,168.8,64.1,...,130,5,3.47,2.68,9.0,111,5000,21,27,13495.0
1,3,1,0,1,0,2,0,88.6,168.8,64.1,...,130,5,3.47,2.68,9.0,111,5000,21,27,16500.0
2,1,1,0,1,2,2,0,94.5,171.2,65.5,...,152,5,2.68,3.47,9.0,154,5000,19,26,16500.0
3,2,1,0,0,3,1,0,99.8,176.6,66.2,...,109,5,3.19,3.4,10.0,102,5500,24,30,13950.0
4,2,1,0,0,3,0,0,99.4,176.6,66.4,...,136,5,3.19,3.4,8.0,115,5500,18,22,17450.0


In [158]:
#Summary statistics of the transformed dataset
preço_carros.describe()

Unnamed: 0,symboling,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,carlength,carwidth,...,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
count,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,...,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0
mean,0.834146,0.902439,0.180488,0.439024,2.614634,1.326829,0.014634,98.756585,174.049268,65.907805,...,126.907317,3.253659,3.329756,3.255415,10.142537,104.117073,5125.121951,25.219512,30.75122,13276.710571
std,1.245307,0.297446,0.385535,0.497483,0.859081,0.556171,0.120377,6.021776,12.337289,2.145204,...,41.642693,2.013204,0.270844,0.313597,3.97204,39.544167,476.985643,6.542142,6.886443,7988.852332
min,-2.0,0.0,0.0,0.0,0.0,0.0,0.0,86.6,141.1,60.3,...,61.0,0.0,2.54,2.07,7.0,48.0,4150.0,13.0,16.0,5118.0
25%,0.0,1.0,0.0,0.0,2.0,1.0,0.0,94.5,166.3,64.1,...,97.0,1.0,3.15,3.11,8.6,70.0,4800.0,19.0,25.0,7788.0
50%,1.0,1.0,0.0,0.0,3.0,1.0,0.0,97.0,173.2,65.5,...,120.0,5.0,3.31,3.29,9.0,95.0,5200.0,24.0,30.0,10295.0
75%,2.0,1.0,0.0,1.0,3.0,2.0,0.0,102.4,183.1,66.9,...,141.0,5.0,3.58,3.41,9.4,116.0,5500.0,30.0,34.0,16503.0
max,3.0,1.0,1.0,1.0,4.0,2.0,1.0,120.9,208.1,72.3,...,326.0,7.0,3.94,4.17,23.0,288.0,6600.0,49.0,54.0,45400.0


In [159]:
#Checking dataset information
preço_carros.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 24 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   symboling         205 non-null    int64  
 1   fueltype          205 non-null    int64  
 2   aspiration        205 non-null    int64  
 3   doornumber        205 non-null    int64  
 4   carbody           205 non-null    int64  
 5   drivewheel        205 non-null    int64  
 6   enginelocation    205 non-null    int64  
 7   wheelbase         205 non-null    float64
 8   carlength         205 non-null    float64
 9   carwidth          205 non-null    float64
 10  carheight         205 non-null    float64
 11  curbweight        205 non-null    int64  
 12  enginetype        205 non-null    int64  
 13  cylindernumber    205 non-null    int64  
 14  enginesize        205 non-null    int64  
 15  fuelsystem        205 non-null    int64  
 16  boreratio         205 non-null    float64
 1

In [160]:
#Normalizing the data to the [0, 1] range
scaler = MinMaxScaler()

preço_carros_norm = pd.DataFrame(scaler.fit_transform(preço_carros), index=preço_carros.index, columns=preço_carros.columns)

In [161]:
#Summary statistics of the normalized data
preço_carros_norm.describe()

Unnamed: 0,symboling,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,carlength,carwidth,...,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
count,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,...,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0
mean,0.566829,0.902439,0.180488,0.439024,0.653659,0.663415,0.014634,0.354419,0.49178,0.467317,...,0.248707,0.464808,0.564111,0.564483,0.196409,0.233821,0.398009,0.339431,0.38819,0.20254
std,0.249061,0.297446,0.385535,0.497483,0.21477,0.278085,0.120377,0.175562,0.184139,0.178767,...,0.157142,0.287601,0.19346,0.149332,0.248253,0.164767,0.194688,0.181726,0.181222,0.198323
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.4,1.0,0.0,0.0,0.5,0.5,0.0,0.230321,0.376119,0.316667,...,0.135849,0.142857,0.435714,0.495238,0.1,0.091667,0.265306,0.166667,0.236842,0.066283
50%,0.6,1.0,0.0,0.0,0.75,0.5,0.0,0.303207,0.479104,0.433333,...,0.222642,0.714286,0.55,0.580952,0.125,0.195833,0.428571,0.305556,0.368421,0.128519
75%,0.8,1.0,0.0,1.0,0.75,1.0,0.0,0.460641,0.626866,0.55,...,0.301887,0.714286,0.742857,0.638095,0.15,0.283333,0.55102,0.472222,0.473684,0.282632
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
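
Min-max scaling maps each column to [0, 1] via (x - min) / (max - min). As a check against the tables above: the 25th-percentile price is 7788, the minimum 5118, and the maximum 45400, so the scaled value should be about 0.066283, which is what the `price` column reports. A small sketch of that arithmetic (the helper name `min_max` is illustrative):

```python
# Values taken from the describe() outputs above (price column).
price_min, price_q1, price_median, price_max = 5118.0, 7788.0, 10295.0, 45400.0

def min_max(x, lo, hi):
    """Scale x to the [0, 1] range defined by lo and hi."""
    return (x - lo) / (hi - lo)

scaled_q1 = min_max(price_q1, price_min, price_max)
scaled_median = min_max(price_median, price_min, price_max)
print(round(scaled_q1, 6), round(scaled_median, 6))
```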


In [162]:
#Standardizing the data (zero mean, unit variance)
scaler = StandardScaler()

preço_carros_norm = pd.DataFrame(scaler.fit_transform(preço_carros_norm), index=preço_carros_norm.index, columns=preço_carros_norm.columns)

In [163]:
#Summary statistics of the standardized data
preço_carros_norm.describe()

Unnamed: 0,symboling,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,carlength,carwidth,...,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
count,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,...,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0,205.0
mean,-1.039819e-16,-7.79864e-17,6.282238e-17,1.039819e-16,-1.64638e-16,-1.12647e-16,-8.665155e-18,1.299773e-16,0.0,-3.011141e-16,...,-1.516402e-16,-6.932124000000001e-17,-1.386425e-16,8.145246e-16,-2.5995470000000002e-17,1.64638e-16,3.4660620000000004e-17,2.209615e-16,-2.426243e-16,4.332578e-17
std,1.002448,1.002448,1.002448,1.002448,1.002448,1.002448,1.002448,1.002448,1.002448,1.002448,...,1.002448,1.002448,1.002448,1.002448,1.002448,1.002448,1.002448,1.002448,1.002448,1.002448
min,-2.281433,-3.041381,-0.4692953,-0.8846517,-3.050975,-2.391492,-0.1218667,-2.023713,-2.677244,-2.620512,...,-1.586561,-1.620116,-2.923049,-3.789311,-0.7931011,-1.422573,-2.049347,-1.872388,-2.14731,-1.023762
25%,-0.6714717,0.328798,-0.4692953,-0.8846517,-0.7172069,-0.5890807,-0.1218667,-0.7085959,-0.629655,-0.8447824,...,-0.7199469,-1.122179,-0.6653141,-0.4648342,-0.3892993,-0.8648707,-0.6832865,-0.9530117,-0.8371954,-0.6887281
50%,0.133509,0.328798,-0.4692953,-0.8846517,0.4496773,-0.5890807,-0.1218667,-0.2924196,-0.069006,-0.1905661,...,-0.1662771,0.8695675,-0.07312136,0.110556,-0.2883489,-0.2311186,0.1573661,-0.186865,-0.1093538,-0.3741476
75%,0.9384897,0.328798,-0.4692953,1.130388,0.4496773,1.21333,-0.1218667,0.606521,0.735404,0.4636501,...,0.3392475,0.8695675,0.9262039,0.4941494,-0.1873985,0.3012332,0.7878555,0.7325109,0.4729195,0.4048375
max,1.74347,0.328798,2.130854,1.130388,1.616562,1.21333,8.205689,3.686225,2.766741,2.987056,...,4.792679,1.865441,2.258638,2.923575,3.244916,4.661448,3.09965,3.643868,3.384286,4.030858
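
Two details of the table above are worth spelling out. The reported standard deviation is 1.002448 rather than 1 because `StandardScaler` divides by the population standard deviation (ddof=0), while `describe()` reports the sample standard deviation (ddof=1); with 205 rows the ratio is sqrt(205/204), roughly 1.002448. Also, min-max scaling is an affine map with positive slope, so standardizing the min-max output gives the same values as standardizing the raw column directly. A sketch on random data (the array `x` is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=100.0, scale=15.0, size=205)  # hypothetical column with 205 rows

def standardize(v):
    # Same convention as StandardScaler: population std (ddof=0).
    return (v - v.mean()) / v.std()

x_minmax = (x - x.min()) / (x.max() - x.min())

z_direct = standardize(x)
z_after_minmax = standardize(x_minmax)

# Both routes give the same standardized values.
print(np.allclose(z_direct, z_after_minmax))

# Sample std (ddof=1) of standardized data versus sqrt(n / (n - 1)).
print(z_direct.std(ddof=1), np.sqrt(205 / 204))
```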


In [164]:
#Fitting the multiple linear regression model (OLS)
function = 'price~symboling+fueltype+aspiration+doornumber+carbody+drivewheel+enginelocation+wheelbase+carlength+carwidth+carheight+curbweight+enginetype+cylindernumber+enginesize+fuelsystem+boreratio+stroke+compressionratio+horsepower+peakrpm+citympg+highwaympg'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                  price   R-squared:                       0.880
Model:                            OLS   Adj. R-squared:                  0.865
Method:                 Least Squares   F-statistic:                     57.84
Date:                Sun, 30 Mar 2025   Prob (F-statistic):           5.74e-71
Time:                        17:43:20   Log-Likelihood:                -73.347
No. Observations:                 205   AIC:                             194.7
Df Residuals:                     181   BIC:                             274.4
Df Model:                          23                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------
Intercept         4.707e-17      0.026  
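
The intercept estimate of about 4.7e-17 is zero up to floating-point error. That is expected: when the response and every predictor have mean zero, the least-squares intercept is zero, and removing it leaves the slopes unchanged. A sketch with NumPy on synthetic data (all names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 205
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n) + 7.0

# Center and scale every column, as StandardScaler does.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()

# Least-squares fit with an explicit intercept column.
A = np.column_stack([np.ones(n), Xs])
coef_with, *_ = np.linalg.lstsq(A, ys, rcond=None)

# Least-squares fit without the intercept.
coef_without, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

print(coef_with[0])                              # intercept, numerically zero
print(np.allclose(coef_with[1:], coef_without))  # slopes are the same
```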

In [165]:
#Fitting the multiple linear regression model without the intercept
function = 'price~symboling+fueltype+aspiration+doornumber+carbody+drivewheel+enginelocation+wheelbase+carlength+carwidth+carheight+curbweight+enginetype+cylindernumber+enginesize+fuelsystem+boreratio+stroke+compressionratio+horsepower+peakrpm+citympg+highwaympg-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.880
Model:                            OLS   Adj. R-squared (uncentered):              0.865
Method:                 Least Squares   F-statistic:                              58.16
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    2.10e-71
Time:                        17:43:20   Log-Likelihood:                         -73.347
No. Observations:                 205   AIC:                                      192.7
Df Residuals:                     182   BIC:                                      269.1
Df Model:                          23                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------
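
The two summaries above report the same log-likelihood (-73.347) but different AIC and BIC values. The information criteria penalize the number of estimated parameters k: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. Dropping the intercept lowers k from 24 to 23, which accounts for the change. The arithmetic can be checked directly:

```python
import math

n = 205            # number of observations
log_lik = -73.347  # log-likelihood reported in both summaries

def aic(k, log_lik):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * log_lik

def bic(k, log_lik, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik

for k in (24, 23):  # with and without the intercept
    print(k, round(aic(k, log_lik), 1), round(bic(k, log_lik, n), 1))
```

The rounded results match the reported values: 194.7 and 274.4 with the intercept, 192.7 and 269.1 without it.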

In [166]:
#Removing aspiration from the model
function = 'price~symboling+fueltype+doornumber+carbody+drivewheel+enginelocation+wheelbase+carlength+carwidth+carheight+curbweight+enginetype+cylindernumber+enginesize+fuelsystem+boreratio+stroke+compressionratio+horsepower+peakrpm+citympg+highwaympg-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.880
Model:                            OLS   Adj. R-squared (uncentered):              0.866
Method:                 Least Squares   F-statistic:                              61.12
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    2.71e-72
Time:                        17:43:20   Log-Likelihood:                         -73.374
No. Observations:                 205   AIC:                                      190.7
Df Residuals:                     183   BIC:                                      263.9
Df Model:                          22                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------
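
This cell and the ones that follow repeat one pattern: drop a single predictor, refit, and inspect the summary. The coefficient tables are cut off in this export, so the exact removal criterion is not visible. A common choice, assumed here, is to drop the predictor with the largest p-value until every remaining p-value is at or below a cutoff. The sketch below automates that loop on synthetic data; the helper `backward_eliminate`, the cutoff `alpha=0.05`, and the column names are all hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def backward_eliminate(data, target, predictors, alpha=0.05):
    """Drop the predictor with the largest p-value until all are <= alpha."""
    remaining = list(predictors)
    while remaining:
        # "- 1" removes the intercept, as in the notebook's formulas.
        formula = f"{target} ~ " + " + ".join(remaining) + " - 1"
        model = smf.ols(formula=formula, data=data).fit()
        worst = model.pvalues.idxmax()
        if model.pvalues[worst] <= alpha:
            return model, remaining
        remaining.remove(worst)
    return None, remaining

# Synthetic data: y depends on x1 and x2 only; x3 and x4 are noise.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(205, 4)), columns=["x1", "x2", "x3", "x4"])
df["y"] = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=0.5, size=205)

model, kept = backward_eliminate(df, "y", ["x1", "x2", "x3", "x4"])
print(kept)
```

Writing the steps as a loop keeps the removal order reproducible and avoids editing the formula string by hand at every step.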

In [167]:
#Removing cylindernumber from the model
function = 'price~symboling+fueltype+doornumber+carbody+drivewheel+enginelocation+wheelbase+carlength+carwidth+carheight+curbweight+enginetype+enginesize+fuelsystem+boreratio+stroke+compressionratio+horsepower+peakrpm+citympg+highwaympg-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.880
Model:                            OLS   Adj. R-squared (uncentered):              0.867
Method:                 Least Squares   F-statistic:                              64.37
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    3.41e-73
Time:                        17:43:20   Log-Likelihood:                         -73.399
No. Observations:                 205   AIC:                                      188.8
Df Residuals:                     184   BIC:                                      258.6
Df Model:                          21                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------

In [168]:
#Removing citympg from the model
function = 'price~symboling+fueltype+doornumber+carbody+drivewheel+enginelocation+wheelbase+carlength+carwidth+carheight+curbweight+enginetype+enginesize+fuelsystem+boreratio+stroke+compressionratio+horsepower+peakrpm+highwaympg-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.880
Model:                            OLS   Adj. R-squared (uncentered):              0.867
Method:                 Least Squares   F-statistic:                              67.75
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    5.15e-74
Time:                        17:43:20   Log-Likelihood:                         -73.664
No. Observations:                 205   AIC:                                      187.3
Df Residuals:                     185   BIC:                                      253.8
Df Model:                          20                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------

In [169]:
#Removing highwaympg from the model
function = 'price~symboling+fueltype+doornumber+carbody+drivewheel+enginelocation+wheelbase+carlength+carwidth+carheight+curbweight+enginetype+enginesize+fuelsystem+boreratio+stroke+compressionratio+horsepower+peakrpm-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.880
Model:                            OLS   Adj. R-squared (uncentered):              0.868
Method:                 Least Squares   F-statistic:                              71.66
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    6.28e-75
Time:                        17:43:20   Log-Likelihood:                         -73.716
No. Observations:                 205   AIC:                                      185.4
Df Residuals:                     186   BIC:                                      248.6
Df Model:                          19                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------

In [170]:
#Removing carlength from the model
function = 'price~symboling+fueltype+doornumber+carbody+drivewheel+enginelocation+wheelbase+carwidth+carheight+curbweight+enginetype+enginesize+fuelsystem+boreratio+stroke+compressionratio+horsepower+peakrpm-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.880
Model:                            OLS   Adj. R-squared (uncentered):              0.868
Method:                 Least Squares   F-statistic:                              75.94
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    8.00e-76
Time:                        17:43:20   Log-Likelihood:                         -73.851
No. Observations:                 205   AIC:                                      183.7
Df Residuals:                     187   BIC:                                      243.5
Df Model:                          18                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------

In [171]:
#Removing symboling from the model
function = 'price~fueltype+doornumber+carbody+drivewheel+enginelocation+wheelbase+carwidth+carheight+curbweight+enginetype+enginesize+fuelsystem+boreratio+stroke+compressionratio+horsepower+peakrpm-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.879
Model:                            OLS   Adj. R-squared (uncentered):              0.868
Method:                 Least Squares   F-statistic:                              80.62
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    1.09e-76
Time:                        17:43:21   Log-Likelihood:                         -74.091
No. Observations:                 205   AIC:                                      182.2
Df Residuals:                     188   BIC:                                      238.7
Df Model:                          17                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------

In [172]:
#Removing doornumber from the model
function = 'price~fueltype+carbody+drivewheel+enginelocation+wheelbase+carwidth+carheight+curbweight+enginetype+enginesize+fuelsystem+boreratio+stroke+compressionratio+horsepower+peakrpm-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.879
Model:                            OLS   Adj. R-squared (uncentered):              0.869
Method:                 Least Squares   F-statistic:                              85.72
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    1.68e-77
Time:                        17:43:21   Log-Likelihood:                         -74.506
No. Observations:                 205   AIC:                                      181.0
Df Residuals:                     189   BIC:                                      234.2
Df Model:                          16                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------

In [173]:
#Removing wheelbase from the model
function = 'price~fueltype+carbody+drivewheel+enginelocation+carwidth+carheight+curbweight+enginetype+enginesize+fuelsystem+boreratio+stroke+compressionratio+horsepower+peakrpm-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.878
Model:                            OLS   Adj. R-squared (uncentered):              0.869
Method:                 Least Squares   F-statistic:                              91.31
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    2.98e-78
Time:                        17:43:21   Log-Likelihood:                         -75.104
No. Observations:                 205   AIC:                                      180.2
Df Residuals:                     190   BIC:                                      230.1
Df Model:                          15                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------

In [174]:
#Removing enginetype from the model
function = 'price~fueltype+carbody+drivewheel+enginelocation+carwidth+carheight+curbweight+enginesize+fuelsystem+boreratio+stroke+compressionratio+horsepower+peakrpm-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.878
Model:                            OLS   Adj. R-squared (uncentered):              0.869
Method:                 Least Squares   F-statistic:                              97.82
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    4.60e-79
Time:                        17:43:21   Log-Likelihood:                         -75.585
No. Observations:                 205   AIC:                                      179.2
Df Residuals:                     191   BIC:                                      225.7
Df Model:                          14                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------

In [175]:
#Removing fuelsystem from the model
function = 'price~fueltype+carbody+drivewheel+enginelocation+carwidth+carheight+curbweight+enginesize+boreratio+stroke+compressionratio+horsepower+peakrpm-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.877
Model:                            OLS   Adj. R-squared (uncentered):              0.868
Method:                 Least Squares   F-statistic:                              105.1
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    8.00e-80
Time:                        17:43:21   Log-Likelihood:                         -76.234
No. Observations:                 205   AIC:                                      178.5
Df Residuals:                     192   BIC:                                      221.7
Df Model:                          13                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------

In [176]:
#Removing horsepower from the model
function = 'price~fueltype+carbody+drivewheel+enginelocation+carwidth+carheight+curbweight+enginesize+boreratio+stroke+compressionratio+peakrpm-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.876
Model:                            OLS   Adj. R-squared (uncentered):              0.868
Method:                 Least Squares   F-statistic:                              113.6
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    1.38e-80
Time:                        17:43:21   Log-Likelihood:                         -76.915
No. Observations:                 205   AIC:                                      177.8
Df Residuals:                     193   BIC:                                      217.7
Df Model:                          12                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------

In [177]:
#Removing carheight from the model
function = 'price~fueltype+carbody+drivewheel+enginelocation+carwidth+curbweight+enginesize+boreratio+stroke+compressionratio+peakrpm-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.875
Model:                            OLS   Adj. R-squared (uncentered):              0.868
Method:                 Least Squares   F-statistic:                              123.1
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    3.32e-81
Time:                        17:43:21   Log-Likelihood:                         -77.992
No. Observations:                 205   AIC:                                      178.0
Df Residuals:                     194   BIC:                                      214.5
Df Model:                          11                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------

In [178]:
#Removing carbody from the model
function = 'price~fueltype+drivewheel+enginelocation+carwidth+curbweight+enginesize+boreratio+stroke+compressionratio+peakrpm-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.874
Model:                            OLS   Adj. R-squared (uncentered):              0.867
Method:                 Least Squares   F-statistic:                              134.9
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    5.86e-82
Time:                        17:43:21   Log-Likelihood:                         -78.784
No. Observations:                 205   AIC:                                      177.6
Df Residuals:                     195   BIC:                                      210.8
Df Model:                          10                                                  
Covariance Type:            nonrobust                                                  
                       coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------

In [179]:
#Removing compressionratio from the model
function = 'price~fueltype+drivewheel+enginelocation+carwidth+curbweight+enginesize+boreratio+stroke+peakrpm-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.872
Model:                            OLS   Adj. R-squared (uncentered):              0.866
Method:                 Least Squares   F-statistic:                              147.9
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    2.20e-82
Time:                        17:43:21   Log-Likelihood:                         -80.420
No. Observations:                 205   AIC:                                      178.8
Df Residuals:                     196   BIC:                                      208.7
Df Model:                           9                                                  
Covariance Type:            nonrobust                                                  
                     coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------

In [180]:
#Removing fueltype from the model
function = 'price~drivewheel+enginelocation+carwidth+curbweight+enginesize+boreratio+stroke+peakrpm-1'
model = smf.ols(formula=function, data=preço_carros_norm).fit()
print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                  price   R-squared (uncentered):                   0.870
Model:                            OLS   Adj. R-squared (uncentered):              0.865
Method:                 Least Squares   F-statistic:                              165.4
Date:                Sun, 30 Mar 2025   Prob (F-statistic):                    4.41e-83
Time:                        17:43:21   Log-Likelihood:                         -81.452
No. Observations:                 205   AIC:                                      178.9
Df Residuals:                     197   BIC:                                      205.5
Df Model:                           8                                                  
Covariance Type:            nonrobust                                                  
                     coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------
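
The cells above drop one predictor at a time and refit by hand. A minimal sketch of the same loop is shown below, assuming the removal criterion is the largest p-value above a 0.05 threshold (the truncated summaries do not confirm this). The helper `backward_eliminate`, the threshold, and the synthetic columns `x1`, `x2`, `noise` are hypothetical and stand in for `preço_carros_norm`. The sketch also assumes every predictor is numeric, so that the p-value index matches the column names.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def backward_eliminate(data, target, predictors, alpha=0.05):
    """Refit without the least significant predictor until all p-values <= alpha.

    Hypothetical helper: assumes numeric predictors and a no-intercept
    formula (the trailing '- 1'), as in the cells above.
    """
    remaining = list(predictors)
    while remaining:
        formula = f"{target} ~ {' + '.join(remaining)} - 1"
        fitted = smf.ols(formula=formula, data=data).fit()
        worst = fitted.pvalues.idxmax()          # predictor with the largest p-value
        if fitted.pvalues[worst] <= alpha:       # everything left is significant
            return fitted, remaining
        remaining.remove(worst)                  # drop it and refit
    return None, remaining


# Synthetic stand-in data: x1 and x2 drive y, 'noise' is unrelated to it.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
    "noise": rng.normal(size=200),
})
demo["y"] = 2.0 * demo["x1"] - 1.5 * demo["x2"] + rng.normal(scale=0.1, size=200)

final_model, kept = backward_eliminate(demo, "y", ["x1", "x2", "noise"])
print("Predictors kept:", kept)
```

Purely p-value-driven selection can overfit the sample at hand, so the resulting predictor list is best treated as a starting point rather than a final answer.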

##Modeling

In [181]:
#Defining the features (x) and the target (y)

x = preço_carros_norm[['drivewheel', 'enginelocation', 'carwidth', 'curbweight', 'enginesize', 'boreratio', 'stroke', 'peakrpm']]
y = preço_carros_norm['price']

In [182]:
#Splitting the data into training and test sets (80% / 20%)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

#Creating a linear regression object
lr = LinearRegression()

#Training the model
lr.fit(x_train,y_train)
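
The formulas used during variable selection end in `-1`, which removes the intercept, whereas `LinearRegression()` fits an intercept by default, so the two models are not set up identically. The sketch below, on synthetic data with hypothetical column names `a`, `b`, and `target`, suggests how `fit_intercept=False` would line the scikit-learn fit up with the no-intercept OLS formula. It is an illustration of the two APIs, not a claim about which setting suits the car data better.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (hypothetical columns, not the car dataset).
rng = np.random.default_rng(42)
demo = pd.DataFrame({"a": rng.normal(size=100), "b": rng.normal(size=100)})
demo["target"] = 1.5 * demo["a"] - 0.5 * demo["b"] + rng.normal(scale=0.1, size=100)

# statsmodels: '- 1' drops the intercept term from the formula.
ols_fit = smf.ols("target ~ a + b - 1", data=demo).fit()

# scikit-learn: fit_intercept=False is the matching setting.
sk_fit = LinearRegression(fit_intercept=False).fit(demo[["a", "b"]], demo["target"])

print("statsmodels coefficients: ", ols_fit.params.values)
print("scikit-learn coefficients:", sk_fit.coef_)
```

Both calls solve the same least-squares problem, so the two coefficient vectors should agree up to floating-point error.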

In [183]:
#Computing R² on the full dataset (x, y), i.e. training and test rows together
r_sqr = lr.score(x,y)
print('R²:',r_sqr)

R²: 0.8688741197982512
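
Because `lr.score(x, y)` scores the model on rows it was trained on as well as on held-out rows, the value above mixes both. A minimal sketch of reporting R² per split is given below on synthetic data. The names `X_demo`, `y_demo`, and `demo_lr` are hypothetical; in the notebook the same two calls would presumably take `x_train`, `y_train` and `x_test`, `y_test`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: three features with known linear effects plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
demo_lr = LinearRegression().fit(X_tr, y_tr)

# Score each split on its own instead of on the combined data.
r2_train = demo_lr.score(X_tr, y_tr)
r2_test = demo_lr.score(X_te, y_te)
print("R² (train):", r2_train)
print("R² (test): ", r2_test)
```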


In [184]:
#Computing the training metrics
y_pred_train = lr.predict(x_train)
print('MAE:', metrics.mean_absolute_error(y_train, y_pred_train))
print('MSE:', metrics.mean_squared_error(y_train, y_pred_train))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_train, y_pred_train)))

MAE: 0.24726456455216866
MSE: 0.1099142187229759
RMSE: 0.3315331336729044


In [185]:
#Computing the test metrics
y_pred = lr.predict(x_test)
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

MAE: 0.3128896942202744
MSE: 0.21597252611684048
RMSE: 0.4647284434127531
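
The test RMSE (about 0.46) is noticeably higher than the training RMSE (about 0.33), and with 205 observations a 20% split leaves only 41 rows for testing, so a single split gives a noisy estimate. One way to check how stable the error is would be k-fold cross-validation. The sketch below runs it on synthetic data. The names `X_demo` and `y_demo` and the choice of 5 folds are assumptions, and nothing here reproduces the notebook's numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data with the same number of rows as the car dataset.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(205, 4))
y_demo = X_demo @ np.array([0.8, -0.4, 0.3, 0.1]) + rng.normal(scale=0.4, size=205)

# Shuffled 5-fold split; each row is used for testing exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# scikit-learn reports errors as negative scores, so flip the sign.
neg_rmse = cross_val_score(
    LinearRegression(), X_demo, y_demo,
    cv=cv, scoring="neg_root_mean_squared_error",
)
rmse_per_fold = -neg_rmse

print("RMSE per fold:", rmse_per_fold)
print("Mean RMSE:", rmse_per_fold.mean(), "| std:", rmse_per_fold.std())
```

A large spread across folds would suggest that the train/test gap seen above depends partly on which rows happened to land in the test set.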
