
A real estate agency wants to predict property values from characteristics such as location,
number of bedrooms, lot size, and others.
Tasks:
- Use a housing price dataset (for example, California Housing from Scikit-Learn).
- Apply feature engineering techniques to improve model performance.
- Test different regression algorithms, such as Linear Regression, XGBoost, and Artificial Neural Networks (ANNs).
- Evaluate the models with metrics such as RMSE and R².
Question: Which model had the lowest prediction error? How can performance be optimized further?

In [17]:
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import root_mean_squared_error  # available in scikit-learn >= 1.4
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

In [11]:
housing = fetch_california_housing(as_frame=True)
df = housing.frame

In [13]:
# Show the first 5 rows
df.head(5)

,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,MedHouseVal
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23,4.526
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22,3.585
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24,3.521
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25,3.413
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25,3.422


In [14]:
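# Feature engineering: derive household-level ratios
def feature_engineering(df):
    df = df.copy()  # avoid mutating the caller's DataFrame
    # Number of households per block group (population / average occupancy)
    df['Households'] = df['Population'] / df['AveOccup']
    # Note: AveRooms is already a per-household average, so this divides it by the household count again
    df['rooms_per_household'] = df['AveRooms'] / df['Households']
    # Note: algebraically equal to AveOccup, so this column duplicates an existing feature
    df['population_per_household'] = df['Population'] / df['Households']
    df['bedrooms_per_room'] = df['AveBedrms'] / df['AveRooms']
    return df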
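
As a quick sanity check on these ratios, the sketch below applies the same formulas to a small hand-made frame and confirms that `population_per_household` reproduces `AveOccup`, since Population / (Population / AveOccup) simplifies to AveOccup. The frame name `toy` is hypothetical, and its values are copied from the first two rows of the dataset preview.

```python
import numpy as np
import pandas as pd

# Toy frame; values taken from the first two rows of the dataset preview
toy = pd.DataFrame({
    "Population": [322.0, 2401.0],
    "AveOccup": [2.555556, 2.109842],
    "AveRooms": [6.984127, 6.238137],
    "AveBedrms": [1.02381, 0.97188],
})

households = toy["Population"] / toy["AveOccup"]
population_per_household = toy["Population"] / households
bedrooms_per_room = toy["AveBedrms"] / toy["AveRooms"]

# population_per_household is AveOccup again (up to floating-point error)
same_as_ave_occup = np.allclose(population_per_household, toy["AveOccup"])
print(same_as_ave_occup)
print(bedrooms_per_room.round(6).tolist())
```

A column that duplicates an existing one adds no information for the model, so it could be dropped without affecting the results.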

In [15]:
df = feature_engineering(df)

In [16]:
# Show the first 5 rows again
df.head(5)

,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,MedHouseVal,Households,rooms_per_household,population_per_household,bedrooms_per_room
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23,4.526,126.0,0.05543,2.555556,0.146591
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22,3.585,1138.0,0.005482,2.109842,0.155797
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24,3.521,177.0,0.046826,2.80226,0.129516
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25,3.413,219.0,0.026563,2.547945,0.184458
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25,3.422,259.0,0.024254,2.181467,0.172096


In [18]:
# Split the data into training and test sets
X = df.drop('MedHouseVal', axis=1)
y = df['MedHouseVal']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [19]:
# Fit linear regression
modelo_regressao = LinearRegression()
modelo_regressao.fit(X_train, y_train)
y_pred = modelo_regressao.predict(X_test)

In [21]:
# Fit XGBoost
modelo_xgboost = xgb.XGBRegressor()
modelo_xgboost.fit(X_train, y_train)
y_pred_xgboost = modelo_xgboost.predict(X_test)
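
The task list also mentions Artificial Neural Networks, which this notebook does not train. The block below is a minimal sketch of how one could be added, assuming scikit-learn's `MLPRegressor` with standardized inputs. Synthetic data from `make_regression` stands in for the housing features so the block runs without a download, and the layer sizes and iteration limit are illustrative, not tuned.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing features (assumption, not the real dataset)
X_demo, y_demo = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Neural networks are sensitive to feature scale, so standardize before the MLP
modelo_ann = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=42),
)
modelo_ann.fit(Xd_train, yd_train)
y_pred_ann = modelo_ann.predict(Xd_test)
r2_ann = r2_score(yd_test, y_pred_ann)
print('R² ANN (synthetic data):', r2_ann)
```

With the real data, `X_train`, `y_train`, and `X_test` from the split cell would replace the synthetic arrays.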

In [23]:
# Model evaluation - RMSE
rmse = root_mean_squared_error(y_test, y_pred)
rmse_xgboost = root_mean_squared_error(y_test, y_pred_xgboost)

In [24]:
print('RMSE Linear Regression:', rmse)
print('RMSE XGBoost:', rmse_xgboost)

RMSE Linear Regression: 0.6949532138542249
RMSE XGBoost: 0.4658451705074177
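
For reference, RMSE is the square root of the mean squared difference between true and predicted values, so it is expressed in the same units as the target. A minimal NumPy sketch with made-up numbers (the arrays are illustrative, not notebook outputs):

```python
import numpy as np

y_true_demo = np.array([3.0, 2.0, 4.0, 5.0])
y_pred_demo = np.array([2.0, 2.0, 5.0, 7.0])

# errors: 1, 0, -1, -2 -> squared: 1, 0, 1, 4 -> mean 1.5
rmse_demo = np.sqrt(np.mean((y_true_demo - y_pred_demo) ** 2))
print(rmse_demo)  # sqrt(1.5), about 1.2247
```

Because `MedHouseVal` is expressed in hundreds of thousands of dollars, an RMSE of about 0.47 corresponds to a typical error of roughly $47,000.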


In [25]:
# Model evaluation - R²
r2 = r2_score(y_test, y_pred)
r2_xgboost = r2_score(y_test, y_pred_xgboost)

In [26]:
print('R² Linear Regression:', r2)
print('R² XGBoost:', r2_xgboost)

R² Linear Regression: 0.631443329933038
R² XGBoost: 0.8343938980207516
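
R² compares the model's squared error with that of always predicting the mean of the target: R² = 1 - SS_res / SS_tot. A value of 1 means perfect predictions and 0 means no better than the mean. A small NumPy sketch with made-up numbers (not notebook outputs):

```python
import numpy as np

y_true_demo = np.array([3.0, 2.0, 4.0, 5.0])
y_pred_demo = np.array([2.5, 2.0, 4.5, 5.0])

# Residual sum of squares: 0.25 + 0 + 0.25 + 0 = 0.5
ss_res = np.sum((y_true_demo - y_pred_demo) ** 2)
# Total sum of squares around the mean (3.5): 0.25 + 2.25 + 0.25 + 2.25 = 5.0
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)

r2_demo = 1.0 - ss_res / ss_tot
print(r2_demo)  # 1 - 0.5/5.0 = 0.9
```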


In [27]:
# Answer:
# XGBoost had the lowest prediction error: an RMSE of about 0.47 versus 0.69 for linear regression, and an R² of about 0.83 versus 0.63.
# To improve performance further, the ideal would be to add more examples to the dataset, giving the model more to learn from and, consequently, more accurate predictions.
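
Whether more examples would actually help can be checked before collecting them. One option, sketched below under assumptions, is scikit-learn's `learning_curve`: if the validation error is still falling at the largest training size, more data is likely to help; if it has flattened, tuning the model or the features may pay off more. Synthetic data and a `GradientBoostingRegressor` stand in for the housing data and XGBoost so the block runs on its own.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import learning_curve

# Synthetic stand-in for the housing data (assumption, not the real dataset)
X_demo, y_demo = make_regression(n_samples=600, n_features=8, noise=15.0, random_state=42)

train_sizes, train_scores, val_scores = learning_curve(
    GradientBoostingRegressor(n_estimators=50, random_state=42),
    X_demo,
    y_demo,
    train_sizes=np.linspace(0.2, 1.0, 4),
    cv=3,
    scoring="neg_root_mean_squared_error",
)

# Scores are negated RMSE, so flip the sign to report errors
val_rmse = -val_scores.mean(axis=1)
for n, err in zip(train_sizes, val_rmse):
    print(f"{int(n):4d} training examples -> validation RMSE {err:.2f}")
```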