<a href="https://colab.research.google.com/github/Daleth-Barreto/Practica1_ML/blob/main/Practica1_ML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Melbourne house price prediction with Machine Learning

This project uses Machine Learning to predict house prices in Melbourne, Australia, using the "Melbourne Housing Snapshot" dataset. It compares regression algorithms, namely decision trees and Random Forest, to find the best-performing model, measured by mean absolute error (MAE) on held-out validation data. The data is cleaned and preprocessed beforehand to ensure model quality. The goal is an accurate, robust model for predicting Melbourne house prices.

In [1]:
!pip install kagglehub pandas matplotlib scikit-learn -q


## Libraries

In [8]:
import pandas as pd
import kagglehub
import matplotlib.pyplot as plt
import os

## Data exploration

In [10]:
def cargar_csv_kagglehub(enlace_kagglehub):
    """Download a Kaggle dataset with kagglehub and load the first CSV file it contains."""
    path = kagglehub.dataset_download(enlace_kagglehub)
    print("✅ Dataset downloaded to:", path)

    # Walk the downloaded folder and load the first CSV found
    for root, _dirs, files_in_dir in os.walk(path):
        for file in files_in_dir:
            if file.endswith(".csv"):
                full_path = os.path.join(root, file)
                print("📄 Loading:", file)
                return pd.read_csv(full_path)

    print("❌ No CSV file was found in the dataset.")
    return None


In [11]:
melb_df = cargar_csv_kagglehub("gunjanpathak/melb-data")
melb_df

✅ Dataset downloaded to: /kaggle/input/melb-data
📄 Loading: melb_data.csv


|  | Unnamed: 0 | Suburb | Address | Rooms | Type | Price | Method | SellerG | Date | Distance | ... | Bathroom | Car | Landsize | BuildingArea | YearBuilt | CouncilArea | Lattitude | Longtitude | Regionname | Propertycount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Abbotsford | 85 Turner St | 2 | h | 1480000.0 | S | Biggin | 3/12/2016 | 2.5 | ... | 1.0 | 1.0 | 202.0 | NaN | NaN | Yarra | -37.79960 | 144.99840 | Northern Metropolitan | 4019.0 |
| 1 | 2 | Abbotsford | 25 Bloomburg St | 2 | h | 1035000.0 | S | Biggin | 4/02/2016 | 2.5 | ... | 1.0 | 0.0 | 156.0 | 79.0 | 1900.0 | Yarra | -37.80790 | 144.99340 | Northern Metropolitan | 4019.0 |
| 2 | 4 | Abbotsford | 5 Charles St | 3 | h | 1465000.0 | SP | Biggin | 4/03/2017 | 2.5 | ... | 2.0 | 0.0 | 134.0 | 150.0 | 1900.0 | Yarra | -37.80930 | 144.99440 | Northern Metropolitan | 4019.0 |
| 3 | 5 | Abbotsford | 40 Federation La | 3 | h | 850000.0 | PI | Biggin | 4/03/2017 | 2.5 | ... | 2.0 | 1.0 | 94.0 | NaN | NaN | Yarra | -37.79690 | 144.99690 | Northern Metropolitan | 4019.0 |
| 4 | 6 | Abbotsford | 55a Park St | 4 | h | 1600000.0 | VB | Nelson | 4/06/2016 | 2.5 | ... | 1.0 | 2.0 | 120.0 | 142.0 | 2014.0 | Yarra | -37.80720 | 144.99410 | Northern Metropolitan | 4019.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18391 | 23540 | Williamstown | 8/2 Thompson St | 2 | t | 622500.0 | SP | Greg | 26/08/2017 | 6.8 | ... | 2.0 | 1.0 | NaN | 89.0 | 2010.0 | NaN | -37.86393 | 144.90484 | Western Metropolitan | 6380.0 |
| 18392 | 23541 | Williamstown | 96 Verdon St | 4 | h | 2500000.0 | PI | Sweeney | 26/08/2017 | 6.8 | ... | 1.0 | 5.0 | 866.0 | 157.0 | 1920.0 | NaN | -37.85908 | 144.89299 | Western Metropolitan | 6380.0 |
| 18393 | 23544 | Yallambie | 17 Amaroo Wy | 4 | h | 1100000.0 | S | Buckingham | 26/08/2017 | 12.7 | ... | 3.0 | 2.0 | NaN | NaN | NaN | NaN | -37.72006 | 145.10547 | Northern Metropolitan | 1369.0 |
| 18394 | 23545 | Yarraville | 6 Agnes St | 4 | h | 1285000.0 | SP | Village | 26/08/2017 | 6.3 | ... | 1.0 | 1.0 | 362.0 | 112.0 | 1920.0 | NaN | -37.81188 | 144.88449 | Western Metropolitan | 6543.0 |


In [13]:
melb_df.describe()

|  | Unnamed: 0 | Rooms | Price | Distance | Postcode | Bedroom2 | Bathroom | Car | Landsize | BuildingArea | YearBuilt | Lattitude | Longtitude | Propertycount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 18396.0 | 18396.0 | 18396.0 | 18395.0 | 18395.0 | 14927.0 | 14925.0 | 14820.0 | 13603.0 | 7762.0 | 8958.0 | 15064.0 | 15064.0 | 18395.0 |
| mean | 11826.787073 | 2.93504 | 1056697.0 | 10.389986 | 3107.140147 | 2.913043 | 1.538492 | 1.61552 | 558.116371 | 151.220219 | 1965.879996 | -37.809849 | 144.996338 | 7517.975265 |
| std | 6800.710448 | 0.958202 | 641921.7 | 6.00905 | 95.000995 | 0.964641 | 0.689311 | 0.955916 | 3987.326586 | 519.188596 | 37.013261 | 0.081152 | 0.106375 | 4488.416599 |
| min | 1.0 | 1.0 | 85000.0 | 0.0 | 3000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1196.0 | -38.18255 | 144.43181 | 249.0 |
| 25% | 5936.75 | 2.0 | 633000.0 | 6.3 | 3046.0 | 2.0 | 1.0 | 1.0 | 176.5 | 93.0 | 1950.0 | -37.8581 | 144.931193 | 4294.0 |
| 50% | 11820.5 | 3.0 | 880000.0 | 9.7 | 3085.0 | 3.0 | 1.0 | 2.0 | 440.0 | 126.0 | 1970.0 | -37.803625 | 145.00092 | 6567.0 |
| 75% | 17734.25 | 3.0 | 1302000.0 | 13.3 | 3149.0 | 3.0 | 2.0 | 2.0 | 651.0 | 174.0 | 2000.0 | -37.75627 | 145.06 | 10331.0 |
| max | 23546.0 | 12.0 | 9000000.0 | 48.1 | 3978.0 | 20.0 | 8.0 | 10.0 | 433014.0 | 44515.0 | 2018.0 | -37.40853 | 145.52635 | 21650.0 |
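The `count` row above shows that several columns have far fewer non-null values than the 18396 rows in the dataset; `BuildingArea`, for instance, has only 7762. Since `describe()` covers only numeric columns, a per-column count with `isna().sum()` also reveals gaps in text columns such as `CouncilArea`. A minimal sketch, using a small made-up DataFrame `df` (illustrative values, not the real dataset) in place of `melb_df`:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for melb_df; values are made up for this example
df = pd.DataFrame({
    "Price": [1480000.0, 1035000.0, 1465000.0, 850000.0],
    "Rooms": [2, 2, 3, 3],
    "BuildingArea": [np.nan, 79.0, 150.0, np.nan],
    "YearBuilt": [np.nan, 1900.0, 1900.0, np.nan],
    "CouncilArea": ["Yarra", "Yarra", None, "Yarra"],  # text column, ignored by describe()
})

# Number and share of missing values per column, most incomplete first
missing = df.isna().sum().sort_values(ascending=False)
missing_share = (missing / len(df)).round(2)
print(pd.DataFrame({"missing": missing, "share": missing_share}))
```

On `melb_df` itself, the same two lines would show which columns drive the row loss in the cleaning step.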


## Data cleaning

In [15]:
melb_df = melb_df.dropna(axis=0)
melb_df

|  | Unnamed: 0 | Suburb | Address | Rooms | Type | Price | Method | SellerG | Date | Distance | ... | Bathroom | Car | Landsize | BuildingArea | YearBuilt | CouncilArea | Lattitude | Longtitude | Regionname | Propertycount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | Abbotsford | 25 Bloomburg St | 2 | h | 1035000.0 | S | Biggin | 4/02/2016 | 2.5 | ... | 1.0 | 0.0 | 156.0 | 79.00 | 1900.0 | Yarra | -37.80790 | 144.99340 | Northern Metropolitan | 4019.0 |
| 2 | 4 | Abbotsford | 5 Charles St | 3 | h | 1465000.0 | SP | Biggin | 4/03/2017 | 2.5 | ... | 2.0 | 0.0 | 134.0 | 150.00 | 1900.0 | Yarra | -37.80930 | 144.99440 | Northern Metropolitan | 4019.0 |
| 4 | 6 | Abbotsford | 55a Park St | 4 | h | 1600000.0 | VB | Nelson | 4/06/2016 | 2.5 | ... | 1.0 | 2.0 | 120.0 | 142.00 | 2014.0 | Yarra | -37.80720 | 144.99410 | Northern Metropolitan | 4019.0 |
| 6 | 11 | Abbotsford | 124 Yarra St | 3 | h | 1876000.0 | S | Nelson | 7/05/2016 | 2.5 | ... | 2.0 | 0.0 | 245.0 | 210.00 | 1910.0 | Yarra | -37.80240 | 144.99930 | Northern Metropolitan | 4019.0 |
| 7 | 14 | Abbotsford | 98 Charles St | 2 | h | 1636000.0 | S | Nelson | 8/10/2016 | 2.5 | ... | 1.0 | 2.0 | 256.0 | 107.00 | 1890.0 | Yarra | -37.80600 | 144.99540 | Northern Metropolitan | 4019.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15388 | 19732 | Whittlesea | 30 Sherwin St | 3 | h | 601000.0 | S | Ray | 29/07/2017 | 35.5 | ... | 2.0 | 1.0 | 972.0 | 149.00 | 1996.0 | Whittlesea | -37.51232 | 145.13282 | Northern Victoria | 2170.0 |
| 15389 | 19733 | Williamstown | 75 Cecil St | 3 | h | 1050000.0 | VB | Williams | 29/07/2017 | 6.8 | ... | 1.0 | 0.0 | 179.0 | 115.00 | 1890.0 | Hobsons Bay | -37.86558 | 144.90474 | Western Metropolitan | 6380.0 |
| 15390 | 19734 | Williamstown | 2/29 Dover Rd | 1 | u | 385000.0 | SP | Williams | 29/07/2017 | 6.8 | ... | 1.0 | 1.0 | 0.0 | 35.64 | 1967.0 | Hobsons Bay | -37.85588 | 144.89936 | Western Metropolitan | 6380.0 |
| 15392 | 19736 | Windsor | 201/152 Peel St | 2 | u | 560000.0 | PI | hockingstuart | 29/07/2017 | 4.6 | ... | 1.0 | 1.0 | 0.0 | 61.60 | 2012.0 | Stonnington | -37.85581 | 144.99025 | Southern Metropolitan | 4380.0 |
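`dropna(axis=0)` removes every row that has a missing value in *any* column, including columns the model never uses (such as `BuildingArea` or `CouncilArea`). A hedged alternative is to drop only rows that lack the target or one of the selected features, via the `subset` parameter. The sketch below assumes the feature list used in the next section and a small made-up DataFrame `df` in place of `melb_df`:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for melb_df; values are made up for this example
df = pd.DataFrame({
    "Price": [1035000.0, 1465000.0, np.nan, 850000.0],
    "Rooms": [2, 3, 4, 3],
    "Bathroom": [1.0, 2.0, 1.0, 2.0],
    "Lattitude": [-37.8079, -37.8093, -37.8072, np.nan],
    "Longtitude": [144.9934, 144.9944, 144.9941, 144.9969],
    "BuildingArea": [79.0, np.nan, 142.0, np.nan],  # not used as a feature
})

features = ["Rooms", "Bathroom", "Lattitude", "Longtitude"]

# Drops any row with a NaN anywhere, including in unused columns
all_complete = df.dropna(axis=0)

# Drops only rows missing the target or one of the selected features
model_ready = df.dropna(axis=0, subset=["Price"] + features)

print(len(df), len(all_complete), len(model_ready))
```

Here the row with a missing `BuildingArea` but complete features survives the `subset` version, so more training data is kept.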


## Machine Learning

In [16]:
from sklearn.tree import DecisionTreeRegressor

y = melb_df['Price']  # Target to predict
x = melb_df[['Rooms', 'Bathroom', 'Lattitude', 'Longtitude']]  # Features used for prediction

melbourne_model = DecisionTreeRegressor()  # Choose the model

melbourne_model.fit(x, y)  # Train it


In [18]:
print("Making predictions for the first 5 houses:")
print(x.head())
print("The actual prices are:")
print(y.head())
print("The predictions are:")
print(melbourne_model.predict(x.head()))

Making predictions for the first 5 houses:
   Rooms  Bathroom  Lattitude  Longtitude
1      2       1.0   -37.8079    144.9934
2      3       2.0   -37.8093    144.9944
4      4       1.0   -37.8072    144.9941
6      3       2.0   -37.8024    144.9993
7      2       1.0   -37.8060    144.9954
The actual prices are:
1    1035000.0
2    1465000.0
4    1600000.0
6    1876000.0
7    1636000.0
Name: Price, dtype: float64
The predictions are:
[1035000. 1465000. 1600000. 1876000. 1636000.]


## MAE

MAE (Mean Absolute Error) measures the average absolute difference between the predicted and the actual prices, in the same units as `Price`. Lower is better.
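For $n$ houses with actual prices $y_i$ and predicted prices $\hat{y}_i$, the metric is $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$. A minimal pure-Python sketch of that formula; `mean_absolute_error_manual` is a hypothetical helper name and the prices are made up. scikit-learn's `mean_absolute_error` computes the same quantity when no sample weights are given.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Hypothetical helper: average of |actual - predicted| over all samples."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up prices for illustration
actual = [1035000.0, 1465000.0, 1600000.0]
predicted = [1000000.0, 1500000.0, 1600000.0]
result = mean_absolute_error_manual(actual, predicted)
print(result)  # (35000 + 35000 + 0) / 3
```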

In [19]:
from sklearn.metrics import mean_absolute_error

# In-sample MAE: the model is evaluated on the same rows it was trained on
predicted_home_prices = melbourne_model.predict(x)
mean_absolute_error(y, predicted_home_prices)


1436.24919302776
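This in-sample MAE (about 1436) is misleadingly low. A `DecisionTreeRegressor` with no depth or leaf limit keeps splitting until it nearly memorizes the training rows, which likely explains why the five predictions above match the actual prices exactly; it is presumably not exactly zero because some houses share identical feature values but sold at different prices. A minimal sketch on synthetic data (the variables `rooms`, `X`, `y` and the noise model are made up for illustration) comparing training and held-out MAE:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: price depends on rooms plus random noise (illustrative only)
rng = np.random.default_rng(0)
rooms = rng.integers(1, 6, size=500)
X = np.column_stack([rooms, rng.normal(size=500)])  # second column is pure noise
y = 200_000 * rooms + rng.normal(0, 100_000, size=500)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Unrestricted tree: grows until each training leaf is pure
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
train_mae = mean_absolute_error(y_train, tree.predict(X_train))
val_mae = mean_absolute_error(y_val, tree.predict(X_val))
print(f"training MAE: {train_mae:,.0f}  validation MAE: {val_mae:,.0f}")
```

The gap between the two numbers is the reason the next section evaluates on a separate validation split.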

## Validation

In [27]:
from sklearn.model_selection import train_test_split

# Hold out part of the data (25% by default) for validation
train_X, val_X, train_y, val_y = train_test_split(x, y)

melbourne_model = DecisionTreeRegressor()
melbourne_model.fit(train_X, train_y)

# MAE on rows the model did not see during training
val_predictions = melbourne_model.predict(val_X)
print(mean_absolute_error(val_y, val_predictions))


286707.61200774694
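`train_test_split` shuffles the rows randomly on every call unless `random_state` is set, so the validation MAE above (about 286708) and the results of the following cells can change from run to run. A small sketch with toy arrays showing that a fixed `random_state` reproduces the same split; the seed value 1 is arbitrary and the arrays are made up:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 rows, 2 features
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Same random_state gives an identical split on every call
a = train_test_split(X, y, random_state=1)
b = train_test_split(X, y, random_state=1)
same_split = all(np.array_equal(p, q) for p, q in zip(a, b))
print(same_split)

train_X, val_X, train_y, val_y = a
print(len(train_X), len(val_X))  # default test_size is 0.25
```

Passing the same `random_state` in the notebook cell would make the MAE comparisons between models reproducible.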


In [28]:
def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    """Return the validation MAE of a tree limited to max_leaf_nodes leaves."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
    model.fit(train_X, train_y)
    preds_val = model.predict(val_X)
    mae = mean_absolute_error(val_y, preds_val)
    return mae

In [30]:
for max_leaf_nodes in [5, 50, 500, 5000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max leaf nodes: %d \t\t Mean Absolute Error: %d" % (max_leaf_nodes, my_mae))

Max leaf nodes: 5 		 Mean Absolute Error: 378460
Max leaf nodes: 50 		 Mean Absolute Error: 274294
Max leaf nodes: 500 		 Mean Absolute Error: 262283
Max leaf nodes: 5000 		 Mean Absolute Error: 292264
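In this run, 500 leaves gave the lowest validation MAE (262283); the higher errors at 5 and 5000 leaves suggest underfitting and overfitting respectively. A sketch of choosing the size programmatically and then refitting on all available rows, using synthetic data in place of the Melbourne features. `get_mae` is redefined here so the block runs on its own, with an added `random_state=0` (an assumption, for reproducibility); `best_size` and `final_model` are illustrative names.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    """Validation MAE of a tree limited to max_leaf_nodes leaves (random_state added)."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Synthetic stand-in for the Melbourne features (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 2))
y = 50_000 * X[:, 0] + rng.normal(0, 40_000, size=1000)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# Score each candidate size and keep the one with the lowest validation MAE
candidates = [5, 50, 500]
scores = {n: get_mae(n, train_X, val_X, train_y, val_y) for n in candidates}
best_size = min(scores, key=scores.get)

# Once the size is chosen, refit on all rows (training + validation)
final_model = DecisionTreeRegressor(max_leaf_nodes=best_size, random_state=0).fit(X, y)
print(scores, best_size)
```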


In [32]:
from sklearn.ensemble import RandomForestRegressor

forest_model = RandomForestRegressor()
forest_model.fit(train_X,train_y)
melb_preds = forest_model.predict(val_X)
print(mean_absolute_error(val_y, melb_preds))


237031.25211073196
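On this split the random forest beats the best single tree (about 237031 vs 262283), but a single random split can be noisy. A hedged sketch of comparing the two models with 5-fold cross-validation via `cross_val_score`, on synthetic data; `n_estimators=50` is chosen only to keep the example fast, and the `models` and `cv_mae` names are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = 40_000 * X[:, 0] + 20_000 * X[:, 1] + rng.normal(0, 30_000, size=300)

models = {
    "tree (500 leaves)": DecisionTreeRegressor(max_leaf_nodes=500, random_state=0),
    "random forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

cv_mae = {}
for name, model in models.items():
    # scikit-learn scorers follow "higher is better", so MAE is returned negated
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    cv_mae[name] = -scores.mean()
    print(f"{name}: mean MAE over 5 folds = {cv_mae[name]:,.0f}")
```

Averaging over folds uses every row for validation exactly once, which gives a steadier basis for choosing between the two models.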
