

## California Housing Data

El conjunto de datos "California Housing" contiene información sobre el precio medio de las casas en varios distritos de California, así como información relacionada con las características socioeconómicas y geográficas de cada distrito.

Atributos:
<ul>
<li>Latitude: Latitud geográfica del centro del distrito (numérico).</li>
<li>Longitude: Longitud geográfica del centro del distrito (numérico).</li>
<li>House_median_age: Mediana de la edad de las viviendas en el distrito (numérico).</li>
<li>Total_rooms: Total de habitaciones por vivienda en el distrito (numérico).</li>
<li>Total_bedrooms: Total de dormitorios por vivienda en el distrito (numérico).</li>
<li>Population: Población total en el distrito (numérico).</li>
<li>Households: Número promedio de personas por vivienda en el distrito (numérico).</li>
<li>Median_income: Mediana del ingreso de los hogares en el distrito (numérico).</li>
<li>Median_house_value: clase objetivo. Mediana del valor de las viviendas en el distrito (numérico).</li>
<li>Ocean_proximity: localización de la vivienda. (categórico)</li>
</ul>

Se puede obtener en https://www.kaggle.com/datasets/camnugent/california-housing-prices

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from google.colab import drive
from sklearn.preprocessing import LabelEncoder
from keras.optimizers import SGD, Adam
from tensorflow.keras.regularizers import l1, l2
from tensorflow.keras.initializers import HeNormal
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.layers import Dense, Conv1D, Flatten, MaxPooling1D
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.losses import Huber
from tensorflow.keras.metrics import RootMeanSquaredError

In [2]:
drive.mount('/content/drive')
path = '/content/drive/MyDrive/Colab Notebooks/proyecto-tensorflow/buenos/housing.csv'

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
# Cargar el conjunto de datos
housing=pd.read_csv(path,sep=",", header=0)

In [4]:
# Identificar filas con valores nulos
rows_with_nans = housing[housing.isnull().any(axis=1)]
print(rows_with_nans)

# Reemplazar valores faltantes
housing = housing.dropna()

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  \
290      -122.16     37.77                47.0       1256.0             NaN   
341      -122.17     37.75                38.0        992.0             NaN   
538      -122.28     37.78                29.0       5154.0             NaN   
563      -122.24     37.75                45.0        891.0             NaN   
696      -122.10     37.69                41.0        746.0             NaN   
...          ...       ...                 ...          ...             ...   
20267    -119.19     34.20                18.0       3620.0             NaN   
20268    -119.18     34.19                19.0       2393.0             NaN   
20372    -118.88     34.17                15.0       4260.0             NaN   
20460    -118.75     34.29                17.0       5512.0             NaN   
20484    -118.72     34.28                17.0       3051.0             NaN   

       population  households  median_income  media

In [5]:
housing.head()

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
0,-122.23,37.88,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0,NEAR BAY
1,-122.22,37.86,21.0,7099.0,1106.0,2401.0,1138.0,8.3014,358500.0,NEAR BAY
2,-122.24,37.85,52.0,1467.0,190.0,496.0,177.0,7.2574,352100.0,NEAR BAY
3,-122.25,37.85,52.0,1274.0,235.0,558.0,219.0,5.6431,341300.0,NEAR BAY
4,-122.25,37.85,52.0,1627.0,280.0,565.0,259.0,3.8462,342200.0,NEAR BAY


In [6]:
# Ver los nombres de las columnas
print(housing.columns)

Index(['longitude', 'latitude', 'housing_median_age', 'total_rooms',
       'total_bedrooms', 'population', 'households', 'median_income',
       'median_house_value', 'ocean_proximity'],
      dtype='object')


In [7]:
# Ver las categorías únicas en la columna 'ocean_proximity'
print(housing['ocean_proximity'].unique())

['NEAR BAY' '<1H OCEAN' 'INLAND' 'NEAR OCEAN' 'ISLAND']


In [8]:
# Crear una instancia de LabelEncoder
label_encoder = LabelEncoder()

# Ajustar y transformar la columna 'ocean_proximity'
housing['ocean_proximity_encoded'] = label_encoder.fit_transform(housing['ocean_proximity'])

# Mostrar el DataFrame resultante con la nueva columna codificada
print(housing[['ocean_proximity', 'ocean_proximity_encoded']].head())

  ocean_proximity  ocean_proximity_encoded
0        NEAR BAY                        3
1        NEAR BAY                        3
2        NEAR BAY                        3
3        NEAR BAY                        3
4        NEAR BAY                        3


In [9]:
# Eliminar la columna antigua 'ocean_proximity'
housing.drop('ocean_proximity', axis=1, inplace=True)

In [10]:
housing

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity_encoded
0,-122.23,37.88,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0,3
1,-122.22,37.86,21.0,7099.0,1106.0,2401.0,1138.0,8.3014,358500.0,3
2,-122.24,37.85,52.0,1467.0,190.0,496.0,177.0,7.2574,352100.0,3
3,-122.25,37.85,52.0,1274.0,235.0,558.0,219.0,5.6431,341300.0,3
4,-122.25,37.85,52.0,1627.0,280.0,565.0,259.0,3.8462,342200.0,3
...,...,...,...,...,...,...,...,...,...,...
20635,-121.09,39.48,25.0,1665.0,374.0,845.0,330.0,1.5603,78100.0,1
20636,-121.21,39.49,18.0,697.0,150.0,356.0,114.0,2.5568,77100.0,1
20637,-121.22,39.43,17.0,2254.0,485.0,1007.0,433.0,1.7000,92300.0,1
20638,-121.32,39.43,18.0,1860.0,409.0,741.0,349.0,1.8672,84700.0,1


In [11]:
# División de características y etiquetas
y = housing['median_house_value']
X = housing.drop(columns=['median_house_value'])

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [13]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [14]:
input_shape=X_train_scaled.shape[1]

In [15]:
# Crear un modelo secuencial
model = Sequential([
    Dense(64, activation='relu', input_shape=(input_shape,)),
    Dense(32, activation='relu'),
    Dense(1)
])

In [16]:
optimizer = Adam(learning_rate=0.001)

In [17]:
# Compilar el modelo
model.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['mean_absolute_error'])

In [18]:
# Resumen del modelo
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 64)                640       
                                                                 
 dense_1 (Dense)             (None, 32)                2080      
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
Total params: 2753 (10.75 KB)
Trainable params: 2753 (10.75 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [19]:
# Entrenar el modelo
history = model.fit(X_train_scaled, y_train, epochs=20, batch_size=16, validation_split=0.2)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [20]:
# Evaluar el modelo en el conjunto de prueba
test_loss, test_mae = model.evaluate(X_test, y_test)
print(f'Test Mean Absolute Error: {test_mae}')


Test Mean Absolute Error: 16590821.0


## Parámetros avanzados


In [21]:
# Definir la arquitectura del modelo
model = Sequential([
    Dense(128, activation='relu', input_shape=(X_train.shape[1],),
          kernel_initializer=HeNormal(), kernel_regularizer=l1(0.01)),
    Dense(64, activation='relu', kernel_initializer=HeNormal(), kernel_regularizer=l2(0.01)), # se puede utilizar "he_normal"
    Dense(32, activation='relu', kernel_initializer=HeNormal(), kernel_regularizer=l2(0.01)),
    Dense(1)  # Una única salida para la regresión
])

In [22]:
# Definir el optimizador con una tasa de aprendizaje personalizada
optimizer =  RMSprop(learning_rate=0.001)

In [23]:
# Compilar el modelo
model.compile(optimizer=optimizer, loss='mae', metrics=['mse'])


In [24]:
# Resumen del modelo
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 128)               1280      
                                                                 
 dense_4 (Dense)             (None, 64)                8256      
                                                                 
 dense_5 (Dense)             (None, 32)                2080      
                                                                 
 dense_6 (Dense)             (None, 1)                 33        
                                                                 
Total params: 11649 (45.50 KB)
Trainable params: 11649 (45.50 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [25]:
# Definir callbacks
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
model_checkpoint = ModelCheckpoint('best_model.h5', save_best_only=True, monitor='val_loss')

callbacks = [early_stopping, model_checkpoint]

In [26]:
# Entrenar el modelo con callbacks
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2, callbacks=callbacks)


Epoch 1/100
Epoch 2/100
 90/409 [=====>........................] - ETA: 0s - loss: 113373.8203 - mse: 26145828864.0000

  saving_api.save_model(


Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100


In [27]:
# Evaluar el modelo en el conjunto de prueba
test_loss, test_mae = model.evaluate(X_test, y_test)
print(f'Test Mean Absolute Error: {test_mae}')


Test Mean Absolute Error: 4922137600.0


## Con una sola capa

In [28]:
model_simple = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(1)
])

In [29]:
# De pérdida, Huber. Es una combinación entre la pérdida cuadrática y la pérdida absoluta, utilizada en modelos de regresión para ser menos sensible a valores atípicos. Se comporta como una pérdida cuadrática para errores pequeños y como una pérdida absoluta para errores grandes.
loss = Huber(delta=1.0)
metrics = [RootMeanSquaredError()]

In [30]:
model_simple.compile(optimizer='adam', loss=loss, metrics=metrics)

In [31]:
history_simple = model_simple.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2, verbose=1)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [34]:
# Evaluar el modelo en el conjunto de prueba
test_loss, test_mae = model_simple.evaluate(X_test, y_test)
print(f'Test Mean Absolute Error: {test_mae}')

Test Mean Absolute Error: 81738.8359375


In [35]:
# Otra estructura. kernel_initializer='he_normal' asigna pesos iniciales en una red neuronal usando una distribución normal con una desviación estándar escalada según el tamaño de la capa previa, optimizando el rendimiento en capas con activaciones ReLU.
model = Sequential([
    Dense(128, activation='relu', input_shape=(X_train.shape[1],), kernel_initializer='he_normal'),
    Dropout(0.2),
    Dense(64, activation='relu', kernel_initializer='he_normal'),
    Dropout(0.2),
    Dense(32, activation='relu', kernel_initializer='he_normal'),
    Dense(1)
])

In [36]:
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

In [37]:
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2, verbose=1)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

In [40]:
# Evaluar el modelo en el conjunto de prueba
test_loss, test_mae = model.evaluate(X_test, y_test)
print(f'Test Mean Absolute Error: {test_mae}')

Test Mean Absolute Error: 48516.6796875


## Prueba con CNN
Una red neuronal convolucional (CNN) es un tipo de red neuronal diseñada para procesar datos con una estructura de grid (como imágenes), usando capas convolucionales para detectar patrones locales y capas de pooling para reducir dimensionalidad y mantener características importantes. Son ampliamente usadas en tareas de visión por computadora y reconocimiento de patrones.


In [41]:
# Reshape los datos para Conv1D
X_train_cnn = np.expand_dims(X_train, axis=-1)
X_test_cnn = np.expand_dims(X_test, axis=-1)

In [42]:
model_cnn = Sequential([
    # Aplica 32 filtros 1D con tamaño de kernel 2 y activación ReLU a la entrada, extrayendo características locales.
    Conv1D(32, kernel_size=2, activation='relu', input_shape=(X_train.shape[1], 1)),
    # Reduce la dimensión espacial de las características extraídas tomando el valor máximo en ventanas de tamaño 2, ayudando a la reducción de dimensionalidad y control de sobreajuste.
    MaxPooling1D(pool_size=2),
    # Convierte las características 1D en un vector plano, preparándolas para ser ingresadas a capas densas (fully connected) posteriores.
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1)
])

In [43]:
model_cnn.compile(optimizer='adam', loss='mse', metrics=['mae'])

In [44]:
history_cnn = model_cnn.fit(X_train_cnn, y_train, epochs=50, batch_size=32, validation_split=0.2, verbose=1)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [None]:
# Evaluar el modelo en el conjunto de prueba
test_loss, test_mae = model_cnn.evaluate(X_test, y_test)
print(f'Test Mean Absolute Error: {test_mae}')