The goal of this exercise is to implement a linear regression model with TensorFlow that predicts house prices from the California Housing dataset. The dataset contains several features, such as median income and median house age, among others. Your task is to build a linear regression model and evaluate its performance.

In [2]:
import tensorflow as tf
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

Load the California Housing dataset and split it into features and target variable.

In [3]:
raw = fetch_california_housing()
X = pd.DataFrame(data=raw['data'], columns=raw['feature_names'])
y = pd.Series(raw['target'])
print(X)
# print(y)


       MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \
0      8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88   
1      8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86   
2      7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85   
3      5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85   
4      3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85   
...       ...       ...       ...        ...         ...       ...       ...   
20635  1.5603      25.0  5.045455   1.133333       845.0  2.560606     39.48   
20636  2.5568      18.0  6.114035   1.315789       356.0  3.122807     39.49   
20637  1.7000      17.0  5.205543   1.120092      1007.0  2.325635     39.43   
20638  1.8672      18.0  5.329513   1.171920       741.0  2.123209     39.43   
20639  2.3886      16.0  5.254717   1.162264      1387.0  2.616981     39.37   

       Longitude  
0        -122.23  
1

Split the dataset into training and test sets

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

print(X_train.shape)
print(X_test.shape)

(16512, 8)
(4128, 8)


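Normalize the features

In [7]:
from sklearn.preprocessing import StandardScaler
# Note: this fits the scaler on the full dataset and overwrites X. The earlier split is
# unaffected, and the next cell refits the scaler on the training set only.
scaler = StandardScaler()
X = scaler.fit_transform(X)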

Scale the input data: fit the scaler on the training set only, then apply the same transformation to the test set

In [14]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
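
The scaling step above can be sketched by hand in NumPy. This is only an illustration on synthetic data: the names `X_train_demo` and `X_test_demo` are hypothetical stand-ins, not the notebook's variables. The point it shows is that the mean and standard deviation come from the training split alone and are then reused for the test split, so no information from the test set leaks into preprocessing.

```python
import numpy as np

# Synthetic stand-ins for the training and test feature matrices (not the real data).
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test_demo = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Statistics are computed from the training split only.
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

X_train_demo_scaled = (X_train_demo - mean) / std
X_test_demo_scaled = (X_test_demo - mean) / std  # reuse the training statistics

print(X_train_demo_scaled.mean(axis=0))  # close to 0 in every column
print(X_train_demo_scaled.std(axis=0))   # close to 1 in every column
```

The scaled test columns will generally not have mean exactly 0 or standard deviation exactly 1, which is expected.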

Convert the training and test data to TensorFlow tensors

In [15]:
X_train_tensor = tf.constant(X_train_scaled, dtype=tf.float32)
y_train_tensor = tf.constant(y_train.to_numpy().reshape(-1, 1), dtype=tf.float32)
X_test_tensor = tf.constant(X_test_scaled, dtype=tf.float32)
y_test_tensor = tf.constant(y_test.to_numpy().reshape(-1, 1), dtype=tf.float32)

Define the variables for the model's weights (W) and bias (b)

In [17]:
W = tf.Variable(tf.random.normal(shape=(X_train.shape[1], 1)), dtype=tf.float32)
b = tf.Variable(0.0, dtype=tf.float32)

Define the linear regression model

In [16]:
def linear_regression(X):
    return tf.matmul(X, W) + b
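
To make the shapes concrete, here is a tiny NumPy version of the same computation with made-up numbers. The names `X_demo`, `W_demo` and `b_demo` are hypothetical and exist only for this illustration.

```python
import numpy as np

X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])    # 2 samples, 2 features
W_demo = np.array([[0.5],
                   [-1.0]])        # one weight per feature, shape (2, 1)
b_demo = 2.0                       # scalar bias, broadcast to every row

y_hat = X_demo @ W_demo + b_demo   # one prediction per sample, shape (2, 1)
print(y_hat.shape)
print(y_hat.ravel())
```

The first prediction is 1.0 * 0.5 + 2.0 * (-1.0) + 2.0 = 0.5, and the second is 3.0 * 0.5 + 4.0 * (-1.0) + 2.0 = -0.5.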

Define the loss function as the mean squared error between the predicted and true values

In [18]:
def loss(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_pred - y_true))
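
A small NumPy example, with made-up values, may help explain why the targets are reshaped with `reshape(-1, 1)` before being converted to tensors. If the predictions have shape `(n, 1)` and the targets have shape `(n,)`, the subtraction broadcasts to an `(n, n)` matrix and the mean is taken over the wrong set of differences.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])         # shape (3,)
y_pred = np.array([[1.5], [2.0], [2.0]])   # shape (3, 1), like the model output

# Mismatched shapes: (3, 1) - (3,) broadcasts to (3, 3), giving a misleading value.
wrong = np.mean(np.square(y_pred - y_true))

# Matching shapes: (3, 1) - (3, 1) compares each prediction with its own target.
right = np.mean(np.square(y_pred - y_true.reshape(-1, 1)))

print((y_pred - y_true).shape)  # (3, 3)
print(wrong)                    # 0.75
print(right)                    # (0.25 + 0 + 1) / 3, about 0.4167
```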

Choose a gradient descent optimizer to minimize the loss function

In [19]:
optimizer = tf.optimizers.SGD(learning_rate=0.01)
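
Plain gradient descent moves each parameter against its gradient: `W <- W - learning_rate * dL/dW`, and likewise for `b`. The sketch below writes that update out by hand in NumPy for the mean squared error, using synthetic noise-free data. Everything in it (the `_demo` names, the true coefficients, the learning rate of 0.1, the 500 steps) is an assumption chosen for illustration and does not come from the notebook.

```python
import numpy as np

# Synthetic data generated from known coefficients (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_W = np.array([[2.0], [-1.0], [0.5]])
true_b = 0.7
y_demo = X_demo @ true_W + true_b

W_demo = np.zeros((3, 1))
b_demo = 0.0
learning_rate = 0.1

for step in range(500):
    error = X_demo @ W_demo + b_demo - y_demo          # shape (200, 1)
    # Analytical gradients of mean((XW + b - y)^2)
    grad_W = 2.0 / len(X_demo) * X_demo.T @ error
    grad_b = 2.0 * error.mean()
    # Gradient descent update
    W_demo -= learning_rate * grad_W
    b_demo -= learning_rate * grad_b

print(W_demo.ravel(), b_demo)  # should be very close to true_W and true_b
```

In the notebook, `tf.GradientTape` computes these gradients automatically and `optimizer.apply_gradients` performs the update.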

In [20]:
# Train the model
epochs = 1000

for epoch in range(epochs):
    with tf.GradientTape() as tape:
        # Compute the predictions and the loss on the training set
        y_pred_train = linear_regression(X_train_tensor)
        current_loss = loss(y_train_tensor, y_pred_train)

    # Compute the gradients of the loss with respect to W and b
    gradients = tape.gradient(current_loss, [W, b])

    # Update the weights and the bias with the optimizer
    optimizer.apply_gradients(zip(gradients, [W, b]))

    if epoch % 100 == 0:
        # Compute the loss on the test set
        y_pred_test = linear_regression(X_test_tensor)
        test_loss = loss(y_test_tensor, y_pred_test)

        # Print the training and test loss every 100 epochs
        print(f"Epoch {epoch}: Train Loss = {current_loss.numpy()}, Test Loss = {test_loss.numpy()}")

# Evaluate the model on the test set
y_pred_test = linear_regression(X_test_tensor)
mse = np.mean(np.square(y_pred_test.numpy() - y_test_tensor.numpy()))

print("MSE:", mse)

Epoch 0: Train Loss = 11.275544166564941, Test Loss = 12.050978660583496
Epoch 100: Train Loss = 0.7706097364425659, Test Loss = 0.7603852152824402
Epoch 200: Train Loss = 0.6075896620750427, Test Loss = 0.6141864657402039
Epoch 300: Train Loss = 0.5803281664848328, Test Loss = 0.5882248282432556
Epoch 400: Train Loss = 0.5630595088005066, Test Loss = 0.5720024108886719
Epoch 500: Train Loss = 0.5506919622421265, Test Loss = 0.5612785816192627
Epoch 600: Train Loss = 0.5417554378509521, Test Loss = 0.5544008612632751
Epoch 700: Train Loss = 0.5352873206138611, Test Loss = 0.5501816868782043
Epoch 800: Train Loss = 0.5305999517440796, Test Loss = 0.5477737188339233
Epoch 900: Train Loss = 0.5271987318992615, Test Loss = 0.5465810298919678
MSE: 0.54618824
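
The MSE is expressed in squared target units, which makes it hard to read directly. Taking the square root gives the RMSE in the same units as the target. The sketch below assumes, following scikit-learn's description of this dataset, that the target is the median house value in units of $100,000. The MSE value is copied from the output above.

```python
import math

mse = 0.54618824                # final test MSE printed above
rmse = math.sqrt(mse)           # same units as the target
rmse_dollars = rmse * 100_000   # assumes the target is in units of $100,000

print(f"RMSE: {rmse:.3f} (about ${rmse_dollars:,.0f})")
```

Under that assumption, the model's typical prediction error on the test set is roughly $74,000.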
