# 7 Linear Regression with TensorFlow using the California Housing Dataset

The goal of this exercise is to implement a linear regression model in TensorFlow that predicts house prices from the California Housing Dataset. Each row describes one census block group, with features such as median income and median house age; the target is the median house value in units of $100,000. Your task is to build the model and evaluate its performance.

### Import the required libraries:

In [None]:
import tensorflow as tf
import pandas as pd
from sklearn.datasets import fetch_california_housing
from tensorflow import keras

### Load the California Housing Dataset

In [None]:
raw = fetch_california_housing()
X = pd.DataFrame(data=raw['data'], columns=raw['feature_names'])
y = pd.Series(raw['target'])

In [None]:
X

,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude
0,8.3252,41.0,6.984127,1.023810,322.0,2.555556,37.88,-122.23
1,8.3014,21.0,6.238137,0.971880,2401.0,2.109842,37.86,-122.22
2,7.2574,52.0,8.288136,1.073446,496.0,2.802260,37.85,-122.24
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25
...,...,...,...,...,...,...,...,...
20635,1.5603,25.0,5.045455,1.133333,845.0,2.560606,39.48,-121.09
20636,2.5568,18.0,6.114035,1.315789,356.0,3.122807,39.49,-121.21
20637,1.7000,17.0,5.205543,1.120092,1007.0,2.325635,39.43,-121.22
20638,1.8672,18.0,5.329513,1.171920,741.0,2.123209,39.43,-121.32


In [None]:
y

0        4.526
1        3.585
2        3.521
3        3.413
4        3.422
         ...  
20635    0.781
20636    0.771
20637    0.923
20638    0.847
20639    0.894
Length: 20640, dtype: float64

## Preprocess the data:

* Normalize the features using the mean and standard deviation.
* Split the dataset into training and testing sets (e.g., 80% for training, 20% for testing).
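
One variation worth knowing: the scaling statistics can be computed from the training rows only and then reused for the test rows, so that no information from the test set leaks into preprocessing. The sketch below illustrates this with plain NumPy on synthetic data; the function name `split_then_standardize` and the demo arrays are hypothetical and are not part of the notebook.

```python
import numpy as np

def split_then_standardize(X, y, test_fraction=0.2, seed=42):
    """Shuffle, split, then scale both parts with training-set statistics."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]

    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Mean and standard deviation come from the training rows only.
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns

    return (X_train - mean) / std, (X_test - mean) / std, y_train, y_test

# Synthetic stand-in for the feature matrix (hypothetical data, not the real dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(1000, 8))
y_demo = rng.normal(size=1000)

X_tr, X_te, y_tr, y_te = split_then_standardize(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
print(np.abs(X_tr.mean(axis=0)).max())  # close to 0 after scaling
```

With scikit-learn, the same idea amounts to calling `fit` on the training split and `transform` on both splits.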


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [None]:
# Normalize the features
scaler = StandardScaler()
X_normalized = scaler.fit_transform(X)

In [None]:
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_normalized, y, test_size=0.2, random_state=42)

## Define the TensorFlow graph:

* Create placeholders for the input features (X) and target variable (y).
* Create variables for the model's weights (W) and bias (b).
* Define the linear regression model using the equation: y_pred = X * W + b.
* Define the loss function as the mean squared error between the predicted values and the true values.
* Choose an optimizer (e.g., Gradient Descent) to minimize the loss function.
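
Tensor shapes deserve attention when writing the loss. The sketch below uses NumPy, which follows the same broadcasting rules as TensorFlow, on tiny made-up arrays. It shows that subtracting a one-dimensional target from an `(n, 1)` prediction silently produces an `(n, n)` matrix, and that reshaping the target to a column avoids it. All sizes and values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 5, 3              # tiny hypothetical sizes
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)            # 1-D target, shape (5,)

W = np.zeros((n_features, 1))             # weights, shape (3, 1)
b = 0.0                                   # scalar bias

y_pred = X @ W + b                        # predictions, shape (5, 1)

# Pitfall: (5, 1) minus (5,) broadcasts to a (5, 5) matrix of pairwise differences.
wrong_diff = y_pred - y
# Fix: make the target a column so the subtraction is element-wise.
right_diff = y_pred - y.reshape(-1, 1)

mse = float(np.mean(right_diff ** 2))
print(wrong_diff.shape, right_diff.shape)  # (5, 5) (5, 1)
print("MSE =", mse)
```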

In [None]:
# Switch to TF1-style graph execution. This must run before any tensors or
# variables are created, because the cells below rely on a Session.
tf.compat.v1.disable_v2_behavior()

In [None]:
# Hold the training features (X) and target (y) as constant tensors, so the
# optimizer cannot update the data itself. The target is reshaped to a column
# to match the (n, 1) shape of y_pred; a 1-D target would broadcast the
# difference to an (n, n) matrix.
X_placeholder = tf.constant(X_train, dtype=tf.float32, name='X_placeholder')
y_placeholder = tf.constant(y_train.to_numpy().reshape(-1, 1), dtype=tf.float32, name='y_placeholder')

In [None]:
# Create variables for the model's weights (W) and bias (b)
W = tf.Variable(tf.zeros(shape=[X_train.shape[1], 1]), dtype=tf.float32, name='weights')
b = tf.Variable(0.0, dtype=tf.float32, name='bias')

In [None]:
# Define the linear regression model
y_pred = tf.matmul(X_placeholder, W) + b

In [None]:
# Define the loss function as the mean squared error
loss = tf.math.reduce_mean(tf.square(y_pred - y_placeholder), name="mse")

In [None]:
# Choose an optimizer to minimize the loss function (here, gradient descent)
learning_rate = 0.01
optimizer = tf.compat.v1.train.GradientDescentOptimizer(learning_rate=learning_rate)

training_op = optimizer.minimize(loss)


## Train the model:

* Initialize a TensorFlow session.
* Initialize the model's variables.
* Set the number of training epochs and the learning rate.
* For each epoch, iterate through the training dataset and update the model's parameters using the optimizer.
* Print the training loss at regular intervals.
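
To make the update rule behind these steps explicit, here is a minimal NumPy sketch of full-batch gradient descent on the mean squared error. It is not the TensorFlow implementation used in this notebook; the function name `train_linear_regression`, the synthetic data, and the "true" coefficients are assumptions chosen purely for illustration.

```python
import numpy as np

def train_linear_regression(X, y, learning_rate=0.01, n_epochs=1000, log_every=100):
    """Full-batch gradient descent on the MSE of y_pred = X @ W + b."""
    n_samples, n_features = X.shape
    y = y.reshape(-1, 1)                       # column vector, matches y_pred
    W = np.zeros((n_features, 1))
    b = 0.0
    history = []
    for epoch in range(n_epochs):
        error = X @ W + b - y                  # shape (n_samples, 1)
        loss = float(np.mean(error ** 2))
        if epoch % log_every == 0:
            history.append((epoch, loss))
        # Gradients of the MSE with respect to W and b.
        grad_W = 2.0 / n_samples * (X.T @ error)
        grad_b = 2.0 * float(np.mean(error))
        W -= learning_rate * grad_W
        b -= learning_rate * grad_b
    return W, b, history

# Synthetic data with known coefficients (hypothetical values).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_W = np.array([[1.5], [-2.0], [0.5]])
y_demo = (X_demo @ true_W).ravel() + 4.0 + 0.01 * rng.normal(size=200)

W_fit, b_fit, history = train_linear_regression(X_demo, y_demo)
for epoch, loss in history:
    print("Epoch", epoch, "MSE =", loss)
```

On well-scaled features such as these, the fitted weights and bias should end up close to the values used to generate the data.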

In [None]:
n_epochs = 1000

with tf.compat.v1.Session() as sess:
    # Initialize the model's variables (W and b)
    sess.run(tf.compat.v1.global_variables_initializer())

    for epoch in range(n_epochs):
        # Report the training loss every 100 epochs
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", sess.run(loss))
        sess.run(training_op)

    # Fetch the trained parameters as NumPy arrays for later evaluation
    best_W, best_b = sess.run([W, b])



Epoch 0 MSE = 5.629741
Epoch 100 MSE = 1.4119647
Epoch 200 MSE = 1.3374567
Epoch 300 MSE = 1.335832
Epoch 400 MSE = 1.335492
Epoch 500 MSE = 1.3351744
Epoch 600 MSE = 1.3348573
Epoch 700 MSE = 1.3345406
Epoch 800 MSE = 1.3342237
Epoch 900 MSE = 1.3339068

Note that this log was recorded with a one-dimensional target tensor. The plateau near 1.33 is consistent with the variance of the target, which is the value the loss converges to when `y_pred` of shape `(n, 1)` is subtracted from a target of shape `(n,)` and the difference broadcasts to `(n, n)`. With the target reshaped to a column, the training loss should fall well below this value.


## Evaluate the model:

* Use the trained model to make predictions on the test dataset.
* Calculate the mean squared error (MSE) between the predicted and true values.
* Print the MSE as a measure of the model's performance.
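
The evaluation steps can be sketched as follows, assuming the trained weights and bias have been fetched from the session as NumPy arrays. The helper name `mean_squared_error` and the small demo arrays are hypothetical; with real data, the test features and targets from the split would be passed in instead.

```python
import numpy as np

def mean_squared_error(X, y, W, b):
    """MSE of the linear model y_pred = X @ W + b against targets y."""
    y_pred = X @ W + b                          # shape (n_samples, 1)
    y_true = np.asarray(y).reshape(-1, 1)       # column vector, avoids broadcasting
    return float(np.mean((y_pred - y_true) ** 2))

# Tiny hand-checkable example (hypothetical values).
X_test_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_test_demo = np.array([3.0, 7.0, 12.0])
W_demo = np.array([[1.0], [1.0]])
b_demo = 0.0

# Predictions are 3, 7 and 11, so only the last one is off, by 1.
test_mse = mean_squared_error(X_test_demo, y_test_demo, W_demo, b_demo)
print("Test MSE =", test_mse)
```

Comparing this test MSE against the final training MSE gives a quick check for overfitting, although a linear model with eight features is unlikely to overfit a dataset of this size.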