
#Week 12 Template

This template provides code to load the [California housing dataset](https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset) from scikit-learn. In this dataset, each observation represents a census block group. The features are numeric properties of the block group, such as the median income, median house age, and average number of bedrooms. The target variable is the median house value for that block group (in hundreds of thousands of dollars). Refer to the scikit-learn user guide for details.

For this assignment, you will need to build and train a deep, fully connected neural network in Keras that predicts the median house value from the given input features. Note that this is a regression problem.

Your approach should:

* Scale the data and perform preprocessing as you see fit.  You may use scikit-learn for preprocessing.
* Predict unseen observations (validation and test) with a mean absolute percentage error (MAPE) of less than 25\%.
* Use a `ModelCheckpoint` callback during training to save the weights corresponding to the lowest validation MAPE.  (You will need to use the `validation_split` parameter or provide validation data.)
* Load the weights from the best model after training
* Evaluate the best model against the test dataset

To receive full credit, your notebook must show that evaluation of the model on the test dataset yields an MAPE of 25\% or less.
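
For reference, MAPE is the mean of |y_true - y_pred| / |y_true|. The following is a minimal NumPy sketch of that formula. The `mape` helper and the sample values are illustrative assumptions, not part of the template. Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction, so a value of 0.25 corresponds to 25%.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.25 means 25%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))

# Hypothetical house values in units of $100,000 (illustrative only).
y_true = [2.0, 4.0, 1.0]
y_pred = [1.5, 5.0, 1.0]

score = mape(y_true, y_pred)
print(f"MAPE: {score * 100:.2f}%")
```

Because each error is divided by the true value, the same absolute error counts more heavily for low-priced block groups than for high-priced ones.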


In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_percentage_error
import numpy as np
import tensorflow as tf

###**Data loading and splitting**

For an unbiased evaluation, we split the data into training and test sets using an 85-15 ratio.

In [None]:
california_housing = fetch_california_housing(as_frame=False)
X = california_housing.data
y = california_housing.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)

###**Data Preprocessing**

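The features are standardized with StandardScaler (zero mean, unit variance per feature), which puts them on a comparable scale and helps the neural network's optimizer converge. The scaler is fit on the training set only and then applied to the test set, so no test-set statistics leak into training.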

In [None]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
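
The fit-on-train, transform-both pattern above can be sketched by hand with NumPy. This is an illustration on synthetic data, not a replacement for `StandardScaler`; the `*_demo` arrays are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the real feature matrices (illustrative only).
X_train_demo = rng.normal(loc=5.0, scale=3.0, size=(200, 2))
X_test_demo = rng.normal(loc=5.0, scale=3.0, size=(50, 2))

# "fit": the statistics come from the training rows only.
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": the same training statistics are applied to both sets.
X_train_demo_scaled = (X_train_demo - mean) / std
X_test_demo_scaled = (X_test_demo - mean) / std

print(X_train_demo_scaled.mean(axis=0))  # approximately 0 per feature
print(X_train_demo_scaled.std(axis=0))   # approximately 1 per feature
```

The scaled test set will generally not have exactly zero mean and unit variance, which is expected because its own statistics were never used.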

###**Model Architecture**

A fully connected neural network is defined using the Sequential API in TensorFlow/Keras.


**Dense layers** learn the feature representations.

**BatchNormalization layers** stabilize training.

**Dropout layers** reduce overfitting.

The final layer has one unit with a linear activation function for regression.

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1)  # Output layer for regression
])
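
As a sanity check on the size of this network, the parameter count can be worked out by hand. The sketch below assumes the 8 input features of the California housing dataset; the helper functions are illustrative, and the totals can be compared against `model.summary()`.

```python
n_features = 8  # the California housing dataset has 8 numeric features

def dense_params(n_in, n_out):
    # One weight per input-output pair plus one bias per output unit.
    return n_in * n_out + n_out

def batchnorm_params(n_units):
    # gamma and beta (trainable) plus moving mean and variance (non-trainable).
    return 4 * n_units

# Dropout layers have no parameters, so they are omitted here.
layer_counts = [
    ("Dense(64)", dense_params(n_features, 64)),
    ("BatchNormalization", batchnorm_params(64)),
    ("Dense(32)", dense_params(64, 32)),
    ("BatchNormalization", batchnorm_params(32)),
    ("Dense(16)", dense_params(32, 16)),
    ("Dense(1)", dense_params(16, 1)),
]

total = sum(count for _, count in layer_counts)
non_trainable = 2 * 64 + 2 * 32  # moving statistics of the two BatchNormalization layers
trainable = total - non_trainable

for name, count in layer_counts:
    print(f"{name:20s} {count:5d}")
print(f"total={total}, trainable={trainable}, non-trainable={non_trainable}")
```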



###**Model Compilation**

The model is compiled using the Adam optimizer with a learning rate of 0.001.

Mean Squared Error (MSE) is the loss function, and Mean Absolute Error (MAE) is the performance metric.

In [None]:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='mean_squared_error',
              metrics=['mae'])

###**Model Training Callback**

The ModelCheckpoint callback is used to save the best-performing model during training based on validation Mean Absolute Error (MAE).



In [None]:
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    'best_model.keras',
    monitor='val_mae',
    mode='min',
    save_best_only=True,
    save_weights_only=False
)
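
The effect of `save_best_only=True` with `mode='min'` can be pictured as a running minimum over the monitored metric. The following pure-Python sketch mimics that selection logic under stated assumptions; the per-epoch values are made up, and this is not the Keras implementation.

```python
import math

# Hypothetical per-epoch validation MAE values (illustrative only).
val_mae_history = [0.62, 0.55, 0.58, 0.49, 0.51]

best_value = math.inf  # mode='min': lower is better, so start from +infinity
best_epoch = None
saved_epochs = []

for epoch, val_mae in enumerate(val_mae_history, start=1):
    if val_mae < best_value:        # improvement over the best value so far
        best_value = val_mae
        best_epoch = epoch
        saved_epochs.append(epoch)  # the point where the file would be overwritten

print(f"saved at epochs {saved_epochs}; best epoch {best_epoch} with val_mae {best_value}")
```

Only epochs that improve on the best value so far trigger a save, so the file on disk always holds the best model seen up to that point.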

###**Training**

Training runs for 100 epochs with a batch size of 32, holding out 20% of the training data for validation via `validation_split`.

In [None]:
training_history = model.fit(
    X_train_scaled, y_train,
    validation_split=0.2,
    epochs=100,
    batch_size=32,
    callbacks=[checkpoint_callback],
    verbose=0
)
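
To make the split sizes concrete, the arithmetic below estimates how many rows end up in each subset and how many batches each pass takes. It assumes the dataset's 20,640 rows, that the test size is rounded up, and that `validation_split` holds out the last 20% of the training rows (Keras takes them from the end of the arrays, before shuffling). The variable names are illustrative.

```python
import math

n_samples = 20640  # number of rows in the California housing dataset
batch_size = 32

# 85-15 split; the test-set size is assumed to be rounded up.
n_test = math.ceil(n_samples * 15 / 100)
n_train = n_samples - n_test

# validation_split=0.2: the last 20% of the training rows are held out.
n_fit = (n_train * 80) // 100
n_val = n_train - n_fit

steps_per_epoch = math.ceil(n_fit / batch_size)
predict_steps = math.ceil(n_test / batch_size)

print(f"test={n_test}, fit={n_fit}, val={n_val}")
print(f"steps per epoch={steps_per_epoch}, prediction steps={predict_steps}")
```

Under these assumptions, predicting on the test set with the default batch size of 32 takes 97 steps, which agrees with the `97/97` progress line in the evaluation output.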

###**Load and evaluate the best model**

The best-performing model saved by the checkpoint callback is loaded for evaluation on the test dataset.

The model is evaluated on the test dataset using MAPE. A MAPE of 25% or less is the goal.

In [None]:
best_model = tf.keras.models.load_model('best_model.keras')

y_pred = best_model.predict(X_test_scaled).flatten()
mape = mean_absolute_percentage_error(y_test, y_pred)

print(f"Mean Absolute Percentage Error: {mape * 100:.2f}%")
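
The `.flatten()` call matters because a model with a single output unit returns predictions of shape `(n, 1)`, while the targets have shape `(n,)`. If the error were computed by hand with NumPy, mixing the two shapes would silently broadcast to an `(n, n)` matrix. A small sketch with made-up values:

```python
import numpy as np

y_true_demo = np.array([2.0, 4.0, 1.0])          # shape (3,), like y_test
y_pred_column = np.array([[1.5], [5.0], [1.0]])  # shape (3, 1), like predict() output

# Without flattening, broadcasting pairs every target with every prediction.
wrong = np.abs(y_true_demo - y_pred_column)
print(wrong.shape)  # (3, 3)

# After flattening, the subtraction is element-wise as intended.
right = np.abs(y_true_demo - y_pred_column.flatten())
print(right.shape)  # (3,)
```

Whether a given metric function reshapes its inputs itself depends on the library, so flattening explicitly is a cheap safeguard.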

97/97 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
Mean Absolute Percentage Error: 20.45%


The model achieves a MAPE of 20.45% on the test set, which is below the required threshold of 25%. This demonstrates successful prediction of the median house values.