<a href="https://colab.research.google.com/github/ReddyKhajaValluru/Courseera_MLSpecialization/blob/main/NN_Coding_Tips.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Basic Imports

In [None]:
# for array computations and loading data
import numpy as np

# for preparing data and error metrics
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# for building and training neural networks
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

TensorFlow is most often used to build multi-layer models. The [Sequential](https://keras.io/guides/sequential_model/) model is a convenient way to construct them.

## Split the dataset into training, cross validation, and test sets

In previous labs, you might have used the entire dataset to train your models. In practice, however, it is best to hold out a portion of your data to measure how well your model generalizes to new examples. This tells you whether the model has overfit to your training set.

As mentioned in the lecture, it is common to split your data into three parts:

* ***training set*** - used to train the model
* ***cross validation set (also called validation, development, or dev set)*** - used to evaluate the different model configurations you are choosing from. For example, you can use this to make a decision on what polynomial features to add to your dataset.
* ***test set*** - used to give a fair estimate of your chosen model's performance against new examples. This should not be used to make decisions while you are still developing the models.

Scikit-learn provides a [`train_test_split`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) function to split your data into the parts mentioned above. In the code cell below, you will split the entire dataset into 60% training, 20% cross validation, and 20% test.

In [None]:
# Get 60% of the dataset as the training set. Put the remaining 40% in temporary variables: x_ and y_.
x_train, x_, y_train, y_ = train_test_split(x, y, test_size=0.40, random_state=1)

# Split the 40% subset above into two: one half for cross validation and the other for the test set
x_cv, x_test, y_cv, y_test = train_test_split(x_, y_, test_size=0.50, random_state=1)

# Delete temporary variables
del x_, y_

print(f"the shape of the training set (input) is: {x_train.shape}")
print(f"the shape of the training set (target) is: {y_train.shape}\n")
print(f"the shape of the cross validation set (input) is: {x_cv.shape}")
print(f"the shape of the cross validation set (target) is: {y_cv.shape}\n")
print(f"the shape of the test set (input) is: {x_test.shape}")
print(f"the shape of the test set (target) is: {y_test.shape}")
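As a self-contained illustration of the split above, the sketch below applies the same two-step procedure to a synthetic dataset. The arrays `x_demo` and `y_demo`, their sizes, and the other variable names are made up for illustration; they are not the course data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration): 50 examples, 1 feature
x_demo = np.arange(50, dtype=float).reshape(-1, 1)
y_demo = 3.0 * x_demo + 2.0

# Step 1: keep 60% for training, hold out the remaining 40%
x_tr, x_hold, y_tr, y_hold = train_test_split(x_demo, y_demo, test_size=0.40, random_state=1)
# Step 2: split the held-out 40% in half -> 20% cross validation, 20% test
x_val, x_te, y_val, y_te = train_test_split(x_hold, y_hold, test_size=0.50, random_state=1)

print(x_tr.shape, x_val.shape, x_te.shape)
```

Fixing `random_state` makes the split reproducible, so the three sets contain the same examples on every run.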

## Preparing Data
Here we convert the input data into a form that helps the model train well.


1.   We can generate additional features from the existing ones (mostly useful for regression without neural networks).
2.   We can normalize the features if they have widely different ranges of values.





### Adding more polynomial features
The code below demonstrates how to do this using the [`PolynomialFeatures`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html) class.

`PolynomialFeatures` creates a new matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree (here, `degree=2`).



1.   If the input is univariate `[x]`, the output is `[1, x, x^2]`.
2.   If the input is multivariate `[x1, x2]`, the output is `[1, x1, x2, x1^2, x1*x2, x2^2]`.

The output columns follow the order shown above. If you don't want the bias term `1`, pass `include_bias=False` as an argument.
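The column order described above can be checked on a tiny input. This is a minimal sketch; the sample values `x1 = 2` and `x2 = 3` are chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One example with two features: x1 = 2, x2 = 3 (values chosen for illustration)
x_demo = np.array([[2.0, 3.0]])

with_bias = PolynomialFeatures(degree=2).fit_transform(x_demo)
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x_demo)

print(with_bias)  # columns: 1, x1, x2, x1^2, x1*x2, x2^2
print(no_bias)    # same columns without the leading 1
```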

In [None]:
from sklearn.preprocessing import PolynomialFeatures

In [None]:
# Add polynomial features
# Instantiate the class to make polynomial features
degree = 2
poly = PolynomialFeatures(degree, include_bias=False)
# Compute the number of features and transform the training set
X_train_mapped = poly.fit_transform(x_train)
# Add the polynomial features to the cross validation set and test set
X_cv_mapped = poly.transform(x_cv)
X_test_mapped = poly.transform(x_test)

### Feature scaling (Normalizing Data)
In the ML Specialization courses, you saw that feature scaling usually helps your model converge faster. This is especially true if your input features have widely different ranges of values. For example, once you add polynomial terms, the features will have very different ranges: if $x$ runs from around 1600 to 3600, then $x^2$ runs from 2.56 million to 12.96 million.

It is therefore good practice to scale your features. For that, you will use the [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) class from scikit-learn. This computes the z-score of your inputs. As a refresher, the z-score is given by the equation:

$$ z = \frac{x - \mu}{\sigma} $$

where $\mu$ is the mean of the feature values and $\sigma$ is the standard deviation, both computed on the training set. The same training-set $\mu$ and $\sigma$ should then be used to scale the CV and test sets before they are sent to the model.

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
# Scale the features using the z-score
# Instantiate the class
scaler = StandardScaler()
# Compute the mean and standard deviation of the training set then transform it
X_train_mapped_scaled = scaler.fit_transform(X_train_mapped)
# Scale the cross validation set and test set using the mean and standard deviation of the training set
X_cv_mapped_scaled = scaler.transform(X_cv_mapped)
X_test_mapped_scaled = scaler.transform(X_test_mapped)
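To connect the z-score equation with what `StandardScaler` does, the sketch below computes the scaling by hand and compares it with the scaler's output. The small `train` and `cv` arrays and the name `scaler_demo` are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature columns with very different ranges (x and x^2)
train = np.array([[1600.0, 1600.0**2], [2600.0, 2600.0**2], [3600.0, 3600.0**2]])
cv = np.array([[2000.0, 2000.0**2]])

scaler_demo = StandardScaler().fit(train)

# Manual z-score using the training-set statistics only
mu = train.mean(axis=0)
sigma = train.std(axis=0)  # population standard deviation (ddof=0), which StandardScaler also uses
manual_train = (train - mu) / sigma
manual_cv = (cv - mu) / sigma

print(np.allclose(manual_train, scaler_demo.transform(train)))
print(np.allclose(manual_cv, scaler_demo.transform(cv)))
```

Note that the CV example is scaled with the training mean and standard deviation, not with statistics of its own.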

## Regression Model (with scikit-learn)
Now, you will create and train a regression model. Because the polynomial features are already part of the input matrix, a single linear regression model covers all four tasks: univariate and multivariate, linear and polynomial regression.

For this lab, you will use the [LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) class but take note that there are other [linear regressors](https://scikit-learn.org/stable/modules/classes.html#classical-linear-regressors) which you can also use.

In [None]:
from sklearn.linear_model import LinearRegression

# Initialize the class
model = LinearRegression()

# Train the model
model.fit(X_train_mapped_scaled, y_train)

# Compute the training error (MSE)
yhat = model.predict(X_train_mapped_scaled)
# scikit-learn's MSE has no 1/2 factor, so divide by 2 to match the course's cost function
print(f"Training MSE: {mean_squared_error(y_train, yhat) / 2}")

## Regression Model (with Neural Networks)

Step1: Defining Model

In [None]:
tf.random.set_seed(1234)  # applied to achieve consistent results
model = Sequential(
    [
        tf.keras.Input(shape=(2,)),
        Dense(units=25, activation = 'relu', kernel_regularizer=tf.keras.regularizers.l2(0.1), name = 'layer1'),
        Dense(units=15, activation = 'relu', kernel_regularizer=tf.keras.regularizers.l2(0.01), name = 'layer2'),
        Dense(units=1, activation = 'linear', kernel_regularizer=tf.keras.regularizers.l1(0.1), name = 'layer3') # we can use relu also if the output is non negative
    ], name = "my_model"
)
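To make the `kernel_regularizer` arguments in the model above concrete, the sketch below computes by hand the penalty that an L2 or L1 regularizer adds to the loss for one layer. The weight matrix `W` is hypothetical; the formulas (factor times the sum of squared weights for `l2`, factor times the sum of absolute weights for `l1`) follow the Keras regularizer documentation.

```python
import numpy as np

W = np.array([[0.5, -1.0], [2.0, 0.0]])  # hypothetical layer weights

l2_penalty = 0.1 * np.sum(np.square(W))  # contribution of l2(0.1) for this layer
l1_penalty = 0.1 * np.sum(np.abs(W))     # contribution of l1(0.1) for this layer

print(l2_penalty, l1_penalty)
```

The L2 penalty grows quadratically with large weights, while the L1 penalty grows linearly and tends to push small weights to exactly zero.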

>**Note 1:** The `tf.keras.Input(shape=(2,))` statement specifies the expected shape of the input. This allows TensorFlow to instantiate the weights and bias parameters at this point. This is useful when exploring TensorFlow models using `model.summary()` immediately after defining the model. This statement can be omitted in practice, and TensorFlow will size the network parameters when the input data is specified in the `model.fit` statement.

The `tf.keras.Input` form above is useful when you want to specify additional properties of the input layer. If you know only the input dimension (i.e., the number of input features), you can instead pass it to the first `Dense` layer using either of the concise forms below. Note that newer Keras releases may emit a warning for these arguments and recommend `tf.keras.Input` instead.

1.   `Dense(units=25, input_dim=2, activation = 'relu', name = 'layer1')`
2.   `Dense(units=25, input_shape=(2,), activation = 'relu', name = 'layer1')`



>**Note 2:** TensorFlow lets us apply a different regularizer to each layer.

The `model.summary()` provides a description of the network:

In [None]:
model.summary()
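The parameter counts that `model.summary()` reports for `Dense` layers can be reproduced by hand. A minimal sketch, assuming the 2 → 25 → 15 → 1 regression network defined above (the variable names are illustrative):

```python
# Layer sizes of the regression network above: 2 inputs -> 25 -> 15 -> 1
layer_sizes = [2, 25, 15, 1]

# Each Dense layer has (inputs * units) weights plus one bias per unit
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer, total_params)
```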

Step2: Setup the loss and optimizer

In [None]:
model.compile(
    loss=tf.keras.losses.MeanSquaredError(), # simply we can also say loss='mse'
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
)

Step3: Train the model

In [None]:
model.fit(
    X_train, y_train,  # your prepared training inputs and targets (e.g. the scaled features from above)
    epochs=200
)

Step4: Predict output and calculate error

When *evaluating* a regression model, you average the squared difference between the predicted values and the target values (with an extra factor of 1/2 to match the training cost):

$$ J_\text{test}(\mathbf{w},b) =
            \frac{1}{2m_\text{test}}\sum_{i=0}^{m_\text{test}-1} ( f_{\mathbf{w},b}(\mathbf{x}^{(i)}_\text{test}) - y^{(i)}_\text{test} )^2
            \tag{1}
$$

In [None]:
# Record the cross validation MSEs
yhat = model.predict(X_cv)
cv_mse = mean_squared_error(y_cv, yhat) / 2
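Equation (1) and the `mean_squared_error(...) / 2` expression compute the same quantity. The sketch below checks this on made-up targets and predictions (`y_demo` and `yhat_demo` are illustrative values only).

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up targets and predictions for illustration
y_demo = np.array([3.0, 5.0, 7.0, 9.0])
yhat_demo = np.array([2.5, 5.0, 8.0, 9.5])

m = y_demo.shape[0]
# Equation (1): sum of squared errors divided by 2m
j_manual = np.sum((yhat_demo - y_demo) ** 2) / (2 * m)
# scikit-learn's MSE divides by m only, hence the extra division by 2
j_sklearn = mean_squared_error(y_demo, yhat_demo) / 2

print(j_manual, j_sklearn)
```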

## Binary Classification (Logistic Regression)

Step1: Defining Model

In [None]:
tf.random.set_seed(1234)  # applied to achieve consistent results
model = Sequential(
    [
        Dense(units=25, input_dim=2, activation = 'relu'),
        Dense(units=15, activation = 'relu'),
        Dense(units=1, activation = 'sigmoid')
    ]
)

Step2: Setup the loss and optimizer

In [None]:
model.compile(
    loss = tf.keras.losses.BinaryCrossentropy(),
    optimizer = tf.keras.optimizers.Adam(0.01),
)
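For reference, the quantity that the binary cross-entropy loss measures can be written out in NumPy. This is a minimal sketch of the formula, not TensorFlow's implementation; the helper `binary_cross_entropy`, the clipping constant `eps`, and the sample values are assumptions made for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; clipping by eps avoids log(0)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

y_demo = np.array([1.0, 0.0])

# Confident, correct predictions give a small loss
bce_good = binary_cross_entropy(y_demo, np.array([0.9, 0.2]))
# Confident, wrong predictions give a much larger loss
bce_bad = binary_cross_entropy(y_demo, np.array([0.1, 0.8]))

print(bce_good, bce_bad)
```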

Step3: Train the model

In [None]:
model.fit(
    X_train,y_train,
    epochs=200
)

Step4: Predict output and calculate error

In the previous sections on regression models, you used the mean squared error to measure how well your model is doing. For classification, you can get a similar metric by getting the fraction of the data that the model has misclassified. For example, if your model made wrong predictions for 2 samples out of 5, then you will report an error of `40%` or `0.4`. The code below demonstrates this using  Numpy's [`mean()`](https://numpy.org/doc/stable/reference/generated/numpy.mean.html) function.

The evaluation function for categorical models used here is simply the fraction of incorrect predictions:  
$$ J_{cv} =\frac{1}{m}\sum_{i=0}^{m-1}
\begin{cases}
    1, & \text{if $\hat{y}^{(i)} \neq y^{(i)}$}\\
    0, & \text{otherwise}
\end{cases}
$$

In [None]:
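# predict the output (probabilities, shape (m, 1))
yhat = model.predict(X_cv)
# Set the threshold for classification
threshold = 0.5
yhat = np.where(yhat >= threshold, 1, 0)
# Record the cross validation error (fraction of misclassified examples);
# flatten both arrays so that (m, 1) and (m,) shapes do not broadcast to (m, m)
cv_error = np.mean(yhat.ravel() != np.ravel(y_cv))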
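The "2 wrong out of 5" example from the text can be reproduced with made-up model outputs. The sketch also shows a shape pitfall: a model with one output unit returns predictions of shape `(m, 1)`, and comparing them with targets of shape `(m,)` broadcasts to an `(m, m)` matrix. All values and names below are illustrative.

```python
import numpy as np

# Made-up sigmoid outputs with shape (5, 1), as a model with one output unit returns
probs = np.array([[0.9], [0.2], [0.6], [0.4], [0.7]])
y_demo = np.array([1, 0, 0, 1, 1])  # targets with shape (5,)

pred = np.where(probs >= 0.5, 1, 0)

# Flatten both sides before comparing so the result has one entry per example
error = np.mean(pred.ravel() != y_demo.ravel())

# Without flattening, (5, 1) != (5,) broadcasts to a (5, 5) matrix of comparisons
broadcast_shape = (pred != y_demo).shape

print(error, broadcast_shape)
```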

We can also use the `accuracy_score` function from scikit-learn to evaluate classification models.

Remember that error = 1 - accuracy.

In [None]:
from sklearn.metrics import accuracy_score
import numpy as np

# Example true labels (ground truth)
y_true = np.array([0, 1, 1, 0, 1, 0])

# Example predicted labels from a classification model
y_pred = np.array([0, 1, 0, 0, 1, 1])

# Compute accuracy using accuracy_score
accuracy = accuracy_score(y_true, y_pred)

print("True Labels:", y_true)
print("Predicted Labels:", y_pred)
print("Accuracy:", accuracy)
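The relationship between error and accuracy can be verified on the same six labels used in the cell above. A minimal sketch (the `_demo` variable names are illustrative):

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true_demo = np.array([0, 1, 1, 0, 1, 0])
y_pred_demo = np.array([0, 1, 0, 0, 1, 1])

accuracy_demo = accuracy_score(y_true_demo, y_pred_demo)  # fraction of correct predictions
error_demo = np.mean(y_pred_demo != y_true_demo)          # fraction of wrong predictions

print(accuracy_demo, error_demo, accuracy_demo + error_demo)
```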


## Multi-Class Classification (Softmax Regression)

Step1: Defining Model

Non-preferred option: output layer `activation = 'softmax'`, `loss = SparseCategoricalCrossentropy()`

Preferred option: output layer `activation = 'linear'`, `loss = SparseCategoricalCrossentropy(from_logits=True)`

The preferred option is more numerically stable, but the model then outputs logits rather than probabilities, so its output must be passed through a softmax function whenever you need the probabilities.
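To illustrate why computing the loss directly from logits is more stable, the sketch below compares a naive "softmax, then log" computation with a log-sum-exp formulation on deliberately extreme logits. This is a NumPy illustration of the underlying idea, not TensorFlow's implementation; the function names and the logit values are assumptions made for the example.

```python
import numpy as np

def naive_loss(logits, label):
    """Softmax first, then -log of the true-class probability."""
    with np.errstate(all="ignore"):  # silence overflow warnings for the demo
        probs = np.exp(logits) / np.sum(np.exp(logits))
        return float(-np.log(probs[label]))

def stable_loss(logits, label):
    """Same quantity computed directly from logits via the log-sum-exp trick."""
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[label])

extreme_logits = np.array([1000.0, 0.0, -1000.0])  # made-up extreme values
naive = naive_loss(extreme_logits, label=1)
stable = stable_loss(extreme_logits, label=1)

print(naive, stable)
```

The naive version overflows in `exp` and returns a non-finite loss, while the version that works on logits returns a finite value.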

In [None]:
tf.random.set_seed(1234)  # applied to achieve consistent results
model = Sequential(
    [
        Dense(units=25, input_dim=2, activation = 'relu'),
        Dense(units=15, activation = 'relu'),
        Dense(units=9, activation = 'linear')
    ]
)

Step2: Setup the loss and optimizer

In [None]:
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer = tf.keras.optimizers.Adam(0.01),
)

**SparseCategoricalCrossentropy or CategoricalCrossentropy**

TensorFlow has two possible formats for target values, and the choice of loss determines which one is expected.
- SparseCategoricalCrossentropy: expects the target to be an integer corresponding to the index. For example, if there are 10 potential target values, y would be between 0 and 9.
- CategoricalCrossentropy: expects the target value of an example to be one-hot encoded (a one-hot vector), where the value at the target index is 1 while the other N-1 entries are zero. For example, with 10 potential target values and a target of 2, the vector would be [0,0,1,0,0,0,0,0,0,0].
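The two target formats can be converted into each other with NumPy. A minimal sketch; the labels in `y_sparse` are made up for illustration.

```python
import numpy as np

num_classes = 10
y_sparse = np.array([2, 0, 9])             # integer class indices (made-up labels)
y_one_hot = np.eye(num_classes)[y_sparse]  # one row per example, a single 1 at the class index

print(y_one_hot[0])

# argmax along each row recovers the integer labels
recovered = np.argmax(y_one_hot, axis=1)
print(recovered)
```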


Step3: Train the model

In [None]:
model.fit(
    X_train,y_train,
    epochs=200
)

Step4: Predict output and calculate error

Historically, the outputs of the final linear layer are called logits. The predicted class is the one with the highest probability. Because softmax preserves the ordering of its inputs, the argmax of the logits gives the same class, so passing the output through softmax is optional when you only need the predicted class.

The error is computed the same way as in binary classification: it is the fraction of misclassified examples.

In [None]:
# predict the output
logits = model.predict(X_cv)
yhat = tf.nn.softmax(logits).numpy()
# Predict the output as the class with the highest probability
yhat = np.argmax(yhat, axis=1)  # axis=1: each row is one example, so take the argmax within each row
# Record the cross validation error; flatten y_cv so an (m, 1) target does not broadcast against the (m,) predictions
cv_error = np.mean(yhat != np.ravel(y_cv))
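The claim that the argmax can be taken on the logits directly can be checked with a small NumPy example. The `softmax` helper and the logit values below are written for illustration only.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

logits_demo = np.array([[2.0, -1.0, 0.5], [0.1, 3.0, 2.9]])  # made-up logits for two examples
probs_demo = softmax(logits_demo)

pred_from_logits = np.argmax(logits_demo, axis=1)
pred_from_probs = np.argmax(probs_demo, axis=1)

print(pred_from_logits, pred_from_probs)
```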

## Multi-label Classification
Multi-label classification applies when a single input can carry several labels at once, for example an image that contains multiple objects. We can treat this as several logistic regression problems solved together by giving the output layer one sigmoid neuron per label.

Step1: Defining Model

In [None]:
tf.random.set_seed(1234)  # applied to achieve consistent results
model = Sequential(
    [
        Dense(units=25, input_dim=2, activation = 'relu'),
        Dense(units=15, activation = 'relu'),
        Dense(units=3, activation = 'sigmoid')
    ]
)

Step2: Setup the loss and optimizer

In [None]:
model.compile(
    loss = tf.keras.losses.BinaryCrossentropy(),
    optimizer = tf.keras.optimizers.Adam(0.01),
)

Step3: Train the model

In [None]:
model.fit(
    X_train,y_train,
    epochs=200
)

Step4: Predict output and calculate error

The error is computed the same way as in binary classification, applied to each label independently.

In [None]:
# predict the output (one probability per label)
yhat = model.predict(X_cv)
# Set the threshold for classification
threshold = 0.5
yhat = np.where(yhat >= threshold, 1, 0)
# Record the cross validation error (fraction of individual label predictions that are wrong)
cv_error = np.mean(yhat != y_cv)
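With several labels per example, there is more than one reasonable way to count errors. The sketch below contrasts the per-label error computed above with a stricter exact-match error that counts an example as wrong if any of its labels is wrong. The data and variable names are made up for illustration.

```python
import numpy as np

# Made-up sigmoid outputs for 4 examples and 3 labels
probs_demo = np.array([[0.9, 0.1, 0.8],
                       [0.2, 0.7, 0.4],
                       [0.6, 0.6, 0.1],
                       [0.3, 0.2, 0.9]])
y_demo = np.array([[1, 0, 1],
                   [0, 1, 1],
                   [1, 0, 0],
                   [0, 0, 1]])

pred = np.where(probs_demo >= 0.5, 1, 0)

# Fraction of individual label predictions that are wrong
per_label_error = np.mean(pred != y_demo)
# Fraction of examples with at least one wrong label
exact_match_error = np.mean(np.any(pred != y_demo, axis=1))

print(per_label_error, exact_match_error)
```

Which of the two to report depends on whether partially correct predictions are useful for your application.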