# Normalisation

The lectures stressed the importance of rescaling data so that all features fall within a comparable range. If you are curious about why this matters, see the 'details' section below. Otherwise, the next section walks through feature scaling in practice.

## Objective
You will:

- Leverage the multi-variable techniques established in the prior session
- Execute Gradient Descent on datasets enriched with multiple attributes
- Investigate the influence of the learning rate (alpha) on the gradient descent process
- Improve the performance of gradient descent by applying feature scaling with z-score normalization

## Library

In [None]:
import numpy as np
import matplotlib.pyplot as plt
np.set_printoptions(precision=2)

## Theory

- Feature scaling: divide each positive feature by its maximum value or, more generally, rescale each feature by its minimum and maximum values using $(x - \min)/(\max - \min)$. Both approaches map features into the range 0 to 1, and therefore within -1 and 1. The first is simple and works well for positive features, which suits the examples in the lecture. The second applies to any feature.
- Mean normalization: $x_i := \dfrac{x_i - \mu_i}{\max - \min}$
- Z-score normalization, which we explore in the following section.
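
The first two techniques in the list above can be sketched in a few lines of NumPy. This is only an illustration: the array `X` holds made-up values (a size-like and a bedroom-like column), and the variable names are assumptions, not part of the lab code.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 positive features.
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])

# Per-feature (column-wise) statistics, each of shape (n,).
col_min = X.min(axis=0)
col_max = X.max(axis=0)
col_mean = X.mean(axis=0)

# Divide by the maximum: suitable for positive features.
X_max_scaled = X / col_max

# Min-max scaling: (x - min) / (max - min), works for any feature.
X_min_max = (X - col_min) / (col_max - col_min)

# Mean normalization: (x - mean) / (max - min), centered on zero.
X_mean_norm = (X - col_mean) / (col_max - col_min)

print(X_max_scaled)
print(X_min_max)
print(X_mean_norm)
```

Broadcasting applies the `(n,)` statistics to every row of the `(m,n)` matrix, so no explicit loop over samples is needed. Note that the min-max and mean-normalization forms divide by zero if a column is constant.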

### Z-score normalization

After z-score normalization, every feature has a mean of 0 and a standard deviation of 1.

To implement z-score normalization, adjust your input values according to this formula:
$$x^{(i)}_j = \dfrac{x^{(i)}_j - \mu_j}{\sigma_j} \tag{4}$$
where $j$ selects a feature, that is, a column of the $\mathbf{X}$ matrix, $\mu_j$ is the mean of all values for feature $j$, and $\sigma_j$ is the standard deviation of feature $j$.

$$
\begin{align}
\mu_j &= \frac{1}{m} \sum_{i=0}^{m-1} x^{(i)}_j \tag{5}\\
\sigma^2_j &= \frac{1}{m} \sum_{i=0}^{m-1} (x^{(i)}_j - \mu_j)^2  \tag{6}
\end{align}
$$
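
As a quick check, equations (5) and (6) can be written out directly and compared with NumPy's built-ins. This is a minimal sketch on made-up data. It relies on the fact that `np.std` uses the population form (dividing by $m$) by default, which matches equation (6).

```python
import numpy as np

# Hypothetical toy data: m = 4 samples, n = 2 features.
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])
m = X.shape[0]

# Equation (5): per-feature mean.
mu = X.sum(axis=0) / m

# Equation (6): per-feature variance, then its square root.
sigma = np.sqrt(((X - mu) ** 2).sum(axis=0) / m)

# Equation (4): z-score normalization of every entry.
X_norm = (X - mu) / sigma

# The hand-written formulas agree with the NumPy built-ins.
print(np.allclose(mu, np.mean(X, axis=0)))
print(np.allclose(sigma, np.std(X, axis=0)))

# Each normalized column has mean ~0 and standard deviation ~1.
print(X_norm.mean(axis=0), X_norm.std(axis=0))
```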

Implementation note: when you normalize features, keep the normalization parameters, that is, the mean and standard deviation of each feature. Once the model's parameters have been learned, we usually want to predict values for new, unseen examples. A new x value (such as living room dimensions or bedroom count) must first be normalized with the mean and standard deviation computed from the training set, not with statistics recomputed from the new data.
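
The note above might look like this in practice. The training matrix `X_train` and the new example `x_new` are made-up values, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical training data: 4 samples, 2 features.
X_train = np.array([[2104.0, 5.0],
                    [1416.0, 3.0],
                    [1534.0, 3.0],
                    [ 852.0, 2.0]])

# Compute the normalization parameters once, from the training set only,
# and keep them alongside the trained model.
mu = np.mean(X_train, axis=0)
sigma = np.std(X_train, axis=0)

# A new, unseen example with the same two features.
x_new = np.array([1200.0, 3.0])

# Reuse the stored training statistics; do not recompute them from x_new.
x_new_norm = (x_new - mu) / sigma
print(x_new_norm)
```

The normalized vector `x_new_norm`, not the raw `x_new`, is what would be passed to a model trained on normalized features.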

In [None]:
def normalize_with_zscore(X):
    """
    Normalizes the dataset using z-score method for each column.
    
    Args:
      X (ndarray (m,n)): Original data with m samples and n features.
      
    Returns:
      X_normalized (ndarray (m,n)): Data after applying z-score normalization for each feature.
      mean_values (ndarray (n,)): Mean value for each feature.
      std_values (ndarray (n,)): Standard deviation for each feature.
    """
    
    # Calculate the mean for each feature.
    mean_values = np.mean(X, axis=0)           # mean_values will have shape (n,)
    
    # Calculate the standard deviation for each feature.
    std_values  = np.std(X, axis=0)            # std_values will have shape (n,)
    
    # Subtract the feature mean and divide by its standard deviation for each feature value.
    X_normalized = (X - mean_values) / std_values      

    return X_normalized, mean_values, std_values