<a href="https://colab.research.google.com/github/Metallicode/Math/blob/main/Feature_Scaling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Feature Scaling

[Feature scaling](https://en.wikipedia.org/wiki/Feature_scaling) is an essential preprocessing step in many machine learning algorithms, especially for optimization-based methods such as gradient descent. Let's dive into the concept:

**1. What is it?**
Feature scaling is the process of normalizing or standardizing the range of independent variables or features in the data. The primary goal is to make sure features have a similar scale so that no particular feature dominates the others in influencing the model due to its scale.

**2. Common Methods:**
- **Min-Max Scaling (Normalization)**: This scales features so they are in the range [0, 1]. The formula is:
  
  $$x_{\text{new}} = \frac{x - \min(x)}{\max(x) - \min(x)}$$
  
- **Standardization (Z-score normalization)**: This method scales features so they have a mean (μ) of 0 and a standard deviation (σ) of 1. The formula is:
  
  $$x_{\text{new}} = \frac{x - \mu}{\sigma}$$

### Connection to Gradient Descent:

**1. Faster Convergence:**
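When we use gradient descent to update weights, the gradient with respect to each weight is (for a linear model) proportional to the values of the corresponding feature. Weights attached to large-range features therefore receive large updates and weights attached to small-range features receive small ones, yet a single learning rate has to serve them all. If one or more features have vastly larger scales than others, a learning rate small enough to keep the large-scale directions stable makes progress along the small-scale directions very slow, so gradient descent can oscillate and take a long time to converge. When all features are on a similar scale, the gradient tends to point more directly towards the minimum, allowing gradient descent to converge more quickly.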
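To make the convergence claim concrete, here is a minimal sketch that counts batch gradient descent steps on a synthetic linear regression problem, once with raw features and once with standardized ones. The toy data, the helper name `gd_steps`, the tolerance, and the choice of learning rate (the reciprocal of the largest Hessian eigenvalue) are all assumptions made for illustration, not part of any library.

```python
import numpy as np

def gd_steps(X, y, tol=1e-6, max_steps=200_000):
    """Return the iteration at which the gradient norm drops below tol (hypothetical helper)."""
    n, d = X.shape
    # Hessian of the mean squared error for a linear model;
    # its largest eigenvalue limits how large a stable step can be.
    hessian = 2.0 / n * X.T @ X
    lr = 1.0 / np.linalg.eigvalsh(hessian).max()
    w = np.zeros(d)
    for step in range(1, max_steps + 1):
        grad = 2.0 / n * X.T @ (X @ w - y)
        if np.linalg.norm(grad) < tol:
            return step
        w -= lr * grad
    return max_steps  # did not converge within the step budget

rng = np.random.default_rng(0)
# Assumed toy data: one feature in [-1, 1], another in [-50, 50]
X_raw = np.column_stack([rng.uniform(-1, 1, 200), rng.uniform(-50, 50, 200)])
y = X_raw @ np.array([3.0, 0.5])

# Standardize each column: zero mean, unit standard deviation
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

steps_raw = gd_steps(X_raw, y)
steps_scaled = gd_steps(X_scaled, y)
print("Steps with raw features:   ", steps_raw)
print("Steps with scaled features:", steps_scaled)
```

On data like this, the scaled run should need far fewer steps than the raw one, although the exact counts depend on the random seed and the tolerance.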

**2. More Robust Convergence:**
Unscaled or poorly scaled data can produce a cost function whose contour plot is elongated, like a long, narrow valley. In such a valley, gradient descent tends to overshoot across the steep direction and oscillate around the minimum rather than converge steadily. Feature scaling makes the contours more "circular", leading to a smoother path to the minimum.
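For a linear model trained with mean squared error, one way to quantify how elongated the contours are is the condition number of the Hessian of the loss: values near 1 correspond to nearly circular contours, large values to a long narrow valley. The sketch below compares it before and after standardization. The synthetic data and the helper name `hessian_condition_number` are assumptions for illustration.

```python
import numpy as np

def hessian_condition_number(X):
    """Condition number of the MSE Hessian for a linear model (hypothetical helper)."""
    # Center the columns so the comparison is about spread, not offset
    X_centered = X - X.mean(axis=0)
    hessian = 2.0 / len(X) * X_centered.T @ X_centered
    return np.linalg.cond(hessian)

rng = np.random.default_rng(0)
# Assumed toy data: one feature in [0, 1], another in [0, 1000]
X = np.column_stack([rng.uniform(0, 1, 500), rng.uniform(0, 1000, 500)])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

cond_raw = hessian_condition_number(X)
cond_std = hessian_condition_number(X_std)
print("Condition number, raw features:         ", cond_raw)
print("Condition number, standardized features:", cond_std)
```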

**3. Avoiding Dominant Features:**
If one feature has a range of 0-1 and another feature has a range of 0-1000, the latter feature can disproportionately affect the prediction, even if it's not necessarily more important. Scaling gives each feature a comparable initial influence on the model.
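A small sketch of this effect, under assumed toy data: with equal weights on both features, the average contribution of each feature to a linear prediction `w · x` is compared before and after min-max scaling. The weights and ranges are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed toy data: feature 0 in [0, 1], feature 1 in [0, 1000]
X = np.column_stack([rng.uniform(0, 1, 1000), rng.uniform(0, 1000, 1000)])
w = np.array([1.0, 1.0])  # equal weights, e.g. at an early stage of training

# Average absolute contribution of each feature to the prediction
contrib_raw = np.abs(X * w).mean(axis=0)

# Min-max scale each column to [0, 1], then repeat the measurement
X_mm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
contrib_scaled = np.abs(X_mm * w).mean(axis=0)

print("Raw contributions:   ", contrib_raw)
print("Scaled contributions:", contrib_scaled)
```

Before scaling, the second feature accounts for almost all of the prediction; after scaling, the two contributions are of similar size.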

### In Summary:
Feature scaling standardizes the range of features, ensuring that no feature artificially dominates others due to its scale. In the context of gradient descent, this results in a more direct path to the cost function's minimum and faster convergence. For deep learning models and neural networks where there could be thousands of gradient updates, feature scaling becomes especially crucial to ensure timely and stable convergence.

**Generally we do not need to scale the labels, just the features.**

In [None]:
import numpy as np

## Normalization

In [None]:


# Sample data
data = np.array([50, 60, 70, 80, 90])

# Compute the minimum and maximum
data_min = np.min(data)
data_max = np.max(data)

# Normalize the data
normalized_data = (data - data_min) / (data_max - data_min)

print("Original Data: ", data)
print("Normalized Data: ", normalized_data)
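The same computation extends to a feature matrix by taking the minimum and maximum per column with `axis=0`. The sketch below is one possible way to write it; the helper name `min_max_scale` and the handling of constant columns (mapping them to 0 instead of dividing by zero) are assumptions made here.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of a 2D array to [0, 1] (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Assumption: a constant column has range 0; use 1 so it maps to 0
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range

# Three samples, three features; the last feature is constant
X = np.array([[50, 0.1, 7],
              [60, 0.5, 7],
              [90, 0.3, 7]])
scaled = min_max_scale(X)
print(scaled)
```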


In [None]:
# Using sklearn

from sklearn.preprocessing import MinMaxScaler

data = np.array([50, 60, 70, 80, 90]).reshape(-1, 1)  # Reshaping because MinMaxScaler expects 2D array

scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)
print(data)  # the reshaped 2D column that MinMaxScaler receives
print("Original Data: ", data.flatten())
print("Normalized Data: ", normalized_data.flatten())


[[50]
 [60]
 [70]
 [80]
 [90]]
Original Data:  [50 60 70 80 90]
Normalized Data:  [0.   0.25 0.5  0.75 1.  ]


## Standardization

In [None]:
import numpy as np

# Sample data
data = np.array([50, 60, 70, 80, 90])

# Compute the mean and standard deviation
# np.std defaults to the population standard deviation (ddof=0)
mean = np.mean(data)
std_dev = np.std(data)

# Standardize the data
standardized_data = (data - mean) / std_dev

print("Original Data: ", data)
print("Standardized Data: ", standardized_data)
print("Mean of Standardized Data: ", np.mean(standardized_data))
print("Standard Deviation of Standardized Data: ", np.std(standardized_data))


## Using sklearn

In [10]:
from sklearn.preprocessing import StandardScaler

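scaler = StandardScaler()

data = np.array([50, 60, 70, 80, 90])

# StandardScaler expects a 2D array of shape (n_samples, n_features)
data = data.reshape(-1, 1)

# Learn the mean and standard deviation from the data
scaler.fit(data)

# Apply the learned scaling
scaler.transform(data)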

array([[-1.41421356],
       [-0.70710678],
       [ 0.        ],
       [ 0.70710678],
       [ 1.41421356]])
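After fitting, a `StandardScaler` keeps the statistics it learned in its `mean_` and `scale_` attributes, so the same scaling can be applied to other values and undone with `inverse_transform`. The sketch below illustrates this on the sample data; the "new" values are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([50, 60, 70, 80, 90], dtype=float).reshape(-1, 1)
scaler = StandardScaler().fit(train)

# Statistics learned from the training data (one entry per feature)
learned_mean = scaler.mean_[0]
learned_std = scaler.scale_[0]
print("Learned mean:", learned_mean)
print("Learned standard deviation:", learned_std)

# Hypothetical new values, scaled with the statistics learned above
new = np.array([[65.0], [100.0]])
new_scaled = scaler.transform(new)
print("Scaled new values:", new_scaled.flatten())

# inverse_transform maps scaled values back to the original units
restored = scaler.inverse_transform(new_scaled)
print("Restored values:", restored.flatten())
```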