# Scaling
Feature scaling is a crucial step in the preprocessing pipeline for many 
statistical modeling and machine learning algorithms. Scalers transform features 
so that they are on the same scale, making it easier for the models to converge 
and perform optimally.

Importance:

* **Consistency** across features: without scaling, algorithms may become biased toward the larger-scale features

* **Faster convergence**: unscaled features can slow convergence, since features with large values dominate the optimization steps

* **Model performance**: scaling is particularly critical for distance-based algorithms such as K-means clustering and k-nearest neighbors. Tree-based algorithms, on the other hand, are less sensitive to feature scaling.
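
As a rough illustration of the distance argument, the sketch below compares the Euclidean distance between two samples before and after min-max scaling. The two samples and the feature ranges (`lo`, `hi`) are hypothetical values chosen only for this example.

```python
import numpy as np

# Two hypothetical samples: (age in years, income in dollars)
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 51_000.0])

# Unscaled: the income difference (1000) swamps the age difference (35)
dist_raw = np.linalg.norm(a - b)

# Assumed feature ranges, used here for min-max scaling
lo = np.array([18.0, 20_000.0])
hi = np.array([80.0, 120_000.0])
a_scaled = (a - lo) / (hi - lo)
b_scaled = (b - lo) / (hi - lo)

# Scaled: the age difference now contributes most of the distance
dist_scaled = np.linalg.norm(a_scaled - b_scaled)

print("Raw distance:", dist_raw)
print("Scaled distance:", dist_scaled)
```

Before scaling, the distance is almost entirely the income gap; after scaling, the large relative difference in age dominates instead.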

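This notebook implements three commonly used scalers: min-max, standard, and robust.
Each scaler is built as a class with the following methods:
1. `__init__`: declares the fitted statistics as attributes, initially set to `None`
2. `fit`: computes the statistics from the data
3. `transform`: scales the data using the fitted statistics
4. `inverse_transform`: maps scaled data back to the original scale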



In [1]:
import numpy as np
# MinMax Scaler

class MinMaxScaler:

    def __init__(self):
        """
        The constructor initializes the object, setting X_min and X_max as 
        attributes
        """
        self.X_min = None
        self.X_max = None

    def fit(self, X):
        """
        The fit method calculates the min and max values from the data.
        """
        # Guarantee that X is a NumPy array
        X = np.array(X)

        # Compute the overall min/max values of X
        self.X_min = np.min(X)
        self.X_max = np.max(X)


    def transform(self, X):
        """
        The transform method scales the data using X_min and X_max and applying 
        the formula X_scaled = (x-min)/(max-min)
        """
        if self.X_min is None or self.X_max is None:
            raise ValueError("Scaler not fitted yet. Please call fit first.")

        if self.X_min == self.X_max:
            raise ValueError("The minimum and maximum value in the data are the same. Please, check and fit again")

        X = np.array(X)
        X_scaled = (X - self.X_min)/(self.X_max - self.X_min)

        return X_scaled

    def inverse_transform(self, X_scaled):
        """
        The inverse_transform method reverses the scaling process.
        """
        X_scaled = np.array(X_scaled)
        X_original = X_scaled * (self.X_max - self.X_min) + self.X_min

        return X_original



In [2]:
X = [10, 20, 30, 40, 50]
a = MinMaxScaler()
a.fit(X)

# Transform the data (scale it)
X_scaled = a.transform(X)
print("Scaled data:", X_scaled)

# Inverse transform the scaled data (return to original scale)
original_data = a.inverse_transform(X_scaled)
print("Original data:", original_data)



Scaled data: [0.   0.25 0.5  0.75 1.  ]
Original data: [10. 20. 30. 40. 50.]
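
The `MinMaxScaler` above is exercised on a single 1-D feature. For a 2-D array with one feature per column, one possible variant computes the minimum and maximum per column, so that each feature is scaled independently. The sketch below is an assumption about how that could look, not part of the class above, and the array `X_2d` is made-up data.

```python
import numpy as np

# Hypothetical 2-D data: rows are samples, columns are features
X_2d = np.array([[1.0, 100.0],
                 [2.0, 300.0],
                 [3.0, 500.0]])

# axis=0 yields one min and one max per column (per feature)
col_min = X_2d.min(axis=0)
col_max = X_2d.max(axis=0)

# Broadcasting applies each column's min/max to that column only
X_2d_scaled = (X_2d - col_min) / (col_max - col_min)
print(X_2d_scaled)
```

Each column ends up in the range 0 to 1 regardless of its original magnitude.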


In [3]:
# Standard Scaler

class StandardScaler:

    def __init__(self):
        self.X_mean = None
        self.X_std = None


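    def fit(self, X):
        """
        The fit method calculates the mean and standard deviation of the data.
        """
        X = np.array(X)

        self.X_mean = np.mean(X)
        # np.std defaults to ddof=0, i.e. the population standard deviation
        self.X_std = np.std(X)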

    def transform(self, X):

        if self.X_mean is None or self.X_std is None:
            raise ValueError("Please fit your data before transforming it")
        
        if self.X_std == 0:
            raise ValueError("There is no variability in the data.")

        X = np.array(X)
        X_standard = (X - self.X_mean)/self.X_std
        return X_standard

    def inverse_transform(self, X_standard):
        X_original = X_standard*self.X_std + self.X_mean
        return X_original
    



In [4]:
X = [10, 20, 30, 40, 50]
a = StandardScaler()
a.fit(X)

# Transform the data (scale it)
X_scaled = a.transform(X)
print("Scaled data:", X_scaled)

# Inverse transform the scaled data (return to original scale)
original_data = a.inverse_transform(X_scaled)
print("Original data:", original_data)


Scaled data: [-1.41421356 -0.70710678  0.          0.70710678  1.41421356]
Original data: [10. 20. 30. 40. 50.]
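
As a quick sanity check, standardized data should have a mean of about 0 and a standard deviation of about 1. The sketch below verifies this with plain NumPy on the same input. It also shows that using the sample standard deviation (`ddof=1`) instead of NumPy's default population standard deviation (`ddof=0`) gives slightly smaller scaled values.

```python
import numpy as np

X = np.array([10, 20, 30, 40, 50], dtype=float)

# Population standard deviation (NumPy default, ddof=0)
z = (X - X.mean()) / X.std()
print("Mean after scaling:", z.mean())
print("Std after scaling:", z.std())

# Sample standard deviation (ddof=1) is larger, so the scaled values shrink
z_sample = (X - X.mean()) / X.std(ddof=1)
print("Scaled with ddof=1:", z_sample)
```

Which convention to use is a design choice; the class above follows the NumPy default.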


# Robust Scaler

The Robust Scaler uses the **median** and the **interquartile range (IQR)**, 
the difference between the 75th and 25th percentiles, for scaling, which makes 
it **robust to outliers**:

$$ X_{\text{scaled}} = \frac{X - \text{median}(X)}{\text{IQR}(X)} $$

In [5]:
class RobustScaler:
    def __init__(self):
        self.X_median = None
        self.X_IQR = None
    
    def fit(self, X):
        X = np.array(X)
        self.X_median = np.median(X)
        self.X_IQR = np.percentile(X, 75) - np.percentile(X,25)

    
    def transform(self, X):
        X = np.array(X) 
        if self.X_median is None or self.X_IQR is None:
            raise ValueError("Please fit your data before transforming it.")

        if self.X_IQR == 0:
            raise ValueError("The values of the 75 and 25 percentiles are the same. Please check your data and fit it again")
        X_scaled = (X - self.X_median)/self.X_IQR
        return X_scaled

    def inverse_transform(self, X_scaled):
        X_original = (X_scaled * self.X_IQR) + self.X_median
        return X_original

In [6]:
X = [10, 20, 30, 40, 50]

a = RobustScaler()
a.fit(X)

# Transform the data (scale it)
X_scaled = a.transform(X)
print("Scaled data:", X_scaled)

# Inverse transform the scaled data (return to original scale)
original_data = a.inverse_transform(X_scaled)
print("Original data:", original_data)

Scaled data: [-1.  -0.5  0.   0.5  1. ]
Original data: [10. 20. 30. 40. 50.]



#### Choosing the Right Scaling Method:

1.	Standardization (Z-Score Scaling):
* When to use: When you don’t have extreme outliers, and gradient-based
     optimization is used (e.g., logistic regression, neural networks).
* Pros: Works well for most machine learning algorithms, especially if the
     features are normally distributed or have similar ranges.
* Cons: Sensitive to outliers, which can skew the mean and standard deviation.
* Note: standardization does not require features to be normally distributed.


2.	Min-Max Scaling:
* When to use: When you need features to be within a bounded range (e.g., 0 to 1) or when the algorithm is sensitive to feature magnitudes (e.g., distance-based algorithms like KNN, SVMs).
* Pros: Ensures all features are within a fixed range, useful when the algorithm expects input features in a specific range.
* Cons: Very sensitive to outliers, as they can affect the minimum and maximum values.

3.	Robust Scaling:
* When to use: When your data has significant outliers or is not normally distributed.
* Pros: More robust to outliers than standardization or min-max scaling.
* Cons: Does not strictly bound the values like min-max scaling.
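
To make the trade-offs above concrete, the sketch below applies the three formulas directly with NumPy to a small made-up array containing one extreme outlier (`1000`). The data is hypothetical and chosen only to show the effect.

```python
import numpy as np

# Hypothetical data with one extreme outlier
X = np.array([10, 20, 30, 40, 1000], dtype=float)

# Min-max scaling: the outlier sets the max, squeezing the rest near 0
min_max = (X - X.min()) / (X.max() - X.min())

# Standardization: the outlier inflates both the mean and the std
standard = (X - X.mean()) / X.std()

# Robust scaling: median and IQR are barely affected by the outlier
q25, q75 = np.percentile(X, [25, 75])
robust = (X - np.median(X)) / (q75 - q25)

print("Min-max: ", np.round(min_max, 3))
print("Standard:", np.round(standard, 3))
print("Robust:  ", np.round(robust, 3))
```

With min-max scaling and standardization, the four regular values end up very close together, while robust scaling keeps them spread between -1 and 0.5 and leaves the outlier as a large, unbounded value.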