<a href="https://colab.research.google.com/github/Fawzy-AI-Explorer/X-From-Scratch/blob/main/Feature_Scaling_From_Scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Feature Scaling From Scratch
Table of contents:
- [Goals](#goals)
- [Tools](#tools)
- [What is Feature Scaling?](#What)
- [What is the Goal of Feature Scaling?](#goal)
- [Why Should We Use Feature Scaling?](#why)
- [Standardization (Z-Score)](#Standardization)
- [Normalization (MinMaxScaler)](#Normalization)
- [LICENSE](#license)

## <a name="goals">Goals</a>
In this tutorial, we will:

- Implement the Z-score (standardization)
- Implement the MinMaxScaler (normalization)<br><br>

*All from scratch*

## <a name="tools">Tools</a>
In this tutorial, we will make use of:
- pandas, a Python library used for working with data sets
- NumPy, a popular library for scientific computing


## **<a name="What">What is Feature Scaling?</a>**
1. Feature scaling, also known as data normalization or standardization, is a preprocessing technique used to rescale the values of the features in a dataset to a similar scale.
2. Feature scaling becomes necessary in machine learning when the dataset contains features with different ranges.

## **<a name="goal">What is the Goal of Feature Scaling?</a>**
The goal of feature scaling in machine learning is to ensure that all features in the dataset are on a similar scale or range.

### **<a name="why">Why Should We Use Feature Scaling?</a>**
Many machine learning algorithms perform better when features are on a similar scale. Scaling features helps algorithms converge faster and find the optimal solution more efficiently.  

---
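In algorithms that rely on distance calculations (for example k-nearest neighbors or k-means) or on gradient descent optimization, features with larger scales can dominate the calculation, leading to biased results. Tree-based models are the exception: they split on one feature at a time using thresholds, so they are largely unaffected by feature scale.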
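
To make the dominance effect concrete, here is a small sketch. The samples and the meaning of the two columns (age and income) are made-up assumptions for illustration and are not part of this tutorial's dataset. It compares Euclidean distances before and after standardizing each column.

```python
import numpy as np

# Hypothetical samples: column 0 is age in years, column 1 is income in dollars.
samples = np.array([
    [25.0, 50_000.0],   # person A
    [60.0, 50_500.0],   # person B: very different age, similar income
    [26.0, 58_000.0],   # person C: similar age, different income
])

def euclidean(u, v):
    """Euclidean distance between two 1-D vectors."""
    return np.sqrt(np.sum((u - v) ** 2))

# On the raw data, the income column decides almost everything.
raw_ab = euclidean(samples[0], samples[1])
raw_ac = euclidean(samples[0], samples[2])

# Standardize each column (zero mean, unit standard deviation) and compare again.
scaled = (samples - samples.mean(axis=0)) / samples.std(axis=0)
scaled_ab = euclidean(scaled[0], scaled[1])
scaled_ac = euclidean(scaled[0], scaled[2])

print(f"raw:    d(A,B)={raw_ab:.1f}  d(A,C)={raw_ac:.1f}")
print(f"scaled: d(A,B)={scaled_ab:.2f}  d(A,C)={scaled_ac:.2f}")
```

On the raw values the 35-year age gap between A and B barely registers next to the income gap between A and C. After scaling, the two distances are of similar size, so both features contribute to the comparison.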

In [None]:
import numpy as np
import pandas as pd

In [None]:
# Create a dictionary with column names as keys and lists as values
data = {
    'Feature1': [1, 2, 3, 4, 5],
    'Feature2': [6, 7, 8, 9, 10],
    'Feature3': [11, 12, 13, 14, 15]
}

# Create DataFrame from the dictionary
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

   Feature1  Feature2  Feature3
0         1         6        11
1         2         7        12
2         3         8        13
3         4         9        14
4         5        10        15


## **<a name="Standardization">Standardization</a>**
Standardization, also known as z-score normalization or z-score standardization, is a data preprocessing technique used to transform features by scaling them to have a mean of 0 and a standard deviation of 1.

---
We will implement the standard scaler. Here’s the formula for standardization:<br>
<br>$$X' = \frac{X - \text{mean}}{\text{std}}\tag{1}$$<br>
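
As a quick worked example of the formula, take the column `Feature1` = [1, 2, 3, 4, 5] from the DataFrame above, and assume the population standard deviation (dividing by the number of samples), which is what `np.std` computes by default:

$$\text{mean} = \frac{1+2+3+4+5}{5} = 3, \qquad \text{std} = \sqrt{\frac{(-2)^2+(-1)^2+0^2+1^2+2^2}{5}} = \sqrt{2} \approx 1.4142$$

$$X = 1 \;\Rightarrow\; X' = \frac{1-3}{1.4142} \approx -1.4142, \qquad X = 4 \;\Rightarrow\; X' = \frac{4-3}{1.4142} \approx 0.7071$$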

In [None]:
def fit(x):
  mean = np.mean(x ,axis=0)
  std = np.std(x, axis=0)
  return mean , std

mean , std= fit(df)
print(mean , std)

Feature1     3.0
Feature2     8.0
Feature3    13.0
dtype: float64 Feature1    1.414214
Feature2    1.414214
Feature3    1.414214
dtype: float64


We only need the mean and std of each column as plain NumPy arrays, so let's convert the DataFrame with `to_numpy()` first.

In [None]:
def fit(x):
  x_numpy = x.to_numpy()
  mean = np.mean(x_numpy ,axis=0)
  std = np.std(x_numpy, axis=0)
  return mean , std

mean , std= fit(df)
print(mean , std)

[ 3.  8. 13.] [1.41421356 1.41421356 1.41421356]


**Great!!!!**
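
One detail worth keeping in mind: `np.std` divides by the number of samples by default (`ddof=0`), while the pandas `std()` method divides by one less (`ddof=1`). The sketch below shows the difference on a single column. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

feature = pd.Series([1, 2, 3, 4, 5], name="Feature1")

# NumPy default: population standard deviation (divide by n, ddof=0).
population_std = np.std(feature.to_numpy())

# pandas default: sample standard deviation (divide by n - 1, ddof=1).
sample_std = feature.std()

# Passing ddof=0 to pandas reproduces the NumPy value.
pandas_population_std = feature.std(ddof=0)

print(population_std, sample_std, pandas_population_std)
```

The `fit` function above uses the population version, which is also the convention scikit-learn's `StandardScaler` documents, so the two results can be compared directly.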

In [None]:
def transform(x , mean , std ):

    if mean is None or std is None:
        raise ValueError("Scaler has not been fitted.")

    # Perform standardization by centering and scaling
    X_tr = (x - mean) / std

    return X_tr

c = transform(df , mean , std )
print(c)

   Feature1  Feature2  Feature3
0 -1.414214 -1.414214 -1.414214
1 -0.707107 -0.707107 -0.707107
2  0.000000  0.000000  0.000000
3  0.707107  0.707107  0.707107
4  1.414214  1.414214  1.414214
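
A simple sanity check that does not need any extra library: after standardization, every column should have a mean of about 0 and a standard deviation of about 1. This is a minimal sketch that rebuilds the same values as a NumPy array.

```python
import numpy as np

# Same values as the tutorial's DataFrame, as a plain float array.
x = np.array([[1, 6, 11],
              [2, 7, 12],
              [3, 8, 13],
              [4, 9, 14],
              [5, 10, 15]], dtype=float)

# Standardize column by column.
mean = x.mean(axis=0)
std = x.std(axis=0)
x_scaled = (x - mean) / std

# Per-column statistics of the scaled data.
col_means = x_scaled.mean(axis=0)
col_stds = x_scaled.std(axis=0)

print(np.allclose(col_means, 0.0), np.allclose(col_stds, 1.0))
```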


**Let’s check the result.**

In [None]:
from sklearn.preprocessing import StandardScaler

# Create an instance of StandardScaler
scaler = StandardScaler()

# Fit the scaler to your data (compute mean and standard deviation)
scaler.fit(df)

# Transform the training data
df_scaled = scaler.transform(df)
df_scaled

array([[-1.41421356, -1.41421356, -1.41421356],
       [-0.70710678, -0.70710678, -0.70710678],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.70710678,  0.70710678,  0.70710678],
       [ 1.41421356,  1.41421356,  1.41421356]])

**Great! The results match.**

In [None]:
class StandardScaler:
    def __init__(self):
        self.mean_ = None
        self.std_ = None

    def fit(self, x, y=None):
        """
        Compute the mean and std to be used for later scaling.

        Parameters:
            x : pandas.DataFrame of shape (n_samples, n_features)
                The data used to compute the per-feature mean and standard deviation used for later scaling.
            y : None
                Ignored.

        Returns:
            self : object
                Fitted scaler.
        """
        x_numpy = x.to_numpy()
        # Compute the mean and standard deviation along the features axis
        self.mean_ = np.mean(x_numpy, axis=0)
        self.std_ = np.std(x_numpy, axis=0)

        return self


    def transform(self, x):
        """
        Perform standardization by centering and scaling.

        Parameters:
            x : {array-like, sparse matrix} of shape (n_samples, n_features)
                The data used to scale along the features axis.

        Returns:
            x_transformed : {ndarray, sparse matrix} of shape (n_samples, n_features)
                Transformed array.
        """
        # Check if the scaler has been fitted
        if self.mean_ is None or self.std_ is None:
            raise ValueError("Scaler has not been fitted.")

        # Perform standardization by centering and scaling
        x_transformed = (x - self.mean_) / self.std_

        return x_transformed


    def fit_transform(self, x, y=None):
        """
        Fit to data, then transform it.

        Parameters:
            x : pandas.DataFrame of shape (n_samples, n_features)
                Input samples.
            y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
                Target values (ignored).

        Returns:
            X_new : ndarray of shape (n_samples, n_features)
                Transformed array.
        """
        # Fit the scaler to the data and transform the data
        self.fit(x)
        x_trans = self.transform(x)
        return x_trans


In [None]:
scaler = StandardScaler()
scaler.fit(df)
scaler.transform(df)

   Feature1  Feature2  Feature3
0 -1.414214 -1.414214 -1.414214
1 -0.707107 -0.707107 -0.707107
2  0.000000  0.000000  0.000000
3  0.707107  0.707107  0.707107
4  1.414214  1.414214  1.414214


In [None]:
scaler.fit_transform(df)

   Feature1  Feature2  Feature3
0 -1.414214 -1.414214 -1.414214
1 -0.707107 -0.707107 -0.707107
2  0.000000  0.000000  0.000000
3  0.707107  0.707107  0.707107
4  1.414214  1.414214  1.414214


**It works!**
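
One edge case the class above does not handle is a constant column: its standard deviation is 0, so the division produces NaN. A common convention (which scikit-learn also follows, to our knowledge) is to divide such columns by 1 instead. The helper `standardize_safe` below is a hypothetical sketch of that idea, not part of the class above.

```python
import numpy as np

def standardize_safe(x):
    """Standardize columns, leaving zero-variance columns centered at 0."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    # Replace a zero standard deviation with 1 to avoid dividing by zero.
    safe_std = np.where(std == 0.0, 1.0, std)
    return (x - mean) / safe_std

# The second column is constant, so its standard deviation is 0.
data_with_constant = np.array([[1.0, 7.0],
                               [2.0, 7.0],
                               [3.0, 7.0]])

result = standardize_safe(data_with_constant)
print(result)
```

The constant column comes out as all zeros, and the other column is standardized as usual.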

## **<a name="Normalization">Normalization</a>**

Normalization is a data preprocessing technique used to scale numeric features to a specific range.
It transforms the values of each numeric feature to a common scale, often between 0 and 1.

---

We will implement the min-max scaler. Here’s the formula for normalization:<br>

<br>$$X' = \frac{X - X_{min}}{X_{max} - X_{min}}\tag{2}$$<br>
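
As a quick worked example of the formula, take the column `Feature2` = [6, 7, 8, 9, 10] from the DataFrame above, so that $X_{min} = 6$ and $X_{max} = 10$:

$$X = 7 \;\Rightarrow\; X' = \frac{7-6}{10-6} = 0.25, \qquad X = 10 \;\Rightarrow\; X' = \frac{10-6}{10-6} = 1$$

The smallest value always maps to 0 and the largest to 1.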

In [None]:
df

   Feature1  Feature2  Feature3
0         1         6        11
1         2         7        12
2         3         8        13
3         4         9        14
4         5        10        15


In [None]:
def fit(x):

  x_numpy = x.to_numpy()
  # Compute the minimum and maximum along the features axis
  min = np.min(x_numpy, axis=0)
  max = np.max(x_numpy, axis=0)

  return min , max

min , max = fit(df)
print(min , max)


[ 1  6 11] [ 5 10 15]


In [None]:
def transform(x, min, max):
  if min is None or max is None:
    raise ValueError("Scaler has not been fitted.")

  # Perform min-max normalization: shift by the minimum, divide by the range
  x_transformed = (x - min) / (max - min)
  return x_transformed

xtr = transform(df, min, max)
xtr

   Feature1  Feature2  Feature3
0      0.00      0.00      0.00
1      0.25      0.25      0.25
2      0.50      0.50      0.50
3      0.75      0.75      0.75
4      1.00      1.00      1.00
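
The range does not have to be [0, 1]. As a sketch, the hypothetical helper `min_max_scale` below first maps each column to [0, 1] and then stretches it to any `(low, high)` interval. The parameter name `feature_range` is borrowed from scikit-learn's `MinMaxScaler`, but the function itself is only an illustration.

```python
import numpy as np

def min_max_scale(x, feature_range=(0.0, 1.0)):
    """Scale each column of x to the interval given by feature_range."""
    low, high = feature_range
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    # First map to [0, 1], then stretch and shift to [low, high].
    unit = (x - x_min) / (x_max - x_min)
    return unit * (high - low) + low

x = np.array([[1, 6, 11],
              [2, 7, 12],
              [3, 8, 13],
              [4, 9, 14],
              [5, 10, 15]])

scaled = min_max_scale(x, feature_range=(-1.0, 1.0))
print(scaled[:, 0])
```

With the range (-1, 1), the first column [1, 2, 3, 4, 5] becomes [-1, -0.5, 0, 0.5, 1].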


In [None]:
class MinMaxScaler:
    def __init__(self):
        self.min = None
        self.max = None

    def fit(self, x, y=None):
        """
        Compute the min and max to be used for later scaling.

        Parameters:
            x : pandas.DataFrame of shape (n_samples, n_features)
                The data used to compute the per-feature minimum and maximum used for later scaling.
            y : None
                Ignored.

        Returns:
            self : object
                Fitted scaler.
        """
        x_numpy = x.to_numpy()
        # Compute the minimum and maximum along the features axis
        self.min = np.min(x_numpy, axis=0)
        self.max = np.max(x_numpy, axis=0)

        return self


    def transform(self, x):
        """
        Perform min-max normalization by shifting and scaling.

        Parameters:
            x : {array-like, sparse matrix} of shape (n_samples, n_features)
                The data used to scale along the features axis.

        Returns:
            x_transformed : {ndarray, sparse matrix} of shape (n_samples, n_features)
                Transformed array.
        """
        # Check if the scaler has been fitted
        if self.min is None or self.max is None:
            raise ValueError("Scaler has not been fitted.")

        # Perform min-max normalization: shift by the minimum, divide by the range
        x_transformed = (x - self.min) / (self.max - self.min)

        return x_transformed


    def fit_transform(self, x, y=None):
        """
        Fit to data, then transform it.

        Parameters:
            x : pandas.DataFrame of shape (n_samples, n_features)
                Input samples.
            y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
                Target values (ignored).

        Returns:
            X_new : ndarray of shape (n_samples, n_features)
                Transformed array.
        """
        # Fit the scaler to the data and transform the data
        self.fit(x)
        x_trans = self.transform(x)
        return x_trans


In [None]:
scaler = MinMaxScaler()
scaler.fit(df)
scaler.transform(df)

   Feature1  Feature2  Feature3
0      0.00      0.00      0.00
1      0.25      0.25      0.25
2      0.50      0.50      0.50
3      0.75      0.75      0.75
4      1.00      1.00      1.00
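
Keeping `fit` and `transform` separate matters once there is new data: the minimum and maximum are learned from the training data only and then reused. The sketch below uses made-up training and test values to show that unseen values outside the training range land outside [0, 1].

```python
import numpy as np

# Hypothetical training and test data for a single feature.
train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
test = np.array([[0.0], [3.0], [7.0]])

# "Fit": learn the minimum and maximum from the training data only.
train_min = train.min(axis=0)
train_max = train.max(axis=0)

# "Transform": reuse the training statistics on the test data.
test_scaled = (test - train_min) / (train_max - train_min)
print(test_scaled.ravel())
```

The test values 0 and 7 map to -0.25 and 1.5, so a model that expects inputs strictly in [0, 1] may need clipping or a different scaler.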


**Let’s check the result.**

In [None]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# Fit the scaler to your data
scaler.fit(df)

# Transform your data using the fitted scaler
df_scaled = scaler.transform(df)
df_scaled

array([[0.  , 0.  , 0.  ],
       [0.25, 0.25, 0.25],
       [0.5 , 0.5 , 0.5 ],
       [0.75, 0.75, 0.75],
       [1.  , 1.  , 1.  ]])

**Great! The results are the same.**