# Worksheet 

Recall that we wrote our own min-max scaler during the lecture. In this worksheet, you will write your own standard scaler.

A standard scaler subtracts the mean of each feature and then divides by that feature's standard deviation, so every transformed feature has mean 0 and standard deviation 1. If a feature follows a normal distribution, it follows the standard normal distribution after the transformation.
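As a small illustration of the arithmetic, the sketch below standardizes a single feature by hand using only the Python standard library. The values in `feature` are made up for this example and are not part of the worksheet data.

```python
from statistics import fmean, pstdev

# One hypothetical feature column (values chosen only for illustration)
feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = fmean(feature)      # mean of the feature: 5.0
sigma = pstdev(feature)  # population standard deviation (divides by n): 2.0

# Standardize: subtract the mean, then divide by the standard deviation
z = [(x - mu) / sigma for x in feature]

print(mu, sigma)
print(z)
```

After the transformation, the values in `z` have mean 0 and standard deviation 1.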

You should write a class called `std_scalar` to do this. The class contains three methods: `__init__`, `fit_transform`, and `transform`. Your class should behave like `sklearn.preprocessing.StandardScaler`.
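One detail that matters when matching `StandardScaler` is which standard deviation is used: it scales by the population standard deviation (dividing by n), which corresponds to `np.std(..., ddof=0)`. The sketch below compares this with the sample standard deviation (`ddof=1`) on a small made-up array `x`, which is an assumption of this example only.

```python
import numpy as np

# Hypothetical data: 4 samples, 2 features
x = np.array([[0.0, 10.0],
              [1.0, 20.0],
              [2.0, 30.0],
              [3.0, 40.0]])

pop_std = np.std(x, axis=0, ddof=0)     # divides by n
sample_std = np.std(x, axis=0, ddof=1)  # divides by n - 1

print(pop_std)
print(sample_std)
```

The two results differ noticeably for small samples, so using `ddof=1` would make the comparison against `StandardScaler` fail.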

You should add docstrings and comments to your class.

**Grading policy:**

1. Your code passes the given test example.
2. Class docstring, function docstring, and inline comments should be well-written and informative.

In [1]:
import numpy as np


class std_scalar:
    """
    A class to perform standard scaling of features by removing the mean and scaling to unit variance.
    
    Methods
    -------
    fit_transform(data):
        Fits the scaler on the data and then transforms the data by scaling it.
        
    transform(data):
        Transforms the data using the previously computed mean and standard deviation.
    """
    
    def __init__(self):
        """
        Initializes the std_scalar object.
        The mean and standard deviation are set to None until the scaler is fitted.
        """
        self.mean_ = None
        self.std_ = None
    
    def fit_transform(self, data):
        """
        Fits the scaler to the input data and transforms it by subtracting the mean and dividing by the standard deviation.
        
        Parameters
        ----------
        data : numpy array of shape (n_samples, n_features)
            The data to be scaled.
            
        Returns
        -------
        data_scaled : numpy array of shape (n_samples, n_features)
            The scaled version of the input data.
        """
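        # Compute mean and standard deviation for each feature (column);
        # ddof=0 gives the population standard deviation
        self.mean_ = np.mean(data, axis=0)
        std = np.std(data, axis=0, ddof=0)

        # A constant feature has zero standard deviation; use 1 instead so the
        # feature is left unscaled rather than divided by zero
        self.std_ = np.where(std == 0, 1.0, std)

        # Transform the data using the computed mean and std deviation
        data_scaled = (data - self.mean_) / self.std_
        return data_scaled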
    
    def transform(self, data):
        """
        Transforms the input data using the previously computed mean and standard deviation.
        
        Parameters
        ----------
        data : numpy array of shape (n_samples, n_features)
            The data to be scaled.
            
        Returns
        -------
        data_scaled : numpy array of shape (n_samples, n_features)
            The scaled version of the input data.
        """
        # Ensure that the scaler has already been fitted
        if self.mean_ is None or self.std_ is None:
            raise ValueError("Scaler has not been fitted. Call 'fit_transform' first.")
        
        # Transform the data using the previously computed mean and std deviation
        data_scaled = (data - self.mean_) / self.std_
        return data_scaled

In [2]:
# Test that your class is correct
import numpy as np
from sklearn.preprocessing import StandardScaler

data1 = np.arange(12).reshape(4,3)
data2 = np.arange(15).reshape(5,3)

## StandardScaler:
scalar = StandardScaler()
data1_sd = scalar.fit_transform(data1)
data2_sd = scalar.transform(data2)

## Your own class (the instance gets its own name so the class is not overwritten):
own_scalar = std_scalar()
data1_sd_own = own_scalar.fit_transform(data1)
data2_sd_own = own_scalar.transform(data2)

print(np.allclose(data1_sd, data1_sd_own))   # Should be True
print(np.allclose(data2_sd, data2_sd_own))   # Should be True

True
True
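A further sanity check that does not rely on sklearn is sketched below with plain NumPy. It assumes the same `data1` and `data2` arrays as the test cell. The fitted data should end up with mean 0 and standard deviation 1 in every column, while data transformed with statistics fitted on other data generally will not.

```python
import numpy as np

data1 = np.arange(12).reshape(4, 3)
data2 = np.arange(15).reshape(5, 3)

# Statistics are computed on data1 only (the "fit" step)
mean = data1.mean(axis=0)
std = data1.std(axis=0)  # ddof=0 by default

data1_scaled = (data1 - mean) / std
data2_scaled = (data2 - mean) / std  # reuses data1's statistics

print(np.allclose(data1_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(data1_scaled.std(axis=0), 1.0))   # True
print(np.allclose(data2_scaled.mean(axis=0), 0.0))  # False
```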
