1️⃣ Additional Scalers / Normalizers

RobustScaler – Scales features using the median and IQR, which makes the scaling robust to outliers.

MaxAbsScaler – Scales features to [-1, 1] using max absolute value.

Normalizer – Scales each sample vector to unit norm (L1 or L2).
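A minimal sketch of what the RobustScaler idea could look like, following the same fit / fit_transform / transform shape as the scalers in this notebook. The attribute names `center_` and `scale_` and the choice of the 25th and 75th percentiles for the IQR are assumptions for illustration, not something already implemented here.

```python
import numpy as np

class RobustScaler:
    """Sketch: center each feature by its median and scale by its IQR."""

    def __init__(self):
        self.center_ = None
        self.scale_ = None

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        self.center_ = np.median(x, axis=0)
        # 25th and 75th percentiles per column; their difference is the IQR
        q1, q3 = np.percentile(x, [25, 75], axis=0)
        iqr = q3 - q1
        # Avoid division by zero for features with no spread
        self.scale_ = np.where(iqr == 0, 1, iqr)
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.center_) / self.scale_

    def fit_transform(self, x):
        return self.fit(x).transform(x)

# One feature with a single large outlier
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
scaled = RobustScaler().fit_transform(X)
print(scaled.ravel())
```

The outlier still ends up far from zero, but it does not change the median or the IQR, so the remaining values keep a sensible spread.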

2️⃣ Encoding / Transformation Utilities

LabelEncoder – Convert categorical labels to integers.

OneHotEncoder – Convert categorical features to one-hot vectors.

OrdinalEncoder – Encode categorical features as ordered integers.

3️⃣ Missing Value Handling

SimpleImputer – Fill missing values with mean, median, mode, or constant.

Advanced Imputer – Fill missing values using kNN or regression.
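As a rough sketch of the SimpleImputer item, the class below fills NaN entries column by column. It only covers the mean, median, and constant strategies (mode is left out), and the parameter names `strategy` and `fill_value` as well as the `statistics_` attribute are assumptions chosen for this example.

```python
import numpy as np

class SimpleImputer:
    """Sketch: replace NaN values with a per-column statistic or a constant."""

    def __init__(self, strategy="mean", fill_value=0.0):
        if strategy not in ("mean", "median", "constant"):
            raise ValueError(f"Unknown strategy: {strategy}")
        self.strategy = strategy
        self.fill_value = fill_value
        self.statistics_ = None

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        if self.strategy == "mean":
            # nanmean / nanmedian ignore NaN entries when computing the statistic
            self.statistics_ = np.nanmean(x, axis=0)
        elif self.strategy == "median":
            self.statistics_ = np.nanmedian(x, axis=0)
        else:
            self.statistics_ = np.full(x.shape[1], self.fill_value, dtype=float)
        return self

    def transform(self, x):
        x = np.asarray(x, dtype=float)
        # Broadcast the per-column statistic into the NaN positions only
        return np.where(np.isnan(x), self.statistics_, x)

    def fit_transform(self, x):
        return self.fit(x).transform(x)

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 6.0]])
filled = SimpleImputer(strategy="mean").fit_transform(X)
print(filled)
```

A column that is entirely NaN has no mean or median, so a fuller version would need to decide how to handle that case.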

4️⃣ Feature Selection / Engineering

VarianceThreshold – Remove features with low variance.

PolynomialFeatures – Create polynomial or interaction features.

Log / Power Transform – Apply log, sqrt, or Box-Cox transformations.

5️⃣ Pipelines / Utility Functions

Pipeline class – Chain multiple transformations together, like sklearn pipelines.

fit_transform_all – Fit multiple transformers and apply in sequence.

save/load scaler – Serialize your scaler to reuse later.
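The Pipeline item could be sketched roughly as below: each step only needs `fit_transform` and `transform`, which the scalers in this notebook already provide. The two small step classes (`Center`, `MaxAbs`) are hypothetical stand-ins so the example runs on its own.

```python
import numpy as np

class Pipeline:
    """Sketch: apply a list of transformers in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def fit_transform(self, x):
        # Each step is fitted on the output of the previous step
        for step in self.steps:
            x = step.fit_transform(x)
        return x

    def transform(self, x):
        # Reuse the statistics learned during fit_transform
        for step in self.steps:
            x = step.transform(x)
        return x

class Center:
    """Hypothetical step: subtract the per-column mean."""
    def fit_transform(self, x):
        self.mean = x.mean(axis=0)
        return x - self.mean
    def transform(self, x):
        return x - self.mean

class MaxAbs:
    """Hypothetical step: divide by the per-column max absolute value."""
    def fit_transform(self, x):
        abs_max = np.max(np.abs(x), axis=0)
        self.scale_ = np.where(abs_max == 0, 1, abs_max)
        return x / self.scale_
    def transform(self, x):
        return x / self.scale_

pipe = Pipeline([Center(), MaxAbs()])
train_out = pipe.fit_transform(np.array([[1.0, 2.0], [3.0, 6.0]]))
test_out = pipe.transform(np.array([[2.0, 4.0]]))
print(train_out)
print(test_out)
```

Fitting each step on the previous step's output matters: the second step should learn its statistics from centered data, not from the raw input.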

6️⃣ Metrics / Evaluation Helpers

R2, MSE, MAE functions – Regression evaluation.

Accuracy, Precision, Recall – Classification evaluation.

7️⃣ Extras / Convenience

Support for pandas DataFrames – Automatically handle column names.

Custom ranges for scalers – Already partly done, but could add flexible ranges per column.

Check input consistency – Raise errors for mismatched feature counts between fit and transform.
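For the input consistency item, one possible approach is to remember the number of columns seen in `fit` and compare it in `transform`. Without such a check, NumPy broadcasting can silently accept a mismatch, for example when the scaler was fitted on a single feature. The class name `CheckedMaxAbsScaler` and the attribute `n_features_` are assumptions for this sketch.

```python
import numpy as np

class CheckedMaxAbsScaler:
    """Hypothetical MaxAbsScaler variant that validates the feature count."""

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        if x.ndim != 2:
            raise ValueError("Expected a 2D array of shape (n_samples, n_features)")
        self.n_features_ = x.shape[1]
        abs_max = np.max(np.abs(x), axis=0)
        self.scale_ = np.where(abs_max == 0, 1, abs_max)
        return self

    def transform(self, x):
        x = np.asarray(x, dtype=float)
        if x.ndim != 2 or x.shape[1] != self.n_features_:
            raise ValueError(
                f"Expected {self.n_features_} features, got array of shape {x.shape}"
            )
        return x / self.scale_

# Fitted on one feature, then given three: broadcasting alone would not complain
scaler = CheckedMaxAbsScaler().fit(np.array([[1.0], [-4.0]]))
try:
    scaler.transform(np.array([[1.0, 2.0, 3.0]]))
    mismatch_caught = False
except ValueError as err:
    mismatch_caught = True
    print("Rejected:", err)
```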

In [1]:
import numpy as np
import pandas as pd

# Standard Scaler

In [2]:
class StandardScaler:
    def __init__(self):
        self.mean=None
        self.std=None
    
    def fit(self,X_train):
        self.mean=X_train.mean(axis=0)
        self.std=X_train.std(axis=0)
        self.std[self.std==0]=1
        return self
    
    def fit_transform(self,X_train):
        self.fit(X_train)
        return (X_train-self.mean)/self.std
    
    def transform(self,X_test):
        return (X_test-self.mean)/self.std
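One detail worth keeping in mind for the StandardScaler above: on a NumPy array, `.std(axis=0)` uses the population standard deviation (`ddof=0`), which is why the results further down match sklearn. pandas' `DataFrame.std` defaults to `ddof=1`, so passing a DataFrame straight into `fit` would give different numbers. The snippet below only illustrates the two conventions with NumPy on the same sample data used in the tests.

```python
import numpy as np

X_train = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]], dtype=float)

# Population standard deviation: divide by n (NumPy's default)
population_std = np.std(X_train, axis=0, ddof=0)

# Sample standard deviation: divide by n - 1 (the pandas default)
sample_std = np.std(X_train, axis=0, ddof=1)

print("ddof=0:", population_std)
print("ddof=1:", sample_std)
```

Converting the input with `np.asarray` inside `fit`, or passing `ddof=0` explicitly, would be one way to keep the behavior consistent across input types.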





# MinMax Scaler

In [3]:
class MinMaxScaler:
    def __init__(self,min=0,max=1):
        self.min=min
        self.max=max
        self.range=max-min
        self.min_val=None
        self.max_val=None
        if self.range<0:
            raise ValueError("min must not be greater than max")
        
    def fit(self,x):
        self.max_val=np.max(x,axis=0)
        self.min_val=np.min(x,axis=0)
        self.scale_=np.where(self.max_val-self.min_val==0,1,self.max_val-self.min_val)
        return self
    
    def fit_transform(self,x):
        self.fit(x)
        return self.min+(x-self.min_val)*(self.range)/self.scale_
    
    def transform(self,x):
        return self.min+(x-self.min_val)*(self.range)/self.scale_




# MaxAbsScaler

In [11]:
class MaxAbsScaler:
    def __init__(self):
        self.Absmax=None
    
    def fit(self,x):
        self.Absmax=np.max(np.abs(x),axis=0)
        self.scale_=np.where(self.Absmax==0,1,self.Absmax)
        return self
    
    def fit_transform(self,x):
        self.fit(x)
        return x/self.scale_
    
    def transform(self,x):
        return x/self.scale_
    


# MeanScaler

In [None]:
class MeanScaler:
    def __init__(self):
        self.mean=None
    
    def fit(self,X_train):
        self.mean=X_train.mean(axis=0)   
        return self
    
    def fit_transform(self,X_train):
        self.fit(X_train)
        return X_train-self.mean
    
    def transform(self,X_test):
        return X_test-self.mean




# Testing StandardScaler


In [4]:
from sklearn.preprocessing import StandardScaler as SklearnScaler



# Sample data
X_train = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]], dtype=float)

X_test = np.array([[2, 3, 4],
                   [5, 6, 7]], dtype=float)

# Using custom scaler
custom_scaler = StandardScaler()
X_train_custom = custom_scaler.fit_transform(X_train)
X_test_custom = custom_scaler.transform(X_test)

# Using sklearn scaler
sklearn_scaler = SklearnScaler()
X_train_sklearn = sklearn_scaler.fit_transform(X_train)
X_test_sklearn = sklearn_scaler.transform(X_test)

# Compare results
print("Custom Scaler - Train:\n", X_train_custom)
print("Sklearn Scaler - Train:\n", X_train_sklearn)
print("\nCustom Scaler - Test:\n", X_test_custom)
print("Sklearn Scaler - Test:\n", X_test_sklearn)


Custom Scaler - Train:
 [[-1.22474487 -1.22474487 -1.22474487]
 [ 0.          0.          0.        ]
 [ 1.22474487  1.22474487  1.22474487]]
Sklearn Scaler - Train:
 [[-1.22474487 -1.22474487 -1.22474487]
 [ 0.          0.          0.        ]
 [ 1.22474487  1.22474487  1.22474487]]

Custom Scaler - Test:
 [[-0.81649658 -0.81649658 -0.81649658]
 [ 0.40824829  0.40824829  0.40824829]]
Sklearn Scaler - Test:
 [[-0.81649658 -0.81649658 -0.81649658]
 [ 0.40824829  0.40824829  0.40824829]]


In [5]:
custom_scaler.mean

array([4., 5., 6.])

In [6]:
custom_scaler.std

array([2.44948974, 2.44948974, 2.44948974])

In [7]:
sklearn_scaler.mean_

array([4., 5., 6.])

In [8]:
sklearn_scaler.scale_

array([2.44948974, 2.44948974, 2.44948974])

In [9]:
sklearn_scaler.var_

array([6., 6., 6.])

# Testing MinMaxScaler

In [10]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler as SklearnMinMaxScaler



# Sample data
X_train = np.array([[1, 2],
                    [3, 4],
                    [5, 6]], dtype=float)

X_test = np.array([[2, 3],
                   [4, 5]], dtype=float)

# Custom scaler
custom_scaler = MinMaxScaler(min=0, max=1)
X_train_custom = custom_scaler.fit_transform(X_train)
X_test_custom = custom_scaler.transform(X_test)

# Sklearn scaler
sklearn_scaler = SklearnMinMaxScaler(feature_range=(0, 1))
X_train_sklearn = sklearn_scaler.fit_transform(X_train)
X_test_sklearn = sklearn_scaler.transform(X_test)

# Compare results
print("Custom Scaler - Train:\n", X_train_custom)
print("Sklearn Scaler - Train:\n", X_train_sklearn)
print("\nCustom Scaler - Test:\n", X_test_custom)
print("Sklearn Scaler - Test:\n", X_test_sklearn)


print("\nTrain arrays equal:", np.allclose(X_train_custom, X_train_sklearn))
print("Test arrays equal:", np.allclose(X_test_custom, X_test_sklearn))


Custom Scaler - Train:
 [[0.  0. ]
 [0.5 0.5]
 [1.  1. ]]
Sklearn Scaler - Train:
 [[0.  0. ]
 [0.5 0.5]
 [1.  1. ]]

Custom Scaler - Test:
 [[0.25 0.25]
 [0.75 0.75]]
Sklearn Scaler - Test:
 [[0.25 0.25]
 [0.75 0.75]]

Train arrays equal: True
Test arrays equal: True


# Testing MaxAbsScaler

In [12]:
import numpy as np
from sklearn.preprocessing import MaxAbsScaler as SklearnMaxAbsScaler


# Sample data
X_train = np.array([[1, -2, 3],
                    [-4, 5, -6],
                    [7, -8, 9]], dtype=float)

X_test = np.array([[2, -3, 4],
                   [-5, 6, -7]], dtype=float)

# Custom scaler
custom_scaler = MaxAbsScaler()
X_train_custom = custom_scaler.fit_transform(X_train)
X_test_custom = custom_scaler.transform(X_test)

# Sklearn scaler
sklearn_scaler = SklearnMaxAbsScaler()
X_train_sklearn = sklearn_scaler.fit_transform(X_train)
X_test_sklearn = sklearn_scaler.transform(X_test)

# Compare results
print("Custom Scaler - Train:\n", X_train_custom)
print("Sklearn Scaler - Train:\n", X_train_sklearn)
print("\nCustom Scaler - Test:\n", X_test_custom)
print("Sklearn Scaler - Test:\n", X_test_sklearn)

# Check if arrays are almost equal
print("\nTrain arrays equal:", np.allclose(X_train_custom, X_train_sklearn))
print("Test arrays equal:", np.allclose(X_test_custom, X_test_sklearn))


Custom Scaler - Train:
 [[ 0.14285714 -0.25        0.33333333]
 [-0.57142857  0.625      -0.66666667]
 [ 1.         -1.          1.        ]]
Sklearn Scaler - Train:
 [[ 0.14285714 -0.25        0.33333333]
 [-0.57142857  0.625      -0.66666667]
 [ 1.         -1.          1.        ]]

Custom Scaler - Test:
 [[ 0.28571429 -0.375       0.44444444]
 [-0.71428571  0.75       -0.77777778]]
Sklearn Scaler - Test:
 [[ 0.28571429 -0.375       0.44444444]
 [-0.71428571  0.75       -0.77777778]]

Train arrays equal: True
Test arrays equal: True
