# Data Transformation

Data transformation is the process of converting data into a form, typically a specific numeric range, that machine learning models can use efficiently and accurately.

<b>Scaling</b><br>

Changing the range of the data without altering the shape of its distribution.<br>
Scaling is especially important for algorithms that are sensitive to the scale of the data, such as support vector machines (SVMs) or k-nearest neighbors (KNN).
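A minimal sketch of why distance-based models such as KNN care about scale. The data is hypothetical (column 0 is an age in years, column 1 an income in dollars) and the helper `distances_from_first` is introduced only for this illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column 0 = age in years, column 1 = income in dollars.
X = np.array([
    [25.0, 50_000.0],
    [60.0, 51_000.0],
    [26.0, 53_000.0],
    [45.0, 90_000.0],
])

def distances_from_first(points):
    """Euclidean distance from row 0 to each of the remaining rows."""
    return np.linalg.norm(points[1:] - points[0], axis=1)

raw_dist = distances_from_first(X)
scaled_dist = distances_from_first(StandardScaler().fit_transform(X))

# On the raw data the income column dominates the distance, so the
# 60-year-old (row 1) looks closest to row 0. After scaling, both columns
# contribute comparably and the 26-year-old (row 2) becomes the nearest.
print("nearest on raw data:   row", np.argmin(raw_dist) + 1)
print("nearest on scaled data: row", np.argmin(scaled_dist) + 1)
```

The nearest neighbor of the first row changes once both features are on a comparable scale, which is the effect scaling is meant to control.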

In [18]:
# importing necessary libraries
from sklearn.preprocessing import StandardScaler, MinMaxScaler, MaxAbsScaler, RobustScaler
import random

<b>Types of scaling:</b><br>

i. Standardization (StandardScaler) : Rescaling each feature (column) to zero mean and unit variance. The result has no fixed range, although most values typically fall between -3 and 3.

In [19]:
# trying standard scaler 
standard_scaler = StandardScaler()
# making a random sample data
data = [[random.randint(10,100) for _ in range(10)] for _ in range(10)] 
data

[[34, 49, 53, 31, 27, 97, 68, 86, 66, 96],
 [64, 83, 15, 85, 34, 54, 96, 24, 22, 87],
 [30, 45, 61, 30, 25, 70, 98, 93, 25, 43],
 [12, 14, 62, 47, 70, 13, 68, 16, 82, 27],
 [30, 11, 47, 69, 30, 47, 13, 58, 94, 98],
 [31, 31, 47, 38, 30, 22, 88, 45, 70, 95],
 [76, 71, 27, 46, 22, 78, 36, 65, 87, 21],
 [95, 17, 94, 74, 25, 44, 81, 48, 96, 84],
 [19, 44, 37, 60, 24, 58, 64, 39, 54, 93],
 [13, 39, 71, 95, 57, 41, 10, 50, 28, 75]]

In [20]:
# applying standard scaler 
standard_scaled_data = standard_scaler.fit_transform(data) 
# conversion of numpy array to list for better readability
standard_scaled_data = standard_scaled_data.tolist() 
standard_scaled_data

[[-0.239072405544125,
  0.3823770927040871,
  0.07482519798050177,
  -1.2283138720661904,
  -0.48684210526315774,
  1.86900676517856,
  0.18972492906155766,
  1.4515156152682522,
  0.13169595770948875,
  0.8485231702739544],
 [0.8815794954439612,
  1.8941004824644312,
  -1.7022732540564136,
  1.2746653389366127,
  -0.026315789473684115,
  0.06704957005124886,
  1.1056383797035605,
  -1.2268762938576891,
  -1.477921303184262,
  0.5316472975575398],
 [-0.38849265900920316,
  0.20452728214404664,
  0.44895118788301025,
  -1.2746653389366127,
  -0.6184210526315788,
  0.7375452705637368,
  1.1710607690351322,
  1.7539147017824714,
  -1.3681746717596879,
  -1.0175236357227093],
 [-1.060883799602055,
  -1.1738087496962673,
  0.49571693662082383,
  -0.48669040213943393,
  2.3421052631578947,
  -1.6510956625120015,
  0.18972492906155766,
  -1.5724752498739398,
  0.7170113253072162,
  -1.5808585205518908],
 [-0.38849265900920316,
  -1.3071961076162975,
  -0.20576929444637962,
  0.533041869009856

<b>Interpretation:</b> Each column now has mean 0 and standard deviation 1. The values are not bounded to a fixed range; in this sample they all fall between -3 and 3.
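As a rough check of what `StandardScaler` computes, the sketch below reproduces it by hand on a small array (the first four rows and first two columns of the sample data above) and compares the result with scikit-learn. It assumes the default settings, under which the scaler uses the population standard deviation (`ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First four rows and first two columns of the sample data above.
X = np.array([[34.0, 49.0], [64.0, 83.0], [30.0, 45.0], [12.0, 14.0]])

# z = (x - column mean) / column standard deviation (population std, ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual, scaled))  # manual formula agrees with the scaler
print(scaled.mean(axis=0))          # approximately 0 for every column
print(scaled.std(axis=0))           # approximately 1 for every column
```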

ii. MinMaxScaler : Scaling each feature (column) into a fixed range, by default between 0 and 1.

In [21]:
# applying MinMaxScaler 
minmax_scaler = MinMaxScaler() 
# applying MinMaxScaler to the same data
minmax_scaled_data = minmax_scaler.fit_transform(data) 
minmax_scaled_data = minmax_scaled_data.tolist() 
minmax_scaled_data

[[0.26506024096385544,
  0.5277777777777777,
  0.4810126582278481,
  0.015384615384615385,
  0.10416666666666669,
  0.9999999999999999,
  0.6590909090909091,
  0.9090909090909092,
  0.5945945945945945,
  0.974025974025974],
 [0.6265060240963856,
  0.9999999999999999,
  0.0,
  0.8461538461538461,
  0.24999999999999994,
  0.488095238095238,
  0.9772727272727272,
  0.10389610389610388,
  0.0,
  0.8571428571428572],
 [0.21686746987951805,
  0.4722222222222222,
  0.5822784810126582,
  0.0,
  0.062499999999999944,
  0.6785714285714285,
  1.0,
  1.0,
  0.040540540540540515,
  0.28571428571428575],
 [0.0,
  0.04166666666666666,
  0.5949367088607594,
  0.2615384615384616,
  1.0,
  0.0,
  0.6590909090909091,
  0.0,
  0.8108108108108107,
  0.07792207792207789],
 [0.21686746987951805,
  0.0,
  0.40506329113924044,
  0.6,
  0.16666666666666669,
  0.40476190476190477,
  0.03409090909090909,
  0.5454545454545454,
  0.972972972972973,
  1.0],
 [0.22891566265060243,
  0.2777777777777778,
  0.4050632911

<b>Interpretation:</b> In each column the minimum maps to 0 and the maximum to 1 (up to floating-point rounding, e.g. 0.9999999999999999), so all the data points lie between 0 and 1.
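The min-max formula can be sketched by hand and compared with `MinMaxScaler`. The array below reuses the first four rows and first two columns of the sample data, and the default `feature_range=(0, 1)` is assumed.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# First four rows and first two columns of the sample data above.
X = np.array([[34.0, 49.0], [64.0, 83.0], [30.0, 45.0], [12.0, 14.0]])

col_min = X.min(axis=0)
col_max = X.max(axis=0)

# x_scaled = (x - min) / (max - min), computed independently per column
manual = (X - col_min) / (col_max - col_min)
scaled = MinMaxScaler().fit_transform(X)

print(np.allclose(manual, scaled))
print(scaled.min(axis=0), scaled.max(axis=0))
```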

iii. MaxAbsScaler : Dividing each feature (column) by its maximum absolute value, which scales the data into the range between -1 and 1.

In [22]:
maxabs_scaler = MaxAbsScaler() 
# applying MaxAbsScaler to the same data 
maxabs_scaled_data = maxabs_scaler.fit_transform(data).tolist() 
maxabs_scaled_data

[[0.35789473684210527,
  0.5903614457831325,
  0.5638297872340425,
  0.3263157894736842,
  0.38571428571428573,
  1.0,
  0.6938775510204082,
  0.9247311827956989,
  0.6875,
  0.9795918367346939],
 [0.6736842105263158,
  1.0,
  0.1595744680851064,
  0.8947368421052632,
  0.4857142857142857,
  0.5567010309278351,
  0.9795918367346939,
  0.25806451612903225,
  0.22916666666666666,
  0.8877551020408163],
 [0.3157894736842105,
  0.5421686746987951,
  0.648936170212766,
  0.3157894736842105,
  0.35714285714285715,
  0.7216494845360825,
  1.0,
  1.0,
  0.2604166666666667,
  0.4387755102040816],
 [0.12631578947368421,
  0.1686746987951807,
  0.6595744680851063,
  0.49473684210526314,
  1.0,
  0.13402061855670103,
  0.6938775510204082,
  0.17204301075268819,
  0.8541666666666666,
  0.2755102040816326],
 [0.3157894736842105,
  0.13253012048192772,
  0.5,
  0.7263157894736842,
  0.42857142857142855,
  0.4845360824742268,
  0.1326530612244898,
  0.6236559139784946,
  0.9791666666666666,
  1.0],
 [

<b>Interpretation:</b> All the data points are converted into the range of -1 to 1. Because the sample data contains only positive values, the scaled values here lie between 0 and 1.
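Since the sample data has no negative values, the sketch below uses a small hypothetical array with mixed signs to show the full -1 to 1 range, and checks the manual formula against `MaxAbsScaler`.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical data with both positive and negative values.
X = np.array([[-4.0, 2.0], [2.0, -8.0], [1.0, 4.0]])

# x_scaled = x / max(|x|), computed per column; signs and zeros are preserved
manual = X / np.abs(X).max(axis=0)
scaled = MaxAbsScaler().fit_transform(X)

print(np.allclose(manual, scaled))
print(scaled)
```

Unlike min-max scaling, this does not shift the data, so a value of 0 stays 0 after scaling.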

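iv. Robust scaler (RobustScaler) : Subtracting the median and scaling the data according to a quantile range (by default the IQR, the interquartile range), which makes the scaling less sensitive to outliers. The result is not bounded to a fixed range.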
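A sketch of this definition computed by hand, using the fifth column of the sample data above as a single feature. It assumes the default `quantile_range=(25.0, 75.0)` and linear percentile interpolation, which matches `np.percentile`.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Fifth column of the sample data above, reshaped to one feature.
X = np.array([27, 34, 25, 70, 30, 30, 22, 25, 24, 57], dtype=float).reshape(-1, 1)

median = np.median(X, axis=0)
q25, q75 = np.percentile(X, [25, 75], axis=0)

# x_scaled = (x - median) / IQR, where IQR = 75th percentile - 25th percentile
manual = (X - median) / (q75 - q25)
scaled = RobustScaler().fit_transform(X)

print(np.allclose(manual, scaled))
print(scaled.ravel())
```

Here the median is 28.5 and the IQR is 8, so the value 70 maps to (70 - 28.5) / 8 = 5.1875: the outlier stays far from the rest of the data instead of compressing it.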

In [23]:
robust_scaler = RobustScaler() 
# applying RobustScaler to the same data
robust_scaled_data = robust_scaler.fit_transform(data).tolist() 
robust_scaled_data

[[0.10071942446043165,
  0.2727272727272727,
  0.1348314606741573,
  -0.6870229007633588,
  -0.1875,
  1.8415841584158417,
  0.0,
  1.6263736263736264,
  -0.03902439024390244,
  0.2413793103448276],
 [0.9640287769784173,
  1.509090909090909,
  -1.5730337078651686,
  0.9618320610687023,
  0.6875,
  0.13861386138613863,
  0.6473988439306358,
  -1.098901098901099,
  -0.8975609756097561,
  0.034482758620689655],
 [-0.014388489208633094,
  0.12727272727272726,
  0.4943820224719101,
  -0.7175572519083969,
  -0.4375,
  0.7722772277227723,
  0.6936416184971098,
  1.934065934065934,
  -0.8390243902439024,
  -0.9770114942528736],
 [-0.5323741007194245,
  -1.0,
  0.5393258426966292,
  -0.1984732824427481,
  5.1875,
  -1.4851485148514851,
  0.0,
  -1.4505494505494505,
  0.2731707317073171,
  -1.3448275862068966],
 [-0.014388489208633094,
  -1.1090909090909091,
  -0.1348314606741573,
  0.4732824427480916,
  0.1875,
  -0.13861386138613863,
  -1.2716763005780347,
  0.3956043956043956,
  0.50731707317