In statistics, standardization is the process of putting different variables on the same scale so that scores from different types of variables can be compared. To standardize a variable, you calculate its mean and standard deviation, then subtract the mean from each observed value and divide the result by the standard deviation. The resulting standard scores (also called z-scores) give the number of standard deviations by which an observation falls above or below the mean. For instance, a standardized value of 2 indicates that the observation falls 2 standard deviations above the mean. This interpretation holds regardless of the variable's original units.
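
As a quick illustration of the arithmetic, here is a minimal sketch on three made-up values. The array `values` is hypothetical and chosen only so the result is easy to check by hand. It assumes the sample standard deviation (`ddof=1`), which matches the pandas default.

```python
import numpy as np

# Hypothetical toy data, chosen so the result is easy to verify by hand.
values = np.array([10.0, 20.0, 30.0])

mean = values.mean()       # 20.0
std = values.std(ddof=1)   # sample standard deviation: 10.0

# Subtract the mean, then divide by the standard deviation.
z_scores = (values - mean) / std
print(z_scores)
```

The three values map to z-scores of -1, 0 and 1: the first lies one standard deviation below the mean, the second sits on the mean, and the third lies one standard deviation above it.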

In [3]:
import numpy as np
import pandas as pd

In [4]:
np.random.seed(100)
df = pd.DataFrame()
df['income'] = np.random.normal(50000, scale=10000, size=100)
df['age'] = np.random.normal(40, scale=10, size=100)
df = df.astype(int)

In [6]:
df.head()

   income  age
0   32502   22
1   53426   28
2   61530   10
3   47475   40
4   59813   37


In [8]:
df.std()  # standard deviation of each column (pandas uses the sample standard deviation, ddof=1, by default)

income    9746.405471
age         10.646624
dtype: float64

In [10]:
df.mean()  # mean of each column, computed down the rows (axis=0)

income    48957.85
age          38.77
dtype: float64

In [12]:
# Standardize each variable: subtract its mean, then divide by its standard deviation

df['z_income'] = (df['income'] - df['income'].mean())/df['income'].std()
df['z_age'] = (df['age'] - df['age'].mean())/df['age'].std()
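
The same calculation can be applied to every column at once, because pandas aligns the results of `mean()` and `std()` with the columns by label. The sketch below assumes every column is numeric. The helper name `standardize` and the variable `raw` are hypothetical, and the data is regenerated so that the block runs on its own.

```python
import numpy as np
import pandas as pd

def standardize(frame: pd.DataFrame) -> pd.DataFrame:
    """Return z-scores for every column (assumes all columns are numeric)."""
    # The per-column mean and std are broadcast across the rows by column label.
    return (frame - frame.mean()) / frame.std()

# Rebuild the same kind of data as in the cells above.
np.random.seed(100)
raw = pd.DataFrame({
    'income': np.random.normal(50000, scale=10000, size=100),
    'age': np.random.normal(40, scale=10, size=100),
}).astype(int)

# Prefix the column names so the standardized columns are easy to tell apart.
z = standardize(raw).add_prefix('z_')
print(z.head())
```

Note that a constant column has a standard deviation of 0, so this sketch would return NaN for it; handling that case is left out here.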

In [13]:
df.head()

   income  age  z_income     z_age
0   32502   22 -1.688402 -1.575147
1   53426   28  0.458441 -1.011588
2   61530   10  1.289927 -2.702265
3   47475   40 -0.152143  0.115530
4   59813   37  1.113759 -0.166250


In [14]:
df.std()

income      9746.405471
age           10.646624
z_income       1.000000
z_age          1.000000
dtype: float64

In [15]:
df.mean()

income      4.895785e+04
age         3.877000e+01
z_income    1.376677e-16
z_age      -3.219647e-16
dtype: float64