# Scales of Measurement

Nominal = Categories that do not have a natural order. Ex: blood type, zip code, race

Ordinal = Categories with a meaningful order, but the gaps between them are not known to be equal. Ex: satisfaction scores, happiness level from 1 to 10

Interval = There is order and the difference between two values is meaningful, but there is no true zero, so ratios are not meaningful (20 °C is not "twice as hot" as 10 °C). Ex: Temperature (Celsius and Fahrenheit), credit scores, pH

Ratio = The same as interval except it has a true zero, meaning none of the quantity is present, so ratios are meaningful (4 kg is twice 2 kg). Values are typically non-negative. Ex: concentration, Kelvin, weight
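The four scales can be sketched in pandas as follows. This is a minimal illustration, not a definitive mapping; the example values (`blood_type`, `satisfaction`, `celsius`) are made up for the demonstration.

```python
import pandas as pd

# Nominal: unordered categories; only equality checks are meaningful.
blood_type = pd.Categorical(['A', 'O', 'AB', 'O'],
                            categories=['A', 'B', 'AB', 'O'])

# Ordinal: ordered categories; comparisons work, arithmetic does not.
satisfaction = pd.Categorical(['low', 'high', 'medium'],
                              categories=['low', 'medium', 'high'],
                              ordered=True)

print(blood_type.ordered)    # False
print(satisfaction.ordered)  # True
print(satisfaction.max())    # high

# Interval: differences are meaningful, ratios are not.
celsius = pd.Series([10.0, 20.0])
difference = celsius.iloc[1] - celsius.iloc[0]  # 10 degrees warmer

# Ratio: Kelvin has a true zero, so a ratio is meaningful.
kelvin = celsius + 273.15
ratio = kelvin.iloc[1] / kelvin.iloc[0]  # about 1.035, nowhere near 2
print(difference, ratio)
```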

# Statistics in Python

Central Tendency - represents the center point or "typical" value of a dataset.

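The mean is pulled toward outliers and toward the long tail of a skewed distribution, while the median is not. As a result, our rule of thumb is to replace null values with the mean when the data is normally distributed and replace null values with the median when the data is skewed.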
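A minimal sketch of that rule of thumb, assuming a hypothetical `prices` Series that is right-skewed by a single large value:

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed data with one missing value.
prices = pd.Series([7, 20, 25, 30, 500, np.nan])

mean_value = prices.mean()      # pulled upward by the 500 outlier
median_value = prices.median()  # unaffected by how large the outlier is

# Fill the missing value both ways to compare the results.
filled_with_mean = prices.fillna(mean_value)
filled_with_median = prices.fillna(median_value)

print(mean_value, median_value)  # 116.4 25.0
print(prices.skew() > 0)         # True: positive skew, so prefer the median
```

Here the mean-imputed value (116.4) is larger than four of the five observed prices, which is why the median is the safer fill for skewed data.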

In [2]:
import pandas as pd

# Build a small example DataFrame, naming the columns up front
df_clothes = pd.DataFrame(
    data=[['blue', 'S', 20, 'lemon shirt'],
          ['blue', 'M', 50, 'jeans'],
          ['beige', 'M', 7, 'bucket hat']],
    columns=['color', 'size', 'price', 'type'],
)
df_clothes

|  | color | size | price | type |
|---|---|---|---|---|
| 0 | blue | S | 20 | lemon shirt |
| 1 | blue | M | 50 | jeans |
| 2 | beige | M | 7 | bucket hat |


In [4]:
# Categorical values can be converted to numbers.
# Size is ordinal (S < M < L), so an order-preserving integer mapping is a natural encoding.

size_map = {'S': 1, 'M': 2, 'L': 3}
df_clothes['size'] = df_clothes['size'].map(size_map)
df_clothes

|  | color | size | price | type |
|---|---|---|---|---|
| 0 | blue | 1 | 20 | lemon shirt |
| 1 | blue | 2 | 50 | jeans |
| 2 | beige | 2 | 7 | bucket hat |
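One caveat with a dictionary mapping is that any value missing from the dictionary silently becomes `NaN`. The sketch below shows this, along with an ordered categorical as an alternative encoding. The `'XL'` value is a made-up example that is deliberately left out of the mapping.

```python
import pandas as pd

sizes = pd.Series(['S', 'M', 'XL'])  # 'XL' is not in the mapping below
size_map = {'S': 1, 'M': 2, 'L': 3}

# Series.map returns NaN for any key missing from the dictionary.
encoded = sizes.map(size_map)
print(encoded.isna().sum())  # 1

# Alternative: an ordered categorical keeps the labels and the order together.
size_dtype = pd.CategoricalDtype(categories=['S', 'M', 'L'], ordered=True)
codes = sizes.astype(size_dtype).cat.codes  # -1 marks values outside the categories
print(codes.tolist())  # [0, 1, -1]
```

Checking `encoded.isna()` after mapping is a cheap way to catch categories the mapping did not anticipate.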


In [6]:
%pip install scikit-learn

Note: you may need to restart the kernel to use updated packages.


In [10]:
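# LabelEncoder turns each distinct nominal value into an integer code.
# The codes follow the sorted order of the labels and carry no meaning of their own.
# Note: scikit-learn documents LabelEncoder as a tool for target labels (y);
# when the codes are used as features, some models may treat them as ordered.
from sklearn.preprocessing import LabelEncoder

class_labels = LabelEncoder()

df_clothes['types'] = class_labels.fit_transform(df_clothes['type'])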
df_clothes

|  | color | size | price | type | types |
|---|---|---|---|---|---|
| 0 | blue | 1 | 20 | lemon shirt | 2 |
| 1 | blue | 2 | 50 | jeans | 1 |
| 2 | beige | 2 | 7 | bucket hat | 0 |
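To see where the codes in the `types` column come from, the fitted encoder can be inspected through `classes_` and reversed with `inverse_transform`. This is a small standalone sketch that reuses the three clothing labels from the table; the variable names are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

types = ['lemon shirt', 'jeans', 'bucket hat']

encoder = LabelEncoder()
codes = encoder.fit_transform(types)

# classes_ holds the labels in sorted order; each code is an index into it.
classes = list(encoder.classes_)
print(classes)         # ['bucket hat', 'jeans', 'lemon shirt']
print(codes.tolist())  # [2, 1, 0]

# inverse_transform maps the integer codes back to the original labels.
decoded = list(encoder.inverse_transform(codes))
print(decoded == types)  # True
```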


In [11]:
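# One-hot encoding: pd.get_dummies is the easiest way.
# By default it only encodes object, string, and category columns;
# columns that are already numeric (size and price here) pass through unchanged.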

df_clothes = pd.get_dummies(df_clothes[['color', 'size', 'price']])
df_clothes

|  | size | price | color_beige | color_blue |
|---|---|---|---|---|
| 0 | 1 | 20 | False | True |
| 1 | 2 | 50 | False | True |
| 2 | 2 | 7 | True | False |
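If a numeric column such as the encoded `size` should also be one-hot encoded, one option is to name it explicitly through the `columns` parameter of `pd.get_dummies`. The sketch below rebuilds a small frame with the same values as above; `df`, `default`, and `explicit` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['blue', 'blue', 'beige'],
    'size': [1, 2, 2],
    'price': [20, 50, 7],
})

# Default behaviour: only the non-numeric 'color' column is expanded.
default = pd.get_dummies(df)
print(list(default.columns))
# ['size', 'price', 'color_beige', 'color_blue']

# Naming the columns forces one-hot encoding of the numeric 'size' as well.
explicit = pd.get_dummies(df, columns=['color', 'size'])
print(list(explicit.columns))
# ['price', 'color_beige', 'color_blue', 'size_1', 'size_2']
```

One-hot encoding `size` discards its S < M < L order, so this is mainly useful when a numeric column is really a nominal code.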
