# Variance

Variance is a statistical measure of how far the data points in a dataset spread around their mean. The larger the variance, the farther the individual points lie, on average, from the mean. For $n$ observations $x_1, \dots, x_n$ with mean $\bar{x}$, the population variance is:

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
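
As a quick check of the formula, the steps can be worked through by hand on a small dataset. The sketch below uses made-up values (`values` is purely illustrative and not part of this notebook's data) and plain Python so that each term of the formula is visible.

```python
# Illustrative values only; they are not taken from the notebook's dataset.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)                                    # number of observations
mean = sum(values) / n                             # x-bar = 40 / 8 = 5.0
squared_devs = [(x - mean) ** 2 for x in values]   # (x_i - x-bar)^2 for each point
population_variance = sum(squared_devs) / n        # 32 / 8 = 4.0

print(population_variance)
```

The squared deviations are 9, 1, 1, 1, 0, 0, 4 and 16, which sum to 32, so dividing by $n = 8$ gives a variance of 4.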

# Dataset

In [1]:
# Importing module
import numpy as np

# Creating synthetic data: 1000 samples from a normal distribution (mean 0, std 4), rounded to integers
data = np.random.normal(0, 4, 1000).round(0).astype(int)

# Calculate Variance using User Defined Function

In [2]:
# User defined function for calculating variance
def variance(data):
    n = len(data) # No of observations
    mean = np.mean(data) # Mean of all the observations
    diffs = [(i-mean)**2 for i in data] # Squared difference between each observation and the mean
    sum_diffs = sum(diffs) # Sum of the squared differences
    var = sum_diffs/n # Variance
    return var
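
The same calculation can also be written with array operations instead of a Python loop. This is a minimal sketch, not a replacement for the function above; the name `variance_vectorized` is hypothetical and chosen only for illustration.

```python
import numpy as np

def variance_vectorized(values):
    """Population variance (divides by n), computed with NumPy array operations."""
    arr = np.asarray(values, dtype=float)   # accept lists or arrays
    deviations = arr - arr.mean()           # x_i - x-bar for every element
    return np.mean(deviations ** 2)         # mean of the squared deviations

# Illustrative input, not the notebook's dataset
print(variance_vectorized([2, 4, 4, 4, 5, 5, 7, 9]))
```

On large arrays this form is usually faster than a list comprehension, because the subtraction and squaring run inside NumPy rather than element by element in Python.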

In [3]:
# Calculate variance
var = variance(data)

var

16.689483999999975

# Calculate Variance using Numpy's Var Function

In [4]:
# Importing numpy
import numpy as np

# Calculate variance
var = np.var(data)

var

16.689483999999997
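
The two results agree, apart from floating-point rounding in the last digits, because `np.var` divides by $n$ by default (`ddof=0`), which matches the formula at the top of this notebook. If the data are a sample and an estimate of the population variance is wanted, the divisor $n - 1$ is commonly used instead; `np.var` exposes this through its `ddof` parameter. The sketch below uses a small made-up array (`sample` is illustrative only).

```python
import numpy as np

# Illustrative values only; they are not taken from the notebook's dataset.
sample = np.array([2, 4, 4, 4, 5, 5, 7, 9])

population_var = np.var(sample)        # ddof=0: sum of squared deviations / n
sample_var = np.var(sample, ddof=1)    # ddof=1: sum of squared deviations / (n - 1)

print(population_var, sample_var)
```

For this array the sum of squared deviations is 32, so the two divisors give 32/8 and 32/7 respectively. The difference shrinks as $n$ grows.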

# Comparing Data using Variance

In [5]:
# Temp data1
temp_data1 = [100, 200, 300, 400]

# Calculate variance for temp_data1
temp_var1 = np.var(temp_data1)

temp_var1

12500.0

In [6]:
# Temp data2
temp_data2 = [900, 1200, 1500, 1800]

# Calculate variance for temp_data2
temp_var2 = np.var(temp_data2)

temp_var2

112500.0

The variance of `temp_data2` is nine times that of `temp_data1`, so its values are more widely spread around their mean. The factor of nine follows from the spacing between values: it is three times larger (300 versus 100), and variance grows with the square of the spread.

# Limitations
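- Less intuitive: variance is expressed in the squared units of the data, so it cannot be compared directly with the original values
- Sensitive to outliers: deviations are squared, so a single extreme value can noticeably inflate the variance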
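
One reason variance can feel less intuitive is that it is expressed in squared units. The sketch below illustrates this with made-up height measurements (`heights_m` is hypothetical data): converting from metres to centimetres multiplies every value by 100, but multiplies the variance by 100 squared.

```python
import numpy as np

# Hypothetical heights in metres, used only for illustration
heights_m = np.array([1.50, 1.60, 1.70, 1.80])
heights_cm = heights_m * 100          # the same heights in centimetres

var_m = np.var(heights_m)             # variance in square metres
var_cm = np.var(heights_cm)           # variance in square centimetres
ratio = var_cm / var_m                # approximately 100 ** 2

print(var_m, var_cm, ratio)
```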

In [7]:
# New data: the original sample with a single outlier (50) appended
new_data = np.append(data, 50)

var = np.var(new_data)

var

19.143328200271256

# Standard Deviation

Standard deviation is the square root of the variance. It is often preferred in practice because it is expressed in the same units as the data, which makes it easier to interpret.

$$ \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 } $$

or, equivalently,

$$ \sigma = \sqrt{\sigma^2} $$

In [8]:
# Importing module
import numpy as np
from math import sqrt

# Calculating variance
var = np.var(data)

# Calculating std
std = sqrt(var)

std

4.085276490030998
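
NumPy also provides `np.std`, which computes the standard deviation in one call and, like `np.var`, divides by $n$ by default. The sketch below checks on a small made-up array (`values` is illustrative only) that it agrees with taking the square root of the variance.

```python
import numpy as np
from math import sqrt, isclose

# Illustrative values only; they are not taken from the notebook's dataset.
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

std_from_var = sqrt(np.var(values))   # square root of the population variance
std_direct = np.std(values)           # standard deviation computed directly

print(std_from_var, std_direct, isclose(std_from_var, std_direct))
```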