# Measures of Variability

Measures of dispersion (variability) quantify the spread of a distribution. They indicate whether the values in a distribution cluster around the central value or are widely spread out. The following are the commonly used measures of dispersion, together with skewness and kurtosis, which describe the shape of a distribution.

**Range:** The range is the difference between the highest and lowest values in a dataset. Because it depends only on these two extreme values, it is very sensitive to outliers.

**Interquartile range:** The difference between the third quartile ($Q_3$) and the first quartile ($Q_1$), that is, the spread of the middle 50% of the values: $IQR = Q_3 - Q_1$. Because it ignores the lowest and highest 25% of the values, it is less affected by extreme values, which makes it a good measure of spread for skewed distributions and data with outliers.
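
As a quick check of the definitions, here is a minimal sketch that computes the quartiles, the IQR, and the range with pandas (assumed because the notebook cells later in this section use it), on the same 15 values those cells use. `Series.quantile` interpolates linearly by default; other quartile conventions can give slightly different $Q_1$ and $Q_3$.

```python
import pandas as pd

# Same 15 values as the notebook cells later in this section
data = pd.Series([19, 23, 19, 18, 25, 16, 17, 19, 15, 23, 21, 23, 21, 11, 6])

# Quartiles with pandas' default linear interpolation (as used by describe())
q1 = data.quantile(0.25)
q3 = data.quantile(0.75)
iqr = q3 - q1

# The range, by contrast, depends only on the two most extreme values
value_range = data.max() - data.min()

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, range = {value_range}")
```

With these values $Q_1 = 16.5$ and $Q_3 = 22.0$, so the IQR is 5.5, while the range is 19.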

**Variance:** The variance measures how widely the values in a dataset are scattered around the mean, based on the squared deviations from the mean. It also indicates whether the mean is representative of the values in the dataset: a small variance means the values lie close to the mean, so the mean is an appropriate measure of central tendency. For a population, the variance is:

    
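$$\sigma^2 = \frac{\sum (x-\mu)^2}{N}$$

where $\mu$ is the population mean and $N$ is the number of values in the dataset. When the data are a sample, the sample variance divides by $n - 1$ instead, $s^2 = \frac{\sum (x-\bar{x})^2}{n-1}$, where $\bar{x}$ is the sample mean and $n$ is the sample size; this is what pandas' `var()` and `std()` compute by default.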
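
The formula can be evaluated directly. This minimal sketch (pandas assumed, same data as the notebook cells) computes the sum of squared deviations once and divides it by $N$ and by $N-1$, comparing each with `Series.var`, whose `ddof` parameter selects the divisor (`ddof=1` by default).

```python
import pandas as pd

data = pd.Series([19, 23, 19, 18, 25, 16, 17, 19, 15, 23, 21, 23, 21, 11, 6])

n = len(data)
mu = data.sum() / n                      # the mean
squared_dev = ((data - mu) ** 2).sum()   # sum of (x - mu)^2

pop_var = squared_dev / n                # divide by N (population variance)
sample_var = squared_dev / (n - 1)       # divide by N - 1 (sample variance)

print(pop_var, data.var(ddof=0))         # population variance, computed two ways
print(sample_var, data.var())            # pandas var() defaults to ddof=1
```

For this data the sum of squared deviations is 349.6, giving about 23.31 with $N$ and about 24.97 with $N-1$; the latter matches the `data.var()` output later in the section.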

**Standard deviation:** The standard deviation is the square root of the variance, $\sigma = \sqrt{\sigma^2}$. Because the variance squares each deviation, it is expressed in squared units; taking the square root brings the measure back to the units of the data. For instance, in a dataset of rainfall measured in centimeters, the variance is in $cm^2$, which is hard to interpret, while the standard deviation is in $cm$ and describes the typical deviation of the rainfall from its mean.
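
A short sketch of both points, assuming pandas and the same data as the notebook cells: the standard deviation is the square root of the variance, and a change of units scales the standard deviation by the conversion factor but the variance by its square. Multiplying by 10 stands in for a hypothetical unit conversion (for example, cm to mm) and is not part of the original data.

```python
import math

import pandas as pd

data = pd.Series([19, 23, 19, 18, 25, 16, 17, 19, 15, 23, 21, 23, 21, 11, 6])

variance = data.var()               # sample variance (ddof=1), in squared units
std_from_var = math.sqrt(variance)  # back in the units of the data
print(std_from_var, data.std())     # the two values agree

# Hypothetical unit change: multiply every value by 10 (e.g. cm -> mm)
data_mm = data * 10
print(data_mm.std() / data.std())   # std scales by the factor itself
print(data_mm.var() / data.var())   # variance scales by the factor squared
```

The two ratios are 10 and 100, up to floating-point rounding.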

**Skewness:** Skewness measures the degree of asymmetry of a distribution. The figure below shows how the mean and median relate under different skewness.

![](https://upload.wikimedia.org/wikipedia/commons/c/cc/Relationship_between_mean_and_median_under_different_skewness.png)

**Positive Skewness:** A positively skewed distribution has a long upper (right) tail: a small number of unusually large values stretch the distribution to the right and typically pull the mean above the median. Such a distribution is said to be skewed right.

**Negative Skewness:** A negatively skewed distribution has a long lower (left) tail: a small number of unusually small values stretch the distribution to the left and typically pull the mean below the median. Such a distribution is said to be skewed left.
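
Skewness is usually quantified by the Fisher-Pearson coefficient $g_1 = m_3 / m_2^{3/2}$, where $m_k$ is the $k$-th central moment (the mean of $(x-\bar{x})^k$). A minimal sketch that computes it by hand and compares it with `scipy.stats.skew` under its default `bias=True`, using the same data as the notebook cells:

```python
import pandas as pd
from scipy.stats import skew

data = pd.Series([19, 23, 19, 18, 25, 16, 17, 19, 15, 23, 21, 23, 21, 11, 6])

dev = data - data.mean()
m2 = (dev ** 2).mean()   # second central moment (population variance)
m3 = (dev ** 3).mean()   # third central moment
g1 = m3 / m2 ** 1.5      # Fisher-Pearson coefficient of skewness

print(g1, skew(data))               # hand-computed vs SciPy (bias=True)
print(data.mean(), data.median())   # mean vs median
```

For this data $g_1 \approx -1.04$, and the mean (18.4) lies below the median (19.0), matching the left-skewed pattern described above.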

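**Kurtosis:** Kurtosis describes how heavy the tails of a distribution are relative to a normal distribution, which is often pictured as how peaked or flat the distribution looks. A distribution with heavier tails and a sharper peak than the normal is called leptokurtic, one with lighter tails and a flatter peak is platykurtic, and the normal distribution itself is mesokurtic.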

![](https://brewcode.stringlab.org/wp-content/uploads/2020/08/image-28.png)
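
A common moment-based definition is $m_4 / m_2^2$, which equals 3 for a normal distribution; subtracting 3 gives the excess kurtosis, which is 0 for a normal distribution. `scipy.stats.kurtosis` returns the excess version by default (`fisher=True`) and the plain ratio with `fisher=False`. A minimal sketch checking both against a hand computation on the notebook's data:

```python
import pandas as pd
from scipy.stats import kurtosis

data = pd.Series([19, 23, 19, 18, 25, 16, 17, 19, 15, 23, 21, 23, 21, 11, 6])

dev = data - data.mean()
m2 = (dev ** 2).mean()           # second central moment
m4 = (dev ** 4).mean()           # fourth central moment

pearson_kurt = m4 / m2 ** 2      # equals 3 for a normal distribution
excess_kurt = pearson_kurt - 3   # equals 0 for a normal distribution

print(excess_kurt, kurtosis(data))                  # fisher=True (default)
print(pearson_kurt, kurtosis(data, fisher=False))   # plain ratio
```

Here the excess kurtosis is about 0.70, a positive value, so by this measure the data have somewhat heavier tails than a normal distribution.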

```python
import pandas as pd
data = pd.Series([19,23,19,18,25,16,17,19,15,23,21,23,21,11,6])
```

```python
data.describe()
```

```
count    15.000000
mean     18.400000
std       4.997142
min       6.000000
25%      16.500000
50%      19.000000
75%      22.000000
max      25.000000
dtype: float64
```

```python
data.mode()
```

```
0    19
1    23
dtype: int64
```

The values 19 and 23 each occur three times, more often than any other value, so the data have two modes.
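
`data.mode()` returns every value tied for the highest count. The counts behind it can be inspected with `value_counts()`; a minimal sketch on the same data:

```python
import pandas as pd

data = pd.Series([19, 23, 19, 18, 25, 16, 17, 19, 15, 23, 21, 23, 21, 11, 6])

counts = data.value_counts()        # frequency of each distinct value
max_count = counts.max()            # highest frequency
modes = sorted(counts[counts == max_count].index)  # all values tied for it

print(counts.head())
print(max_count, modes)
```

Both 19 and 23 reach the highest count of 3, which is why `mode()` returns two values.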

```python
data.median()
```

```
19.0
```

```python
# Range: highest value minus lowest value
range_data = data.max() - data.min()
range_data
```

```
19
```

```python
# Sample standard deviation (pandas divides by N - 1 by default)
data.std()
```

```
4.99714204034952
```

```python
# Sample variance (ddof=1 by default); equals data.std() squared
data.var()
```

```
24.97142857142857
```

```python
from scipy.stats import skew, kurtosis

# SciPy defaults: biased estimators (bias=True) and excess kurtosis (fisher=True)
skew(data), kurtosis(data)
```

```
(-1.038344732097918, 0.6995494033062934)
```
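
SciPy's `skew` and `kurtosis` default to biased estimators, while pandas' `Series.skew()` and `Series.kurt()` apply small-sample bias corrections, so the two libraries report different numbers for the same data. A minimal sketch showing that passing `bias=False` to SciPy should reproduce the pandas values:

```python
import pandas as pd
from scipy.stats import kurtosis, skew

data = pd.Series([19, 23, 19, 18, 25, 16, 17, 19, 15, 23, 21, 23, 21, 11, 6])

# pandas: bias-corrected skewness and excess kurtosis
print(data.skew(), skew(data, bias=False))
print(data.kurt(), kurtosis(data, bias=False))

# SciPy defaults (bias=True), as in the cell above, for comparison
print(skew(data), kurtosis(data))
```

Which version to report is a choice; with only 15 values the correction changes the skewness from about -1.04 to about -1.16.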

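**Points to note:**  
1. The mean is affected by outliers (extreme values). When a dataset contains outliers, the median is often a better measure of central tendency. In this example, the unusually low values (6 and 11) pull the mean (18.4) below the median (19.0).
2. The standard deviation and variance are computed from deviations around the mean, and squaring those deviations magnifies the effect of outliers. If there are outliers, they may not be representative measures of spread either; the interquartile range is a more robust alternative.
3. The mode is most useful for discrete or categorical data. For continuous data, exact values rarely repeat, so the mode is usually only meaningful after grouping the values into intervals. A dataset can also have more than one mode, as the values 19 and 23 show here.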
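
These points can be illustrated by introducing a single extreme value. In this minimal sketch the replacement of 6 by -100 is a hypothetical outlier, not part of the original data, and `summarize` is a helper defined only for illustration.

```python
import pandas as pd

# Same 15 values as in the notebook cells above
data = pd.Series([19, 23, 19, 18, 25, 16, 17, 19, 15, 23, 21, 23, 21, 11, 6])

# Hypothetical outlier: replace the smallest value (6) with -100
with_outlier = data.replace(6, -100)


def summarize(s: pd.Series) -> dict:
    """Return measures of center and spread for a numeric Series."""
    return {
        "mean": s.mean(),
        "median": s.median(),
        "std": s.std(),  # sample standard deviation (ddof=1)
        "iqr": s.quantile(0.75) - s.quantile(0.25),
    }


base = summarize(data)
outl = summarize(with_outlier)
for name in base:
    print(f"{name:>6}: {base[name]:8.3f} -> {outl[name]:8.3f}")
```

The mean and standard deviation shift substantially, while the median (19.0) and the IQR (5.5) do not change at all.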