## **Univariate Statistical Measures**

In [45]:
import numpy as np
import pandas as pd

In [46]:
data=pd.DataFrame({'age':[1,2,3,4,5]})
data

   age
0    1
1    2
2    3
3    4
4    5


## **1. Measures of central tendency**

### *1.1 Mean*
- Average
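- It is represented by **x̅** (sample) or **µ** (population); **Σx** is the sum of the values (if column name is x)
- mean = sum of values / total number of values
- used with numeric variables (typically continuous), not with categorical ones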
- population mean => µ => **µx = Σx / N** (if column name is x)
- sample mean => **x̅ = Σx / n** (if column name is x)

In [47]:
mean = data['age'].mean()
print(mean)

3.0


In [48]:
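# how mean() works: sum of the non-null values / count of the non-null values
mean = data['age'].sum() / data['age'].count()
print(mean)
                # mean() ignores null (NaN) values; count() does too, whereas len(data) would still count them.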

3.0
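How missing values affect the calculation can be checked with a small sketch. The series `ages_with_gap` is a made-up example, not data from this notebook; it only assumes the default `skipna=True` behaviour of `mean()`.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value (illustration only)
ages_with_gap = pd.Series([1, 2, np.nan, 4])

builtin_mean = ages_with_gap.mean()                       # skips NaN by default
count_mean = ages_with_gap.sum() / ages_with_gap.count()  # count() excludes NaN
len_mean = ages_with_gap.sum() / len(ages_with_gap)       # len() still counts the NaN row

print(builtin_mean, count_mean, len_mean)
```

Dividing by `count()` reproduces `mean()`, while dividing by `len()` gives a smaller value because the missing row is included in the denominator.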


### *1.2 Median*
- Middle value of the sorted data
- With an odd number of values, it is the value at the center position.
- With an even number of values, it is the average of the two center values.

In [49]:
median = data['age'].median()
print(median)   # the column is sorted first, then the median is taken.
                # null (NaN) values in the column are ignored.

3.0
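The odd and even cases can be sketched with two small, made-up series (the values are illustrative only). Neither is sorted beforehand, since `median()` handles the ordering itself.

```python
import pandas as pd

odd_values = pd.Series([5, 1, 3])      # sorted: 1, 3, 5 -> middle value is 3
even_values = pd.Series([7, 1, 5, 3])  # sorted: 1, 3, 5, 7 -> average of 3 and 5

odd_median = odd_values.median()
even_median = even_values.median()
print(odd_median, even_median)
```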


### *1.3 Mode*
- most frequent value
- the value that is repeated the greatest number of times.
- if one value is the most frequent, the data is called uni-modal
- if two values tie for most frequent, the data is called bi-modal
- if more than two values tie for most frequent, the data is called multi-modal

In [50]:
mode = data['age'].mode()
print(mode) # every value occurs exactly once, so all five tie and mode() returns all of them

0    1
1    2
2    3
3    4
4    5
Name: age, dtype: int64


In [51]:
data['age'].mode()[1]

np.int64(2)
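Because every value in `data['age']` occurs once, it is not a clear illustration of the uni-modal and bi-modal cases. A minimal sketch with two made-up series (not part of this notebook's data) shows them more directly:

```python
import pandas as pd

uni_modal = pd.Series([1, 2, 2, 3])    # 2 appears most often
bi_modal = pd.Series([1, 1, 2, 2, 3])  # 1 and 2 tie for most frequent

uni_modes = uni_modal.mode().tolist()
bi_modes = bi_modal.mode().tolist()
print(uni_modes, bi_modes)

# value_counts() shows the frequencies behind the result
print(bi_modal.value_counts())
```

`mode()` always returns a Series, so a single mode is read with `.mode()[0]`.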

## **2. Measures of Dispersion / Spread**

### *2.1 Minimum*

In [52]:
print(data['age'].min())

1


### *2.2 Maximum*

In [53]:
print(data['age'].max())

5


### *2.3 Range*
- max value - min value

In [54]:
print(data['age'].max() - data['age'].min())

4


### *2.4 Deviation*
- how much each value differs from the mean value
- smaller deviations mean the values are clustered tightly around the mean
- when the deviations are small, the mean summarizes the data well and prediction is easier
- represented as **(xi - x̅)**, where xi is a data point and x̅ is the mean

In [None]:
print(data['age']-data['age'].mean())

0   -2.0
1   -1.0
2    0.0
3    1.0
4    2.0
Name: age, dtype: float64


### *2.5 Mean Deviation*
- represented as Σ(xi - x̅) / N
- the sum of the deviations is always zero, because the mean is the balance point of the data: the positive deviations exactly cancel the negative ones.
- to overcome this problem we use the absolute or squared deviation.

In [61]:
a = data['age'] - data['age'].mean()   # deviation of each value from the mean
b = a.mean()                           # Σ(xi - x̅) / N

print(b)

0.0
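The cancellation comes from the magnitudes of the deviations balancing out, which a skewed example makes easier to see. The series `skewed` below is hypothetical; `np.isclose` is used because floating-point rounding can leave a tiny non-zero remainder.

```python
import numpy as np
import pandas as pd

skewed = pd.Series([1, 2, 10])            # hypothetical, asymmetric values
deviations = skewed - skewed.mean()

n_negative = int((deviations < 0).sum())  # values below the mean
n_positive = int((deviations > 0).sum())  # values above the mean
total = deviations.sum()

print(n_negative, n_positive)
print(np.isclose(total, 0.0))
```

Two values lie below the mean and only one above it, yet the deviations still sum to (approximately) zero.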


### *2.6 Absolute Deviation*
- represented as |xi - x̅|
- ignores the sign, so it shows only how far each value is from the mean

In [62]:
a=data['age']-data['age'].mean()
b=a.abs()
print(b)

0    2.0
1    1.0
2    0.0
3    1.0
4    2.0
Name: age, dtype: float64


### *2.7 Mean Absolute Deviation*
- represented as, Σ|x - x̅| / N
- It is the average of the absolute deviations from the mean value.

In [74]:
a=data['age'] - data['age'].mean()
b=a.abs()
c=b.mean()
print(c)

1.2
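As a cross-check, the same quantity can be computed step by step with NumPy. This is only a sketch that re-creates the five ages as an array named `ages` (a name introduced here for illustration).

```python
import numpy as np

ages = np.array([1, 2, 3, 4, 5])
abs_dev = np.abs(ages - ages.mean())  # |x - mean| for each point: 2, 1, 0, 1, 2
mad = abs_dev.sum() / ages.size       # (2 + 1 + 0 + 1 + 2) / 5
print(mad)
```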


### *2.8 Square Deviation*
- represented as, (x - x̅)²
- It is the square of the difference between each data point and the mean.

In [78]:
a=data['age'] - data['age'].mean()
b=a**2
print(b)

0    4.0
1    1.0
2    0.0
3    1.0
4    4.0
Name: age, dtype: float64


### *2.9 Mean Square Deviation or Variance*
- represented as,
    - population variance -> **σ² = Σ(x - µ)² / N**
    - sample variance  ->  **s² = Σ(x - x̅)² / (n - 1)**
- It is the average of the squared differences between the data points and the mean.

In [79]:
# population variance

print(data['age'].var(ddof=0))

2.0


In [81]:
# sample variance

print(data['age'].var(ddof=1))

2.5
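Both formulas can be reproduced by hand, which also makes the role of `ddof` visible. The sketch below re-creates the ages as a Series named `ages` (an illustrative name) and relies on the documented defaults: pandas `var()` uses `ddof=1`, while NumPy `np.var()` uses `ddof=0`.

```python
import numpy as np
import pandas as pd

ages = pd.Series([1, 2, 3, 4, 5])
sq_dev_sum = ((ages - ages.mean()) ** 2).sum()  # 4 + 1 + 0 + 1 + 4 = 10

pop_var = sq_dev_sum / len(ages)                # divide by N
sample_var = sq_dev_sum / (len(ages) - 1)       # divide by n - 1

print(pop_var, sample_var)
print(ages.var(), np.var(ages.to_numpy()))      # pandas default vs NumPy default
```

Since the two libraries default to different denominators, passing `ddof` explicitly avoids mixing up the population and sample versions.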


### *2.10 Standard Deviation or Root Mean Square Deviation*
- represented as,
    - population SD -> **σ = √( Σ(x - µ)² / N )**
    - sample SD  ->  **s = √( Σ(x - x̅)² / (n - 1) )**

In [83]:
# population SD

print(data['age'].std(ddof=0))

1.4142135623730951


In [84]:
# sample SD

print(data['age'].std(ddof=1))

1.5811388300841898
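The standard deviation is the square root of the variance, which a short sketch can confirm. The Series `ages` re-creates the notebook's five values under an illustrative name.

```python
import numpy as np
import pandas as pd

ages = pd.Series([1, 2, 3, 4, 5])

pop_sd = np.sqrt(ages.var(ddof=0))     # square root of the population variance
sample_sd = np.sqrt(ages.var(ddof=1))  # square root of the sample variance
print(pop_sd, sample_sd)
```

Unlike the variance, the standard deviation is expressed in the same unit as the data (here, years of age).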
