## **Univariate Statistical Measures**

In [1]:
import numpy as np
import pandas as pd

In [2]:
data=pd.DataFrame({'age':[1,2,3,4,5]})
data

   age
0    1
1    2
2    3
3    4
4    5


In [3]:
len(data['age'])

5

## **1. Measures of central tendency**

### *1.1 Mean*
- The arithmetic average of the values
- It is represented by **x̅** (sample) or **µ** (population); **Σx** denotes the sum of the values (if the column name is x)
- mean = sum of values / total no. of values
- Used with numerical variables (not with categorical data)
- population mean => **µ = Σx / N** (if column name is x)
- sample mean => **x̅ = Σx / n** (if column name is x)

In [4]:
mean = data['age'].mean()
print(mean)

3.0


In [5]:
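# how mean() works: sum of values / number of values
mean = data['age'].sum() / len(data)
print(mean)
# note: .mean() and .sum() skip NaN, but len() counts every row,
# so use data['age'].count() as the divisor if the column has missing values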

3.0
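To see why the divisor matters when values are missing, here is a small sketch using a hypothetical `ages` Series with one `NaN`. pandas' `.mean()` skips the missing value, while `sum() / len()` still counts the missing row in the denominator; dividing by `.count()` (which ignores `NaN`) reproduces `.mean()`.

```python
import numpy as np
import pandas as pd

# hypothetical column with one missing value
ages = pd.Series([1, 2, 3, 4, np.nan], name='age')

builtin_mean = ages.mean()               # skips NaN: (1 + 2 + 3 + 4) / 4
naive_mean = ages.sum() / len(ages)      # sum skips NaN, but len counts it: 10 / 5
manual_mean = ages.sum() / ages.count()  # count() ignores NaN: 10 / 4

print(builtin_mean, naive_mean, manual_mean)
```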


### *1.2 Median*
- The centre value of the sorted data
- For an odd number of values: the value in the centre position
- For an even number of values: the average of the two centre values
- The data must be sorted in ascending order first
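The rules above can be sketched by hand. The helper `median` below is a hypothetical teaching function (not how pandas implements it internally); it sorts the values, then takes either the single centre value or the mean of the two centre values.

```python
def median(values):
    """Median of a non-empty list of numbers (teaching sketch)."""
    ordered = sorted(values)        # step 1: sort in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # odd count: the single centre value
        return ordered[mid]
    # even count: average of the two centre values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 1, 3, 2, 4]))  # odd count
print(median([4, 1, 3, 2]))     # even count
```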

In [6]:
median = data['age'].median()
print(median)   # pandas sorts the values internally before taking the centre
                # missing (NaN) values are skipped by default

3.0


### *1.3 Mode*
- The most frequent value
- The value that is repeated the greatest number of times
- If the column has one most-frequent value, it is called uni-modal data
- If the column has two values tied for most frequent, it is called bi-modal data
- If the column has more than two values tied for most frequent, it is called multi-modal data
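As a sketch of a bi-modal case, the hypothetical `scores` Series below has two values (2 and 4) that each appear twice. `Series.mode()` returns every value tied for the highest count, and `value_counts()` shows the frequencies behind that result.

```python
import pandas as pd

# hypothetical data where 2 and 4 each appear twice
scores = pd.Series([1, 2, 2, 3, 4, 4, 5])

modes = scores.mode()                # every value tied for the highest count
print(modes.tolist())                # two modes, so the data is bi-modal
print(scores.value_counts().max())   # the highest frequency
```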

In [7]:
mode = data['age'].mode()
print(mode)  # every value occurs exactly once, so pandas returns all five as modes

0    1
1    2
2    3
3    4
4    5
Name: age, dtype: int64


In [8]:
data['age'].mode()[1]   # mode() returns a Series, so index it to pick one mode

np.int64(2)

## **2. Measures of Dispersion / Spread**

### *2.1 Minimum*

In [9]:
print(data['age'].min())

1


### *2.2 Maximum*

In [10]:
print(data['age'].max())

5


### *2.3 Range*
- max value - min value

In [11]:
print(data['age'].max() - data['age'].min())

4


### *2.4 Deviation*
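- How much each value differs from the mean
- Smaller deviations mean the values cluster closely around the mean, so the mean is a more representative summary of the data
- When deviations are small, predictions based on the mean are more reliable
- Represented as **(xi - x̅)**, where xi is each data point and x̅ is the mean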

In [12]:
print(data['age']-data['age'].mean())

0   -2.0
1   -1.0
2    0.0
3    1.0
4    2.0
Name: age, dtype: float64


### *2.5 Mean Deviation*
- Represented as **Σ(xi - x̅) / N**
- The sum (and therefore the mean) of the deviations is always zero, because the mean is the balance point of the data: the positive deviations above it exactly cancel the negative deviations below it.
- To overcome this problem, we use the absolute or squared deviations instead.
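The cancellation holds for any data, not only symmetric data like `age`. The sketch below uses a hypothetical, deliberately uneven `values` Series: the deviations are not mirror images of each other, yet they still sum to zero.

```python
import pandas as pd

# hypothetical, deliberately uneven data (mean = 10.0)
values = pd.Series([2, 3, 10, 25])

deviations = values - values.mean()
print(deviations.tolist())   # negative deviations below the mean, positive above
print(deviations.sum())      # the two sides cancel exactly
```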

In [13]:
a = data['age'] - data['age'].mean()   # deviation of each value from the mean
b = a.mean()                            # mean of the deviations

print(b)

0.0


### *2.6 Absolute Deviation*
- Represented as **|xi - x̅|**
- It keeps only the size of each deviation and ignores its sign, so positive and negative deviations no longer cancel out

In [14]:
a=data['age']-data['age'].mean()
b=a.abs()
print(b)

0    2.0
1    1.0
2    0.0
3    1.0
4    2.0
Name: age, dtype: float64


### *2.7 Mean Absolute Deviation*
- Represented as **Σ|xi - x̅| / N**
- It is the average of the absolute deviations from the mean.

In [15]:
a=data['age'] - data['age'].mean()
b=a.abs()
c=b.mean()
print(c)

1.2


### *2.8 Squared Deviation*
- Represented as **(xi - x̅)²**
- It is the square of the difference between each data point and the mean.

In [16]:
a=data['age'] - data['age'].mean()
b=a**2
print(b)

0    4.0
1    1.0
2    0.0
3    1.0
4    4.0
Name: age, dtype: float64


### *2.9 Mean Square Deviation or Variance*
- represented as,
    - population variance -> **σ² = Σ(x - µ)² / N**
    - sample variance -> **s² = Σ(x - x̅)² / (n - 1)**
- It is the average of the squared deviations from the mean (for a sample, the sum is divided by n - 1 instead of N).
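The two formulas can be computed directly from the squared deviations and compared with pandas. This sketch rebuilds the same `age` values as a standalone Series named `ages` (a hypothetical name) so it runs on its own; the only difference between the two results is the divisor, N versus n - 1.

```python
import pandas as pd

ages = pd.Series([1, 2, 3, 4, 5])

squared_dev = (ages - ages.mean()) ** 2   # (x - mean)² for each value
ss = squared_dev.sum()                    # sum of squared deviations

pop_var = ss / len(ages)                  # divide by N
sample_var = ss / (len(ages) - 1)         # divide by n - 1

print(pop_var, sample_var)
print(ages.var(ddof=0), ages.var(ddof=1))  # pandas equivalents
```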

In [17]:
# population variance

print(data['age'].var(ddof=0))

2.0


In [18]:
# sample variance

print(data['age'].var(ddof=1))

2.5


### *2.9 Standard Deviation or Root Mean Square Deviation*
- represented as,
    - population SD -> **σ = √( Σ(x - µ)² / N )**
    - sample SD -> **s = √( Σ(x - x̅)² / (n - 1) )**
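Since the standard deviation is the square root of the variance, a short sketch can take `math.sqrt` of each variance and check it against `Series.std` with the matching `ddof`. The Series name `ages` is again a hypothetical standalone copy of the `age` column.

```python
import math
import pandas as pd

ages = pd.Series([1, 2, 3, 4, 5])

pop_sd = math.sqrt(ages.var(ddof=0))     # square root of the population variance
sample_sd = math.sqrt(ages.var(ddof=1))  # square root of the sample variance

print(pop_sd, sample_sd)
print(math.isclose(pop_sd, ages.std(ddof=0)), math.isclose(sample_sd, ages.std(ddof=1)))
```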

In [19]:
print(data['age'].std())    # by default pandas uses the sample SD (ddof=1)

1.5811388300841898


In [20]:
# population SD

print(data['age'].std(ddof=0))

1.4142135623730951


In [21]:
# sample SD

print(data['age'].std(ddof=1))

1.5811388300841898


### *2.10 Percentile*
- A percentile rank tells you what percentage of the data lies below your value.
- In terms of rank, it tells you what percentage of people scored lower than you.
- A better (lower-numbered) rank gives a higher percentile (e.g. Rank 1 out of 100 = 99th percentile).
- The data should be sorted in ascending order before computing percentiles.

- calculated as,
    - if you have all the data --> **(no. of values less than your value / total no. of values) * 100**
    - if you have your rank and the total number of people --> **((total people - your rank) / total people) * 100**
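Both formulas translate directly into code. The function names and the `marks` list below are hypothetical, chosen only for illustration; "below" is taken as strictly less than your value, matching the first formula.

```python
def percentile_from_data(values, your_value):
    """Percent of values strictly below your_value."""
    below = sum(1 for v in values if v < your_value)
    return below * 100 / len(values)

def percentile_from_rank(rank, total):
    """Percent of people ranked below you, given your rank (1 = best)."""
    return (total - rank) * 100 / total

marks = [35, 50, 62, 70, 88]            # hypothetical exam marks
print(percentile_from_data(marks, 70))  # 3 of the 5 marks are below 70
print(percentile_from_rank(1, 100))     # rank 1 out of 100 people
```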

### *2.11 Quartiles*
- Quartiles split the sorted data into four equal parts:
    - **min** or 0th percentile -- 0% of the data/people are below it.
    - **Q1** or 25th percentile -- 25% of the data/people are below it.
    - **Q2** or **median** or 50th percentile -- 50% of the data/people are below it.
    - **Q3** or 75th percentile -- 75% of the data/people are below it.
    - **max** or 100th percentile -- 100% of the data/people are at or below it.
- These five values form the **five-number summary**, which is usually represented with a **boxplot**.
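When a quartile falls between two data points (common with an even number of values), pandas' `Series.quantile` by default uses `interpolation='linear'`: it places q at position (n - 1) * q in the sorted data and interpolates between the neighbouring values. The helper `quantile_linear` below is a hypothetical sketch of that rule, compared against pandas on a small even-length list.

```python
import math
import pandas as pd

def quantile_linear(values, q):
    """Quantile by linear interpolation between sorted values (sketch)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q           # fractional position in the sorted list
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)     # clamp at the last index when q = 1
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

even = [1, 2, 3, 4]   # hypothetical even-length data
print(quantile_linear(even, 0.25), pd.Series(even).quantile(0.25))
```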

In [22]:
# 0 percentile or minimum

print(data['age'].quantile(0))

1.0


In [23]:
# 25 percentile or Q1

print(data['age'].quantile(0.25))

2.0


In [24]:
# 50 percentile or Q2

print(data['age'].quantile(0.5))

3.0


In [25]:
# 75 percentile or Q3

print(data['age'].quantile(0.75))

4.0


In [26]:
# 100 percentile or max

print(data['age'].quantile(1))

5.0


In [27]:
print(data['age'].describe())

count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%      2.000000
50%      3.000000
75%      4.000000
max      5.000000
Name: age, dtype: float64


In [28]:
# five-number summary

print(data['age'].quantile([0,0.25,0.5,0.75,1]))

0.00    1.0
0.25    2.0
0.50    3.0
0.75    4.0
1.00    5.0
Name: age, dtype: float64


### *2.12 Interquartile Range (IQR)*
- Calculated as, **Q3-Q1**

In [29]:
Q1=data['age'].quantile(0.25)
Q3=data['age'].quantile(0.75)
IQR=Q3-Q1
print(IQR)

2.0


### *2.13 Lower Limit*
- Calculated as, **Q1-(IQR * 1.5)**

In [30]:
print(data['age'].quantile(0.25) - (data['age'].quantile(0.75) - data['age'].quantile(0.25)) * 1.5)   # Q1 - 1.5 * IQR

-1.0


### *2.14 Upper Limit*
- Calculated as, **Q3+(IQR * 1.5)**

-------------------------------------------------------------------------------------------------------------------------------------------------

In [31]:
print(data['age'].quantile(0.75) + (data['age'].quantile(0.75) - data['age'].quantile(0.25)) * 1.5)   # Q3 + 1.5 * IQR

7.0
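A common use of the lower and upper limits is flagging outliers: any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is treated as a potential outlier. The sketch below applies this to a hypothetical `ages` Series containing one suspicious value; the data and variable names are illustrative only.

```python
import pandas as pd

# hypothetical ages with one suspicious value
ages = pd.Series([21, 23, 24, 25, 26, 27, 29, 95])

q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr   # lower limit
upper = q3 + 1.5 * iqr   # upper limit

# keep only the values that fall outside the limits
outliers = ages[(ages < lower) | (ages > upper)]
print(lower, upper)
print(outliers.tolist())
```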


## **3. Measures of Shape**