# Descriptive statistics

> Descriptive statistics is a branch of statistics that focuses on summarizing and describing a dataset or a sample.

In this lesson, we will use the "Tips" dataset ([data](https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv), [source](https://rdrr.io/cran/reshape2/man/tips.html)).

```python
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv')
```

## Mean

The mean ("average") is the sum of all values divided by the number of values $n$:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
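The formula translates directly into code. A minimal sketch on a small hypothetical list of bills (not taken from the Tips data), comparing the hand-written version with pandas:

```python
import pandas as pd

# Hypothetical bill amounts in $, for illustration only
bills = [10.0, 20.0, 15.0, 25.0]

# Direct translation of the formula: sum of x_i divided by n
manual_mean = sum(bills) / len(bills)

# pandas computes the same value
pandas_mean = pd.Series(bills).mean()

print(manual_mean, pandas_mean)  # 17.5 17.5
```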

```python
df['total_bill'].mean()  # average bill amount, in $
```

```
19.78594262295082
```

## Median

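The mean is not always an adequate measure, especially when the data contain a few outliers: a single extreme value can pull the mean far away from where most values lie. The median, the middle value of the sorted data (or the average of the two middle values when $n$ is even), is not affected by how extreme the outliers are, so it may better represent the central tendency of the data.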

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Finding_the_median.png/2880px-Finding_the_median.png" width=300>
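To see the effect of an outlier, here is a minimal sketch on a hypothetical series of five bills, one of which is extreme:

```python
import pandas as pd

# Hypothetical bills in $: four typical values and one extreme outlier
bills = pd.Series([10, 12, 13, 14, 1000])

print(bills.mean())    # 209.8, pulled far up by the single outlier
print(bills.median())  # 13.0, the middle value of the sorted data
```

The mean lands far above every typical bill, while the median stays at the middle of the sorted values.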

```python
df['total_bill'].median()
```

```
17.795
```

## Variance and standard deviation

Variance measures how spread out the values in a dataset are around the mean: it is the average squared distance of each value from the mean.

$$\text{Var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Variance is somewhat unintuitive because of the squared difference: if the data is in meters ($m$), the variance is in $m^2$. To bring back the original unit of measure, we use the standard deviation, which is simply the square root of the variance:

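$$ \sigma = \sqrt{\text{Var}(X)} $$

Note that pandas' `var()` and `std()` compute the *sample* versions by default, dividing by $n - 1$ instead of $n$ (`ddof=1`). Pass `ddof=0` to get the population formulas above.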
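A minimal sketch computing both formulas by hand and checking them against pandas. The data are hypothetical measurements in meters, chosen so the results come out round:

```python
import math

import pandas as pd

# Hypothetical measurements in meters, for illustration only
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)
mean = sum(x) / n  # 5.0

# Sum of squared deviations from the mean
squared_dev = sum((xi - mean) ** 2 for xi in x)  # 32.0

pop_var = squared_dev / n           # population variance (divide by n): 4.0 m^2
pop_std = math.sqrt(pop_var)        # population standard deviation: 2.0 m
sample_var = squared_dev / (n - 1)  # sample variance (divide by n - 1)

s = pd.Series(x)
print(pop_var, s.var(ddof=0))  # 4.0 4.0
print(pop_std, s.std(ddof=0))  # 2.0 2.0
print(sample_var, s.var())     # both about 4.571 (pandas default is ddof=1)
```

The two conventions differ by a factor of $n / (n - 1)$, which shrinks toward 1 as the dataset grows.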

```python
df['total_bill'].var()  # in $^2
```

```
79.25293861397827
```

```python
df['total_bill'].std()  # in $
```

```
8.902411954856856
```

## Percentile and quantile

A percentile is a measure that indicates the relative position of a particular value in a dataset. It tells us what percentage of the data falls below or equals that value.
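The definition also works in reverse: given a value, the percentage of data points less than or equal to it is its percentile rank. A minimal sketch on a hypothetical series of the numbers 1 to 10:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # hypothetical data
value = 8

# (s <= value) is a boolean Series; its mean is the share of True entries
pct = (s <= value).mean() * 100
print(pct)  # 80.0, so 8 sits at the 80th percentile of this data
```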

Imagine 100 data points sorted in ascending order. To get p75 (the 75th percentile), we take the value of the 75th data point. Example with 10 data points:

![percentile](./percentiles.svg)
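The "take the k-th sorted point" rule is known as the nearest-rank method. A minimal sketch, assuming the rank is $\lceil p \cdot n / 100 \rceil$; the helper `percentile_nearest_rank` is our own illustration, not a pandas function:

```python
import math


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 <= p <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    # 1-based rank; p * n is computed first to avoid float error in p / 100
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


data = list(range(1, 101))  # hypothetical data: 1, 2, ..., 100
print(percentile_nearest_rank(data, 75))  # 75, the 75th data point

ten_points = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # hypothetical 10 data points
print(percentile_nearest_rank(ten_points, 75))  # 5, the 8th sorted value
```

Other definitions exist: pandas' `quantile()` interpolates between neighboring points by default, so it can return a value that differs from this method.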

Quartiles are the three percentiles that split the sorted data into four equal parts:

1st quartile = q1 = p25 \
2nd quartile = q2 = p50 = median \
3rd quartile = q3 = p75
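pandas can return all three quartiles at once when `quantile()` is given a list of fractions. A minimal sketch on a hypothetical series of the integers 1 to 9:

```python
import pandas as pd

# Hypothetical data: the integers 1 to 9
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Passing a list returns one value per requested fraction, indexed by the fraction
quartiles = s.quantile([0.25, 0.5, 0.75])
print(quartiles.tolist())  # [3.0, 5.0, 7.0]

# The 2nd quartile is the median
print(quartiles[0.5] == s.median())  # True
```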


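```python
df['total_bill'].quantile(0.75)  # takes a fraction in [0, 1]: 0.75 = 75th percentile
```

```
24.127499999999998
```

By default, pandas interpolates linearly between the two nearest data points (`interpolation='linear'`), so the result need not be one of the actual bill amounts.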
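pandas can also report most of the statistics from this lesson in a single call with `describe()`: count, mean, standard deviation (sample version, `ddof=1`), minimum, quartiles and maximum. A minimal sketch on a small hypothetical series; on the Tips data the equivalent call would be `df['total_bill'].describe()`:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])  # hypothetical data

# Index labels: count, mean, std, min, 25%, 50%, 75%, max
summary = s.describe()
print(summary)
```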