# Descriptive Statistics in Python


At its core, statistics is a branch of mathematics concerned with counting, measuring, and interpreting data.

The Merriam-Webster dictionary defines statistics as:
> "a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data"

Two main statistical methods are used in data analysis: 

1. Descriptive statistics, which summarize data from a sample using measures such as the mean or standard deviation.

2. Inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).

In this blog post, I will cover descriptive statistics using Python.

Descriptive statistics are most often concerned with two sets of properties of a data distribution: central tendency (or location), which seeks to characterize the distribution's central or typical value, and dispersion (or variability), which characterizes the extent to which members of the distribution depart from its center and from each other.
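To make the distinction between location and dispersion concrete, here is a minimal sketch. The two samples (`narrow` and `wide`) are hypothetical values I made up for illustration; they share the same mean but differ sharply in spread, which the sample standard deviation picks up.

```python
import statistics

# Two hypothetical samples with the same center but different spread
narrow = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

# Central tendency: both samples have the same arithmetic mean
narrow_mean = statistics.mean(narrow)
wide_mean = statistics.mean(wide)

# Dispersion: the sample standard deviation differs a lot
narrow_stdev = statistics.stdev(narrow)
wide_stdev = statistics.stdev(wide)

print(f"narrow: mean = {narrow_mean}, stdev = {narrow_stdev:.2f}")
print(f"wide:   mean = {wide_mean}, stdev = {wide_stdev:.2f}")
```

A measure of location alone cannot tell these two samples apart, which is why both kinds of summary are usually reported together.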


## Averages and measures of central location

Measures of center are statistics that give us a sense of the "middle" or "typical" value of a numeric variable. Common measures of center include the mean, median, and mode.

The list below shows functions from Python's [statistics module](https://docs.python.org/3/library/statistics.html) that compute measures of central location.

- mean( ): Arithmetic mean (“average”) of data.

- harmonic_mean( ): The reciprocal of the arithmetic mean of the reciprocals of the data. For three numbers a, b and c, harmonic mean = 3/(1/a + 1/b + 1/c), or equivalently 1/(harmonic mean) = (1/a + 1/b + 1/c)/3.

- median( ): Median (middle value) of data. The median is a robust measure of central location and is less affected by outliers in your data than the mean. When the number of data points is odd, the middle data point is returned; when it is even, the average of the two middle values is returned.

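- median_low( ): Low median of data. When the number of data points is even, the smaller of the two middle values is returned.
- median_high( ): High median of data. When the number of data points is even, the larger of the two middle values is returned.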
- median_grouped( ): Median, or 50th percentile, of grouped data.
- mode( ): Mode (most common value) of discrete data.
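Two items in the list above are easier to see with numbers: the reciprocal definition of the harmonic mean, and how the median variants differ on a sample with an even number of points. The sketch below uses small hypothetical inputs (`a, b, c` and `even_sample`) chosen only for illustration.

```python
import math
import statistics

# Harmonic mean of three hypothetical numbers, computed from the definition
a, b, c = 1, 2, 4
by_hand = 3 / (1 / a + 1 / b + 1 / c)
from_module = statistics.harmonic_mean([a, b, c])
print("harmonic mean by hand:    ", by_hand)
print("statistics.harmonic_mean: ", from_module)
print("agree:", math.isclose(by_hand, from_module))

# Median variants on a sample with an even number of points
even_sample = [1, 3, 5, 7]
mid = statistics.median(even_sample)        # average of the two middle values
low = statistics.median_low(even_sample)    # smaller of the two middle values
high = statistics.median_high(even_sample)  # larger of the two middle values
print("median, median_low, median_high:", mid, low, high)
```

Note that `median_low` and `median_high` always return a value that is actually present in the data, while `median` may return a value that is not.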



In [10]:
# Importing relevant modules

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statistics
%matplotlib inline 

plt.style.use('ggplot')
plt.rcParams.update({'font.size': 16})

In [20]:
myData = [1, 2, 3, 4, 4, 7, 8, 10]

# Mean
print("mean = ", statistics.mean(myData))

# Harmonic Mean
print("Harmonic mean = ", statistics.harmonic_mean(myData))

# Median
print("median = ", statistics.median(myData))

# Mode
print("mode = ", statistics.mode(myData))


mean =  4.875
Harmonic mean =  2.9616571176729836
median =  4.0
mode =  4
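As a quick check of the claim that the median is more robust to outliers than the mean, the sketch below appends a single extreme value to the same data. The value `1000` is a hypothetical outlier I picked for illustration, and `my_data` simply repeats the values of `myData`.

```python
import statistics

my_data = [1, 2, 3, 4, 4, 7, 8, 10]  # same values as myData
with_outlier = my_data + [1000]      # hypothetical extreme value added

mean_before = statistics.mean(my_data)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(my_data)
median_after = statistics.median(with_outlier)

print("mean:   before =", mean_before, " after =", round(mean_after, 2))
print("median: before =", median_before, " after =", median_after)
```

One extreme point drags the mean far away from the bulk of the data, while the median stays at the same middle value.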

