# Descriptive statistics in Python

## Distribution of a single variable

### Measures of central tendency and positional measures

- arithmetic mean
- median
- geometric mean (useful for calculating the average rate of growth)
- trimmed mean
- positional measures: quantiles (quartiles, deciles, percentiles)
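The trimmed mean and quantiles from this list can be sketched with NumPy alone. The heights below are a small hypothetical sample chosen for illustration, not values from the survey data. The trimmed mean here cuts `floor(n * proportion)` values from each end of the sorted data, which is the same convention `scipy.stats.trim_mean` uses if SciPy is available.

```python
import numpy as np

# Hypothetical sample of heights in cm, for illustration only
heights = np.array([159, 160, 161, 162, 162, 170, 175, 176, 180, 183, 190, 210])

def trimmed_mean(x, proportion):
    """Mean after cutting floor(n * proportion) values from each end of the sorted data."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(np.floor(len(x) * proportion))
    return x[k:len(x) - k].mean()

print("Arithmetic mean:", heights.mean())
print("10% trimmed mean:", trimmed_mean(heights, 0.1))

# Quantiles with NumPy's default (linear interpolation) method
quartiles = np.quantile(heights, [0.25, 0.5, 0.75])
deciles = np.quantile(heights, np.linspace(0.1, 0.9, 9))
p90 = np.percentile(heights, 90)  # percentiles use a 0-100 scale
print("Quartiles:", quartiles)
print("Deciles:", deciles)
print("90th percentile:", p90)
```

The single large value (210) pulls the arithmetic mean up. Trimming one value from each end reduces its influence.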

### Measures of variability / dispersion

- variance, standard deviation
- IQR (interquartile range)
- mean absolute difference
- coefficient of variation: standard deviation divided by the mean
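A minimal NumPy sketch of these dispersion measures, using a small hypothetical sample. Note the `ddof` argument. `np.var` and `np.std` default to the population formula (`ddof=0`), while pandas `Series.var()` and `Series.std()` default to the sample formula (`ddof=1`). The mean absolute difference is taken here as the average of $|x_i - x_j|$ over all pairs with $i \neq j$. Some texts also include the $i = j$ pairs, which changes the denominator.

```python
import numpy as np

# Hypothetical sample, for illustration only
x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)
n = len(x)

pop_var = np.var(x)             # divides by n (ddof=0, NumPy default)
sample_var = np.var(x, ddof=1)  # divides by n - 1 (pandas default)
pop_sd = np.std(x)
sample_sd = np.std(x, ddof=1)

# Interquartile range: Q3 - Q1
q1, q3 = np.quantile(x, [0.25, 0.75])
iqr = q3 - q1

# Mean absolute difference: average |x_i - x_j| over ordered pairs with i != j
pairwise = np.abs(x[:, None] - x[None, :])  # n x n matrix, zeros on the diagonal
mean_abs_diff = pairwise.sum() / (n * (n - 1))

# Coefficient of variation: standard deviation / mean (population SD used here)
cv = pop_sd / x.mean()

print("Variance (population, sample):", pop_var, sample_var)
print("Standard deviation (population, sample):", pop_sd, sample_sd)
print("IQR:", iqr)
print("Mean absolute difference:", mean_abs_diff)
print("Coefficient of variation:", cv)
```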

### Measures of shape of the distribution

- skewness (measure of asymmetry)
- kurtosis (measure of tailedness, i.e. how heavy or extreme the tails are)
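A minimal sketch of the moment-based definitions, assuming the population (biased) estimators. Skewness is $g_1 = m_3 / m_2^{3/2}$ and excess kurtosis is $g_2 = m_4 / m_2^2 - 3$, where $m_k$ is the $k$-th central moment. These match the defaults of `scipy.stats.skew` and `scipy.stats.kurtosis`. pandas `Series.skew()` and `Series.kurt()` apply a small-sample bias correction, so their values differ slightly.

```python
import numpy as np

def central_moment(x, k):
    """k-th central moment, dividing by n."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** k)

def skewness(x):
    # g1 = m3 / m2^(3/2); positive values typically indicate a longer right tail
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

def excess_kurtosis(x):
    # g2 = m4 / m2^2 - 3; a normal distribution gives 0
    return central_moment(x, 4) / central_moment(x, 2) ** 2 - 3

# Hypothetical sample, for illustration only
x = np.array([2, 4, 4, 4, 5, 5, 7, 9])
print("Skewness:", skewness(x))
print("Excess kurtosis:", excess_kurtosis(x))
```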


In [4]:
import pandas as pd

csv_url = "https://docs.google.com/spreadsheets/d/1H6b5mkq68MeRQyP0Cr2weCpVkzmpR0c2Oi7p147o2a0/export?format=csv"

# Read the sheet into a DataFrame
d = pd.read_csv(csv_url)

print(d.head())

   height  handedness  right_hand_span  left_hand_span  head_circ eye_colour  \
0     159        0.88             19.0            19.0       54.0       Blue   
1     160       -1.00             19.0            20.0       57.0      Green   
2     161        0.79             17.0            16.5       57.0      hazel   
3     162        0.79             16.0            16.0       57.0       gray   
4     162        0.79             16.0            16.0       54.0      Brown   

   gender  siblings  movies  soda   bedtime       fb_freq  fb_friends  \
0  Female         2     3.0   7.0  02:00:00    once a day       135.0   
1  Female         2     0.5   2.0  04:30:00             0         1.0   
2  Female         3     3.0   2.0  23:50:00   once a week       354.0   
3  Female         2     0.0   2.0  23:10:00  almost never       192.0   
4  Female         2     1.0   3.0  00:00:00         never         1.0   

                  stat_likert  
0  Neither agree nor disagree  
1              S
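pandas can report several of the measures listed above in a single call with `DataFrame.describe()`. It returns the count, mean, sample standard deviation (`ddof=1`), minimum, quartiles and maximum of each numeric column. The sketch below retypes two columns from the first five rows printed above so that it runs without downloading the sheet. On the full data, the equivalent call would be `d.describe()`.

```python
import pandas as pd

# First five rows of two numeric columns, copied from the d.head() output above
sample = pd.DataFrame({
    "height": [159, 160, 161, 162, 162],
    "fb_friends": [135.0, 1.0, 354.0, 192.0, 1.0],
})

# Numeric summary: count, mean, std, min, 25%, 50%, 75%, max
summary = sample.describe()
print(summary)
```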

In [9]:
import numpy as np
# arithmetic mean
print("Average height: ")
print(np.mean(d['height']))
print("Average number of Facebook connections: ")
print(np.mean(d['fb_friends']))

Average height: 
176.21666666666667
Average number of Facebook connections: 
289.9642857142857


In [14]:
# median
# np.median() does not skip missing values: a single NaN makes the result NaN
print("Median height: ")
print(np.median(d['height']))
print("Median number of Facebook connections: ")
print(np.median(d['fb_friends']))
print("Median number of Facebook connections (NAs excluded): ")
print(np.nanmedian(d['fb_friends']))
print("Median number of Facebook connections (NAs excluded -- method 2): ")
print(np.median(d['fb_friends'].dropna()))


Median height: 
176.0
Median number of Facebook connections: 
nan
Median number of Facebook connections (NAs excluded): 
222.0
Median number of Facebook connections (NAs excluded -- method 2): 
222.0
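The mean and median cells treat missing values differently. When `np.mean` is called on a pandas Series, it defers to `Series.mean()`, which skips NaN by default. `np.median` does not skip NaN. A minimal sketch makes the contrast visible. It uses the `fb_friends` values from the first five rows plus one hypothetical missing value.

```python
import numpy as np
import pandas as pd

# fb_friends values from d.head() plus one hypothetical missing entry
s = pd.Series([135.0, 1.0, 354.0, 192.0, 1.0, np.nan])
arr = s.to_numpy()

# pandas methods skip NaN by default (skipna=True)
print("Series.mean():  ", s.mean())
print("Series.median():", s.median())

# NumPy functions on a plain array propagate NaN ...
print("np.mean(arr):   ", np.mean(arr))
print("np.median(arr): ", np.median(arr))

# ... unless the NaN-aware variants are used
print("np.nanmean(arr):  ", np.nanmean(arr))
print("np.nanmedian(arr):", np.nanmedian(arr))
```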


Geometric mean of $n$ positive values $x_1, \ldots, x_n$:

$$\left(\prod_{i=1}^n{x_i}\right)^{1/n}$$

Equivalently, using natural logarithms ($\exp(x)$ means $e^x$):

$$\exp\left(\frac{1}{n}\left(\sum_{i=1}^n{\ln(x_i)}\right)\right)$$
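Both formulas give the same value. The logarithmic form is usually preferred in practice, because the direct product overflows for long series of moderately large values. The sketch below checks both forms on a few hypothetical growth factors. This ties back to the "average rate of growth" use of the geometric mean. The sketch also runs both forms on a hypothetical series of 1000 values of 176. If SciPy is available, `scipy.stats.gmean` computes the same quantity.

```python
import numpy as np

def gmean_product(x):
    # (x_1 * ... * x_n) ** (1 / n)
    x = np.asarray(x, dtype=float)
    return np.prod(x) ** (1.0 / len(x))

def gmean_log(x):
    # exp((1/n) * sum(ln x_i))
    x = np.asarray(x, dtype=float)
    return np.exp(np.mean(np.log(x)))

# Hypothetical growth factors: +10%, +5%, -2%, +20% over four periods
factors = np.array([1.10, 1.05, 0.98, 1.20])
avg_factor = gmean_log(factors)
print("Product form:", gmean_product(factors))
print("Log form:    ", avg_factor)
print("Average growth rate per period: {:.2%}".format(avg_factor - 1))

# The product overflows to inf, while the log form stays finite
big = np.full(1000, 176.0)
with np.errstate(over="ignore"):
    print("Product form:", gmean_product(big))  # inf ** (1/1000) is still inf
print("Log form:    ", gmean_log(big))
```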



In [25]:
print("Geometric mean for the height:")
print(np.exp(np.mean(np.log(d['height']))))
print("Geometric mean for the number of Facebook friends (only positive values can enter the geometric mean):")
print(np.exp(np.mean(np.log(d['fb_friends'][d['fb_friends']>0].dropna()))))

Geometric mean for the height:
175.92789854951246
Geometric mean for the number of Facebook friends (only positive values can enter the geometric mean):
127.79432124392397
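The filter `d['fb_friends'] > 0` matters. Because $\ln(0) = -\infty$, a single zero would pull the geometric mean down to 0. The logarithm of a negative number is undefined, so a negative value would turn the result into NaN. A minimal helper can apply the same filter and also report how many values it left out. The helper name `positive_gmean` is hypothetical. The sample reuses the `fb_friends` values from `d.head()` plus a hypothetical zero and missing value.

```python
import numpy as np
import pandas as pd

def positive_gmean(s):
    """Geometric mean of the strictly positive values of a Series.

    Returns (geometric mean, number of values excluded). NaN > 0 is False,
    so missing values are excluded together with zeros and negatives.
    """
    positive = s[s > 0]
    excluded = len(s) - len(positive)
    return np.exp(np.mean(np.log(positive))), excluded

s = pd.Series([135.0, 1.0, 354.0, 192.0, 1.0, 0.0, np.nan])
gm, n_excluded = positive_gmean(s)
print("Geometric mean of positive values:", gm)
print("Values excluded:", n_excluded)

# Keeping the zero makes the result collapse to 0
with np.errstate(divide="ignore"):
    print("With the zero kept:", np.exp(np.mean(np.log(s.dropna()))))
```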
