# Descriptive Statistics with Python

Import required libraries into our workspace

In [0]:
import statistics as st
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
import pandas as pd
import scipy as sp

### Measures of Central Tendency - **mean**

The arithmetic mean of $N$ observations $x_1, x_2, \ldots, x_N$ is computed as

$$
\bar{x} = \frac{\sum_{i = 1}^{N}x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}.
$$

Compute the mean of an array (here a plain Python `list` of numbers)

In [2]:
nums = [-2,-4,1,2,3,5,7,9]
st.mean(nums)

2.625
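
As a quick check, the formula above can be applied by hand. This is a minimal sketch that reuses the `nums` list from the cell above; the names `total`, `N`, and `manual_mean` are introduced here for illustration only.

```python
import statistics as st

nums = [-2, -4, 1, 2, 3, 5, 7, 9]

total = sum(nums)        # x_1 + x_2 + ... + x_N
N = len(nums)            # number of observations
manual_mean = total / N  # same value that st.mean(nums) reports

print(total, N, manual_mean)  # 21 8 2.625
```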

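Compute the mean of a dictionary. Iterating over a dictionary yields its keys, so `st.mean(Dict)` averages the keys `1, 2, 3` and ignores the string values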

In [3]:
Dict = {1:"one",2:"two",3:"three"}

Dict

st.mean(Dict)

2
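
The sketch below makes the key-versus-value behaviour explicit. The `scores` dictionary is a hypothetical example, not part of the original notebook; it shows one way to average the values instead of the keys.

```python
import statistics as st

Dict = {1: "one", 2: "two", 3: "three"}

keys_seen = list(Dict)        # iterating a dict yields its keys
mean_of_keys = st.mean(Dict)  # therefore this is the mean of 1, 2, 3

# Hypothetical dictionary with numeric values, for illustration only
scores = {"a": 10, "b": 20, "c": 60}
mean_of_values = st.mean(scores.values())  # pass the values explicitly

print(keys_seen, mean_of_keys, mean_of_values)
```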

### Measures of Central Tendency - **median**

Compute the median of an array

In [0]:
nums = [-2,-4,1,2,3,5,7,9]
st.median(nums)
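
Because `nums` has an even number of elements, the median is the average of the two middle values after sorting. A minimal sketch of that rule, with illustrative variable names:

```python
import statistics as st

nums = [-2, -4, 1, 2, 3, 5, 7, 9]

ordered = sorted(nums)   # [-4, -2, 1, 2, 3, 5, 7, 9]
mid = len(ordered) // 2  # index of the upper middle element

# Even length: average the two middle elements (2 and 3 here)
manual_median = (ordered[mid - 1] + ordered[mid]) / 2

print(manual_median, st.median(nums))
```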

Compute the median of a dictionary (as with the mean, this is the median of its keys)

In [0]:
Dict = {1:"one",2:"two",3:"three"}

st.median(Dict)

### Measures of Central Tendency - **mode**

Compute the mode of an array

In [4]:
nums = [-2,-4,1,2,3,5,7,9]
st.mode(nums)

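StatisticsError: ignored

Every value in `nums` occurs exactly once, so the data has no unique mode. Python versions before 3.8 raise `StatisticsError` in this case, as in the output above; Python 3.8 and later return the first mode encountered, here `-2`.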
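
When several values tie for most frequent, `st.multimode` (available from Python 3.8) returns all of them rather than a single value. The sketch below assumes Python 3.8 or later; the `repeated` list is a hypothetical data set with a clear mode.

```python
import statistics as st

nums = [-2, -4, 1, 2, 3, 5, 7, 9]

# Every value appears once, so every value is a mode
all_modes = st.multimode(nums)

# Hypothetical data with a single most frequent value
repeated = [1, 2, 2, 3, 3, 3]
single_mode = st.mode(repeated)

print(all_modes)
print(single_mode)
```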

### Measures of variation - **range**

Compute the range of an array

In [0]:
max(nums) - min(nums)

### Measures of variation - **variance**

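Compute the sample variance of an array (`st.variance` divides by $N - 1$; `st.pvariance` gives the population variance, which divides by $N$)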

In [0]:
st.variance(nums)
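
For reference, the sample variance that `st.variance` returns is

$$
s^2 = \frac{\sum_{i = 1}^{N}(x_i - \bar{x})^2}{N - 1}.
$$

A minimal sketch that evaluates this formula by hand and compares it with the library functions (variable names are illustrative):

```python
import math
import statistics as st

nums = [-2, -4, 1, 2, 3, 5, 7, 9]

N = len(nums)
x_bar = sum(nums) / N

# Sum of squared deviations from the mean
ss = sum((x - x_bar) ** 2 for x in nums)

sample_var = ss / (N - 1)  # matches st.variance
pop_var = ss / N           # matches st.pvariance

print(sample_var, st.variance(nums))
print(pop_var, st.pvariance(nums))
```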

Compute the variance of a dictionary (again computed over its keys)

In [0]:
st.variance(Dict)

### Measures of variation - **standard deviation**

Compute the sample standard deviation of an array (the square root of the sample variance)

In [0]:
st.stdev(nums)
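
The standard deviation is the square root of the variance, which puts it back in the same units as the data. A short sketch confirming the relationship for `nums`:

```python
import math
import statistics as st

nums = [-2, -4, 1, 2, 3, 5, 7, 9]

sd_from_variance = math.sqrt(st.variance(nums))  # square root of the sample variance
sd_direct = st.stdev(nums)                       # library result

print(sd_from_variance, sd_direct)  # both approximately 4.37
```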

Compute the standard deviation of a dictionary (again computed over its keys)

In [0]:
st.stdev(Dict)

# Generating numeric and visual data summaries

Creating data sets as `numpy` arrays

In [0]:
N_obs = 4000

normal = np.random.normal(loc = 1, scale = 10000, size = N_obs)

lognormal = np.random.lognormal(mean = 10, sigma = .75, size = N_obs)
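
The cell above draws new random values every time it runs, so the summaries and plots that follow will differ slightly between runs. If reproducible output is wanted, one option is a seeded NumPy `Generator`. This is a sketch under that assumption; the seed `42` and the names `rng`, `rng_again`, and `same_draws` are arbitrary choices, not part of the original notebook.

```python
import numpy as np

N_obs = 4000

# Seeded generator: the seed value 42 is an arbitrary illustrative choice
rng = np.random.default_rng(42)

normal = rng.normal(loc=1, scale=10000, size=N_obs)
lognormal = rng.lognormal(mean=10, sigma=0.75, size=N_obs)

# A second generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(42)
normal_again = rng_again.normal(loc=1, scale=10000, size=N_obs)
same_draws = np.array_equal(normal, normal_again)

print(same_draws)
```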

Combining the arrays into a dictionary of key:value pairs

In [0]:
d = {'normal': normal, 'lognormal': lognormal}
d

Converting the dictionary into a `pandas` DataFrame

In [0]:
df = pd.DataFrame(data = d)

Viewing the first 10 rows of `df`

In [0]:
df.head(10)
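
For a numeric summary of every column at once, pandas provides `DataFrame.describe()`, which reports the count, mean, standard deviation, minimum, quartiles, and maximum. The sketch below builds a small stand-in DataFrame named `demo` (a hypothetical name, with an arbitrary seed) so that it runs on its own; in the notebook the same call can be made on `df`.

```python
import numpy as np
import pandas as pd

# Stand-in data with the same column names as df (seed chosen arbitrarily)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'normal': rng.normal(loc=1, scale=10000, size=1000),
    'lognormal': rng.lognormal(mean=10, sigma=0.75, size=1000),
})

# One row per statistic, one column per variable
summary = demo.describe()
print(summary)
```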

### Generating a histogram of normal observations

In [0]:
# Settings
sb.set_style("whitegrid")

# Create histogram (histplot replaces distplot, deprecated since seaborn 0.11)
ax = sb.histplot(df['normal'], bins = int(N_obs / 10))

# Label the x-axis with the distribution used to generate the data
ax.set_xlabel("x ~ NOR(1, 10000)")

# Add vertical lines showing the location of the mean (red) and median (orange)
ax.axvline(df['normal'].mean(),   0, 1, color = 'red')
ax.axvline(df['normal'].median(), 0, 1, color = 'orange')

plt.show()

### Creating a histogram of lognormal observations

In [0]:
# Create histogram (histplot replaces distplot, deprecated since seaborn 0.11)
ax = sb.histplot(df['lognormal'], bins = int(N_obs / 10))

# Label the x-axis with the distribution used to generate the data
ax.set_xlabel("x ~ LOGNOR(10, 0.75)")

# Add vertical lines showing the location of the mean (red) and median (orange)
ax.axvline(df['lognormal'].mean(),   0, 1, color = 'red')
ax.axvline(df['lognormal'].median(), 0, 1, color = 'orange')

plt.show()
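
In the lognormal histogram the mean sits to the right of the median because the distribution is right-skewed. For a lognormal distribution with parameters $\mu$ and $\sigma$, the median is $e^{\mu}$ and the mean is $e^{\mu + \sigma^2/2}$. The sketch below evaluates both for $\mu = 10$, $\sigma = 0.75$ and compares them with a fresh sample; the seed and variable names are illustrative assumptions.

```python
import math
import numpy as np

mu, sigma = 10, 0.75

theoretical_median = math.exp(mu)                 # e^mu
theoretical_mean = math.exp(mu + sigma ** 2 / 2)  # e^(mu + sigma^2 / 2)

# Fresh sample for comparison (seed chosen arbitrarily)
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=mu, sigma=sigma, size=4000)

print("mean:  ", theoretical_mean, sample.mean())
print("median:", theoretical_median, np.median(sample))
```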

### Creating a boxplot comparing the normal and lognormal data

In [0]:
# Draw one box per column of df
ax = sb.boxplot(data = df)

plt.show()
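
A boxplot is drawn from a handful of summary numbers: the quartiles, the interquartile range (IQR), and limits beyond which points are flagged as outliers. With seaborn's default settings the whiskers reach to the most extreme observations within 1.5 × IQR of the quartiles. The sketch below computes these quantities for the small `nums` list used earlier; the names `lower_limit` and `upper_limit` are illustrative.

```python
import numpy as np

nums = [-2, -4, 1, 2, 3, 5, 7, 9]

# Quartiles (NumPy's default linear interpolation)
q1, q2, q3 = np.percentile(nums, [25, 50, 75])
iqr = q3 - q1

# Points outside these limits would be drawn as outliers
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in nums if x < lower_limit or x > upper_limit]

print(q1, q2, q3, iqr)
print(lower_limit, upper_limit, outliers)
```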

### Creating a jointplot showing the relationship between the normal and lognormal observations

In [0]:
# x and y must be passed as keyword arguments in recent seaborn versions
sb.jointplot(x = "normal", y = "lognormal", data = df)

plt.show()

### Creating a jointplot showing the relationship between the lognormal observations and themselves

In [0]:
# Plotting a variable against itself places every point on the diagonal
sb.jointplot(x = "lognormal", y = "lognormal", data = df)

plt.show()

### Covariance and Correlation

To compute the covariance matrix of the DataFrame `df` created earlier, use the `cov()` method from the pandas library

In [0]:
cov1 = df.cov()
cov1

To compute the correlation matrix of the DataFrame `df` created earlier, use the `corr()` method from the pandas library

In [0]:
df.corr()
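
Correlation is covariance rescaled by the two standard deviations, $r_{xy} = s_{xy} / (s_x s_y)$, which is why it always lies between $-1$ and $1$. A minimal sketch of that relationship on a small hypothetical data set (the arrays `x` and `y` are made up for illustration):

```python
import math
import numpy as np

# Hypothetical paired observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

cov_xy = np.cov(x, y)[0, 1]  # sample covariance (divides by N - 1)
s_x = np.std(x, ddof=1)      # sample standard deviations
s_y = np.std(y, ddof=1)

r_manual = cov_xy / (s_x * s_y)
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)
```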

Compute the covariance between the two columns of `df` by hand, without relying on the `cov()` method

In [0]:
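X = df['normal']
Y = df['lognormal']

# Deviations of each observation from its column mean
X_diff = X - st.mean(X)
Y_diff = Y - st.mean(Y)
prod = X_diff * Y_diff

# Sample covariance: sum of cross-products divided by N - 1
cov2 = sum(prod) / (len(X) - 1)

# Difference from the pandas result; should be approximately zero
cov1.loc['normal', 'lognormal'] - cov2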
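
The same check can be made against NumPy's `np.cov`, which also divides by $N - 1$ by default. The sketch below uses freshly generated stand-in arrays (the seed and names are arbitrary) so that it runs on its own.

```python
import math
import numpy as np

# Stand-in data generated the same way as the notebook's columns
rng = np.random.default_rng(7)
x = rng.normal(loc=1, scale=10000, size=4000)
y = rng.lognormal(mean=10, sigma=0.75, size=4000)

# Sample covariance from the definition
cov_manual = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

# Off-diagonal entry of NumPy's 2x2 covariance matrix
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)
```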