# Distributions

__Purpose:__ The purpose of this lecture is to offer a brief overview of distributions. 

__At the end of this lecture you will be able to:__
> 1. Understand the characteristics of some common distributions such as Normal, Binomial, Uniform, and Exponential 

In [None]:
import numpy as np
import pandas as pd
from scipy import stats
import seaborn as sns
import math 
import random
import matplotlib.pyplot as plt
%matplotlib inline

### 1.1.1 Theoretical vs. Empirical Probability Distributions:

__Overview:__ 
-  Using the NBA data, we were able to plot Probability Distributions. These distributions are known as __Empirical Distributions__ since they are observed empirically through a Random Experiment 
- We can contrast __Empirical Distributions__ with __Theoretical Distributions__ which are "special" cases of Probability Distributions that have defined characteristics including: 
> 1. Parameters that define the Probability Density (or Mass) Function 
> 2. Formula for Probability Density (or Mass) Function 
> 3. Formula for Expected Value (Central Tendency) 
> 4. Formula for Variance (Dispersion) 
- One of the primary goals of Statistics is to characterize a given set of univariate data by assuming it follows one of the many Theoretical Probability Distributions. This assumption has consequences: for example, many of the standard inference procedures used with Linear Regression rely on an assumption of Normality 

__Helpful Points:__
1. In the real world, data rarely fits a Theoretical Probability Distribution exactly, but it is often "close enough." There are statistical tests that check whether a series of data is consistent with a specific Probability Distribution at a chosen level of significance 
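
One way to check the "close enough" claim is a goodness-of-fit test. The sketch below uses the Kolmogorov-Smirnov test from `scipy.stats` on simulated data; the sample sizes, the seed, and the choice of a Normal(0, 1) reference are illustrative assumptions and are not taken from the NBA data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed (assumed) so the run is repeatable

# two simulated samples: one truly Normal, one Exponential
normal_sample = rng.normal(loc=0, scale=1, size=1000)
exp_sample = rng.exponential(scale=1, size=1000)

# Kolmogorov-Smirnov test of each sample against a Normal(0, 1) reference
ks_normal = stats.kstest(normal_sample, "norm", args=(0, 1))
ks_exp = stats.kstest(exp_sample, "norm", args=(0, 1))

# a small statistic (large p-value) means the data is consistent with the reference
print(ks_normal.statistic, ks_normal.pvalue)
print(ks_exp.statistic, ks_exp.pvalue)
```

A small p-value suggests the data is unlikely to have come from the reference distribution. If the reference parameters are estimated from the same data, the p-value tends to be optimistic, so treat this as a rough check.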

### 1.1.2 Common Theoretical Probability Distributions:

__Overview:__ 
- There are many Probability Distributions that are used commonly in Statistical Analysis. However, we will highlight the four most common Distributions here: 
> 1. __[Normal Distribution](https://en.wikipedia.org/wiki/Normal_distribution):__ The Normal Distribution (Gaussian Distribution) is the most common Probability Distribution, and many sets of real-world data approximately follow it. Another interesting fact is that, by the Central Limit Theorem, sums and averages of many independent observations look more and more like a Normal Distribution as the number of observations grows. The characteristics of a Normal Distribution include: 
>> a. __Type:__ Normal Distribution is Continuous <br> 
>> b. __Parameters:__ Normal Distribution is characterized by two parameters: (1) Mean ($\mu$) and (2) Variance ($\sigma^2$)<br>
>> c. __Probability Density Function:__ The PDF of a Normal Distribution is: <br> 
<center> $f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ </center> 

>> d. __Expected Value:__ $E[X] = \mu$<br>
>> e. __Variance:__ $V[X] = \sigma^2$<br>
>> f. __Examples:__ Height of men in the United States 

> 2. __[Binomial Distribution](https://en.wikipedia.org/wiki/Binomial_distribution):__ The Binomial Distribution describes data that comes from repeated, independent __[Bernoulli Trials](https://en.wikipedia.org/wiki/Bernoulli_trial)__, which are experiments whose outcome can only be "1" (success) or "0" (failure). Specifically, the Binomial Distribution describes the number of successes ($k$) in $n$ trials, where each trial has probability $p$ of success. The characteristics of a Binomial Distribution include: 
>> a. __Type:__ Binomial Distribution is Discrete <br> 
>> b. __Parameters:__ Binomial Distribution is characterized by two parameters: (1) Number of Trials ($n$) and (2) Probability of Success ($p$)<br>
>> c. __Probability Mass Function:__ The PMF of a Binomial Distribution is: <br> 
<center> $P(X = k \mid n, p) = \binom{n}{k} p^k (1 - p)^{n-k}$ for $k = 0, 1, \dots, n$ </center> 

>> d. __Expected Value:__ $E[X] = np$<br>
>> e. __Variance:__ $V[X] = np(1 - p)$<br>
>> f. __Examples:__ Tossing a coin 10 ($n$) times, where each coin toss is a Bernoulli trial and the coin has probability $p$ of landing on heads. 

> 3. __[Uniform Distribution](https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)):__ Uniform Distribution describes data where each possible value has the same probability of occurring. The characteristics of a Uniform Distribution include: 
>> a. __Type:__ Uniform Distribution is Continuous <br> 
>> b. __Parameters:__ Uniform Distribution is characterized by two parameters: (1) Left Bound ($a$) and (2) Right Bound ($b$)<br>
>> c. __Probability Density Function:__ The PDF of a Uniform Distribution is: <br> 
<center> $f(x \mid a, b) = \frac{1}{b - a}$ for $x \in [a, b]$ and $0$ otherwise</center>

>> d. __Expected Value:__ $E[X] = \frac{1}{2}(a + b)$<br>
>> e. __Variance:__ $V[X] = \frac{1}{12}(b - a)^2$<br>
>> f. __Examples:__ Random sampling from a data set, where each data point has the same chance of being selected  

> 4. __[Exponential Distribution](https://en.wikipedia.org/wiki/Exponential_distribution):__ Exponential Distribution typically describes the time in between events occurring. Exponential Distribution is considered "memoryless" and therefore the history of the distribution has no bearing on predicting what will happen next. The characteristics of an Exponential Distribution include: 
>> a. __Type:__ Exponential Distribution is Continuous <br> 
>> b. __Parameters:__ Exponential Distribution is characterized by one parameter: Rate ($\lambda$)<br>
>> c. __Probability Density Function:__ The PDF of an Exponential Distribution is: <br> 
<center> $f(x \mid \lambda) = \lambda e^{-\lambda x}$ for $x \geq 0$ and $0$ otherwise</center>

>> d. __Expected Value:__ $E[X] = \lambda^{-1}$<br>
>> e. __Variance:__ $V[X] = \lambda^{-2}$<br>
>> f. __Examples:__ Time between visits to the Emergency Room in a hospital can be described by an Exponential Distribution. People may arrive at the Emergency Room according to a Poisson Process (not covered here), in which case the time between patient arrivals is Exponentially Distributed. Moreover, if the average time between patients is 30 minutes and patients have come in during the morning hours at 9:00 am, 9:25 am, 9:26 am, and 10:00 am, the expected wait until the next patient is still 30 minutes, since the distribution has no "memory" of the past. 
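
The memoryless property in the Emergency Room example can be checked by simulation: among gaps that have already lasted longer than some time $s$, the remaining wait has the same average as a fresh gap. The sketch below assumes the 30-minute average gap from the example; the seed, the sample size, and the 20-minute threshold are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for repeatability

mean_gap = 30.0  # average minutes between arrivals (scale = 1 / lambda)
gaps = rng.exponential(scale=mean_gap, size=1_000_000)

already_waited = 20.0  # hypothetical: 20 minutes have passed with no arrival
# keep only gaps longer than 20 minutes and measure the additional wait
remaining = gaps[gaps > already_waited] - already_waited

print(gaps.mean())       # average of all gaps, close to 30
print(remaining.mean())  # average additional wait, also close to 30
```

Both averages come out near 30 minutes, which is what "no memory of the past" means in practice.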

__Helpful Points:__ 
1. We can simulate a Random Variable from any distribution, which means drawing random values whose long-run frequencies follow that distribution's probability density/mass function 
2. Each of the Probability Distributions above has a corresponding Cumulative Distribution Function with associated properties, but these are not shown 
3. Each of the Theoretical Probability Distributions will be shown below 
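
As a sketch of the points above, `scipy.stats` exposes each of the four distributions as an object with methods for the PDF/PMF, the CDF, the theoretical mean and variance, and random sampling. The parameter values below are arbitrary illustrations. Note that SciPy parameterizes the Uniform by `loc = a` and `scale = b - a`, and the Exponential by `scale = 1 / lambda`.

```python
from scipy import stats

# "frozen" distributions with illustrative parameter values
normal = stats.norm(loc=0, scale=1)        # mu = 0, sigma = 1
binomial = stats.binom(n=10, p=0.5)        # n = 10 trials, p = 0.5
uniform = stats.uniform(loc=2, scale=3)    # a = 2, b = 5 (scale = b - a)
exponential = stats.expon(scale=1 / 2)     # lambda = 2 (scale = 1 / lambda)

# density / mass at a point
print(normal.pdf(0))      # height of the Normal PDF at x = 0
print(binomial.pmf(5))    # P(exactly 5 successes in 10 trials)

# cumulative distribution function: P(X <= x)
print(normal.cdf(0))      # 0.5, by symmetry
print(uniform.cdf(3.5))   # (3.5 - a) / (b - a) = 0.5

# theoretical mean and variance, matching the formulas listed earlier
for name, dist in [("normal", normal), ("binomial", binomial),
                   ("uniform", uniform), ("exponential", exponential)]:
    print(name, dist.mean(), dist.var())

# simulating means drawing random values from the distribution
draws = exponential.rvs(size=5, random_state=0)
print(draws)
```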

__Practice:__ Examples of Common Theoretical Probability Distributions in Python 

### Part 1 (Normal Distribution):

### Example 1.1 (Probability Density Function - mean = 0 and var = 1):

In [None]:
# define the parameters of the distribution 
mu = 0 
sigma = 1
# define the number of samples 
num_samples = 100000

In [None]:
# simulate normal distribution 
norm_data = np.random.normal(mu, sigma, num_samples)
norm_data[:10]

The `np.random.normal()` function generates random numbers from a Normal Distribution with mean $\mu = 0$ and standard deviation $\sigma = 1$.

In [None]:
norm_data.mean() # almost 0 

In [None]:
norm_data.var() # almost 1 

In [None]:
# plot the probability distribution 
sns.histplot(norm_data, kde=True, stat="density")  # distplot is deprecated in recent seaborn

This is a special case of a Normal Distribution called a __Standard Normal Distribution__ since it is centered at 0 with a dispersion of 1. 
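
Any Normal variable can be converted to a Standard Normal by subtracting its mean and dividing by its standard deviation, $z = (x - \mu) / \sigma$. A minimal sketch, with $\mu = 10$, $\sigma = 3$, and the seed chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

mu, sigma = 10, 3  # illustrative parameters, not from the lecture
x = rng.normal(mu, sigma, 100_000)

# standardize: subtract the mean, then divide by the standard deviation
z = (x - mu) / sigma

print(z.mean())  # close to 0
print(z.std())   # close to 1
```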

### Example 1.2 (Probability Density Function - mean = -2,0,2 and var = 1):

In [None]:
# define the parameters of the distribution 
mu_1 = 0
mu_2 = -2
mu_3 = 2
sigma = 1
# define the number of samples 
num_samples = 100000

In [None]:
# simulate normal distribution 
norm_data_1 = np.random.normal(mu_1, sigma, num_samples)
norm_data_2 = np.random.normal(mu_2, sigma, num_samples)
norm_data_3 = np.random.normal(mu_3, sigma, num_samples)

In [None]:
# plot the probability distributions 
sns.histplot(norm_data_1, kde=True, stat="density")
sns.histplot(norm_data_2, kde=True, stat="density")
sns.histplot(norm_data_3, kde=True, stat="density")

We can see that by adjusting the mean of the distribution and keeping the variance constant, we are simply shifting the distribution right and left. 

### Example 1.3 (Probability Density Function - mean = 0 and sd = 0.5, 1, 4):

In [None]:
# define the parameters of the distribution 
mu = 0
sigma_1 = 0.5
sigma_2 = 1
sigma_3 = 4
# define the number of samples 
num_samples = 100000

In [None]:
# simulate normal distribution 
norm_data_1 = np.random.normal(mu, sigma_1, num_samples)
norm_data_2 = np.random.normal(mu, sigma_2, num_samples)
norm_data_3 = np.random.normal(mu, sigma_3, num_samples)

In [None]:
# plot the probability distributions 
sns.histplot(norm_data_1, kde=True, stat="density")
sns.histplot(norm_data_2, kde=True, stat="density")
sns.histplot(norm_data_3, kde=True, stat="density")

We can see that by adjusting the spread of the distribution (its standard deviation, and therefore its variance) and keeping the mean constant, we are simply narrowing or widening the distribution around the same center. 
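
Note that the second argument of `np.random.normal()` is the standard deviation $\sigma$, not the variance $\sigma^2$. The sketch below reuses the three spread values from this example and checks that the sample variance lands near $\sigma^2$; the seed is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

results = {}
for sigma in (0.5, 1, 4):
    sample = rng.normal(0, sigma, 100_000)
    # sample std should be near sigma, sample variance near sigma ** 2
    results[sigma] = (sample.std(), sample.var())
    print(sigma, round(sample.std(), 2), round(sample.var(), 2))
```

So a value of 4 passed to `np.random.normal()` corresponds to a variance of about 16, not 4.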

### Example 1.4 (Cumulative Distribution Function - mean = 0 and var = 1):

In [None]:
# define the parameters of the distribution 
mu = 0 
sigma = 1
# define the number of samples 
num_samples = 100000

In [None]:
# simulate normal distribution 
norm_data = np.random.normal(mu, sigma, num_samples)
norm_data[:10]

In [None]:
# plot the empirical cumulative distribution function (distplot is deprecated)
sns.ecdfplot(norm_data)

### Part 2 (Binomial Distribution):

### Example 2.1 (Probability Mass Function - n = 100000 and p = 0.5):

In [None]:
# define the parameters of the distribution 
n = 100000 
p = 0.5
# define the number of samples 
num_samples = 200000

In [None]:
# simulate binomial distribution 
binom_data = np.random.binomial(n, p, num_samples)
binom_data[:10]

The `np.random.binomial()` function generates random numbers from a Binomial Distribution with parameters equal to $n = 100000$ and $p = 0.5$

In [None]:
binom_data.mean() # np = (100000)(0.5) = 50000

In [None]:
binom_data.var() # roughly np(1-p) = (100000)(0.5)(1 - 0.5) = 25000

In [None]:
# plot the probability distribution 
sns.histplot(binom_data, kde=True, stat="density")  # distplot is deprecated in recent seaborn

Notice how it practically looks Normal. This is because we used a large number of trials: as $n$ grows, the Binomial Distribution approaches a Normal Distribution with mean $np$ and variance $np(1 - p)$. 
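
The Normal approximation can be checked numerically by comparing the exact Binomial CDF with the CDF of a Normal that has the same mean $np$ and variance $np(1 - p)$. The sketch below uses the parameters from this example; the evaluation points and the continuity correction of 0.5 are added assumptions.

```python
import numpy as np
from scipy import stats

n, p = 100000, 0.5
mean = n * p                    # np
sd = np.sqrt(n * p * (1 - p))   # sqrt(np(1 - p))

exact = stats.binom(n, p)                 # exact Binomial distribution
approx = stats.norm(loc=mean, scale=sd)   # Normal with matching mean and variance

gaps = []
for k in (49800, 50000, 50200):
    exact_cdf = exact.cdf(k)
    approx_cdf = approx.cdf(k + 0.5)  # +0.5 is a continuity correction (an added refinement)
    gaps.append(abs(exact_cdf - approx_cdf))
    print(k, exact_cdf, approx_cdf)

print(max(gaps))  # largest absolute difference across the three points
```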

### Example 2.2 (Cumulative Distribution Function - n = 100000 and p = 0.5):

In [None]:
# plot the empirical cumulative distribution function (distplot is deprecated)
sns.ecdfplot(binom_data)

### Part 3 (Uniform Distribution):

### Example 3.1 (Probability Density Function - a = 0 and b = 1):

In [None]:
# define the parameters of the distribution 
a = 0 
b = 1
# define the number of samples 
num_samples = 100000

In [None]:
# simulate uniform distribution 
uniform_data = np.random.uniform(a, b, num_samples)
uniform_data[:10]

The `np.random.uniform()` function generates random numbers from a Uniform Distribution with parameters equal to $a = 0$ and $b = 1$
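
Because the Uniform density is flat, the probability of landing in any sub-interval is its length divided by $b - a$. A sketch with an arbitrary sub-interval $[0.2, 0.5]$ and an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed

a, b = 0, 1
sample = rng.uniform(a, b, 100_000)

c, d = 0.2, 0.5  # illustrative sub-interval of [a, b]
empirical = np.mean((sample >= c) & (sample <= d))  # share of draws inside [c, d]
theoretical = (d - c) / (b - a)                     # interval length over total length

print(empirical, theoretical)
```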

In [None]:
uniform_data.mean() # (a + b)/2 = (0 + 1)/2

In [None]:
uniform_data.var() # (b - a)^2/12 = (1 - 0)^2/12

In [None]:
# plot the probability distribution 
sns.histplot(uniform_data, kde=True, stat="density")  # distplot is deprecated in recent seaborn

### Example 3.2 (Cumulative Distribution Function - a = 0 and b = 1):

In [None]:
# plot the empirical cumulative distribution function (distplot is deprecated)
sns.ecdfplot(uniform_data)

### Part 4 (Exponential Distribution):

### Example 4.1 (Probability Density Function - beta = 1/2 OR lambda = 2):

In [None]:
# define the parameters of the distribution 
beta = 0.5 
# define the number of samples 
num_samples = 100000

In [None]:
# simulate exponential distribution 
exp_data = np.random.exponential(scale = beta, size = num_samples)
exp_data[:10]

The `np.random.exponential()` function generates random numbers from an Exponential Distribution. It takes the scale $\beta = 1/\lambda$ rather than the rate, so $\beta = 0.5$ here is equivalent to $\lambda = 2$.
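
Since NumPy expects the scale $\beta = 1/\lambda$, a rate has to be inverted before sampling. A small sketch; the helper name `sample_exponential_by_rate` is hypothetical and not part of NumPy, and the seed is arbitrary.

```python
import numpy as np

def sample_exponential_by_rate(rate, size, rng):
    """Draw Exponential samples given the rate lambda (hypothetical helper)."""
    return rng.exponential(scale=1 / rate, size=size)

rng = np.random.default_rng(4)  # arbitrary seed
lam = 2
sample = sample_exponential_by_rate(lam, 100_000, rng)

print(sample.mean())  # close to 1 / lambda = 0.5
print(sample.var())   # close to 1 / lambda^2 = 0.25
```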

In [None]:
exp_data.mean() # mu = beta = 0.5

In [None]:
exp_data.var() # var = beta^2 = 0.25 

In [None]:
# plot the probability distribution 
sns.histplot(exp_data, kde=True, stat="density")  # distplot is deprecated in recent seaborn

### Example 4.2 (Probability Density Function - beta = 3, 5 OR lambda = 1/3, 1/5):

In [None]:
# define the parameters of the distribution 
beta_1 = 3
beta_2 = 5
# define the number of samples 
num_samples = 100000

In [None]:
# simulate exponential distribution 
exp_data_1 = np.random.exponential(scale = beta_1, size = num_samples)
exp_data_2 = np.random.exponential(scale = beta_2, size = num_samples)

In [None]:
# plot the probability distributions 
sns.histplot(exp_data_1, kde=True, stat="density")
sns.histplot(exp_data_2, kde=True, stat="density")

### Example 4.3 (Cumulative Distribution Function - beta = 0.5 OR lambda = 2):

In [None]:
# define the parameters of the distribution 
beta = 0.5 
# define the number of samples 
num_samples = 100000

In [None]:
# simulate exponential distribution 
exp_data = np.random.exponential(scale = beta, size = num_samples)
exp_data[:10]

In [None]:
# plot the empirical cumulative distribution function (distplot is deprecated)
sns.ecdfplot(exp_data)

## 1.2 Bivariate Data:

### 1.2.1 What is Bivariate Data? 

__Overview:__ 
- So far, all of the examples have been using __Univariate Data__ which is simply one series (i.e. depicting one variable) 
- However, it is more common to have multiple series of data and to be interested in the relationship between them 
- __Bivariate Data:__ Bivariate Data consists of data from two variables and each value from each variable is paired up with the other variable
> - The goal of Bivariate Data is to investigate the association or the relationship between the two variables. This can be accomplished by simply looking at a graph of the two variables or using statistics that are designed to measure the association (see next section) 
> - Examples of Bivariate Data from the NBA data set include a team's Field Goal Attempts and a team's Steals. We may be interested in knowing whether a team's steals are associated with the number of shots it attempts (i.e. the team steals the ball in its defensive end, goes back the other way, and takes a shot) 

__Helpful Points:__
1. In some cases, we name each of the two variables as:
> __[Dependent Variable](https://en.wikipedia.org/wiki/Dependent_and_independent_variables#Statistics):__ The Dependent Variable represents the outcome or the variable that is being studied <br>
> __[Independent Variable](https://en.wikipedia.org/wiki/Dependent_and_independent_variables#Statistics):__ The Independent Variable represents the inputs or potential reasons for the outcome
2. The characterization of Bivariate Data into Dependent and Independent Variables is especially useful for Regression Analysis (not covered here) 
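
To make the idea of paired values concrete, the sketch below builds a synthetic bivariate data set in which each row is one $(x, y)$ pair and then computes the Pearson correlation coefficient, one common summary of linear association. The column names, the seed, and the relationship between `x` and `y` are invented for illustration and are not drawn from the NBA data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)  # arbitrary seed

# synthetic paired observations: each row is one (x, y) pair
x = rng.normal(50, 10, 500)          # hypothetical independent variable
y = 2 * x + rng.normal(0, 10, 500)   # hypothetical dependent variable: built from x plus noise

pairs = pd.DataFrame({"x": x, "y": y})
print(pairs.head())

# Pearson correlation: values near +1 or -1 indicate a strong linear association
r = np.corrcoef(pairs["x"], pairs["y"])[0, 1]
print(r)
```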

__Practice:__ Examples of Graphing Bivariate Data in Python 

In [None]:
# read in data to analyze 
nba_df = pd.read_csv("NBA_GameLog_2010_2017.csv")
tor_2016_2017 = nba_df.loc[(nba_df["Season"] == 2017) & (nba_df["Team"] == "TOR")]
tor_2016_2017_home = tor_2016_2017.loc[tor_2016_2017["Home"] == 1]

### Example 1 (Graphing Field Goals Made vs. Steals):

In [None]:
nba_df.columns

In [None]:
# scatter plot of the Tm.FGM and Tm.STL columns; low alpha makes overlapping points visible 
plt.scatter(nba_df.loc[:, "Tm.FGM"], nba_df.loc[:, "Tm.STL"], alpha=0.05)
plt.xlabel("Tm.FGM")
plt.ylabel("Tm.STL")

### Example 2 (Graphing Tm Points vs. Attendance):

In [None]:
# scatter plot of the TM Points and Attendance 
plt.scatter(tor_2016_2017_home.loc[:, "Tm.Pts"], tor_2016_2017_home.loc[:, "Home.Attendance"])
plt.xlabel("Tm.Pts")
plt.ylabel("Home.Attendance")