 # Section 2.4 Probability Mass and Density Function

In [2]:
import numpy as np
import pandas as pd
import pymc3 as pm
from scipy import stats
import scipy
import arviz as az
import matplotlib.pyplot as plt



In [3]:
az.style.use("arviz-darkgrid")
RANDOM_SEED = 8265
np.random.seed(RANDOM_SEED)

# Probability Mass and Density Function
If we have a distribution, how can we describe how the probability is distributed?

# Let's define a distribution
We'll use a Bernoulli distribution, which describes the probability of one outcome in a situation with exactly two possible outcomes. For example, the probability that

* a coin toss comes up heads
* the person of your dreams says yes to a dinner date proposal
* you eat a salad today

Two outcomes do not mean a 50/50 chance!

# Let's computationally define a distribution
We'll specify a 0.3 chance of observing a *1*, which implicitly means a 0.7 chance of observing a *0*.

In [4]:
p = 0.3
bern = stats.bernoulli(p)

# Let's draw 10000 random samples and see what we get

In [7]:
num_samples = 10000
samples = bern.rvs(size=num_samples)
# Fraction of samples equal to 1
samples.mean()

0.3066

Unsurprisingly, we get a number very close to 0.3.

# But what if we want to go the other way around?
What if we were given an observation *1* and wanted to know how much of the distribution's mass is associated with 1?

In [11]:
bern.pmf(1), bern.pmf(0)

(0.3, 0.7)

We'd use the probability mass function (PMF) to tell us. In this case we can see that 1 is less likely than 0, and by how much.
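As a quick sanity check, a probability mass function assigns a mass to every possible outcome, and those masses add up to 1. The sketch below rebuilds the same Bernoulli distribution and compares the PMF with the frequencies observed in a fresh sample. The names `outcomes`, `masses`, `observed`, and `ratio`, and the fixed `random_state=0`, are our own choices for illustration only.

```python
import numpy as np
from scipy import stats

p = 0.3
bern = stats.bernoulli(p)

# A Bernoulli distribution has exactly two possible outcomes
outcomes = np.array([0, 1])
masses = bern.pmf(outcomes)

# The masses over all possible outcomes add up to 1
total_mass = masses.sum()

# Observed frequencies in a sample should land close to the PMF values
samples = bern.rvs(size=10_000, random_state=0)
observed = np.bincount(samples, minlength=2) / samples.size

# How many times more likely a 0 is than a 1
ratio = bern.pmf(0) / bern.pmf(1)
```

Here `ratio` comes out to roughly 2.33, which is the "by how much" part: a 0 is a bit more than twice as likely as a 1.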

# A similar concept exists for continuous distributions
What if we knew that human heights are normally distributed with mean 127 centimeters and standard deviation 20 centimeters?

In [13]:
heights = stats.norm(127, 20)

What is the likelihood we observe someone with height 120 cm?

# Warning: Counterintuitive Concept
The probability of observing any single exact number in a continuous distribution is **0**, whether that be 120cm, 127cm, 4000cm, or 10cm.

This follows from how measure theory defines probability for continuous variables: probability is the area under the density curve over an interval, and a single point has zero width, so it has zero area.

Note though that we don't need the probability of any one observed value. What we need are the relative values, which the probability density function gives us.
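To make the zero-probability claim concrete, we can ask for the probability of an interval instead of a single point. The sketch below uses the cumulative distribution function (`cdf`) of the same normal distribution. The helper `interval_probability` is a hypothetical name we introduce here, not part of scipy.

```python
from scipy import stats

heights = stats.norm(127, 20)


def interval_probability(dist, low, high):
    """P(low < X <= high), computed as a difference of two CDF values."""
    return dist.cdf(high) - dist.cdf(low)


# Probability of a height within half a centimeter of 120 cm
near_120 = interval_probability(heights, 119.5, 120.5)

# Shrinking the interval drives the probability toward 0
narrower = interval_probability(heights, 119.99, 120.01)

# An interval of zero width has exactly zero probability
single_point = interval_probability(heights, 120, 120)
```

An interval one centimeter wide around 120 cm has a probability of a little under 2%, and the probability shrinks with the width of the interval until it reaches 0 at a single point.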

# Getting the Probability Density Function
All we need is a function that takes in a value and tells us the relative likelihood of that value occurring given a distribution.

In [19]:
heights.pdf(127), heights.pdf(100), heights.pdf(4000)

(0.019947114020071634, 0.00801916636709598, 0.0)

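Don't worry about the exact math. Just focus on the intuition of relative likelihood: 127 is more likely than 100, and 4000 is so unlikely that its density underflows to `0.0` in floating point.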
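One way to read these numbers is to divide one density by another. The sketch below does that for the heights distribution, and also shows that a density value is not itself a probability, since it can exceed 1. The second distribution, `narrow`, is a made-up example that does not appear elsewhere in this section.

```python
from scipy import stats

heights = stats.norm(127, 20)

# Ratio of densities: how much more likely a height near 127 cm is
# than a height near 100 cm
relative = heights.pdf(127) / heights.pdf(100)

# Hypothetical narrow distribution, used only to show that a density
# is not a probability: its peak value is greater than 1
narrow = stats.norm(0, 0.1)
peak_density = narrow.pdf(0)
```

With these numbers, `relative` is about 2.49, so heights near the mean are roughly two and a half times as likely as heights near 100 cm.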

# Log PDF
It's just the PDF transformed to the log scale. This is done purely for computational stability. Again, this is a computational detail that is not important to understand up front. We bring it up because you may hear the term logpdf and we don't want it to surprise you.

So be aware that you may see the terms pdf and logpdf. Just translate either in your head as "a thing that gives me the relative likelihood of seeing a certain value".

In [20]:
np.log(heights.pdf(127)), heights.logpdf(127)

(-3.9146708067586635, -3.9146708067586635)
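A minimal sketch of why the log scale helps with stability: multiplying many small densities together quickly underflows to zero, while adding their logs stays a finite number. The 1000 `observations` here are simulated from the same distribution purely for illustration, and `random_state=0` is an arbitrary choice.

```python
import numpy as np
from scipy import stats

heights = stats.norm(127, 20)

# Hypothetical data set: 1000 heights simulated from the distribution
observations = heights.rvs(size=1000, random_state=0)

# Multiplying 1000 small densities underflows to 0.0 in floating point
product_of_densities = np.prod(heights.pdf(observations))

# Summing the log densities carries the same information as a finite number
sum_of_log_densities = np.sum(heights.logpdf(observations))
```

The product is reported as exactly `0.0`, which tells us nothing, whereas the sum of log densities is a large negative but perfectly usable number.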

# Summary
* Probability mass and density functions tell us how the probability of observations is *spread* across a range of possible values
* Log PDF is the same concept, just with a mathematical transform applied
* If we have a possible value, we can use the PMF or PDF to tell us the relative likelihood of that value occurring versus any other value. *That's all you need to take away*
