Beata Sirowy

# Bayes theorem in practice: distributions

Based on: Downey, A. (2021) _Think Bayes_

In this tutorial we’ll use Pmf objects, which represent a “__probability mass function__”, to solve some more challenging problems and take one more step toward Bayesian statistics. But first, we’ll start with distributions.


### Distributions
In statistics a distribution is a set of possible outcomes and their corresponding
probabilities. 
- For example, if you toss a coin, there are two possible outcomes
with approximately equal probability. If you roll a 6-sided die, the set of possible
outcomes is the numbers 1 to 6, and the probability associated with each
outcome is 1/6.
- To represent distributions, we’ll use a library called empiricaldist. An
“empirical” distribution is based on data, as opposed to a theoretical distribution.
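
To make the empirical/theoretical distinction concrete, here is a minimal sketch in plain Python rather than empiricaldist. It assumes a fair die and uses simulated rolls from the standard random module as a stand-in for real data. The helper names theoretical_die and empirical_die are hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction


def theoretical_die(sides=6):
    """Theoretical distribution: each face has probability exactly 1/sides."""
    return {face: Fraction(1, sides) for face in range(1, sides + 1)}


def empirical_die(n_rolls, sides=6, seed=0):
    """Empirical distribution: relative frequency of each face in simulated rolls."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    counts = Counter(rng.randint(1, sides) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in sorted(counts)}


theory = theoretical_die()
data = empirical_die(6000)
for face in theory:
    # Empirical frequencies land near 1/6 but rarely exactly on it
    print(face, float(theory[face]), round(data.get(face, 0.0), 3))
```

With more rolls, the empirical frequencies tend to get closer to the theoretical value of 1/6.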

### Probability Mass Functions
If the outcomes in a distribution are discrete, we can describe the distribution
with a probability mass function, or PMF, which is a function that maps from
each possible outcome to its probability.

- empiricaldist provides a class called Pmf that represents a probability
mass function.
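
Because a PMF is just a mapping from outcomes to probabilities, it can help to see one as a plain dictionary before using the library. The sketch below assumes exact Fraction probabilities and checks the two properties every PMF must satisfy: each probability is non-negative, and the probabilities sum to 1. is_valid_pmf is a hypothetical helper, not part of empiricaldist.

```python
from fractions import Fraction


def is_valid_pmf(pmf):
    """Return True if pmf maps outcomes to non-negative probabilities summing to 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1


coin_dict = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
die_dict = {face: Fraction(1, 6) for face in range(1, 7)}
bad_coin = {"heads": Fraction(3, 4), "tails": Fraction(1, 2)}  # sums to 5/4

print(coin_dict["heads"])       # 1/2: look up an outcome, get its probability
print(is_valid_pmf(coin_dict))  # True
print(is_valid_pmf(die_dict))   # True
print(is_valid_pmf(bad_coin))   # False
```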

```python
from empiricaldist import Pmf

coin = Pmf()
coin['heads'] = 1/2
coin['tails'] = 1/2
coin
```

| | probs |
|---|---|
| heads | 0.5 |
| tails | 0.5 |


Calling Pmf() with no arguments creates an empty Pmf with no outcomes. Then we can add new outcomes using the bracket operator. In this example, the two outcomes are represented with strings, and they have the same probability, 0.5.

- You can also make a Pmf from a sequence of possible outcomes.
The following example uses Pmf.from_seq to make a Pmf that represents a
6-sided die.

```python
die = Pmf.from_seq([1, 2, 3, 4, 5, 6])
die
```

| | probs |
|---|---|
| 1 | 0.166667 |
| 2 | 0.166667 |
| 3 | 0.166667 |
| 4 | 0.166667 |
| 5 | 0.166667 |
| 6 | 0.166667 |


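In this example, all outcomes in the sequence appear once, so they all have the same probability, 1/6.

More generally, outcomes can appear more than once; then each outcome's probability is the fraction of the sequence it accounts for. In the following example, note that uppercase A and lowercase a count as different outcomes: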

```python
letters = Pmf.from_seq(list('Alabama'))
letters
```

| | probs |
|---|---|
| A | 0.142857 |
| a | 0.428571 |
| b | 0.142857 |
| l | 0.142857 |
| m | 0.142857 |

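The arithmetic behind these probabilities is counting how often each outcome appears and dividing by the length of the sequence. The sketch below reproduces it with collections.Counter and exact fractions; pmf_from_seq is a hypothetical stand-in for illustration, not the empiricaldist implementation. Outcomes are sorted here for readability.

```python
from collections import Counter
from fractions import Fraction


def pmf_from_seq(seq):
    """Map each distinct item in seq to (number of occurrences) / len(seq)."""
    counts = Counter(seq)
    total = len(seq)
    return {q: Fraction(counts[q], total) for q in sorted(counts)}


die_sketch = pmf_from_seq([1, 2, 3, 4, 5, 6])
letters_sketch = pmf_from_seq("Alabama")

print(die_sketch[1])        # 1/6
print(letters_sketch["a"])  # 3/7: 'a' appears 3 times among 7 letters
print(letters_sketch["A"])  # 1/7: uppercase 'A' is a separate outcome
```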

The Pmf class inherits from a pandas Series, so anything you can do with a
Series, you can also do with a Pmf.
- For example, you can use the bracket operator to look up a quantity and get the
corresponding probability:


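```python
letters['l']
```

```
np.float64(0.14285714285714285)
```

Slicing works as it does for a Series. Because the outcomes here are strings, an integer slice selects by position, so letters[2:5] returns the third through fifth outcomes:

```python
letters[2:5]
```

```
b    0.142857
l    0.142857
m    0.142857
Name: , dtype: float64
```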

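You can also call a Pmf as if it were a function, passing a quantity or a sequence of quantities in parentheses. Unlike the bracket operator, this returns 0 for a quantity that is not in the distribution (here, 7) instead of raising an error:

```python
die([1, 4, 7])
```

```
array([0.16666667, 0.16666667, 0.        ])
```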

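Because a Pmf is a Series, looking up a missing quantity with square brackets raises a KeyError, whereas calling the Pmf returns 0. The sketch below mimics that contrast with a hypothetical SimplePmf dictionary subclass; it only illustrates the idea and is not how empiricaldist is implemented.

```python
class SimplePmf(dict):
    """Hypothetical dict-based PMF: [] is strict, () is forgiving."""

    def __call__(self, qs):
        # Lists and tuples are treated as sequences of outcomes;
        # anything else (including a string) is treated as one outcome.
        if isinstance(qs, (list, tuple)):
            return [self.get(q, 0) for q in qs]
        return self.get(qs, 0)


die_simple = SimplePmf({face: 1 / 6 for face in range(1, 7)})

print(die_simple(4))          # 0.16666666666666666
print(die_simple([1, 4, 7]))  # [0.16666666666666666, 0.16666666666666666, 0]

try:
    die_simple[7]
except KeyError:
    print("die_simple[7] raises KeyError")
```
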
The quantities in a Pmf can be strings, numbers, or any other type that can be
stored in the index of a pandas Series.
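
Outcomes do not have to be single values. As a plain-Python sketch (a dict rather than a Pmf, so it does not show how a pandas index would store these keys), the outcomes of rolling two dice can be stored as (first, second) tuples, each with probability 1/36. Probabilities of events are then sums over the matching outcomes.

```python
from fractions import Fraction
from itertools import product

# Each outcome is a (first die, second die) tuple; all 36 are equally likely
two_dice = {roll: Fraction(1, 36) for roll in product(range(1, 7), repeat=2)}

print(len(two_dice))     # 36
print(two_dice[(3, 4)])  # 1/36

# Probability that the two faces add up to 7: six of the 36 pairs
p_seven = sum(p for (a, b), p in two_dice.items() if a + b == 7)
print(p_seven)           # 1/6
```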

### The Cookie Problem Revisited