# Lambda School Data Science Module 123

## Introduction to Bayesian Inference




## Assignment - Code it up!

We used pure math to apply Bayes' theorem to drug tests. Now write Python code to reproduce the results! This is purposefully open-ended: you'll have to think about how you should represent probabilities and events. You can and should look things up.

### 1) Write a function 

`def prob_drunk_given_positive(prob_drunk_prior, false_positive_rate):` 

You should only truly need these two values in order to apply Bayes' theorem. In this example, imagine that individuals are taking a breathalyzer test with an 8% false positive rate, a 100% true positive rate, and that our prior belief about drunk driving in the population is 1/1000. 
 - What is the probability that a person is drunk after one positive breathalyzer test?
 - What is the probability that a person is drunk after two positive breathalyzer tests?
 - How many positive breathalyzer tests are needed in order to have a probability that's greater than 95% that a person is drunk beyond the legal limit?
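
As a worked sketch of the first question, write $D$ for "drunk" and $+$ for a positive test (this notation is an assumption introduced here). Using only the rates stated above:

$$
P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)} = \frac{1.00 \times 0.001}{1.00 \times 0.001 + 0.08 \times 0.999} \approx 0.0124
$$

For a second positive test this posterior becomes the new prior, so $P(\neg D)$ has to be updated to $1 - P(D)$ as well.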





In [0]:
# prb_drunk_postv returns the posterior probability that a driver is drunk,
# given a prior probability of drunk driving and one positive breathalyzer test
def prb_drunk_postv(prb_drunk_prior, rte_false_pstv):
  prb_drunk                 = prb_drunk_prior      # prior probability of drunk driving
  prb_rate_not_drunk        = 1 - prb_drunk_prior  # probability of a non-drunk driver (complement of the prior)
  prb_rate_p_when_drunk     = 1.00                 # positive test rate if drunk (true positive)
  prb_rate_p_when_not_drunk = rte_false_pstv       # positive test rate if not drunk (false positive)

  # Set up the calculation via Bayes' theorem
  numerator   = prb_rate_p_when_drunk * prb_drunk
  denominator = (prb_rate_p_when_drunk * prb_drunk) + (prb_rate_p_when_not_drunk * prb_rate_not_drunk)

  # Calculate the probability of being a drunk driver given a positive test
  bae_stat = numerator / denominator

  return bae_stat


In [13]:
# What is the probability that a person is drunk after one, two, and three positive breathalyzer tests?
# Each posterior is passed back in as the prior for the next test.
tst_1 = prb_drunk_postv(.001, .08)
tst_2 = prb_drunk_postv(tst_1, .08)
tst_3 = prb_drunk_postv(tst_2, .08)

print(f'Probability of being drunk - after 1 positive breathalyzer test - is: {tst_1:.4%}')
print(f'Probability of being drunk - after 2 positive breathalyzer tests - is: {tst_2:.4%}')
print(f'Probability of being drunk - after 3 positive breathalyzer tests - is: {tst_3:.4%}')


Probability of being drunk - after 1 positive breathalyzer test - is: 1.2358%
Probability of being drunk - after 2 positive breathalyzer tests - is: 13.5252%
Probability of being drunk - after 3 positive breathalyzer tests - is: 66.1600%


### 2) Explore `scipy.stats.bayes_mvs`  
Read its documentation, and experiment with it on data you've tested in other ways earlier this week.
 - Create a visualization comparing the results of a Bayesian approach to a traditional/frequentist approach. With a large sample size they should look close to identical; however, take this opportunity to practice visualizing confidence intervals in general. The following are some potential ways that you could visualize confidence intervals on your graph:
  - [Matplotlib Error Bars](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.errorbar.html)
  - [Seaborn barplot with error bars](https://seaborn.pydata.org/generated/seaborn.barplot.html)
  - [Vertical lines to show bounds of confidence interval](https://www.simplypsychology.org/confidence-interval.jpg)
  - [Confidence Intervals on Box Plots](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.axes.Axes.boxplot.html)
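
One possible starting point for this comparison is sketched below. It assumes a synthetic, normally distributed sample (not course data) and compares a t-based confidence interval for the mean with the credible interval for the mean returned by `scipy.stats.bayes_mvs`. The function `plot_intervals` and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Synthetic sample: an assumption for illustration, not data from the course
rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=50)

confidence = 0.95

# Frequentist: t-based confidence interval for the mean
mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean (ddof=1 by default)
freq_low, freq_high = stats.t.interval(confidence, len(sample) - 1, loc=mean, scale=sem)

# Bayesian: credible interval for the mean from bayes_mvs
bayes_mean, bayes_var, bayes_std = stats.bayes_mvs(sample, alpha=confidence)
bayes_low, bayes_high = bayes_mean.minmax


def plot_intervals():
    """Draw both intervals as error bars around their point estimates."""
    # Imported lazily so the interval calculation above needs only numpy/scipy
    import matplotlib.pyplot as plt

    centers = [mean, bayes_mean.statistic]
    lower = [mean - freq_low, bayes_mean.statistic - bayes_low]
    upper = [freq_high - mean, bayes_high - bayes_mean.statistic]
    fig, ax = plt.subplots()
    ax.errorbar([0, 1], centers, yerr=[lower, upper], fmt='o', capsize=6)
    ax.set_xticks([0, 1])
    ax.set_xticklabels(['Frequentist (t)', 'Bayesian (bayes_mvs)'])
    ax.set_ylabel('Estimated mean')
    ax.set_title(f'{confidence:.0%} intervals for the mean')
    return fig
```

Calling `plot_intervals()` in a notebook cell draws the two intervals side by side. For a sample like this one the two intervals should be nearly indistinguishable, which matches the note above about the approaches looking close to identical.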


### 3) In your own words, summarize the difference between Bayesian and Frequentist statistics

In my opinion, the frequentist approach analyzes a problem space as a snapshot in time. For the Monty Hall problem, the frequentist approach would be to analyze a set of three doors in which a cool car is behind one of two closed doors and a goat is behind the third door, which is open. To the frequentist this is a simple problem: there are two closed doors with a desirable prize behind one of them, so there's a 50-50 chance of picking "right".

Bayesian statistics accommodates prior knowledge that affects the problem space. For the Monty Hall problem, it is known that a car is behind one of three doors. However, there is additional knowledge: one of those three doors is open, revealing a goat. Bayesian statistics takes this (prior) knowledge and lets the viewer include it when calculating the probabilities of a desired outcome.
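
The Bayesian update described here can be worked through with exact fractions. This is a minimal sketch under the standard Monty Hall rules, which are assumptions not spelled out above: the contestant picked door 1, the host knows where the car is, always opens a goat door, and chooses at random when both unpicked doors hide goats. The door numbering and variable names are illustrative.

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind any of the three doors
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihood of the evidence "host opens door 3" given the car's location,
# assuming the contestant picked door 1
likelihood = {
    1: Fraction(1, 2),  # host could open door 2 or door 3
    2: Fraction(1),     # host is forced to open door 3
    3: Fraction(0),     # host never reveals the car
}

# Total probability of the evidence (the denominator in Bayes' theorem)
evidence = sum(prior[door] * likelihood[door] for door in prior)

# Posterior probability of the car's location after seeing the goat
posterior = {door: prior[door] * likelihood[door] / evidence for door in prior}

for door, prob in posterior.items():
    print(f'P(car behind door {door} | host opens door 3) = {prob}')
```

Under these assumptions the posterior is 1/3 for the originally chosen door and 2/3 for the other closed door.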

## Resources

- [Worked example of Bayes rule calculation](https://en.wikipedia.org/wiki/Bayes'_theorem#Examples) (helpful as it fully breaks out the denominator)
- [Source code for mvsdist in scipy](https://github.com/scipy/scipy/blob/90534919e139d2a81c24bf08341734ff41a3db12/scipy/stats/morestats.py#L139)

## Stretch Goals:

- Go back and study the content from Modules 1 & 2 to make sure that you're really comfortable with them.
- Apply a Bayesian technique to a problem you previously worked on (in an assignment or project) from a frequentist (standard) perspective
- Check out [PyMC3](https://docs.pymc.io/) (note this goes beyond hypothesis tests into modeling) - read the guides and work through some examples
- Take PyMC3 further - see if you can build something with it!