# Git, Python, and Probability Primer
## Introduction to Quantified Cognition
By: Per B. Sederberg, PhD

<a href="https://colab.research.google.com/github/compmem/QuantCog/blob/2020_Spring/notebooks/02_Python_and_Probability.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>

*Note: If you are brand new to Python, you should have reviewed for homework some of the suggested introductions from the first class.*

It's not too late!

https://github.com/compmem/QuantCog

## Git and GitHub

[Git](https://git-scm.com/) is a distributed version control system.

[GitHub](https://github.com/) is a website that allows for sharing repositories created and managed with Git.

## Quick Git Demo

Let's open up a terminal and try some git!

1. Create a repo
2. Add/Modify a file
3. Commit the file
4. Repeat steps 2 and 3

## Jupyter Notebooks

[Jupyter](https://jupyter.org/) notebooks provide an interactive tool for creating documents with code, visualizations, and narrative.

It's like a computerized lab notebook for all your analyses.

## Quick Jupyter Demo

Colab vs. local (via Anaconda)

## A short intro to Python

The Python programming language was created by Guido van Rossum, who stepped down as Benevolent Dictator For Life in 2018 and was replaced by an elected steering council.

Python has a set of core principles provided by the Zen of Python, which are available within Python itself (so Zen, right?!):

In [None]:
import this

## Python can do pretty much anything we need

- Write programs to run experiments

- Computational modeling (cognitive models, neural networks, simulation studies)

- Data processing from start to finish

  - Reading and parsing log files
  
  - Data preprocessing (esp. important for EEG and fMRI) and filtering
  
  - Statistical analyses

- Produce papers, presentations, websites...

- Just about anything else you can think of

  - Games, graphics, audio apps, even cell phone and tablet apps


## Objects everywhere!

Python is an object-oriented language and supports (though does not require) an object-oriented programming model. This simply means it makes it easy to create new objects, inheriting features from other objects.
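
As a minimal sketch of what inheriting features can look like, the hypothetical classes below (`Animal` and `Dog` are illustrative names, not part of the course code) define a child class that reuses some behavior from its parent and overrides one method.

```python
class Animal:
    """A hypothetical base class used only for illustration."""

    def __init__(self, name):
        self.name = name          # attribute stored on each instance

    def speak(self):
        return f"{self.name} makes a sound."

    def describe(self):
        return f"{self.name} is a {type(self).__name__}."


class Dog(Animal):
    """Dog inherits __init__ and describe from Animal and overrides speak."""

    def speak(self):
        return f"{self.name} says woof."


rex = Dog("Rex")
print(rex.speak())               # uses the overridden method
print(rex.describe())            # uses the method inherited from Animal
print(isinstance(rex, Animal))   # a Dog is also an Animal
```

Nothing here requires an object-oriented style; the same behavior could be written with plain functions, which is why Python supports the model without requiring it.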

## What is an object?

Everything (yes, everything) in Python is an object (or an instance of an object), which is really useful!

- Objects have *attributes* that tell us about that object instance
- Objects can have *methods*, which are functions that the object can perform
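
A small illustration of the difference, using a built-in complex number (the variable names are chosen here only for the example): attributes are read without parentheses, while methods are called with them.

```python
z = 3 + 4j                 # a complex number object

# attributes: data stored on the instance
real_part = z.real
imag_part = z.imag

# method: a function attached to the object, called with parentheses
z_conj = z.conjugate()

print(real_part, imag_part, z_conj)
```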


## What is an instance?

When you initialize an object, you create an *instance* of it, taking up memory in the computer. 

Thus, defining a variable simply points the variable's name (in the current namespace) to a chunk of memory containing that object instance.

Python keeps track of all object instances and cleans them up when they are no longer needed.
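
One consequence of names pointing to instances is that two names can refer to the same object. The sketch below (with arbitrary variable names) shows that assignment does not copy a list, whereas `list(...)` creates a new instance.

```python
a = [1, 2, 3]
b = a                # b is a second name for the same list object, not a copy
b.append(4)

print(a)             # the change is visible through both names
print(a is b)        # both names point to one instance

c = list(a)          # an explicit copy creates a new instance
print(c == a)        # equal contents
print(c is a)        # but a different object in memory
```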

In [None]:
# let's make an int object instance
x = 42
type(x)

In [None]:
# let's explore x
# (press tab after x.<tab>)


In [None]:
y = 'The answer to the ultimate question.'
type(y)

In [None]:
# let's explore y
# (press tab after y.<tab>)
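
Outside an interactive session, where tab completion is not available, the built-in `dir()` function gives a similar listing. The sketch below filters out the underscore-prefixed names and then tries one method on each object.

```python
x = 42
y = 'The answer to the ultimate question.'

# public (non-underscore) attribute and method names of each object
x_names = [name for name in dir(x) if not name.startswith('_')]
y_names = [name for name in dir(y) if not name.startswith('_')]

print(x_names)
print(y_names[:5])

# try a couple of the methods that the listing reveals
print(x.bit_length())    # number of bits needed to represent 42
print(y.upper())
```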


## A bit about namespaces

It's usually a very good idea to keep a clean namespace. Given that Python is designed to be highly *extensible*, the base functionality is small and there are tons of useful modules (i.e., libraries) you can import. 

Most have naming conventions that help you keep track of where your methods are coming from:

```python
import numpy as np
```
and not
```python
from numpy import *
```
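
To see why the prefix matters, here is a small sketch using two standard-library modules, `math` and `cmath`, which both define a function named `sqrt`. They stand in for `numpy` here only to keep the example dependency-free.

```python
import math
import cmath

# both modules define sqrt, and the prefix tells us which one runs
real_root = math.sqrt(4)         # real-valued square root
complex_root = cmath.sqrt(-4)    # complex-valued square root
print(real_root, complex_root)

# after star imports from both modules, the bare name sqrt would
# silently refer to whichever module was imported last
try:
    math.sqrt(-4)
except ValueError as err:
    print("math.sqrt rejects negative input:", err)
```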

## Basics of Probability



### A   
$P(A) \in [0, 1]$

### not A
$P(\neg A) = 1 - P(A)$

### A or B 
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ or <br> $P(A \cup B) = P(A) + P(B)$ if A and B are mutually exclusive

### A and B
$P(A \cap B) = P(A \mid B) P(B) = P(B \mid A) P(A)$ or <br> $P(A \cap B) = P(A) P(B)$ if A and B are independent

### A given B 
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A) P(A)}{P(B)}$
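
These rules can be checked by counting equally likely outcomes. The sketch below uses a standard 52-card deck as an illustrative example (the events and the helper `prob` are chosen here, not taken from the course materials) and exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# a standard 52-card deck as (rank, suit) pairs
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = list(product(ranks, suits))

def prob(event):
    """Probability of a set of equally likely cards."""
    return Fraction(len(event), len(deck))

A = {card for card in deck if card[1] == 'hearts'}          # card is a heart
B = {card for card in deck if card[0] in ('J', 'Q', 'K')}   # card is a face card

p_not_a = 1 - prob(A)                         # not A
p_a_or_b = prob(A) + prob(B) - prob(A & B)    # A or B
p_a_given_b = prob(A & B) / prob(B)           # A given B

print(p_not_a)        # 3/4
print(p_a_or_b)       # 11/26
print(p_a_given_b)    # 1/4
```

Here $P(A \mid B) = P(A)$, so suit and face-card status are independent and $P(A \cap B) = P(A) P(B)$ holds.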






## Simple examples

Using the rules above, let's try and figure out the probabilities of the following:

1. Rolling a 5 on a 6-sided die.
2. Not rolling a 3 on a 6-sided die.
3. Rolling a 4 or a 5 on a 6-sided die.
4. Rolling less than 4 or an even number on a 6-sided die.
5. Rolling two 3's in a row on a 6-sided die.


## Harder question

### The case of Tim Tebow

What is the probability of becoming a Major League Baseball (MLB) player if you hit a home run (HR) in your first at-bat in the minor leagues? We have the following important information:

a) 0.5% of future MLB players hit a home run in their first minor league at-bat.

b) 1% of minor league players make it to MLB.

c) Only 0.1% of players hit a home run in their first minor league at-bat.

Hint: You are trying to solve $P(MLB \mid HR)$.

*Bonus*: What additional information do we need to know in order to calculate the probability a player doesn't make it to MLB, but hit a home run in their first minor league at-bat?
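
Once you have tried the main question by hand, the sketch below is one way to check your answer. It applies the "A given B" rule directly to the three numbers given; the helper name `bayes` is chosen here for illustration.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A | B) = P(B | A) P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_hr_given_mlb = Fraction(5, 1000)   # (a) 0.5% of future MLB players
p_mlb = Fraction(1, 100)             # (b) 1% make it to MLB
p_hr = Fraction(1, 1000)             # (c) 0.1% of all players

p_mlb_given_hr = bayes(p_hr_given_mlb, p_mlb, p_hr)
print(p_mlb_given_hr, float(p_mlb_given_hr))
```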

## Probability Distributions

At the core of probability theory are the mathematical functions determining the probability of the potential outcomes of an experiment. 

In statistics, these distributions represent the models that attempt to describe the observed data. The equations take in parameters that determine the shape of the probability distributions.

Let's explore some continuous and discrete probability distributions relevant to quantifying data and models!!!

## *ONLY* if on Google Colab


In [None]:
# to retrieve the dists.py file
!wget https://raw.githubusercontent.com/compmem/QuantCog/2020_Spring/notebooks/dists.py

## Exploring probability distributions

In [None]:
# load matplotlib inline mode
%matplotlib inline

# import some useful libraries
import numpy as np                # numerical analysis and linear algebra
import pandas as pd               # efficient tables
import matplotlib.pyplot as plt   # plotting
import ipywidgets as widgets      # interactive widgets

# local code wrapping scipy distributions
import dists

In [None]:
# function to help plot a PDF
def plot_pdf(dist, support=(-5, 5), npoints=100):
    # set a range of linearly-spaced points
    x = np.linspace(support[0], support[1], npoints)
    
    # evaluate the pdf at those points
    pdf = dist.pdf(x)
    
    # plot the results
    plt.plot(x, pdf, lw=3)
    plt.xlabel('Value')
    plt.ylabel('Probability density')

### [Uniform](https://en.wikipedia.org/wiki/Uniform_distribution_(continuous))

A continuous probability distribution assigning equal probability over a range.

What happens when we change the range?


In [None]:
# plot the PDF
plot_pdf(dists.uniform(lower=-2, upper=2))
plot_pdf(dists.uniform(lower=-1, upper=1))
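
A sketch of what to expect, written with a plain function rather than the course's `dists` module (`uniform_pdf` is a name chosen here): the height of the density is $1 / (\text{upper} - \text{lower})$, so narrowing the range raises the height while the total area stays at 1.

```python
def uniform_pdf(x, lower, upper):
    """Density of a continuous uniform distribution on [lower, upper]."""
    if lower <= x <= upper:
        return 1.0 / (upper - lower)
    return 0.0

for lower, upper in [(-2, 2), (-1, 1), (0, 0.5)]:
    height = uniform_pdf((lower + upper) / 2, lower, upper)
    area = height * (upper - lower)
    print(f"[{lower}, {upper}]: height = {height}, area = {area}")
```

The last range shows that a density can exceed 1; it is the area under the curve, not the height, that is a probability.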


### [Normal/Gaussian distribution](https://en.wikipedia.org/wiki/Normal_distribution)

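Important in statistics due to the Central Limit Theorem: the mean of many independent samples tends toward a normal distribution, whatever the distribution of the individual samples (provided it has finite variance).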
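
A minimal simulation sketch of that theorem, using only the standard library: means of samples drawn from a uniform distribution on [0, 1] cluster around 1/2 with a spread close to the theoretical value, and roughly 68% fall within one standard deviation, as a normal distribution predicts. The sample sizes and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)   # fixed seed so the sketch is reproducible

n = 30           # size of each sample
n_means = 5000   # number of sample means to collect

# each entry is the mean of n draws from a uniform distribution on [0, 1]
sample_means = [
    statistics.fmean(random.random() for _ in range(n))
    for _ in range(n_means)
]

# theory: mean 1/2 and standard deviation sqrt(1/12) / sqrt(n)
expected_sd = (1 / 12) ** 0.5 / n ** 0.5

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)

# fraction of sample means within one theoretical standard deviation of 1/2
within_one_sd = sum(abs(m - 0.5) <= expected_sd for m in sample_means) / n_means

print(observed_mean, observed_sd, expected_sd, within_one_sd)
```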

In [None]:
# plot the PDF
plot_pdf(dists.normal(mean=0, std=1))
plot_pdf(dists.normal(mean=1, std=2), support=[-5, 10])


### [Beta](https://en.wikipedia.org/wiki/Beta_distribution)

Has support only between 0 and 1, which makes it useful for describing uncertainty about a probability (the probability of a probability).

We'll spend some time with Beta distributions in subsequent classes.

In [None]:
# plot the PDF
plot_pdf(dists.beta(alpha=0.5, beta=0.5), support=[0,1])
plot_pdf(dists.beta(alpha=2, beta=5), support=[0,1])
plot_pdf(dists.beta(alpha=20, beta=10), support=[0,1])
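
To connect the plots to the formula, here is a sketch of the Beta density written with the standard library (the function names are chosen here, and this is not how `dists.beta` is implemented). It evaluates the density at 0.5 and the mean $\alpha / (\alpha + \beta)$ for the three parameter settings plotted above.

```python
import math

def beta_pdf(x, alpha, beta):
    """Density of a Beta(alpha, beta) distribution at 0 < x < 1."""
    # log of the normalizing constant B(alpha, beta)
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_pdf = (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x) - log_norm
    return math.exp(log_pdf)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

for alpha, beta in [(0.5, 0.5), (2, 5), (20, 10)]:
    print(alpha, beta, beta_mean(alpha, beta), beta_pdf(0.5, alpha, beta))
```

Larger values of `alpha` and `beta` concentrate the density around the mean, which is why the third curve is the narrowest.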


### [Gamma](https://en.wikipedia.org/wiki/Gamma_distribution)

Only has positive support. 

Other common distributions are special cases of the Gamma distribution (e.g., the Exponential and Chi-squared distributions).

In [None]:
# Gamma Distribution
plot_pdf(dists.gamma(alpha=0.5, beta=0.5), support=[-1,10])
plot_pdf(dists.gamma(alpha=10.0, beta=5), support=[-1,10])
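
The special cases can be checked numerically. The sketch below assumes the shape/rate parameterization (`alpha` as shape, `beta` as rate); whether `dists.gamma` follows the same convention is an assumption worth verifying in `dists.py`. Under that convention, Exponential(rate) equals Gamma(1, rate) and Chi-squared with $k$ degrees of freedom equals Gamma($k/2$, $1/2$).

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density at x > 0 with shape alpha and rate beta (assumed convention)."""
    log_pdf = (alpha * math.log(beta) + (alpha - 1) * math.log(x)
               - beta * x - math.lgamma(alpha))
    return math.exp(log_pdf)

def exponential_pdf(x, rate):
    """Exponential density at x >= 0."""
    return rate * math.exp(-rate * x)

def chi2_pdf(x, k):
    """Chi-squared density at x > 0 with k degrees of freedom."""
    return (x ** (k / 2 - 1) * math.exp(-x / 2)) / (2 ** (k / 2) * math.gamma(k / 2))

x = 1.7   # an arbitrary evaluation point
print(gamma_pdf(x, 1, 2.0), exponential_pdf(x, 2.0))    # the two values should match
print(gamma_pdf(x, 3 / 2, 1 / 2), chi2_pdf(x, 3))       # the two values should match
```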

### And more...

Take some time now to explore:

- Inverse Gamma
- Exponential
- Student's t
- Half Cauchy

Make note of how the parameters affect the shape and the support of each distribution.

Refer to their respective pages on Wikipedia for additional information, and try to replicate the illustrative plots on those pages.