<a href="https://colab.research.google.com/github/americano-diana/neuronotes/blob/main/probabilistics_overview.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Topic 1: Stats - Bayesian & Frequentist approaches (but mostly Bayesian)

I begin this series of notebooks as someone who is deeply interested in understanding the maths behind probabilistic statistics and modeling techniques. I also find it very useful to visualize and test with code the topics that I learn. Thus this series of "NeuroNotes" notebooks was born. As a disclaimer, I am a master's student who is solidifying their own learning. If there are any mistakes, please do let me know and I will correct them right away. I hope other students, or anyone who is curious about both the theory and application of these topics, will find it useful! If no one but myself sees it, that is fine as well, as I will have learned anyway.

Let's go ahead 😺

-DPAG


# Stats Overview - the traditional way vs the Bayesian way?

Frequentist and Bayesian statistics seem to be constantly feuding. Let us go over what each of them is, what each proposes, and why in neuroscience we find Bayesian stuff cool (aka, is the brain doing Bayesian stats?!?).

## Disclaimer

Content is taken/summarized from the book "All of Statistics" by Larry A. Wasserman, which I highly recommend. It has all the necessary definitions and proofs one will ever (or perhaps never) need!

[Link to the book](https://link.springer.com/book/10.1007/978-0-387-21736-9)

# It all begins with: Probability

"Probability is a mathematical language for quantifying uncertainty." (Larry A. Wasserman, in the above-mentioned book).

The probability of an event $A$ is a real number $P(A)$ between 0 and 1. The function $P$ that assigns this number to each event is called a probability distribution (or probability measure).

There are different ways to interpret $P(A)$. The two common ones, which I hinted at above, are the frequentist approach and the degree-of-belief (Bayesian) approach.

According to frequentist statistics, $P(A)$ is the long-run proportion of times that $A$ is true over many repetitions.

The simplest and most common example is $P(A)$ where $A$ means "coin will land on heads". If we say $P(Heads) = 1/2$, it means that if we flip the coin many times, the proportion of flips that land on heads tends to $1/2$ as the number of flips increases.
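A quick way to see this long-run behaviour is to simulate it. The sketch below is only an illustration: the helper name `running_proportion`, the seed, and the flip counts are my own choices, not something taken from the book.

```python
import random

def running_proportion(n_flips, p_heads=0.5, seed=0):
    """Return the proportion of heads after n_flips simulated coin flips."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips

# The proportion should wander for small n and settle near p_heads for large n.
for n in (10, 100, 10_000):
    print(n, running_proportion(n))
```

With only a handful of flips the proportion can be far from $1/2$; it is the behaviour as the number of flips grows that the frequentist interpretation refers to.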

The Bayesian interpretation is that $P(A)$ measures the observer's **strength of belief** that $A$ is true, given current knowledge and assumptions (known as priors). Following the example where $A$ means "coin will land on heads", from a Bayesian approach $P(Heads) = 1/2$ means we consider heads and tails equally plausible on the next flip. However, the value of our initial belief (in the example: $1/2$) can **change and adapt** given new evidence.


### Now let's go over some code and explore these two interpretations!

## Frequentist probability

Given a finite number of equally likely outcomes (for us, a total of $N$ coin tosses), the probability $P(A)$ will be

$P(A) = |A| / N$

where $|A|$ is the number of outcomes in which $A$ occurs. There are a few assumptions behind this definition, such as $N$ being finite and each outcome being equally likely. For more details (such as conditional probability), check out the book. For our simple comparison, we shall take this as a sample frequentist way of calculating probability.
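As a small worked instance of this counting rule, here is a sketch using a fair six-sided die. The die, the event "roll is even", and the helper name `classical_probability` are assumptions chosen for illustration.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, so every outcome is equally likely.
sample_space = {1, 2, 3, 4, 5, 6}
event_even = {outcome for outcome in sample_space if outcome % 2 == 0}

def classical_probability(event, space):
    """Count outcomes in the event and divide by the total number of outcomes.

    Only valid when all outcomes in `space` are equally likely.
    """
    return Fraction(len(event & space), len(space))

print(classical_probability(event_even, sample_space))  # 1/2
```

Using `Fraction` keeps the result exact, which makes the "count and divide" nature of the formula easy to see.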

In [None]:
# Importing libraries

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta
import ipywidgets as widgets      # Interactive widgets so we can play with the degree-of-belief
from ipywidgets import interact

In [None]:
# Consider an exercise where we tossed a coin 10 times and observed a certain number of heads vs tails
# Feel free to edit the numbers if you want
heads = 6
tails = 4

# Now how do we calculate P(A) from a frequentist approach?
# Formula: P(A) = heads / (heads + tails), where A = "heads" and N = (heads + tails)
frequentist_p = heads / (heads + tails)

print(frequentist_p)

# This means the probability of heads, based on our observations, is 60%
# (unless you change the numbers above; feel free to explore how the result changes)

0.6


## Bayesian probability

To speak of Bayesian statistics, we must refer to Bayes' Theorem. It is named after the mathematician Thomas Bayes, who proposed it as a way to calculate conditional probabilities.

Bayes' Theorem relates the "direct" probability of a hypothesis conditional on a given body of data, $P(H|E)$, to the "inverse" probability of the data conditional on the hypothesis, $P(E|H)$, as follows:

$P(H \mid E) = \frac{P(E \mid H) \, P(H)}{P(E)}$


Where:

$H$ = Hypothesis (e.g., “the coin’s probability of heads is 0.5”)

$E$ = Evidence or data (e.g., “we observed 6 heads and 4 tails”)

$P(H)$ = Prior probability: our belief in $H$ before seeing any data.

$P(E|H)$ = Likelihood: the probability of $E$ given $H$.

$P(E)$ = Evidence or marginal likelihood: the total probability of the data, considering all possible hypotheses.

$P(H|E)$ = Posterior probability: our updated belief in $H$

-

In summary, the calculation of the posterior (our updated belief) can be defined as:

Posterior = Likelihood * Prior / Evidence
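To make the pieces concrete, here is a minimal sketch that applies the formula to the example hypothesis and evidence above. It assumes only two candidate hypotheses, a fair coin ($p = 0.5$) and a hypothetical biased coin ($p = 0.7$), with equal prior belief in each. Those choices, and the helper name `binomial_likelihood`, are mine and purely illustrative.

```python
from math import comb

def binomial_likelihood(p, heads, tails):
    """P(E | H): probability of `heads` heads in heads + tails flips if P(heads) = p."""
    n = heads + tails
    return comb(n, heads) * p**heads * (1 - p)**tails

heads, tails = 6, 4            # E: the observed data
priors = {0.5: 0.5, 0.7: 0.5}  # P(H) for each assumed hypothesis about p

# P(E | H) for each hypothesis
likelihoods = {p: binomial_likelihood(p, heads, tails) for p in priors}

# P(E): total probability of the data over all hypotheses considered
evidence = sum(likelihoods[p] * priors[p] for p in priors)

# P(H | E) = P(E | H) * P(H) / P(E)
posteriors = {p: likelihoods[p] * priors[p] / evidence for p in priors}

for p, post in posteriors.items():
    print(f"P(p = {p} | E) = {post:.3f}")
```

Because 6 heads in 10 flips is almost equally probable under both of these hypotheses, the posterior stays close to the 50/50 prior; a more lopsided result would shift it more strongly.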

For our exercise with the coin, let's define our priors as $P(Tails) = 0.4$ and $P(Heads) = 0.6$, with both likelihoods equal to 0.5.

How do we update our priors into posteriors?

In [None]:
# Now let's do the same exercise using a Bayesian approach

# Defining our priors as stated above - these are our initial beliefs for each outcome (based on the frequentist values above)
P_heads = 0.6
P_tails = 0.4

# Likelihood of the evidence E given each hypothesis - here we assume E is equally likely under both!
P_E_given_heads = 0.5
P_E_given_tails = 0.5

# Evidence (total probability of E)
P_E = P_E_given_heads * P_heads + P_E_given_tails * P_tails

# Updated posterior for heads
P_heads_given_E = (P_E_given_heads * P_heads) / P_E

# Updated posterior for tails
P_tails_given_E = (P_E_given_tails * P_tails) / P_E

print(f"P(Heads | E) = {P_heads_given_E:.2f}")
print(f"P(Tails | E) = {P_tails_given_E:.2f}")


P(Heads | E) = 0.60
P(Tails | E) = 0.40


### Discussion point: Why do you think the priors and posteriors ended up being the same?
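One way to explore this question is to change the likelihoods and watch what happens. The sketch below wraps the same calculation in a small helper; the name `posterior_heads` and the alternative likelihood values are assumptions made for illustration.

```python
def posterior_heads(prior_heads, lik_heads, lik_tails):
    """Posterior for the 'heads' hypothesis via Bayes' Theorem with two hypotheses."""
    prior_tails = 1 - prior_heads
    evidence = lik_heads * prior_heads + lik_tails * prior_tails
    return lik_heads * prior_heads / evidence

# Equal likelihoods: the evidence does not favour either hypothesis.
print(f"{posterior_heads(0.6, 0.5, 0.5):.2f}")  # 0.60

# Unequal likelihoods (hypothetical values): the evidence now shifts the belief.
print(f"{posterior_heads(0.6, 0.8, 0.2):.2f}")  # 0.86
```

When both likelihoods are equal they cancel out of the ratio, so the data carries no information that separates the hypotheses and the posterior equals the prior.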

In [None]:
# OPTIONAL: Here's a widget to interact and move values to visualize frequentist vs bayesian probabilities

# Interactive plotting function
def simulate_coin(true_p=0.5, n_flips=200, alpha_prior=1, beta_prior=1):
    np.random.seed(42)  # For reproducibility

    # Simulate flips
    flips = np.random.binomial(1, true_p, n_flips)
    trials = np.arange(1, n_flips + 1)

    # Frequentist estimates
    cumulative_heads = np.cumsum(flips)
    frequentist_estimate = cumulative_heads / trials

    # Bayesian updating
    alpha_post = alpha_prior
    beta_post = beta_prior
    posterior_means = []
    for flip in flips:
        if flip == 1:
            alpha_post += 1
        else:
            beta_post += 1
        posterior_means.append(alpha_post / (alpha_post + beta_post))

    # Plot
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    # Frequentist plot
    axes[0].plot(trials, frequentist_estimate, label='Frequentist Estimate')
    axes[0].axhline(y=true_p, color='red', linestyle='--', label='True Probability')
    axes[0].set_xlabel("Number of Trials")
    axes[0].set_ylabel("Estimated Probability of Heads")
    axes[0].set_title("Frequentist Approach")
    axes[0].legend()

    # Bayesian plot (Posterior mean)
    axes[1].plot(trials, posterior_means, label='Bayesian Posterior Mean')
    axes[1].axhline(y=true_p, color='red', linestyle='--', label='True Probability')
    axes[1].set_xlabel("Number of Trials")
    axes[1].set_ylabel("Belief in Probability of Heads")
    axes[1].set_title("Bayesian Approach")
    axes[1].legend()

    plt.show()

    # Posterior distribution after all flips
    x = np.linspace(0, 1, 200)
    posterior_dist = beta(alpha_post, beta_post)
    plt.figure(figsize=(7, 4))
    plt.plot(x, posterior_dist.pdf(x), label=f'Posterior: Beta({alpha_post}, {beta_post})')
    plt.axvline(true_p, color='red', linestyle='--', label='True Probability')
    plt.xlabel("Probability of Heads")
    plt.ylabel("Density")
    plt.title("Final Bayesian Posterior Distribution")
    plt.legend()
    plt.show()

# Create interactive sliders
interact(
    simulate_coin,
    true_p=widgets.FloatSlider(value=0.5, min=0.05, max=0.95, step=0.05, description='True P'),
    n_flips=widgets.IntSlider(value=200, min=10, max=1000, step=10, description='Num Flips'),
    alpha_prior=widgets.IntSlider(value=1, min=1, max=10, step=1, description='Alpha Prior'),
    beta_prior=widgets.IntSlider(value=1, min=1, max=10, step=1, description='Beta Prior')
)


interactive(children=(FloatSlider(value=0.5, description='True P', max=0.95, min=0.05, step=0.05), IntSlider(v…
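The loop inside `simulate_coin` relies on a standard result: with a Beta($\alpha$, $\beta$) prior on the coin's probability of heads, each observed head adds 1 to $\alpha$ and each tail adds 1 to $\beta$, so the posterior is again a Beta distribution. As a small sketch (the helper name `beta_posterior` is my own), here is the same update applied in one step to the earlier 6 heads and 4 tails:

```python
def beta_posterior(alpha_prior, beta_prior, heads, tails):
    """Return the posterior Beta parameters after observing the given counts."""
    return alpha_prior + heads, beta_prior + tails

# Beta(1, 1) is a flat prior: every value of P(heads) starts out equally plausible.
alpha_post, beta_post = beta_posterior(1, 1, heads=6, tails=4)
posterior_mean = alpha_post / (alpha_post + beta_post)
frequentist_estimate = 6 / (6 + 4)

print(f"Posterior: Beta({alpha_post}, {beta_post})")
print(f"Posterior mean: {posterior_mean:.3f}")
print(f"Frequentist estimate: {frequentist_estimate:.3f}")
```

The posterior mean sits slightly closer to the prior mean of 0.5 than the frequentist estimate does; as the number of flips grows, the two estimates get closer together.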

## Lastly - Bayesian Stats and Neuroscience

Now we come back to the question I asked at the beginning: **Is the brain doing Bayesian statistics?**

First, I must mention that we are still figuring out what the brain does, and we might never fully understand it, but the short answer is **we think so**.

The theory is that our brain functions like a prediction machine that adapts to minimize uncertainty about its environment. It does this by constantly processing the world based on existing priors, which are then updated into posteriors, and the cycle repeats.

This gave rise to the widely known predictive processing/coding theory in neuroscience.
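The prior-to-posterior loop described above can be sketched with the coin example. This is only a toy illustration of repeated Bayesian updating, not a model of what neurons do; the three candidate coins, the made-up observations, and the helper name `update` are all assumptions.

```python
def update(belief, observation):
    """One Bayesian step. `belief` maps each candidate P(heads) to its probability."""
    unnormalised = {
        p: prob * (p if observation == 1 else 1 - p)  # prior * likelihood
        for p, prob in belief.items()
    }
    total = sum(unnormalised.values())  # the evidence term
    return {p: value / total for p, value in unnormalised.items()}

# Hypothetical setup: three candidate coins, equal initial belief in each.
belief = {0.2: 1 / 3, 0.5: 1 / 3, 0.8: 1 / 3}
observations = [1, 1, 0, 1, 1, 1]  # 1 = heads, 0 = tails (made-up data)

for obs in observations:
    belief = update(belief, obs)  # the posterior becomes the next prior

most_likely = max(belief, key=belief.get)
print(most_likely, round(belief[most_likely], 3))
```

Each pass through the loop takes the current belief as the prior and returns a posterior, which is then reused as the prior for the next observation.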

What's cool about predictive coding is that it follows a hierarchical processing/updating mechanism, which means we can even apply it to Artificial Neural Networks (ANNs) and test if it makes our ANN behave more brain-like!

Predictive coding is a complex topic of its own. I aim to learn more about it and write a notebook about it next.

Hope you found this useful too!

## Cool extra resources

[Book: Active Inference: The Free Energy Principle in Mind, Brain, and Behavior](https://direct.mit.edu/books/oa-monograph/5299/Active-InferenceThe-Free-Energy-Principle-in-Mind)

[Video: Karl Friston speaks of Predictive Coding](https://www.youtube.com/watch?v=b1hEc6vay_k&ab_channel=Dartmouth)