# Bernoulli Distribution

**Bernoulli Distribution**: a discrete probability model for random experiments that can result in only two mutually exclusive outcomes: success or failure.

An experiment that fits the Bernoulli distribution is known as a *Bernoulli trial*, that is, a binomial experiment with a single trial. In this type of experiment, the value 1 is assigned to the outcome considered a "success", which occurs with probability $p$, and the value 0 is assigned to the outcome considered a "failure", which occurs with probability $q = 1-p$.

**Math Notation**: If $X$ is a discrete random variable that follows a Bernoulli distribution with parameter $p$, it's denoted as:
$$\begin{matrix}
X \sim \text{Bernoulli}(p) & \text{or} & X \sim B(p)
\end{matrix}$$
where $0 < p < 1$

**PMF**:

$$\begin{matrix}
P(X= x) = p^x (1 - p)^{1-x} 
\end{matrix}$$ 

where $x \in \{ 0,1 \}$. That is, $P(X=1) = p$ and $P(X=0) = 1-p$.

**CDF**:

$$F(x) = \left\{ \begin{array}{cl} 0 & x<0 \\ 1-p & 0\leq x < 1 \\ 1 & x\geq 1 \end{array} \right.$$

**Variance**: $\sigma^2 = \text{Var}(X) = p(1-p)$

**Expected Value**: $E[X] = p$
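
Both values follow directly from the PMF. As a short derivation sketch, using only the definitions above:

$$E[X] = \sum_{x \in \{0,1\}} x \, P(X = x) = 0 \cdot (1-p) + 1 \cdot p = p$$

$$E[X^2] = 0^2 \cdot (1-p) + 1^2 \cdot p = p$$

$$\text{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p)$$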

In [None]:
import random
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 
from typing import Callable, Dict, List

plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

def pmf(p:float, x:int) -> float:
  "Calculate the Probability Mass Function (PMF)"
  if x not in [0,1]:
    # The Bernoulli PMF is only defined on the support {0, 1}
    raise ValueError(f"x must be 0 or 1, got {x}")
  return p**x * (1-p)**(1-x)

def cdf(p:float, x:float) -> float:
  "Calculate Cumulative Distribution Function (CDF)"
  if x < 0: return 0.0
  if 0 <= x < 1:
    return 1 - p 
  return 1.0

def variance(p:float) -> float:
  "Calculate the variance p(1 - p)"
  return p * (1 - p)

def get_statistics(p:float) -> Dict:
  stats:Dict = {
    "expected_value": p,
    "variance": variance(p),
    "standard_deviation": np.sqrt( variance(p) )
  }
  return stats

def generate_sample(p:float, n:int) -> List[int]:
  "Generates a random sample from the Bernoulli distribution"
  sample:List[int] = []
  for _ in range(n):
    U:float = random.random() # Uniform(0,1)
    sample.append(1 if U < p else 0)
  return sample

def plot_distribution(p:float) -> None:
  "Generate distribution plots"
  fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2,2, figsize=(12,10))
  
  # plot 1: PMF
  x_values: List[int] = [0,1]
  pmf_values: List[float] = [pmf(p,x) for x in x_values]
  
  ax1.bar(x_values, pmf_values, color=['red', 'blue'], alpha=0.7, width=0.5)
  ax1.set_title(f"Probability Mass Function (PMF)\nBernoulli(p={p})")
  ax1.set_xlabel('x')
  ax1.set_ylabel('P(X = x)')
  ax1.set_xticks([0,1])
  ax1.grid(True, alpha=0.3)
  
  for i,v in enumerate(pmf_values):
    ax1.text(i, v + 0.01, f"{v:.3f}", ha='center', va='bottom')  # round labels so values like 1/6 stay readable
  
  # plot 2: CDF
  x_range: np.ndarray = np.linspace(-0.5, 1.5, 1000)
  cdf_values: List[float] = [cdf(p,x) for x in x_range]
  
  ax2.plot(x_range, cdf_values, 'b-', linewidth=2)
  ax2.set_title(f"Cumulative Distribution Function (CDF)\nBernoulli(p={p})")
  ax2.set_xlabel('x')
  ax2.set_ylabel('F(x)')
  ax2.grid(True, alpha=0.3)
  ax2.set_ylim(-0.1, 1.1)
  
  # plot 3: simulation vs. theoretical
  sample:List[int] = generate_sample(p,1000)
  experimental_freq:List[float] = [sample.count(0)/len(sample), sample.count(1)/len(sample)]
  
  x_pos: np.ndarray = np.arange(2)
  width:float = 0.35
  
  ax3.bar(x_pos - width/2, pmf_values, width, label='Theoretical', alpha=0.7)
  ax3.bar(x_pos + width/2, experimental_freq, width, label='Experimental', alpha=0.7)
  ax3.set_title('Comparison: Theoretical vs Experimental\n(n=1000)')
  ax3.set_xlabel('x')
  ax3.set_ylabel('Probability')
  ax3.set_xticks(x_pos)
  ax3.set_xticklabels(['0','1'])
  ax3.legend()
  ax3.grid(True, alpha=0.3)
  
  # plot 4: effect of parameter p
  p_values: np.ndarray = np.linspace(0.1, 0.9, 9)
  variances: List[float] = [variance(pv) for pv in p_values]  # pv avoids shadowing the argument p
  
  ax4.plot(p_values, variances, 'ro-', linewidth=2, markersize=6)
  ax4.set_title('Variance as a function of p')
  ax4.set_xlabel('p')
  ax4.set_ylabel('Var(X) = p(1-p)')
  ax4.grid(True, alpha=0.3)
  ax4.axvline(x=0.5, color='gray', linestyle='--', alpha=0.7, label='p=0.5 (maximum variance)')
  ax4.legend()
  
  plt.tight_layout()
  plt.show()

def simulate_convergence(p:float, n_max:int = 10000) -> None:
  "Simulate convergence of the sample mean towards the theoretical expectation"
  sample:List[int] = generate_sample(p,n_max)
  cumulative_means:List[float] = []
  running_total:int = 0
  
  for i in range(1, n_max + 1):
    # Keep a running sum instead of re-summing sample[:i] on every iteration
    running_total += sample[i - 1]
    cumulative_means.append(running_total / i)
  
  stats = get_statistics(p)
  expected_value = stats['expected_value']
  
  plt.figure(figsize=(10,6))
  plt.plot(range(1, n_max + 1), cumulative_means, 'b-', alpha=0.7, linewidth=1)
  plt.axhline(y=expected_value, color='red', linestyle='--', linewidth=2, label=f"E[X] = {expected_value:.4f}")
  plt.title(f"Convergence of the Sample Mean\nBernoulli(p={p})")
  plt.xlabel(f"Sample Size (n={n_max})")
  plt.ylabel("Sample Mean")
  plt.legend()
  plt.grid(True, alpha=0.3)
  plt.show()

## Examples

### Coin Toss Example

Suppose you toss a fair coin and want the probability of getting tails. This is a single experiment with two possible outcomes: success is defined as getting tails, with probability $p = 0.5$, while failure is getting heads, with probability $q = 1 - p = 0.5$.

Let the random variable $X$ measure the "number of tails in one toss". It has only two possible values: 0 (no tails, meaning heads) and 1 (tails). Therefore, $X$ follows a Bernoulli distribution with parameter $p = 0.5$, that is, $X \sim \text{Bernoulli}(0.5)$.
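
As a quick check on the numbers, substituting $p = 0.5$ into the expected value and variance formulas gives:

$$E[X] = 0.5, \qquad \text{Var}(X) = 0.5 \cdot (1 - 0.5) = 0.25, \qquad \sigma = \sqrt{0.25} = 0.5$$

This is also the largest variance a Bernoulli variable can have, since $p(1-p)$ is maximized at $p = 0.5$.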

In [None]:
p_coin = 0.5
print(get_statistics(p_coin))
plot_distribution(p_coin)
simulate_convergence(p_coin)

### Dice Example

A die is rolled, and we want to find the probability of getting a 6.

When rolling a fair die, there are 6 equally likely outcomes: $\Omega = \{ 1,2,3,4,5,6 \}$. The experiment is performed once. Success is defined as rolling a 6, so the probability of success is $p = \frac{1}{6}$. Failure is rolling any other number, with probability $q = 1-\frac{1}{6} = \frac{5}{6}$.

The random variable $X$ measures the "number of times a 6 is rolled," and only takes two possible values: 0 (no 6 rolled) and 1 (a 6 rolled). This variable follows a Bernoulli distribution with parameter $p = \frac{1}{6}$:

$$X \sim \text{Bernoulli}\left( \frac{1}{6} \right)$$
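
To make the mapping from the sample space $\Omega$ to the Bernoulli variable concrete, the following is a minimal sketch that rolls a simulated die and records 1 only when a 6 appears. The helper name `roll_is_six`, the seed, and the number of rolls are illustrative assumptions, not part of the notebook's own functions; the exact mean and variance are computed with `fractions.Fraction` so that no rounding is involved.

```python
import random
from fractions import Fraction


def roll_is_six(rng: random.Random) -> int:
    """Roll a fair six-sided die; return 1 (success) if it shows a 6, else 0."""
    return 1 if rng.randint(1, 6) == 6 else 0


# Exact theoretical values for p = 1/6
p = Fraction(1, 6)
mean = p                # E[X] = p
var = p * (1 - p)       # Var(X) = p(1 - p) = 5/36

# Simulation: the seed and the number of rolls are arbitrary illustrative choices
rng = random.Random(42)
n = 60_000
outcomes = [roll_is_six(rng) for _ in range(n)]
freq = sum(outcomes) / n  # observed share of rolls that showed a 6

print(f"E[X] = {mean}, Var(X) = {var}")
print(f"Observed frequency of a 6 over {n} rolls: {freq:.4f}")
```

With this many rolls, the observed frequency should land close to $\frac{1}{6} \approx 0.1667$, although the exact figure depends on the seed.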

In [None]:
p_dice = 1/6
print(get_statistics(p_dice))
plot_distribution(p_dice)
simulate_convergence(p_dice)