# Probability

- Foundation of data science
- Allows us to generate data from models
- Studies regularities in random phenomena
    + The law of large numbers
    + When an experiment is repeated many times, the average of the outcomes
    approaches the expected value (the mean).
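
The law of large numbers can be illustrated with a small simulation. The sketch below assumes SciPy's `bernoulli` distribution as a model of a fair coin; the seed, the number of flips, and the variable names are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import bernoulli

# Flip a fair coin many times (1 = heads, 0 = tails); the seed is arbitrary.
flips = bernoulli.rvs(p=0.5, size=100_000, random_state=0)

# Proportion of heads observed after each flip.
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

# The proportion drifts toward 0.5 as the number of flips grows.
for n_flips in (10, 100, 1_000, 10_000, 100_000):
    print(n_flips, running_mean[n_flips - 1])
```

Early values can be far from 0.5; only the long-run proportion settles near it.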

### Let's flip a coin



Only two outcomes
- This random experiment is called a Bernoulli trial, after Jacob Bernoulli.

Possible outcomes are binary (two possible outcomes): they can be modeled as success or failure, yes or no, or on or off.

Each outcome is called an **event**

- We assign probabilities to each event

In [2]:
from scipy.stats import bernoulli
bernoulli.rvs(p=0.5, size=10) 

# .rvs creates random samples from the Bernoulli distribution.
# p=0.5 The probability of success (1) is 50%, and the probability of failure (0) is also 50%
# size=10 Generates ten random samples. In this case number of coin flips = 10

array([0, 0, 1, 0, 0, 1, 1, 1, 0, 0])

In [None]:
# How many heads?

sum(bernoulli.rvs(p=0.5, size=10))

np.int64(5)

**Flipping multiple coins**

Generating a **Binomial Random Variable** in Python: 

Understanding the parameters:
- n (number of trials per draw)
- p (success probability of each trial)
- size (number of draws)
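
To make the roles of the three parameters concrete, here is a minimal sketch using `binom.rvs`. The values and the `random_state` seed are arbitrary, and the variable name is only illustrative.

```python
from scipy.stats import binom

# n = 10 flips per experiment, p = 0.5 chance of heads, size = 5 experiments.
heads_per_experiment = binom.rvs(n=10, p=0.5, size=5, random_state=1)

print(heads_per_experiment.shape)  # one head count per experiment
print(heads_per_experiment)        # each count lies between 0 and n
```

Changing `size` changes how many counts come back; changing `n` changes the largest count that is possible.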

#### Fair Coin

A **fair coin** is a coin where the probabilities of landing on heads or tails are **equal**, meaning each outcome has a probability of **0.5**. This ensures that over many flips, heads and tails occur approximately **equally often**.

In [3]:
from scipy.stats import binom
binom.rvs(n=10, p=0.5, size=10) 

array([4, 4, 5, 5, 8, 7, 7, 5, 5, 4])

#### Biased coin draws


In a biased coin, one outcome is more likely than the other.

For example:

In [4]:
binom.rvs(n=10, p=0.3, size=10) # Probability p=0.3

array([2, 4, 1, 2, 1, 5, 1, 2, 4, 3])

#### Biased Coin Explanation

This concept is often used in probability theory, statistics, and machine learning to model non-uniform random events, such as:
- Weighted random sampling
- Bayesian probability models
- Simulating real-world probabilities (e.g., unfair dice, genetics, or market trends)

    - Slight bias:
        + P(H) = 0.55, P(T) = 0.45
        + Coin is slightly biased toward heads.

    - Extreme bias:
        + P(H) = 0.9, P(T) = 0.1
        + Coin lands on heads most of the time.

    - Different bias:
        + P(H) = 0.4, P(T) = 0.6
        + Coin is biased toward tails.

The probabilities can be anything as long as they add up to 1.

##### Key Points:

- **Unequal Probabilities:** Unlike a fair coin, the outcomes are not equally likely. A coin with p = 0.7 for heads is more likely to land heads than tails.

- **Bernoulli Trial:** Each coin toss is a Bernoulli trial, which means it has two possible outcomes (success or failure) with a fixed probability of success (here, success might be defined as landing heads).

- **Binomial Distribution:** When you perform several independent coin tosses, the number of heads observed follows a binomial distribution. In Python, you can simulate this using functions like numpy.random.binomial.
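
The link between Bernoulli trials and the binomial distribution can be sketched as follows. This example assumes NumPy's `Generator` interface (`np.random.default_rng`) rather than the legacy `numpy.random.binomial` function; the seed, `p = 0.7`, and the variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
n, p, experiments = 10, 0.7, 5_000

# Each row is one experiment of n Bernoulli trials (True = heads).
tosses = rng.random((experiments, n)) < p
heads_from_tosses = tosses.sum(axis=1)

# Head counts drawn directly from a binomial distribution.
heads_from_binomial = rng.binomial(n=n, p=p, size=experiments)

# Both averages should be close to n * p = 7.
print(heads_from_tosses.mean(), heads_from_binomial.mean())
```

Counting heads across independent tosses and sampling the count directly are two routes to the same distribution.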

#### Random generator seed

In [None]:
# Use the random_state parameter of the rvs() function
from scipy.stats import binom
binom.rvs(n=10,p=0.5,random_state=42)

4

In [None]:
# Use numpy.random.seed()
import numpy as np
np.random.seed(42)
binom.rvs(n=10, p=0.5, size=1)

array([4])
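
A quick way to see what a seed does is to draw twice with the same seed and compare. The sketch below uses an arbitrary seed of 42 and illustrative variable names.

```python
import numpy as np
from scipy.stats import binom

# Same random_state, same draws.
first = binom.rvs(n=10, p=0.5, size=5, random_state=42)
second = binom.rvs(n=10, p=0.5, size=5, random_state=42)
print(np.array_equal(first, second))

# Re-seeding NumPy's global generator before each call has the same effect.
np.random.seed(42)
third = binom.rvs(n=10, p=0.5, size=5)
np.random.seed(42)
fourth = binom.rvs(n=10, p=0.5, size=5)
print(np.array_equal(third, fourth))
```

Passing `random_state` keeps the seed local to one call, while `np.random.seed()` changes global state that other code may also rely on.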

# Probability Distributions


A **probability distribution** describes how the possible values of a random variable are distributed in terms of their likelihood. Depending on whether the random variable is **discrete** or **continuous**, we use different functions to represent this distribution.

---

## 1. Discrete Random Variables

- **Probability Mass Function (PMF):**  
  For a discrete random variable \(X\), the PMF \(p(x)\) gives the probability that \(X\) takes a specific value \(x\).  
  $$
  p(x) = P(X = x)
  $$
  Such that:
  $$
  p(x) \ge 0 
  \quad \text{and} \quad 
  \sum_x p(x) = 1.
  $$

### Example: Binomial Distribution
$$
\mathrm{PMF}(k; n, p) 
= \binom{n}{k} \, p^k \, (1 - p)^{n - k}, 
\quad k = 0, 1, \dots, n.
$$
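
As a sanity check, the formula can be evaluated directly and compared with `scipy.stats.binom.pmf`. This is a minimal sketch with arbitrary values `n = 10` and `p = 0.3`; it also confirms the two PMF properties (non-negativity and summing to 1).

```python
from math import comb

import numpy as np
from scipy.stats import binom

n, p = 10, 0.3
k = np.arange(n + 1)

# Binomial PMF written out term by term from the formula.
manual_pmf = np.array([comb(n, i) * p**i * (1 - p) ** (n - i) for i in k])
scipy_pmf = binom.pmf(k, n, p)

print(np.allclose(manual_pmf, scipy_pmf))       # formula agrees with SciPy
print(manual_pmf.min() >= 0, manual_pmf.sum())  # non-negative, sums to 1
```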

---

## 2. Continuous Random Variables

- **Probability Density Function (PDF):**  
  For a continuous random variable \(X\), the PDF \(f(x)\) describes the relative likelihood of \(X\) taking on any particular value. While \(f(x)\) itself is not a probability, the probability that \(X\) lies within an interval \([a, b]\) is given by:
  $$
  P(a \le X \le b) = \int_a^b f(x)\, dx.
  $$
  Such that:
  $$
  f(x) \ge 0 
  \quad \text{and} \quad 
  \int_{-\infty}^{\infty} f(x)\, dx = 1.
  $$

### Example: Normal Distribution
$$
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}
\exp\!\Bigl(-\frac{(x - \mu)^2}{2\sigma^2}\Bigr),
\quad -\infty < x < \infty.
$$
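
The density formula can be checked numerically against `scipy.stats.norm`. The sketch below assumes arbitrary parameters `mu = 1.0` and `sigma = 2.0`; `normal_pdf` is a hypothetical helper written only for this comparison.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 1.0, 2.0  # arbitrary example parameters

def normal_pdf(x):
    """Normal density written out from the formula."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

x = np.linspace(-5, 7, 25)
print(np.allclose(normal_pdf(x), norm.pdf(x, loc=mu, scale=sigma)))

# P(a <= X <= b) is the area under the density between a and b.
a, b = 0.0, 3.0
area, _ = quad(normal_pdf, a, b)
cdf_difference = norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)
print(area, cdf_difference)
```

The integral of the density over an interval matches the difference of the CDF at its endpoints.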

---

## 3. Cumulative Distribution Function (CDF)

For **both** discrete and continuous random variables, the **Cumulative Distribution Function (CDF)** \(F(x)\) gives the probability that the random variable \(X\) is less than or equal to \(x\):
$$
F(x) = P(X \le x).
$$

- **Discrete:** \(F(x)\) is a step function increasing at the discrete values of \(X\).  
- **Continuous:** \(F(x)\) is continuous and differentiable (except possibly at points where the PDF may be discontinuous).
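
The two behaviours can be observed directly with SciPy. This is a small sketch with arbitrary example values: the binomial CDF stays flat between integers and jumps by the PMF at each integer, while the slope of the normal CDF approximates the PDF.

```python
from scipy.stats import binom, norm

# Discrete: flat between integers, then a jump equal to the PMF.
flat_left = binom.cdf(2, n=10, p=0.5)
flat_right = binom.cdf(2.9, n=10, p=0.5)
jump = binom.cdf(3, n=10, p=0.5) - flat_right
print(flat_left, flat_right)
print(jump, binom.pmf(3, n=10, p=0.5))

# Continuous: a central finite difference of the CDF approximates the PDF.
x, h = 0.5, 1e-5
slope = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
print(slope, norm.pdf(x))
```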

---

## Key Points

1. **Discrete vs. Continuous**  
   - Use a **PMF** for discrete variables.  
   - Use a **PDF** for continuous variables.

2. **Normalization**  
   - Sum (discrete) or integral (continuous) of the distribution function over the entire domain must be **1**.

3. **Non-negativity**  
   - All probabilities (or density values) must be **non-negative**.

4. **CDF**  
   - A unifying function that applies to **both** discrete and continuous cases.

---

### Why Are Probability Distributions Important?
- They form the foundation for **statistical inference** and **data analysis**.  
- They help us compute **expected values**, **variances**, and other moments, which are crucial for understanding the behavior of random variables.
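
For a discrete variable, the expected value and variance are weighted sums over the PMF. The sketch below works this out for a binomial variable with arbitrary values `n = 10` and `p = 0.3`, then compares with SciPy's built-in `binom.mean` and `binom.var`.

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.3
k = np.arange(n + 1)
pmf = binom.pmf(k, n, p)

# E[X] = sum of k * p(k); Var[X] = sum of (k - E[X])^2 * p(k).
expected_value = np.sum(k * pmf)
variance = np.sum((k - expected_value) ** 2 * pmf)

print(expected_value, variance)
print(binom.mean(n, p), binom.var(n, p))  # n * p and n * p * (1 - p)
```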


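**Density:** For a continuous variable, the density \(f(x)\) indicates how concentrated the probability is around \(x\). The probability of a range of outcomes is the area under the density curve over that range.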

# Probability mass functions (pmf)

(German: Wahrscheinlichkeitsmassenfunktion)

### Binomial PMF



The probability mass function (PMF) of a Binomial distribution is given by:

$$
\text{binomial.pmf}(k, n, p) = \binom{n}{k} \, p^k \, (1-p)^{n-k}.
$$

- $ \binom{n}{k} $ is the binomial coefficient, the number of ways to choose $ k $ successes out of $ n $ trials.

- $ p^k $ is the probability of those $ k $ successes.

- $ (1-p)^{n-k} $ is the probability of the remaining $ n-k $ failures.
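
As a worked example of the three factors, the probability of exactly 2 heads in 10 flips of a fair coin (\(k = 2\), \(n = 10\), \(p = 0.5\)) can be computed by hand:

$$
\binom{10}{2} \, (0.5)^{2} \, (0.5)^{8}
= 45 \cdot \frac{1}{1024}
= \frac{45}{1024}
\approx 0.0439.
$$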


In [29]:
# Probability of 2 heads after 10 throws with a fair coin

binomial_pmf = binom.pmf(k=2,n=10,p=0.5)
print(binomial_pmf)

0.04394531250000005


In [30]:
# Probability of 5 heads after 10 throws with a fair coin

binomial_pmf = binom.pmf(k=5,n=10,p=0.5)
print(binomial_pmf)

0.24609375


In [None]:
# Probability of 50 heads after 100 throws with p = 0.3

binomial_pmf = binom.pmf(k=50,n=100,p=0.3)
print(binomial_pmf)

1.3026227131445274e-05


In [32]:
# Probability of 65 heads after 100 throws with p = 0.7

binomial_pmf = binom.pmf(k=65,n=100,p=0.7)
print(binomial_pmf)

0.046779682352730015


**As n gets larger, the probability of getting any exact count k becomes smaller for the same p, because the probability is spread over more possible outcomes.**
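
This effect can be seen by evaluating the PMF at the most likely count for increasing n. The sketch assumes a fair coin and uses `k = n // 2` as the most likely count; the list of sizes is arbitrary.

```python
from scipy.stats import binom

p = 0.5
sizes = (10, 100, 1_000, 10_000)

# Probability of the single most likely count, k = n / 2, for each n.
peak_probabilities = [binom.pmf(n // 2, n, p) for n in sizes]

for n, prob in zip(sizes, peak_probabilities):
    print(n, prob)
```

Even the most likely exact count becomes improbable for large n, which is one reason ranges of outcomes are usually more useful than single values.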

# Cumulative distribution functions (cdf)

To calculate the probability of getting k or fewer heads from n throws, we use the binomial cumulative distribution function (cdf), which adds the probabilities of getting 0 heads out of n flips, getting 1 head out of n flips, and so on, all the way up to k heads out of n flips.

The binomial cdf allows us to calculate the cumulative probability of getting k heads or fewer from n coin flips with probability p of getting heads.

In Python we use the binom.cdf(k, n, p) function with parameters k, n, and p.

Adding the probabilities from the mass function, we get the cumulative distribution function (cdf).

This is a way of getting the probability of a range of outcomes rather than the probability of a single event.
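
The relationship between the two functions can be verified numerically. This minimal sketch uses a fair coin with `n = 10`; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.5

# P(X <= 5) from the cdf, and as a sum of pmf values for k = 0..5.
cdf_at_5 = binom.cdf(5, n, p)
summed_pmf = binom.pmf(np.arange(0, 6), n, p).sum()
print(cdf_at_5, summed_pmf)

# P(3 <= X <= 6) as a difference of two cdf values.
range_probability = binom.cdf(6, n, p) - binom.cdf(2, n, p)
range_from_pmf = binom.pmf(np.arange(3, 7), n, p).sum()
print(range_probability, range_from_pmf)
```

Note that the lower bound uses `cdf(2)`, not `cdf(3)`, so that the count 3 itself is included in the range.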


In [40]:
# Probability of 5 heads or fewer after 10 throws with a fair coin
import numpy as np
from scipy.stats import binom
binom.cdf(k=5, n=10, p=0.5)

np.float64(0.623046875)

In [41]:
# Probability of 50 heads or less after 100 throws with p = 0.3
binom.cdf(k=50, n=100, p=0.3)

np.float64(0.9999909653138043)

**Recall that:**

- binom.pmf() calculates the probability of having exactly k heads out of n coin flips.

- binom.cdf() calculates the probability of having k heads or fewer out of n coin flips.

- binom.sf() calculates the probability of having more than k heads out of n coin flips.
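
The three functions fit together as follows. The sketch uses arbitrary values `k = 3`, `n = 8`, `p = 0.65`, and also shows how "at least k" can be obtained by shifting the argument of `sf` by one.

```python
from scipy.stats import binom

k, n, p = 3, 8, 0.65

at_most_k = binom.cdf(k, n, p)    # P(X <= k)
more_than_k = binom.sf(k, n, p)   # P(X > k)
print(at_most_k + more_than_k)    # the two cases cover everything

# "At least k" includes k itself, so use sf(k - 1).
at_least_k = binom.sf(k - 1, n, p)
print(at_least_k, binom.pmf(k, n, p) + more_than_k)
```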

**Predicting the probability of defects**

In [48]:
# Probability of getting 2 or fewer defective components out of 50, with a 2% defect rate
prob_two_or_less_defects = binom.cdf(k=2, n=50, p=0.02)
print(prob_two_or_less_defects)

0.921572251649031


In [49]:
# Calculate the probability of getting more than 3 yes responses out of 8, with p = 0.65
prob_more_than_three_yes = binom.sf(k=3, n=8, p=0.65)
print(prob_more_than_three_yes)

0.8939090951171875


In [50]:
# What is the probability of solving 1 or fewer or more than 7 burglaries?
tail_probabilities = binom.cdf(k=1, n=9, p=0.2) + binom.sf(k=7, n=9, p=0.2)
print(tail_probabilities)

0.4362265599999995


# Expected value, mean, and variance

In [60]:
import numpy as np
import pandas as pd
from scipy.stats import binom
from scipy.stats import describe
# Sample mean from a generated sample of 100 fair coin flips
sample_of_100_flips = binom.rvs(n=1, p=0.5, size=100)
sample_mean_100_flips = describe(sample_of_100_flips).mean
print(sample_mean_100_flips)

0.51


In [61]:
# Sample mean from a generated sample of 1,000 fair coin flips
sample_mean_1000_flips = describe(binom.rvs(n=1, p=0.5, size=1000)).mean
print(sample_mean_1000_flips)

0.499


In [62]:
# Sample mean from a generated sample of 2,000 fair coin flips
sample_mean_2000_flips = describe(binom.rvs(n=1, p=0.5, size=2000)).mean
print(sample_mean_2000_flips)

0.5045


In [63]:
sample = binom.rvs(n=10, p=0.3, size=2000)

# Calculate the sample mean and variance from the sample variable
sample_describe = describe(sample)

# Calculate the theoretical mean using the values of n and p
mean = 10 * 0.3

# Calculate the theoretical variance using the mean and the value of 1-p
variance = mean * (1 - 0.3)

# Get the theoretical mean and variance for 10 coin flips with p=0.3 from scipy
binom_stats = binom.stats(n=10, p=0.3)

print(sample_describe.mean, sample_describe.variance, mean, variance, binom_stats)

2.98 2.0726363181590797 3.0 2.0999999999999996 (np.float64(3.0), np.float64(2.1))


In [70]:
from scipy.stats import binom, describe
import numpy as np

# Initialize lists to store the computed averages and variances
averages = []
variances = []

for i in range(1500):
    # 10 trials of 10 coin flips with 25% probability of heads
    sample = binom.rvs(n=10, p=0.25, size=10)
    # Compute descriptive statistics for the sample
    stats = describe(sample)
    averages.append(stats.mean)
    variances.append(stats.variance)

print(averages)
print(variances)

[np.float64(2.2), np.float64(2.4), np.float64(2.7), np.float64(2.1), np.float64(2.1), np.float64(2.8), np.float64(2.2), np.float64(1.8), np.float64(3.2), np.float64(3.2), np.float64(2.0), np.float64(2.8), np.float64(1.8), np.float64(2.5), np.float64(2.8), np.float64(2.3), np.float64(2.9), np.float64(2.5), np.float64(2.5), np.float64(3.6), np.float64(2.7), np.float64(1.7), np.float64(2.2), np.float64(2.7), np.float64(1.9), np.float64(2.4), np.float64(3.8), np.float64(3.2), np.float64(2.3), np.float64(2.9), np.float64(2.0), np.float64(3.1), np.float64(2.7), np.float64(2.3), np.float64(3.1), np.float64(2.0), np.float64(3.1), np.float64(2.5), np.float64(2.1), np.float64(3.1), np.float64(2.5), np.float64(3.4), np.float64(2.7), np.float64(2.8), np.float64(2.5), np.float64(2.8), np.float64(2.4), np.float64(2.9), np.float64(2.9), np.float64(2.2), np.float64(2.1), np.float64(3.0), np.float64(3.6), np.float64(3.0), np.float64(2.6), np.float64(2.8), np.float64(1.9), np.float64(2.1), np.float64(2.

In [71]:
for i in range(0, 1500):
    # 10 draws of 10 coin flips with 25% probability of heads
    sample = binom.rvs(n=10, p=0.25, size=10)
    # Mean and variance of the values in the sample variable
    averages.append(describe(sample).mean)
    variances.append(describe(sample).variance)
  
# Calculate the mean of the averages variable
print("Mean {}".format(describe(averages).mean))

# Calculate the mean of the variances variable
print("Variance {}".format(describe(variances).mean))

Mean 2.492566666666667
Variance 1.8587740740740741


In [72]:
from scipy.stats import binom, describe
import numpy as np

averages = []
variances = []

for i in range(1500):
    # 10 draws of 10 coin flips with 25% probability of heads
    sample = binom.rvs(n=10, p=0.25, size=10)
    # Mean and variance of the values in the sample variable
    averages.append(describe(sample).mean)
    variances.append(describe(sample).variance)
  
# Calculate the mean of the averages variable
print("Mean {}".format(describe(averages).mean))

# Calculate the mean of the variances variable
print("Variance {}".format(describe(variances).mean))

# Calculate the theoretical mean and variance
print(binom.stats(n=10, p=0.25))


Mean 2.5056666666666665
Variance 1.860125925925926
(np.float64(2.5), np.float64(1.875))
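
The average of the sample variances lands close to the theoretical value because `scipy.stats.describe` uses `ddof=1` by default, which gives the unbiased variance estimate. The sketch below repeats the experiment in vectorized form with NumPy and compares `ddof=1` with `ddof=0`; the seed and variable names are arbitrary.

```python
import numpy as np
from scipy.stats import binom

# 1500 samples, each made of 10 draws of 10 coin flips with p = 0.25.
samples = binom.rvs(n=10, p=0.25, size=(1500, 10), random_state=0)

# Average sample variance with and without the ddof=1 correction.
unbiased = samples.var(axis=1, ddof=1).mean()
biased = samples.var(axis=1, ddof=0).mean()
print(unbiased, biased)

# Theoretical variance n*p*(1-p), and the value the ddof=0 estimate targets
# for samples of size 10 (a factor of 9/10 smaller).
theoretical = 10 * 0.25 * 0.75
print(theoretical, theoretical * 9 / 10)
```

With only 10 values per sample the difference is visible; it shrinks as the sample size grows.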
