# Probability Mass Function (PMF)

**When to Use**: For discrete variables.

**What It Does**: Gives the probability that a discrete random variable is exactly equal to some value.

**Important Note**: The sum of all probabilities in a PMF is 1.

**Example**: Rolling a die, where the outcome is a discrete number between 1 and 6.
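As a minimal sketch, assuming a fair six-sided die, the PMF can be written as a dictionary that maps each outcome to its probability. Using `fractions.Fraction` keeps the arithmetic exact, so the "sums to 1" property can be checked directly:

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face 1..6 has probability 1/6
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Every probability lies in [0, 1], and together they sum to exactly 1
assert all(0 <= prob <= 1 for prob in die_pmf.values())
total = sum(die_pmf.values())
print(total)  # 1
```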

## 1. Calculating the PMF from a Dataset

For discrete random variables, the Probability Mass Function (PMF) can be calculated by counting the occurrences of each value in the dataset and then dividing by the total number of observations.

```python
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter

# Sample dataset: 1,000 random integers between 0 and 9
data = np.random.randint(0, 10, size=1000)

# Calculate the PMF: count each value, then divide by the total number of observations
counts = Counter(data)
total_count = sum(counts.values())
pmf = {k: v / total_count for k, v in counts.items()}

# Plot the PMF (sort the values so the bars appear in order)
values = sorted(pmf)
plt.bar(values, [pmf[v] for v in values], label='Estimated PMF')
plt.title('PMF Estimation')
plt.xlabel('Value')
plt.ylabel('Probability')
plt.legend()
plt.show()
```

## 2. Differentiating Between Probability and Probability Mass

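**Probability**: The likelihood that a given event will occur. An event can be a single outcome (rolling a 3) or a set of outcomes (rolling an even number); for discrete variables, its probability is found by adding up the probabilities of the outcomes it contains.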

Example: The probability of rolling a 3 on a fair six-sided die is $\frac{1}{6}$.

**Probability Mass Function (PMF)**: For a discrete variable $X$, the PMF is the function $p(x) = P(X = x)$ that assigns a probability to each individual value $x$ the variable can take.
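To make the distinction concrete, here is a small sketch, again assuming a fair die: the PMF gives the probability mass at a single point, while the probability of an event that contains several outcomes is the sum of the PMF over those outcomes.

```python
from fractions import Fraction

# PMF of a fair six-sided die (assumed fair for illustration)
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Probability mass at a single point: P(X = 3)
p_three = die_pmf[3]

# Probability of an event (a set of outcomes): sum the PMF over those outcomes
even_faces = {2, 4, 6}
p_even = sum(die_pmf[face] for face in even_faces)

print(p_three)  # 1/6
print(p_even)   # 1/2
```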

## 3. Why It Is Possible to Calculate the Probability for a Specific Point with Discrete Variables

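For discrete variables, the probability of the variable taking a specific value can be non-zero. This is because a discrete variable takes values in a finite or countably infinite set (for example, the faces of a die, or the counts 0, 1, 2, ... of a Poisson variable), so probability can be concentrated on individual points. For a continuous variable, by contrast, the probability of any single exact value is zero, and probabilities are only assigned to intervals.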
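A short sketch of this contrast using SciPy (the binomial and standard normal distributions here are illustrative choices): the discrete variable has positive mass at a point, while the continuous variable assigns zero probability to a zero-width interval. Note that `norm.pdf` returns a density value, which is not a probability.

```python
from scipy.stats import binom, norm

# Discrete: X ~ Binomial(10, 0.5) puts positive probability on individual points
p_point_discrete = binom.pmf(3, 10, 0.5)   # P(X = 3) = C(10, 3) / 2**10

# Continuous: for Y ~ Normal(0, 1), P(Y = 3) is the area over a zero-width interval
p_point_continuous = norm.cdf(3) - norm.cdf(3)

# A density value, not a probability
density_at_3 = norm.pdf(3)

print(p_point_discrete)    # about 0.117
print(p_point_continuous)  # 0.0
print(density_at_3)
```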

## 4. Estimating Probabilities for Discrete Variables
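For discrete variables, each specific value has its own probability, which can be estimated from data as the relative frequency of that value: the number of times it occurs divided by the total number of observations.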
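As a sketch, assuming simulated rolls of a fair die stand in for real data (the seed and sample size are arbitrary), the relative-frequency estimate of $P(X = 3)$ lands close to the exact value $1/6$:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded for reproducibility

# Simulate 10,000 rolls of a fair die (values 1..6; the upper bound is exclusive)
rolls = rng.integers(1, 7, size=10_000)

# Empirical PMF: relative frequency of each observed value
values, counts = np.unique(rolls, return_counts=True)
empirical_pmf = dict(zip(values.tolist(), (counts / counts.sum()).tolist()))

# Compare the estimate of P(X = 3) with the exact value 1/6
print(empirical_pmf[3], 1 / 6)
```

With more observations, the estimate tends to move closer to the true probability.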

## 5. Common Discrete Distributions and Their Use Cases


### Binomial Distribution

Use Case: Modeling the number of successes in a fixed number of trials.

#### Key characteristics of situations that follow a binomial distribution
* Fixed number of trials: There's a set number of times the event is repeated (e.g., number of patients in a trial, number of light bulbs tested).
* Independent trials: The outcome of one trial doesn't affect the outcome of other trials (e.g., one patient's response to a drug doesn't influence another patient's response).
* Two possible outcomes: Each trial has only two possible results (e.g., success/failure, yes/no, make/miss).
* Constant probability of success: The probability of success remains the same for each trial (e.g., the probability of a light bulb being defective remains constant throughout the batch).
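To see these four conditions at work, the sketch below simulates many repetitions of $n$ independent trials with a constant success probability and compares how often each success count occurs with `scipy.stats.binom.pmf`. The values `n = 10`, `p = 0.3`, and the seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(seed=1)
n, p = 10, 0.3          # illustrative: 10 trials, success probability 0.3
n_experiments = 50_000  # number of simulated repetitions of the whole experiment

# Each row is one experiment: n independent trials, each a success with probability p
trials = rng.random((n_experiments, n)) < p
successes = trials.sum(axis=1)   # number of successes in each experiment

# Empirical frequency of each success count vs the binomial PMF
empirical = np.bincount(successes, minlength=n + 1) / n_experiments
theoretical = binom.pmf(np.arange(n + 1), n, p)

print(np.round(empirical, 3))
print(np.round(theoretical, 3))
```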

**Real Life Examples**
* Medical Trials: When testing new drugs, researchers use binomial distributions to understand how likely it is for a certain number of patients to respond well to the treatment. Each patient in the trial is like a test, where the outcome can be success (the drug works) or failure (it doesn't).

* Quality Control: In factories, binomial distributions help ensure product quality. For example, if a factory tests light bulbs and each bulb can either pass (it works) or fail (it's defective), binomial distributions help predict how many defective bulbs might be found in a batch based on a sample.

* Survey Responses: Surveys with yes/no questions use binomial distributions to estimate the number of "yes" responses. For instance, if a survey asks people if they support a policy change, each person's answer is like a trial with two outcomes: yes or no.

* Sports Analytics: Binomial distributions are used in sports to analyze outcomes like free throws in basketball. Each free throw is a trial where the outcome can be success (the shot goes in) or failure (it misses). Given a player's shooting percentage, analysts use these distributions to estimate the likelihood of a player or team making a certain number of shots.

* Genetics: When studying inheritance of traits, binomial distributions help model the probability of offspring inheriting a specific trait from their parents. Each offspring is like a trial where the outcome can be inheriting the trait or not inheriting it, depending on the genes from each parent.
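As an illustration of the medical-trial case, here is a sketch with hypothetical numbers (20 patients, each responding independently with probability 0.6; these are not from any real study). `binom.sf(14, ...)` gives $P(X > 14)$, which is the same as $P(X \ge 15)$.

```python
from scipy.stats import binom

# Hypothetical numbers, for illustration only:
# 20 patients, each independently responds with probability 0.6
n, p = 20, 0.6

p_exactly_12 = binom.pmf(12, n, p)       # P(X = 12)
p_at_least_15 = binom.sf(14, n, p)       # P(X >= 15) = P(X > 14)
expected_responders = binom.mean(n, p)   # n * p = 12

print(round(p_exactly_12, 4), round(p_at_least_15, 4), expected_responders)
```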


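**PMF** = $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$, for $k = 0, 1, \dots, n$

where $n$ is the number of trials, $p$ is the probability of success on each trial, and $k$ is the number of successes.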
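As a quick check, the formula can be coded directly with Python's `math.comb` and compared against `scipy.stats.binom.pmf`. The helper name `binomial_pmf` is hypothetical, introduced only for this sketch.

```python
from math import comb

import numpy as np
from scipy.stats import binom

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed directly from the formula (hypothetical helper)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
by_hand = np.array([binomial_pmf(k, n, p) for k in range(n + 1)])
by_scipy = binom.pmf(np.arange(n + 1), n, p)

print(np.allclose(by_hand, by_scipy))  # True
print(by_hand[5])  # C(10, 5) / 2**10 = 252 / 1024 = 0.24609375
```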

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n, p = 10, 0.5  # number of trials, probability of success
k = np.arange(0, n + 1)  # possible numbers of successes: 0..n
binom_pmf = binom.pmf(k, n, p)

plt.bar(k, binom_pmf, label='Binomial PMF')
plt.title('Binomial Distribution PMF')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.legend()
plt.show()
```


### Poisson Distribution
Use Case: Modeling the number of times an event happens in a fixed interval of time or space.

**Key Characteristics**

* Discrete: It deals with whole numbers of events (you can't have half an event).
* Independent Events: The occurrence of one event doesn't affect the probability of another event happening.
* Constant Rate: The average rate of events happening is constant over the given time or space.
* Rare Events: In a very short interval, the probability of one event is approximately proportional to the length of the interval, and the chance of two or more events occurring in it is negligible.
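One way to see where these characteristics lead: if an interval is split into $n$ tiny pieces, each containing an event with probability $\lambda / n$, the event count is Binomial$(n, \lambda/n)$, and as $n$ grows it approaches Poisson$(\lambda)$. The sketch below checks this numerically with SciPy; $\lambda = 5$ and the chosen values of $n$ are illustrative.

```python
import numpy as np
from scipy.stats import binom, poisson

lam = 5               # average number of events per interval (illustrative value)
k = np.arange(0, 15)  # event counts to compare

# Split the interval into n small pieces, each containing an event with
# probability lam / n. As n grows, Binomial(n, lam / n) approaches Poisson(lam).
gaps = {}
for n in (10, 100, 10_000):
    gaps[n] = float(np.max(np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam))))
    print(f"n = {n:>6}: largest PMF difference = {gaps[n]:.5f}")
```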

**Real-Life Examples**
* Number of customer calls to a call center in an hour: If you know the average number of calls the call center receives per hour, you can use the Poisson distribution to predict the probability of receiving a certain number of calls in any given hour. This helps in staffing and resource allocation.

* Number of typos on a page of a book: If you have the average number of typos per page in a book, the Poisson distribution can tell you the likelihood of finding a specific number of typos on a single page. This is useful for editors and publishers in quality control.

* Number of cars passing through a toll booth in a minute: Knowing the average rate of cars passing through a toll booth per minute allows you to use the Poisson distribution to estimate the probability of observing a particular number of cars passing through in any given minute. This is helpful for traffic management and planning.

* Number of mutations in a DNA strand: The Poisson distribution can be applied to model the probability of a specific number of mutations occurring in a DNA strand of a certain length, given the average mutation rate. This is important in genetic research and understanding genetic variability.

* Number of earthquakes in a year: If you know the average rate of earthquakes per year in a specific region, the Poisson distribution helps calculate the probability of experiencing a certain number of earthquakes in that region within a year. This aids in seismic hazard assessment and disaster preparedness.
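For the call-center case, a sketch with a hypothetical average of 12 calls per hour (an assumed figure, not real data) might look like this. `poisson.ppf(0.95, ...)` returns the smallest call count $c$ with $P(X \le c) \ge 0.95$, which could serve as a rough staffing target.

```python
from scipy.stats import poisson

# Hypothetical figure, for illustration only: 12 calls per hour on average
calls_per_hour = 12

p_exactly_10 = poisson.pmf(10, calls_per_hour)    # P(X = 10)
p_more_than_15 = poisson.sf(15, calls_per_hour)   # P(X > 15) = P(X >= 16)

# Smallest call count c such that P(X <= c) >= 0.95
capacity_95 = int(poisson.ppf(0.95, calls_per_hour))

print(round(p_exactly_10, 4), round(p_more_than_15, 4), capacity_95)
```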

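**PMF** = $P(X=k) = \dfrac{\lambda^k e^{-\lambda}}{k!}$, for $k = 0, 1, 2, \dots$

where $\lambda$ is the average number of events in the interval.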
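The formula translates directly into Python with `math.exp` and `math.factorial`; the sketch below compares it with `scipy.stats.poisson.pmf` for $\lambda = 5$. The helper name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial

import numpy as np
from scipy.stats import poisson

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed directly from the formula (hypothetical helper)."""
    return lam**k * exp(-lam) / factorial(k)

lam = 5
by_hand = np.array([poisson_pmf(k, lam) for k in range(20)])
by_scipy = poisson.pmf(np.arange(20), lam)

print(np.allclose(by_hand, by_scipy))  # True
print(round(by_hand[0], 6))            # e^{-5}, about 0.006738
```

Because the support of the Poisson distribution is unbounded, the values for $k = 0, \dots, 19$ sum to slightly less than 1.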


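```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

lam = 5  # rate (lambda): average number of events per interval
k = np.arange(0, 20)  # event counts to plot; 0..19 covers almost all the mass for lam = 5
poisson_pmf = poisson.pmf(k, lam)

plt.bar(k, poisson_pmf, label='Poisson PMF')
plt.title('Poisson Distribution PMF')
plt.xlabel('Number of Events')
plt.ylabel('Probability')
plt.legend()
plt.show()
```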


### Geometric Distribution
Use Case: Modeling the number of trials until the first success.

**Key Characteristics**

* Discrete: It deals with whole numbers of trials (you can't have half a trial).
* Independent Trials: The outcome of one trial doesn't affect the outcome of other trials.
* Two Outcomes: Each trial has only two possible results (success or failure).
* Constant Probability of Success: The probability of success remains the same for each trial.
* Waiting Time: The random variable is the number of trials needed to get the first success.

#### Real-Life Examples
* Number of coin flips until you get heads: When flipping a fair coin over and over, the number of flips needed to get the first heads follows a geometric distribution. This means each flip is like a trial with two outcomes: heads or tails. The geometric distribution helps predict how many flips it might take on average to get heads for the first time.

* Number of raffle tickets bought until you win: If each ticket you buy has the same, independent chance of winning, the number of tickets you buy until your first win follows a geometric distribution. It helps estimate how many tickets you might need to buy, on average, before winning a prize.

* Number of job interviews until you get a job offer: When applying for jobs, the number of interviews you attend, up to and including the one that produces your first job offer, can be modeled with a geometric distribution. Each interview is like a trial with two outcomes: getting an offer or not. The distribution helps estimate how many interviews you might go through before getting an offer.

* Number of attempts to start a faulty car until it starts: If your car has trouble starting, the number of attempts, up to and including the one where it finally starts, can follow a geometric distribution. Each attempt is like a trial with two outcomes: the car starts or it doesn't. The distribution helps predict how many attempts you might need, on average, before the car starts.

* Number of rolls of a die until you roll a 6: When rolling a fair die repeatedly, the number of rolls it takes to get a 6 for the first time follows a geometric distribution. Each roll is a trial with two outcomes of interest: rolling a 6 (success, with $p = 1/6$) or rolling anything else (failure). The distribution helps estimate how many rolls, on average, it might take to get a 6; the mean is $1/p = 6$.
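The die example can be checked by simulation. This sketch, assuming a fair die and an arbitrary seed, repeats the "roll until a 6" experiment many times; the average number of rolls should be close to $1/p = 6$. The helper name `rolls_until_six` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # seeded for reproducibility

def rolls_until_six(rng: np.random.Generator) -> int:
    """Roll a fair die until a 6 appears; return the number of rolls used (hypothetical helper)."""
    rolls = 0
    while True:
        rolls += 1
        if rng.integers(1, 7) == 6:  # each roll is uniform on 1..6
            return rolls

samples = np.array([rolls_until_six(rng) for _ in range(20_000)])
print(samples.mean())  # close to 1 / p = 6
```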

**PMF** = $P(X=k) = (1-p)^{k-1} p$, for $k = 1, 2, 3, \dots$

where
* $p$ is the probability of success on each trial.
* $k$ is the number of trials until the first success.
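A sketch that checks the formula against `scipy.stats.geom`, which uses the same convention (support starting at $k = 1$), and confirms the mean $1/p$. The helper name `geometric_pmf` is hypothetical.

```python
import numpy as np
from scipy.stats import geom

def geometric_pmf(k: int, p: float) -> float:
    """P(X = k): first success on trial k, for k = 1, 2, 3, ... (hypothetical helper)."""
    return (1 - p) ** (k - 1) * p

p = 0.5
k = np.arange(1, 11)
by_hand = np.array([geometric_pmf(int(i), p) for i in k])
by_scipy = geom.pmf(k, p)

print(by_hand[:3])   # values 1/2, 1/4, 1/8
print(geom.mean(p))  # 1 / p = 2.0
```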

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import geom

p = 0.5  # probability of success
k = np.arange(1, 11)  # number of trials until the first success (support starts at 1)
geom_pmf = geom.pmf(k, p)

plt.bar(k, geom_pmf, label='Geometric PMF')
plt.title('Geometric Distribution PMF')
plt.xlabel('Number of Trials')
plt.ylabel('Probability')
plt.legend()
plt.show()
```


### Negative Binomial Distribution
Use Case: Modeling the number of failures that occur before a specified number of successes is reached (equivalently, the total number of trials needed, which is the number of failures plus the number of successes).
#### Key Characteristics

* Discrete: It deals with whole numbers of failures (you can't have half a failure).
* Independent Trials: The outcome of one trial doesn't affect the outcome of other trials.
* Two Outcomes: Each trial has only two possible results (success or failure).
* Constant Probability of Success: The probability of success remains the same for each trial.
* Fixed Number of Successes: The distribution is defined by the desired number of successes.
* Variable Number of Failures: The random variable is the number of failures before achieving the specified number of successes.

#### Real-Life Examples
* Number of misses before making a certain number of free throws: In basketball, if a player has a consistent free throw percentage, the negative binomial distribution gives the probability of each possible number of missed free throws before they make a specific number of successful ones. This helps coaches and players understand shooting consistency over multiple attempts.

* Number of unsuccessful sales calls before closing a deal: For salespeople, the negative binomial distribution models the number of unsuccessful calls they make before successfully closing a certain number of deals. It helps sales teams estimate the persistence and effort needed to achieve their sales goals.

* Number of failed attempts before fixing a bug: In software development, the negative binomial distribution can be used to estimate how many unsuccessful attempts a programmer might make before successfully fixing a bug. It helps in managing time and resources during debugging processes.

* Number of lost games before winning a tournament: In sports tournaments, teams may face losses before securing enough wins to advance. The negative binomial distribution helps in predicting how many games a team might lose before achieving a specific number of wins necessary to move forward in the tournament.

* Number of unsuccessful attempts before getting a research grant: In academic research, the negative binomial distribution can model the number of unsuccessful grant proposals a researcher submits before receiving funding for a certain number of them. It assists researchers in understanding the likelihood of success based on past experiences.
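For the free-throw case, here is a sketch with hypothetical numbers (a player who makes each free throw independently with probability 0.75, counting misses before the 10th make). `scipy.stats.nbinom` uses the same "failures before the $r$-th success" convention as the PMF in this section.

```python
from scipy.stats import nbinom

# Hypothetical numbers, for illustration only:
# each free throw goes in independently with probability 0.75,
# and we count misses before the 10th make
r, p = 10, 0.75

p_no_misses = nbinom.pmf(0, r, p)          # the first 10 attempts all go in
p_at_most_3_misses = nbinom.cdf(3, r, p)   # P(K <= 3)
expected_misses = nbinom.mean(r, p)        # r * (1 - p) / p

print(round(p_no_misses, 4), round(p_at_most_3_misses, 4), round(expected_misses, 4))
```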

**PMF** = $P(X=k) = \binom{k+r-1}{r-1} p^r (1-p)^k$, for $k = 0, 1, 2, \dots$

where
* $p$ is the probability of success on each trial.
* $r$ is the number of successes.
* $k$ is the number of failures before achieving $r$ successes.
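A sketch that checks this formula against `scipy.stats.nbinom.pmf` and illustrates a special case: with $r = 1$, "$k$ failures before the first success" is the geometric distribution shifted by one, since the success then happens on trial $k + 1$. The helper name `negative_binomial_pmf` is hypothetical.

```python
from math import comb

import numpy as np
from scipy.stats import geom, nbinom

def negative_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(K = k): exactly k failures occur before the r-th success (hypothetical helper)."""
    return comb(k + r - 1, r - 1) * p**r * (1 - p) ** k

r, p = 5, 0.5
k = np.arange(0, 20)
by_hand = np.array([negative_binomial_pmf(int(i), r, p) for i in k])
by_scipy = nbinom.pmf(k, r, p)

# With r = 1: k failures before the first success means success on trial k + 1
nbinom_r1 = nbinom.pmf(k, 1, p)
shifted_geom = geom.pmf(k + 1, p)

print(np.allclose(by_hand, by_scipy), np.allclose(nbinom_r1, shifted_geom))  # True True
```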

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import nbinom

r, p = 5, 0.5  # number of successes, probability of success
k = np.arange(0, 20)  # number of failures before the r-th success
nbinom_pmf = nbinom.pmf(k, r, p)

plt.bar(k, nbinom_pmf, label='Negative Binomial PMF')
plt.title('Negative Binomial Distribution PMF')
plt.xlabel('Number of Failures')
plt.ylabel('Probability')
plt.legend()
plt.show()
```

