# **Binomial Distribution**

The binomial distribution is a discrete probability distribution that models the number of successes in a fixed number of independent trials of a binary (yes/no) experiment, where every trial has the same probability of success. It is widely used in statistics to describe counts of binary outcomes, such as the number of heads in a series of coin flips.

## **Key Characteristics**
1. **Fixed Number of Trials (n)**: The experiment is performed a fixed number of times.
2. **Binary Outcomes**: Each trial has two possible outcomes, typically called "success" and "failure".
3. **Constant Probability (p)**: The probability of success is the same for each trial.
4. **Independence**: The outcome of each trial is independent of the outcomes of the others.

## **Probability Mass Function (PMF)**

The probability of getting exactly 𝑘 successes in n trials is given by the PMF:

$$
P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}
$$

where $\binom{n}{k}$ (read as "n choose k") is the binomial coefficient, calculated as:

$$
\binom{n}{k} = \frac{n!}{k!(n-k)!}
$$
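
As a quick check of the formula, the PMF can be evaluated directly with the standard library. This is a minimal sketch: the function name `binomial_pmf` and the example values (5 trials, success probability 0.3) are illustrative assumptions, not part of the original notes.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed straight from the formula."""
    if not 0 <= k <= n:
        return 0.0  # k lies outside the support {0, ..., n}
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: exactly 2 successes in 5 trials with p = 0.3
pmf_value = binomial_pmf(2, 5, 0.3)
print(f"C(5, 2)  = {comb(5, 2)}")       # binomial coefficient 5! / (2! * 3!)
print(f"P(X = 2) = {pmf_value:.4f}")    # 10 * 0.3**2 * 0.7**3
```

Here the coefficient is 10 and the probability works out to 10 × 0.09 × 0.343 = 0.3087.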

## **Cumulative Distribution Function (CDF)**
The cumulative probability of getting up to 𝑘 successes is given by the sum of the PMF for all values from 0 to 𝑘:
$$
F(k; n, p) = P(X \leq k) = \sum_{i=0}^{k} \binom{n}{i} p^i (1 - p)^{n - i}
$$
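
The sum in the CDF translates almost literally into code. The sketch below assumes a hypothetical helper named `binomial_cdf` and uses 10 fair coin flips as illustrative values.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summing the PMF from 0 to k."""
    if k < 0:
        return 0.0
    upper = min(k, n)  # the CDF equals 1 for every k >= n
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(upper + 1))


# Hypothetical example: at most 3 heads in 10 flips of a fair coin
cdf_value = binomial_cdf(3, 10, 0.5)
print(f"P(X <= 3) = {cdf_value:.6f}")  # (1 + 10 + 45 + 120) / 1024 = 0.171875
```

For large n, a library routine such as `scipy.stats.binom.cdf` is usually a better choice than a hand-written sum.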

## **Mean and Variance**
- **Mean (Expected Value)**: E[X]=np
- **Variance**: Var(X)=np(1−p)
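
The two formulas can be checked numerically by computing the mean and variance directly from the PMF. This is only a sanity check under assumed example values (n = 10, p = 0.5), not a derivation.

```python
from math import comb

n, p = 10, 0.5  # illustrative values

# PMF values for every possible count of successes 0..n
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Definitions of mean and variance for a discrete random variable
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(f"mean from PMF     = {mean:.4f}   np      = {n * p:.4f}")
print(f"variance from PMF = {variance:.4f}   np(1-p) = {n * p * (1 - p):.4f}")
```

Both pairs agree: the mean is 5 and the variance is 2.5.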

### **Example**:

Suppose you flip a fair coin 10 times. What is the probability of getting exactly 6 heads?

- **Number of trials (n)**: 10
- **Probability of success (p)**: 0.5 (since the coin is fair)



```python
from scipy.stats import binom

# Number of trials
n = 10
# Probability of success
p = 0.5
# Number of successes
k = 6

# Calculate the probability
prob = binom.pmf(k, n, p)
print("Probability of exactly 6 heads:", prob)
```

Output:

```
Probability of exactly 6 heads: 0.2050781249999999
```


**Basic Concept**

- **What is a Binomial distribution?**
    - A Binomial distribution is a discrete probability distribution that models the number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success.

- **How does the Binomial distribution differ from the Bernoulli distribution?**
    - The Bernoulli distribution models a single trial with two outcomes, whereas the Binomial distribution models the number of successes in a fixed number of independent Bernoulli trials. A Bernoulli distribution is the special case of a Binomial distribution with n = 1.

- **What are the parameters of a binomial distribution?**
    - The parameters of a binomial distribution are the number of trials (n) and the probability of success on each trial (p).

- **How does the Binomial distribution behave as the number of trials increases?**
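    - As the number of trials 𝑛 increases with 𝑝 held fixed, the Binomial distribution is increasingly well approximated by a Normal distribution with mean np and variance np(1−p), a consequence of the Central Limit Theorem. The approximation works well when np and n(1−p) are both large.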

- **Can you give a real-world example of a scenario that follows a Binomial distribution?**
    - An example is the number of heads in 10 flips of a fair coin.
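
The link between Bernoulli trials and the Binomial distribution described in these answers can be illustrated with a small simulation. The sketch below uses only the standard library; the helper name `count_successes`, the seed, and the number of repetitions are arbitrary assumptions.

```python
import random
from math import comb

random.seed(0)  # fixed seed so the run is repeatable

n, p = 10, 0.5            # 10 flips of a fair coin
num_experiments = 20_000  # how many times the 10-flip experiment is repeated


def count_successes(n: int, p: float) -> int:
    """Run n independent Bernoulli(p) trials and return the number of successes."""
    return sum(random.random() < p for _ in range(n))


counts = [count_successes(n, p) for _ in range(num_experiments)]

# Compare simulated relative frequencies with the exact Binomial PMF
for k in (4, 5, 6):
    empirical = counts.count(k) / num_experiments
    exact = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"k={k}: simulated {empirical:.3f}, exact {exact:.3f}")
```

The simulated frequencies should land close to the exact PMF values, with small differences due to sampling noise.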

**Application and Usage**

- **When would you use a Binomial distribution in a data science project?**

    - You would use it to model binary outcomes over multiple trials, such as predicting the number of positive responses in a marketing campaign or the number of defective items in a production batch.

- **Can you describe a business scenario where a Binomial distribution might be applicable?**

    - A scenario is evaluating the success rate of a new sales strategy by counting the number of successful sales calls out of a total number of calls.

- **How is the Binomial distribution used in quality control processes?**

    - It is used to model the number of defective items in a sample from a production process to determine if the process is operating within acceptable quality levels.

- **What assumptions must be met for a Binomial distribution to be an appropriate model?**

    - Each trial must have exactly two possible outcomes (success or failure).
    - The trials must be independent.
    - Each trial must have the same probability of success.
    - The number of trials 𝑛 must be fixed.
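
To make the quality-control use case concrete, here is a minimal sketch of an acceptance check. Every number in it (sample size, defect rate, acceptance threshold) is a hypothetical value chosen for illustration, and the names are not taken from any standard or library.

```python
from math import comb

# Hypothetical inspection plan (illustrative numbers only)
sample_size = 50    # items inspected per batch
defect_rate = 0.02  # assumed probability that any single item is defective
max_defects = 2     # the batch is accepted if it has at most this many defects


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability that a batch from an in-control process passes inspection
p_accept = sum(binomial_pmf(k, sample_size, defect_rate) for k in range(max_defects + 1))
# Probability that such a batch is flagged anyway (a false alarm)
p_flag = 1 - p_accept

print(f"P(at most {max_defects} defects)   = {p_accept:.4f}")
print(f"P(more than {max_defects} defects) = {p_flag:.4f}")
```

This calculation relies on the assumptions listed above: if defects cluster (for example, because of a faulty machine run), the trials are not independent and the Binomial model may understate the risk.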

**Statistical Inference**

- **How would you test if a dataset follows a Binomial distribution?**
    - You could use a goodness-of-fit test, such as the Chi-square test, to compare the observed frequencies of successes to the expected frequencies under a Binomial distribution.


- **What is the maximum likelihood estimator (MLE) for the parameters of a Binomial distribution?**
    - With the number of trials 𝑛 known, the MLE for the probability of success is $\hat{p} = k/n$, where 𝑘 is the observed number of successes.
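
The closed-form estimate k/n can be sanity-checked by maximising the log-likelihood numerically. The sketch below assumes hypothetical data (7 successes in 20 trials) and uses a coarse grid search purely for illustration.

```python
from math import comb, log

n, k = 20, 7  # hypothetical data: 7 successes observed in 20 trials


def log_likelihood(p: float) -> float:
    """Log of the Binomial likelihood of seeing k successes in n trials."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)


p_hat = k / n  # closed-form MLE

# Grid search over the open interval (0, 1) as a check on the closed form
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=log_likelihood)

print(f"closed-form MLE:       {p_hat:.3f}")
print(f"grid-search maximiser: {p_grid:.3f}")
```

Both approaches give 0.35, as expected, because the log-likelihood is concave in p with its single maximum at k/n.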


**Theoretical Questions**

- **What is the relationship between the Binomial distribution and the Poisson distribution?**
    - The Poisson distribution can be considered as a limiting case of the Binomial distribution when the number of trials 𝑛 is large and the probability of success 𝑝 is small, such that λ=np remains constant.

- **How is a Poisson distribution different from a binomial distribution?** 
    - A Poisson distribution models the number of events occurring in a fixed interval of time or space, given a known average rate of occurrence, while a binomial distribution models the number of successes in a fixed number of trials with a constant probability of success.

- **How does the Binomial distribution relate to the Normal distribution?**
    - By the Central Limit Theorem, as the number of trials 𝑛 increases (with 𝑝 fixed), the Binomial distribution is increasingly well approximated by a Normal distribution with mean 𝑛𝑝 and variance np(1−p).

- **What is the law of large numbers and how does it apply to the Binomial distribution?**
    - The law of large numbers states that as the number of trials 𝑛 increases, the sample mean X/n of a Binomial random variable 𝑋 approaches the expected value 𝑝.
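
The Poisson and Normal approximations mentioned in these answers can be compared against exact Binomial values. This is a rough sketch with hypothetical parameter choices; the continuity correction (adding 0.5 before standardising) is a common refinement that the text above does not mention.

```python
from math import comb, erf, exp, factorial, sqrt


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Poisson limit: large n, small p, lambda = n * p (hypothetical values)
n_rare, p_rare = 1000, 0.003
lam = n_rare * p_rare
k = 2
binom_rare = binom_pmf(k, n_rare, p_rare)
poisson_value = exp(-lam) * lam**k / factorial(k)
print(f"P(X = 2): Binomial {binom_rare:.5f}, Poisson {poisson_value:.5f}")

# Normal approximation: moderate p, large n (hypothetical values)
n_big, p_mid = 100, 0.5
mu = n_big * p_mid
sigma = sqrt(n_big * p_mid * (1 - p_mid))
exact_cdf = sum(binom_pmf(i, n_big, p_mid) for i in range(56))  # P(X <= 55)
# Standard Normal CDF via erf, with a continuity correction of 0.5
normal_cdf = 0.5 * (1 + erf((55 + 0.5 - mu) / (sigma * sqrt(2))))
print(f"P(X <= 55): Binomial {exact_cdf:.4f}, Normal {normal_cdf:.4f}")
```

In both cases the approximation should agree with the exact value to about three decimal places.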



### How to test if the dataset follows a Binomial distribution?

**Example Scenario**

Suppose you have 100 observations, each recording the number of successes in a set of 10 trials. You want to test whether these counts follow a Binomial distribution with n=10 and p=0.5.

**Steps to Perform the Chi-square Goodness-of-Fit Test**

1. **Formulate Hypotheses**:

- Null Hypothesis (H0): The data follows a Binomial distribution with parameters 𝑛 and 𝑝.
- Alternative Hypothesis (H1): The data does not follow a Binomial distribution with parameters 𝑛 and 𝑝.

2. **Calculate Expected Frequencies**:

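- Use the Binomial probability mass function (PMF) to calculate the expected frequency of each possible number of successes (0 to 10): the PMF value multiplied by the number of observations. The Chi-square approximation is unreliable when expected frequencies are small (a common rule of thumb is at least 5 per category), so sparse categories in the tails are usually pooled.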

3. **Calculate Observed Frequencies**:

- Count the number of occurrences of each possible number of successes in the dataset.

4. **Compute the Chi-square Statistic**:
- Compare the observed and expected frequencies using the Chi-square formula:
$$
\chi^2 = \sum_{i=0}^{n} \frac{(O_i - E_i)^2}{E_i}
$$
where 𝑂𝑖 and 𝐸𝑖 are the observed and expected frequencies for each number of successes 𝑖.

5. **Determine the p-value**:
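- Compare the computed Chi-square statistic to the Chi-square distribution with the appropriate degrees of freedom to find the p-value. When 𝑝 is specified in advance, the degrees of freedom equal the number of categories minus 1; if 𝑝 is estimated from the data, subtract one more.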

6. **Decision Making**
- Based on the p-value, we decide whether to reject the null hypothesis. If the p-value is less than the chosen significance level (commonly 0.05), we reject the null hypothesis and conclude that the data are unlikely to come from the specified Binomial distribution. Otherwise, we fail to reject the null hypothesis: the data are consistent with the Binomial distribution, although this does not prove that they follow it.

```python
import numpy as np
from scipy.stats import binom, chisquare

# Example data: number of successes in 10 trials, repeated for 100 sets of trials
np.random.seed(0)  # for reproducibility
n = 10
p = 0.5
sample_data = np.random.binomial(n, p, 100)

# Calculate observed frequencies (one bin per possible count 0..n)
observed_freq, _ = np.histogram(sample_data, bins=np.arange(n + 2) - 0.5)

# Calculate expected frequencies under Binomial(n, p)
expected_prob = binom.pmf(np.arange(n + 1), n, p)
expected_freq = expected_prob * len(sample_data)

# Perform the Chi-square test
chi2_stat, p_value = chisquare(f_obs=observed_freq, f_exp=expected_freq)

# Print the results
print("Chi-square statistic:", chi2_stat)
print("p-value:", p_value)

# Decision: failing to reject H0 does not prove that H0 is true
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The data do not appear to follow a Binomial distribution.")
else:
    print("Fail to reject the null hypothesis: The data are consistent with a Binomial distribution.")
```

Output:

```
Chi-square statistic: 6.231873015873011
p-value: 0.7954205259182582
Fail to reject the null hypothesis: The data are consistent with a Binomial distribution.
```
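
With n=10, p=0.5 and only 100 observations, several tail categories have expected counts far below 5 (for example, about 0.1 for 0 or 10 successes), which strains the Chi-square approximation. One common remedy is to pool the tail categories before computing the statistic. The sketch below is an illustrative variant using only the standard library, not the method used in the cell above: the helper `pool_tails`, the threshold of 5, and the simulated data are all assumptions.

```python
import random
from math import comb

random.seed(0)  # fixed seed so the run is repeatable
n, p, num_sets = 10, 0.5, 100

# Simulated data: number of successes in n Bernoulli(p) trials, for each set
data = [sum(random.random() < p for _ in range(n)) for _ in range(num_sets)]

observed = [data.count(k) for k in range(n + 1)]
expected = [num_sets * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def pool_tails(obs, exp, min_expected=5.0):
    """Merge end bins inward until both end bins have expected count >= min_expected."""
    obs, exp = list(obs), list(exp)
    while len(exp) > 2 and exp[0] < min_expected:
        obs[1] += obs[0]
        exp[1] += exp[0]
        del obs[0]
        del exp[0]
    while len(exp) > 2 and exp[-1] < min_expected:
        obs[-2] += obs[-1]
        exp[-2] += exp[-1]
        del obs[-1]
        del exp[-1]
    return obs, exp


pooled_obs, pooled_exp = pool_tails(observed, expected)
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))
df = len(pooled_exp) - 1  # p was specified in advance, so no extra parameter is lost

print(f"bins after pooling: {len(pooled_exp)}, degrees of freedom: {df}")
print(f"pooled Chi-square statistic: {chi2_stat:.3f}")
```

For these settings the pooling leaves 7 categories (0 to 2, 3, 4, 5, 6, 7, and 8 to 10), giving 6 degrees of freedom. The statistic can then be compared with the 5% critical value for 6 degrees of freedom, about 12.59, or converted to a p-value with a routine such as `scipy.stats.chi2.sf`.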
