# **Multinomial Distribution**

The multinomial distribution is a generalization of the binomial distribution. While the binomial distribution models the number of successes in a fixed number of independent Bernoulli trials with two possible outcomes (success or failure), the multinomial distribution models the outcomes of $n$ independent trials where each trial can result in one of $k$ possible categories. It is used to describe the probabilities of obtaining a specific combination of counts for the $k$ categories.
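To make the "generalization" concrete, here is a small check. It is a sketch using SciPy's `binom` and `multinomial` distributions: with $k = 2$ categories, the multinomial PMF of counts $(x, n - x)$ with probabilities $(p, 1 - p)$ should equal the binomial PMF. The values `n = 10` and `p = 0.3` are arbitrary choices for illustration.

```python
from scipy.stats import binom, multinomial

# With k = 2 categories, the multinomial reduces to the binomial:
# x "successes" with probability p and n - x "failures" with probability 1 - p.
n, p = 10, 0.3

for x in range(n + 1):
    binomial_prob = binom.pmf(x, n, p)
    multinomial_prob = multinomial.pmf([x, n - x], n, [p, 1 - p])
    # The two PMFs agree up to floating-point rounding
    assert abs(binomial_prob - multinomial_prob) < 1e-12

print("k = 2 multinomial matches binomial for every x in 0..10")
```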



## **Key Characteristics**

1. **Number of Trials ($n$)**: The fixed number of independent trials.
2. **Number of Categories ($k$)**: The number of possible outcomes for each trial.
3. **Probability of Each Category ($p_1, p_2, \ldots, p_k$)**: The probability $p_i$ that a single trial falls into category $i$, where $\sum_{i=1}^{k} p_i = 1$.

## **Probability Mass Function (PMF)**
The probability of observing counts $(x_1, x_2, \ldots, x_k)$ for the $k$ categories after $n$ trials is given by:
$$
P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1! x_2! \cdots x_k!} p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}
$$
where:
- $x_i$ is the number of trials resulting in the $i$-th category.
- $\sum_{i=1}^{k} x_i = n$, since every trial lands in exactly one category.
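As a sanity check on the formula, the PMF can be evaluated term by term with Python's standard library and compared with SciPy. This is a minimal sketch: `multinomial_pmf` is a hypothetical helper written for illustration, and it uses exact integer factorials, so it is only practical for modest $n$. For large $n$, working in log space (for example with `scipy.stats.multinomial.logpmf`) is the safer route.

```python
from math import factorial, prod

from scipy.stats import multinomial


def multinomial_pmf(x, p):
    """Evaluate the multinomial PMF directly from the formula (illustrative helper).

    x: counts per category; n is taken to be sum(x).
    p: category probabilities, assumed to sum to 1.
    """
    n = sum(x)
    # Multinomial coefficient n! / (x_1! x_2! ... x_k!), exact integer arithmetic
    coefficient = factorial(n) // prod(factorial(xi) for xi in x)
    # Product p_1^{x_1} p_2^{x_2} ... p_k^{x_k}
    probability_term = prod(pi ** xi for pi, xi in zip(p, x))
    return coefficient * probability_term


x = [2, 3, 1, 1, 2, 1]
p = [1 / 6] * 6
print(multinomial_pmf(x, p))          # formula evaluated by hand
print(multinomial.pmf(x, sum(x), p))  # SciPy reference value
```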

## **Mean and Variance**
- **Mean**: $E[X_i] = n p_i$
- **Variance**: $\mathrm{Var}(X_i) = n p_i (1 - p_i)$
- **Covariance**: $\mathrm{Cov}(X_i, X_j) = -n p_i p_j$ for $i \neq j$
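These moments can be collected into a single covariance matrix $n(\operatorname{diag}(p) - p p^\top)$, whose diagonal holds the variances and whose off-diagonal entries are the covariances. The sketch below assumes the fair-die parameters from the example that follows ($n = 10$, $p_i = 1/6$) and compares that matrix with estimates from simulated draws; the seed and the sample size of 200,000 are arbitrary. The negative covariances reflect the constraint $\sum_i X_i = n$: a higher count in one category leaves fewer trials for the others, so each row of the covariance matrix sums to zero.

```python
import numpy as np

# Fair-die parameters: n = 10 rolls, k = 6 equally likely faces
n = 10
p = np.full(6, 1 / 6)

# Closed-form moments
mean = n * p                             # E[X_i] = n p_i
cov = n * (np.diag(p) - np.outer(p, p))  # diagonal n p_i (1 - p_i); off-diagonal -n p_i p_j

# Monte Carlo check: each row of `samples` is one experiment of n rolls
rng = np.random.default_rng(seed=0)
samples = rng.multinomial(n, p, size=200_000)
emp_cov = np.cov(samples, rowvar=False)

print("analytic mean: ", mean)
print("empirical mean:", samples.mean(axis=0))
print("analytic  Var(X_1), Cov(X_1, X_2):", cov[0, 0], cov[0, 1])
print("empirical Var(X_1), Cov(X_1, X_2):", emp_cov[0, 0], emp_cov[0, 1])
```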

## **Example**
Suppose you roll a fair six-sided die 10 times. The outcome of each roll can be one of six categories (1 through 6). The probabilities of each category are $p_1 = p_2 = \cdots = p_6 = 1/6$.

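The probability of rolling exactly two 1s, three 2s, one 3, one 4, two 5s, and one 6 is:
$$
P(X_1 = 2, X_2 = 3, X_3 = 1, X_4 = 1, X_5 = 2, X_6 = 1) = \frac{10!}{2!\,3!\,1!\,1!\,2!\,1!} \left(\frac{1}{6}\right)^{10} = \frac{151200}{60466176} \approx 0.0025
$$
The same value can be computed with SciPy: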
```python
import numpy as np
from scipy.stats import multinomial

# Number of trials (die rolls)
n = 10

# Probabilities of each category (fair die: each face has probability 1/6)
p = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]

# Generate 1000 random experiments; each row holds the face counts from n rolls
rng = np.random.default_rng()
samples = rng.multinomial(n, p, size=1000)

# Calculate the PMF for a specific outcome: counts of faces 1 through 6
x = [2, 3, 1, 1, 2, 1]
pmf_value = multinomial.pmf(x, n, p)
print(f"P(X = {x}) = {pmf_value}")
```

```
P(X = [2, 3, 1, 1, 2, 1]) = 0.00250057155921354
```


## **Use Cases**
- **Natural Language Processing**: Modeling word frequencies in text.
- **Genetics**: Modeling the distribution of different genotypes in a population.
- **Quality Control**: Modeling the number of defective items in different categories in a batch.
- **Marketing**: Analyzing customer preferences across multiple product categories.
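For the NLP use case, a bag-of-words document can be treated as a vector of word counts and scored under a unigram model with the multinomial log-PMF. This is a sketch only: the vocabulary, the word probabilities, and the document below are hypothetical, chosen to illustrate the calculation. Log space is used because the probabilities of long documents quickly underflow.

```python
import numpy as np
from scipy.stats import multinomial

# Hypothetical unigram model over a tiny 4-word vocabulary (illustrative numbers only)
vocabulary = ["the", "cat", "sat", "mat"]
word_probs = np.array([0.4, 0.2, 0.2, 0.2])

# Bag-of-words counts for a hypothetical document, aligned with `vocabulary`
document = "the cat sat the mat".split()
counts = np.array([document.count(word) for word in vocabulary])
n = counts.sum()

# Log-probability of these counts under the model
log_prob = multinomial.logpmf(counts, n, word_probs)
print(dict(zip(vocabulary, counts.tolist())), "log P =", log_prob)
```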