# <font color="darkblue"> Bayes Theorem

## **Bayes Theorem for the Discrete Case**

In the discrete case, where the parameter $ \theta $ takes discrete values, Bayes' Theorem can be derived from the **definition of conditional probability**.

## **Statement of Bayes Theorem for the Discrete Case**

Bayes' Theorem in the discrete case is given by:

$$
P(\theta | \text{X}) = \frac{P(\text{X} | \theta) P(\theta)}{P(\text{X})}
$$

Where:
- $ P(\theta | \text{X}) $ is the **posterior probability** of $ \theta $ given the data.
- $ P(\text{X} | \theta) $ is the **likelihood**, or the probability of observing the data given $ \theta $.
- $ P(\theta) $ is the **prior probability** of $ \theta $ before observing the data.
- $ P(\text{X}) $ is the **marginal likelihood**, which ensures that the posterior is a valid probability distribution. It is the sum of the likelihood times the prior over all possible values of $ \theta $:

$$
P(\text{X}) = \sum_{\theta} P(\text{X} | \theta) P(\theta)
$$

### <font color="darkgreen"> **Proof of Bayes Theorem (Discrete Case)**

We start with the definition of **conditional probability**:

$$
P(\theta | \text{X}) = \frac{P(\theta \cap \text{X})}{P(\text{X})}
$$

By the multiplication rule (the same definition of conditional probability, applied to $P(\text{X} | \theta)$), the joint probability is:

$$
P(\theta \cap \text{X}) = P(\text{X} | \theta) P(\theta)
$$

Thus, substituting into the original equation, we get:

$$
P(\theta | \text{X}) = \frac{P(\text{X} | \theta) P(\theta)}{P(\text{X})}
$$

This is the desired form of **Bayes' Theorem** in the discrete case.
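
The identity can also be checked numerically. The sketch below uses a small made-up joint distribution (the labels `low`/`high` and all four probabilities are illustrative assumptions, not taken from this notebook) and compares the posterior obtained from the definition of conditional probability with the one obtained from likelihood, prior and marginal likelihood.

```python
# Hypothetical joint probabilities P(theta and X); labels and numbers are
# illustrative assumptions and sum to 1.
joint = {
    ("low", 0): 0.30, ("low", 1): 0.10,
    ("high", 0): 0.15, ("high", 1): 0.45,
}
thetas = ("low", "high")


def p_theta(t):
    """Marginal (prior) probability P(theta)."""
    return sum(p for (th, _), p in joint.items() if th == t)


def p_x(x):
    """Marginal probability P(X)."""
    return sum(p for (_, xv), p in joint.items() if xv == x)


def posterior_by_definition(t, x):
    """P(theta | X) straight from the definition of conditional probability."""
    return joint[(t, x)] / p_x(x)


def posterior_by_bayes(t, x):
    """P(theta | X) from likelihood, prior and the summed marginal likelihood."""
    def likelihood(th):
        return joint[(th, x)] / p_theta(th)  # P(X | theta)

    evidence = sum(likelihood(th) * p_theta(th) for th in thetas)
    return likelihood(t) * p_theta(t) / evidence


# Both routes should print the same posterior for every theta.
for t in thetas:
    print(t, posterior_by_definition(t, 1), posterior_by_bayes(t, 1))
```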

# <font color="darkblue">Understanding Bayesian Approach using Discrete Probabilities

1. The parameter is treated as a random variable

1. Candidate parameter values can be chosen arbitrarily (discretized if the parameter is continuous)

1. Weights (prior probabilities) are assigned to each of these values

1. After observing the data, compute the likelihood for each of these parameter values

1. Find the posterior probabilities using Bayes' formula
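
The steps above can be sketched as one small helper. The function name `discrete_posterior` and the coin-style numbers are assumptions made for illustration; the helper only normalizes prior times likelihood, as in the discrete form of Bayes' Theorem.

```python
def discrete_posterior(prior, likelihood):
    """Posterior weights for parallel lists of prior weights and likelihoods."""
    if len(prior) != len(likelihood):
        raise ValueError("prior and likelihood must have the same length")
    joint = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(joint)  # marginal likelihood P(X)
    if evidence == 0:
        raise ValueError("the data have zero probability under every value")
    return [j / evidence for j in joint]


# Illustrative (assumed) setup: three candidate success probabilities with
# equal prior weight, and a single observed success, so that the likelihood
# of each candidate is the candidate value itself.
thetas = [0.25, 0.50, 0.75]
prior = [1 / 3, 1 / 3, 1 / 3]
likelihood = thetas
posterior = discrete_posterior(prior, likelihood)

for t, p in zip(thetas, posterior):
    print(f"theta = {t:.2f}: posterior = {p:.4f}")
```

Keeping the normalization in one place makes it easy to reuse the same helper with a Binomial, Poisson or Normal likelihood.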

# <font color="darkblue"> Example for Discrete case

Suppose we want the probability that a city receives rain on a given day, say Jan 15. Records from the past 150 years show rain on Jan 15 in 23 of those years (the amount of rainfall is not considered). A team of experts judges that the chance of rain on Jan 15 could be 1%, 5%, 10%, or at most 20%, and assigns these four plausible values the probabilities 0.70, 0.20, 0.07, and 0.03 respectively. In other words, the experts consider a 10% or 20% chance of rain unlikely. Combining these two sources of information (sample data + experts' assessment), we can find the probability of rain on Jan 15.

## Solution

### Prior:

$\theta \in \{0.01, 0.05, 0.10, 0.20\}$ with respective probabilities $P(\theta) = 0.70, 0.20, 0.07, 0.03$

### Likelihood:

Since the random variable is the occurrence of rain, which has a binary outcome (Yes / No), and the parameter is the proportion of successes, we assume a Binomial distribution for this scenario

$\mathscr{L}(\theta)=\binom{n}{x} \theta^x (1-\theta)^{n-x}$ with $n = 150$ and $x = 23$

Evaluating this likelihood for each value of $\theta$:

1. For $\theta = 0.01$, $P(\text{X}|\theta= 0.01) = \binom{150}{23} 0.01^{23} (1-0.01)^{150-23} = 2.047 \times 10^{-20}$

2. For $\theta = 0.05$, $P(\text{X}|\theta= 0.05) = \binom{150}{23} 0.05^{23} (1-0.05)^{150-23} = 1.30 \times 10^{-6}$

3. For $\theta = 0.10$, $P(\text{X}|\theta= 0.10) = \binom{150}{23} 0.10^{23} (1-0.10)^{150-23} = 0.01133$

4. For $\theta = 0.20$, $P(\text{X}|\theta=0.20) = \binom{150}{23} 0.20^{23} (1-0.20)^{150-23} = 0.03031$
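
The four likelihood values can be cross-checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `binomial_pmf` is an assumption introduced here, and `scipy.stats.binom.pmf` could be used in its place.

```python
from math import comb

n, x = 150, 23                      # years on record, years with rain on Jan 15
thetas = [0.01, 0.05, 0.10, 0.20]   # plausible values of theta


def binomial_pmf(x, n, theta):
    """Binomial probability of x successes in n trials with success rate theta."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


likelihoods = [binomial_pmf(x, n, t) for t in thetas]

for t, lik in zip(thetas, likelihoods):
    print(f"theta = {t:.2f}: likelihood = {lik:.4g}")
```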

### Posterior

$$P(\theta | \text{X}) = \frac{P(\text{X} | \theta) P(\theta)}{P(\text{X})}$$

First let us calculate the denominator term $P(\text{X}) = \sum_{\theta} P(\text{X} | \theta) P(\theta)$ using the above prior and likelihood values 

$$P(\text{X})=(0.70)(2.047 \times 10^{-20})+(0.20)(1.30 \times 10^{-6})+(0.07)(0.01133)+(0.03)(0.03031) = 0.00170266$$

Hence,

1. For $\theta = 0.01$, $P(\theta|\text{X}) = \frac{(0.70)(2.047 \times 10^{-20})}{0.00170266} = 8.416 \times 10^{-18}$

In the same way, we calculate $P(\theta|\text{X})$ for the other values $\theta = 0.05, 0.10, 0.20$. The posterior probabilities are:

- $\theta = 0.05$: $P(\theta|\text{X}) = 0.00015$
- $\theta = 0.10$: $P(\theta|\text{X}) = 0.4658$
- $\theta = 0.20$: $P(\theta|\text{X}) = 0.5340$

These posterior probabilities show how the data ($x = 23$, $n = 150$) have updated the prior beliefs.

The prior belief was that rain on Jan 15 is rare: the 1% chance of rain carried the highest prior probability, 0.70.

The posterior, however, places almost all of its probability on the 10% and 20% values (0.4658 and 0.5340), indicating a 10 to 20% chance of rain.
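
The whole update for the rain example can be reproduced in a short script. This is a sketch using only the standard library. The posterior mean at the end is an extra summary added here as an assumption about what a single "probability of rain" could be; the text itself stops at the posterior probabilities.

```python
from math import comb

n, x = 150, 23
thetas = [0.01, 0.05, 0.10, 0.20]
prior = [0.70, 0.20, 0.07, 0.03]

# Binomial likelihood of the observed data for each plausible theta
likelihood = [comb(n, x) * t**x * (1 - t) ** (n - x) for t in thetas]

# Prior times likelihood, then normalize by the marginal likelihood P(X)
joint = [p * l for p, l in zip(prior, likelihood)]
evidence = sum(joint)
posterior = [j / evidence for j in joint]

# Assumed extra summary: posterior-weighted average of theta
posterior_mean = sum(t * p for t, p in zip(thetas, posterior))

print(f"P(X) = {evidence:.8f}")
for t, p in zip(thetas, posterior):
    print(f"theta = {t:.2f}: posterior = {p:.4g}")
print(f"posterior mean of theta = {posterior_mean:.4f}")
```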

# <font color="darkblue"> Revisit the Practical Scenario

[Honda Vs Hero](https://auto.economictimes.indiatimes.com/news/two-wheelers/motorcycles/honda-disputes-hero-splendor-ismarts-fuel-economy-claim-of-102-5-km/litre/47139285)

**Japanese auto maker Honda has questioned its erstwhile Indian partner Hero's claim of <font color="red">102.5 km/litre </font> fuel economy rate for Splendor iSmart bike, saying "such claims are misleading and are far from reality".**


<font color="green">**The Indian firm, Hero MotoCorp on its part hit back saying its fuel efficiency values were certified by iCAT (International Centre for Automotive Technology)**

iCAT may test the mileage of several bikes of the same brand

The output would be the mileage in <font color="red">---- km/litre </font> for each bike

We may want to test the claim analytically

# <font color="darkblue"> Possible Metrics

1.   Number of bikes having the mileage as per Hero's claim out of a batch of bikes to test

1.   Number of bikes having the mileage as per Hero's claim from an unknown number of bikes to be tested (iCAT may decide)

1.   Average mileage from an arbitrary number of bikes to be tested

In [5]:
import numpy as np
import pandas as pd
import scipy as sc
from scipy import stats as st
import matplotlib.pyplot as plt

# <font color="darkblue"> Data Type I

Batch size known, say $n = 100$

We observe the **number** of bikes that pass the test, say $x = 35$

So the underlying data distribution is Binomial$(n,\theta)$, where $\theta$ is the proportion of bikes meeting the claim, and the aim is to infer the parameter $\theta$

In [None]:
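# Plausible values of theta (proportion of bikes meeting the claim) and their prior weights
theta=np.array([0.1,0.4, 0.5,0.75, 0.9])
prior_theta=np.array([0.01,0.02,0.52,0.2,0.25])

n=100 # batch size
x=35  # number of bikes that passed the test

# Binomial likelihood of x successes out of n for each theta
likel_b=st.binom.pmf(x,n,theta)

# Bayes' theorem: posterior = likelihood * prior / marginal likelihood
poste_theta=likel_b*prior_theta/np.sum(likel_b*prior_theta)
poste_theta=np.round(poste_theta,3)

# Tabulate prior and posterior side by side
r1=pd.DataFrame({"Parameter":theta,"Prior":prior_theta,"Posterior":poste_theta})
print(r1.to_string(index=False))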

In [None]:
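# Compare prior and posterior weights across the plausible values of theta
fig = plt.figure(figsize = (20, 10))
plt.plot(theta, poste_theta, 'x--', label= 'Posterior')
plt.plot(theta, prior_theta, 'o-', label= 'Prior')
plt.xlabel(r'$\theta$')
plt.ylabel('Probability')
plt.legend()
plt.show()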
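
For larger batches the Binomial likelihoods become very small, and multiplying them directly can underflow. A common safeguard, not used in this notebook, is to work with logarithms and normalize with the log-sum-exp trick. The sketch below applies it to the Data Type I numbers using only the standard library; the helper name `binom_logpmf` is an assumption introduced here.

```python
from math import exp, lgamma, log


def binom_logpmf(x, n, theta):
    """Log of the Binomial pmf, computed without forming tiny products."""
    log_comb = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return log_comb + x * log(theta) + (n - x) * log(1 - theta)


theta = [0.1, 0.4, 0.5, 0.75, 0.9]
prior_theta = [0.01, 0.02, 0.52, 0.2, 0.25]
n, x = 100, 35

# log(prior * likelihood) for each plausible theta
log_joint = [binom_logpmf(x, n, t) + log(p) for t, p in zip(theta, prior_theta)]

# log-sum-exp: subtract the maximum before exponentiating
m = max(log_joint)
log_evidence = m + log(sum(exp(v - m) for v in log_joint))

posterior = [exp(v - log_evidence) for v in log_joint]

for t, p in zip(theta, posterior):
    print(f"theta = {t:.2f}: posterior = {p:.3f}")
```

With these inputs most of the posterior weight lands on $\theta = 0.4$, even though its prior weight was only 0.02.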

# <font color="darkblue"> Data Type II

We observe the **number** of bikes that pass the test, with no fixed limit on how many bikes may be tested

Typically we are interested in estimating the average number of bikes that pass the test

Here, it is a count model, so we use a Poisson distribution for the data and compute $P(\theta|\text{X})$

In [9]:
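# Plausible values of theta (average number of bikes passing) and their prior weights
theta=np.array([30, 40, 50,60])
prior_theta=np.array([0.4,0.25,0.18,0.17])

# Simulated observed count; unseeded, so it changes on every run
# (the output shown below is consistent with x = 55)
x=st.randint.rvs(low=30,high=100,size=1)

# Poisson likelihood of the observed count for each theta
likel_p=st.poisson.pmf(x,theta)

# Bayes' theorem: posterior = likelihood * prior / marginal likelihood
poste_theta=likel_p*prior_theta/np.sum(likel_p*prior_theta)
poste_theta=np.round(poste_theta,3)

r2=pd.DataFrame({"Parameter":theta,"Prior":prior_theta,"Posterior":poste_theta})
print(r2.to_string(index=False))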

 Parameter  Prior  Posterior
        30   0.40      0.000
        40   0.25      0.068
        50   0.18      0.473
        60   0.17      0.459
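
Because the observed count is drawn at random without a seed, rerunning the cell gives a different table each time. Working backwards from the printed posterior, the values are consistent with an observed count of $x = 55$. This is an inference, not something stated in the notebook. The sketch below fixes that value and redoes the Poisson update with the standard library only; the helper name `poisson_pmf` is an assumption introduced here.

```python
from math import exp, lgamma, log


def poisson_pmf(x, lam):
    """Poisson probability of count x with mean lam, evaluated via logs."""
    return exp(x * log(lam) - lam - lgamma(x + 1))


theta = [30, 40, 50, 60]
prior_theta = [0.4, 0.25, 0.18, 0.17]
x = 55  # assumed observed count, inferred from the printed table

likelihood = [poisson_pmf(x, t) for t in theta]
joint = [l * p for l, p in zip(likelihood, prior_theta)]
evidence = sum(joint)
posterior = [round(j / evidence, 3) for j in joint]

for t, pr, po in zip(theta, prior_theta, posterior):
    print(f"{t:>9} {pr:>6.2f} {po:>10.3f}")
```

Fixing `x`, or passing a `random_state` to `st.randint.rvs`, would make the notebook cell reproducible.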


# <font color="darkblue"> Data Type III

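We observe the **average mileage** of a batch of bikes that passed the test

The batch size is pre-defined

Since we are modelling a numeric variable (**average mileage**), we can use a Normal distribution with parameters $\theta, \sigma^2$ (mean and variance). The code below takes $\sigma = 1$ as known, so only the mean $\theta$ is inferred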

In [8]:
t1=[104.54,  99.10 ,  99.70,  99.71, 104.26] #Mileage - plausible values
theta=np.array(t1)
prior_theta=np.array([0.4,0.15,0.1,0.18,0.17])

x=101.25 #Observed average mileage
sigma=1.0 #Standard deviation, assumed known (equal to SciPy's default scale)

# Normal likelihood of the observed average for each plausible mean
likel_n=st.norm.pdf(x,loc=theta,scale=sigma)

# Bayes' theorem: posterior = likelihood * prior / marginal likelihood
poste_theta=likel_n*prior_theta/np.sum(likel_n*prior_theta)
poste_theta=np.round(poste_theta,3)

r3=pd.DataFrame({"Parameter":theta,"Prior":prior_theta,"Posterior":poste_theta})
print(r3.to_string(index=False))

 Parameter  Prior  Posterior
    104.54   0.40      0.017
     99.10   0.15      0.144
     99.70   0.10      0.290
     99.71   0.18      0.531
    104.26   0.17      0.018
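
The table above corresponds to a standard deviation of 1 for the observed average. To see how much the result depends on that choice, the sketch below wraps the update in a function with two hypothetical parameters that do not appear in the notebook: `sigma` (per-bike standard deviation) and `n` (batch size). It relies on the standard result that an average of `n` independent readings has standard deviation `sigma / sqrt(n)`. The value `sigma=3.0` is an illustrative assumption only.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Normal density at x for the given mean and standard deviation."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


def posterior_for_mean(x_bar, theta, prior, sigma=1.0, n=1):
    """Discrete posterior over candidate means given an observed average x_bar."""
    se = sigma / sqrt(n)  # standard deviation of an average of n readings
    joint = [normal_pdf(x_bar, t, se) * p for t, p in zip(theta, prior)]
    total = sum(joint)
    return [j / total for j in joint]


theta = [104.54, 99.10, 99.70, 99.71, 104.26]
prior_theta = [0.4, 0.15, 0.1, 0.18, 0.17]

# sigma = 1 and n = 1 match the setting used for the table above
baseline = posterior_for_mean(101.25, theta, prior_theta)

# Hypothetical wider spread: distant candidate means are penalized less
wider = posterior_for_mean(101.25, theta, prior_theta, sigma=3.0)

for t, b, w in zip(theta, baseline, wider):
    print(f"{t:>7.2f}  sigma=1: {b:.3f}  sigma=3: {w:.3f}")
```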
