# Formative 3 - Probability Distributions, Bayesian Probability, and Gradient Descent Implementation
As Group 10, we are working on a comprehensive assignment designed to deepen our understanding of probability and machine learning.

This notebook presents our implementations and analysis for the following three components:

Part 1: Binomial Distribution.

Part 2: Bayesian Probability Application.

Part 3: Gradient Descent for Linear Regression.

**Part 1: Binomial Distribution – Pancreatic Cancer Risk Analysis**

Problem Statement
The lifetime risk of developing pancreatic cancer is about 1 in 78 (approximately 1.28%). Suppose we randomly sample 200 people. Let X be the number of people in this sample who will develop pancreatic cancer.

1. Binomial Distribution for X
Since each person either develops pancreatic cancer or does not, independently, and the probability of developing it is constant, the random variable X follows a Binomial distribution with parameters:

Number of trials: $n = 200$

Probability of success (developing cancer): $p = 1/78 \approx 0.0128$
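For reference, the probability mass function that the later steps rely on can be written out explicitly. This is the standard Binomial formula, stated with the symbols $n$, $p$, and $k$ (the number of people in the sample who develop the disease):

$$
P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n
$$

With the parameters above this becomes

$$
P(X = k) = \binom{200}{k} \left(\frac{1}{78}\right)^{k} \left(\frac{77}{78}\right)^{200 - k}
$$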

Using the formulas, calculate the (i) mean and (ii) standard deviation of X.

Find the probability that at most eight people develop pancreatic cancer.

Is it more likely that five or six people will develop pancreatic cancer? Justify your answer numerically.
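
The formulas referred to above are the standard mean and standard deviation of a Binomial random variable. Substituting $n = 200$ and $p = 1/78$ gives a hand calculation that the code output can be compared against (values rounded to four decimal places):

$$
\mu = np = \frac{200}{78} \approx 2.5641
$$

$$
\sigma = \sqrt{np(1 - p)} = \sqrt{200 \cdot \frac{1}{78} \cdot \frac{77}{78}} \approx \sqrt{2.5312} \approx 1.5910
$$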


# Step 1: Define factorial and combination functions

In [None]:
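def factorial(n):
    """Return n! for a non-negative integer n (0! and 1! are both 1)."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def combination(n, k):
    """Return the binomial coefficient C(n, k) using exact integer arithmetic."""
    return factorial(n) // (factorial(k) * factorial(n - k))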
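
As an optional sanity check, the hand-written helpers can be compared with Python's built-in `math.comb` (available from Python 3.8). The sketch below repeats the two helpers so that it runs on its own; the list of test inputs is an arbitrary choice made for illustration.

```python
import math

def factorial(n):
    """Iterative factorial, mirroring the notebook's helper."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def combination(n, k):
    """Binomial coefficient from factorials, mirroring the notebook's helper."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Arbitrary (n, k) pairs chosen for illustration, including edge cases k = 0 and k = n.
checks = [(200, 0), (200, 5), (200, 8), (200, 200), (10, 3)]
all_match = all(combination(n, k) == math.comb(n, k) for n, k in checks)
print("All coefficients match math.comb:", all_match)
```

If the two agree, `math.comb` could be used as a drop-in replacement, although writing the helper by hand keeps the calculation explicit.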

# Step 2: Define binomial PMF function


In [None]:
def binomial_pmf(n, k, p):
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return combination(n, k) * (p ** k) * ((1 - p) ** (n - k))
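
One quick way to gain confidence in a PMF implementation is to confirm that the probabilities over all possible outcomes add up to 1. The sketch below is self-contained, so it uses `math.comb` in place of the notebook's `combination` helper; that substitution is an assumption made only to keep the example short.

```python
import math

def binomial_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p); math.comb stands in for combination()."""
    return math.comb(n, k) * (p ** k) * ((1 - p) ** (n - k))

n, p = 200, 1 / 78

# Sum P(X = k) over every possible outcome k = 0, 1, ..., n.
total = sum(binomial_pmf(n, k, p) for k in range(n + 1))
print(f"Sum of P(X = k) for k = 0..{n}: {total:.12f}")
```

The sum should equal 1 up to floating-point rounding error.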


# Step 3: Calculate mean and standard deviation


In [None]:
n = 200
p = 1 / 78  # approximately 0.0128

mean = n * p
std_dev = (n * p * (1 - p)) ** 0.5

print(f"Mean (expected number): {mean:.4f}")
print(f"Standard deviation: {std_dev:.4f}")


# Step 4: Calculate probability that at most 8 people develop pancreatic cancer

In [None]:
# P(X <= 8) is the sum of P(X = k) for k = 0, 1, ..., 8
prob_at_most_8 = sum(binomial_pmf(n, k, p) for k in range(9))

print(f"Probability that at most 8 people develop pancreatic cancer: {prob_at_most_8:.5f}")
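
The cumulative probability can also be computed without factorials by using the recurrence $P(X = k+1) = P(X = k) \cdot \frac{n-k}{k+1} \cdot \frac{p}{1-p}$, starting from $P(X = 0) = (1-p)^n$. The sketch below is an alternative to the direct sum, not the method used in this notebook, and the function name `binomial_cdf` is hypothetical.

```python
def binomial_cdf(n, k_max, p):
    """P(X <= k_max) for X ~ Binomial(n, p), built up term by term."""
    term = (1 - p) ** n          # P(X = 0)
    total = term
    for k in range(k_max):
        # Step from P(X = k) to P(X = k + 1) using the ratio of consecutive terms.
        term *= (n - k) / (k + 1) * p / (1 - p)
        total += term
    return total

n, p = 200, 1 / 78
prob_at_most_8 = binomial_cdf(n, 8, p)
prob_more_than_8 = 1 - prob_at_most_8   # complement: nine or more cases

print(f"P(X <= 8) = {prob_at_most_8:.5f}")
print(f"P(X > 8)  = {prob_more_than_8:.5f}")
```

Because each term is derived from the previous one, this version avoids the very large integers produced by the factorials, which can matter for much larger sample sizes.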

# Step 5: Compare probability of exactly 5 vs. 6 people developing pancreatic cancer


In [None]:
prob_5 = binomial_pmf(n, 5, p)
prob_6 = binomial_pmf(n, 6, p)

print(f"P(X = 5): {prob_5:.5f}")
print(f"P(X = 6): {prob_6:.5f}")

if prob_5 > prob_6:
    print("It is more likely that 5 people will develop pancreatic cancer.")
else:
    print("It is more likely that 6 people will develop pancreatic cancer.")
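
The comparison can also be justified by hand using the ratio of consecutive Binomial probabilities, which follows from the PMF formula after the common factors cancel:

$$
\frac{P(X = k + 1)}{P(X = k)} = \frac{n - k}{k + 1} \cdot \frac{p}{1 - p}
$$

With $n = 200$, $p = 1/78$ (so $p/(1-p) = 1/77$) and $k = 5$:

$$
\frac{P(X = 6)}{P(X = 5)} = \frac{195}{6} \cdot \frac{1}{77} = \frac{195}{462} \approx 0.422 < 1
$$

Since the ratio is below 1, $P(X = 6) < P(X = 5)$, so exactly five cases is more likely than exactly six. This is consistent with both values lying above the mean of about 2.56, where the probabilities are already decreasing.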

# Step 6: Visualize the distribution using Matplotlib

In [None]:
import matplotlib.pyplot as plt

# Choose k values around the mean for better visualization
k_values = list(range(0, 16))
probabilities = [binomial_pmf(n, k, p) for k in k_values]

plt.figure(figsize=(10, 6))
plt.bar(k_values, probabilities, color='skyblue', edgecolor='black')
plt.xlabel('Number of People Developing Pancreatic Cancer (k)')
plt.ylabel('Probability P(X = k)')
plt.title(f'Binomial Distribution (n={n}, p={p:.4f}) - Pancreatic Cancer Risk')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
