In [None]:
# Define the z-statistic and explain its relationship to the standard normal distribution. How is the z-statistic used in hypothesis testing?

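The z-statistic measures how many standard errors a sample mean lies from the population mean (for a single data point, how many standard deviations it lies from the mean).

If the null hypothesis is true and the data are normally distributed, or the sample is large enough for the Central Limit Theorem to apply, the z-statistic follows the standard normal distribution, which has a mean of 0 and a standard deviation of 1.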

Formula:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where:

$\bar{x}$ = sample mean
$\mu$ = population mean
$\sigma$ = population standard deviation
$n$ = sample size
In hypothesis testing, the z-statistic helps determine if a sample mean differs significantly from the population mean. This is used in z-tests when the population standard deviation is known.


Suppose we want to calculate the z-statistic for a sample mean of 105, a population mean of 100, a population standard deviation of 15, and a sample size of 25.


from math import sqrt

# Given values
sample_mean = 105
population_mean = 100
population_std_dev = 15
sample_size = 25

# Calculate z-score
z_statistic = (sample_mean - population_mean) / (population_std_dev / sqrt(sample_size))

print(f"Z-statistic: {z_statistic}")
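The link to the standard normal distribution can be checked by simulation. The sketch below assumes a normal population with the same mean, standard deviation, and sample size as the example above; the seed and the number of simulated samples are arbitrary choices, not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

mu, sigma, n = 100, 15, 25   # population mean, std dev, and sample size from the example
num_samples = 20_000         # number of simulated samples (an assumption)

# Draw many samples from a population in which the null hypothesis is true
samples = rng.normal(loc=mu, scale=sigma, size=(num_samples, n))

# Compute the z-statistic of each sample mean
z_values = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))

z_mean = z_values.mean()
z_std = z_values.std()
print(f"Mean of z-statistics: {z_mean:.3f}")      # should be close to 0
print(f"Std dev of z-statistics: {z_std:.3f}")    # should be close to 1
```

If the simulated z-statistics have a mean near 0 and a standard deviation near 1, they behave like draws from the standard normal distribution, which is what lets us look up probabilities for an observed z-statistic.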

In [None]:
# What is a p-value, and how is it used in hypothesis testing? What does it mean if the p-value is very small (e.g., 0.01)?

The p-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true.
A small p-value (e.g., 0.01) indicates strong evidence against the null hypothesis: a result this extreme would occur only about 1% of the time if the null hypothesis were true. It is not the probability that the null hypothesis is true. Typically, if the p-value is smaller than a predefined significance level ($\alpha$), such as 0.05, we reject the null hypothesis.

Let’s calculate the p-value from a z-statistic using scipy.


from scipy import stats

# Calculate the p-value for the z-statistic from Question 1
# norm.sf(x) is the survival function 1 - cdf(x); it is more accurate in the far tails
p_value = 2 * stats.norm.sf(abs(z_statistic))  # Two-tailed test

print(f"P-value: {p_value}")
If the p-value is less than 0.05, we reject the null hypothesis. Here the p-value is about 0.096, so we fail to reject it.
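The p-value also depends on the alternative hypothesis. As a rough sketch, the same z-statistic can be turned into a two-tailed, right-tailed, or left-tailed p-value; the variable names here are illustrative, and `z` is hard-coded from the Question 1 values so the block runs on its own.

```python
from scipy import stats

z = 5 / 3  # z-statistic from Question 1: (105 - 100) / (15 / sqrt(25))

p_two_tailed = 2 * stats.norm.sf(abs(z))  # H1: mean differs from 100
p_right_tailed = stats.norm.sf(z)         # H1: mean is greater than 100
p_left_tailed = stats.norm.cdf(z)         # H1: mean is less than 100

print(f"Two-tailed p-value:   {p_two_tailed:.4f}")
print(f"Right-tailed p-value: {p_right_tailed:.4f}")
print(f"Left-tailed p-value:  {p_left_tailed:.4f}")
```

The right-tailed p-value is half the two-tailed one, so the choice of alternative hypothesis should be made before looking at the data.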

In [None]:
# Compare and contrast the binomial and Bernoulli distributions.

Bernoulli Distribution: A special case of the binomial distribution with only one trial ($n = 1$). The outcome is either success (1) or failure (0).
Binomial Distribution: Models the number of successes in $n$ independent trials, with the same probability of success $p$ in each trial. A binomial variable is the sum of $n$ independent Bernoulli variables.
For $n = 1$, the binomial distribution is equivalent to the Bernoulli distribution.
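The comparison can be checked numerically. This is a minimal sketch, assuming an illustrative success probability of 0.6 and an arbitrary seed: it compares the two PMFs for a single trial, then sums Bernoulli trials to build binomial counts.

```python
import numpy as np
from scipy import stats

p = 0.6  # probability of success (illustrative value)

# PMF of Bernoulli(p) and Binomial(n=1, p) on the outcomes 0 and 1
outcomes = np.array([0, 1])
bernoulli_pmf = stats.bernoulli.pmf(outcomes, p)
binomial_pmf = stats.binom.pmf(outcomes, 1, p)
print(f"Bernoulli PMF:     {bernoulli_pmf}")
print(f"Binomial(n=1) PMF: {binomial_pmf}")

# Summing n independent Bernoulli trials gives one binomial count
rng = np.random.default_rng(seed=0)  # arbitrary seed
n = 10
bernoulli_draws = rng.binomial(1, p, size=(5000, n))  # 5000 rows of n Bernoulli trials
successes = bernoulli_draws.sum(axis=1)               # one binomial count per row
print(f"Mean number of successes: {successes.mean():.3f}")  # should be close to n * p
```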

In [None]:
# Under what conditions is the binomial distribution used, and how does it relate to the Bernoulli distribution?

Binomial Distribution is used when:
There are $n$ independent trials.
Each trial has only two possible outcomes (success or failure).
The probability of success $p$ remains constant across trials.
Relationship to Bernoulli: The Bernoulli distribution is a binomial distribution with only one trial.

# Simulate a Bernoulli distribution using numpy
import numpy as np

# Bernoulli trial (n=1)
p = 0.6  # Probability of success
bernoulli_trial = np.random.binomial(1, p, 10)
print(f"Bernoulli Trials: {bernoulli_trial}")

# Binomial distribution for n=10 trials
n_trials = 10
binomial_trials = np.random.binomial(n_trials, p, 1000)
print(f"Binomial Trials (1000 samples): {binomial_trials[:10]}")  # Show first 10

In [None]:
# What are the key properties of the Poisson distribution, and when is it appropriate to use this distribution?

Poisson Distribution models the number of events occurring in a fixed interval of time or space, given that the events happen independently and at a constant rate.
Properties:
Discrete distribution.
Mean and variance are both equal to $\lambda$ (average rate of occurrence).
It is used when modeling the number of rare events (e.g., number of accidents per hour).

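# Simulating Poisson distribution
import numpy as np
import matplotlib.pyplot as plt

lambda_value = 3  # Average number of events
poisson_data = np.random.poisson(lambda_value, 1000)

# Plotting the distribution with one bin per integer count
bins = np.arange(poisson_data.max() + 2) - 0.5
plt.hist(poisson_data, bins=bins, density=True, edgecolor='black')
plt.title("Poisson Distribution")
plt.xlabel("Number of events")
plt.ylabel("Probability")
plt.show()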
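The property that the mean and variance both equal the rate can be checked against a simulation. The sketch below assumes the same rate of 3, a larger sample than the plot uses, and an arbitrary seed; it also compares simulated frequencies with `scipy.stats.poisson.pmf`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed
lam = 3                              # average number of events per interval
draws = rng.poisson(lam, size=100_000)

sample_mean = draws.mean()
sample_var = draws.var()
print(f"Sample mean: {sample_mean:.3f}, sample variance: {sample_var:.3f}")  # both near 3

# Compare P(X = k) for k = 0..5 from the theoretical PMF and from the simulation
k = np.arange(6)
theoretical = stats.poisson.pmf(k, lam)
empirical = np.array([(draws == value).mean() for value in k])
for value, theo, emp in zip(k, theoretical, empirical):
    print(f"k={value}: theoretical={theo:.4f}, simulated={emp:.4f}")
```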

In [None]:
#  Define "probability distribution" and "probability density function" (PDF). How does a PDF differ from a probability mass function (PMF)?

A probability distribution describes how probability is spread over the possible values of a random variable.
Probability Density Function (PDF): Applies to continuous random variables. The probability of any single exact value is 0, so the PDF gives a density rather than a probability; the probability that the variable falls in an interval is the area under the curve over that interval. The total area under the curve equals 1.
Probability Mass Function (PMF): Applies to discrete random variables and gives the probability that the variable is exactly equal to some value. These probabilities sum to 1.
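A short numerical sketch of the difference, using distributions chosen only for illustration (a Binomial(10, 0.6) for the PMF and a normal with standard deviation 0.1 for the PDF): PMF values are probabilities that sum to 1, while a PDF value can exceed 1 because only areas under the curve are probabilities.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Discrete case: the PMF of Binomial(n=10, p=0.6) gives probabilities that sum to 1
k = np.arange(11)
pmf_values = stats.binom.pmf(k, 10, 0.6)
pmf_total = pmf_values.sum()
print(f"Sum of PMF values: {pmf_total:.6f}")

# Continuous case: a normal distribution with a small standard deviation
narrow = stats.norm(loc=0, scale=0.1)
density_at_zero = narrow.pdf(0)           # a density, not a probability
pdf_area, _ = quad(narrow.pdf, -1, 1)     # area over +/- 10 standard deviations
prob_interval = narrow.cdf(0.1) - narrow.cdf(-0.1)  # P(-0.1 < X < 0.1)

print(f"PDF at 0: {density_at_zero:.3f}")            # greater than 1
print(f"Area under the PDF: {pdf_area:.6f}")         # close to 1
print(f"P(-0.1 < X < 0.1): {prob_interval:.4f}")
```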

In [None]:
# Explain the Central Limit Theorem (CLT) with an example.

The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean is approximately normal when the sample size is large enough, regardless of the shape of the population's distribution, provided the observations are independent and the population has a finite variance.
This theorem allows us to make inferences about population means even if the population distribution is not normal.

# Central Limit Theorem Example
population_data = np.random.exponential(scale=2, size=10000)  # Non-normal population

sample_means = [np.mean(np.random.choice(population_data, size=30)) for _ in range(1000)]

# Plotting the sampling distribution of the sample mean
plt.hist(sample_means, bins=30, density=True)
plt.title("Sampling Distribution of Sample Means (CLT)")
plt.show()
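The CLT also says where the sampling distribution is centered and how wide it is: the sample means cluster around the population mean with a standard deviation of about $\sigma / \sqrt{n}$. The sketch below checks this numerically. Unlike the example above, it draws each sample directly from an exponential distribution rather than from a finite simulated population, and the seed and number of samples are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed
scale, n = 2.0, 30                   # exponential scale and sample size from the example

# 5000 samples of size n, reduced to one sample mean each
sample_means = rng.exponential(scale=scale, size=(5000, n)).mean(axis=1)

# For an exponential distribution, the mean and standard deviation both equal `scale`
expected_mean = scale
expected_std_error = scale / np.sqrt(n)

print(f"Mean of sample means:    {sample_means.mean():.3f} (expected {expected_mean:.3f})")
print(f"Std dev of sample means: {sample_means.std():.3f} (expected {expected_std_error:.3f})")
```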

In [None]:
# Compare z-scores and t-scores. When should you use a z-score, and when should a t-score be applied instead?

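Z-score is used when the population standard deviation is known. As a rule of thumb, it is also applied to large samples ($n > 30$), where the sample standard deviation estimates the population value well.
T-score is used when the population standard deviation is unknown and must be estimated from the sample, especially when the sample size is small ($n < 30$).
The t-distribution has heavier tails than the standard normal distribution because of the additional uncertainty in estimating the population standard deviation. As the sample size grows, it approaches the standard normal distribution.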
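One way to see the difference is to compare critical values. This sketch assumes a two-tailed test at the 95% confidence level and a few illustrative degrees of freedom (sample size minus one); none of these values come from the question itself.

```python
from scipy import stats

confidence = 0.95                      # illustrative confidence level
upper_tail = 1 - (1 - confidence) / 2  # 0.975 for a two-tailed test

z_crit = stats.norm.ppf(upper_tail)

# Two-tailed critical t-values for a few illustrative degrees of freedom
t_crits = {df: stats.t.ppf(upper_tail, df) for df in (4, 9, 29, 99)}

print(f"z critical value: {z_crit:.3f}")
for df, t_crit in t_crits.items():
    print(f"t critical value (df={df}): {t_crit:.3f}")
```

The t critical values are larger than the z critical value and shrink toward it as the degrees of freedom increase, which is why the two tests give nearly the same answer for large samples.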



In [None]:
#  Given a sample mean of 105, a population mean of 100, a standard deviation of 15, and a sample size of 25, calculate the z-score and p-value. Based on a significance level of 0.05, do you reject or fail to reject the null hypothesis?

Z-score measures how far the sample mean is from the population mean in units of the standard error ($\sigma / \sqrt{n}$).
P-value tells us the probability of getting a result at least as extreme as the observed one if the null hypothesis is true.
If the p-value is less than the significance level ($\alpha = 0.05$), we reject the null hypothesis.

from math import sqrt
from scipy import stats

# Given data
sample_mean = 105
population_mean = 100
std_dev = 15
sample_size = 25
alpha = 0.05

# Calculate z-score
z_score = (sample_mean - population_mean) / (std_dev / sqrt(sample_size))

# Calculate p-value for two-tailed test (norm.sf is the survival function, 1 - cdf)
p_value = 2 * stats.norm.sf(abs(z_score))

# Hypothesis decision
result = "Reject the null hypothesis" if p_value < alpha else "Fail to reject the null hypothesis"
print(f"Z-score: {z_score}, P-value: {p_value}, Decision: {result}")

Z-Score: 1.67
P-Value: 0.096
Decision: Fail to reject the null hypothesis (no significant difference).
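
The same result can be worked by hand. Here $\Phi$ denotes the standard normal cumulative distribution function, a symbol introduced only for this sketch:

$$z = \frac{105 - 100}{15 / \sqrt{25}} = \frac{5}{3} \approx 1.67$$

$$p = 2\left(1 - \Phi\left(\tfrac{5}{3}\right)\right) \approx 2 \times 0.0478 \approx 0.0956$$

Since $0.0956 > 0.05$, the p-value is above the significance level and the null hypothesis is not rejected.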

In [None]:
# Simulate a binomial distribution with 10 trials and a probability of success of 0.6 using Python. Generate 1,000 samples and plot the distribution. What is the expected mean and variance?

Binomial Distribution: It models the number of successes in a fixed number of independent trials, where each trial has two possible outcomes (success/failure).

Mean (Expected Value): $\mu = n \cdot p$
Variance: $\sigma^2 = n \cdot p \cdot (1 - p)$
Where:

$n$ is the number of trials (10),
$p$ is the probability of success (0.6).
Expected Mean and Variance:
Mean: $\mu = 10 \times 0.6 = 6$
Variance: $\sigma^2 = 10 \times 0.6 \times 0.4 = 2.4$

import numpy as np
import matplotlib.pyplot as plt

# Parameters
n = 10  # Number of trials
p = 0.6  # Probability of success
samples = 1000  # Number of samples

# Generate binomial distribution samples
binom_samples = np.random.binomial(n, p, samples)

# Plot the distribution
plt.hist(binom_samples, bins=range(n+2), density=True, edgecolor='black', alpha=0.7)
plt.title("Binomial Distribution (n=10, p=0.6)")
plt.xlabel("Number of successes")
plt.ylabel("Probability")
plt.show()

# Calculate mean and variance of the distribution
mean = np.mean(binom_samples)
variance = np.var(binom_samples)
print(f"Simulated Mean: {mean}, Simulated Variance: {variance}")

Expected Mean: 6
Expected Variance: 2.4
The code generates 1,000 samples from the binomial distribution and plots the histogram. The calculated mean and variance from the simulation should closely match the expected values.
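
As a cross-check, the expected values and the exact probabilities can be taken from `scipy.stats.binom` and compared with simulated frequencies. This is a sketch with an arbitrary seed; the simulated numbers will vary with the seed and the sample size.

```python
import numpy as np
from scipy import stats

n, p = 10, 0.6                       # trials and success probability from the question
rng = np.random.default_rng(seed=0)  # arbitrary seed
draws = rng.binomial(n, p, size=1000)

# Theoretical mean and variance of Binomial(n, p)
theoretical_mean, theoretical_var = stats.binom.stats(n, p, moments="mv")
print(f"Theoretical mean: {float(theoretical_mean):.1f}, variance: {float(theoretical_var):.1f}")

# Exact PMF versus simulated relative frequencies for k = 0..n
k = np.arange(n + 1)
theoretical_pmf = stats.binom.pmf(k, n, p)
empirical_freq = np.bincount(draws, minlength=n + 1) / draws.size
max_gap = np.abs(theoretical_pmf - empirical_freq).max()
print(f"Largest gap between exact and simulated probabilities: {max_gap:.4f}")
```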






