
## Q1: Define the z-statistic and explain its relationship to the standard normal distribution. How is the z-statistic used in hypothesis testing?

**Answer**: 
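The z-statistic, or z-score, measures how many standard deviations a value lies from the mean of its distribution. If the underlying variable is normally distributed with mean \( \mu \) and standard deviation \( \sigma \), its z-score follows the standard normal distribution, which has mean 0 and standard deviation 1. In hypothesis testing, this lets us find the probability of observing a statistic at least as extreme as the one obtained, assuming the null hypothesis is true.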

The z-score is calculated as:
\[ z = \frac{(X - \mu)}{\sigma} \]
where:
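- \( X \) is the observed value (in a test of a mean, the sample mean),
- \( \mu \) is the population mean, and
- \( \sigma \) is the population standard deviation (when \( X \) is the mean of \( n \) observations, use the standard error \( \sigma / \sqrt{n} \) instead).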

In hypothesis testing, the z-statistic helps assess whether the sample mean deviates significantly from the population mean stated in the null hypothesis. If the z-score falls beyond a critical value (e.g., ±1.96 for a two-tailed test at the 5% significance level), the null hypothesis is rejected.
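
As a rough illustration of the rejection rule, the sketch below computes a z-score for a sample mean and compares it with the two-tailed critical value. The helper `z_statistic` and the numbers passed to it are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt
from scipy.stats import norm


def z_statistic(value, pop_mean, pop_std, n=1):
    """z-score of `value`; with n > 1, `value` is treated as the mean of n observations."""
    standard_error = pop_std / sqrt(n)
    return (value - pop_mean) / standard_error


alpha = 0.05                            # significance level (assumed)
z_crit = norm.ppf(1 - alpha / 2)        # two-tailed critical value, about 1.96

# Hypothetical data: sample mean 52 from n = 16, population mean 50, sigma 4
z = z_statistic(52.0, 50.0, 4.0, n=16)  # (52 - 50) / (4 / 4) = 2.0
reject_null = abs(z) > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_null}")
```

Here `norm.ppf` is the inverse of the standard normal CDF, so the critical value is derived from \( \alpha \) rather than hard-coded.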



## Q2: What is a p-value, and how is it used in hypothesis testing? What does it mean if the p-value is very small (e.g., 0.01)?

**Answer**: 
The p-value is the probability of obtaining an observed sample result, or one more extreme, under the assumption that the null hypothesis is true. It quantifies the evidence against the null hypothesis.

- A small p-value (e.g., 0.01) means that a result at least this extreme would occur only about 1% of the time if the null hypothesis were true, which is strong evidence against the null hypothesis.
- Generally, if the p-value is below a predetermined significance level (\(\alpha\)), such as 0.05, we reject the null hypothesis. If the p-value is high, we fail to reject it.
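
To make the decision rule concrete, here is a minimal sketch that turns a z-score into a p-value and compares it with \(\alpha\). The function name `p_value_from_z` and the example z-scores are illustrative assumptions, not part of the question.

```python
from scipy.stats import norm


def p_value_from_z(z, two_tailed=True):
    """p-value for a z-score under the standard normal distribution.

    The one-tailed value assumes the alternative lies in the direction of z.
    """
    tail = norm.sf(abs(z))  # survival function: P(Z > |z|)
    return 2 * tail if two_tailed else tail


alpha = 0.05  # significance level (assumed)
for z in (1.0, 1.96, 2.58):  # hypothetical z-scores
    p = p_value_from_z(z)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"z = {z:.2f} -> p = {p:.4f} -> {decision}")
```

Using `norm.sf` instead of `1 - norm.cdf` avoids a loss of precision when the tail probability is very small.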



## Q3: Compare and contrast the binomial and Bernoulli distributions.

**Answer**: 
- **Bernoulli distribution** models a single trial with two possible outcomes: success (1) with probability \( p \), or failure (0) with probability \( 1 - p \).
- **Binomial distribution** represents the number of successes in a fixed number of independent Bernoulli trials (\( n \)) with a constant probability of success (\( p \)).
- The Bernoulli distribution is essentially a special case of the binomial distribution where \( n = 1 \).
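
The special-case relationship can be checked numerically. The sketch below compares the two PMFs from `scipy.stats` for an arbitrarily chosen \( p = 0.3 \); that value is an assumption for illustration only.

```python
from scipy.stats import bernoulli, binom

p = 0.3  # illustrative success probability

# Compare Bernoulli(p) with Binomial(n=1, p) at both possible outcomes
rows = []
for k in (0, 1):
    rows.append((k, bernoulli.pmf(k, p), binom.pmf(k, 1, p)))

for k, pmf_bernoulli, pmf_binomial in rows:
    print(f"P(X = {k}): Bernoulli = {pmf_bernoulli:.3f}, Binomial(n=1) = {pmf_binomial:.3f}")

# Mean p and variance p(1 - p) also coincide
print(bernoulli.mean(p), binom.mean(1, p))
print(bernoulli.var(p), binom.var(1, p))
```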



## Q4: Under what conditions is the binomial distribution used, and how does it relate to the Bernoulli distribution?

**Answer**: 
The binomial distribution is used when:
1. The experiment has a fixed number of trials, \( n \).
2. Each trial has two possible outcomes: success or failure.
3. The probability of success, \( p \), is the same for each trial.
4. The trials are independent.

Relationship to Bernoulli: A binomial random variable with parameters \( n \) and \( p \) is the sum of \( n \) independent Bernoulli trials, each with success probability \( p \).
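
A small simulation can illustrate this sum relationship. The sketch below builds binomial counts by adding up individual Bernoulli trials and compares the empirical mean and variance with \( np \) and \( np(1 - p) \). The parameter values and the seed are assumptions picked for the example.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n, p, reps = 10, 0.6, 20000      # illustrative parameters

# Each row holds n independent Bernoulli(p) trials (True = success)
trials = rng.random((reps, n)) < p

# Summing each row gives one draw from Binomial(n, p)
successes = trials.sum(axis=1)

print("empirical mean:    ", successes.mean(), " theory:", n * p)
print("empirical variance:", successes.var(), " theory:", n * p * (1 - p))
```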



## Q5: What are the key properties of the Poisson distribution, and when is it appropriate to use this distribution?

**Answer**: 
The Poisson distribution is used to model the number of times an event occurs within a fixed interval of time or space. Key properties include:
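- It is often used for rare events, and it arises as the limit of a binomial distribution with many trials and a small probability of success.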
- The mean (\( \lambda \)) and variance are equal.
- Suitable for events occurring independently at a constant average rate over time.

The distribution is appropriate when counting occurrences within fixed intervals, such as the number of emails received per hour.
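
The equal mean and variance property can be seen in a quick simulation. The sketch below assumes an average rate of 4 emails per hour, a made-up figure used only for illustration.

```python
import numpy as np
from scipy.stats import poisson

lam = 4.0                        # assumed average rate (emails per hour)
rng = np.random.default_rng(1)   # fixed seed so the run is repeatable

# Simulate the number of emails received in each of 50,000 hours
counts = rng.poisson(lam=lam, size=50000)
print("sample mean:    ", counts.mean())
print("sample variance:", counts.var())

# Exact probabilities from the PMF
p_zero = poisson.pmf(0, lam)        # P(no emails in an hour) = exp(-lam)
p_at_most_2 = poisson.cdf(2, lam)   # P(at most 2 emails in an hour)
print(f"P(X = 0) = {p_zero:.4f}, P(X <= 2) = {p_at_most_2:.4f}")
```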



## Q6: Define the terms "probability distribution" and "probability density function" (PDF). How does a PDF differ from a probability mass function (PMF)?

**Answer**: 
- A **probability distribution** describes how the values of a random variable are distributed.
- A **Probability Density Function (PDF)** represents the probability distribution for continuous variables. For a given value \( x \), the PDF describes the relative likelihood that the variable will be close to \( x \).

**Difference between PDF and PMF**:
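- **PDF** is used for continuous variables. Probabilities are obtained by integrating the PDF over an interval, and the probability of any single exact value is zero. A PDF value is a density, so it can exceed 1.
- **PMF** applies to discrete variables, giving the probability of each specific value. These probabilities lie between 0 and 1 and sum to 1.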
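
The contrast can be sketched with `scipy.stats`. The distributions and parameters below (a binomial with \( n = 10, p = 0.6 \) and a normal with standard deviation 0.1) are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import binom, norm

# PMF: each value is itself a probability, and the values sum to 1
pmf_values = binom.pmf(np.arange(11), 10, 0.6)
pmf_total = pmf_values.sum()

# PDF: a density, not a probability; for a narrow normal it exceeds 1
density_at_mean = norm.pdf(0.0, loc=0.0, scale=0.1)

# Probabilities for a continuous variable come from intervals (CDF differences)
prob_interval = norm.cdf(0.1, loc=0.0, scale=0.1) - norm.cdf(-0.1, loc=0.0, scale=0.1)

print(f"sum of PMF values:           {pmf_total:.4f}")
print(f"PDF at the mean:             {density_at_mean:.4f}")
print(f"P(-0.1 <= X <= 0.1):         {prob_interval:.4f}")
```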



## Q7: Explain the Central Limit Theorem (CLT) with an example.

**Answer**: 
The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the original distribution, provided the observations are independent and identically distributed with finite variance.

**Example**:
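If we repeatedly draw samples of size \( n \) from a non-normal distribution (like a uniform distribution) and record the mean of each sample, the distribution of those means looks increasingly normal as \( n \) grows. Drawing more samples only makes the histogram of means smoother; it is the size of each sample that drives the convergence to normality.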
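
A minimal simulation of this example is sketched below. It draws sample means from a Uniform(0, 1) distribution for several sample sizes and reports their spread and excess kurtosis (which is 0 for a normal distribution). The sample sizes, number of repeats, and seed are assumptions made for the illustration.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(42)   # fixed seed so the run is repeatable
n_repeats = 20000                 # number of sample means per sample size

results = {}
for sample_size in (1, 5, 30):
    # Each row is one sample from Uniform(0, 1); take the mean of every row
    means = rng.uniform(0.0, 1.0, size=(n_repeats, sample_size)).mean(axis=1)
    results[sample_size] = (means.std(), kurtosis(means))

    theoretical_std = np.sqrt(1 / 12 / sample_size)  # sigma / sqrt(n) for Uniform(0, 1)
    print(f"n = {sample_size:2d}: std of means = {results[sample_size][0]:.4f} "
          f"(theory {theoretical_std:.4f}), excess kurtosis = {results[sample_size][1]:.3f}")
```

The excess kurtosis starts near the uniform distribution's value of -1.2 and moves toward 0 as the sample size grows, while the spread of the means shrinks like \( \sigma / \sqrt{n} \).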



## Q8: Compare z-scores and t-scores. When should you use a z-score, and when should a t-score be applied instead?

**Answer**: 
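- **Z-scores** are used when the population standard deviation is known, or when the sample is large enough (commonly n ≥ 30) that the sample standard deviation is a reliable substitute.
- **T-scores** are applied when the population standard deviation is unknown and must be estimated from the sample, which matters most for small samples (n < 30). The t-distribution has heavier tails than the normal distribution and approaches it as the degrees of freedom (n - 1) increase.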
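
The practical difference shows up in the critical values. The sketch below compares two-tailed 5% critical values from the t-distribution with the normal value for a few sample sizes picked for illustration.

```python
from scipy.stats import norm, t

alpha = 0.05                       # significance level (assumed)
z_crit = norm.ppf(1 - alpha / 2)   # two-tailed z critical value, about 1.96

# t critical values for illustrative sample sizes, with df = n - 1
t_crits = {}
for n in (5, 10, 30, 100):
    t_crits[n] = t.ppf(1 - alpha / 2, df=n - 1)
    print(f"n = {n:3d}: t critical = {t_crits[n]:.3f}, z critical = {z_crit:.3f}")
```

The t critical value is always larger than the z value, so a t-test demands stronger evidence from small samples; the gap narrows as n grows.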



## Q9: Given a sample mean of 105, a population mean of 100, a standard deviation of 15, and a sample size of 25, calculate the z-score and p-value. Based on a significance level of 0.05, do you reject or fail to reject the null hypothesis?

**Task**: Write Python code to calculate the z-score and p-value for the given data.

**Answer**:
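Because the statistic is a sample mean, the z-score uses the standard error \( \sigma / \sqrt{n} \) in the denominator:
\[ z = \frac{(X - \mu)}{\sigma / \sqrt{n}} \]
where \( X \) is the sample mean and \( n \) is the sample size.

With the given values, \( z = (105 - 100) / (15 / \sqrt{25}) = 5 / 3 \approx 1.67 \), which gives a two-tailed p-value of about 0.096. Since 0.096 > 0.05, we fail to reject the null hypothesis.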

### Python Code:


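```python
from scipy.stats import norm

# Given data
sample_mean = 105
population_mean = 100
std_dev = 15          # population standard deviation
sample_size = 25
alpha = 0.05          # significance level

# Calculate the z-score using the standard error of the mean
z_score = (sample_mean - population_mean) / (std_dev / (sample_size ** 0.5))
# Calculate the p-value for a two-tailed test (sf is the upper-tail probability)
p_value = 2 * norm.sf(abs(z_score))

# Compare the p-value with the significance level and report the decision
decision = "Reject null hypothesis" if p_value < alpha else "Fail to reject null hypothesis"
print(f"z-score = {z_score:.4f}, p-value = {p_value:.4f}: {decision}")
```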



## Q10: Simulate a binomial distribution with 10 trials and a probability of success of 0.6 using Python. Generate 1,000 samples and plot the distribution. What is the expected mean and variance?

**Task**: Use Python to generate the data, plot the distribution, and calculate the mean and variance.

**Answer**:
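The mean and variance of a binomial distribution can be calculated as:
- Mean: \( np \)
- Variance: \( np(1 - p) \)

For \( n = 10 \) and \( p = 0.6 \), the expected mean is \( 10 \times 0.6 = 6 \) and the expected variance is \( 10 \times 0.6 \times 0.4 = 2.4 \). The simulated values should be close to these, with small differences due to sampling variability.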

### Python Code:


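```python
import numpy as np
import matplotlib.pyplot as plt

# Parameters for binomial distribution
n_trials = 10
p_success = 0.6
n_samples = 1000

# Simulate binomial distribution (seeded generator so the run is repeatable)
rng = np.random.default_rng(seed=0)
data = rng.binomial(n=n_trials, p=p_success, size=n_samples)

# Plotting the distribution; bin edges at k - 0.5 center each bar on an integer
bins = np.arange(n_trials + 2) - 0.5
plt.hist(data, bins=bins, density=True, edgecolor='black')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.title(f'Binomial Distribution (n={n_trials}, p={p_success})')
plt.show()

# Calculating the sample mean and the unbiased sample variance (ddof=1)
mean = np.mean(data)
variance = np.var(data, ddof=1)

print(f"Sample mean = {mean:.3f} (expected {n_trials * p_success})")
print(f"Sample variance = {variance:.3f} (expected {n_trials * p_success * (1 - p_success):.1f})")
```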
