# ***Statistics Part 2***

**Q.1 What is hypothesis testing in statistics?**  
Hypothesis testing is a statistical technique used to evaluate assumptions about a population parameter. By analyzing sample data, we determine whether to reject a null hypothesis (the default assumption) in favor of an alternative hypothesis. It helps in making decisions based on data rather than assumptions.

**Q.2 What is the null hypothesis, and how does it differ from the alternative hypothesis?**  
The **null hypothesis (H₀)** states that there is no effect or no difference in the population. It's the baseline assumption tested against. The **alternative hypothesis (H₁ or Ha)** proposes that there is a real effect or difference. The goal of testing is to assess whether the sample data provides sufficient evidence to reject H₀ in favor of H₁.

**Q.3 What is the significance level in hypothesis testing, and why is it important?**  
The **significance level (α)** is the threshold for deciding whether to reject the null hypothesis. It is the probability of making a **Type I error**, that is, of rejecting a null hypothesis that is actually true. A common value is 0.05: if H₀ is true, the test will wrongly reject it about 5% of the time. Choosing α in advance controls the rate of false positives.
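
The link between α and the Type I error rate can be checked with a small simulation. This is a minimal sketch, assuming a two-sided Z-test with known σ = 1 and a null hypothesis (μ = 0) that really is true; the function name, sample size, trial count, and seed are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def simulate_type1_rate(alpha=0.05, n=30, trials=5000, seed=0):
    """Fraction of two-sided Z-tests that reject a TRUE null (mu = 0, sigma = 1)."""
    rng = np.random.default_rng(seed)
    # Every row is one sample drawn under H0
    samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))
    z = samples.mean(axis=1) / (1.0 / np.sqrt(n))
    p_values = 2 * norm.sf(np.abs(z))
    return np.mean(p_values <= alpha)

rate = simulate_type1_rate()
print(f"Empirical Type I error rate: {rate:.3f}")
```

The printed rate should land close to the chosen α, since every rejection here is a false positive by construction.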

**Q.4 What does a P-value represent in hypothesis testing?**  
A **P-value** is the probability of obtaining results at least as extreme as the observed ones, assuming the null hypothesis is true. It quantifies the strength of the evidence against the null hypothesis: the smaller the P-value, the stronger the evidence against H₀.

**Q.5 How do you interpret the P-value in hypothesis testing?**  
- If **P-value ≤ α**: There is significant evidence to reject the null hypothesis.  
- If **P-value > α**: There is not enough evidence to reject the null hypothesis.  
It does not indicate the probability that H₀ is true, but rather how likely the observed data is under H₀.
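
The decision rule above can be written as a small helper. This is only a sketch; the function name `decide` and the default α = 0.05 are illustrative.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the P-value is at most alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.03))   # reject H0
print(decide(0.20))   # fail to reject H0
```

Note that the second outcome is "fail to reject", not "accept": a large P-value is not evidence that H₀ is true.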

**Q.6 What are Type 1 and Type 2 errors in hypothesis testing?**  
- **Type I error (α):** Rejecting a true null hypothesis (false positive).  
- **Type II error (β):** Failing to reject a false null hypothesis (false negative).  
For a fixed sample size there's a trade-off between the two: lowering α typically increases β. Increasing the sample size is the usual way to reduce both.
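
The trade-off can be made concrete by computing β for different values of α. The sketch below assumes a right-tailed Z-test with known σ; the means, σ, and n are hypothetical numbers chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

def type2_error(alpha, mu0, mu1, sigma, n):
    """Beta for a right-tailed Z-test of H0: mu = mu0 when the true mean is mu1."""
    z_crit = norm.ppf(1 - alpha)                  # rejection threshold on the Z scale
    shift = (mu1 - mu0) / (sigma / np.sqrt(n))    # mean of Z when mu1 is true
    return norm.cdf(z_crit - shift)               # P(Z stays below the threshold)

for alpha in (0.10, 0.05, 0.01):
    beta = type2_error(alpha, mu0=100, mu1=105, sigma=15, n=30)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}, power = {1 - beta:.3f}")
```

With everything else held fixed, a stricter α pushes the rejection threshold outward, so more real effects go undetected.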

**Q.7 What is the difference between a one-tailed and a two-tailed test in hypothesis testing?**  
- **One-tailed test:** Tests for an effect in only one direction (e.g., greater than or less than).  
- **Two-tailed test:** Tests for an effect in either direction (e.g., not equal to).  
The choice depends on the research question and hypothesis direction.
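
For a Z statistic, the two kinds of test differ only in which tail area is counted. A minimal sketch, where the `alternative` labels are borrowed from SciPy's naming convention and z = 1.8 is a hypothetical value:

```python
from scipy.stats import norm

def z_test_p_value(z, alternative="two-sided"):
    """P-value of a Z statistic for the chosen alternative hypothesis."""
    if alternative == "two-sided":
        return 2 * norm.sf(abs(z))   # both tails
    if alternative == "greater":
        return norm.sf(z)            # right tail only
    if alternative == "less":
        return norm.cdf(z)           # left tail only
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

z = 1.8
for alt in ("two-sided", "greater", "less"):
    print(f"{alt:>9}: p = {z_test_p_value(z, alt):.4f}")
```

The same statistic can be significant at α = 0.05 in a one-tailed test but not in a two-tailed one, which is why the direction must be chosen before looking at the data.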

**Q.8 What is the Z-test, and when is it used in hypothesis testing?**  
The **Z-test** is used to test hypotheses about population means or proportions when the population variance is known, or when the sample is large enough (commonly n > 30) for the sample estimate to stand in for it. It assumes the sampling distribution of the test statistic is approximately normal.

**Q.9 How do you calculate the Z-score, and what does it represent in hypothesis testing?**  
**Z = (X̄ - μ) / (σ / √n)**  
Where:  
- X̄ = sample mean  
- μ = population mean  
- σ = population standard deviation  
- n = sample size  
The Z-score tells how many standard deviations the sample mean is from the population mean.
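
A worked example of the formula, using hypothetical numbers (X̄ = 52, μ = 50, σ = 10, n = 25):

```python
import math

def z_statistic(sample_mean, pop_mean, pop_std, n):
    """Z = (sample mean - population mean) / (sigma / sqrt(n))."""
    standard_error = pop_std / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

z = z_statistic(sample_mean=52, pop_mean=50, pop_std=10, n=25)
print(f"Z = {z:.2f}")  # Z = 1.00
```

Here the standard error is 10 / √25 = 2, so a sample mean 2 units above μ sits exactly one standard error away.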

**Q.10 What is the T-distribution, and when should it be used instead of the normal distribution?**  
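The **T-distribution** is similar to the normal distribution but has heavier tails, which account for the extra uncertainty from estimating the standard deviation from the sample. It's used when the population standard deviation is unknown, and it matters most for small samples (n ≤ 30); as the degrees of freedom (n − 1) grow, it converges to the normal distribution.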
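
The heavier tails show up as larger critical values. The sketch below compares two-sided 95% critical values from the T-distribution with the normal one; the degrees of freedom listed are arbitrary illustrative choices.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # two-sided 95% critical value for the normal
print(f"normal      : {z_crit:.4f}")

for df in (5, 10, 30, 100, 1000):
    t_crit = t.ppf(0.975, df)
    print(f"t (df={df:>4}) : {t_crit:.4f}")
```

The T critical value is always larger than the normal one, and the gap shrinks as the degrees of freedom increase.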

**Q.11 What is the difference between a Z-test and a T-test?**  
- **Z-test:** Used when the population standard deviation is known and the sample is large.  
- **T-test:** Used when the population standard deviation is unknown and must be estimated from the sample, especially when the sample is small.  
The T-distribution adjusts for added uncertainty in small samples.

**Q.12 What is the T-test, and how is it used in hypothesis testing?**  
A **T-test** compares sample means to test if there's a significant difference between groups. It's used in:  
- One-sample T-test: comparing a sample mean to a known value  
- Two-sample T-test: comparing means of two independent groups  
- Paired T-test: comparing means from the same group at different times
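
The three variants map onto three functions in `scipy.stats`. This is a sketch on simulated data; the group names, means, and seed are hypothetical and only serve to give each test something to run on.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
before = rng.normal(loc=70, scale=8, size=25)             # one group, first measurement
after = before + rng.normal(loc=3, scale=4, size=25)      # same subjects, measured again
other_group = rng.normal(loc=75, scale=8, size=25)        # an independent group

# One-sample: is the mean of `before` different from 70?
one_sample = stats.ttest_1samp(before, popmean=70)

# Two-sample: do two independent groups differ? (Welch's version, unequal variances)
two_sample = stats.ttest_ind(before, other_group, equal_var=False)

# Paired: did the same subjects change between measurements?
paired = stats.ttest_rel(before, after)

for name, res in [("one-sample", one_sample), ("two-sample", two_sample), ("paired", paired)]:
    print(f"{name:>10}: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

A paired test is equivalent to a one-sample test on the per-subject differences, which is why it must not be replaced by a two-sample test when the measurements are linked.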

**Q.13 What is the relationship between Z-test and T-test in hypothesis testing?**  
Both tests assess differences between means. The T-test is a more flexible version of the Z-test that accounts for small sample sizes and unknown population standard deviation. With large samples, both yield similar results.

**Q.14 What is a confidence interval, and how is it used to interpret statistical results?**  
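A **confidence interval (CI)** is a range of values, computed from the sample, that is likely to contain the true population parameter. A 95% CI means that if the sampling were repeated many times, about 95% of the intervals built this way would contain the true mean; it is not a 95% probability statement about any single computed interval. It provides a sense of precision and uncertainty around an estimate.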
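
What the 95% refers to can be seen by simulating repeated sampling and counting how often the interval covers the true mean. A minimal sketch, assuming normal data and a t-based interval; the true mean, σ, n, trial count, and seed are illustrative.

```python
import numpy as np
from scipy.stats import t

def coverage(true_mean=50.0, sigma=10.0, n=30, trials=2000, level=0.95, seed=1):
    """Share of t-based confidence intervals that contain the true mean."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(true_mean, sigma, size=(trials, n))
    means = samples.mean(axis=1)
    std_errs = samples.std(axis=1, ddof=1) / np.sqrt(n)
    t_crit = t.ppf((1 + level) / 2, df=n - 1)
    lower = means - t_crit * std_errs
    upper = means + t_crit * std_errs
    return np.mean((lower <= true_mean) & (true_mean <= upper))

print(f"Coverage of 95% intervals: {coverage():.3f}")
```

The confidence level describes the long-run behaviour of the procedure, so the printed share should be close to 0.95.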

**Q.15 What is the margin of error, and how does it affect the confidence interval?**  
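The **margin of error** is the amount added to and subtracted from a sample estimate to form the confidence interval, so the interval's width is twice the margin. A larger margin of error means less precision. It depends on the confidence level, the variability in the data, and the sample size: it grows with the first two and shrinks as the sample size increases.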
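
For a mean with a normal-based interval, the margin of error is z · s / √n. The sketch below assumes that form; the standard deviation of 10 and the sample sizes are hypothetical.

```python
import math
from scipy.stats import norm

def margin_of_error(std, n, level=0.95):
    """Normal-based margin of error for a mean: z * std / sqrt(n)."""
    z = norm.ppf((1 + level) / 2)
    return z * std / math.sqrt(n)

for n in (25, 100, 400):
    print(f"n = {n:>3}: margin of error = {margin_of_error(10, n):.3f}")

print(f"99% level, n = 100: {margin_of_error(10, 100, level=0.99):.3f}")
```

Because of the square root, the sample size has to be quadrupled to halve the margin of error.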

**Q.16 How is Bayes' Theorem used in statistics, and what is its significance?**  
**Bayes' Theorem** calculates the probability of an event based on prior knowledge of related events. It's central in **Bayesian statistics**, where prior beliefs are updated with new data to form posterior probabilities, allowing dynamic, evidence-based inference.
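
A common illustration is a diagnostic test. The numbers below (1% prevalence, 95% sensitivity, 5% false positive rate) are hypothetical, and the function name is illustrative.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' Theorem."""
    # Total probability of a positive result: true positives + false positives
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")  # about 0.161
```

Even with an accurate test, the posterior stays low here because the prior (prevalence) is small; feeding the posterior back in as the new prior for a second test is the updating step Bayesian statistics relies on.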

**Q.17 What is the Chi-square distribution, and when is it used?**  
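The **Chi-square distribution** is used for testing hypotheses about categorical variables. It's commonly applied in goodness of fit tests and tests of independence in contingency tables. It takes only non-negative values, is right-skewed, and its shape depends on the degrees of freedom; these tests reject H₀ for large values of the statistic, so they use the right tail only.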
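
A test of independence on a contingency table can be sketched with `scipy.stats.chi2_contingency`. The 2×2 counts below are hypothetical, and the continuity correction is turned off so the statistic matches the plain Σ (O − E)² / E formula.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: two groups; columns: two categories (hypothetical counts)
observed = np.array([[30, 10],
                     [20, 40]])

chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

print(f"Chi-square statistic: {chi2_stat:.3f}")
print(f"Degrees of freedom:   {dof}")
print(f"P-value:              {p_value:.5f}")
print("Expected counts under independence:")
print(expected)
```

The degrees of freedom are (rows − 1) × (columns − 1), which is 1 for a 2×2 table.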

**Q.18 What is the Chi-square goodness of fit test, and how is it applied?**  
The **Chi-square goodness of fit test** checks whether observed frequencies for a categorical variable match expected frequencies based on a theoretical distribution. It's used to see if data fits a hypothesized pattern.
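
As a sketch, the test can be applied to die rolls to check whether the die looks fair. The observed counts are hypothetical (120 rolls in total).

```python
from scipy.stats import chisquare

observed = [18, 22, 16, 25, 19, 20]   # counts for faces 1-6, 120 rolls in total
expected = [20] * 6                   # a fair die: 120 / 6 per face

stat, p_value = chisquare(observed, f_exp=expected)

print(f"Chi-square statistic: {stat:.2f}")  # 2.50
print(f"P-value: {p_value:.4f}")
```

The statistic is Σ (O − E)² / E with 6 − 1 = 5 degrees of freedom; a large P-value means the counts are consistent with a fair die.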

**Q.19 What is the F-distribution, and when is it used in hypothesis testing?**  
The **F-distribution** is used to compare two variances or analyze the overall variability in ANOVA tests. It’s right-skewed and depends on two degrees of freedom: numerator and denominator. It helps assess if group variances differ significantly.

**Q.20 What is an ANOVA test, and what are its assumptions?**  
**ANOVA (Analysis of Variance)** tests whether the means of three or more groups are significantly different. Key assumptions include:  
- Independence of observations  
- Normally distributed groups  
- Equal variances (homoscedasticity) across groups

**Q.21 What are the different types of ANOVA tests?**  
- **One-way ANOVA:** Tests one independent variable across groups  
- **Two-way ANOVA:** Tests two independent variables and their interaction  
- **Repeated Measures ANOVA:** Tests the same subjects under different conditions or over time

**Q.22 What is the F-test, and how does it relate to hypothesis testing?**  
The **F-test** assesses whether two variances are significantly different and is used within **ANOVA** to evaluate if group means differ. A high F-value relative to critical values leads to rejection of the null hypothesis of equal means.
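
A one-way ANOVA and its F-test can be sketched with `scipy.stats.f_oneway`. The three groups below are hypothetical measurements, and α = 0.05 is assumed for the critical value.

```python
from scipy.stats import f, f_oneway

group_a = [23, 25, 21, 22, 24]
group_b = [30, 31, 29, 32, 28]
group_c = [22, 24, 23, 21, 25]

f_stat, p_value = f_oneway(group_a, group_b, group_c)

# Degrees of freedom: (groups - 1) and (total observations - groups)
df_between = 3 - 1
df_within = 15 - 3
f_crit = f.ppf(0.95, df_between, df_within)

print(f"F-statistic: {f_stat:.3f}")
print(f"Critical value at alpha = 0.05: {f_crit:.3f}")
print(f"P-value: {p_value:.6f}")
```

The F-statistic is the ratio of between-group to within-group variability, so a value well above the critical value leads to rejecting the null hypothesis of equal means.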



# ***Practical Part - 1***


**Q.1 Write a Python program to generate a random variable and display its value**  
You can use NumPy’s `random` module to generate random variables.

```python
import numpy as np

random_value = np.random.rand()
print(f"Random value: {random_value}")
```

---

**Q.2 Generate a discrete uniform distribution using Python and plot the probability mass function (PMF)**  
A discrete uniform distribution assigns equal probabilities to all outcomes.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import randint

x = np.arange(1, 7)
pmf = randint.pmf(x, 1, 7)

plt.stem(x, pmf)
plt.title('PMF of Discrete Uniform Distribution (1-6)')
plt.xlabel('Outcome')
plt.ylabel('Probability')
plt.show()
```

---

**Q.3 Write a Python function to calculate the probability distribution function (PDF) of a Bernoulli distribution**

```python
from scipy.stats import bernoulli

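def bernoulli_pdf(p, x):
    """Return P(X = x) for a Bernoulli(p) variable.

    The Bernoulli distribution is discrete, so this is strictly a
    probability mass function (PMF): p at x=1, 1-p at x=0, else 0.
    """
    return bernoulli.pmf(x, p)

# Example
print("PMF at x=1 for p=0.3:", bernoulli_pdf(0.3, 1))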
```

---

**Q.4 Write a Python script to simulate a binomial distribution with n=10 and p=0.5, then plot its histogram**

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.binomial(n=10, p=0.5, size=1000)
# Centre one bin on each integer outcome 0..10
plt.hist(data, bins=np.arange(-0.5, 11.5, 1), density=True, edgecolor='black')
plt.title('Binomial Distribution Histogram (n=10, p=0.5)')
plt.xlabel('Number of Successes')
plt.ylabel('Relative Frequency')
plt.show()
```

---

**Q.5 Create a Poisson distribution and visualize it using Python**

```python
from scipy.stats import poisson
import matplotlib.pyplot as plt
import numpy as np

mu = 3
x = np.arange(0, 10)
pmf = poisson.pmf(x, mu)

plt.stem(x, pmf)
plt.title('Poisson Distribution (λ=3)')
plt.xlabel('k')
plt.ylabel('P(X=k)')
plt.show()
```

---

**Q.6 Write a Python program to calculate and plot the cumulative distribution function (CDF) of a discrete uniform distribution**

```python
from scipy.stats import randint
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 7)
cdf = randint.cdf(x, 1, 7)

plt.step(x, cdf, where='post')  # the CDF jumps at each outcome, then stays flat
plt.title('CDF of Discrete Uniform Distribution (1-6)')
plt.xlabel('Outcome')
plt.ylabel('Cumulative Probability')
plt.show()
```

---

**Q.7 Generate a continuous uniform distribution using NumPy and visualize it**

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.uniform(0, 10, 1000)
plt.hist(data, bins=20, density=True, edgecolor='black')
plt.title('Histogram of Continuous Uniform Distribution')
plt.xlabel('Value')
plt.ylabel('Density')
plt.show()
```

---

**Q.8 Simulate data from a normal distribution and plot its histogram**

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(loc=0, scale=1, size=1000)
plt.hist(data, bins=30, density=True, edgecolor='black')
plt.title('Normal Distribution Histogram')
plt.xlabel('Value')
plt.ylabel('Density')
plt.show()
```

---

**Q.9 Write a Python function to calculate Z-scores from a dataset and plot them**

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore

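def plot_z_scores(data):
    """Standardize `data`, plot a histogram of the Z-scores, and return them."""
    z_scores = zscore(data)
    plt.hist(z_scores, bins=20, edgecolor='black')
    plt.title('Z-score Distribution')
    plt.xlabel('Z-score')
    plt.ylabel('Frequency')
    plt.show()
    return z_scores

data = np.random.normal(50, 10, 100)
z_scores = plot_z_scores(data)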
```

---

**Q.10 Implement the Central Limit Theorem (CLT) using Python for a non-normal distribution**

```python
import numpy as np
import matplotlib.pyplot as plt

population = np.random.exponential(scale=2.0, size=10000)
sample_means = [np.mean(np.random.choice(population, 30)) for _ in range(1000)]

plt.hist(sample_means, bins=30, density=True, edgecolor='black')
plt.title('CLT - Distribution of Sample Means (Exponential)')
plt.xlabel('Sample Mean')
plt.ylabel('Density')
plt.show()
```

---

**Q.11 Simulate multiple samples from a normal distribution and verify the Central Limit Theorem**

```python
import numpy as np
import matplotlib.pyplot as plt

sample_means = []
for _ in range(1000):
    sample = np.random.normal(loc=50, scale=10, size=30)
    sample_means.append(np.mean(sample))

plt.hist(sample_means, bins=30, edgecolor='black', density=True)
plt.title('CLT Verification with Normal Distribution')
plt.xlabel('Sample Mean')
plt.ylabel('Density')
plt.show()
```

---

**Q.12 Write a Python function to calculate and plot the standard normal distribution (mean = 0, std = 1)**

```python
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 1000)
pdf = norm.pdf(x, 0, 1)

plt.plot(x, pdf)
plt.title('Standard Normal Distribution')
plt.xlabel('Z-score')
plt.ylabel('Density')
plt.grid(True)
plt.show()
```

---

**Q.13 Generate random variables and calculate their corresponding probabilities using the binomial distribution**

```python
from scipy.stats import binom

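n, p = 10, 0.5

# Draw a few random values, then look up the probability of each outcome
samples = binom.rvs(n, p, size=5)
for val in samples:
    print(f"Sampled X={val}, P(X={val}) = {binom.pmf(val, n, p):.4f}")

# Full probability table for every possible outcome
x = range(0, n + 1)
probs = binom.pmf(x, n, p)

for val, prob in zip(x, probs):
    print(f"P(X={val}) = {prob:.4f}")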
```

---

**Q.14 Write a Python program to calculate the Z-score for a given data point and compare it to a standard normal distribution**

```python
from scipy.stats import norm

x = 75
mean = 70
std = 5

z = (x - mean) / std
p = norm.cdf(z)

print(f"Z-score: {z:.2f}")
print(f"Probability of value ≤ {x}: {p:.4f}")
```

---

**Q.15 Implement hypothesis testing using Z-statistics for a sample dataset**

```python
import numpy as np
from scipy.stats import norm

sample = np.array([105, 110, 102, 108, 100])
pop_mean = 100
sample_mean = np.mean(sample)
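# A Z-test assumes a known population SD; it is estimated from the sample
# here, so with n=5 a t-test (scipy.stats.ttest_1samp) would be more appropriate.
sample_std = np.std(sample, ddof=1)
n = len(sample)

z = (sample_mean - pop_mean) / (sample_std / np.sqrt(n))
p_value = 2 * norm.sf(abs(z))  # two-tailed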

print(f"Z-statistic: {z:.2f}")
print(f"P-value: {p_value:.4f}")
```

---

**Q.16 Create a confidence interval for a dataset using Python and interpret the result**

```python
import numpy as np
from scipy.stats import norm

data = np.random.normal(50, 10, 100)
mean = np.mean(data)
std_err = np.std(data, ddof=1) / np.sqrt(len(data))

z = norm.ppf(0.975)  # 95% confidence
margin = z * std_err
ci = (mean - margin, mean + margin)

print(f"95% Confidence Interval: {ci}")
```

---

**Q.17 Generate data from a normal distribution, then calculate and interpret the confidence interval for its mean**

```python
import numpy as np
from scipy.stats import norm

data = np.random.normal(100, 15, 200)
mean = np.mean(data)
std_err = np.std(data, ddof=1) / np.sqrt(len(data))
z = norm.ppf(0.975)
ci = (mean - z * std_err, mean + z * std_err)

print(f"95% Confidence Interval for Mean: {ci}")
```

---

**Q.18 Write a Python script to calculate and visualize the probability density function (PDF) of a normal distribution**

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(60, 140, 100)
pdf = norm.pdf(x, loc=100, scale=15)

plt.plot(x, pdf)
plt.title('PDF of Normal Distribution (μ=100, σ=15)')
plt.xlabel('Value')
plt.ylabel('Density')
plt.grid(True)
plt.show()
```

---

**Q.19 Use Python to calculate and interpret the cumulative distribution function (CDF) of a Poisson distribution**

```python
from scipy.stats import poisson

lambda_val = 4
x = 6
cdf = poisson.cdf(x, mu=lambda_val)

print(f"CDF of Poisson(λ=4) at x=6: {cdf:.4f}")
```

---

**Q.20 Simulate a random variable using a continuous uniform distribution and calculate its expected value**

```python
import numpy as np

a, b = 5, 15
data = np.random.uniform(a, b, 1000)
expected_value = np.mean(data)  # sample estimate; theory gives (a + b) / 2 = 10
print(f"Expected value of simulated uniform distribution: {expected_value:.2f}")
```

---

**Q.21 Write a Python program to compare the standard deviations of two datasets and visualize the difference**

```python
import numpy as np
import matplotlib.pyplot as plt

data1 = np.random.normal(50, 5, 100)
data2 = np.random.normal(50, 15, 100)

print(f"Std Dev of data1: {np.std(data1):.2f}")
print(f"Std Dev of data2: {np.std(data2):.2f}")

plt.hist(data1, alpha=0.5, label='SD = 5')
plt.hist(data2, alpha=0.5, label='SD = 15')
plt.legend()
plt.title('Comparison of Standard Deviations')
plt.show()
```

---

**Q.22 Calculate the range and interquartile range (IQR) of a dataset generated from a normal distribution**

```python
import numpy as np

data = np.random.normal(100, 20, 1000)
range_val = np.max(data) - np.min(data)
iqr = np.percentile(data, 75) - np.percentile(data, 25)

print(f"Range: {range_val:.2f}")
print(f"IQR: {iqr:.2f}")
```

---

**Q.23 Implement Z-score normalization on a dataset and visualize its transformation**

```python
import numpy as np
from scipy.stats import zscore
import matplotlib.pyplot as plt

data = np.random.normal(100, 20, 1000)
normalized = zscore(data)

plt.hist(normalized, bins=30, edgecolor='black')
plt.title('Z-score Normalized Data')
plt.xlabel('Z-score')
plt.ylabel('Frequency')
plt.show()
```

---

**Q.24 Write a Python function to calculate the skewness and kurtosis of a dataset generated from a normal distribution**

```python
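import numpy as np
from scipy.stats import skew, kurtosis

def skew_and_kurtosis(data):
    """Return (skewness, excess kurtosis); both are near 0 for normal data."""
    return skew(data), kurtosis(data)  # Fisher definition: normal -> 0

s, k = skew_and_kurtosis(np.random.normal(0, 1, 1000))
print(f"Skewness: {s:.4f}")
print(f"Excess kurtosis: {k:.4f}")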
```

---
