Q1: What is Estimation Statistics? Explain point estimate and interval estimate.
Q2. Write a Python function to estimate the population mean using a sample mean and standard
deviation.
Q3: What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.
Q4. Create a hypothesis that states whether the average weight of male college students is greater than
the average weight of female college students.
Q5. Write a Python script to conduct a hypothesis test on the difference between two population means,
given a sample from each population.
Q6: What is a null and alternative hypothesis? Give some examples.
Q7: Write down the steps involved in hypothesis testing.
Q8. Define p-value and explain its significance in hypothesis testing.
Q9. Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom
parameter set to 10.
Q10. Write a Python program to calculate the two-sample t-test for independent samples, given two
random samples of equal size and a null hypothesis that the population means are equal.
Q11: What is Student’s t distribution? When to use the t-Distribution.
Q12: What is t-statistic? State the formula for t-statistic.
Q13. A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random
sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50.
Estimate the population mean revenue with a 95% confidence interval.
Q14. A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a
clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a
standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.
Q15. An electronics company produces a certain type of product with a mean weight of 5 pounds and a
standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight
is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5
pounds with a significance level of 0.01.
Q16. Two groups of students are given different study materials to prepare for a test. The first group (n1 =
30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean
score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two
groups are equal with a significance level of 0.01.
Q17. A marketing company wants to estimate the average number of ads watched by viewers during a TV
program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard
deviation of 1.5. Estimate the population mean with a 99% confidence interval.

### Q1: What is Estimation Statistics? Explain point estimate and interval estimate.

**Estimation Statistics** is the process of making inferences about a population based on information obtained from a sample. It involves estimating population parameters (such as mean, variance, proportion) using sample statistics.

- **Point Estimate**: A point estimate is a single value given as the estimate of a population parameter. For example, the sample mean (\(\bar{x}\)) is a point estimate of the population mean (\(\mu\)).

  **Example**: If we have a sample mean of 50 from a sample, then 50 is the point estimate of the population mean.

- **Interval Estimate**: An interval estimate provides a range of values within which the parameter is expected to lie, with a certain level of confidence. This range is known as a confidence interval.

  **Example**: If we estimate that the population mean lies between 45 and 55 with 95% confidence, then [45, 55] is a 95% confidence interval for the population mean.

### Q2. Write a Python function to estimate the population mean using a sample mean and standard deviation.

```python
import scipy.stats as stats
import numpy as np

def estimate_population_mean(sample, confidence=0.95):
    sample_mean = np.mean(sample)
    sample_std = np.std(sample, ddof=1)
    n = len(sample)
    
    t_critical = stats.t.ppf((1 + confidence) / 2, df=n-1)
    margin_of_error = t_critical * (sample_std / np.sqrt(n))
    
    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error
    
    return sample_mean, (lower_bound, upper_bound)

# Example usage
sample = [10, 12, 15, 14, 13, 12, 14, 15, 16, 12]
mean, confidence_interval = estimate_population_mean(sample)
print(f"Sample Mean: {mean}")
print(f"95% Confidence Interval: {confidence_interval}")
```

### Q3: What is Hypothesis Testing? Why is it used? State the importance of Hypothesis Testing.

**Hypothesis Testing** is a statistical method used to make decisions about the properties of a population based on sample data. It involves testing an assumption (hypothesis) about a population parameter.

**Why it is used**:
- To determine if there is enough statistical evidence to support a certain belief or hypothesis about a population parameter.
- To make informed decisions based on sample data.

**Importance of Hypothesis Testing**:
- It provides a systematic way to evaluate and make decisions about population parameters.
- It helps in validating research findings.
- It controls the risk of making incorrect conclusions by specifying a significance level.

### Q4. Create a hypothesis that states whether the average weight of male college students is greater than the average weight of female college students.

**Hypothesis**:
- Null Hypothesis (\(H_0\)): The average weight of male college students is equal to or less than the average weight of female college students (\(\mu_m \leq \mu_f\)).
- Alternative Hypothesis (\(H_1\)): The average weight of male college students is greater than the average weight of female college students (\(\mu_m > \mu_f\)).
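A one-sided hypothesis like this corresponds to a right-tailed test. The sketch below shows one way to run it with SciPy, assuming a version that supports the `alternative` argument of `ttest_ind` (SciPy 1.6 or later). The weights are hypothetical values made up for illustration, not real measurements.

```python
import numpy as np
from scipy import stats

# Hypothetical weights in kg, invented purely for illustration.
male_weights = np.array([72, 75, 78, 74, 73, 76, 77, 79, 81, 74], dtype=float)
female_weights = np.array([68, 65, 70, 69, 66, 67, 70, 71, 69, 68], dtype=float)

# H0: mu_m <= mu_f, H1: mu_m > mu_f, so only the right tail counts as evidence.
result = stats.ttest_ind(male_weights, female_weights, alternative="greater")

alpha = 0.05
reject_h0 = result.pvalue < alpha
print(f"t-statistic: {result.statistic:.3f}")
print(f"one-sided p-value: {result.pvalue:.6f}")
print(f"Reject H0 at alpha={alpha}: {reject_h0}")
```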

### Q5. Write a Python script to conduct a hypothesis test on the difference between two population means, given a sample from each population.

```python
import scipy.stats as stats
import numpy as np

def hypothesis_test_mean_difference(sample1, sample2, alpha=0.05):
    mean1, mean2 = np.mean(sample1), np.mean(sample2)
    std1, std2 = np.std(sample1, ddof=1), np.std(sample2, ddof=1)
    n1, n2 = len(sample1), len(sample2)
    
    # Pooled standard deviation
    sp = np.sqrt(((n1 - 1) * std1**2 + (n2 - 1) * std2**2) / (n1 + n2 - 2))
    
    # t-statistic
    t_stat = (mean1 - mean2) / (sp * np.sqrt(1/n1 + 1/n2))
    
    # Degrees of freedom
    df = n1 + n2 - 2
    
    # Critical t-value
    t_critical = stats.t.ppf(1 - alpha/2, df)
    
    # Two-sided p-value; the survival function (1 - cdf) is more accurate in the tails
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    
    return t_stat, t_critical, p_value

# Example usage
sample1 = [72, 75, 78, 74, 73, 76, 77, 79, 81, 74]
sample2 = [68, 65, 70, 69, 66, 67, 70, 71, 69, 68]
t_stat, t_critical, p_value = hypothesis_test_mean_difference(sample1, sample2)
print(f"t-statistic: {t_stat}")
print(f"Critical t-value: {t_critical}")
print(f"p-value: {p_value}")
```

### Q6: What is a null and alternative hypothesis? Give some examples.

- **Null Hypothesis (\(H_0\))**: The null hypothesis is a statement of no effect, no difference, or no relationship. It is assumed true until evidence indicates otherwise.

  **Examples**:
  - \(H_0\): The mean weight of men is equal to the mean weight of women.
  - \(H_0\): A new drug has no effect on blood pressure.

- **Alternative Hypothesis (\(H_1\))**: The alternative hypothesis is a statement that contradicts the null hypothesis. It indicates the presence of an effect, difference, or relationship.

  **Examples**:
  - \(H_1\): The mean weight of men is not equal to the mean weight of women.
  - \(H_1\): A new drug decreases blood pressure.

### Q7: Write down the steps involved in hypothesis testing.

1. **State the Hypotheses**:
   - Null hypothesis (\(H_0\))
   - Alternative hypothesis (\(H_1\))

2. **Choose the Significance Level (\(\alpha\))**:
   - Common values are 0.05, 0.01, or 0.10.

3. **Collect Data**:
   - Obtain a random sample from the population(s).

4. **Calculate the Test Statistic**:
   - Depending on the test, calculate the test statistic (e.g., t-statistic, z-statistic).

5. **Determine the Critical Value or p-value**:
   - Compare the test statistic to the critical value from the distribution table, or
   - Calculate the p-value.

6. **Make a Decision**:
   - If the test statistic falls in the rejection region (for a two-tailed test, its absolute value exceeds the critical value) or, equivalently, the p-value is less than \(\alpha\), reject \(H_0\).
   - Otherwise, fail to reject \(H_0\).

7. **Draw a Conclusion**:
   - Based on the decision, state the conclusion in the context of the research question.
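The steps above can be sketched as a one-sample, two-tailed t-test. The hypothesized mean `mu_0` and the sample values are hypothetical, chosen only to make the example concrete.

```python
import numpy as np
from scipy import stats

# Step 1: state the hypotheses. H0: mu = 50, H1: mu != 50 (hypothetical value).
mu_0 = 50.0

# Step 2: choose the significance level.
alpha = 0.05

# Step 3: collect data (made-up sample for illustration).
sample = np.array([52, 48, 55, 51, 49, 53, 54, 50, 56, 52], dtype=float)

# Step 4: calculate the test statistic.
n = sample.size
t_stat = (sample.mean() - mu_0) / (sample.std(ddof=1) / np.sqrt(n))

# Step 5: determine the critical value and the p-value (two-tailed).
t_critical = stats.t.ppf(1 - alpha / 2, df=n - 1)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# Step 6: make a decision.
reject_h0 = p_value < alpha

# Step 7: draw a conclusion in context.
print(f"t = {t_stat:.3f}, critical = ±{t_critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The critical-value rule and the p-value rule always lead to the same decision, so either one can be used in step 6.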

### Q8. Define p-value and explain its significance in hypothesis testing.

**p-value**: The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. It quantifies the evidence against the null hypothesis.

**Significance**:
- A p-value at or below the chosen significance level \(\alpha\) (commonly 0.05) means the observed data would be unlikely if \(H_0\) were true, so we reject \(H_0\).
- A p-value above \(\alpha\) means the data are reasonably consistent with \(H_0\), so we fail to reject \(H_0\). This does not prove that \(H_0\) is true.
- The p-value helps determine the significance of results and whether the observed data is consistent with the null hypothesis.
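As a minimal sketch, the p-value for a t-test is a tail area of the t-distribution. The helper name `p_value_from_t` and its `alternative` labels are assumptions made for this example.

```python
from scipy import stats

def p_value_from_t(t_stat, df, alternative="two-sided"):
    """Return the p-value for a t-statistic with df degrees of freedom."""
    if alternative == "two-sided":
        # Area in both tails beyond |t|.
        return 2 * stats.t.sf(abs(t_stat), df)
    if alternative == "greater":
        # Area in the right tail.
        return stats.t.sf(t_stat, df)
    if alternative == "less":
        # Area in the left tail.
        return stats.t.cdf(t_stat, df)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

# Example: t = 2.228 with 10 degrees of freedom sits at the 5% two-sided cutoff.
print(p_value_from_t(2.228, 10))
```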

### Q9. Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom parameter set to 10.

```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

df = 10  # degrees of freedom
x = np.linspace(-4, 4, 1000)
y = stats.t.pdf(x, df)

plt.plot(x, y, label=f"t-distribution (df={df})")
plt.title("Student's t-Distribution")
plt.xlabel("t")
plt.ylabel("Probability Density")
plt.legend()
plt.grid(True)
plt.show()
```

### Q10. Write a Python program to calculate the two-sample t-test for independent samples, given two random samples of equal size and a null hypothesis that the population means are equal.

```python
import scipy.stats as stats
import numpy as np

def two_sample_t_test(sample1, sample2):
    t_stat, p_value = stats.ttest_ind(sample1, sample2)
    return t_stat, p_value

# Example usage: two equal-size samples (n = 30), seeded so the output is reproducible
rng = np.random.default_rng(42)
sample1 = rng.normal(50, 10, 30)
sample2 = rng.normal(55, 10, 30)
t_stat, p_value = two_sample_t_test(sample1, sample2)
print(f"t-statistic: {t_stat}")
print(f"p-value: {p_value}")
```

### Q11: What is Student’s t distribution? When to use the t-Distribution.

**Student’s t distribution** is a probability distribution that is symmetric and bell-shaped, like the normal distribution, but has heavier tails. It is used when the sample size is small and/or the population standard deviation is unknown.

**When to use**:
- When estimating the mean of a normally distributed population with an unknown standard deviation, especially with small sample sizes (n < 30).
- When conducting t-tests (e.g., one-sample t-test, two-sample t-test).
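To illustrate the heavier tails, the sketch below compares two-sided 95% critical values of the t-distribution with the normal critical value. The particular degrees of freedom are arbitrary choices for the example.

```python
from scipy import stats

# Critical value of the standard normal for a two-sided 95% interval.
z_crit = stats.norm.ppf(0.975)

# Matching t critical values for increasing degrees of freedom.
dfs = (5, 10, 30, 100, 1000)
t_crits = {df: stats.t.ppf(0.975, df) for df in dfs}

print(f"normal: {z_crit:.4f}")
for df, t_crit in t_crits.items():
    print(f"t (df={df}): {t_crit:.4f}")
```

The t critical values are always larger than the normal one and shrink toward it as the degrees of freedom grow, which is why the distinction matters most for small samples.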

### Q12: What is t-statistic? State the formula for t-statistic.

**t-statistic**: The t-statistic is a ratio that measures how far the sample mean is from the hypothesized population mean, in units of the estimated standard error of the mean (\(s / \sqrt{n}\)).

**Formula for one-sample t-test**:
\[
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}
\]
where:
- \(\bar{x}\) = sample mean
- \(\mu\) = population mean (under the null hypothesis)
- \(s\) = sample standard deviation
- \(n\) = sample size
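A minimal sketch of the formula in code, checked against SciPy's `ttest_1samp`. The function name `one_sample_t`, the sample, and the hypothesized mean are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def one_sample_t(sample, mu_0):
    """t = (x_bar - mu_0) / (s / sqrt(n)), with s the sample standard deviation."""
    x = np.asarray(sample, dtype=float)
    return (x.mean() - mu_0) / (x.std(ddof=1) / np.sqrt(x.size))

sample = [10, 12, 15, 14, 13, 12, 14, 15, 16, 12]  # made-up data
mu_0 = 12.0                                        # hypothetical null value

t_manual = one_sample_t(sample, mu_0)
t_scipy = stats.ttest_1samp(sample, popmean=mu_0).statistic
print(f"manual: {t_manual:.4f}, scipy: {t_scipy:.4f}")
```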

### Q13. A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50. Estimate the population mean revenue with a 95% confidence interval.

To estimate the population mean revenue with a 95% confidence interval:

\[
\bar{x} = 500, \quad s = 50, \quad n = 50
\]

Because the population standard deviation is unknown, we use the t-distribution with \(n - 1 = 49\) degrees of freedom (with a sample this large, the normal distribution gives a similar result):

\[
t_{0.025, 49} \approx 2.0096
\]

Margin of error:

\[
ME = t \times \frac{s}{\sqrt{n}} = 2.0096 \times \frac{50}{\sqrt{50}} \approx 14.2
\]

Confidence interval:

\[
500 \pm 14.2 \implies [485.8, 514.2]
\]

So, the 95% confidence interval is \([485.8, 514.2]\).
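The interval can be checked numerically. This sketch assumes SciPy's `stats.t.interval`, with the confidence level passed positionally so it works across SciPy versions.

```python
import math
from scipy import stats

x_bar, s, n = 500.0, 50.0, 50
standard_error = s / math.sqrt(n)

# 95% confidence interval from the t-distribution with n - 1 degrees of freedom.
lower, upper = stats.t.interval(0.95, df=n - 1, loc=x_bar, scale=standard_error)
print(f"95% CI: [{lower:.1f}, {upper:.1f}]")
```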

### Q14. A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.

**Hypotheses**:
- Null Hypothesis (\(H_0\)): \(\mu = 10\) (The drug decreases blood pressure by 10 mmHg)
- Alternative Hypothesis (\(H_1\)): \(\mu < 10\) (The drug decreases blood pressure by less than 10 mmHg)

\[
\bar{x} = 8, \quad \mu = 10, \quad s = 3, \quad n = 100
\]

t-statistic:

\[
t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{8 - 10}{3 / \sqrt{100}} = -6.67
\]

Using a t-distribution with 99 degrees of freedom, the critical t-value for a one-tailed test at \(\alpha = 0.05\) is approximately -1.660.

Since -6.67 < -1.660, we reject \(H_0\). There is significant evidence to support the hypothesis that the drug decreases blood pressure by less than 10 mmHg.
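A short sketch of the same left-tailed test computed from the summary statistics. The variable names are chosen for this example.

```python
import math
from scipy import stats

x_bar, mu_0, s, n = 8.0, 10.0, 3.0, 100
alpha = 0.05
df = n - 1

# Test statistic from the summary values.
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# Left-tailed test: critical value and p-value both come from the lower tail.
t_critical = stats.t.ppf(alpha, df)
p_value = stats.t.cdf(t_stat, df)

print(f"t = {t_stat:.2f}, critical = {t_critical:.3f}, p = {p_value:.2e}")
print("Reject H0" if t_stat < t_critical else "Fail to reject H0")
```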

### Q15. An electronics company produces a certain type of product with a mean weight of 5 pounds and a standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5 pounds with a significance level of 0.01.

**Hypotheses**:
- Null Hypothesis (\(H_0\)): \(\mu = 5\)
- Alternative Hypothesis (\(H_1\)): \(\mu < 5\)

\[
\bar{x} = 4.8, \quad \mu = 5, \quad \sigma = 0.5, \quad n = 25
\]

Because the population standard deviation is given (\(\sigma = 0.5\)), a z-test is appropriate.

z-statistic:

\[
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{4.8 - 5}{0.5 / \sqrt{25}} = -2
\]

The critical z-value for a one-tailed (left-tailed) test at \(\alpha = 0.01\) is approximately -2.326.

Since -2 > -2.326, we fail to reject \(H_0\). There is not enough evidence to support the hypothesis that the true mean weight of the products is less than 5 pounds.
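Since the problem states the population standard deviation, a z-test is a natural choice here. The sketch below computes the statistic and compares it against both the normal critical value and the t critical value with 24 degrees of freedom; under either reference distribution the decision is the same.

```python
import math
from scipy import stats

x_bar, mu_0, sigma, n = 4.8, 5.0, 0.5, 25
alpha = 0.01

# Standardized distance of the sample mean from the hypothesized mean.
z_stat = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Left-tailed critical values under the normal and the t-distribution.
z_critical = stats.norm.ppf(alpha)
t_critical = stats.t.ppf(alpha, df=n - 1)

# Left-tailed p-value under the normal distribution.
p_value = stats.norm.cdf(z_stat)
reject_h0 = z_stat < z_critical

print(f"statistic = {z_stat:.2f}")
print(f"critical z = {z_critical:.3f}, critical t = {t_critical:.3f}")
print(f"p-value (normal) = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```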

### Q16. Two groups of students are given different study materials to prepare for a test. The first group (n1 = 30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two groups are equal with a significance level of 0.01.

**Hypotheses**:
- Null Hypothesis (\(H_0\)): \(\mu_1 = \mu_2\)
- Alternative Hypothesis (\(H_1\)): \(\mu_1 \neq \mu_2\)

\[
\bar{x}_1 = 80, \quad s_1 = 10, \quad n_1 = 30
\]
\[
\bar{x}_2 = 75, \quad s_2 = 8, \quad n_2 = 40
\]

Pooled standard deviation (\(s_p\)):

\[
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(29)(100) + (39)(64)}{68}} = \sqrt{\frac{5396}{68}} \approx 8.91
\]

t-statistic:

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{80 - 75}{8.91 \sqrt{\frac{1}{30} + \frac{1}{40}}} \approx 2.32
\]

Using a t-distribution with \(n_1 + n_2 - 2 = 68\) degrees of freedom, the critical t-value for a two-tailed test at \(\alpha = 0.01\) is approximately ±2.650.

Since 2.32 < 2.650, we fail to reject \(H_0\). At the 0.01 significance level, there is not enough evidence to conclude that the population means for the two groups are different.
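The pooled two-sample test can also be run directly from the summary statistics. This sketch assumes equal population variances and uses SciPy's `ttest_ind_from_stats`.

```python
from scipy import stats

alpha = 0.01
n1, n2 = 30, 40

# Pooled-variance (equal_var=True) two-sample t-test from summary statistics.
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=n1,
    mean2=75, std2=8, nobs2=n2,
    equal_var=True,
)

# Two-tailed critical value with n1 + n2 - 2 degrees of freedom.
t_critical = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)

print(f"t = {t_stat:.3f}, critical = ±{t_critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

If the equal-variance assumption is doubtful, passing `equal_var=False` runs Welch's t-test instead.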

### Q17. A marketing company wants to estimate the average number of ads watched by viewers during a TV program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard deviation of 1.5. Estimate the population mean with a 99% confidence interval.

\[
\bar{x} = 4, \quad s = 1.5, \quad n = 50
\]

Because the population standard deviation is unknown, we use the t-distribution with \(n - 1 = 49\) degrees of freedom (with a sample this large, the normal distribution gives a similar result):

\[
t_{0.005, 49} \approx 2.68
\]

Margin of error:

\[
ME = t \times \frac{s}{\sqrt{n}} = 2.68 \times \frac{1.5}{\sqrt{50}} \approx 0.57
\]

Confidence interval:

\[
4 \pm 0.57 \implies [3.43, 4.57]
\]

So, the 99% confidence interval is \([3.43, 4.57]\).
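As a rough check on the remark that the normal distribution could also be used, the sketch below computes the interval both ways. The helper `mean_ci` and its `use_t` flag are names invented for this example.

```python
import math
from scipy import stats

def mean_ci(x_bar, s, n, confidence=0.99, use_t=True):
    """Confidence interval for a mean from summary statistics."""
    standard_error = s / math.sqrt(n)
    q = (1 + confidence) / 2
    # Critical value from the t-distribution (n - 1 df) or the standard normal.
    crit = stats.t.ppf(q, df=n - 1) if use_t else stats.norm.ppf(q)
    return x_bar - crit * standard_error, x_bar + crit * standard_error

t_low, t_high = mean_ci(4, 1.5, 50)
z_low, z_high = mean_ci(4, 1.5, 50, use_t=False)
print(f"t-based 99% CI: [{t_low:.2f}, {t_high:.2f}]")
print(f"z-based 99% CI: [{z_low:.2f}, {z_high:.2f}]")
```

The z-based interval is slightly narrower because the normal critical value is smaller than the t critical value at 49 degrees of freedom.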