# ### Q1: What is Estimation Statistics? Explain point estimate and interval estimate.

# - **Estimation Statistics**: Estimation statistics is the process of using sample data to infer the value of an unknown population parameter, such as the mean or variance.
  
# - **Point Estimate**: A point estimate is a single value given as an estimate of an unknown population parameter. For example, if you have a sample mean of 50, then the point estimate for the population mean is 50.

# - **Interval Estimate**: An interval estimate provides a range of values that is likely to contain the true population parameter. The range is reported with a confidence level: a 95% confidence interval, for example, comes from a procedure that captures the true parameter in about 95% of repeated samples.
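
# As a minimal sketch of the two kinds of estimate, the snippet below computes both from a small made-up sample. The data values and variable names are illustrative assumptions, and the interval uses the t-distribution because the population standard deviation is treated as unknown.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (illustrative values only)
sample = np.array([48, 52, 50, 47, 53, 51, 49, 50])

# Point estimate: a single number for the population mean
point_estimate = sample.mean()

# Interval estimate: a 95% t-based confidence interval around it
standard_error = stats.sem(sample)  # sample std (ddof=1) / sqrt(n)
margin = stats.t.ppf(0.975, df=len(sample) - 1) * standard_error
interval = (point_estimate - margin, point_estimate + margin)

print(f"Point estimate: {point_estimate}")
print(f"95% interval estimate: {interval}")
```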

# ---

# ### Q2: Write a Python function to estimate the population mean using a sample mean and standard deviation.

# ```python
# import scipy.stats as stats

# def estimate_population_mean(sample_mean, sample_std, sample_size, confidence_level=0.95):
#     # Calculate the standard error
#     standard_error = sample_std / (sample_size ** 0.5)
    
#     # Calculate the margin of error
#     margin_of_error = stats.t.ppf((1 + confidence_level) / 2, df=sample_size-1) * standard_error
    
#     # Calculate the confidence interval
#     lower_bound = sample_mean - margin_of_error
#     upper_bound = sample_mean + margin_of_error
    
#     return lower_bound, upper_bound

# # Example usage:
# sample_mean = 50
# sample_std = 10
# sample_size = 30
# confidence_interval = estimate_population_mean(sample_mean, sample_std, sample_size)
# print(confidence_interval)
# ```

# ---

# ### Q3: What is Hypothesis Testing? Why is it used? State the importance of Hypothesis Testing.

# - **Hypothesis Testing**: Hypothesis testing is a statistical method used to determine whether sample data provide enough evidence to reject a null hypothesis. It compares the observed data with what would be expected if the null hypothesis were true.
  
# - **Why is it used?**: It is used to make inferences about a population based on sample data, and to test assumptions or theories.
  
# - **Importance**: Hypothesis testing allows researchers to assess whether their data supports a specific claim or theory, and it helps guide decision-making in various fields, including business, medicine, and social sciences.

# ---

# ### Q4: Create a hypothesis that states whether the average weight of male college students is greater than the average weight of female college students.

# - **Null Hypothesis (H₀)**: The average weight of male college students is equal to or less than the average weight of female college students.
#   - \( H_0: \mu_{\text{males}} \leq \mu_{\text{females}} \)
  
# - **Alternative Hypothesis (H₁)**: The average weight of male college students is greater than the average weight of female college students.
#   - \( H_1: \mu_{\text{males}} > \mu_{\text{females}} \)
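
# A one-sided hypothesis like this one could be tested roughly as follows. This is only a sketch: the weights are made-up values, and the use of Welch's t-test (`equal_var=False`) and of the `alternative` keyword (available in SciPy 1.6 and later) are assumptions, not part of the question.

```python
from scipy import stats

# Hypothetical weights in kg (illustrative values only)
male_weights = [72, 80, 76, 85, 78, 74, 82, 79]
female_weights = [60, 65, 58, 70, 62, 66, 59, 64]

# H0: mu_males <= mu_females, H1: mu_males > mu_females
# alternative="greater" makes the test one-sided;
# equal_var=False selects Welch's t-test, which does not assume equal variances
t_stat, p_value = stats.ttest_ind(
    male_weights, female_weights, equal_var=False, alternative="greater"
)

alpha = 0.05
reject_null = p_value < alpha
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4f}, reject H0: {reject_null}")
```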

# ---

# ### Q5: Write a Python script to conduct a hypothesis test on the difference between two population means, given a sample from each population.

# ```python
# from scipy import stats

# def hypothesis_test_difference_mean(sample1, sample2, alpha=0.05):
#     # Perform a two-sample t-test
#     t_stat, p_value = stats.ttest_ind(sample1, sample2)
    
#     # Compare the p-value with the significance level
#     if p_value < alpha:
#         return "Reject Null Hypothesis: There is a significant difference."
#     else:
#         return "Fail to Reject Null Hypothesis: No significant difference."

# # Example usage
# sample1 = [20, 22, 21, 19, 23]
# sample2 = [30, 32, 33, 31, 29]
# print(hypothesis_test_difference_mean(sample1, sample2))
# ```

# ---

# ### Q6: What is a null and alternative hypothesis? Give some examples.

# - **Null Hypothesis (H₀)**: A hypothesis that there is no effect or relationship in the population. It is the default assumption, and the test looks for evidence against it.
  
# - **Alternative Hypothesis (H₁)**: A hypothesis that contradicts the null hypothesis. It states that there is an effect or relationship.

# **Examples**:
# 1. A factory produces light bulbs, and we want to test if the average lifespan of the bulbs is 1000 hours.
#    - \( H_0: \mu = 1000 \)
#    - \( H_1: \mu \neq 1000 \)

# 2. Testing if a new drug improves (shortens) mean recovery time:
#    - \( H_0: \mu_{\text{new drug}} = \mu_{\text{current drug}} \)
#    - \( H_1: \mu_{\text{new drug}} < \mu_{\text{current drug}} \)

# ---

# ### Q7: Write down the steps involved in hypothesis testing.

# 1. **State the hypotheses**: Formulate the null hypothesis (H₀) and the alternative hypothesis (H₁).
# 2. **Choose the significance level (\( \alpha \))**: Typically set at 0.05 or 0.01.
# 3. **Select the appropriate test**: Based on data type, use t-test, z-test, chi-square, etc.
# 4. **Calculate the test statistic**: Perform the test and compute the test statistic.
# 5. **Determine the p-value**: The p-value helps assess whether the observed results are statistically significant.
# 6. **Make a decision**: Compare the p-value with the significance level to either reject or fail to reject the null hypothesis.
# 7. **Interpret the result**: State whether the data provide sufficient evidence against the null hypothesis. Failing to reject the null hypothesis does not prove that it is true.
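
# The steps above can be sketched with a one-sample t-test. The light-bulb lifespans below are made-up values chosen for illustration, and `stats.ttest_1samp` is one reasonable choice of test here, not the only one.

```python
from scipy import stats

# Hypothetical bulb lifespans in hours (illustrative values only)
lifespans = [980, 1010, 995, 1005, 990, 1002, 998, 1012, 985, 1003]

# Step 1: state the hypotheses, H0: mu = 1000 and H1: mu != 1000
hypothesized_mean = 1000
# Step 2: choose the significance level
alpha = 0.05
# Steps 3-5: a one-sample t-test returns the test statistic and two-sided p-value
t_stat, p_value = stats.ttest_1samp(lifespans, hypothesized_mean)
# Step 6: make a decision
decision = "reject H0" if p_value < alpha else "fail to reject H0"
# Step 7: report the result for interpretation
print(f"t = {t_stat:.3f}, p = {p_value:.3f}: {decision}")
```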

# ---

# ### Q8: Define p-value and explain its significance in hypothesis testing.

# - **p-value**: The p-value is the probability that the observed data (or something more extreme) would occur if the null hypothesis were true. It helps measure the strength of the evidence against the null hypothesis.
  
# - **Significance**: If the p-value is less than the significance level (\( \alpha \)), we reject the null hypothesis. A smaller p-value indicates stronger evidence against the null hypothesis.
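
# To make the definition concrete, the sketch below turns a test statistic into a p-value using the t-distribution. The statistic and degrees of freedom are assumed values chosen for illustration.

```python
from scipy import stats

t_stat = -2.0   # assumed test statistic (illustrative)
df = 24         # assumed degrees of freedom (illustrative)

# Left-tailed p-value: P(T <= t_stat) if the null hypothesis is true
p_left = stats.t.cdf(t_stat, df)
# Two-tailed p-value: P(|T| >= |t_stat|) if the null hypothesis is true
p_two = 2 * stats.t.sf(abs(t_stat), df)

print(f"left-tailed p = {p_left:.4f}, two-tailed p = {p_two:.4f}")
```

# With \( \alpha = 0.05 \), the left-tailed p-value here falls below the threshold while the two-tailed one does not, so the choice of alternative hypothesis can change the decision.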

# ---

# ### Q9: Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom parameter set to 10.

# ```python
# import matplotlib.pyplot as plt
# import numpy as np
# import scipy.stats as stats

# # Degrees of freedom
# df = 10

# # Generate values for the x-axis
# x = np.linspace(-5, 5, 1000)

# # Generate the t-distribution
# y = stats.t.pdf(x, df)

# # Plot the t-distribution
# plt.plot(x, y, label=f"t-distribution (df={df})")
# plt.title(f"Student's t-Distribution (df={df})")
# plt.xlabel("x")
# plt.ylabel("Probability Density")
# plt.legend()
# plt.grid(True)
# plt.show()
# ```

# ---

# ### Q10: Write a Python program to calculate the two-sample t-test for independent samples, given two random samples of equal size and a null hypothesis that the population means are equal.

# ```python
# from scipy import stats

# def two_sample_t_test(sample1, sample2):
#     t_stat, p_value = stats.ttest_ind(sample1, sample2)
#     return t_stat, p_value

# # Example usage
# sample1 = [23, 21, 22, 24, 25]
# sample2 = [30, 29, 28, 32, 31]
# t_stat, p_value = two_sample_t_test(sample1, sample2)
# print(f"T-statistic: {t_stat}, P-value: {p_value}")
# ```

# ---

# ### Q11: What is Student’s t distribution? When to use the t-Distribution.

# - **Student’s t distribution**: The t-distribution is a probability distribution used for inference about a mean when the population standard deviation is unknown and must be estimated from the sample. It resembles the normal distribution but has heavier tails, and it approaches the normal distribution as the degrees of freedom increase.

# - **When to use**: Use the t-distribution when:
#   - The population standard deviation is unknown,
#   - The sample size is small (typically \( n < 30 \)), where the difference from the normal distribution matters most.
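
# The heavier tails can be seen by comparing critical values. The sketch below prints the two-sided 95% critical value of the t-distribution for a few arbitrarily chosen degrees of freedom next to the normal value.

```python
from scipy import stats

# 97.5th percentile (two-sided 95% critical value) of the normal distribution
z_critical = stats.norm.ppf(0.975)

# The same percentile of the t-distribution for increasing degrees of freedom
for df in (5, 10, 30, 100, 1000):
    t_critical = stats.t.ppf(0.975, df)
    print(f"df={df:>4}: t critical = {t_critical:.3f} (normal: {z_critical:.3f})")
```

# The t critical value is always larger than the normal one and shrinks toward it as the degrees of freedom grow.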

# ---

# ### Q12: What is t-statistic? State the formula for t-statistic.

# - **t-statistic**: The t-statistic is used to determine if the difference between the sample mean and the population mean (or between two sample means) is statistically significant. It is calculated as:

# \[
# t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}}
# \]

# Where:
# - \( \bar{X} \) is the sample mean,
# - \( \mu \) is the population mean (or comparison mean),
# - \( s \) is the sample standard deviation,
# - \( n \) is the sample size.
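
# As a worked example, plugging in the figures from Q14 (\( \bar{X} = 8 \), \( \mu = 10 \), \( s = 3 \), \( n = 100 \)) gives:

# \[
# t = \frac{8 - 10}{\frac{3}{\sqrt{100}}} = \frac{-2}{0.3} \approx -6.67
# \]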

# ---

# ### Q13: A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50. Estimate the population mean revenue with a 95% confidence interval.

# ```python
# import scipy.stats as stats

# sample_mean = 500
# sample_std = 50
# sample_size = 50
# confidence_level = 0.95

# # Calculate the standard error
# standard_error = sample_std / (sample_size ** 0.5)

# # Calculate the margin of error
# margin_of_error = stats.t.ppf((1 + confidence_level) / 2, df=sample_size-1) * standard_error

# # Confidence interval
# lower_bound = sample_mean - margin_of_error
# upper_bound = sample_mean + margin_of_error

# print(f"95% Confidence Interval: ({lower_bound}, {upper_bound})")
# ```

# ---

# ### Q14: A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.

# ```python
# import scipy.stats as stats

# # Given data
# sample_mean = 8        # observed mean decrease (mmHg)
# population_mean = 10   # hypothesized mean decrease under H0 (mmHg)
# std_dev = 3
# sample_size = 100
# alpha = 0.05

# # Calculate the standard error
# standard_error = std_dev / (sample_size ** 0.5)

# # Calculate the t-statistic
# t_stat = (sample_mean - population_mean) / standard_error

# # H0: mean decrease = 10 mmHg; H1: mean decrease != 10 mmHg (two-tailed),
# # so find the critical t-value at alpha / 2 with 99 degrees of freedom
# t_critical = stats.t.ppf(1 - alpha / 2, df=sample_size-1)

# # Compare the magnitude of the t-statistic with the critical value
# if abs(t_stat) > t_critical:
#     print("Reject Null Hypothesis: The mean decrease differs significantly from 10 mmHg.")
# else:
#     print("Fail to Reject Null Hypothesis: The data are consistent with a mean decrease of 10 mmHg.")
# ```

# ---

# ### Q15: An electronics company produces a certain type of product with a mean weight of 5 pounds and a standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5 pounds with a significance level of 0.01.

# ```python
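# import scipy.stats as stats

# # Given data
# sample_mean = 4.8
# population_mean = 5
# std_dev = 0.5
# sample_size = 25
# alpha = 0.01

# # Calculate the standard error
# standard_error = std_dev / (sample_size ** 0.5)

# # Calculate the t-statistic
# # (if 0.5 is taken as the known population standard deviation, a z-test
# # with stats.norm.ppf would be the usual choice; the decision rule has the same form)
# t_stat = (sample_mean - population_mean) / standard_error

# # Find the critical t-value for a one-tailed (left-tailed) test with 24 degrees of freedom
# t_critical = stats.t.ppf(1 - alpha, df=sample_size-1)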

# # Compare the t-statistic with the critical value
# if t_stat < -t_critical:
#     print("Reject Null Hypothesis: The mean weight is less than 5 pounds.")
# else:
#     print("Fail to Reject Null Hypothesis: No significant evidence that the mean weight is less than