In [None]:
Q1: What is Estimation Statistics? Explain point estimate and interval estimate.
Ans:
    
Estimation statistics is a branch of inferential statistics that estimates population parameters from sample data. When measuring
an entire population is not feasible or practical, statisticians use sampling techniques to collect data from a subset of the
population. Estimation lets us draw conclusions about the entire population from that sample.

There are two main types of estimates in estimation statistics: point estimate and interval estimate.

Point Estimate:
A point estimate is a single value that serves as the best guess for the unknown population parameter. It is obtained by using a
statistic calculated from the sample data that is assumed to be a reasonable representation of the population parameter. Common
point estimates include the sample mean, sample proportion, sample median, and sample variance, among others. For example, if 
you want to estimate the average height of all students at a school, you can calculate the sample mean height of a random sample
of students and use it as the point estimate for the population mean height.
However, it's important to note that point estimates do not give any information about the precision or uncertainty associated 
with the estimate. This is where interval estimates come into play.

Interval Estimate:
An interval estimate is a range of values within which the population parameter is expected to lie, along with a degree of
confidence in the estimate. It provides a measure of the uncertainty associated with the point estimate. The interval estimate is
typically expressed as a confidence interval, which consists of two numbers: a lower bound and an upper bound.
For example, if a 95% confidence interval for the average height of students at a school is 160 cm to 165 cm, we are 95%
confident that the true population mean height lies within this range. More precisely, intervals built this way from repeated
samples would contain the true mean about 95% of the time. The wider the confidence interval, the less certain we are about the
estimate, whereas a narrower interval indicates a more precise estimate.

Interval estimates take into account both the variability in the sample data and the sample size. As the sample size increases, 
the interval tends to become narrower, indicating increased precision in the estimate.
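
To make the effect of sample size concrete, here is a minimal sketch that computes the 95% margin of error for a few sample sizes. The helper name `margin_of_error` and the standard deviation of 10 are assumptions chosen only for illustration.

```python
import math
from scipy.stats import t

def margin_of_error(sample_std_dev, sample_size, confidence=0.95):
    """Half-width of a t-based confidence interval for the mean."""
    critical_value = t.ppf(1 - (1 - confidence) / 2, df=sample_size - 1)
    return critical_value * sample_std_dev / math.sqrt(sample_size)

sample_std_dev = 10.0  # hypothetical sample standard deviation
for n in (10, 100, 1000):
    print(f"n={n:>4}: 95% margin of error = {margin_of_error(sample_std_dev, n):.2f}")
```

With the standard deviation held fixed, the margin shrinks roughly in proportion to 1/√n.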

In summary, estimation statistics involves using sample data to make educated guesses about population parameters. Point
estimates provide a single value as the best guess for the parameter, while interval estimates provide a range of values along
with a degree of confidence in the estimate, reflecting the uncertainty associated with the point estimate.
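
The two kinds of estimate can be illustrated side by side. The sketch below uses a made-up sample of ten student heights (not real data) and computes the sample mean as the point estimate and a 95% t-based confidence interval as the interval estimate.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of student heights in cm (made-up values for illustration).
heights = np.array([158.0, 162.5, 171.0, 165.2, 160.8, 168.4, 163.1, 166.7, 159.9, 164.3])

# Point estimate: the sample mean.
point_estimate = heights.mean()

# Interval estimate: 95% t-based confidence interval for the population mean.
standard_error = stats.sem(heights)  # s / sqrt(n), using ddof=1
lower, upper = stats.t.interval(0.95, len(heights) - 1, loc=point_estimate, scale=standard_error)

print(f"Point estimate: {point_estimate:.2f} cm")
print(f"95% confidence interval: {lower:.2f} cm to {upper:.2f} cm")
```

The interval is centred on the point estimate, and its width reflects both the spread of the sample and its size.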

In [None]:
Q2. Write a Python function to estimate the population mean using a sample mean and standard
deviation.
Ans:
    
To estimate the population mean from a sample, the point estimate is simply the sample mean:

Estimated Population Mean = Sample Mean

The sample mean can be used directly because it is an unbiased and consistent estimator of the population mean: its expected
value equals the population mean, and it converges to the population mean as the sample size grows. The sample standard
deviation does not change the point estimate; it describes how precise the estimate is.

Here's a simple Python function to perform the estimation:

python code
def estimate_population_mean(sample_mean, sample_std_dev, sample_size):
    """
    Estimate the population mean using the sample mean and standard deviation.

    Parameters:
        sample_mean (float): The mean of the sample.
        sample_std_dev (float): The standard deviation of the sample.
        sample_size (int): The size of the sample.

    Returns:
        float: The estimated population mean.
    """
    return sample_mean

# Example usage:
sample_mean = 50.5
sample_std_dev = 10.2
sample_size = 100
estimated_population_mean = estimate_population_mean(sample_mean, sample_std_dev, sample_size)
print(f"Estimated Population Mean: {estimated_population_mean}")
The function accepts the sample mean, sample standard deviation, and sample size, and returns the sample mean as the estimated
population mean. The standard deviation and sample size do not change the point estimate; they matter only for judging how
precise it is. As mentioned earlier, this is valid because the sample mean is an unbiased and consistent estimator of the
population mean.
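
One way to put the standard deviation and sample size to use is to report the standard error alongside the point estimate. The following is a sketch under that assumption; the function name `estimate_mean_with_standard_error` is hypothetical and not part of the original answer.

```python
import math

def estimate_mean_with_standard_error(sample_mean, sample_std_dev, sample_size):
    """Return the point estimate of the population mean and its standard error."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    standard_error = sample_std_dev / math.sqrt(sample_size)
    return sample_mean, standard_error

# Same example values as above.
estimate, standard_error = estimate_mean_with_standard_error(50.5, 10.2, 100)
print(f"Estimated population mean: {estimate} (standard error: {standard_error:.2f})")
```

The standard error, s / √n, is the quantity that interval estimates are built from.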

In [None]:
Q3: What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.
Ans:
Hypothesis testing is a fundamental method in statistics for making decisions about the characteristics of a population based
on sample data. It involves testing a claim or hypothesis about a population parameter, such as the population mean,
population proportion, or population standard deviation.

The process of hypothesis testing generally involves the following steps:

Step 1: Formulating the Hypotheses:

Null Hypothesis (H0): This is the default assumption or claim. It represents the status quo or no effect and is often stated as
    "there is no difference" or "there is no effect."
Alternative Hypothesis (Ha): This is the claim that contradicts the null hypothesis. It represents the effect or change we are
    interested in investigating.

Step 2: Collecting Sample Data:

Data is collected from a random sample of the population of interest.

Step 3: Calculating the Test Statistic:

A test statistic is calculated from the sample data using a formula suited to the hypothesis being tested.

Step 4: Determining the Rejection Region or P-value:

Based on the null hypothesis and the chosen significance level, a critical region (rejection region) is defined. If the test
statistic falls within this region, the null hypothesis is rejected.
Alternatively, the p-value is calculated: the probability of observing data at least as extreme as the sample, assuming the
null hypothesis is true. If the p-value is below a pre-defined significance level (commonly denoted as alpha), the null
hypothesis is rejected.

Step 5: Making a Decision:

Based on the test statistic or p-value, a decision is made about whether to reject the null hypothesis.
If the null hypothesis is rejected, there is evidence to support the alternative hypothesis. If the null hypothesis is
not rejected, there is not enough evidence to support the alternative hypothesis.

Importance of Hypothesis Testing:

Inference about Populations: Hypothesis testing allows us to draw conclusions about population parameters based on sample data.
    It helps researchers and decision-makers make informed judgments about population characteristics.

Scientific Research: Hypothesis testing is a crucial tool in scientific research. It enables researchers to test specific
    hypotheses and theories, allowing for the advancement of knowledge and understanding in various fields.

Data-Driven Decision Making: Hypothesis testing provides a systematic and objective way to make decisions based on data. It
    helps in making evidence-based choices in areas like business, healthcare, social sciences, and more.

Identifying Significant Relationships: Hypothesis testing helps identify significant relationships between variables. For
    example, in medical studies, it helps determine whether a new treatment is effective compared to a placebo.

Assessing Real-World Scenarios: Hypothesis testing is used in quality control and manufacturing to assess whether a product
    meets certain specifications or standards.

In [None]:
Q4. Create a hypothesis that states whether the average weight of male college students is greater than
the average weight of female college students.
Ans:
    
Null Hypothesis (H0): The average weight of male college students is equal to or less than the average weight of female college 
    students.

Alternative Hypothesis (Ha): The average weight of male college students is greater than the average weight of female college 
    students.

In [None]:
Q5. Write a Python script to conduct a hypothesis test on the difference between two population means,
given a sample from each population.
Ans:
import numpy as np
from scipy.stats import ttest_ind

def hypothesis_test_two_means(sample1, sample2, alpha=0.05, alternative='two-sided'):
    """
    Perform a two-sample t-test for the difference between two population means.

    Parameters:
        sample1 (array-like): The sample data from population 1.
        sample2 (array-like): The sample data from population 2.
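        alpha (float): The significance level (default is 0.05). Not used inside the function;
            compare it with the returned p-value to make the decision.
        alternative (str): The alternative hypothesis ('two-sided', 'greater', or 'less').

    Returns:
        t_statistic (float): The calculated t-statistic (Welch's test, since equal_var=False).
        p_value (float): The p-value for the chosen alternative.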
    """
    t_statistic, p_value = ttest_ind(sample1, sample2, alternative=alternative, equal_var=False)
    
    return t_statistic, p_value

# Example usage:
sample1 = [65, 68, 72, 74, 70]  # Sample data from population 1 (e.g., male college students' weights)
sample2 = [60, 62, 65, 64, 63]  # Sample data from population 2 (e.g., female college students' weights)

# One-sided test of Ha: mean of population 1 > mean of population 2
t_statistic, p_value = hypothesis_test_two_means(sample1, sample2, alternative='greater')

print("Results of Hypothesis Test:")
print(f"T-Statistic: {t_statistic}")
print(f"P-Value: {p_value}")

alpha = 0.05  # Significance level
if p_value < alpha:
    print("Reject the null hypothesis: The average weight of male college students is greater than "
          "the average weight of female college students.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence that male college students "
          "weigh more on average than female college students.")


In [None]:
Q6: What is a null and alternative hypothesis? Give some examples.
    
Ans: Null Hypothesis (H0): The null hypothesis is a default assumption or claim that there is no effect, no difference, or no
        relationship between variables in the population. It represents the status quo or the absence of an effect.

Alternative Hypothesis (Ha): The alternative hypothesis is the claim that contradicts the null hypothesis. It suggests that 
    there is an effect, a difference, or a relationship between variables in the population.

Examples:

Hypothesis about a Coin Toss:

Null Hypothesis (H0): The coin is fair, and the probability of getting heads is 0.5.
Alternative Hypothesis (Ha): The coin is biased, and the probability of getting heads is not 0.5.

Hypothesis about a New Drug:

Null Hypothesis (H0): The new drug has no effect on patients' recovery time.
Alternative Hypothesis (Ha): The new drug reduces patients' recovery time compared to the standard treatment.

Hypothesis about Education:

Null Hypothesis (H0): There is no difference in test scores between students who attend tutoring and those who don't.
Alternative Hypothesis (Ha): Students who attend tutoring have higher test scores than those who don't.

In each example, the null hypothesis represents the absence of an effect or difference, while the alternative hypothesis
suggests the presence of an effect or difference. Hypothesis testing is used to determine whether the data provide enough
evidence to reject the null hypothesis in favor of the alternative.
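
The coin-toss example can be tested directly with an exact binomial test. The sketch below assumes made-up data (61 heads in 100 tosses) and uses `scipy.stats.binomtest`, which is available in SciPy 1.7 and later.

```python
from scipy.stats import binomtest

# Hypothetical data: 100 tosses, 61 heads (made-up numbers for illustration).
n_tosses = 100
n_heads = 61

# H0: p = 0.5 (fair coin); Ha: p != 0.5 (biased coin).
result = binomtest(n_heads, n_tosses, p=0.5, alternative="two-sided")

alpha = 0.05
print(f"p-value: {result.pvalue:.4f}")
if result.pvalue < alpha:
    print("Reject H0: the data suggest the coin is biased.")
else:
    print("Fail to reject H0: the data are consistent with a fair coin.")
```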

In [None]:
Q7: Write down the steps involved in hypothesis testing.
    Ans:
        Steps involved in hypothesis testing:

Formulate the Hypotheses: Define the null hypothesis (H0) and the alternative hypothesis (Ha) based on the research question or
    claim.

Set the Significance Level (alpha): Choose a significance level (usually denoted by alpha) to determine the threshold for
    statistical significance. Commonly used values are 0.05 (5%) and 0.01 (1%).

Collect Sample Data: Gather data from a random sample that is representative of the population of interest.

Calculate Test Statistic: Compute the appropriate test statistic based on the type of data and hypothesis being tested. For
    example, a t-statistic for comparing means or a chi-square statistic for testing associations.

Define the Rejection Region or Calculate P-value: Determine the critical region (rejection region) based on the null hypothesis
    and significance level, or calculate the p-value, which is the probability of observing data at least as extreme as the
    sample if the null hypothesis is true.

Make a Decision: Compare the test statistic with the critical value or compare the p-value with the significance level. If the
    test statistic falls within the rejection region or the p-value is less than alpha, reject the null hypothesis. Otherwise,
    fail to reject the null hypothesis.

Draw Conclusions: Based on the decision, interpret the results and draw conclusions about the population based on the sample
    data.

Report the Findings: Present the results of the hypothesis test, including the test statistic, p-value, and the decision
    regarding the null hypothesis, in a clear and concise manner.

These steps provide a systematic approach to make evidence-based decisions and draw inferences about population parameters from 
sample data using hypothesis testing.
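
The steps above can be sketched end to end with a one-sample t-test. The hypothesized mean of 50 and the simulated sample are assumptions made purely for illustration; a real study would use collected data.

```python
import numpy as np
from scipy import stats

# Step 1: hypotheses. H0: mu = 50, Ha: mu != 50 (hypothetical claim).
hypothesized_mean = 50.0

# Step 2: significance level.
alpha = 0.05

# Step 3: sample data (simulated here; a real study would collect it).
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=52.0, scale=5.0, size=40)

# Steps 4 and 5: test statistic and two-sided p-value.
t_statistic, p_value = stats.ttest_1samp(sample, popmean=hypothesized_mean)

# Step 6: decision.
decision = "reject H0" if p_value < alpha else "fail to reject H0"

# Steps 7 and 8: interpret and report.
print(f"t = {t_statistic:.3f}, p = {p_value:.4f}, decision at alpha = {alpha}: {decision}")
```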

In [None]:
Q8. Define p-value and explain its significance in hypothesis testing.
Ans:
    p-value: The p-value is a probability that measures the strength of the evidence against the null hypothesis in
        hypothesis testing. It is the probability of observing the sample data, or more extreme results, under the assumption
        that the null hypothesis is true.

Significance in Hypothesis Testing:

If the p-value is small (typically less than the chosen significance level alpha, e.g., 0.05), it suggests that the observed 
data is unlikely to have occurred by chance under the null hypothesis.
A small p-value indicates strong evidence against the null hypothesis, leading to the rejection of the null hypothesis in favor
of the alternative hypothesis.
On the other hand, if the p-value is large, it indicates that the data is reasonably consistent with the null hypothesis, and 
there is not enough evidence to reject it.
The p-value does not directly provide information about the truth of either the null hypothesis or the alternative hypothesis. 
Instead, it measures the strength of evidence against the null hypothesis based on the sample data.
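
To show how "or more extreme" depends on the alternative hypothesis, here is a small sketch that converts a t-statistic into a p-value. The helper name `p_value_from_t` and the example values (t = 2.1, 20 degrees of freedom) are assumptions for illustration.

```python
from scipy.stats import t

def p_value_from_t(t_statistic, dof, alternative="two-sided"):
    """Convert a t-statistic into a p-value for the chosen alternative."""
    if alternative == "two-sided":
        return 2 * t.sf(abs(t_statistic), dof)  # both tails
    if alternative == "greater":
        return t.sf(t_statistic, dof)           # upper tail
    if alternative == "less":
        return t.cdf(t_statistic, dof)          # lower tail
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

for alt in ("two-sided", "greater", "less"):
    print(alt, round(p_value_from_t(2.1, dof=20, alternative=alt), 4))
```

The same test statistic gives different p-values depending on which tail (or tails) counts as extreme.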

In [None]:
Q9. Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom
parameter set to 10.
Ans:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# Degrees of freedom
df = 10

# Generate data points for the t-distribution
x = np.linspace(-5, 5, 1000)

# Compute the probability density function (PDF) for the t-distribution
pdf = t.pdf(x, df)

# Plot the t-distribution
plt.plot(x, pdf, label=f'Student\'s t-distribution (df={df})')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.title(f'Student\'s t-distribution with {df} degrees of freedom')
plt.legend()
plt.grid(True)
plt.show()


In [None]:
Q10. Write a Python program to calculate the two-sample t-test for independent samples, given two
random samples of equal size and a null hypothesis that the population means are equal.
Ans:
import numpy as np
from scipy.stats import t

def two_sample_t_test(sample1, sample2):
    """
    Calculate the two-sample t-test for independent samples assuming equal variances.

    Parameters:
        sample1 (array-like): The first sample data.
        sample2 (array-like): The second sample data.

    Returns:
        t_statistic (float): The calculated t-statistic.
        p_value (float): The two-tailed p-value.
    """
    n1 = len(sample1)
    n2 = len(sample2)
    dof = n1 + n2 - 2  # Degrees of freedom (pooled variance)

    # Calculate sample means and variances
    mean1 = np.mean(sample1)
    mean2 = np.mean(sample2)
    var1 = np.var(sample1, ddof=1)
    var2 = np.var(sample2, ddof=1)

    # Calculate the pooled standard deviation
    pooled_std_dev = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / dof)

    # Calculate the t-statistic
    t_statistic = (mean1 - mean2) / (pooled_std_dev * np.sqrt(1 / n1 + 1 / n2))

    # Calculate the two-tailed p-value
    p_value = 2 * (1 - t.cdf(abs(t_statistic), dof))

    return t_statistic, p_value

# Example usage:
sample1 = [23, 34, 28, 35, 29]
sample2 = [31, 27, 30, 33, 25]

t_statistic, p_value = two_sample_t_test(sample1, sample2)

print("Results of Two-Sample t-test:")
print(f"T-Statistic: {t_statistic}")
print(f"P-Value: {p_value}")

alpha = 0.05  # Significance level
if p_value < alpha:
    print("Reject the null hypothesis: The population means are not equal.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference in population means.")


In [None]:
Q11: What is Student’s t distribution? When to use the t-Distribution.
    Ans:
        Student's t-distribution is a probability distribution used in statistics. It is similar in shape to the standard normal
        (Z) distribution but has heavier tails. The t-distribution is commonly used when working with small sample sizes or when
        the population standard deviation is unknown.

When to use the t-distribution:

Small Sample Size: When the sample size is small (typically n < 30), the t-distribution is more appropriate than the standard 
    normal distribution for conducting inference about the population mean.

Unknown Population Standard Deviation: When the population standard deviation is unknown, and its value needs to be estimated 
    from the sample data, the t-distribution should be used for hypothesis testing or constructing confidence intervals.

Comparing Two Sample Means: When comparing the means of two independent samples, the t-distribution is used in two-sample
    t-tests for independent samples.

Confidence Intervals: When estimating population parameters, such as the population mean, using sample data, the t-distribution
    is used to construct confidence intervals.

The t-distribution becomes closer to the standard normal distribution as the sample size increases. Therefore, for large sample 
sizes (typically n ≥ 30), the t-distribution and the standard normal distribution yield similar results, and either distribution
can be used for hypothesis testing or constructing confidence intervals.
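
The convergence toward the standard normal distribution can be checked numerically. This sketch compares the two-sided 95% critical values of the t-distribution with the normal critical value for a few degrees of freedom chosen for illustration.

```python
from scipy.stats import norm, t

z_critical = norm.ppf(0.975)  # two-sided 95% critical value of the standard normal

for dof in (5, 10, 30, 100, 1000):
    t_critical = t.ppf(0.975, dof)
    print(f"df={dof:>4}: t critical = {t_critical:.4f}, z critical = {z_critical:.4f}")
```

The t critical value is always larger than the z critical value, and the gap shrinks as the degrees of freedom grow.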

In [None]:
Q12: What is t-statistic? State the formula for t-statistic.
    Ans:
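The t-statistic is a standardized measure used in hypothesis testing: it expresses how far a sample estimate lies from the value
stated in the null hypothesis, in units of standard error. In a two-sample test, it measures the difference between the sample
means and is used to assess whether that difference is statistically significant.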

Formula for t-statistic (for two independent samples assuming equal variances):
\[ t = \frac{\text{mean of sample 1} - \text{mean of sample 2}}{\text{pooled standard deviation} \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]

where:
- mean of sample 1: The mean of the first sample.
- mean of sample 2: The mean of the second sample.
- pooled standard deviation: A combined estimate of the standard deviation, assuming equal variances in both samples.
- \( n_1 \): The size of the first sample.
- \( n_2 \): The size of the second sample.

The t-statistic measures how many standard errors the sample means differ from each other. It is used to calculate the p-value, 
which helps determine whether the observed difference between sample means is statistically significant or merely due to random
chance.
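
As a sketch of the formula above, the function below computes the pooled t-statistic from summary statistics and compares it with `scipy.stats.ttest_ind_from_stats`. The function name `pooled_t_statistic` and the input numbers are hypothetical, chosen only for illustration.

```python
import math
from scipy.stats import ttest_ind_from_stats

def pooled_t_statistic(mean1, std1, n1, mean2, std2, n2):
    """t-statistic for two independent samples, assuming equal variances."""
    pooled_variance = ((n1 - 1) * std1**2 + (n2 - 1) * std2**2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / standard_error

# Hypothetical summary statistics (mean, standard deviation, size) for two samples.
t_manual = pooled_t_statistic(29.8, 4.9, 5, 29.2, 3.2, 5)
t_scipy, _ = ttest_ind_from_stats(29.8, 4.9, 5, 29.2, 3.2, 5, equal_var=True)
print(f"manual: {t_manual:.4f}, scipy: {t_scipy:.4f}")
```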

In [None]:
Q13. A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random
sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50.
Estimate the population mean revenue with a 95% confidence interval.
Ans:
To estimate the population mean revenue with a 95% confidence interval, we can use the formula for the confidence interval of
the population mean when the population standard deviation is unknown:

Confidence Interval = Sample Mean ± Margin of Error

where the Margin of Error is calculated as:

Margin of Error = Critical Value * (Sample Standard Deviation / √(Sample Size))

For a 95% confidence level, the critical value comes from the t-distribution with 49 degrees of freedom
     (df = n − 1 = 50 − 1).

Let's calculate the confidence interval:

python code
import numpy as np
from scipy.stats import t

# Given data
sample_mean = 500
sample_std_dev = 50
sample_size = 50
confidence_level = 0.95

# Calculate the critical value (two-tailed)
alpha = 1 - confidence_level
critical_value = t.ppf(1 - alpha / 2, df=sample_size - 1)

# Calculate the margin of error
margin_of_error = critical_value * (sample_std_dev / np.sqrt(sample_size))

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"Population Mean Revenue 95% Confidence Interval: ${lower_bound:.2f} to ${upper_bound:.2f}")

In this example, the critical value is about 2.01 and the margin of error is about $14.21, so the 95% confidence interval for
the population mean revenue is approximately $485.79 to $514.21. This means we are 95% confident that the true mean daily
revenue falls within this range, based on the 50 sampled days.

In [None]:
Q14. A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a
clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a
standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.
Ans:
    To test the researcher's claim that the new drug decreases blood pressure by 10 mmHg at a significance level of 0.05, we
    can perform a one-sample t-test on the mean decrease. Here are the steps to conduct the hypothesis test:

Step 1: Formulate the Hypotheses:

Null Hypothesis (H0): The mean decrease in blood pressure is 10 mmHg, as the researcher claims (μ = 10).
Alternative Hypothesis (Ha): The mean decrease in blood pressure is less than 10 mmHg (μ < 10).
Step 2: Set the Significance Level (alpha):

Given: Significance level (alpha) = 0.05.
Step 3: Collect Sample Data:

Sample Mean (x̄) = 8 mmHg (mean decrease in blood pressure from the 100 patients)
Sample Standard Deviation (s) = 3 mmHg
Sample Size (n) = 100
Step 4: Calculate the Test Statistic:
The test statistic for a one-sample t-test is calculated as follows:

\[ t = \frac{\text{Sample Mean} - \text{Hypothesized Mean}}{\text{Sample Standard Deviation} / \sqrt{\text{Sample Size}}} \]

Step 5: Determine the Rejection Region or P-value:

For a one-tailed test (since Ha: μ < 10), find the critical t-value or calculate the p-value associated with the test statistic.
Step 6: Make a Decision:

If the p-value is less than the significance level (alpha), reject the null hypothesis.
Otherwise, fail to reject the null hypothesis.
Let's perform the calculations in Python:

python code
import numpy as np
from scipy.stats import t

# Given data
sample_mean = 8
hypothesized_mean = 10
sample_std_dev = 3
sample_size = 100
alpha = 0.05

# Calculate the test statistic
t_statistic = (sample_mean - hypothesized_mean) / (sample_std_dev / np.sqrt(sample_size))

# Calculate the critical t-value for a one-tailed test (lower tail)
critical_t_value = t.ppf(alpha, df=sample_size - 1)

# Calculate the p-value for a one-tailed test (lower tail)
p_value = t.cdf(t_statistic, df=sample_size - 1)

# Make a decision
print(f"T-Statistic: {t_statistic:.2f}, Critical Value: {critical_t_value:.2f}, P-Value: {p_value:.3g}")
if p_value < alpha:
    print("Reject the null hypothesis: The mean decrease in blood pressure is less than 10 mmHg.")
else:
    print("Fail to reject the null hypothesis: There is no significant evidence that the mean decrease "
          "is less than 10 mmHg.")

In this example, the test statistic is t = (8 − 10) / (3 / √100) ≈ −6.67, so the p-value is far below 0.05 and we reject the
null hypothesis. The sample data provide significant evidence that the mean decrease in blood pressure is less than 10 mmHg,
so the clinical trial does not support the researcher's claim of a 10 mmHg decrease.

In [None]:
Q15. An electronics company produces a certain type of product with a mean weight of 5 pounds and a
standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight
is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5
pounds with a significance level of 0.01.
Ans:
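    To test the hypothesis that the true mean weight of the products is less than 5 pounds with a significance level of 0.01,
    we compare the sample mean with 5 pounds. Because the population standard deviation is given (σ = 0.5), a z-test is the
    textbook choice; the steps below use the t-distribution with n − 1 = 24 degrees of freedom, which is slightly more
    conservative and leads to the same decision here. Here are the steps to conduct the hypothesis test: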

Step 1: Formulate the Hypotheses:

Null Hypothesis (H0): The true mean weight of the products is 5 pounds or greater (μ ≥ 5).
Alternative Hypothesis (Ha): The true mean weight of the products is less than 5 pounds (μ < 5).
Step 2: Set the Significance Level (alpha):

Given: Significance level (alpha) = 0.01.
Step 3: Collect Sample Data:

Sample Mean (x̄) = 4.8 pounds (sample mean weight of the 25 products)
Population Standard Deviation (σ) = 0.5 pounds (given)
Sample Size (n) = 25
Step 4: Calculate the Test Statistic:
The test statistic for a one-sample t-test is calculated as follows:

\[ t = \frac{\text{Sample Mean} - \text{Hypothesized Mean}}{\text{Population Standard Deviation} / \sqrt{\text{Sample Size}}} \]

Step 5: Determine the Rejection Region or P-value:

For a one-tailed test (since Ha: μ < 5), find the critical t-value or calculate the p-value associated with the test statistic.
Step 6: Make a Decision:

If the p-value is less than the significance level (alpha), reject the null hypothesis.
Otherwise, fail to reject the null hypothesis.
Let's perform the calculations in Python:

python code
import numpy as np
from scipy.stats import t

# Given data
sample_mean = 4.8
hypothesized_mean = 5
population_std_dev = 0.5
sample_size = 25
alpha = 0.01

# Calculate the test statistic
t_statistic = (sample_mean - hypothesized_mean) / (population_std_dev / np.sqrt(sample_size))

# Calculate the critical t-value for a one-tailed test (lower tail)
critical_t_value = t.ppf(alpha, df=sample_size - 1)

# Calculate the p-value for a one-tailed test (lower tail)
p_value = t.cdf(t_statistic, df=sample_size - 1)

# Make a decision
print(f"T-Statistic: {t_statistic:.2f}, Critical Value: {critical_t_value:.2f}, P-Value: {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis: The true mean weight of the products is less than 5 pounds.")
else:
    print("Fail to reject the null hypothesis: There is no significant evidence that the true mean weight "
          "of the products is less than 5 pounds.")

In this example, the test statistic is (4.8 − 5) / (0.5 / √25) = −2.0 and the one-tailed p-value is about 0.03, which is
greater than 0.01, so we fail to reject the null hypothesis. At the 1% significance level, the sample data do not provide
significant evidence that the true mean weight of the products is less than 5 pounds.
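
Because the problem gives the population standard deviation (σ = 0.5), a z-test is also a natural choice here. The following is a minimal sketch of that variant, using the standard normal distribution in place of the t-distribution; it is an alternative to the calculation above, not a replacement for it.

```python
import math
from scipy.stats import norm

# Given data
sample_mean = 4.8
hypothesized_mean = 5.0
population_std_dev = 0.5
sample_size = 25
alpha = 0.01

# z-statistic: distance from the hypothesized mean in standard-error units
z_statistic = (sample_mean - hypothesized_mean) / (population_std_dev / math.sqrt(sample_size))

# Lower-tail critical value and p-value from the standard normal distribution
critical_z_value = norm.ppf(alpha)
p_value = norm.cdf(z_statistic)

print(f"z = {z_statistic:.2f}, critical value = {critical_z_value:.2f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```

The decision is the same as with the t-distribution: the p-value stays above 0.01.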

In [None]:
Q16. Two groups of students are given different study materials to prepare for a test. The first group (n1 =
30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean
score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two
groups are equal with a significance level of 0.01.
Ans:
    To test the hypothesis that the population means for the two groups are equal with a significance level of 0.01, we can
    perform a two-sample t-test for independent samples. Here are the steps to conduct the hypothesis test:

Step 1: Formulate the Hypotheses:

Null Hypothesis (H0): The population means for the two groups are equal (μ1 = μ2).
Alternative Hypothesis (Ha): The population means for the two groups are not equal (μ1 ≠ μ2).
Step 2: Set the Significance Level (alpha):

Given: Significance level (alpha) = 0.01.
Step 3: Collect Sample Data:

Sample 1: Mean (x̄1) = 80, Standard Deviation (s1) = 10, Sample Size (n1) = 30
Sample 2: Mean (x̄2) = 75, Standard Deviation (s2) = 8, Sample Size (n2) = 40
Step 4: Calculate the Test Statistic:
The test statistic for a two-sample t-test for independent samples is calculated as follows:

\[ t = \frac{\text{Sample Mean Difference} - \text{Hypothesized Mean Difference}}{\sqrt{\frac{\text{Sample Variance 1}}{\text{Sample Size 1}} + \frac{\text{Sample Variance 2}}{\text{Sample Size 2}}}} \]

Step 5: Determine the Rejection Region or P-value:

For a two-tailed test, find the critical t-value or calculate the two-tailed p-value associated with the test statistic.
Step 6: Make a Decision:

If the p-value is less than the significance level (alpha), reject the null hypothesis.
Otherwise, fail to reject the null hypothesis.
Let's perform the calculations in Python:

python code
import numpy as np
from scipy.stats import t

# Given data for Group 1
mean1 = 80
std_dev1 = 10
sample_size1 = 30

# Given data for Group 2
mean2 = 75
std_dev2 = 8
sample_size2 = 40

# Given significance level (alpha)
alpha = 0.01

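# Calculate the test statistic (Welch's form: the variances are not pooled)
sample_mean_difference = mean1 - mean2
var_over_n1 = (std_dev1 ** 2) / sample_size1
var_over_n2 = (std_dev2 ** 2) / sample_size2
squared_standard_error = var_over_n1 + var_over_n2
t_statistic = sample_mean_difference / np.sqrt(squared_standard_error)

# Calculate the degrees of freedom with the Welch-Satterthwaite approximation,
# which matches the unpooled standard error used above
degrees_of_freedom = squared_standard_error ** 2 / (
    var_over_n1 ** 2 / (sample_size1 - 1) + var_over_n2 ** 2 / (sample_size2 - 1)
)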

# Calculate the critical t-value for a two-tailed test
critical_t_value = t.ppf(1 - alpha / 2, df=degrees_of_freedom)

# Calculate the two-tailed p-value
p_value = 2 * (1 - t.cdf(abs(t_statistic), df=degrees_of_freedom))

# Make a decision
print(f"T-Statistic: {t_statistic:.2f}, Critical Value: ±{critical_t_value:.2f}, P-Value: {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis: The population means for the two groups are not equal.")
else:
    print("Fail to reject the null hypothesis: There is no significant evidence that the population means "
          "for the two groups are different.")

In this example, the test statistic is about 2.25 and the two-tailed p-value is about 0.03, which is greater than 0.01, so we
fail to reject the null hypothesis. At the 1% significance level, the sample data do not provide significant evidence that the
population means for the two groups are different.
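
As a cross-check, SciPy can run the test directly from summary statistics. This sketch uses `scipy.stats.ttest_ind_from_stats` with `equal_var=False` (Welch's test), which matches the unpooled standard error in the formula above. Its degrees of freedom follow the Welch-Satterthwaite approximation, so the p-value may differ slightly from a calculation that uses n1 + n2 − 2.

```python
from scipy.stats import ttest_ind_from_stats

alpha = 0.01

# Welch's two-sample t-test from the summary statistics given in the question
t_statistic, p_value = ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=30,
    mean2=75, std2=8, nobs2=40,
    equal_var=False,
)

print(f"t = {t_statistic:.4f}, two-tailed p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```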

In [None]:
Q17. A marketing company wants to estimate the average number of ads watched by viewers during a TV
program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard
deviation of 1.5. Estimate the population mean with a 99% confidence interval.
Ans:
    To estimate the population mean number of ads watched by viewers during a TV program with a 99% confidence interval, we can 
    use the formula for the confidence interval of the population mean when the population standard deviation is unknown:

Confidence Interval = Sample Mean ± Margin of Error

where the Margin of Error is calculated as:

Margin of Error = Critical Value * (Sample Standard Deviation / √(Sample Size))

For a 99% confidence level, the critical value comes from the t-distribution with 49 degrees of freedom
    (df = n − 1 = 50 − 1).

Let's calculate the confidence interval:

python code
import numpy as np
from scipy.stats import t

# Given data
sample_mean = 4
sample_std_dev = 1.5
sample_size = 50
confidence_level = 0.99

# Calculate the critical value (two-tailed)
alpha = 1 - confidence_level
critical_value = t.ppf(1 - alpha / 2, df=sample_size - 1)

# Calculate the margin of error
margin_of_error = critical_value * (sample_std_dev / np.sqrt(sample_size))

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"Population Mean Number of Ads Watched (99% Confidence Interval): {lower_bound:.2f} to {upper_bound:.2f}")

In this example, the critical value is about 2.68 and the margin of error is about 0.57, so the 99% confidence interval for the
population mean number of ads watched is approximately 3.43 to 4.57. This means we are 99% confident that the true population
mean number of ads watched falls within this range, based on the sample of 50 viewers.
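
Since the same calculation appears in more than one question, it can be wrapped in a small reusable helper. This is a sketch; the function name `mean_confidence_interval` is hypothetical and it assumes a t-based interval with n − 1 degrees of freedom, as in the code above.

```python
import math
from scipy.stats import t

def mean_confidence_interval(sample_mean, sample_std_dev, sample_size, confidence=0.95):
    """t-based confidence interval for a population mean from summary statistics."""
    alpha = 1 - confidence
    critical_value = t.ppf(1 - alpha / 2, df=sample_size - 1)
    margin = critical_value * sample_std_dev / math.sqrt(sample_size)
    return sample_mean - margin, sample_mean + margin

# Ads-watched example: mean 4, standard deviation 1.5, n = 50, 99% confidence
lower, upper = mean_confidence_interval(4, 1.5, 50, confidence=0.99)
print(f"99% confidence interval: {lower:.2f} to {upper:.2f}")
```

Raising the confidence level widens the interval, because a larger critical value is needed to cover the true mean more often.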