1) What is Estimation Statistics? Explain point estimate and interval estimate.

Estimation statistics is a branch of statistics concerned with inferring the value of a population parameter from a sample of data. The goal is to estimate a parameter, such as the population mean or proportion, using the information contained in the sample.

A point estimate is a single value used to estimate a population parameter. For example, to estimate the population mean we can use the sample mean as a point estimate: it provides a single number that represents our best guess of the population mean based on the sample data.

An interval estimate, on the other hand, is a range of values used to estimate a population parameter. It has a lower and an upper bound and gives a range within which the population parameter plausibly lies. For example, a 95% confidence interval for the population mean is produced by a procedure that, over many repeated samples, would capture the true population mean about 95% of the time; any single computed interval either contains the true mean or it does not.

Interval estimates are generally preferred over point estimates because they convey the uncertainty associated with the estimate. A point estimate can vary considerably from one sample to the next, and on its own it says nothing about its precision. An interval estimate, by contrast, reflects that precision through its width and lets us state, at a chosen confidence level, a range of plausible values for the population parameter.
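The repeated-sampling meaning of "95% confidence" can be checked by simulation. This is a minimal sketch that assumes normally distributed data; the population values (mu=50, sigma=10), the sample size n=15 and the function name `ci_coverage` are arbitrary illustrative choices, not part of the original answer.

```python
import numpy as np
from scipy import stats

def ci_coverage(mu=50.0, sigma=10.0, n=15, n_sims=4000, confidence=0.95, seed=0):
    """Fraction of simulated t-based confidence intervals that contain the true mean mu."""
    rng = np.random.default_rng(seed)
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    hits = 0
    for _ in range(n_sims):
        sample = rng.normal(mu, sigma, size=n)      # one hypothetical sample
        se = sample.std(ddof=1) / np.sqrt(n)         # estimated standard error
        lower = sample.mean() - t_crit * se
        upper = sample.mean() + t_crit * se
        hits += lower <= mu <= upper                 # did this interval capture mu?
    return hits / n_sims

print(ci_coverage())  # close to 0.95; the exact value depends on the seed
```

Each individual interval either covers mu or misses it; the 95% describes the long-run hit rate of the procedure.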

2) Write a Python function to estimate the population mean using a sample mean and standard
deviation.

In [1]:
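import math

def estimate_population_mean(sample_mean, sample_std_dev, sample_size, z=1.96):
    """Point estimate plus a normal-approximation confidence interval.

    z = 1.96 gives an approximate 95% interval; this is reasonable when the
    sample is large enough for the normal approximation to hold.
    """
    std_error = sample_std_dev / math.sqrt(sample_size)  # standard error of the mean
    margin_error = z * std_error                          # half-width of the interval
    lower_bound = sample_mean - margin_error
    upper_bound = sample_mean + margin_error
    return {'mean': sample_mean, 'confidence_interval': (lower_bound, upper_bound)}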
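The function above uses the normal critical value 1.96. For small samples, where the population standard deviation is estimated from the sample, a t critical value is the usual choice. A sketch of that variant follows; the name `estimate_population_mean_t` and the `confidence` parameter are illustrative additions, not part of the original answer.

```python
import math
from scipy import stats

def estimate_population_mean_t(sample_mean, sample_std_dev, sample_size, confidence=0.95):
    """Point estimate plus a t-based confidence interval (population sd unknown)."""
    std_error = sample_std_dev / math.sqrt(sample_size)
    # Two-sided critical value from Student's t with n - 1 degrees of freedom
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=sample_size - 1)
    margin_error = t_crit * std_error
    return {'mean': sample_mean,
            'confidence_interval': (sample_mean - margin_error, sample_mean + margin_error)}

# Small hypothetical sample: mean 10, sd 2, n = 10
result = estimate_population_mean_t(10.0, 2.0, 10)
print(result)
```

With n = 10 the t critical value (about 2.262 for 9 degrees of freedom) is noticeably larger than 1.96, so the interval is wider; as n grows the two versions converge.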

3) What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.

Hypothesis testing is a statistical method for making inferences about a population from a sample of data. Its goal is to decide whether the observed sample provides enough evidence to reject a null hypothesis about a population parameter, or whether we fail to reject it.

The null hypothesis (H0) is the statement about the population being tested, and the alternative hypothesis (Ha) is the statement we accept if the null hypothesis is rejected. The test computes a test statistic from the sample data and then either compares the test statistic with a critical value or compares its p-value with the significance level, to judge whether data like those observed would be unlikely if the null hypothesis were true.

Hypothesis testing is used in many fields, including medicine, psychology, economics, and engineering, to test theories and make data-driven decisions. It provides a formal rule for deciding whether to reject a hypothesis and quantifies the evidence for or against a particular theory or claim.

The importance of hypothesis testing lies in its ability to support decisions based on data and to draw meaningful conclusions about a population from a sample. It tells us whether a theory or claim is consistent with the available data or whether it should be revised or rejected. It also quantifies the uncertainty attached to those conclusions: the significance level fixes the probability of a false rejection (a Type I error) that we are willing to accept.

4) Create a hypothesis that states whether the average weight of male college students is greater than
the average weight of female college students.

The null hypothesis for this test could be:
1) H0: The average weight of male college students is equal to or less than the average weight of female college students (μ_male ≤ μ_female).
The alternative hypothesis could be:
2) Ha: The average weight of male college students is greater than the average weight of female college students (μ_male > μ_female).


We can then collect a random sample of male and female college students, calculate the sample mean and standard deviation for each group, and perform a one-sided two-sample t-test (or another appropriate test) to determine whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis.
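A minimal sketch of how such a one-sided test could be run with SciPy. The weights below are invented purely for illustration (they are not real measurements), and the choice of Welch's test (`equal_var=False`, which does not assume equal variances) is an assumption; the `alternative` argument requires SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

# Hypothetical weights in kg, invented for illustration only
male = np.array([72.5, 80.1, 68.3, 77.9, 85.2, 74.0, 79.6, 70.8])
female = np.array([61.2, 58.7, 66.4, 63.9, 55.1, 60.3, 64.8, 59.5])

# One-sided Welch t-test: Ha is mean(male) > mean(female)
t_stat, p_value = stats.ttest_ind(male, female, equal_var=False, alternative='greater')
alpha = 0.05
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4g}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```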

5) Write a Python script to conduct a hypothesis test on the difference between two population means,
given a sample from each population.

In [2]:
import numpy as np
from scipy.stats import t

alpha = 0.05
# One sample from each population
x = np.array([83, 67, 91, 85, 72, 89, 78, 81, 76, 84])
y = np.array([79, 63, 87, 81, 68, 86, 75, 78, 73, 81])

# Sample means and sample standard deviations (ddof=1 divides by n - 1)
x_mean = np.mean(x)
y_mean = np.mean(y)
x_std = np.std(x, ddof=1)
y_std = np.std(y, ddof=1)

# Pooled standard deviation (assumes equal population variances)
s_pool = np.sqrt(((len(x)-1)*x_std**2 + (len(y)-1)*y_std**2) / (len(x) + len(y) - 2))

# t-statistic for H0: mu_x = mu_y against the two-sided Ha: mu_x != mu_y
t_stat = (x_mean - y_mean) / (s_pool * np.sqrt(1/len(x) + 1/len(y)))
df = len(x) + len(y) - 2
t_crit = t.ppf(1 - alpha/2, df)              # two-sided critical value
p_value = 2 * (1 - t.cdf(abs(t_stat), df))   # two-sided p-value

if abs(t_stat) > t_crit:
    print("Reject the null hypothesis. There is evidence that the population means are different.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to conclude that the population means are different.")
print("Test statistic:", t_stat)
print("Critical value:", t_crit)
print("p-value:", p_value)

Fail to reject the null hypothesis. There is not enough evidence to conclude that the population means are different.
Test statistic: 1.0410336718226145
Critical value: 2.10092204024096
p-value: 0.31164176430629564
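The manual calculation above can be cross-checked against SciPy's built-in `ttest_ind`. With `equal_var=True` it performs the same pooled-variance Student t-test, so the statistic and p-value should agree with the printed output.

```python
import numpy as np
from scipy import stats

x = np.array([83, 67, 91, 85, 72, 89, 78, 81, 76, 84])
y = np.array([79, 63, 87, 81, 68, 86, 75, 78, 73, 81])

# equal_var=True selects the pooled-variance t-test used in the manual script
result = stats.ttest_ind(x, y, equal_var=True)
print(result.statistic, result.pvalue)
```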


6) What is a null and alternative hypothesis? Give some examples.

In statistical hypothesis testing, the null hypothesis and the alternative hypothesis are two competing statements about a population that are evaluated using a sample of data. The null hypothesis is the one being tested, while the alternative hypothesis is the one we accept if the null hypothesis is rejected.

The null hypothesis usually represents the status quo, or the default position assumed to be true unless the data provide evidence to the contrary. The alternative hypothesis represents the claim the researcher is looking for evidence of, and is usually the complement of the null hypothesis.

1) Example 1: Null hypothesis: The average heights of men and women are the same.
Alternative hypothesis: The average heights of men and women are different (a two-sided alternative).

2) Example 2: Null hypothesis: The new drug has no effect on blood pressure.
Alternative hypothesis: The new drug lowers blood pressure (a one-sided alternative).

7) Write down the steps involved in hypothesis testing.

1) State the null hypothesis and alternative hypothesis: The null hypothesis (H0) is the hypothesis that we want to test, while the alternative hypothesis (Ha) is the alternative to the null hypothesis that we will accept if we reject the null hypothesis.

2) Determine the level of significance: The level of significance (alpha) is the probability of rejecting the null hypothesis when it is actually true. It is typically set at 0.05 or 0.01.

3) Choose the appropriate test statistic: The choice of test statistic depends on the type of data and the research question being investigated.

4) Set up the decision rule: The decision rule states when the null hypothesis will be rejected, based on the test statistic and the level of significance. It specifies either the critical value(s) the test statistic must exceed or, equivalently, that the p-value must fall below alpha.

5) Collect and analyze the data: Collect the data and calculate the appropriate test statistic.

6) Calculate the p-value: The p-value is the probability of obtaining a test statistic at least as extreme as the observed value, assuming the null hypothesis is true.

7) Make a decision: Compare the p-value with the level of significance, or the test statistic with the critical value from the decision rule. If the p-value is less than the level of significance (equivalently, the test statistic falls in the critical region), reject the null hypothesis. If the p-value is greater than the level of significance (the test statistic falls in the acceptance region), fail to reject the null hypothesis.

8) Interpret the results: If the null hypothesis is rejected, interpret the results in the context of the research question and the alternative hypothesis. If it is not rejected, interpret the results in light of the study's limitations and the possibility of a Type II error (failing to reject a false null hypothesis).
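The steps above can be sketched in code for a one-sample t-test of H0: mu = mu0. This is an illustrative sketch, not a general testing framework; the function name `one_sample_t_test` and the small example data set are hypothetical.

```python
import numpy as np
from scipy import stats

def one_sample_t_test(data, mu0, alpha=0.05, alternative='two-sided'):
    """Follow the testing steps for H0: mu = mu0 (illustrative sketch)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    df = n - 1
    # Steps 3 and 5: test statistic t = (x_bar - mu0) / (s / sqrt(n))
    t_stat = (data.mean() - mu0) / (data.std(ddof=1) / np.sqrt(n))
    # Steps 4 and 6: critical value and p-value for the chosen alternative
    if alternative == 'two-sided':
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        p_value = 2 * stats.t.sf(abs(t_stat), df)
        reject = abs(t_stat) > t_crit
    elif alternative == 'less':
        t_crit = stats.t.ppf(alpha, df)
        p_value = stats.t.cdf(t_stat, df)
        reject = t_stat < t_crit
    elif alternative == 'greater':
        t_crit = stats.t.ppf(1 - alpha, df)
        p_value = stats.t.sf(t_stat, df)
        reject = t_stat > t_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    # Step 7: the critical-value rule and the p-value rule lead to the same decision
    return {'t': t_stat, 'critical': t_crit, 'p': p_value, 'reject_H0': bool(reject)}

data = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]   # hypothetical measurements
print(one_sample_t_test(data, mu0=5.0))
```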

8) Define p-value and explain its significance in hypothesis testing.

In statistical hypothesis testing, the p-value is the probability of obtaining a test statistic at least as extreme as the observed value, assuming the null hypothesis is true. It measures the strength of the evidence against the null hypothesis.

If the p-value is less than the level of significance (usually set at 0.05 or 0.01), there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. This means that data at least as extreme as those observed would be unlikely if the null hypothesis were true, so the data are inconsistent with the null hypothesis at that significance level.

Conversely, if the p-value is greater than the level of significance, there is insufficient evidence to reject the null hypothesis. Data like those observed could plausibly have arisen if the null hypothesis were true, so there is not enough evidence to support the alternative hypothesis. Failing to reject the null hypothesis does not prove that it is true.

The significance of the p-value in hypothesis testing lies in its ability to provide a quantitative measure of the strength of the evidence against the null hypothesis. A small p-value indicates strong evidence against the null hypothesis, while a large p-value indicates weak evidence against the null hypothesis.

It is important to note that the p-value is not the probability that the null hypothesis is true, nor the probability that the alternative hypothesis is true. It measures how compatible the observed data are with the null hypothesis, given the assumptions made about the population. The p-value should always be interpreted in the context of the research question and the limitations of the study.
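One way to see what the significance level controls is to simulate many studies in which the null hypothesis is actually true. In this sketch (the sample size 20, the 10,000 repetitions and the seed are arbitrary assumptions) the p-values are roughly uniform, so about 5% of them fall below 0.05 even though H0 holds every time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# 10,000 hypothetical samples of size 20 from a population where H0 (mu = 0) is true
samples = rng.normal(loc=0.0, scale=1.0, size=(10_000, 20))
p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

# Fraction of false rejections at alpha = 0.05; should be close to 0.05
frac = np.mean(p_values < 0.05)
print(frac)
```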

9) Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom
parameter set to 10.

In [None]:
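import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

df = 10                          # degrees of freedom
x = np.linspace(-5, 5, 1000)     # grid of x values
y = stats.t.pdf(x, df)           # t-distribution density evaluated on the grid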

plt.plot(x, y, label='t-distribution')
plt.legend()
plt.title(f"Student's t-distribution with {df} degrees of freedom")
plt.xlabel('x')
plt.ylabel('PDF')
plt.show()

10) Write a Python program to calculate the two-sample t-test for independent samples, given two
random samples of equal size and a null hypothesis that the population means are equal.

In [4]:
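import numpy as np
from scipy.stats import ttest_ind

# Two random samples of equal size (50). No seed is set, so results vary from run to run.
sample1 = np.random.normal(5, 2, 50)      # mean 5, standard deviation 2
sample2 = np.random.normal(4.5, 1.5, 50)  # mean 4.5, standard deviation 1.5
# ttest_ind defaults to equal_var=True, i.e. the pooled-variance Student t-test
t_stat, p_value = ttest_ind(sample1, sample2)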
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis, there is a significant difference between the population means.")
else:
    print("Fail to reject the null hypothesis, there is not enough evidence to support a difference between the population means.")
    
print(f"t-statistic: {t_stat}")
print(f"p-value: {p_value}")

Fail to reject the null hypothesis, there is not enough evidence to support a difference between the population means.
t-statistic: 1.5748180558950942
p-value: 0.11852134941407882


11) What is Student’s t distribution? When to use the t-Distribution.

Student's t-distribution is the sampling distribution of the standardized sample mean when the population standard deviation is unknown and is estimated by the sample standard deviation (assuming the population is approximately normal). It matters most when the sample size is small (a common rule of thumb is fewer than 30 observations). It is bell-shaped and symmetric like the standard normal distribution, but with heavier tails. The t-distribution is named after William Sealy Gosset, who published under the pseudonym "Student".

The t-distribution is used when we need to make inferences about a population mean based on a sample mean, but we don't know the population standard deviation. In this case, we can use the t-distribution to calculate the probability of observing a sample mean as extreme as the one we have, if the population mean is equal to a hypothesized value. This is the basis of hypothesis testing and confidence interval estimation for the population mean.

The t-distribution has one parameter, the degrees of freedom (df), which depends on the sample size (df = n - 1 in the one-sample case). As the sample size, and therefore the degrees of freedom, increases, the t-distribution approaches the standard normal distribution. When the sample size is large (typically greater than 30), the standard normal distribution is often used instead, because the sample standard deviation then estimates the population standard deviation with reasonable accuracy.
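The convergence toward the normal distribution and the heavier tails can be checked numerically. This sketch compares two-sided 95% critical values for a few arbitrarily chosen degrees of freedom with the normal value of about 1.96.

```python
from scipy import stats

# Two-sided 95% critical values: Student's t for several df versus the standard normal
z_crit = stats.norm.ppf(0.975)
for df in (2, 5, 10, 30, 100, 1000):
    t_crit = stats.t.ppf(0.975, df)
    print(f"df={df:>4}: t={t_crit:.3f}  (z={z_crit:.3f})")

# Heavier tails: probability of exceeding 3 under t with 10 df versus the standard normal
print(f"P(T > 3), df=10: {stats.t.sf(3, 10):.4f}   P(Z > 3): {stats.norm.sf(3):.4f}")
```

The t critical values shrink toward 1.96 as df grows, which is why the normal approximation is acceptable for large samples.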

12) What is t-statistic? State the formula for t-statistic.

The t-statistic is a measure of how far the sample mean is from the hypothesized population mean in units of the standard error. It is used in hypothesis testing and is the ratio of the difference between the sample mean and the hypothesized population mean to the standard error of the sample mean.

The formula for t-statistic is:

t = (x̄ - μ) / (s / √n)

where:

x̄ is the sample mean
μ is the hypothesized population mean
s is the sample standard deviation
n is the sample size.

The t-statistic measures how many standard errors the sample mean lies from the hypothesized population mean. A large absolute value of t indicates that the sample mean is far from the hypothesized population mean, which may lead us to reject the null hypothesis; a small absolute value indicates that the sample mean is close to it, and we may fail to reject the null hypothesis.

The t-distribution is used to determine the probability of observing a t-statistic as extreme as the one we have, if the null hypothesis is true. If this probability is very small (typically less than 0.05), we reject the null hypothesis and conclude that the sample mean is significantly different from the hypothesized population mean.
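The formula translates directly into code. This is a minimal sketch using only the standard library; the function name `t_statistic` and the sample values are illustrative.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """t = (x_bar - mu) / (s / sqrt(n)), with s the sample (n - 1) standard deviation."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample standard deviation, n - 1 in the denominator
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical measurements tested against mu = 5.0
print(t_statistic([5.1, 4.9, 5.3, 5.0, 5.2, 4.8], 5.0))
```

Here x̄ = 5.05 and s/√n ≈ 0.0764, so the sample mean is only about 0.65 standard errors above the hypothesized value.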





13) A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random
sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50.
Estimate the population mean revenue with a 95% confidence interval.

To estimate the population mean revenue with a 95% confidence interval, we use the t-distribution because the population standard deviation is unknown. (With n = 50, the z-value 1.96 would give almost the same interval.) Here are the steps to calculate the confidence interval:

1) Sample size: n = 50
2) Sample mean revenue: x̄ = $500
3) Standard error of the mean: s/√n = $50/√50 ≈ $7.07
4) t-value with 49 degrees of freedom for 95% confidence: t ≈ 2.010 (from a t-table or calculator)
5) Margin of error: t-value × standard error = 2.010 × $7.071 ≈ $14.21
6) Lower and upper bounds: lower bound = x̄ - margin of error = $500 - $14.21 = $485.79; upper bound = x̄ + margin of error = $500 + $14.21 = $514.21

Therefore, we can say with 95% confidence that the population mean daily revenue for the coffee shop is between $485.79 and $514.21.
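The arithmetic can be reproduced with SciPy's t quantile function. This sketch uses only the values given in the problem.

```python
import math
from scipy import stats

n, x_bar, s = 50, 500.0, 50.0            # values given in the problem
se = s / math.sqrt(n)                    # standard error, about 7.071
t_crit = stats.t.ppf(0.975, df=n - 1)    # about 2.010 for 49 degrees of freedom
margin = t_crit * se                     # about 14.21
lower, upper = x_bar - margin, x_bar + margin
print(f"95% CI: (${lower:.2f}, ${upper:.2f})")
```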

14) A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a
clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a
standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.

To test the hypothesis with a significance level of 0.05, we can use a one-sample t-test with the following null and alternative hypotheses:

Null hypothesis: The true mean decrease in blood pressure is equal to 10 mmHg (µ = 10).
Alternative hypothesis: The true mean decrease in blood pressure is less than 10 mmHg (µ < 10).
Here are the steps to conduct the t-test:

Calculate the t-statistic:
t = (x̄ - µ) / (s / √n)
t = (8 - 10) / (3 / √100)
t = -2 / 0.3 ≈ -6.67

Find the critical t-value at a significance level of 0.05 with 99 degrees of freedom (n - 1): t_critical ≈ -1.66 (from a t-table or calculator)

Compare the t-statistic with the critical t-value:
Since the t-statistic (-6.67) is less than the critical t-value (-1.66), we reject the null hypothesis.

Calculate the p-value:
The p-value is the area under the t-distribution curve with 99 degrees of freedom to the left of the t-statistic (-6.67). It is far below 0.001.

Compare the p-value with the significance level:
Since the p-value (< 0.001) is less than the significance level (0.05), we reject the null hypothesis.

Therefore, at a significance level of 0.05, we conclude that the true mean decrease in blood pressure is significantly less than the hypothesized 10 mmHg.
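The corrected calculation can be checked from the summary statistics. This sketch computes the one-sided (lower-tail) critical value and p-value with SciPy's t-distribution.

```python
import math
from scipy import stats

n, x_bar, s, mu0, alpha = 100, 8.0, 3.0, 10.0, 0.05
t_stat = (x_bar - mu0) / (s / math.sqrt(n))   # (8 - 10) / 0.3, about -6.67
t_crit = stats.t.ppf(alpha, df=n - 1)          # lower-tail critical value, about -1.660
p_value = stats.t.cdf(t_stat, df=n - 1)        # one-sided p-value for Ha: mu < 10
print(t_stat, t_crit, p_value)
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```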

15) An electronics company produces a certain type of product with a mean weight of 5 pounds and a
standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight
is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5
pounds with a significance level of 0.01.

To test the hypothesis that the true mean weight of the products is less than 5 pounds with a significance level of 0.01, we can use a one-sample t-test with the following null and alternative hypotheses:

Null hypothesis: The true mean weight of the products is equal to 5 pounds (µ = 5).
Alternative hypothesis: The true mean weight of the products is less than 5 pounds (µ < 5).
Here are the steps to conduct the t-test:

Calculate the t-statistic:
t = (x̄ - µ) / (s / √n)
t = (4.8 - 5) / (0.5 / √25)
t = -2
Find the critical t-value at a significance level of 0.01 with 24 degrees of freedom (n - 1): t_critical = -2.492 (from t-table or calculator)

Compare the t-statistic with the critical t-value:

Since the t-statistic (-2) is greater than the critical t-value (-2.492), we fail to reject the null hypothesis.

Calculate the p-value:
We can calculate the p-value using a t-distribution calculator or by looking up the area under the t-distribution curve with 24 degrees of freedom to the left of the t-statistic (-2). The p-value is approximately 0.029.

Compare the p-value with the significance level:
Since the p-value (0.029) is greater than the significance level (0.01), we fail to reject the null hypothesis.

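Therefore, we cannot conclude that the true mean weight of the products is less than 5 pounds at a significance level of 0.01. Because the 0.5-pound standard deviation is stated for the production process rather than estimated from the sample, a one-sample z-test is arguably more appropriate; it gives z = -2, a critical value of about -2.326 and a p-value of about 0.023, so the conclusion is the same.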
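Both versions of the test can be computed side by side. The sketch below treats the 0.5-pound figure once as a sample estimate (t-test, as in the worked solution) and once as a known population sigma (z-test); which reading is intended is an assumption about the problem statement.

```python
import math
from scipy import stats

n, x_bar, sd, mu0, alpha = 25, 4.8, 0.5, 5.0, 0.01
stat = (x_bar - mu0) / (sd / math.sqrt(n))   # -2 for both versions

# t version: sd treated as a sample estimate, 24 degrees of freedom
p_t = stats.t.cdf(stat, df=n - 1)
crit_t = stats.t.ppf(alpha, df=n - 1)
# z version: sd treated as the known population standard deviation
p_z = stats.norm.cdf(stat)
crit_z = stats.norm.ppf(alpha)

print(f"t-test: critical={crit_t:.3f}, p={p_t:.4f}")
print(f"z-test: critical={crit_z:.3f}, p={p_z:.4f}")
```

In both cases the p-value exceeds 0.01, so the null hypothesis is not rejected.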

16) Two groups of students are given different study materials to prepare for a test. The first group (n1 = 30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean
score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two
groups are equal with a significance level of 0.01.

To test the hypothesis that the population means for the two groups are equal with a significance level of 0.01, we can use a two-sample t-test with the following null and alternative hypotheses:

Null hypothesis: The population means for the two groups are equal (µ1 = µ2).
Alternative hypothesis: The population means for the two groups are not equal (µ1 ≠ µ2).
Here are the steps to conduct the t-test:

Calculate the pooled standard deviation:
s_p = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1 + n2 - 2))
s_p = sqrt(((30-1)*10^2 + (40-1)*8^2) / (30 + 40 - 2))
s_p = sqrt(5396 / 68) ≈ 8.91

Calculate the t-statistic:
t = (x̄1 - x̄2) / (s_p * sqrt(1/n1 + 1/n2))
t = (80 - 75) / (8.91 * sqrt(1/30 + 1/40))
t ≈ 2.32

Find the critical t-value at a two-sided significance level of 0.01 with 68 degrees of freedom (n1 + n2 - 2): t_critical ≈ ±2.650 (from a t-table or calculator)

Compare the t-statistic with the critical t-value:

Since the t-statistic (2.32) lies between -2.650 and 2.650, it is not in the rejection region, so we fail to reject the null hypothesis.

Calculate the p-value:
Because the alternative is two-sided, the p-value is twice the area under the t-distribution curve with 68 degrees of freedom to the right of |t| = 2.32. The p-value is approximately 0.023.

Compare the p-value with the significance level:
Since the p-value (0.023) is greater than the significance level (0.01), we fail to reject the null hypothesis.

Therefore, there is not sufficient evidence at a significance level of 0.01 to conclude that the population means for the two groups differ.
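SciPy can run the same pooled two-sample t-test directly from the summary statistics with `ttest_ind_from_stats`, which is a convenient check on the hand calculation.

```python
from scipy import stats

# Pooled (equal-variance) two-sample t-test from summary statistics
res = stats.ttest_ind_from_stats(mean1=80, std1=10, nobs1=30,
                                 mean2=75, std2=8, nobs2=40,
                                 equal_var=True)
t_crit = stats.t.ppf(1 - 0.01 / 2, df=30 + 40 - 2)   # two-sided critical value at 0.01
print(f"t = {res.statistic:.3f}, two-sided p = {res.pvalue:.4f}, critical = ±{t_crit:.3f}")
```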

17) A marketing company wants to estimate the average number of ads watched by viewers during a TV
program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard
deviation of 1.5. Estimate the population mean with a 99% confidence interval.

In [8]:

import scipy.stats as stats

sample_size = 50
sample_mean = 4
sample_std_dev = 1.5
confidence_level = 0.99
degrees_of_freedom = sample_size - 1
# Two-sided critical value: the upper (1 - alpha/2) quantile of the t-distribution
critical_value = stats.t.ppf(q=1 - (1 - confidence_level)/2, df=degrees_of_freedom)
margin_of_error = critical_value * (sample_std_dev / (sample_size ** 0.5))

lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"Population mean estimate: {sample_mean:.2f}")
print(f"99% Confidence Interval: ({lower_bound:.2f}, {upper_bound:.2f})")

Population mean estimate: 4.00
99% Confidence Interval: (3.43, 4.57)
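SciPy distributions also provide an `interval` method that returns both bounds at once, which avoids choosing the quantile by hand. In this sketch the confidence level is passed positionally because its keyword name differs between SciPy releases (`confidence` in recent versions, `alpha` in older ones).

```python
import math
import scipy.stats as stats

n, x_bar, s = 50, 4.0, 1.5
# Interval for the mean: t with n - 1 df, centred on x_bar, scaled by the standard error
lower, upper = stats.t.interval(0.99, n - 1, loc=x_bar, scale=s / math.sqrt(n))
print(f"99% CI: ({lower:.2f}, {upper:.2f})")
```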
