# Assignment

# Q1

Estimation statistics is a branch of statistics that deals with the process of estimating the value of a population parameter based on a sample statistic. It involves using sample data to make inferences about the population from which the sample was drawn.

A point estimate is a single numerical value used to estimate an unknown population parameter from sample data. For example, the sample mean is a point estimate of the population mean. A point estimate gives no indication of its own precision, so it is reported on its own mainly when a single best guess is all that is needed.

An interval estimate, on the other hand, is a range of values used to estimate an unknown population parameter. The range is constructed from the sample data and is called a confidence interval. The confidence level, typically expressed as a percentage, is the long-run proportion of such intervals that would contain the true population parameter if the sampling were repeated many times.
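To make the distinction concrete, here is a minimal sketch that computes both kinds of estimate from one sample. The data values are hypothetical and chosen only for illustration, and the interval assumes the data come from an approximately normal population.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 10 measurements (illustrative values only)
sample = np.array([4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4, 5.0, 5.1])

n = sample.size
point_estimate = sample.mean()                # point estimate of the population mean
std_err = sample.std(ddof=1) / np.sqrt(n)     # estimated standard error of the mean

# 95% interval estimate based on the t-distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)
lower = point_estimate - t_crit * std_err
upper = point_estimate + t_crit * std_err

print(f"Point estimate: {point_estimate:.3f}")
print(f"95% interval estimate: ({lower:.3f}, {upper:.3f})")
```

The point estimate is a single number, while the interval widens or narrows with the sample's variability and size.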

# Q2

In [1]:
import math

def estimate_population_mean(sample_mean, sample_std_dev, sample_size):
    """Return the point estimate and an approximate 95% confidence interval."""
    # Calculate the standard error of the mean
    std_err = sample_std_dev / math.sqrt(sample_size)

    # Calculate the margin of error at a 95% confidence level (z = 1.96)
    margin_error = 1.96 * std_err

    # Calculate the lower and upper bounds of the confidence interval
    lower_bound = sample_mean - margin_error
    upper_bound = sample_mean + margin_error

    # The sample mean is the point estimate; the bounds form the interval estimate
    return sample_mean, (lower_bound, upper_bound)

# Example: sample mean 50, sample standard deviation 10, sample size 100
mean, (lower, upper) = estimate_population_mean(50, 10, 100)
print(f"Estimated population mean: {mean:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")


Estimated population mean: 50.00
95% confidence interval: (48.04, 51.96)

# Q3

Hypothesis testing is a statistical method used to assess whether sample data provide enough evidence to reject a claim about a population parameter. We start with a null hypothesis, which is a statement that the population parameter is equal to a certain value. We then collect sample data and calculate a test statistic, which is used to assess how likely data at least as extreme as the sample would be if the null hypothesis were true. Based on this likelihood, we either reject or fail to reject the null hypothesis; the test never proves the null hypothesis true.

Hypothesis testing is used to make inferences about a population based on sample data. It is an important tool in scientific research, as it allows researchers to test their theories and draw conclusions about the real-world phenomena they are studying. Hypothesis testing is used in a wide range of fields, including biology, psychology, economics, and engineering.

# Q4

Null hypothesis: The average weight of male college students is equal to or less than the average weight of female college students.

Alternative hypothesis: The average weight of male college students is greater than the average weight of female college students.
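As a rough illustration of how this one-sided pair of hypotheses could be tested, the sketch below runs a one-tailed two-sample t-test on simulated weights. The means, spreads, and sample sizes are assumptions made up for the example, and Welch's variant (`equal_var=False`) is one reasonable choice, not the only one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated weights in kg; all parameters are illustrative assumptions
male = rng.normal(loc=75, scale=10, size=40)
female = rng.normal(loc=62, scale=9, size=40)

# H0: mean(male) <= mean(female)   Ha: mean(male) > mean(female)
result = stats.ttest_ind(male, female, equal_var=False, alternative="greater")

print("t-statistic:", result.statistic)
print("one-sided p-value:", result.pvalue)
```

The `alternative="greater"` argument makes the p-value one-sided, matching the direction of the alternative hypothesis.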

# Q5

In [11]:
import numpy as np
from scipy import stats

# Sample data from two populations
pop1 = np.random.normal(50, 10, size=100)
pop2 = np.random.normal(45, 12, size=100)

# Calculate the sample means and standard deviations
sample_mean1 = np.mean(pop1)
sample_mean2 = np.mean(pop2)
sample_std1 = np.std(pop1, ddof=1)
sample_std2 = np.std(pop2, ddof=1)

# Calculate the standard error of the difference between means
# (with equal sample sizes this matches the pooled standard error)
std_error = np.sqrt((sample_std1**2 / len(pop1)) + (sample_std2**2 / len(pop2)))

# Calculate the t-statistic
t_stat = (sample_mean1 - sample_mean2) / std_error

# Calculate the degrees of freedom for the pooled (equal-variance) t-test
df = len(pop1) + len(pop2) - 2

# Calculate the p-value using a two-tailed t-test
p_value = stats.t.sf(np.abs(t_stat), df) * 2

# Print the results
print("Sample mean 1:", sample_mean1)
print("Sample mean 2:", sample_mean2)
print("Sample standard deviation 1:", sample_std1)
print("Sample standard deviation 2:", sample_std2)
print("Standard error of the difference between means:", std_error)
print("T-statistic:", t_stat)
print("Degrees of freedom:", df)
print("P-value:", p_value)


Sample mean 1: 50.49459465178218
Sample mean 2: 43.49801245901711
Sample standard deviation 1: 10.20668411526356
Sample standard deviation 2: 12.077676222061067
Standard error of the difference between means: 1.5812863863124627
T-statistic: 4.424614195965475
Degrees of freedom: 198
P-value: 1.5936549362686843e-05


# Q6

In statistical hypothesis testing, a null hypothesis is a statement that there is no difference between two or more populations, or no relationship between variables in a population. An alternative hypothesis is a statement that contradicts the null hypothesis by asserting that a difference or relationship exists.

Here are some examples of null and alternative hypotheses:

Example 1: A company wants to test whether a new website design has increased the average time users spend on their website.

Null hypothesis: The new website design has not increased the average time users spend on the company's website.

Alternative hypothesis: The new website design has increased the average time users spend on the company's website.

Example 2: A researcher wants to test whether a new drug is effective in reducing blood pressure.

Null hypothesis: The new drug is not effective in reducing blood pressure.

Alternative hypothesis: The new drug is effective in reducing blood pressure.

# Q7

1. State the research question and the null and alternative hypotheses.

2. Determine the appropriate statistical test to use based on the research question and the type of data.

3. Collect sample data from the populations of interest.

4. Calculate the appropriate test statistic based on the sample data and the null hypothesis.

5. Determine the p-value for the test statistic, which represents the probability of observing a test statistic as extreme as or more extreme than the observed one, assuming the null hypothesis is true.

6. Compare the p-value to the significance level (also called alpha), which is the pre-determined threshold for rejecting the null hypothesis. If the p-value is less than or equal to the significance level, the null hypothesis is rejected in favor of the alternative hypothesis. If the p-value is greater than the significance level, the null hypothesis cannot be rejected.

7. Interpret the results of the hypothesis test and draw conclusions based on the evidence from the sample data.

8. Consider the limitations and assumptions of the hypothesis test, and evaluate the implications of the results for the research question and broader population(s) of interest.
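The computational steps above can be sketched end to end with a one-sample t-test. Everything specific here is hypothetical: the hypothesized mean of 100, the simulated sample, and the 0.05 significance level are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# State the hypotheses (hypothetical): H0: mu = 100, Ha: mu != 100
mu_0 = 100
alpha = 0.05  # significance level, fixed before looking at the data

# Choose a test and collect data: a one-sample t-test on a simulated sample
rng = np.random.default_rng(42)
sample = rng.normal(loc=103, scale=8, size=30)

# Calculate the test statistic and its two-sided p-value
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)

# Compare the p-value to the significance level
decision = "reject H0" if p_value <= alpha else "fail to reject H0"

print(f"t = {t_stat:.3f}, p = {p_value:.4f}, decision: {decision}")
```

Interpreting the result and checking the test's assumptions (here, approximate normality of the data) remain judgment steps that code alone does not cover.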

# Q8

In hypothesis testing, the p-value is the probability of obtaining a test statistic as extreme as or more extreme than the observed one, assuming the null hypothesis is true. The p-value is a key concept in hypothesis testing because it provides a measure of the strength of evidence against the null hypothesis.

If the p-value is small (typically less than or equal to the pre-determined significance level), it suggests that the observed data is unlikely to have occurred by chance if the null hypothesis is true. This leads to the rejection of the null hypothesis in favor of the alternative hypothesis. On the other hand, if the p-value is large, it suggests that the observed data is reasonably likely to have occurred by chance if the null hypothesis is true, and there is insufficient evidence to reject the null hypothesis.
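A short sketch of how a p-value follows from a t-statistic may help here. The helper name `p_value_from_t` and the example values (t = 2.5, 20 degrees of freedom) are illustrative assumptions; the tail probabilities come from `scipy.stats.t`.

```python
from scipy import stats

def p_value_from_t(t_stat, df, tail="two-sided"):
    """Return the p-value of a t-statistic for the chosen alternative."""
    if tail == "two-sided":
        # Probability of a value at least as far from zero in either direction
        return 2 * stats.t.sf(abs(t_stat), df)
    if tail == "greater":
        # Upper-tail probability
        return stats.t.sf(t_stat, df)
    if tail == "less":
        # Lower-tail probability
        return stats.t.cdf(t_stat, df)
    raise ValueError("tail must be 'two-sided', 'greater', or 'less'")

# Hypothetical test statistic and degrees of freedom
for tail in ("two-sided", "greater", "less"):
    print(tail, p_value_from_t(2.5, 20, tail))
```

The same statistic yields different p-values depending on the direction of the alternative hypothesis, which is why the hypotheses should be fixed before the data are examined.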

# Q9

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# Generate t-distribution data
df = 10
x = np.linspace(t.ppf(0.01, df), t.ppf(0.99, df), 100)
y = t.pdf(x, df)

# Create plot
fig, ax = plt.subplots()
ax.plot(x, y, 'r-', lw=2, alpha=0.6, label='t-distribution (df=10)')

ax.legend(loc='best', frameon=False)
ax.set_xlabel('x')
ax.set_ylabel('pdf(x)')
ax.set_title('Student\'s t-distribution plot')

plt.show()


# Q10

In [10]:
import numpy as np
from scipy.stats import ttest_ind

# Generate random samples
sample1 = np.random.normal(loc=10, scale=2, size=50)
sample2 = np.random.normal(loc=12, scale=2, size=50)

# Calculate the two-sample t-test
statistic, pvalue = ttest_ind(sample1, sample2)

# Print results
print("Sample 1 mean:", np.mean(sample1))
print("Sample 2 mean:", np.mean(sample2))
print("Test statistic:", statistic)
print("p-value:", pvalue)


Sample 1 mean: 10.253698514651791
Sample 2 mean: 12.133848873820416
Test statistic: -4.852235814927287
p-value: 4.6048966068248664e-06


# Q11

Student's t-distribution is a continuous probability distribution that describes the standardized sample mean, (x̄ - μ) / (s / sqrt(n)), when the data come from a normally distributed population and the population standard deviation is unknown and must be estimated from the sample. It matters most for small samples (commonly n < 30) and is used in hypothesis testing, confidence interval estimation, and other statistical inference applications.

The t-distribution is similar to the standard normal distribution but has heavier tails, which reflects the increased uncertainty that arises from estimating the population standard deviation from a small sample size. The shape of the t-distribution depends on the degrees of freedom, which is equal to n - 1 for a sample size of n.

The t-distribution is typically used in situations where the population standard deviation is unknown and the sample size is small. This occurs in a variety of applications, such as medical studies, psychology, engineering, and social sciences, where researchers may not have access to a large sample size or the population standard deviation.

The t-distribution is used to calculate confidence intervals and test hypotheses about the population mean. As the sample size grows, the t-distribution approaches the standard normal distribution; a common rule of thumb is that for n ≥ 30 the standard normal distribution can be used in its place.
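The convergence toward the normal distribution can be checked numerically. This sketch compares two-sided 95% critical values of the t-distribution with the corresponding normal value; the particular degrees of freedom are arbitrary choices for illustration.

```python
from scipy import stats

# Two-sided 95% critical value of the standard normal distribution
z_crit = stats.norm.ppf(0.975)

# Arbitrary degrees of freedom chosen to show the trend
dfs = [2, 5, 10, 30, 100, 1000]
t_crits = [stats.t.ppf(0.975, df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    print(f"df={df:>4}: t critical = {t_crit:.4f}, z critical = {z_crit:.4f}")
```

The heavier tails show up as larger critical values at low degrees of freedom, and the gap shrinks steadily as the degrees of freedom increase.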

# Q12

The t-statistic measures how many standard errors an observed estimate lies from the value expected under the null hypothesis. In a two-sample test, it is used to assess whether the difference between two sample means is larger than sampling variability alone would explain.

The formula for the t-statistic is:

t = (x̄1 - x̄2) / (s_p * sqrt(1/n1 + 1/n2))

where:

- x̄1 and x̄2 are the sample means of two independent samples
- s1 and s2 are the standard deviations of the two samples
- n1 and n2 are the sample sizes of the two samples
- s_p is the pooled standard deviation, calculated as:

s_p = sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

Under the null hypothesis of equal population means, and assuming equal population variances, the t-statistic follows a t-distribution with (n1 + n2 - 2) degrees of freedom. By comparing the calculated t-statistic to this distribution, we can determine the p-value and make a decision regarding the null hypothesis.
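The pooled formula translates directly into code. In this sketch the function name `pooled_t_statistic` and the summary statistics passed to it are hypothetical, chosen only to exercise the formula.

```python
import math
from scipy import stats

def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """Return the pooled two-sample t-statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two sample variances
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t = (mean1 - mean2) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t, df

# Hypothetical summary statistics for two independent samples
t_stat, df = pooled_t_statistic(52.0, 6.0, 25, 48.0, 7.0, 20)

# Two-sided p-value from the t-distribution with df degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4f}")
```

SciPy's `ttest_ind_from_stats` with `equal_var=True` computes the same statistic from summary values, so it can serve as a cross-check.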

# Q13

To estimate the population mean revenue with a 95% confidence interval, we can use the following formula:

Confidence interval = sample mean ± margin of error

where the margin of error is:

Margin of error = t-value * (standard deviation / sqrt(sample size))

The t-value is based on the level of confidence and the degrees of freedom, which equal the sample size minus one (df = 50 - 1 = 49) in this case. For a 95% confidence interval and 49 degrees of freedom, the t-value is 2.0096, which we can look up in a t-distribution table or calculate using statistical software.

Substituting the values given in the problem, we have:

Margin of error = 2.0096 * (50 / sqrt(50)) = 2.0096 * 7.071 = 14.21

Therefore, the 95% confidence interval for the population mean revenue is:

Confidence interval = 500 ± 14.21 = (485.79, 514.21)
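The arithmetic can be checked numerically. This sketch uses the values from the problem (sample mean 500, standard deviation 50, sample size 50) and takes the t-value from `scipy.stats.t` instead of a table; the variable names are illustrative.

```python
import math
from scipy import stats

sample_mean, sample_std, n = 500, 50, 50
confidence = 0.95

# Two-sided critical t-value with n - 1 degrees of freedom
t_value = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)

# Margin of error = t-value * (standard deviation / sqrt(sample size))
margin = t_value * sample_std / math.sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"t-value: {t_value:.4f}")
print(f"Margin of error: {margin:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```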

# Q14

To test the hypothesis with a significance level of 0.05, we can use a one-sample t-test with the following null and alternative hypotheses:

- Null hypothesis (H0): The true mean decrease in blood pressure is equal to 10 mmHg.
- Alternative hypothesis (Ha): The true mean decrease in blood pressure is less than 10 mmHg.

We can calculate the test statistic, the t-value, using the formula:

t = (sample mean - hypothesized mean) / (standard deviation / sqrt(sample size))

Substituting the given values, we get:

t = (8 - 10) / (3 / sqrt(100)) = -2 / 0.3 = -6.67

The degrees of freedom for this test are (sample size - 1) = 99. At a significance level of 0.05 and with 99 degrees of freedom, the one-tailed critical t-value from a t-distribution table is -1.660. Since the calculated t-value (-6.67) is less than the critical t-value (-1.660), we reject the null hypothesis.

Therefore, there is sufficient evidence at the 0.05 significance level to conclude that the new drug decreases blood pressure by less than 10 mmHg on average.
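The test can be reproduced from the summary statistics alone. This sketch follows the critical-value approach for a left-tailed one-sample t-test, using the values given in the problem; the variable names are illustrative.

```python
import math
from scipy import stats

sample_mean, hypothesized_mean = 8, 10
sample_std, n = 3, 100
alpha = 0.05

# Test statistic: distance from the hypothesized mean in standard errors
t_stat = (sample_mean - hypothesized_mean) / (sample_std / math.sqrt(n))
df = n - 1

# Lower-tail critical value, since Ha states the mean is less than 10
t_crit = stats.t.ppf(alpha, df)

reject_h0 = t_stat < t_crit
print(f"t = {t_stat:.2f}, critical value = {t_crit:.3f}, reject H0: {reject_h0}")
```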

# Q15

To test the hypothesis that the true mean weight of the products is less than 5 pounds with a significance level of 0.01, we can use a one-sample t-test with the following null and alternative hypotheses:

- Null hypothesis (H0): The true mean weight of the products is equal to 5 pounds.
- Alternative hypothesis (Ha): The true mean weight of the products is less than 5 pounds.

We can calculate the test statistic, the t-value, using the formula:

t = (sample mean - hypothesized mean) / (standard deviation / sqrt(sample size))

Substituting the given values, we get:

t = (4.8 - 5) / (0.5 / sqrt(25)) = -2.0

The degrees of freedom for this test are (sample size - 1) = 24. At a significance level of 0.01 and with 24 degrees of freedom, the critical t-value from a t-distribution table is -2.492. Since the calculated t-value (-2.0) is greater than the critical t-value (-2.492), we fail to reject the null hypothesis.

Therefore, we cannot conclude that the true mean weight of the products is less than 5 pounds with a significance level of 0.01.
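The same decision can be reached through the p-value instead of the critical value. This sketch computes the left-tail p-value for the values given in the problem; the variable names are illustrative.

```python
import math
from scipy import stats

sample_mean, hypothesized_mean = 4.8, 5
sample_std, n = 0.5, 25
alpha = 0.01

# Test statistic for the one-sample t-test
t_stat = (sample_mean - hypothesized_mean) / (sample_std / math.sqrt(n))

# Left-tail p-value, since Ha states the mean is less than 5 pounds
p_value = stats.t.cdf(t_stat, df=n - 1)

decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, decision: {decision}")
```

Comparing the p-value with alpha is equivalent to comparing the t-statistic with the critical value, so both routes give the same decision.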

# Q16

To test the hypothesis that the population means for the two groups are equal with a significance level of 0.01, we can use a two-sample t-test with the following null and alternative hypotheses:

- Null hypothesis (H0): The population means for the two groups are equal.
- Alternative hypothesis (Ha): The population means for the two groups are not equal.

We can calculate the test statistic, the t-value, using the formula:

t = (sample mean difference - hypothesized difference) / (pooled standard deviation * sqrt(1/n1 + 1/n2))

where:

- sample mean difference = mean1 - mean2
- hypothesized difference = 0 (since we are testing for equality of means)
- pooled standard deviation = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2))

Substituting the given values, we get:

sample mean difference = 80 - 75 = 5

s1 = 10, n1 = 30, s2 = 8, n2 = 40

pooled standard deviation = sqrt(((30-1)*10^2 + (40-1)*8^2) / (30+40-2)) = sqrt(5396 / 68) = 8.91

t = (5 - 0) / (8.91 * sqrt(1/30 + 1/40)) = 5 / 2.152 = 2.32

The degrees of freedom for this test are (n1 + n2 - 2) = 68. At a significance level of 0.01 and with 68 degrees of freedom, the two-tailed critical t-value from a t-distribution table is approximately ±2.650. Since the absolute value of the calculated t-value (2.32) is less than the critical t-value (2.650), we fail to reject the null hypothesis.

Therefore, at a significance level of 0.01, we cannot conclude that the population means for the two groups differ.
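The calculation can be verified with SciPy's `ttest_ind_from_stats`, which runs a two-sample t-test from summary statistics. This sketch uses the values given in the problem and assumes equal population variances, as the pooled formula does.

```python
from scipy import stats

alpha = 0.01
n1, n2 = 30, 40

# Pooled (equal-variance) two-sample t-test from summary statistics
result = stats.ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=n1,
    mean2=75, std2=8, nobs2=n2,
    equal_var=True,
)

# Two-tailed critical value with n1 + n2 - 2 degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)

reject_h0 = abs(result.statistic) > t_crit
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
print(f"critical value = ±{t_crit:.3f}, reject H0: {reject_h0}")
```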

# Q17

CI = x̄ ± z*(σ/√n)

where:

- x̄ is the sample mean
- z is the z-score associated with the desired confidence level (99%)
- σ is the population standard deviation (unknown)
- n is the sample size

Since the population standard deviation is unknown, we use the sample standard deviation as an estimate. The z-score associated with a 99% confidence level can be found using a standard normal distribution table or a calculator function.

Assuming a normal distribution, the z-score associated with a 99% confidence level is 2.576.

Substituting the values into the formula, we get:

CI = 4 ± 2.576*(1.5/√50)
   = 4 ± 2.576*0.2121
   = 4 ± 0.546

Therefore, the 99% confidence interval for the population mean is (3.454, 4.546). We can interpret this interval as follows: if we were to repeat the sampling process many times and build an interval from each sample, about 99% of those intervals would contain the true population mean.
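The interval can be checked numerically. This sketch uses the values from the problem (sample mean 4, sample standard deviation 1.5, sample size 50) and takes the z-score from `scipy.stats.norm` instead of a table; the variable names are illustrative.

```python
import math
from scipy import stats

sample_mean, sample_std, n = 4, 1.5, 50
confidence = 0.99

# Two-sided z-score for the chosen confidence level
z = stats.norm.ppf(1 - (1 - confidence) / 2)

# Margin of error, with the sample standard deviation standing in for sigma
margin = z * sample_std / math.sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"z = {z:.3f}, margin of error = {margin:.3f}")
print(f"99% confidence interval: ({lower:.3f}, {upper:.3f})")
```

Because the standard deviation is estimated from the sample, a t-based interval would be slightly wider; with n = 50 the difference is small.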

