# ANSWER 1
## t-test:
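The t-test is used when the population standard deviation is unknown and has to be estimated from the sample. This matters most when the sample is small (a common rule of thumb is n < 30).
It uses the t-distribution, which has heavier tails than the standard normal (Z) distribution to account for the extra uncertainty from estimating the standard deviation. As the sample size grows, the t-distribution approaches the standard normal.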

The t-test is commonly used for testing hypotheses about a single population mean or the difference between two population means.

Example scenario: Testing whether there is a significant difference between the average test scores of two different groups of students, where the sample sizes are small.
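A minimal sketch of this scenario with `scipy.stats.ttest_ind`. The scores are made-up illustrative values, not data from any study, and `equal_var=True` assumes the two groups share a common population variance (Student's two-sample t-test):

```python
from scipy import stats

# Hypothetical test scores for two small groups (illustrative values only)
group_1 = [78, 85, 82, 88, 75, 80, 84, 79]
group_2 = [72, 76, 80, 70, 74, 78, 73, 77]

# Student's two-sample t-test (assumes equal population variances)
result = stats.ttest_ind(group_1, group_2, equal_var=True)

alpha = 0.05
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
if result.pvalue < alpha:
    print("Reject H0: the group means differ.")
else:
    print("Fail to reject H0: no significant difference detected.")
```

Passing `equal_var=False` instead runs Welch's t-test, which drops the equal-variance assumption.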
## z-test:
The z-test is used when the population standard deviation is known, or when the sample is large enough (typically n ≥ 30) that the sample standard deviation is a reliable estimate of it.

It uses the standard normal (Z) distribution, which has a mean of 0 and a standard deviation of 1.

The z-test is commonly used for testing hypotheses about a single population mean or the difference between two population means when the sample size is large.

Example scenario: Testing whether the average height of a population is significantly different from a known value, where the sample size is large.
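SciPy has no dedicated one-sample z-test function, so this sketch computes the statistic directly. The reference value μ0 = 170 cm, the known σ = 10 cm, and the simulated sample of 200 heights are all hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
mu_0 = 170.0   # known reference value under H0 (hypothetical)
sigma = 10.0   # population standard deviation, assumed known
heights = rng.normal(loc=172.0, scale=sigma, size=200)  # simulated sample

# z statistic: distance of the sample mean from mu_0 in standard-error units
z = (heights.mean() - mu_0) / (sigma / np.sqrt(heights.size))

# Two-sided p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.3f}, p = {p_value:.4f}")
```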

# ANSWER 2 
## One-tailed test:
In a one-tailed (or one-sided) test, the hypothesis test is conducted in only one direction (either greater than or less than).

It is used when the research question specifies a direction in advance: whether the population mean is greater than a certain value, or whether it is less than that value.

The critical region for the test is located in one tail of the distribution.

Example scenario: Testing whether a new drug improves test scores, where the research hypothesis is that the drug increases scores (one-tailed in the positive direction).
## Two-tailed test:
In a two-tailed (or two-sided) test, the hypothesis test is conducted in both directions (greater than and less than).

It is used when the research question is whether the population mean differs from a certain value in either direction.

The critical region for the test is divided between both tails of the distribution.

Example scenario: Testing whether a new teaching method has an effect on test scores, where the research hypothesis is that the method changes scores (two-tailed).
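The difference between the two tests shows up directly in the p-value. A small sketch using the `alternative` argument of `scipy.stats.ttest_1samp` (available in recent SciPy versions); the score changes are hypothetical:

```python
from scipy import stats

# Hypothetical score changes after an intervention (illustrative values only)
changes = [2.1, 3.4, -0.5, 1.8, 2.9, 0.7, 1.2, 2.5, -1.0, 1.6]

# Two-tailed: H1 is "mean change != 0"
two_sided = stats.ttest_1samp(changes, popmean=0, alternative="two-sided")

# One-tailed: H1 is "mean change > 0"
greater = stats.ttest_1samp(changes, popmean=0, alternative="greater")

print(f"two-sided p = {two_sided.pvalue:.4f}")
print(f"one-sided p = {greater.pvalue:.4f}")
```

When the observed effect lies in the hypothesized direction, the one-tailed p-value is half the two-tailed one, which is why the direction must be fixed before seeing the data.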

# ANSWER 3
## Type 1 error (False Positive):
A Type 1 error occurs when the null hypothesis (H0) is rejected when it is actually true.

It represents a false positive result, where we mistakenly conclude that there is a significant effect when there is none.

The probability of making a Type 1 error (when H0 is true) equals the significance level α, which is set before conducting the test.

Example scenario: In a criminal trial, an innocent person is found guilty (rejecting the null hypothesis of innocence) based on insufficient evidence.
## Type 2 error (False Negative):
A Type 2 error occurs when the null hypothesis (H0) is not rejected when it is actually false.

It represents a false negative result, where we fail to detect a significant effect when it exists.

The probability of making a Type 2 error is denoted by β (beta) and depends on the sample size, the effect size, the variability of the data, and the significance level (α). Its complement, 1 − β, is the power of the test.

Example scenario: In a medical test, a person with a disease is wrongly diagnosed as not having the disease (failing to reject the null hypothesis of no disease) due to an inaccurate test.
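For a one-sided z-test of H0: μ = μ0 against H1: μ > μ0 with known σ, β has a closed form, β = Φ(z₁₋α − δ√n/σ), where δ is the true shift of the mean. The sketch below uses hypothetical values (δ = 2, σ = 10) to show how β falls as n grows while α stays fixed; `type_2_error` is an illustrative helper name:

```python
import math
from scipy import stats

def type_2_error(alpha, delta, sigma, n):
    """beta for a one-sided (upper-tail) z-test when the true mean is mu0 + delta."""
    z_crit = stats.norm.ppf(1 - alpha)      # rejection threshold under H0
    shift = delta * math.sqrt(n) / sigma    # how far H1 moves the z statistic
    return stats.norm.cdf(z_crit - shift)   # P(fail to reject | H1 true)

alpha = 0.05
for n in (10, 50, 200):
    beta = type_2_error(alpha, delta=2.0, sigma=10.0, n=n)
    print(f"n = {n:>3}: alpha = {alpha}, beta = {beta:.3f}, power = {1 - beta:.3f}")
```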

# ANSWER 4
Bayes's Theorem is a fundamental concept in probability theory used to update the probability of a hypothesis based on new evidence. It is expressed as: P(A|B) = P(B|A) * P(A) / P(B)

Where:

P(A|B) is the conditional probability of event A given event B has occurred.

P(B|A) is the conditional probability of event B given event A has occurred.

P(A) and P(B) are the probabilities of events A and B, respectively.
## Example scenario: A medical test to detect a rare disease has the following characteristics:
The probability of having the disease (A) is 1 in 1000 (P(A) = 0.001).

The probability of the test correctly detecting the disease (true positive rate) is 99% (P(B|A) = 0.99).

The probability of the test correctly ruling out the disease (true negative rate) is 98% (P(not B|not A) = 0.98), so the false-positive rate is P(B|not A) = 1 − 0.98 = 0.02.

Suppose a person tests positive for the disease (B). We want to find the probability that the person actually has the disease (P(A|B)).

Using Bayes's Theorem:
P(A|B) = P(B|A) * P(A) / P(B)

P(A|B) = 0.99 * 0.001 / (P(B|A) * P(A) + P(B|not A) * P(not A))

P(A|B) = 0.99 * 0.001 / (0.99 * 0.001 + 0.02 * 0.999)

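P(A|B) = 0.00099 / 0.02097 ≈ 0.0472

Therefore, the probability that the person actually has the disease (given they tested positive) is only about 0.0472, or 4.72%. Because the disease is rare, most positive results come from the 2% false-positive rate applied to the large healthy population.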
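The same calculation as a small function, with P(B) expanded by the law of total probability. The helper name `posterior` is illustrative:

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    false_positive_rate = 1 - specificity  # P(B | not A)
    # P(B) by the law of total probability
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

print(round(posterior(prior=0.001, sensitivity=0.99, specificity=0.98), 4))  # 0.0472
```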

# ANSWER 5
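A confidence interval is a range of values, computed from a sample, that is likely to contain the true population parameter (e.g., a mean or proportion). The confidence level (e.g., 95%) describes the procedure: if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true parameter.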

Example scenario: Suppose we want to estimate the average height of students in a college. We take a random sample of 100 students and find their heights. The sample mean height (x̄) is 170 cm, and the sample standard deviation (s) is 5 cm.

```python
import scipy.stats as stats

sample_mean = 170
sample_std_dev = 5
sample_size = 100
confidence_level = 0.95

# the critical value from the standard normal (Z) distribution;
# with n = 100 the sample SD is used as an approximation of the population SD
critical_value = stats.norm.ppf((1 + confidence_level) / 2)

# the margin of error
margin_of_error = critical_value * (sample_std_dev / (sample_size ** 0.5))

# Calculate the 95% confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print("The 95% confidence interval for the population mean height is:",lower_bound,"cm",",",upper_bound ,"cm")
```

Output:

```
The 95% confidence interval for the population mean height is: 169.02001800772996 cm , 170.97998199227004 cm
```
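With n = 100 the z and t critical values are close, but since σ is estimated by s = 5 here, a t-based interval is the textbook choice. A short sketch with `scipy.stats.t.interval`; the confidence level is passed positionally because its keyword name has changed across SciPy versions:

```python
import math
from scipy import stats

sample_mean, sample_std_dev, sample_size = 170, 5, 100
standard_error = sample_std_dev / math.sqrt(sample_size)

# Central 95% interval of a t-distribution with n - 1 degrees of freedom,
# shifted to the sample mean and scaled by the standard error
t_lower, t_upper = stats.t.interval(0.95, sample_size - 1, loc=sample_mean, scale=standard_error)
print(f"t-based 95% CI: ({t_lower:.3f}, {t_upper:.3f}) cm")
```

The t interval is slightly wider than the z interval, reflecting the extra uncertainty from estimating σ.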


# ANSWER 6
Bayes' Theorem allows us to update the probability of an event based on new evidence. The formula is as follows:

P(A|B) = P(B|A) * P(A) / P(B)

Where:

P(A|B) is the conditional probability of event A given event B has occurred.

P(B|A) is the conditional probability of event B given event A has occurred.

P(A) and P(B) are the probabilities of events A and B, respectively.
## Sample problem:
Suppose a factory produces only two types of products: type A and type B. Historically, 10% of the products are of type A, so P(A) = 0.10 and P(not A) = 0.90. Let B be the event that the inspector identifies a product as type A. The inspector correctly identifies a type A product as type A 90% of the time (P(B|A) = 0.90) and correctly identifies a type B product as type B 80% of the time, so a type B product is mistaken for type A 20% of the time (P(B|not A) = 0.20). The inspector selects a product at random and identifies it as type A. What is the probability that the product is actually of type A (P(A|B))?

Solution:
Using Bayes' Theorem, with P(B) expanded by the law of total probability:

P(A|B) = P(B|A) * P(A) / P(B)

P(A|B) = P(B|A) * P(A) / (P(B|A) * P(A) + P(B|not A) * P(not A))

P(A|B) = 0.90 * 0.10 / (0.90 * 0.10 + 0.20 * 0.90)

P(A|B) = 0.09 / 0.27 ≈ 0.333

The probability that the product is actually of type A (given it was identified as type A by the inspector) is approximately 0.333, or 33.3%.
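A quick Monte Carlo check of the factory problem, assuming only two product types with P(A) = 0.10, an inspector who labels 90% of type A items as A, and who mislabels 20% of type B items as A. The simulation size and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 1_000_000

is_type_a = rng.random(n) < 0.10  # true type: A with probability 0.10
u = rng.random(n)
# Inspector labels as A: 90% of true A items, 20% of true B items
labelled_a = np.where(is_type_a, u < 0.90, u < 0.20)

# Estimate P(type A | labelled A) as a frequency among labelled-A items
estimate = is_type_a[labelled_a].mean()
exact = 0.90 * 0.10 / (0.90 * 0.10 + 0.20 * 0.90)
print(f"simulated {estimate:.4f} vs exact {exact:.4f}")
```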

# ANSWER 7
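Since the sample size (n) is not provided, the exact confidence interval cannot be calculated. For a sample mean of 50 and a standard deviation of 5, the 95% confidence interval for the population mean has the form 50 ± 1.96 * (5 / √n) (with a t critical value in place of 1.96 if 5 is a sample standard deviation and n is small), so it narrows as n grows. The range 50 ± (1.96 * 5) = (40.2, 59.8) leaves out the √n factor: it corresponds to n = 1 and, for a roughly normal population, describes where about 95% of individual observations fall, not where the population mean lies. For example, with n = 25 the interval would be 50 ± 1.96 * (5 / 5) = (48.04, 51.96).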
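Since n is unknown, this sketch tabulates the interval for a few hypothetical sample sizes. It assumes 5 is the sample standard deviation and therefore uses the t critical value, which approaches 1.96 as n grows; if σ = 5 were known, 1.96 would be used directly:

```python
import math
from scipy import stats

sample_mean, sample_sd, confidence = 50.0, 5.0, 0.95

for n in (10, 30, 100):  # hypothetical sample sizes
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    margin = t_crit * sample_sd / math.sqrt(n)
    print(f"n = {n:>3}: ({sample_mean - margin:.2f}, {sample_mean + margin:.2f})")
```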

# ANSWER 8 
The margin of error in a confidence interval measures the uncertainty in estimating a population parameter (e.g., a mean or proportion) from a sample. It is the half-width of the interval: the confidence interval is the sample estimate ± the margin of error, and the confidence level states how often intervals built this way contain the true parameter.

The formula for the margin of error is given by:

Margin of Error = Critical Value * (sample_std_dev / √sample_size)

Where:

Critical Value: It depends on the desired confidence level and the chosen statistical distribution (e.g., Z-distribution for large sample sizes or t-distribution for small sample sizes).

sample_std_dev: The standard deviation of the sample.

√sample_size: The square root of the sample size.

## How sample size affects the margin of error:
1. As the sample size increases, the margin of error decreases. Because the margin of error is proportional to 1/√n, quadrupling the sample size halves it.
2. As the sample size increases, the sample estimate becomes more precise and tends to lie closer to the true population parameter.
## Example scenario:
Let's consider a survey aimed at estimating the average income of employees in a company. We conduct two separate surveys:

Survey 1: Sample size (n) = 100, Sample mean income (x̄) = $60,000, Sample standard deviation (s) = $8,000.

Survey 2: Sample size (n) = 400, Sample mean income (x̄) = $60,000, Sample standard deviation (s) = $8,000.

Using a 95% confidence level (Z-critical value ≈ 1.96 for large sample sizes):

For Survey 1:
Margin of Error = 1.96 * ($8,000 / √100) ≈ $1,568

For Survey 2:
Margin of Error = 1.96 * ($8,000 / √400) ≈ $784

In this example, the sample in Survey 2 is four times larger than in Survey 1, so its margin of error is half as large ($784 vs. $1,568), giving a more precise estimate of the population mean income. This is why larger samples are preferred when conducting surveys or experiments, although the gains shrink as n grows because of the square root.
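The two survey margins can be reproduced in a few lines. The helper name `margin_of_error` is illustrative, and the exact normal quantile is used in place of the rounded 1.96:

```python
import math
from scipy import stats

z_crit = stats.norm.ppf(0.975)  # about 1.96 for 95% confidence
s = 8_000                       # sample standard deviation in dollars

def margin_of_error(n, sd=s, crit=z_crit):
    return crit * sd / math.sqrt(n)

m1, m2 = margin_of_error(100), margin_of_error(400)
print(f"n=100: ${m1:,.0f}   n=400: ${m2:,.0f}   ratio: {m1 / m2:.1f}")
```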

# ANSWER 9
z = (x - μ) / σ

Where:

z is the z-score.

x = 75 is the data point.

μ = 70 is the population mean.

σ = 5 is the population standard deviation.

z = (75 - 70) / 5

z = 1

Interpretation of the results:

The calculated z-score is 1. This means that the data point with a value of 75 is one standard deviation above the population mean of 70.

Z-scores standardize data points by expressing their distance from the mean in standard deviation units. A positive z-score indicates that the data point is above the mean, a negative z-score indicates that it is below the mean, and the magnitude tells us how many standard deviations away from the mean it lies.
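If the population is assumed to be normal, the z-score also gives a percentile. A small sketch with `scipy.stats.norm.cdf`; the normality assumption is not stated in the original question:

```python
from scipy import stats

x, mu, sigma = 75, 70, 5
z = (x - mu) / sigma

# Share of a normal population at or below x (assumes normality)
percentile = stats.norm.cdf(z)
print(f"z = {z:.1f}, P(X <= {x}) = {percentile:.4f}")
```

Under that assumption, a value of 75 sits at roughly the 84th percentile.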

# ANSWER 10

```python
import scipy.stats as stats

sample_mean = 6
sample_std_dev = 2.5
sample_size = 50
hypothesized_mean = 0
confidence_level = 0.95

# the t-statistic
t_statistic = (sample_mean - hypothesized_mean) / (sample_std_dev / (sample_size ** 0.5))

# one-sided critical value: H1 is "mean weight loss > 0",
# so all of alpha = 0.05 sits in the upper tail
critical_value = stats.t.ppf(confidence_level, df=sample_size - 1)

# Perform the hypothesis test
if t_statistic > critical_value:
    print("Reject the null hypothesis: The weight loss drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis: There is no significant evidence that the drug is effective.")
```

Output:

```
Reject the null hypothesis: The weight loss drug is significantly effective.
```
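Reporting a p-value alongside the accept/reject decision shows how strong the evidence is. A sketch using the t survival function, assuming the one-sided alternative that the mean weight loss is greater than 0:

```python
import math
from scipy import stats

sample_mean, sample_sd, n = 6, 2.5, 50
t_stat = sample_mean / (sample_sd / math.sqrt(n))

# One-sided p-value for H1: mean weight loss > 0
p_one_sided = stats.t.sf(t_stat, df=n - 1)
print(f"t = {t_stat:.2f}, one-sided p = {p_one_sided:.2e}")
```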


# ANSWER 11

```python
import scipy.stats as stats

sample_proportion = 0.65
sample_size = 500
confidence_level = 0.95

# the critical value from the standard normal (Z) distribution
critical_value = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# the margin of error, using the standard error of a proportion sqrt(p(1 - p) / n)
margin_of_error = critical_value * ( (sample_proportion * (1 - sample_proportion)) / sample_size ) ** 0.5

# the confidence interval
confidence_interval = (sample_proportion - margin_of_error, sample_proportion + margin_of_error)

print("95% Confidence Interval:", confidence_interval)
```

Output:

```
95% Confidence Interval: (0.6081925393809212, 0.6918074606190788)
```
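As a cross-check, `scipy.stats.norm.interval` returns the same central interval when the normal distribution is centred on the sample proportion and scaled by its standard error (the confidence level is passed positionally):

```python
import math
from scipy import stats

p_hat, n = 0.65, 500
se = math.sqrt(p_hat * (1 - p_hat) / n)

# Central interval holding 95% of a normal distribution with mean p_hat and SD se
low, high = stats.norm.interval(0.95, loc=p_hat, scale=se)
print(f"({low:.4f}, {high:.4f})")
```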


# ANSWER 12

```python
import scipy.stats as stats

sample_mean_A = 85
sample_std_dev_A = 6
sample_size_A = 30

sample_mean_B = 82
sample_std_dev_B = 5
sample_size_B = 25

significance_level = 0.01

# Calculate the degrees of freedom
degrees_of_freedom = sample_size_A + sample_size_B - 2

# the pooled standard deviation (assumes equal population variances)
std_dev = ((sample_std_dev_A ** 2) * (sample_size_A - 1) + (sample_std_dev_B ** 2) * (sample_size_B - 1)) / degrees_of_freedom
std_dev = std_dev ** 0.5

# the t-statistic
t_statistic = (sample_mean_A - sample_mean_B) / (std_dev * ((1 / sample_size_A) + (1 / sample_size_B)) ** 0.5)

# the two-tailed critical value from the t-distribution
critical_value = stats.t.ppf(1 - (significance_level / 2), df=degrees_of_freedom)

# Perform the two-tailed test: a large difference in either direction counts
if abs(t_statistic) > critical_value:
    print("Reject the null hypothesis: There is a significant difference in student performance between the two teaching methods.")
else:
    print("Fail to reject the null hypothesis: There is no significant evidence of a difference in student performance between the two teaching methods.")
```

Output:

```
Fail to reject the null hypothesis: There is no significant evidence of a difference in student performance between the two teaching methods.
```
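SciPy can run the same pooled-variance test directly from summary statistics with `scipy.stats.ttest_ind_from_stats`, which is a convenient way to check the hand calculation:

```python
from scipy import stats

result = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,
    mean2=82, std2=5, nobs2=25,
    equal_var=True,  # pooled-variance t-test, as in the manual calculation
)
print(f"t = {result.statistic:.3f}, two-sided p = {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < 0.01 else "Fail to reject H0")
```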


# ANSWER 13

```python
import scipy.stats as stats

population_mean = 60  # hypothesized mean; not needed for the interval itself
population_std_dev = 8
sample_mean = 65
sample_size = 50
confidence_level = 0.90

# the critical value from the standard normal (Z) distribution
critical_value = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# the margin of error (population SD is known, so a z-interval applies)
margin_of_error = critical_value * (population_std_dev / (sample_size ** 0.5))

# the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print("90% Confidence Interval:", confidence_interval)
```

Output:

```
90% Confidence Interval: (63.13906055411732, 66.86093944588268)
```
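The cell defines a population mean of 60 without using it. If the question also asks whether the true mean differs from 60 (an assumption, since the question text is not shown), a two-sided z-test with the known σ = 8 makes explicit what the interval already suggests, since 60 lies outside it:

```python
import math
from scipy import stats

mu_0, sigma, x_bar, n = 60, 8, 65, 50
z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Two-sided p-value from the standard normal distribution
p_two_sided = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.2e}")
```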


# ANSWER 14

```python
import scipy.stats as stats

sample_mean = 0.25
sample_std_dev = 0.05
sample_size = 30
hypothesized_mean = 0
confidence_level = 0.90

# the t-statistic
t_statistic = (sample_mean - hypothesized_mean) / (sample_std_dev / (sample_size ** 0.5))

# the degrees of freedom
degrees_of_freedom = sample_size - 1

# the two-tailed critical value from the t-distribution
critical_value = stats.t.ppf(1 - (1 - confidence_level) / 2, df=degrees_of_freedom)

# Perform the two-tailed test: an effect in either direction counts
if abs(t_statistic) > critical_value:
    print("Reject the null hypothesis: Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis: There is no significant evidence that caffeine has an effect on reaction time.")
```

Output:

```
Reject the null hypothesis: Caffeine has a significant effect on reaction time.
```
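A matching p-value for the caffeine test, computed from the t survival function and treating any change in reaction time (in either direction) as an effect:

```python
import math
from scipy import stats

sample_mean, sample_sd, n = 0.25, 0.05, 30
t_stat = sample_mean / (sample_sd / math.sqrt(n))

# Two-sided p-value: doubling the upper-tail area beyond |t|
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(f"t = {t_stat:.2f}, two-sided p = {p_two_sided:.2e}")
```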
