In [1]:
#Answer 1

Both t-tests and z-tests are statistical tests used to make inferences about population parameters based on sample data. The main difference between the two lies in the assumptions about the population variance and the sample size.

t-test:
A t-test is used when the population standard deviation is unknown and must be estimated from the sample data. Its test statistic follows a t-distribution, whose heavier tails account for the extra uncertainty in that estimate; this matters most for small samples (typically fewer than 30 observations). There are two main types of t-tests: the one-sample t-test, which compares the mean of a sample to a hypothesized population mean, and the two-sample t-test, which compares the means of two independent samples.

z-test:
A z-test is used when the population standard deviation is known, or when the sample is large enough (typically more than 30 observations) that the sample standard deviation is a reliable stand-in for it. Its test statistic follows the standard normal distribution (z-distribution). Z-tests are often used for large-sample hypothesis testing and confidence interval estimation.
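
The "around 30" rule of thumb can be made concrete by comparing critical values. The following is a minimal sketch, assuming a two-sided test at α = 0.05; the sample sizes in `sample_sizes` are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

alpha = 0.05  # two-sided significance level (assumed for illustration)

# Two-sided critical value of the standard normal distribution
z_critical = norm.ppf(1 - alpha / 2)

# Two-sided critical values of the t-distribution for several sample sizes
sample_sizes = [5, 10, 30, 100, 1000]  # arbitrary illustrative sizes
t_criticals = {n: t.ppf(1 - alpha / 2, df=n - 1) for n in sample_sizes}

print(f"z critical value: {z_critical:.3f}")
for n, t_crit in t_criticals.items():
    print(f"n = {n:4d}: t critical value = {t_crit:.3f}")
```

The t critical value is always larger than the z critical value, but the gap shrinks as the sample size grows, which is why the two tests give nearly identical answers for large samples.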

Now, let's provide an example scenario for each type of test using Python code.

Example Scenario 1: Using a t-test
Suppose you are testing the effectiveness of a new drug on blood pressure reduction. You have a sample of 25 patients, and you want to determine if the mean reduction in blood pressure is statistically significant.






In [2]:
import numpy as np
from scipy import stats

# Sample data (reduction in blood pressure for 25 patients)
sample_data = np.array([8, 10, 5, 12, 6, 9, 11, 7, 4, 9, 8, 7, 10, 6, 12, 9, 11, 5, 8, 10, 6, 7, 9, 8, 11])

# Null hypothesis: Mean reduction = 0
# Alternative hypothesis: Mean reduction > 0
null_mean = 0

# Perform one-sample t-test
t_statistic, p_value = stats.ttest_1samp(sample_data, null_mean, alternative='greater')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The new drug has a significant effect on blood pressure reduction.")
else:
    print("Fail to reject the null hypothesis: The new drug does not have a significant effect on blood pressure reduction.")


Reject the null hypothesis: The new drug has a significant effect on blood pressure reduction.


Example Scenario 2: Using a z-test
Suppose you are comparing the average sales of two different stores to determine if there is a significant difference in their performance. You have large samples from both stores (more than 30 observations each).

In [6]:
import numpy as np
from scipy.stats import norm

# Sample data for Store A and Store B
store_a_sales = np.array([500, 520, 480, 510, 490, 525, 515, 530, 510, 505, 495, 505, 515, 520, 500, 495, 510, 530, 515, 525, 505, 520, 530, 495, 490, 515, 525, 535, 505, 510, 495])
store_b_sales = np.array([550, 580, 520, 560, 540, 575, 560, 590, 570, 555, 565, 580, 570, 565, 550, 545, 560, 580, 570, 580, 560, 575, 590, 555, 540, 570, 590, 600, 565, 570, 555])

# Null hypothesis: Mean sales of Store A = Mean sales of Store B

# Calculate sample means and standard deviations
mean_a = np.mean(store_a_sales)
mean_b = np.mean(store_b_sales)
std_dev_a = np.std(store_a_sales, ddof=1)  # ddof=1 for unbiased estimate of sample standard deviation
std_dev_b = np.std(store_b_sales, ddof=1)

# Calculate the standard error of the difference between means
se_diff = np.sqrt((std_dev_a**2 / len(store_a_sales)) + (std_dev_b**2 / len(store_b_sales)))

# Calculate the z-score
z_score = (mean_a - mean_b) / se_diff

# Two-sided critical value at significance level alpha
alpha = 0.05
critical_value = norm.ppf(1 - alpha / 2)

# Reject the null hypothesis if |z| exceeds the critical value
if abs(z_score) > critical_value:
    print("Reject the null hypothesis: There is a significant difference in the average sales of the two stores.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference in the average sales of the two stores.")


Reject the null hypothesis: There is a significant difference in the average sales of the two stores.


In [5]:
#Answer 2

One-tailed and two-tailed tests are concepts used in hypothesis testing to determine the critical region for making a statistical decision based on sample data. They differ in terms of the directionality of the alternative hypothesis and the way they assess the significance of the results.

One-Tailed Test:
In a one-tailed test (also known as a one-sided test), the alternative hypothesis specifies a specific direction of effect or difference. This means that you are only interested in deviations from the null hypothesis in one direction (either greater than or less than), and you're not concerned about deviations in the opposite direction.

For example, if you are testing whether a new treatment increases the mean score of a test, your one-tailed hypotheses would be:

Null hypothesis (H0): The treatment has no effect (mean difference = 0).
Alternative hypothesis (Ha): The treatment increases the mean score (mean difference > 0).
The critical region for a one-tailed test is located entirely in one tail of the probability distribution (either the upper or lower tail), depending on the direction specified by the alternative hypothesis.

Two-Tailed Test:
In a two-tailed test (also known as a two-sided test), the alternative hypothesis does not specify a particular direction of effect or difference. This means that you are interested in deviations from the null hypothesis in both directions (either greater than or less than), and you want to assess whether the observed effect is significantly different from what would be expected by chance alone.

For example, if you are testing whether a new drug affects the mean weight of a group of patients, your two-tailed hypotheses would be:

Null hypothesis (H0): The drug has no effect (mean difference = 0).
Alternative hypothesis (Ha): The drug has an effect (mean difference ≠ 0).
The critical region for a two-tailed test is divided between both tails of the probability distribution, with equal probability allocated to each tail.

Choosing Between One-Tailed and Two-Tailed Tests:
The choice between a one-tailed and two-tailed test depends on the specific research question and the hypothesis you want to test. If you have a specific directional hypothesis (e.g., you expect an increase or decrease), a one-tailed test may be appropriate. On the other hand, if you are open to detecting any significant difference, regardless of direction, a two-tailed test is more suitable.

When selecting the type of test, it's important to define your hypotheses clearly and consider the context of the problem to make an informed decision.

In summary, the key difference between one-tailed and two-tailed tests lies in the directionality of the alternative hypothesis and the way they allocate the critical region for making statistical decisions based on sample data.
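
The difference in how the critical region is allocated can be sketched numerically. This is an illustration only: it assumes a z-statistic and α = 0.05, and `z_observed = 1.80` is a hypothetical value chosen so that the two tests disagree. It uses the standard library's `statistics.NormalDist` so that it runs without extra packages.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
alpha = 0.05
z_observed = 1.80  # hypothetical test statistic, chosen for illustration

# One-tailed (upper) test: all of alpha sits in the upper tail
critical_one_tailed = std_normal.inv_cdf(1 - alpha)
p_one_tailed = 1 - std_normal.cdf(z_observed)

# Two-tailed test: alpha is split evenly, alpha / 2 in each tail
critical_two_tailed = std_normal.inv_cdf(1 - alpha / 2)
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z_observed)))

print(f"One-tailed: critical value = {critical_one_tailed:.3f}, p-value = {p_one_tailed:.4f}")
print(f"Two-tailed: critical value = {critical_two_tailed:.3f}, p-value = {p_two_tailed:.4f}")
print("One-tailed decision:", "reject H0" if p_one_tailed < alpha else "fail to reject H0")
print("Two-tailed decision:", "reject H0" if p_two_tailed < alpha else "fail to reject H0")
```

With the same data, the one-tailed test rejects the null hypothesis while the two-tailed test does not, which is why the direction of the alternative hypothesis should be fixed before looking at the data.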







In [7]:
#Answer 3

In hypothesis testing, Type I and Type II errors are two types of mistakes that can occur when making decisions about a null hypothesis. These errors are related to the incorrect rejection or non-rejection of the null hypothesis based on sample data.

Type I Error (False Positive):
A Type I error occurs when we reject a null hypothesis that is actually true. In other words, we conclude that there is a significant effect or difference when, in reality, there is no effect or difference. The probability of making a Type I error is denoted by the symbol α (alpha) and is often set as the significance level of the test.

Type II Error (False Negative):
A Type II error occurs when we fail to reject a null hypothesis that is actually false. This means that we miss detecting a significant effect or difference when it does exist. The probability of making a Type II error is denoted by the symbol β (beta).

Here are example scenarios for each type of error:

Example Scenario for Type I Error:
Suppose a pharmaceutical company is testing a new drug to reduce cholesterol levels. The null hypothesis (H0) is that the drug has no effect on cholesterol levels. The alternative hypothesis (Ha) is that the drug reduces cholesterol levels.

Type I Error: Rejecting the null hypothesis (H0) when it is actually true (the drug has no effect).

Example Scenario for Type II Error:
Continuing with the same pharmaceutical company, let's say the drug actually does reduce cholesterol levels by a clinically significant amount, but the sample size used in the study is too small or the drug's effect is subtle.

Type II Error: Failing to reject the null hypothesis (H0) when it is actually false (the drug does have a significant effect).

In both scenarios, there is a possibility of making an incorrect decision. For a fixed sample size, the two error rates trade off against each other through the significance level (α): raising α (e.g., from 0.05 to 0.10) decreases the chance of a Type II error but increases the chance of a Type I error, and lowering α does the opposite. The power of the test (1 - β) can also be increased without raising α, for example by collecting a larger sample or reducing measurement variability; this lowers the chance of a Type II error while leaving the Type I error rate unchanged.

Hypothesis testing aims to strike a balance between these two types of errors based on the specific context of the study and its associated risks.
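
The relationship between α, β, and sample size can be sketched for the simplest case. The following assumes an upper-tailed z-test with a known standard deviation; the function name `power_upper_tailed_z` and the study values (a true reduction of 5 units, σ = 20) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution


def power_upper_tailed_z(effect, sigma, n, alpha):
    """Power (1 - beta) of an upper-tailed z-test for a true mean shift `effect`."""
    z_critical = std_normal.inv_cdf(1 - alpha)  # rejection threshold under H0
    shift = effect * sqrt(n) / sigma            # mean of the z statistic under Ha
    return 1 - std_normal.cdf(z_critical - shift)


# Hypothetical study: true reduction of 5 units, sigma = 20
effect, sigma = 5, 20

for alpha, n in [(0.05, 50), (0.10, 50), (0.05, 100)]:
    power = power_upper_tailed_z(effect, sigma, n, alpha)
    print(f"alpha = {alpha:.2f}, n = {n:3d}: power = {power:.3f}, beta = {1 - power:.3f}")
```

Raising α from 0.05 to 0.10 at the same sample size increases power at the cost of more Type I errors, whereas doubling the sample size increases power while α stays at 0.05.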







In [8]:
#Answer 4

Bayes's Theorem is a fundamental concept in probability theory and statistics that describes how to update our beliefs or probabilities about an event based on new evidence or information. It is particularly useful in situations where we have prior knowledge and want to update that knowledge with new data.

The formula for Bayes's Theorem is:

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

Where:

P(A∣B) is the posterior probability of event A given evidence B.
P(B∣A) is the likelihood of evidence B given that event A has occurred.
P(A) is the prior probability of event A.
P(B) is the probability of evidence B.

Now, let's illustrate Bayes's Theorem with a simple example in Python code:

Suppose we have a diagnostic test for a rare disease. The test is not perfect and can produce false positives and false negatives. We want to calculate the probability that a person actually has the disease given that they tested positive.

In [9]:
# Prior probabilities
p_disease = 0.01  # Probability of having the disease
p_no_disease = 1 - p_disease  # Probability of not having the disease

# Sensitivity and specificity of the test
p_pos_given_disease = 0.95  # Probability of testing positive given having the disease
p_neg_given_no_disease = 0.90  # Probability of testing negative given not having the disease

# Calculate the probability of testing positive (P(B))
p_positive = (p_disease * p_pos_given_disease) + (p_no_disease * (1 - p_neg_given_no_disease))

# Calculate the posterior probability of having the disease given testing positive (P(A|B))
p_disease_given_positive = (p_pos_given_disease * p_disease) / p_positive

print("Probability of having the disease given testing positive:", p_disease_given_positive)


Probability of having the disease given testing positive: 0.08755760368663597


In this example, we apply Bayes's Theorem to find the probability of having the disease given a positive test. The prior is the 1% prevalence, the likelihood is the test's 95% sensitivity, and P(B) combines the true positives with the false positives implied by the 90% specificity. Even after a positive result, the posterior probability is only about 8.8%, because the disease is rare and false positives far outnumber true positives.
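
The same result can be reached by counting people instead of multiplying probabilities, which some readers find easier to follow. This sketch reuses the prevalence, sensitivity, and specificity from the cell above; the population size of 100,000 is a hypothetical figure chosen to give round numbers.

```python
population = 100_000  # hypothetical population size, chosen for round numbers

prevalence = 0.01    # P(disease)
sensitivity = 0.95   # P(positive | disease)
specificity = 0.90   # P(negative | no disease)

diseased = population * prevalence              # about 1,000 people
healthy = population - diseased                 # about 99,000 people

true_positives = diseased * sensitivity         # about 950 people
false_positives = healthy * (1 - specificity)   # about 9,900 people

# Among everyone who tests positive, what fraction is actually diseased?
posterior = true_positives / (true_positives + false_positives)

print(f"True positives:  {true_positives:.0f}")
print(f"False positives: {false_positives:.0f}")
print(f"P(disease | positive) = {posterior:.4f}")
```

Roughly ten healthy people test positive for every diseased person who does, which is why the posterior probability stays below 9%.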

In [10]:
#Answer 5

A confidence interval is a range of values around a sample statistic (such as the mean or proportion) that is constructed in such a way that it is likely to contain the true population parameter with a specified level of confidence. It provides a measure of the uncertainty associated with estimating a population parameter based on a sample.

In other words, a confidence interval gives us a range of plausible values for the true population parameter. The confidence level, often denoted 1 − α, describes the procedure rather than any single interval: if we repeated the sampling many times, about (1 − α) × 100% of the intervals constructed this way would contain the true parameter.

Here's how to calculate a confidence interval using an example in Python:

Suppose we have a dataset representing the ages of a sample of individuals, and we want to calculate a 95% confidence interval for the population mean age.

In this example, we calculate a 95% confidence interval for the population mean age using the t-distribution. We calculate the sample mean, sample standard deviation, and the critical value from the t-distribution based on the specified confidence level and degrees of freedom. Then, we calculate the margin of error and construct the confidence interval around the sample mean. The result is a range of values within which we can be 95% confident that the true population mean age falls.

In [11]:
import numpy as np
from scipy.stats import t

# Sample data (ages of individuals)
ages = np.array([28, 32, 35, 29, 30, 31, 34, 27, 33, 28])

# Calculate sample mean and standard deviation
sample_mean = np.mean(ages)
sample_std = np.std(ages, ddof=1)  # ddof=1 for unbiased estimate of sample standard deviation

# Set the confidence level and degrees of freedom
confidence_level = 0.95
df = len(ages) - 1  # degrees of freedom

# Calculate the critical value from the t-distribution
alpha = 1 - confidence_level
t_critical = t.ppf(1 - alpha / 2, df)

# Calculate the margin of error
margin_of_error = t_critical * (sample_std / np.sqrt(len(ages)))

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print("Sample Mean:", sample_mean)
print("Confidence Interval:", confidence_interval)


Sample Mean: 30.7
Confidence Interval: (28.732226646206882, 32.667773353793116)
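
As a cross-check, the same interval can be obtained in a single call. This is a sketch assuming SciPy's `stats.sem` and `stats.t.interval`; the confidence level is passed positionally because its keyword name differs between SciPy versions.

```python
import numpy as np
from scipy import stats

# Same sample data as in the cell above
ages = np.array([28, 32, 35, 29, 30, 31, 34, 27, 33, 28])

sample_mean = ages.mean()
standard_error = stats.sem(ages)  # sample std (ddof=1) divided by sqrt(n)

# 95% interval from the t-distribution with n - 1 degrees of freedom
lower, upper = stats.t.interval(0.95, len(ages) - 1, loc=sample_mean, scale=standard_error)

print("Confidence Interval:", (lower, upper))
```

The bounds should agree with the manually computed interval up to floating-point rounding.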


In [12]:
#Answer 6

Let's consider a classic example involving a diagnostic test for a medical condition. Suppose there's a rare disease, and you want to calculate the probability that a person has the disease given that they tested positive for it.

Problem:
You know that the prevalence of the disease in the population is 0.2% (0.002), and the diagnostic test has the following characteristics:

Sensitivity: The probability of a positive test result given that the person has the disease is 95% (0.95).
Specificity: The probability of a negative test result given that the person does not have the disease is 90% (0.90).
You need to calculate the probability that a person has the disease given that they tested positive (i.e., the posterior probability).

Solution using Bayes' Theorem:
Bayes' Theorem allows us to update our belief about the probability of an event occurring based on new evidence. In this case, we want to calculate the probability of having the disease (event A) given a positive test result (event B):

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

Where:

P(A∣B) is the posterior probability of having the disease given a positive test result.
P(B∣A) is the probability of a positive test result given that the person has the disease (sensitivity).
P(A) is the prior probability of having the disease.
P(B) is the overall probability of a positive test result, calculated by the law of total probability: P(B) = P(B∣A)⋅P(A) + P(B∣not A)⋅P(not A), where P(B∣not A) = 1 − specificity is the false positive rate.

Now let's calculate the posterior probability using the given values:

The calculated posterior probability indicates that there is only about a 1.87% chance that a person actually has the disease given a positive test result. This illustrates how Bayes' Theorem allows us to update our initial belief (the 0.2% prior probability) based on new evidence (the test result).

In [14]:
# Given values
p_disease = 0.002    # Prior probability of having the disease
p_no_disease = 1 - p_disease  # Prior probability of not having the disease
p_pos_given_disease = 0.95     # Sensitivity of the test (P(B|A))
p_pos_given_no_disease = 1 - 0.90  # False positive rate of the test (1 - specificity)

# Calculate the probability of a positive test result (P(B))
p_positive = (p_disease * p_pos_given_disease) + (p_no_disease * p_pos_given_no_disease)

# Calculate the posterior probability of having the disease given a positive test result (P(A|B))
p_disease_given_positive = (p_pos_given_disease * p_disease) / p_positive

print("Posterior Probability of Having the Disease Given a Positive Test Result:", p_disease_given_positive)


Posterior Probability of Having the Disease Given a Positive Test Result: 0.01868239921337267


In [15]:
#Answer 7

In [16]:
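import numpy as np
from scipy.stats import norm

# Given values
mean_sp = 50  # sample mean
std_sp = 5  # standard deviation
sample_size = 100
confidence_level = 0.95

# Critical value from the standard normal distribution
alpha = 1 - confidence_level
critical_value = norm.ppf(1 - alpha / 2)

# Margin of error and confidence interval
margin_of_error = critical_value * (std_sp / np.sqrt(sample_size))
confidence_interval_lower = mean_sp - margin_of_error
confidence_interval_upper = mean_sp + margin_of_error

print("Critical Value:", critical_value)
print("95% Confidence Interval:", (confidence_interval_lower, confidence_interval_upper))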

Critical Value: 1.959963984540054
95% Confidence Interval: (49.02001800772997, 50.97998199227003)


In [17]:
#Answer 8

The margin of error in a confidence interval is the half-width of the interval: the amount added to and subtracted from a sample statistic (such as the mean or proportion) to obtain the interval's upper and lower bounds. It quantifies the precision of our estimate. A larger margin of error indicates greater uncertainty, while a smaller margin of error indicates higher precision.

The formula for calculating the margin of error in a confidence interval is generally:

Margin of Error = Critical Value * (Standard Deviation / √Sample Size)

Here's how sample size affects the margin of error:

Inverse Relationship: As the sample size increases, the margin of error decreases. With a larger sample, our estimate becomes more precise and the interval around it becomes narrower.

More Data Points: A larger sample provides more data points and better represents the population. The standard deviation of the data does not shrink, but the standard error (Standard Deviation / √Sample Size) does, which leads to a smaller margin of error.

Narrower Interval at the Same Confidence: A larger sample does not make a 95% interval more likely to contain the true parameter; the coverage stays at 95%. What improves is the width: the interval pins the parameter down more tightly.
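
The formula above can be sketched as a small function to show how the margin of error shrinks with sample size. This is a minimal illustration assuming a z-based interval; the function name `margin_of_error`, the standard deviation of 5, and the list of sample sizes are hypothetical choices, and the standard library's `statistics.NormalDist` supplies the critical value.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(std_dev, sample_size, confidence_level=0.95):
    """Margin of error = critical value * (standard deviation / sqrt(sample size))."""
    alpha = 1 - confidence_level
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    return critical_value * std_dev / sqrt(sample_size)


# Each step quadruples the sample size
for n in (25, 100, 400, 1600):
    print(f"n = {n:4d}: margin of error = {margin_of_error(5, n):.4f}")
```

Because the sample size sits under a square root, quadrupling it only halves the margin of error.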

Example Scenario:
Suppose you are conducting a political survey to estimate the proportion of voters in a city who support a particular candidate. You want to calculate a 95% confidence interval for the proportion based on two different sample sizes: 500 and 1000.

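For both cases, let's assume you find a sample proportion of 0.60 (60%) who support the candidate and, purely for illustration, a standard deviation of 0.04. (For a real proportion, the standard error would normally be computed as √(p(1 − p) / n) rather than from an assumed standard deviation.)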




In [1]:
import numpy as np
from scipy.stats import norm

sample_size_500 = 500
sample_proportion = 0.60
standard_deviation = 0.04
confidence_level = 0.95

critical_value = norm.ppf(1 - (1 - confidence_level) / 2)
margin_of_error_500 = critical_value * (standard_deviation / np.sqrt(sample_size_500))

print("Margin of Error for Sample Size 500:", margin_of_error_500)


Margin of Error for Sample Size 500: 0.003506090162306326


In [2]:
sample_size_1000 = 1000

critical_value = norm.ppf(1 - (1 - confidence_level) / 2)
margin_of_error_1000 = critical_value * (standard_deviation / np.sqrt(sample_size_1000))

print("Margin of Error for Sample Size 1000:", margin_of_error_1000)


Margin of Error for Sample Size 1000: 0.0024791801292182464


In this example, the margin of error is smaller for the larger sample size (about 0.0025 for n = 1000) than for the smaller one (about 0.0035 for n = 500). This aligns with the principle that larger sample sizes lead to more precise estimates and narrower confidence intervals. Doubling the sample size shrinks the margin of error by a factor of √2, not 2.
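
The cells above plug an assumed standard deviation of 0.04 into the formula. For a survey proportion, a more common approach is to derive the standard error from the proportion itself. The following is a sketch of that alternative, not the calculation used above; the function name `proportion_margin_of_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_margin_of_error(p_hat, sample_size, confidence_level=0.95):
    """Margin of error for a proportion using the normal approximation."""
    alpha = 1 - confidence_level
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / sample_size)
    return critical_value * standard_error


moe_500 = proportion_margin_of_error(0.60, 500)
moe_1000 = proportion_margin_of_error(0.60, 1000)

print(f"Margin of Error for Sample Size 500:  {moe_500:.4f}")
print(f"Margin of Error for Sample Size 1000: {moe_1000:.4f}")
```

Under this approach the margins come out at roughly 4.3 and 3.0 percentage points, and the larger sample again gives the smaller margin.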

In [4]:
#Answer 9

In [10]:
mean = 70
std = 5
data_point = 75

# z-score: how many standard deviations the data point lies from the mean
z_score = (data_point - mean) / std

print("Z-Score:", z_score)


Z-Score: 1.0


In [11]:
#Answer 10

In [13]:
from scipy.stats import t
import numpy as np

# Given values
sample_size = 50
population_mean = 0  # null hypothesis: mean effect = 0
sp_mean = 6
std = 2.5
confidence_level = 0.95
df = sample_size - 1
alpha = 1 - confidence_level

# t-score: difference in means divided by the standard error (std / sqrt(n))
standard_error = std / np.sqrt(sample_size)
t_score = (sp_mean - population_mean) / standard_error

# Two-sided p-value from the t-distribution
p_value = 2 * t.sf(abs(t_score), df)

if p_value < alpha:
    print("Reject the null hypothesis. The drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis. The drug is not significantly effective.")

Reject the null hypothesis. The drug is significantly effective.


In [14]:
#Answer 11

In [15]:
import numpy as np
from scipy.stats import norm

# Given values
sample_proportion = 0.65  # 65%
sample_size = 500
confidence_level = 0.95

# Calculate the standard error
standard_error = np.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)

# Calculate the critical value from the standard normal distribution
alpha = 1 - confidence_level
critical_value = norm.ppf(1 - alpha / 2)

# Calculate the margin of error
margin_of_error = critical_value * standard_error

# Calculate the confidence interval
confidence_interval_lower = sample_proportion - margin_of_error
confidence_interval_upper = sample_proportion + margin_of_error

print("95% Confidence Interval:", (confidence_interval_lower, confidence_interval_upper))


95% Confidence Interval: (0.6081925393809212, 0.6918074606190788)


In [16]:
#Answer 12

In [17]:
import numpy as np
from scipy.stats import t

# Given values
sample_mean_A = 85
sample_std_dev_A = 6
sample_size_A = 100

sample_mean_B = 82
sample_std_dev_B = 5
sample_size_B = 100

alpha = 0.01

# Calculate the pooled standard deviation (this simple average of the variances is valid because both samples have the same size)
pooled_std_dev = np.sqrt(((sample_std_dev_A ** 2) + (sample_std_dev_B ** 2)) / 2)

# Calculate the degrees of freedom
df = sample_size_A + sample_size_B - 2

# Calculate the t-score
t_score = (sample_mean_A - sample_mean_B) / (pooled_std_dev * np.sqrt((1 / sample_size_A) + (1 / sample_size_B)))

# Calculate the critical value from the t-distribution
critical_value = t.ppf(1 - alpha / 2, df)

# Compare the calculated t-score with the critical value
if abs(t_score) > critical_value:
    print("Reject the null hypothesis. The teaching methods have a significant difference in student performance.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in student performance between the teaching methods.")


Reject the null hypothesis. The teaching methods have a significant difference in student performance.
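
As a cross-check on the manual calculation, SciPy can run the same pooled two-sample t-test directly from summary statistics. This is a sketch assuming `scipy.stats.ttest_ind_from_stats` with `equal_var=True`, which matches the pooled-variance approach used above.

```python
from scipy.stats import ttest_ind_from_stats

alpha = 0.01

# Same summary statistics as in the cell above
result = ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=100,
    mean2=82, std2=5, nobs2=100,
    equal_var=True,  # pooled-variance (Student's) t-test
)
t_statistic = result.statistic
p_value = result.pvalue

print("t-statistic:", t_statistic)
print("p-value:", p_value)
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Comparing the p-value to α is equivalent to comparing the t-score to the critical value, so both approaches lead to the same decision.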


In [18]:
#Answer 13

In [19]:
import numpy as np
from scipy.stats import norm

# Given values
sample_mean = 65
population_mean = 60
population_std_dev = 8
sample_size = 50
confidence_level = 0.90

# Calculate the critical value from the standard normal distribution
alpha = 1 - confidence_level
critical_value = norm.ppf(1 - alpha / 2)

# Calculate the margin of error
margin_of_error = critical_value * (population_std_dev / np.sqrt(sample_size))

# Calculate the confidence interval
confidence_interval_lower = sample_mean - margin_of_error
confidence_interval_upper = sample_mean + margin_of_error

print("90% Confidence Interval:", (confidence_interval_lower, confidence_interval_upper))


90% Confidence Interval: (63.13906055411732, 66.86093944588268)


In [20]:
#Answer 14

In [21]:
import numpy as np
from scipy.stats import t

# Given values
sample_mean = 0.25
sample_std_dev = 0.05
sample_size = 30
confidence_level = 0.90
hypothesized_population_mean = 0.28  # Hypothetical value

# Calculate the degrees of freedom
df = sample_size - 1

# Calculate the standard error
standard_error = sample_std_dev / np.sqrt(sample_size)

# Calculate the t-score
t_score = (sample_mean - hypothesized_population_mean) / standard_error

# Calculate the critical value from the t-distribution
alpha = 1 - confidence_level
critical_value = t.ppf(1 - alpha / 2, df)

# Compare the calculated t-score with the critical value
if abs(t_score) > critical_value:
    print("Reject the null hypothesis. Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis. There is no significant effect of caffeine on reaction time.")


Reject the null hypothesis. Caffeine has a significant effect on reaction time.
