Q.No-01    What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

Ans :-

Both t-tests and z-tests are statistical hypothesis tests used to make inferences about population parameters based on sample data. They are often used to determine whether the observed differences between sample groups or observed sample statistics are statistically significant or if they could have occurred by chance. The key difference between these tests lies in the scenarios where they are applicable, particularly when it comes to the population parameters being tested and the sample size.

**1. `T-test :-`**

A t-test is used when the sample size is relatively small (typically less than 30) or when the population standard deviation is unknown. There are different types of t-tests, including the one-sample t-test, two-sample (independent) t-test, and paired t-test. The t-test takes into account the variability in the sample data and calculates the t-statistic to test the hypothesis.

*    **`Example Scenario for a T-test :-`**

Suppose you want to test whether the mean height of a sample of 20 students in a particular school is significantly different from the known average height of the entire student population (population mean). Since the population standard deviation is not known, you would use a one-sample t-test to analyze the data.

In [1]:
import numpy as np
from scipy.stats import ttest_1samp

# Sample data (heights of 20 students)
sample_data = [160, 162, 165, 167, 168, 160, 163, 166, 164, 166,
               163, 162, 167, 168, 162, 164, 166, 165, 167, 168]

# Population mean height
population_mean = 165

# Perform one-sample t-test
t_statistic, p_value = ttest_1samp(sample_data, population_mean)

print("T-statistic:", t_statistic)
print("P-value:", p_value)

if p_value < 0.05:
    print("Reject the null hypothesis: The mean height is significantly different.")
else:
    print("Fail to reject the null hypothesis: The mean height is not significantly different.")

T-statistic: -0.6064971548933317
P-value: 0.5513632477619692
Fail to reject the null hypothesis: The mean height is not significantly different.
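
The t-statistic reported by `ttest_1samp` can be reproduced by hand, which makes it clearer what the test measures. The sketch below reuses the 20 heights from the cell above; the variable names (`heights`, `mu0`, `t_manual`, and so on) are chosen here for illustration only.

```python
import numpy as np
from scipy import stats

# Same 20 heights and hypothesized mean as in the example above
heights = np.array([160, 162, 165, 167, 168, 160, 163, 166, 164, 166,
                    163, 162, 167, 168, 162, 164, 166, 165, 167, 168])
mu0 = 165

n = heights.size
sample_mean = heights.mean()
sample_std = heights.std(ddof=1)          # ddof=1 gives the sample standard deviation
standard_error = sample_std / np.sqrt(n)  # estimated standard error of the mean

# t = (sample mean - hypothesized mean) / standard error
t_manual = (sample_mean - mu0) / standard_error
# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(heights, mu0)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy : t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

Both lines should agree, since `ttest_1samp` performs the same two-sided calculation by default.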


**2. `Z-test :-`**

A z-test is used when the sample size is large (typically greater than 30) and the population standard deviation is known. This test assumes that the sample mean follows a normal distribution and uses the z-statistic to test the hypothesis.

*    **`Example Scenario for a Z-test :-`**

Imagine a situation where you have a sample of 100 IQ scores from a population with a known standard deviation of 15. You want to determine if the average IQ score in the sample is significantly different from a hypothesized population mean IQ score of 100. In this case, you could use a z-test because the sample size is sufficiently large and the population standard deviation is known.

In [2]:
import numpy as np
from scipy.stats import norm

# Sample data (IQ scores; note that this list holds 90 values, and len(sample_data) is used as n below)
sample_data = [105, 110, 98, 115, 102, 108, 98, 105, 112, 101,
               106, 115, 100, 105, 110, 105, 98, 112, 108, 103,
               107, 98, 102, 100, 115, 98, 105, 108, 110, 106,
               112, 105, 103, 98, 115, 102, 108, 105, 100, 112,
               110, 107, 98, 105, 103, 115, 101, 108, 112, 105,
               98, 102, 110, 115, 105, 106, 112, 98, 100, 105,
               108, 115, 110, 98, 103, 106, 112, 105, 101, 108,
               100, 102, 115, 105, 98, 112, 110, 107, 106, 100,
               103, 108, 115, 105, 112, 98, 110, 107, 102, 106]

# Population mean IQ
population_mean_iq = 100
population_std_iq = 15  # Known population standard deviation

# Perform z-test
z_score = (np.mean(sample_data) - population_mean_iq) / (population_std_iq / np.sqrt(len(sample_data)))
p_value = 1 - norm.cdf(z_score)  # One-sided (upper-tail) p-value; a two-sided test of "different" would use 2 * (1 - norm.cdf(abs(z_score)))

print("Z-score:", z_score)
print("P-value:", p_value)

if p_value < 0.05:
    print("Reject the null hypothesis: The mean IQ is significantly different.")
else:
    print("Fail to reject the null hypothesis: The mean IQ is not significantly different.")

Z-score: 3.703378504241635
P-value: 0.0001063735468233018
Reject the null hypothesis: The mean IQ is significantly different.
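
Because the scenario asks whether the mean is "different" while the cell computes an upper-tail p-value, it can help to wrap the z-test in a small function that makes the choice of tail explicit. This is only a sketch: `one_sample_z_test` is a hypothetical helper written for this answer, not a SciPy function, and the ten scores are simply the first row of the list above.

```python
import numpy as np
from scipy.stats import norm

def one_sample_z_test(sample, mu0, sigma, alternative="two-sided"):
    """Hypothetical helper: one-sample z-test with a known population sigma."""
    sample = np.asarray(sample, dtype=float)
    z = (sample.mean() - mu0) / (sigma / np.sqrt(sample.size))
    if alternative == "two-sided":
        p = 2 * norm.sf(abs(z))   # both tails
    elif alternative == "greater":
        p = norm.sf(z)            # upper tail only
    elif alternative == "less":
        p = norm.cdf(z)           # lower tail only
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p

# First ten IQ scores from the example list, used purely for illustration
scores = [105, 110, 98, 115, 102, 108, 98, 105, 112, 101]
z, p_two = one_sample_z_test(scores, mu0=100, sigma=15)
_, p_greater = one_sample_z_test(scores, mu0=100, sigma=15, alternative="greater")
print(f"z = {z:.4f}, two-sided p = {p_two:.4f}, upper-tail p = {p_greater:.4f}")
```

For a positive z-score the two-sided p-value is exactly twice the upper-tail one, so the choice of tail should match the stated alternative hypothesis.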


`In summary`, choose between a t-test and a z-test based on the following factors:
- Population standard deviation: If it is known, a z-test can be used; if it is unknown and must be estimated from the sample, a t-test is more appropriate.
- Sample size: The distinction matters most for small samples (roughly n < 30). For large samples the t-distribution is close to the normal distribution, so the two tests give nearly the same result.
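
To see why the sample-size rule of thumb works, one can compare the two-sided 5% critical values of the t-distribution with the normal critical value as n grows. This is an illustrative sketch; the sample sizes in the loop are arbitrary choices.

```python
from scipy import stats

# Two-sided 5% critical value of the standard normal distribution
z_crit = stats.norm.ppf(0.975)

# Critical values of the t-distribution for a few arbitrary sample sizes
t_crits = {}
for n in (5, 10, 30, 100, 1000):
    t_crits[n] = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:4d}: t critical = {t_crits[n]:.4f}, z critical = {z_crit:.4f}")
```

The t critical value is always larger than the z critical value, and the gap shrinks as n increases, which is why the two tests become practically interchangeable for large samples.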

--------------------------------------------------------------------------------------------------------------------

Q.No-02    Differentiate between one-tailed and two-tailed tests.

Ans :-

One-Tailed Test and Two-Tailed Test are concepts commonly used in statistical hypothesis testing to determine whether there is enough evidence to support or reject a null hypothesis based on sample data. These tests are used to make decisions about population parameters based on sample statistics. The key difference between them lies in the directionality of the test and the area of the distribution used for comparison.

In [3]:
import numpy as np
from scipy import stats

# Sample data (exam scores)
sample_scores = np.array([85, 90, 88, 78, 92, 82, 88, 96, 80, 87])

# Hypothesized population mean (null hypothesis)
hypothesized_mean = 85

**1. `One-Tailed Test :-`**

In a one-tailed test, the alternative hypothesis specifies a specific direction of effect or change. It focuses on whether the sample statistic is significantly different from the hypothesized population parameter in only one direction. There are two possible configurations for a one-tailed test: a "greater-than" test or a "less-than" test.

- **Greater-Than Test -** In this case, the alternative hypothesis states that the population parameter is greater than a certain value. The critical region for the test is located in the upper tail of the distribution.

- **Less-Than Test -** In this case, the alternative hypothesis states that the population parameter is less than a certain value. The critical region for the test is located in the lower tail of the distribution.

The decision to reject the null hypothesis in a one-tailed test depends on whether the sample statistic falls within the critical region corresponding to the specified direction.

In [4]:
# Perform one-tailed t-test (greater than)
t_statistic, p_value_one_tailed = stats.ttest_1samp(sample_scores, hypothesized_mean, alternative='greater')

print("One-Tailed Test:")
print(f"t-statistic: {t_statistic}")
print(f"P-value: {p_value_one_tailed}")
if p_value_one_tailed < 0.05:
    print("Reject the null hypothesis (mean is greater).")
else:
    print("Fail to reject the null hypothesis.")


One-Tailed Test:
t-statistic: 0.9163242579854516
P-value: 0.1916965732297759
Fail to reject the null hypothesis.


**2. `Two-Tailed Test :-`**


In a two-tailed test, the alternative hypothesis does not specify a particular direction of effect. Instead, it tests whether the sample statistic is significantly different from the hypothesized population parameter in either direction. The critical region for the test is divided between the two tails of the distribution.

The decision to reject the null hypothesis in a two-tailed test depends on whether the sample statistic falls into either of the two critical regions. The test considers the possibility of an effect in both directions.

In [5]:
# Perform two-tailed t-test
t_statistic, p_value_two_tailed = stats.ttest_1samp(sample_scores, hypothesized_mean)

print("\nTwo-Tailed Test:")
print(f"t-statistic: {t_statistic}")
print(f"P-value: {p_value_two_tailed}")
if p_value_two_tailed < 0.05:
    print("Reject the null hypothesis (mean is different).")
else:
    print("Fail to reject the null hypothesis.")


Two-Tailed Test:
t-statistic: 0.9163242579854516
P-value: 0.3833931464595518
Fail to reject the null hypothesis.


`To summarize` :
- One-Tailed Test: Used when you are specifically interested in whether a parameter is greater than or less than a certain value.
- Two-Tailed Test: Used when you want to detect any significant difference, regardless of the direction.

The choice between a one-tailed and a two-tailed test depends on the research question, the nature of the hypothesis, and the directionality of the effect you're investigating. It's important to carefully consider which test is appropriate based on the specific context of your study.
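
The relationship between the one-tailed and two-tailed p-values reported above can be checked directly from the t-distribution. The sketch below reuses `sample_scores` and `hypothesized_mean` from the earlier cell and recomputes the tail areas by hand; the names `p_greater`, `p_less`, and `p_two_sided` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

sample_scores = np.array([85, 90, 88, 78, 92, 82, 88, 96, 80, 87])
hypothesized_mean = 85

n = sample_scores.size
df = n - 1
t_stat = (sample_scores.mean() - hypothesized_mean) / (sample_scores.std(ddof=1) / np.sqrt(n))

p_greater = stats.t.sf(t_stat, df)         # area in the upper tail (H1: mean > 85)
p_less = stats.t.cdf(t_stat, df)           # area in the lower tail (H1: mean < 85)
p_two_sided = 2 * min(p_greater, p_less)   # both tails (H1: mean != 85)

print(f"t = {t_stat:.4f}")
print(f"greater: {p_greater:.4f}, less: {p_less:.4f}, two-sided: {p_two_sided:.4f}")
```

The two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them, which is why the two-tailed result above is double the one-tailed result.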

--------------------------------------------------------------------------------------------------------------------

Q.No-03    Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

Ans :-

In hypothesis testing, Type 1 and Type 2 errors are two potential mistakes that can occur when making decisions about a population based on a sample. These errors are related to the acceptance or rejection of a null hypothesis ($H_0$) under a certain level of significance. Here's what each error represents:

**1. `Type 1 Error (False Positive) :-`**

A Type 1 error occurs when you incorrectly reject a true null hypothesis. In other words, you conclude that there is a significant effect or relationship when, in reality, there is no such effect or relationship in the population. This error is also known as a false positive or alpha error, and the probability of committing it is denoted by the symbol α (the significance level).

*    **`Example Scenario for Type 1 Error -`**

Imagine a pharmaceutical company testing a new drug to see if it's effective in treating a certain condition. The null hypothesis (H0) states that the drug has no effect, while the alternative hypothesis (Ha) states that the drug is effective. After conducting a clinical trial, the company finds a small improvement in the condition among the participants who took the drug. However, due to a random fluctuation or noise in the data, the improvement could have occurred by chance. If the company wrongly rejects the null hypothesis and concludes that the drug is effective, it commits a Type 1 error. In reality, the drug might not be effective, and the observed improvement might have been due to chance.

In [6]:
import numpy as np
from scipy import stats

# Example data for participants who took the drug
improvements = np.array([1.5, 2.0, 0.8, 1.2, 1.0, 1.3, 1.7, 1.4, 1.6, 1.9])

# Define the null hypothesis (H0) and alternative hypothesis (Ha)
null_hypothesis = "The drug has no effect."
alternative_hypothesis = "The drug is effective."

# Set the significance level (alpha)
alpha = 0.05

# Perform a one-sample t-test
t_statistic, p_value = stats.ttest_1samp(improvements, popmean=0)

# Print the results
print("Hypothesis Testing Results:")
print("Null Hypothesis:", null_hypothesis)
print("Alternative Hypothesis:", alternative_hypothesis)
print("Significance Level (alpha):", alpha)
print("T-statistic:", t_statistic)
print("P-value:", p_value)

# Check if the null hypothesis should be rejected based on the p-value
# (a rejection is a Type 1 error only if the drug truly has no effect)
if p_value < alpha:
    print("Reject the null hypothesis: The drug is effective.")
else:
    print("Fail to reject the null hypothesis: The drug has no significant effect.")

Hypothesis Testing Results:
Null Hypothesis: The drug has no effect.
Alternative Hypothesis: The drug is effective.
Significance Level (alpha): 0.05
T-statistic: 11.963133553428964
P-value: 7.904336215502891e-07
Reject the null hypothesis: The drug is effective.


**2. `Type 2 Error (False Negative) :-`**


A Type 2 error occurs when you fail to reject a false null hypothesis. In this case, you fail to identify a significant effect or relationship that actually exists in the population. This error is also known as a false negative or beta error, and the probability of committing it is denoted by the symbol β (the power of the test is 1 - β).

*    **`Example Scenario for Type 2 Error -`**

Continuing with the pharmaceutical example, suppose the new drug actually has a moderate effect on the condition, but it's not very strong. The company conducts a clinical trial, but due to the variability in individual responses, the improvement observed among the participants isn't statistically significant. As a result, the company fails to reject the null hypothesis and concludes that the drug is not effective. This is a Type 2 error because the company failed to identify a true effect that existed in the population.

In [7]:
import numpy as np
from scipy import stats

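# Example data for participants who took the drug (same values as in the Type 1 example).
# Note: with these values the test rejects H0, so this run does not itself produce a Type 2 error;
# a Type 2 error would be a failure to reject H0 when the drug truly works.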
improvements = np.array([1.5, 2.0, 0.8, 1.2, 1.0, 1.3, 1.7, 1.4, 1.6, 1.9])

# Define the null hypothesis (H0) and alternative hypothesis (Ha)
null_hypothesis = "The drug has no effect."
alternative_hypothesis = "The drug is effective."

# Set the significance level (alpha)
alpha = 0.05

# Perform a one-sample t-test
t_statistic, p_value = stats.ttest_1samp(improvements, popmean=0)

# Print the results
print("Hypothesis Testing Results:")
print("Null Hypothesis:", null_hypothesis)
print("Alternative Hypothesis:", alternative_hypothesis)
print("Significance Level (alpha):", alpha)
print("T-statistic:", t_statistic)
print("P-value:", p_value)

# Check if the null hypothesis should be rejected based on the p-value
if p_value < alpha:
    print("Reject the null hypothesis: The drug is effective.")
else:
    print("Fail to reject the null hypothesis: The drug has no significant effect.")

Hypothesis Testing Results:
Null Hypothesis: The drug has no effect.
Alternative Hypothesis: The drug is effective.
Significance Level (alpha): 0.05
T-statistic: 11.963133553428964
P-value: 7.904336215502891e-07
Reject the null hypothesis: The drug is effective.


`In hypothesis testing`, there's often a trade-off between Type 1 and Type 2 errors. Adjusting the significance level (α) influences the probabilities of these errors. For a fixed sample size, decreasing the likelihood of one type of error typically increases the likelihood of the other; increasing the sample size is the usual way to reduce both. The balance between these errors is a critical consideration when designing experiments and interpreting their results.
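
The trade-off can be made concrete with a small simulation. The sketch below is an illustration under assumed settings, not part of the drug example: it draws many samples of size 10 from a normal population, once with a true mean of 0 (null hypothesis true) and once with a true mean of 0.5 (null hypothesis false), and counts how often a one-sample t-test makes each kind of mistake. The effect size, sample size, number of trials, and seed are all arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_p_values(true_mean, n=10, sigma=1.0, trials=2000):
    """Run `trials` one-sample t-tests of H0: mean = 0 on simulated normal data."""
    samples = rng.normal(true_mean, sigma, size=(trials, n))
    return stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

p_null = simulate_p_values(true_mean=0.0)  # H0 is true: any rejection is a Type 1 error
p_alt = simulate_p_values(true_mean=0.5)   # H0 is false: any non-rejection is a Type 2 error

results = {}
for alpha in (0.10, 0.05, 0.01):
    type1_rate = np.mean(p_null < alpha)
    type2_rate = np.mean(p_alt >= alpha)
    results[alpha] = (type1_rate, type2_rate)
    print(f"alpha = {alpha:.2f}: Type 1 rate = {type1_rate:.3f}, Type 2 rate = {type2_rate:.3f}")
```

Lowering α reduces the Type 1 rate but raises the Type 2 rate on the same simulated data, which is the trade-off described above.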

--------------------------------------------------------------------------------------------------------------------

Q.No-04    Explain Bayes's theorem with an example.

Ans :-

**`Bayes's Theorem`** is a fundamental concept in probability theory and statistics. It describes how to update our beliefs or probabilities about an event when new evidence or information becomes available. The theorem is named after the Reverend Thomas Bayes, who introduced the idea of updating probabilities based on new data.

    Mathematically, Bayes's Theorem is stated as follows:

$$ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} $$

    Where:
- $ P(A|B) $ is the probability of event A occurring given that event B has occurred.
- $ P(B|A) $ is the probability of event B occurring given that event A has occurred.
- $ P(A) $ is the prior probability of event A occurring.
- $ P(B) $ is the marginal (total) probability of event B occurring.

    Now, let's illustrate Bayes's Theorem with a classic example known as the "diagnostic test" scenario:

Imagine you're a doctor and you have a patient who is exhibiting symptoms that could be associated with a rare disease. Let's call the disease "Disease X." The prevalence of Disease X in the general population is very low, say 1 in 1000 people (0.1%).

You have a diagnostic test for Disease X, but it's not perfect. The test has a sensitivity of 95%, meaning it correctly identifies 95% of the people who actually have Disease X. It also has a specificity of 90%, meaning it correctly identifies 90% of the people who do not have the disease.

`Now, let's use Bayes's Theorem to calculate the probability that a patient actually has Disease X given that the test comes back positive` (i.e., $ P(Disease\ X|Positive\ Test) $):

- $ P(Disease\ X) = 0.001 $ (prevalence of the disease in the population)
- $ P(Positive\ Test|Disease\ X) = 0.95 $ (sensitivity of the test)
- $ P(Positive\ Test|No\ Disease\ X) = 0.10 $ (1 - specificity of the test)
- $ P(No\ Disease\ X) = 1 - P(Disease\ X) = 0.999 $

`To calculate` $ P(Positive\ Test) $, `we use the law of total probability:`

$P(Positive\ Test) = P(Positive\ Test|Disease\ X) \cdot P(Disease\ X) + P(Positive\ Test|No\ Disease\ X) \cdot P(No\ Disease\ X)$

`Now plug these values into Bayes's Theorem:`

$ P(Disease\ X|Positive\ Test) = \frac{P(Positive\ Test|Disease\ X) \cdot P(Disease\ X)}{P(Positive\ Test)} $

    Substitute the values and calculate to get the result.
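
Written out with the values listed above, the substitution looks like this (a worked sketch using only the numbers given in the problem):

$$ P(Positive\ Test) = 0.95 \cdot 0.001 + 0.10 \cdot 0.999 = 0.00095 + 0.0999 = 0.10085 $$

$$ P(Disease\ X|Positive\ Test) = \frac{0.95 \cdot 0.001}{0.10085} = \frac{0.00095}{0.10085} \approx 0.0094 $$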

By plugging in the numbers, you can see how Bayes's Theorem allows you to update your belief about the likelihood of having the disease based on the test result. A positive result raises the probability of disease from the prior of 0.1% to a bit under 1%, roughly a ninefold increase. The probability still remains low, because the disease is so rare that most positive results come from the much larger group of healthy people (false positives).

**`In Python`**

In [8]:
# Given data
p_disease_x = 0.001  # Prevalence of Disease X
p_positive_given_disease_x = 0.95  # Sensitivity of the test
p_positive_given_no_disease_x = 0.10  # 1 - Specificity of the test

# Calculate the complement probabilities
p_no_disease_x = 1 - p_disease_x

# Calculate the denominator of Bayes's Theorem
p_positive_test = (p_positive_given_disease_x * p_disease_x) + (p_positive_given_no_disease_x * p_no_disease_x)

# Calculate the posterior probability using Bayes's Theorem
p_disease_x_given_positive_test = (p_positive_given_disease_x * p_disease_x) / p_positive_test

# Conclusion based on the result
if p_disease_x_given_positive_test > 0.5:
    conclusion = "The probability of having Disease X is high given the positive test result."
else:
    conclusion = "The probability of having Disease X is low given the positive test result."

print("Given data:")
print(f"Prevalence of Disease X: {p_disease_x}")
print(f"Sensitivity of the test: {p_positive_given_disease_x}")
print(f"Specificity of the test: {1 - p_positive_given_no_disease_x}\n")

print("Calculations:")
print(f"Probability of not having Disease X: {p_no_disease_x}")
print(f"Probability of a positive test result: {p_positive_test}")
print(f"Probability of having Disease X given a positive test result: {p_disease_x_given_positive_test:.6f}\n")

print("Conclusion:")
print(conclusion)

Given data:
Prevalence of Disease X: 0.001
Sensitivity of the test: 0.95
Specificity of the test: 0.9

Calculations:
Probability of not having Disease X: 0.999
Probability of a positive test result: 0.10085000000000001
Probability of having Disease X given a positive test result: 0.009420

Conclusion:
The probability of having Disease X is low given the positive test result.


--------------------------------------------------------------------------------------------------------------------

Q.No-05    What is a confidence interval? How to calculate the confidence interval, explain with an example.

Ans :-

**`A confidence interval`** is a statistical range that provides an estimate of the true value of a population parameter, such as a mean or a proportion, along with a level of confidence in the accuracy of that estimate. It quantifies the uncertainty associated with estimating a population parameter based on a sample from that population. The confidence interval consists of two values: a lower bound and an upper bound, which define a range within which the true parameter value is likely to fall.

The confidence level (usually expressed as a percentage) describes how reliable the procedure is in the long run; it is not the probability that one particular computed interval contains the parameter. For example, a 95% confidence level means that if we were to repeat the sampling and calculation process many times, about 95% of the resulting intervals would contain the true parameter value.
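
The repeated-sampling interpretation can be checked with a short simulation. The sketch below assumes a normal population whose mean and standard deviation (180 and 12) are picked arbitrarily for illustration, builds a 95% t-based interval from each of 2000 samples of size 50, and counts how many intervals contain the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Assumed population and simulation settings (illustrative values only)
true_mean, true_sd, n, trials = 180.0, 12.0, 50, 2000
confidence = 0.95

t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)

samples = rng.normal(true_mean, true_sd, size=(trials, n))
means = samples.mean(axis=1)
std_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)

lower = means - t_crit * std_errors
upper = means + t_crit * std_errors

# Fraction of intervals that contain the true population mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should land close to 0.95, although it will vary slightly with the seed and the number of trials.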

    Calculating a confidence interval involves the following steps -

1. **`Collect a Sample`**: Gather a random sample from the population of interest.

2. **`Calculate the Sample Statistic`**: Compute the sample statistic (e.g., sample mean, sample proportion) that you want to estimate.

3. **`Determine the Confidence Level`**: Choose a desired level of confidence (e.g., 90%, 95%, 99%).

4. **`Find the Critical Value`**: Based on the chosen confidence level and the distribution of the sample statistic (usually a t-distribution or normal distribution), find the critical value that corresponds to the chosen confidence level.

5. **`Calculate the Margin of Error`**: The margin of error is calculated by multiplying the critical value by the standard error of the sample statistic. The standard error quantifies the variability of the sample statistic.

6. **`Calculate the Confidence Interval`**: The lower bound of the confidence interval is the sample statistic minus the margin of error, and the upper bound is the sample statistic plus the margin of error.
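
The six steps above can be sketched as one small function. `mean_confidence_interval` is a hypothetical helper written for this answer, and the ten tree heights are made-up values; the function uses the t-distribution, which is the usual choice when the standard deviation is estimated from the sample.

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(data, confidence=0.95):
    """Hypothetical helper: t-based confidence interval for a population mean."""
    data = np.asarray(data, dtype=float)
    n = data.size
    sample_mean = data.mean()                                     # step 2: sample statistic
    standard_error = data.std(ddof=1) / np.sqrt(n)                # variability of the mean
    critical_value = stats.t.ppf((1 + confidence) / 2, df=n - 1)  # step 4: critical value
    margin = critical_value * standard_error                      # step 5: margin of error
    return sample_mean - margin, sample_mean + margin             # step 6: interval bounds

# Made-up heights (cm) used only to exercise the function
heights = [178, 182, 175, 190, 169, 184, 181, 177, 186, 180]
low, high = mean_confidence_interval(heights)
low99, high99 = mean_confidence_interval(heights, confidence=0.99)
print(f"95% CI: ({low:.2f}, {high:.2f})")
print(f"99% CI: ({low99:.2f}, {high99:.2f})")
```

Raising the confidence level widens the interval, because a larger critical value is needed to capture the true mean more often.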

`Now, let's go through an example of calculating a confidence interval for the population mean -`

**Example :-**

Suppose we want to estimate the average height of a certain species of trees in a forest. We collect a random sample of 50 trees and measure their heights. The sample mean height is 180 cm, and the sample standard deviation is 12 cm.

1. **`Sample Statistic`** - Sample mean (x̄) = 180 cm
2. **`Sample Size`** ($n$) - 50
3. **`Desired Confidence Level`** - 95%

4. **`Critical Value`** - Since the sample size is relatively large (n > 30), we can use the standard normal distribution. For a 95% confidence level, the critical value is approximately 1.96 (consulting a z-table or using a statistical calculator).

5. **`Margin of Error`** -

   **Margin of Error** = Critical Value * (Sample Standard Deviation / √n)
   
   **Margin of Error** = 1.96 * (12 / √50) ≈ 3.33 cm

6. **`Confidence Interval`** -

   **Lower Bound** = Sample Mean - Margin of Error = 180 - 3.33 ≈ 176.67 cm
   
   **Upper Bound** = Sample Mean + Margin of Error = 180 + 3.33 ≈ 183.33 cm

So, with 95% confidence, we can say that the true average height of the trees in the forest is likely to fall within the range of approximately 176.67 cm to 183.33 cm.

In [9]:
import scipy.stats as stats
import math

print("Given Data :-")
# Sample statistics
sample_mean = 180  # Sample mean height
print("Sample Mean: ",sample_mean, "cm")

sample_stddev = 12  # Sample standard deviation
print("Std Dev. of Sample: ",sample_stddev)

sample_size = 50  # Sample size
print("Sample Size: ",sample_size)

# Desired confidence level
confidence_level = 0.95
print("Confidence Level: ",confidence_level*100,"%")


print("\nCalculation :-")
# Calculate the critical value (using the standard normal distribution)
critical_value = stats.norm.ppf((1 + confidence_level) / 2)
print("Critical Value: ",critical_value)

# Calculate the margin of error
margin_of_error = critical_value * (sample_stddev / math.sqrt(sample_size))
print('Margin Of Error:', margin_of_error, 'cm')

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

# Print the results
print("\nConfidence Interval :-")
print(f"Lower Bound: {lower_bound:.2f}")
print(f"Upper Bound: {upper_bound:.2f}")

Given Data :-
Sample Mean:  180 cm
Std Dev. of Sample:  12
Sample Size:  50
Confidence Level:  95.0 %

Calculation :-
Critical Value:  1.959963984540054
Margin Of Error: 3.3261691784392267 cm

Confidence Interval :-
Lower Bound: 176.67
Upper Bound: 183.33


--------------------------------------------------------------------------------------------------------------------

Q.No-06    Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

Ans :-

**`The formula for Bayes' Theorem :-`**

$$ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} $$

    Where -
- $ P(A|B) $ is the posterior probability of event A occurring given evidence B.
- $ P(B|A) $ is the probability of observing evidence B given that event A has occurred.
- $ P(A) $ is the prior probability of event A occurring.
- $ P(B) $ is the probability of observing evidence B.

**`Sample Problem :-` Medical Test**

Suppose a certain medical condition affects 1% of the population. A diagnostic test for this condition has a sensitivity of 95% (correctly identifies the condition in 95% of cases) and a specificity of 90% (correctly identifies the absence of the condition in 90% of cases). If a person tests positive for the condition, what is the probability that they actually have the condition?

**`Solution :-`**

    Let's define the events -
- Event A: Having the medical condition.
- Event B: Testing positive for the condition.

    Given data -
- $ P(A) = 0.01 $ (1% of the population has the condition)
- $ P(B|A) = 0.95 $ (sensitivity - the probability of testing positive given that the person has the condition)
- $ P(\neg A) = 1 - P(A) = 0.99 $ (probability of not having the condition)
- $ P(B|\neg A) = 1 - 0.90 = 0.10 $ (false positive rate, i.e. 1 - specificity: the probability of testing positive given that the person does not have the condition)

We have to find $ P(A|B) $, the probability of having the condition given that the person tested positive.

    Using Bayes' Theorem -
    
$ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} $

`We have to calculate` $ P(B) $ `using the law of total probability -`

$ P(B) = P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A) $

    Substitute the values -
    
$ P(B) = (0.95 \cdot 0.01) + (0.10 \cdot 0.99) = 0.1085 $

    Now, plug into Bayes' Theorem -

$ P(A|B) = \frac{0.95 \cdot 0.01}{0.1085} = \frac{0.0095}{0.1085} \approx 0.0876 $

So, if a person tests positive for the medical condition, the probability that they actually have the condition is approximately 8.76%.

In [10]:
# Given data
print("Given Data :-")

P_A = 0.01  # Probability of having the condition
print("\nP_A = ",P_A, "\n(",P_A*100,"% of the population has the condition)")

P_B_given_A = 0.95  # Sensitivity
print("\nP_B_given_A = ",P_B_given_A, "\n(sensitivity - the probability of testing positive given that the person has the condition)")

P_not_A = 1 - P_A  # Probability of not having the condition
print("\nP_not_A = ",P_not_A,"\n(probability of not having the condition)")

P_B_given_not_A = round((1- 0.90), 2)  # False positive rate (1 - specificity)
print("\nP_B_given_not_A = ",P_B_given_not_A,"\n(false positive rate = 1 - specificity: the probability of testing positive given that the person does not have the condition)")

print("\n\nCalculation :-")
# Calculate P(B) using law of total probability
P_B = (P_B_given_A * P_A) + (P_B_given_not_A * P_not_A)
print("\nP_B = ",P_B)

# Calculate P(A|B) using Bayes' Theorem
P_A_given_B = (P_B_given_A * P_A) / P_B
print("\nP_A_given_B = ", P_A_given_B)


print("\n\nResult :-")
print("\nThe probability of having the condition given a positive test result is:", P_A_given_B*100,"%")

Given Data :-

P_A =  0.01 
( 1.0 % of the population has the condition)

P_B_given_A =  0.95 
(sensitivity - the probability of testing positive given that the person has the condition)

P_not_A =  0.99 
(probability of not having the condition)

P_B_given_not_A =  0.1 
(false positive rate = 1 - specificity: the probability of testing positive given that the person does not have the condition)


Calculation :-

P_B =  0.1085

P_A_given_B =  0.08755760368663594


Result :-

The probability of having the condition given a positive test result is: 8.755760368663594 %
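
The same calculation can be wrapped in a function to see how strongly the answer depends on the prior. `posterior_given_positive` is a hypothetical helper written for this answer, and the list of priors is an arbitrary selection; sensitivity and specificity are kept at the values from the problem.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Hypothetical helper: P(condition | positive test) via Bayes' Theorem."""
    false_positive_rate = 1 - specificity
    # Law of total probability: P(positive)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Same test (95% sensitivity, 90% specificity), different assumed prevalences
for prior in (0.001, 0.01, 0.10, 0.50):
    post = posterior_given_positive(prior, sensitivity=0.95, specificity=0.90)
    print(f"prior = {prior:5.3f} -> posterior = {post:.4f}")
```

With a prior of 0.01 the function reproduces the result computed above, and the posterior climbs quickly as the condition becomes more common.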


--------------------------------------------------------------------------------------------------------------------

Q.No-07    Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

Ans :-

In [11]:
import scipy.stats as stats

print("Given Data :-")
# Given data
sample_mean = 50
print("Sample Mean = ",sample_mean)

standard_deviation = 5
print("Std Dev. of Sample = ",standard_deviation)

confidence_level = 0.95  # 95% confidence level
print("Confidence Level = ",confidence_level)

print("Assuming a sample size (you need to provide this value)")
sample_size = 100
print("Sample Size = ",sample_size)

print("\nCalculation :-")
# Calculate the standard error
standard_error = standard_deviation / (sample_size ** 0.5)
print("Standard Error = ",standard_error)

# Calculate the margin of error using the Z-score for a 95% confidence level
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)
print("Z Score = ",z_score)
margin_of_error = z_score * standard_error
print("Margin Of Error = ",margin_of_error)

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

# Interpretation
print("Confidence Interval -")
print(f"Lower bound: {lower_bound:.2f}")
print(f"Upper bound: {upper_bound:.2f}")

print("\nConclusion :-")
print(f"I am {confidence_level*100}% confident that the true population mean is between {lower_bound:.2f} and {upper_bound:.2f}.")

Given Data :-
Sample Mean =  50
Std Dev. of Sample =  5
Confidence Level =  0.95
Assuming a sample size (you need to provide this value)
Sample Size =  100

Calculation :-
Standard Error =  0.5
Z Score =  1.959963984540054
Margin Of Error =  0.979981992270027
Confidence Interval -
Lower bound: 49.02
Upper bound: 50.98

Conclusion :-
I am 95.0% confident that the true population mean is between 49.02 and 50.98.
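
Since the question does not state a sample size, the interval above depends on the assumed value of 100. The sketch below repeats the same z-based calculation for a few other assumed sample sizes; the values 25 and 400 are hypothetical choices, and `intervals` is a name introduced here.

```python
import math
from scipy import stats

sample_mean, sample_std, confidence = 50, 5, 0.95
z = stats.norm.ppf((1 + confidence) / 2)

# Hypothetical sample sizes: the question itself does not specify n
intervals = {}
for n in (25, 100, 400):
    margin = z * sample_std / math.sqrt(n)
    intervals[n] = (sample_mean - margin, sample_mean + margin)
    print(f"n = {n:3d}: ({intervals[n][0]:.2f}, {intervals[n][1]:.2f})")
```

The interval for n = 100 matches the one computed above, and the width changes by a factor of two each time the assumed sample size changes by a factor of four.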


--------------------------------------------------------------------------------------------------------------------

Q.No-08    What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

Ans :-

**`The Margin of Error` (MoE)** in a **`confidence interval` (CI)** is the half-width of the interval: the amount added to and subtracted from the sample estimate to obtain the upper and lower bounds. It quantifies the uncertainty associated with estimating a population parameter (such as a mean or proportion) from a sample. The margin of error is expressed as a positive value and is determined by the confidence level, the variability of the data, and the sample size.

The formula for **`The Margin of Error`** is often given by:

$ Margin\ of\ Error = Critical\ Value \times Standard\ Error $

**1. `Critical Value :-`** This is determined by the desired confidence level. For instance, if you're constructing a 95% confidence interval, the critical value is the Z-score that leaves 2.5% in each tail of the distribution (5% in total, so the central 95% falls within the confidence interval).

**2. `Standard Error :-`** This measures the variability of the sample statistic, not of the individual data points. For a sample mean, it is the standard deviation divided by the square root of the sample size. The standard error quantifies how much the sample mean (or other statistic) is expected to vary from sample to sample around the true population parameter.

**`The sample size directly affects the margin of error`**. As the sample size increases, the margin of error decreases, because the standard error is proportional to $1/\sqrt{n}$. Larger samples therefore give more precise estimates of the population parameter: quadrupling the sample size halves the margin of error.
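
The relationship can also be turned around to ask how large a sample is needed for a target margin of error. The sketch below assumes a known standard deviation and a z-based interval; `margin_of_error` and `required_sample_size` are hypothetical helpers written for this answer, and the $500 target is an arbitrary example.

```python
import math
from scipy import stats

def margin_of_error(std_dev, n, confidence=0.95):
    """Hypothetical helper: z-based margin of error for a mean."""
    z = stats.norm.ppf((1 + confidence) / 2)
    return z * std_dev / math.sqrt(n)

def required_sample_size(std_dev, target_margin, confidence=0.95):
    """Hypothetical helper: smallest n whose margin of error is at most target_margin."""
    z = stats.norm.ppf((1 + confidence) / 2)
    return math.ceil((z * std_dev / target_margin) ** 2)

std_dev = 10_000  # assumed standard deviation of incomes
for n in (100, 400, 1000):
    print(f"n = {n:4d}: margin of error = {margin_of_error(std_dev, n):.2f}")

n_needed = required_sample_size(std_dev, target_margin=500)
print(f"Sample size needed for a margin of error of 500: {n_needed}")
```

Solving the margin-of-error formula for n gives $n = (z \sigma / E)^2$, which is rounded up because a sample size must be a whole number.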

**`Example :-`**

Suppose we are conducting a survey to estimate the average income of a certain population. We decide to use a 95% confidence level and collect data in two scenarios:

**Scenario A**: Sample size = 100

**Scenario B**: Sample size = 1000

`For both scenarios`,

We calculate the mean income and the standard deviation of the incomes in the sample. The standard error in each case is calculated as the standard deviation divided by the square root of the sample size.

`Given Data`,

**Confidence level** = 0.95

**Critical value** = Approximately 1.96

Let's assume that the true population mean income is $50,000 and the standard deviation of incomes is $10,000 in both scenarios. Then,

Mean of incomes for Scenario A & Scenario B = 50000
Standard deviation of incomes for Scenario A & Scenario B = 10000

- Scenario A: The margin of error = 1.96 × (standard error) = 1.96 × ($10,000 / √100) = $1,960
- Scenario B: The margin of error = 1.96 × (standard error) = 1.96 × ($10,000 / √1000) = $619.61

As We can see that the margin of error in Scenario B is significantly smaller than in Scenario A due to the larger sample size. This means that we can be more confident that the true population mean income lies within a narrower range around our sample mean in Scenario B compared to Scenario A.

**In Python -**

In [12]:
import numpy as np
from scipy import stats

# Given
print("Given :-")
Sample_Size_of_Scenario_A = 100
print("Sample Size of Scenario A = ",Sample_Size_of_Scenario_A)

Sample_Size_of_Scenario_B = 1000
print("Sample Size of Scenario B = ",Sample_Size_of_Scenario_B)

Confidence_Level = 0.95
print("Confidence Level = ",Confidence_Level)


print("\nAssumed Population Mean and Population Standard Deviation :-")
# Assume the true population mean income
Population_Mean = 50000
print("Population Mean = ",Population_Mean)

# Assume the true population standard deviation of income
Population_StdDev = 10000
print("Population Standard Deviation = ",Population_StdDev)


# Calculation
print("\nCalculation :-")
Critical_Value = stats.norm.ppf((1 + Confidence_Level) / 2)
print(f"Critical Value = {Critical_Value:.2f}")

# Scenario A
print("\nScenario A:")
# For illustration, the sample mean is taken to equal the population mean
Sample_Mean_of_Scenario_A = Population_Mean
print("Sample Mean Of Scenario A = ", Sample_Mean_of_Scenario_A)
# Standard error of the mean = standard deviation / sqrt(sample size)
Standard_Error_of_Scenario_A = Population_StdDev / np.sqrt(Sample_Size_of_Scenario_A)
print("Standard Error Of Scenario A = ", Standard_Error_of_Scenario_A)
Margin_of_Error_Scenario_A = Critical_Value * Standard_Error_of_Scenario_A
print(f"Margin of Error Scenario A = {Margin_of_Error_Scenario_A:.2f}")
Confidence_Interval_Scenario_A = (Sample_Mean_of_Scenario_A - Margin_of_Error_Scenario_A, Sample_Mean_of_Scenario_A + Margin_of_Error_Scenario_A)
print(f"Confidence Interval = {Confidence_Interval_Scenario_A[0]:.2f} to {Confidence_Interval_Scenario_A[1]:.2f}")

# Scenario B
print("\nScenario B:")
Sample_Mean_of_Scenario_B = Population_Mean
print("Sample Mean Of Scenario B = ", Sample_Mean_of_Scenario_B)
Standard_Error_of_Scenario_B = Population_StdDev / np.sqrt(Sample_Size_of_Scenario_B)
print("Standard Error Of Scenario B = ", Standard_Error_of_Scenario_B)
Margin_of_Error_Scenario_B = Critical_Value * Standard_Error_of_Scenario_B
print(f"Margin of Error Scenario B = {Margin_of_Error_Scenario_B:.2f}")
Confidence_Interval_Scenario_B = (Sample_Mean_of_Scenario_B - Margin_of_Error_Scenario_B, Sample_Mean_of_Scenario_B + Margin_of_Error_Scenario_B)
print(f"Confidence Interval = {Confidence_Interval_Scenario_B[0]:.2f} to {Confidence_Interval_Scenario_B[1]:.2f}")

# Conclusion
print("\nConclusion :-")
print(f"The Margin of Error in Scenario A '{Margin_of_Error_Scenario_A:.2f}' is larger due to the smaller sample size '{Sample_Size_of_Scenario_A}'.")
print(f"The Margin of Error in Scenario B '{Margin_of_Error_Scenario_B:.2f}' is smaller due to the larger sample size '{Sample_Size_of_Scenario_B}'.")


Given :-
Sample Size of Scenario A =  100
Sample Size of Scenario B =  1000
Confidence Level =  0.95

Assumed Population Mean and Population Standard Deviation :-
Population Mean =  50000
Population Standard Deviation =  10000

Calculation :-
Critical Value = 1.96

Scenario A:
Sample Mean Of Scenario A =  50000
Standard Error Of Scenario A =  1000.0
Margin of Error Scenario A = 1959.96
Confidence Interval = 48040.04 to 51959.96

Scenario B:
Sample Mean Of Scenario B =  50000
Standard Error Of Scenario B =  316.22776601683796
Margin of Error Scenario B = 619.80
Confidence Interval = 49380.20 to 50619.80

Conclusion :-
The Margin of Error in Scenario A '1959.96' is larger due to the smaller sample size '100'.
The Margin of Error in Scenario B '619.80' is smaller due to the larger sample size '1000'.


-------------------------------------------------------------------------------------------------------------------------------------------------

Q.No-09    Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

Ans :-

**The formula for the `Z-Score` :-**

$$ z = \frac{x - \mu}{\sigma} $$

**`Where` :-**
- $ x $ is the data point value (75 in this case)
- $ \mu $ is the population mean (70 in this case)
- $ \sigma $ is the population standard deviation (5 in this case)
- $ z $ is the calculated z-score

**`Let's Calculate the Z-Score` :-**

$$ z = \frac{75 - 70}{5} = \frac{5}{5} = 1 $$

$$ \text{z-score} = 1 $$

In [13]:
data_point = 75
population_mean = 70
population_std_dev = 5

z_score = (data_point - population_mean) / population_std_dev

print("The calculated z-score :", z_score)

The calculated z-score : 1.0


**`Interpretation` :-**

A **Z-Score** of 1 indicates that the data point (75) is 1 **Standard Deviation** above the **Population Mean** (70). In other words, it is higher than the average value by one standard deviation. If the population is approximately normally distributed, this places the data point above the majority of values in the population. Z-scores are useful for understanding how individual data points compare to the overall distribution of the data in terms of standard deviations.
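To put a number on this interpretation, the z-score can be converted into the share of the population that lies below the data point. The sketch below assumes the population is normally distributed, which the question does not state; the name `share_below` is introduced here for illustration.

```python
from scipy import stats

data_point = 75
population_mean = 70
population_std_dev = 5

z_score = (data_point - population_mean) / population_std_dev

# Cumulative probability P(Z <= z) under the standard normal distribution.
# Assumption: the population is (approximately) normal.
share_below = stats.norm.cdf(z_score)

print(f"z-score = {z_score:.1f}")
print(f"Share of the population below the data point = {share_below:.4f}")
```

Under that assumption, roughly 84% of the population falls below a value that is one standard deviation above the mean.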

-------------------------------------------------------------------------------------------------------------------------------------------------

Q.No-10    In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

Ans :-

To conduct a hypothesis test using a t-test to determine if the weight loss drug is significantly effective, we need to set up the null and alternative hypotheses and calculate the t-statistic and p-value based on the sample data. The null hypothesis ($H_0$) usually states that there is no significant effect, while the alternative hypothesis ($H_1$) states that there is a significant effect.

**`Solution` :-**

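**Null Hypothesis** ($H_0$) : The new weight loss drug is not significantly effective, i.e. the mean weight loss is zero ($\mu = 0$).

**Alternative Hypothesis** ($H_1$) : The new weight loss drug is significantly effective, i.e. the mean weight loss differs from zero ($\mu \neq 0$).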

**`Given` :-**
-    Sample size = 50 
-    Sample mean = 6 pounds 
-    Standard deviation = 2.5 pounds

We will use a one-sample t-test for the mean. Since the population standard deviation is not known, we use the t-distribution.

**The formula of the `t-statistic` :-**

$$ t = \frac{\text{sample mean} - \text{population mean}}{\text{sample standard error}} $$

**Where the formula of the `sample standard error` :-**

$$ \text{SE} = \frac{\text{sample standard deviation}}{\sqrt{\text{sample size}}} $$

**For a `95% confidence level`,** 

The critical t-value (two-tailed) with 49 degrees of freedom $\approx 2.0096$

**Let's calculate the `t-statistic` :-**

$$ t = \frac{6 - \text{population mean}}{\frac{2.5}{\sqrt{50}}} $$

**Assuming the `population mean weight loss is 0` (since we're testing if the drug has an effect)**

**The `t-statistic` becomes :-**

$$ t = \frac{6 - 0}{\frac{2.5}{\sqrt{50}}} = \frac{6}{0.35355} \approx 16.9706 $$

**`Since`,**

the calculated t-statistic (16.9706) is much greater than the critical t-value (2.0096),

we can conclude that the weight loss drug is significantly effective at a 95% confidence level.

**`To verify this conclusion`,** 

we can calculate the p-value associated with this t-statistic. The p-value is the probability of observing a t-statistic at least as extreme as the one calculated, assuming the null hypothesis is true. A very small p-value indicates strong evidence against the null hypothesis.

**`Using a t-distribution table or a statistical calculator`,** 

we find that the p-value is essentially 0 (much less than 0.0001). This extremely small p-value further supports the conclusion that the weight loss drug is significantly effective.

In summary, based on the calculated t-statistic and p-value, 

we reject the null hypothesis and conclude that the new weight loss drug is significantly effective at a 95% confidence level.

**`In Python` -**

In [14]:
import scipy.stats as stats

# Given data
sample_mean = 6
sample_std_dev = 2.5
sample_size = 50
population_mean = 0  # Assuming no effect in the population

# Calculate the t-statistic
sample_standard_error = sample_std_dev / (sample_size ** 0.5)
t_statistic = (sample_mean - population_mean) / sample_standard_error

# Degrees of freedom
degrees_of_freedom = sample_size - 1

# Calculate the p-value
p_value = stats.t.sf(abs(t_statistic), df=degrees_of_freedom) * 2  # Two-tailed test

# 95% confidence level critical t-value
critical_t_value = stats.t.ppf(0.975, df=degrees_of_freedom)

# Print results
print(f"t-statistic: {t_statistic:.2f}")
print(f"p-value: {p_value}")
print(f"Critical t-value: {critical_t_value:.2f}")

# Compare |t| with the two-tailed critical t-value and the p-value with the significance level (0.05)
if abs(t_statistic) > critical_t_value and p_value < 0.05:
    print("Reject the null hypothesis. The drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis. The drug is not significantly effective.")

t-statistic: 16.97
p-value: 3.7168840835270203e-22
Critical t-value: 2.01
Reject the null hypothesis. The drug is significantly effective.
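As a complementary view of the same data, the sketch below reports a 95% confidence interval for the mean weight loss and a one-sided p-value. The one-sided alternative ($\mu > 0$) is an assumption made here because "effective" suggests a direction; the variable names `ci_low`, `ci_high` and `p_one_sided` are introduced for illustration.

```python
import math
from scipy import stats

sample_mean = 6
sample_std_dev = 2.5
sample_size = 50

standard_error = sample_std_dev / math.sqrt(sample_size)
degrees_of_freedom = sample_size - 1

# t-statistic against a null mean of 0 (no weight loss)
t_statistic = (sample_mean - 0) / standard_error

# One-sided p-value for H1: mean weight loss > 0 (assumed direction)
p_one_sided = stats.t.sf(t_statistic, degrees_of_freedom)

# 95% confidence interval for the mean weight loss, based on the t-distribution
ci_low, ci_high = stats.t.interval(0.95, degrees_of_freedom,
                                   loc=sample_mean, scale=standard_error)

print(f"One-sided p-value: {p_one_sided}")
print(f"95% confidence interval: {ci_low:.2f} to {ci_high:.2f} pounds")
```

The interval lies well above zero, which agrees with rejecting the null hypothesis. Note that without a control group this only shows that the average loss differs from zero, not that the drug caused it.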


-------------------------------------------------------------------------------------------------------------------------------------------------

Q.No-11    In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

Ans :-

To calculate the 95% confidence interval for the true proportion of people who are satisfied with their job, you can use the formula for the confidence interval of a proportion. The formula is :-

$$ \text{Confidence Interval} = \hat{p} \pm Z \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} $$

Where:
- $\hat{p}$ is the sample proportion (65% or 0.65 in decimal form).
- $Z$ is the critical z-score for a 95% confidence level. For a 95% confidence interval, $Z$ is approximately 1.96.
- $n$ is the sample size (500).

**Now let's put the values in the formula :-**

$\hat{p} = 0.65$

$Z = 1.96$ (for a 95% confidence level)

$n = 500$

$ \text{Confidence Interval} = 0.65 \pm 1.96 \times \sqrt{\frac{0.65 \times (1 - 0.65)}{500}} $

Calculate the standard error :-

$ \text{Standard Error} = \sqrt{\frac{0.65 \times 0.35}{500}} \approx 0.02133 $

Now, plug this value into the confidence interval formula :-

$ \text{Confidence Interval} = 0.65 \pm 1.96 \times 0.02133 \approx 0.65 \pm 0.0418 $

Finally, calculate the lower and upper bounds of the confidence interval :-

$ \text{Lower Bound} : 0.65 - 0.0418 \approx 0.6082 $

$ \text{Upper Bound} : 0.65 + 0.0418 \approx 0.6918 $

So, the 95% confidence interval for the true proportion of people who are satisfied with their job is approximately 0.608 to 0.692, or in percentage form, 60.8% to 69.2%.

**`In Python` -**

In [15]:
import scipy.stats as stats
import math

# Given values
print("Given :-")
sample_proportion = 0.65
print("Sample Proportion = ",sample_proportion)
sample_size = 500
print("Sample Size = ",sample_size)
confidence_level = 0.95
print("Confidence Level = ",confidence_level)


print("\nCalculation :-")
# Calculate the standard error
standard_error = math.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)
print(f"Standard Error = {standard_error:.2f}")

# Calculate the critical z-score for the given confidence level
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)
print(f"Z-Score = {z_score:.2f}")

# Calculate the margin of error
margin_of_error = z_score * standard_error
print(f"Margin of Error = {margin_of_error:.2f}")

# Calculate the confidence interval
lower_bound = sample_proportion - margin_of_error
print(f"Lower Bound = {lower_bound:.2f}")
upper_bound = sample_proportion + margin_of_error
print(f"Upper Bound = {upper_bound:.2f}")

# Print the results
print("\nConclusion :-")

print(f"The {confidence_level*100}% confidence interval for the true proportion of people who are satisfied with their job is approximately {lower_bound:.2f} to {upper_bound:.2f}.")
print(f"Or, In Percentage form -\n {lower_bound*100:.1f}% to {upper_bound*100:.1f}%.")

Given :-
Sample Proportion =  0.65
Sample Size =  500
Confidence Level =  0.95

Calculation :-
Standard Error = 0.02
Z-Score = 1.96
Margin of Error = 0.04
Lower Bound = 0.61
Upper Bound = 0.69

Conclusion :-
The 95.0% confidence interval for the true proportion of people who are satisfied with their job is approximately 0.61 to 0.69.
Or, In Percentage form -
 60.8% to 69.2%.
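The same calculation can be wrapped in a small reusable function. This is a sketch only: `proportion_ci` is a hypothetical helper name, and the "at least 10 expected successes and 10 expected failures" check is a common rule of thumb for the normal approximation, not something stated in the question.

```python
import math
from scipy import stats

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)   # two-sided critical z-score
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

p_hat, n = 0.65, 500

# Rule-of-thumb check that the normal approximation is reasonable
approximation_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10

low, high = proportion_ci(p_hat, n)
print("Normal approximation reasonable:", approximation_ok)
print(f"95% confidence interval: {low:.4f} to {high:.4f}")
```

With 325 expected successes and 175 expected failures, the approximation is comfortably within the rule of thumb for this survey.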


Q.No-12    A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

Ans :-

**Null Hypothesis** ($H_0$) : The two teaching methods produce the same mean score ($\mu_A = \mu_B$).

**Alternative Hypothesis** ($H_1$) : The two teaching methods produce different mean scores ($\mu_A \neq \mu_B$).

The question does not state the sample sizes, so the code below has to assume them, and the conclusion depends on that assumption.

In [16]:
import numpy as np
from scipy import stats
from scipy.stats import t

# Given data 
print("Given :-")

# Mean of Sample A
Mean_of_Sample_A = 85
print(f"Mean of Sample A = {Mean_of_Sample_A}")

# Std Dev. of Sample A
Std_of_Sample_A = 6
print(f'Standard Deviation of Sample A = {Std_of_Sample_A}')

# Mean of Sample B
Mean_of_Sample_B = 82
print(f"Mean of Sample B = {Mean_of_Sample_B}")

# Std Dev. of Sample B
Std_of_Sample_B = 5
print(f'Standard deviation of Sample B = {Std_of_Sample_B}')

# Set the significance level
alpha = 0.01
print(f"Significance Level = {alpha}")


# Calculation
print("\nCalculation :-")

# Calculate the Confidence Level
Confidence_level = 1 - alpha
print(f"Confidence Level = {Confidence_level}")

# Calculate the critical z-score for the given confidence level
z_score = stats.norm.ppf(1 - (1 - Confidence_level) / 2)
print(f"Z-Score = {z_score:.2f}")

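# The question gives no sample sizes, so one is assumed here (same for both groups).
# This formula gives the n at which a z-test on the observed difference would sit
# exactly at the critical value, so the t-test below is borderline by construction.
# n is left unrounded in the later calculations, hence the fractional degrees of freedom.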
n = ((z_score**2)*((Std_of_Sample_A**2)+(Std_of_Sample_B**2)))/(Mean_of_Sample_A - Mean_of_Sample_B)**2
n_A = n
n_B = n
print(f"Sample Size of Sample A = {round(n_A)}")
print(f"Sample Size of Sample B = {round(n_B)}")

# Calculate the t-test statistic
pooled_std = np.sqrt(((n_A - 1) * Std_of_Sample_A ** 2 + (n_B - 1) * Std_of_Sample_B ** 2) / (n_A + n_B - 2))
t_statistic = (Mean_of_Sample_A - Mean_of_Sample_B) / (pooled_std * np.sqrt(1 / n_A + 1 / n_B))

# Degrees of freedom
df = n_A + n_B - 2
print(f"Degrees of freedom = {df}")

# Calculate the critical t-value
critical_t = t.ppf(1 - alpha / 2, df)
print(f"Critical t Value = {critical_t:.2f}")

# Conclusion
print("\nConclusion :-")
# Compare the t-statistic with the critical t-value
if np.abs(t_statistic) > critical_t:
    print("Reject the null hypothesis. There is a significant difference between the teaching methods.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference between the teaching methods.")

Given :-
Mean of Sample A = 85
Standard Deviation of Sample A = 6
Mean of Sample B = 82
Standard deviation of Sample B = 5
Significance Level = 0.01

Calculation :-
Confidence Level = 0.99
Z-Score = 2.58
Sample Size of Sample A = 45
Sample Size of Sample B = 45
Degrees of freedom = 87.93970948050978
Critical t Value = 2.63

Conclusion :-
Fail to reject the null hypothesis. There is no significant difference between the teaching methods.
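Because the sample sizes are not given in the question, the outcome above rests on the assumed group size. The sketch below shows how the decision changes with that assumption, using `scipy.stats.ttest_ind_from_stats` with a pooled variance. The per-group sizes 20, 45 and 100 are hypothetical values chosen only for illustration.

```python
from scipy import stats

mean_a, std_a = 85, 6
mean_b, std_b = 82, 5
alpha = 0.01

decisions = {}
for n in (20, 45, 100):   # hypothetical per-group sample sizes
    result = stats.ttest_ind_from_stats(mean_a, std_a, n,
                                        mean_b, std_b, n,
                                        equal_var=True)   # pooled-variance t-test
    decisions[n] = "reject H0" if result.pvalue < alpha else "fail to reject H0"
    print(f"n = {n:3d}  t = {result.statistic:.3f}  "
          f"p = {result.pvalue:.4f}  -> {decisions[n]}")
```

With 20 or 45 students per group the difference of 3 points is not significant at the 0.01 level, while with 100 per group it is. The real sample sizes are needed before drawing a firm conclusion.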


-------------------------------------------------------------------------------------------------------------------------------------------------

Q.No-13    A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

Ans :-

To calculate the confidence interval for the true population mean, you can use the formula for a confidence interval for the mean when the population standard deviation is known:

$$ \text{Confidence Interval} = \text{Sample Mean} \pm Z \times \frac{\text{Population Standard Deviation}}{\sqrt{\text{Sample Size}}} $$

**`Given` :-**
- Sample Mean = 65 (given)
- Population Standard Deviation = 8 (given)
- Sample Size = 50 (given)
- Z = Z-score corresponding to the desired confidence level (90% in this case)

The Z-score for a 90% confidence level can be found using a standard normal distribution table or calculator. 

For a 90% confidence level, 

$Z \approx 1.645$

**`Now, put the values into the formula` :-**

$ \text{Confidence Interval} = 65 \pm 1.645 \times \frac{8}{\sqrt{50}} $

**`Calculate the standard error` :-**

$ \text{Standard Error} = \frac{\text{Population Standard Deviation}}{\sqrt{\text{Sample Size}}} $

$ \text{Standard Error} = \frac{8}{\sqrt{50}} \approx 1.1314 $

**`Calculate the margin of error` :-**

$ \text{Margin of Error} = Z \times \text{Standard Error} $

$ \text{Margin of Error} = 1.645 \times 1.1314 \approx 1.861 $

**`Now we can calculate the confidence interval` :-**

$ \text{Lower Limit} = \text{Sample Mean} - \text{Margin of Error} $

$ \text{Lower Limit} = 65 - 1.861 \approx 63.139 $


$ \text{Upper Limit} = \text{Sample Mean} + \text{Margin of Error} $

$ \text{Upper Limit} = 65 + 1.861 \approx 66.861 $

**`In Python` -**

In [17]:
import scipy.stats as stats
import math

# Given values
sample_mean = 65
population_std_dev = 8
sample_size = 50
confidence_level = 0.90

# Calculate the Z-score for the given confidence level
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# Calculate the standard error
standard_error = population_std_dev / math.sqrt(sample_size)

# Calculate the margin of error
margin_of_error = z_score * standard_error

# Calculate the confidence interval
lower_limit = sample_mean - margin_of_error
upper_limit = sample_mean + margin_of_error

# Print the results
print("90% Confidence Interval:")
print(f"Lower Limit: {lower_limit:.2f}")
print(f"Upper Limit: {upper_limit:.2f}")

90% Confidence Interval:
Lower Limit: 63.14
Upper Limit: 66.86


The 90% confidence interval for **`the true population mean is approximately 63.14 to 66.86`**. This means we are 90% confident that the true population mean lies within this interval based on the given sample. 

--------------------------------------------------------------------------------------------------------------------

Q.No-14    In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

Ans :-

**Let's Define the `Null hypotheses` and `Alternative hypotheses` :-**

-    Null Hypothesis ($H_0$): Caffeine has no significant effect on reaction time. $μ = μ₀$ (where μ₀ is the population mean reaction time without caffeine).
-    Alternative Hypothesis ($H_1$): Caffeine has a significant effect on reaction time. $μ ≠ μ₀$.

Given:
- Sample mean $(x̄) = 0.25\ seconds$
- Sample standard deviation $(s) = 0.05\ seconds$
- Sample size $(n) = 30$
- Hypothesized population mean $(μ₀)$: not given in the question. For the purpose of the t-test, let's assume $μ₀ = 0.24$ seconds (this is just an example value, and the conclusion depends on it)

We can now calculate the t-test statistic:

$$ t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} $$

$$ t = \frac{0.25 - 0.24}{\frac{0.05}{\sqrt{30}}} $$

$$ t = \frac{0.01}{\frac{0.05}{\sqrt{30}}} $$

$$ t = \frac{0.01}{\frac{0.05}{5.477}} $$

$$ t \approx 1.095 $$

**Comparing the calculated `t-test statistic (1.095)` with the `critical t-value (1.70)` (two-tailed, 29 degrees of freedom) :-**

$$ |t_{\text{calculated}}| < t_{\text{critical}} $$

Since 1.095 < 1.70, we do not have enough evidence to reject the null hypothesis. In other words, at the 90% confidence level and under the assumed $μ₀ = 0.24$ seconds, we cannot conclude that caffeine has a significant effect on reaction time.

In [18]:
import scipy.stats as stats

# Given data
sample_mean = 0.25
sample_std = 0.05
sample_size = 30
confidence_level = 0.90

# Hypothesized population mean under the null hypothesis
null_mean = 0.24

# Calculate the t-test statistic
t_statistic = (sample_mean - null_mean) / (sample_std / (sample_size ** 0.5))
print(f"Calculated t-statistic: {t_statistic:.3f}")

# Calculate the critical t-value
alpha = 1 - confidence_level
degrees_of_freedom = sample_size - 1
critical_t = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)
print(f"Critical t-value: {critical_t:.2f}")

# Compare the t-test statistic with the critical t-value
if abs(t_statistic) > critical_t:
    print("Reject the null hypothesis: Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis: Caffeine does not have a significant effect on reaction time.")

Calculated t-statistic: 1.095
Critical t-value: 1.70
Fail to reject the null hypothesis: Caffeine does not have a significant effect on reaction time.
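Since the baseline reaction time $μ₀$ is an assumed value, it is worth seeing how the result moves with it. The sketch below repeats the test with a two-tailed p-value for a few hypothetical baselines; 0.23 and 0.22 seconds are extra illustrative values, not data from the question.

```python
import math
from scipy import stats

sample_mean = 0.25
sample_std = 0.05
sample_size = 30
alpha = 0.10                      # 90% confidence level

degrees_of_freedom = sample_size - 1
standard_error = sample_std / math.sqrt(sample_size)

decisions = {}
for null_mean in (0.24, 0.23, 0.22):   # hypothetical baseline reaction times
    t_statistic = (sample_mean - null_mean) / standard_error
    p_value = 2 * stats.t.sf(abs(t_statistic), degrees_of_freedom)  # two-tailed
    decisions[null_mean] = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"mu0 = {null_mean:.2f}  t = {t_statistic:.3f}  "
          f"p = {p_value:.4f}  -> {decisions[null_mean]}")
```

Only the baseline of 0.24 seconds leads to "fail to reject"; a baseline just 0.01 seconds lower reverses the decision. The actual reaction time without caffeine is therefore needed for a meaningful conclusion.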


                                        END