## Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test. 

## Ans
------

| Aspect                | t-test                                        | z-test                                   |
|-----------------------|-----------------------------------------------|------------------------------------------|
| Sample size           | Usually used with small sample sizes (<30)    | Typically used with large sample sizes  |
| Population parameters | When population standard deviation is unknown | When population standard deviation is known |
| Distribution         | Student's t-distribution                     | Standard normal distribution            |
| Hypothesis Testing   | Comparing sample mean to population mean     | Comparing sample mean to population mean |

Example Scenarios:

1. **t-test:**
   Scenario: A psychologist wants to compare the average anxiety levels of two groups of participants who underwent different therapeutic treatments for anxiety disorders. The sample sizes are small (n < 30).
   
2. **z-test:**
   Scenario: An economist wants to determine whether the average income of employees in a large company has changed significantly over the last decade. The company has a large number of employees, and historical income data is available, including the population standard deviation.
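
The practical difference between the two tests shows up in the reference distribution used for the p-value. The sketch below is a minimal illustration, not a full analysis: the helper names (`z_test_p_value`, `t_test_p_value`) and all numbers are hypothetical, and it uses a one-sample setting for simplicity.

```python
import math
from scipy import stats


def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z-test; sigma is the KNOWN population SD."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return z, 2 * stats.norm.sf(abs(z))


def t_test_p_value(sample_mean, mu0, s, n):
    """Two-sided one-sample t-test; s is the SAMPLE SD (sigma unknown)."""
    t = (sample_mean - mu0) / (s / math.sqrt(n))
    return t, 2 * stats.t.sf(abs(t), df=n - 1)


# Hypothetical numbers, chosen only to show the difference
z, p_z = z_test_p_value(sample_mean=52, mu0=50, sigma=5, n=20)
t, p_t = t_test_p_value(sample_mean=52, mu0=50, s=5, n=20)

print(f"z = {z:.3f}, p = {p_z:.4f}")
print(f"t = {t:.3f}, p = {p_t:.4f}")
```

With identical inputs the two statistics are equal, but the t-test reports a larger p-value because the t-distribution has heavier tails when the sample is small.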

## Q2: Differentiate between one-tailed and two-tailed tests.

## Ans
-------


| Aspect                   | One-Tailed Test                                       | Two-Tailed Test                            |
|--------------------------|-------------------------------------------------------|--------------------------------------------|
| Hypothesis direction     | Focuses on a specific direction of difference        | Tests for differences in any direction    |
| Null Hypothesis (H0)     | Example: H0: μ1 = μ2 (equal)                         | Example: H0: μ = μ0 (equal)               |
| Alternative Hypothesis   | Example: Ha: μ1 > μ2 (greater than)                 | Example: Ha: μ ≠ μ0 (not equal)          |
| Significance comparison  | Compares test statistic to a single critical value (upper or lower) | Compares test statistic to both critical values |
| Critical Region          | One-sided, either in the upper or lower tail        | Two-sided, split between both tails     |
| Critical Values          | Use a single critical value                        | Use two critical values                   |
| p-value Interpretation  | p-value represents the probability in one tail      | p-value represents both tail probabilities |

Example scenarios:

One-Tailed Test:
Scenario: A new drug is expected to decrease blood pressure. You want to test if the drug has a significant effect in reducing blood pressure. Here, you are interested only in the "less than" direction, as an increase in blood pressure is not relevant.

Two-Tailed Test:
Scenario: A coin is suspected of being biased. You want to test if the coin is fair (equally likely to land heads or tails). Here, you are interested in both directions of the effect: "greater than" or "less than," as you want to see if there's any evidence of bias in either direction.
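
To make the difference concrete, the following sketch computes one-tailed and two-tailed p-values for the same test statistic. The value `z = -1.8` is a hypothetical statistic, and a standard normal reference distribution is assumed.

```python
from scipy import stats

z = -1.8      # hypothetical test statistic
alpha = 0.05  # significance level

p_lower = stats.norm.cdf(z)        # one-tailed, Ha: mu < mu0
p_upper = stats.norm.sf(z)         # one-tailed, Ha: mu > mu0
p_two = 2 * stats.norm.sf(abs(z))  # two-tailed, Ha: mu != mu0

print(f"lower-tail p = {p_lower:.4f}, reject: {p_lower < alpha}")
print(f"upper-tail p = {p_upper:.4f}, reject: {p_upper < alpha}")
print(f"two-tailed p = {p_two:.4f}, reject: {p_two < alpha}")
```

The same statistic can be significant in a one-tailed test and not in a two-tailed test, which is why the direction of the alternative hypothesis should be fixed before looking at the data.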


## Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

## Ans
-------

| Error Type          | Type 1 Error                                      | Type 2 Error                                   |
|---------------------|---------------------------------------------------|------------------------------------------------|
| Definition          | Occurs when you reject a true null hypothesis.    | Occurs when you fail to reject a false null hypothesis. |
| Also Known As       | False Positive                                    | False Negative                                |
| Probability of error | α (alpha), the probability of making this error, typically set by the researcher (e.g., 0.05). | β (beta), the probability of making this error, not directly set by the researcher. |
| Example Scenario    | **Type 1 Error (False Positive):** You conclude that a new drug is effective (reject the null hypothesis) when it actually has no effect (null hypothesis is true). | **Type 2 Error (False Negative):** You conclude that a new drug is not effective (fail to reject the null hypothesis) when it actually is effective (null hypothesis is false). |
| Consequences        | May lead to unnecessary costs or risks if you act on incorrect information. | May lead to missed opportunities or failure to detect real effects. |
| Controlling         | Controlled by setting a lower significance level (α) but increases the risk of Type 2 errors. | Controlled by increasing the sample size, which reduces the risk of Type 2 errors but can be resource-intensive. |
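
The trade-off in the "Controlling" row can be checked numerically. This is a sketch under stated assumptions: an upper-tailed one-sample z-test, a hypothetical true mean shift of 2 units, and a known standard deviation of 10. The function name `type2_error` is illustrative.

```python
import math
from scipy import stats


def type2_error(alpha, effect, sigma, n):
    """Beta for an upper-tailed one-sample z-test when the true mean
    exceeds the null mean by `effect`."""
    z_crit = stats.norm.ppf(1 - alpha)        # rejection threshold under H0
    shift = effect * math.sqrt(n) / sigma     # mean of the z statistic under Ha
    return stats.norm.cdf(z_crit - shift)     # P(fail to reject | Ha is true)


results = {}
for alpha in (0.05, 0.01):
    for n in (25, 100):
        beta = type2_error(alpha, effect=2.0, sigma=10.0, n=n)
        results[(alpha, n)] = beta
        print(f"alpha = {alpha}, n = {n}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Lowering α raises β for a fixed sample size, while increasing the sample size lowers β for a fixed α.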



## Q4: Explain Bayes's theorem with an example.

## Ans
------
Bayes's theorem is a fundamental concept in probability theory and statistics that allows us to update our beliefs about an event based on new evidence. It provides a way to calculate conditional probabilities, which are the probabilities of an event occurring given that another event has already occurred. Bayes's theorem is especially useful in situations where we have prior knowledge or beliefs about the event we are interested in.

The formula for Bayes's theorem is as follows:

$[P(A|B) = \frac{P(B|A) * P(A)}{P(B)}]$

Where:
- \(P(A|B)\) is the probability of event A occurring given that event B has occurred (the conditional probability).
- \(P(B|A)\) is the probability of event B occurring given that event A has occurred.
- \(P(A)\) is the prior probability of event A occurring before we have any knowledge of event B.
- \(P(B)\) is the marginal probability of event B (the evidence), that is, the overall probability of observing B regardless of whether A occurred.

Let's illustrate Bayes's theorem with an example:

**Example: Medical Diagnosis**

Suppose you are a doctor and you are trying to diagnose whether a patient has a rare disease (Disease X). You know the following probabilities:

- The probability that a person has Disease X (prior probability), \(P(Disease X) = 0.01\) (1% of the population has the disease).
- The probability of a positive test result given that a person has Disease X, \(P(Positive Test | Disease X) = 0.95\) (95% true positive rate).
- The probability of a positive test result given that a person does not have Disease X, \(P(Positive Test | No Disease X) = 0.10\) (10% false positive rate).

Now, a patient comes to you with a positive test result (B), and you want to calculate the probability that they actually have Disease X (A), which is \(P(Disease X | Positive Test)\).

Using Bayes's theorem:

$[P(Disease X | Positive Test) = \frac{P(Positive Test | Disease X) * P(Disease X)}{P(Positive Test)}]$

We need to calculate \(P(Positive Test)\), which can be found using the law of total probability:

$[P(Positive Test) = P(Positive Test | Disease X) * P(Disease X) + P(Positive Test | No Disease X) * P(No Disease X)]$

Substituting the known values:

$[P(Positive Test) = (0.95 * 0.01) + (0.10 * 0.99) = 0.0095 + 0.099 = 0.1085]$

Now, we can calculate \(P(Disease X | Positive Test)\):

$[P(Disease X | Positive Test) = \frac{0.95 * 0.01}{0.1085} ≈ 0.0876]$

So, given a positive test result, the probability that the patient actually has Disease X is approximately 8.76%. Bayes's theorem allows us to update our initial belief (prior probability) based on new evidence (the positive test result).
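
The calculation above can be reproduced in a few lines. This is a minimal sketch; the function name `posterior` and its parameter names are illustrative. The second call, which updates again after another positive result, assumes the two tests are conditionally independent given disease status, which the example above does not state.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A | B) via Bayes's theorem, with P(B) from the law of total probability."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Values from the medical diagnosis example
p_first = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.10)
print(f"P(Disease X | Positive Test) = {p_first:.4f}")

# Assumption: a second, independent positive test; the first posterior becomes the new prior
p_second = posterior(prior=p_first, likelihood=0.95, false_positive_rate=0.10)
print(f"After a second positive test: {p_second:.4f}")
```

Feeding the first posterior back in as the prior shows how the belief keeps shifting as evidence accumulates.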


## Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

## Ans
--------
A confidence interval is a range of values calculated from sample data that is likely to contain the true population parameter of interest, such as a population mean or proportion, with a certain level of confidence. In other words, it provides an interval estimate for the population parameter along with a statement about the level of confidence in the estimate.

A typical confidence interval is expressed as:

$[ \text{Point Estimate} \pm \text{Margin of Error} ]$

- The "Point Estimate" is the sample statistic that serves as our best guess for the population parameter. For example, the sample mean (x̄) is often used as a point estimate for the population mean (μ).

- The "Margin of Error" is a range that accounts for the variability and uncertainty in the sample data. It is typically calculated using a critical value from a probability distribution (e.g., the t-distribution or z-distribution) and the standard error of the sample statistic.

To calculate a confidence interval, follow these general steps:

1. Choose a confidence level (e.g., 95%, 99%, etc.). This is the long-run proportion of intervals, constructed this way from repeated samples, that would contain the true population parameter.

2. Determine the appropriate probability distribution and critical value(s) associated with the chosen confidence level. For example, for a 95% confidence interval with a normal distribution, you might use a critical value of 1.96.

3. Calculate the point estimate using your sample data. For example, if you are estimating the population mean, calculate the sample mean.

4. Calculate the standard error of the sample statistic, which depends on the sample size and the population parameter you are estimating.

5. Calculate the margin of error by multiplying the standard error by the critical value from step 2.

6. Construct the confidence interval by adding and subtracting the margin of error from the point estimate.

Here's an example:

Suppose you want to estimate the average height of adult males in a certain city. You take a random sample of 100 adult males and measure their heights. The sample mean height is 175 cm, and the sample standard deviation is 6 cm. You want to calculate a 95% confidence interval for the average height.

1. Choose a 95% confidence level.

2. Because you have a sample size of 100, you can approximate the sampling distribution of the sample mean as a normal distribution. The critical value for a 95% confidence interval with a normal distribution is 1.96 (you can find this value from a standard normal distribution table or calculator).

3. The point estimate is the sample mean, which is 175 cm.

4. Calculate the standard error: $(SE = \frac{\text{Sample Standard Deviation}}{\sqrt{\text{Sample Size}}} = \frac{6}{\sqrt{100}} = 0.6)$.

5. Calculate the margin of error: $(ME = \text{Critical Value} \times SE = 1.96 \times 0.6 = 1.176)$.

6. Construct the 95% confidence interval: \(175 \pm 1.176\), which gives you the interval (173.824, 176.176).

This means you can be 95% confident that the true average height of adult males in the city falls within the range of 173.824 cm to 176.176 cm based on your sample data.
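
The six steps can be sketched as a small function. The name `mean_confidence_interval` is illustrative, and the normal approximation is assumed, as in the worked example.

```python
import math
from scipy import stats


def mean_confidence_interval(mean, sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    z = stats.norm.ppf((1 + confidence) / 2)  # critical value (about 1.96 for 95%)
    standard_error = sd / math.sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin


# Values from the height example: mean 175 cm, SD 6 cm, n = 100
low, high = mean_confidence_interval(mean=175, sd=6, n=100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

For small samples, replacing `stats.norm.ppf` with `stats.t.ppf(..., df=n - 1)` would give the t-based interval instead.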

## Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

## Ans
-----
Bayes' Theorem gives the probability of an event given prior knowledge of the event's probability and new evidence.

**Bayes' Theorem** is expressed as:

$[ P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)} ]$

Where:
- $( P(A | B) )$ is the probability of event A occurring given that event B has occurred (the conditional probability).
- $( P(B | A) )$ is the probability of event B occurring given that event A has occurred.
- $( P(A) )$ is the prior probability of event A occurring before we have any knowledge of event B.
- $( P(B) )$ is the marginal probability of event B (the evidence), that is, the overall probability of observing B regardless of whether A occurred.

**Example Problem:**

Suppose we have a deck of cards, and we want to find the probability of drawing a red card (Event A) given that the card drawn is a face card (Event B). We know the following probabilities:

- \( P(A) \): The prior probability of drawing a red card from a standard deck of 52 cards is $( \frac{26}{52} = \frac{1}{2} )$ because there are 26 red cards (hearts and diamonds) out of 52.
- \( P(B | A) \): The probability of drawing a face card given that the card is red is $( \frac{6}{26} )$ because there are 6 face cards (king, queen, and jack of hearts and diamonds) among the 26 red cards.
- $( P(B | \neg A) )$: The probability of drawing a face card given that the card is not red is $( \frac{6}{26} )$ because there are 6 face cards (king, queen, and jack of spades and clubs) among the 26 non-red cards.

We want to calculate \( P(A | B) \), which is the probability of drawing a red card given that the card drawn is a face card.

**Solution:**

Using Bayes' Theorem:

$[ P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)} ]$

We need to calculate \( P(B) \), which can be found using the law of total probability:

$[ P(B) = P(B | A) \cdot P(A) + P(B | \neg A) \cdot P(\neg A) ]$

$( P(\neg A) )$ is the probability of not drawing a red card, which is $( 1 - P(A) = \frac{1}{2} )$ since there are 26 black cards (spades and clubs) out of 52.

Now, we can calculate \( P(B) \):

$[ P(B) = \left(\frac{6}{26} \cdot \frac{1}{2}\right) + \left(\frac{6}{26} \cdot \frac{1}{2}\right) = \frac{3}{26} + \frac{3}{26} = \frac{3}{13} ]$

Finally, we can calculate \( P(A | B) \):

$[ P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)} = \frac{\frac{6}{26} \cdot \frac{1}{2}}{\frac{3}{13}} = \frac{3}{26} \cdot \frac{13}{3} = \frac{1}{2} ]$


So, the probability of drawing a red card given that the card drawn is a face card is \( \frac{1}{2} \) or 50%. This agrees with direct counting: 6 of the 12 face cards in the deck are red.
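
As a cross-check, the conditional probability can be computed by enumerating a standard 52-card deck and comparing the direct count with the Bayes' Theorem calculation. This is a sketch; the helper names `is_red` and `is_face` are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "spades", "clubs"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def is_red(card):
    return card[1] in {"hearts", "diamonds"}


def is_face(card):
    return card[0] in {"J", "Q", "K"}


red_cards = [c for c in deck if is_red(c)]
face_cards = [c for c in deck if is_face(c)]

# Direct count: fraction of face cards that are red
p_direct = Fraction(sum(is_red(c) for c in face_cards), len(face_cards))

# Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_a = Fraction(len(red_cards), len(deck))
p_b = Fraction(len(face_cards), len(deck))
p_b_given_a = Fraction(sum(is_face(c) for c in red_cards), len(red_cards))
p_bayes = p_b_given_a * p_a / p_b

print(f"P(B) = {p_b}, direct = {p_direct}, Bayes = {p_bayes}")
```

Both routes give the same answer because colour and face status are independent in a standard deck, so knowing the card is a face card does not change the probability that it is red.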

## Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

```python
import scipy.stats as stats

# Sample statistics
sample_mean = 50
sample_stddev = 5
sample_size = 25  # assumed; the question does not specify a sample size

# Confidence level
confidence_level = 0.95

# Standard error of the mean
standard_error = sample_stddev / (sample_size**0.5)

# Margin of error, using the t critical value with n - 1 degrees of freedom
margin_of_error = stats.t.ppf((1 + confidence_level) / 2, df=sample_size-1) * standard_error

# Confidence interval bounds
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"95% Confidence Interval: ({lower_bound}, {upper_bound})")
```

```text
95% Confidence Interval: (47.93610143837198, 52.06389856162802)
```

Interpretation: assuming a sample size of 25 (the question does not give one), we can be 95% confident that the true population mean lies between about 47.94 and 52.06.


## Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

## Ans 
------
The margin of error (MOE) in a confidence interval is a measure of the precision or uncertainty associated with the estimate of a population parameter (e.g., mean or proportion) based on a sample. It quantifies the range within which we expect the true population parameter to fall with a certain level of confidence. In other words, it represents the "wiggle room" around the point estimate.

The key points about the margin of error are:

1. **Larger margin of error:** A larger margin of error implies greater uncertainty in the estimate. It means that the range around the point estimate is wider, and we are less confident about the location of the true population parameter.

2. **Smaller margin of error:** Conversely, a smaller margin of error indicates higher precision and less uncertainty. The range around the point estimate is narrower, and we have greater confidence that the true population parameter is closer to the point estimate.

**Sample Size's Effect on Margin of Error:**

Sample size has a significant impact on the margin of error. As the sample size increases:

- The margin of error decreases, leading to a more precise estimate.
- The range around the point estimate narrows, meaning that we have more confidence in the accuracy of the estimate.
- The sampling distribution of the sample statistic (e.g., sample mean) becomes more concentrated around the true population parameter.

**Example Scenario:**

Let's consider an example where a larger sample size would result in a smaller margin of error:

**Scenario:** Suppose you want to estimate the average time it takes for customers to complete a specific task on your website. You take two samples, one with a sample size of 50 and another with a sample size of 500, and calculate 95% confidence intervals for both.

- For the sample with a size of 50, you find a 95% confidence interval of (30 seconds, 40 seconds) with a margin of error of 5 seconds.
- For the sample with a size of 500, you find a 95% confidence interval of (32 seconds, 38 seconds) with a margin of error of 3 seconds.

In this scenario, the larger sample size of 500 resulted in a smaller margin of error (3 seconds) compared to the smaller sample size of 50 (5 seconds). This indicates that the estimate of the average time taken by customers is more precise with the larger sample size, and you can be more confident about the accuracy of the estimate.
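
The relationship between sample size and margin of error can be sketched directly from the formula. The standard deviation of 18 seconds is a hypothetical value chosen for illustration, and the function name `margin_of_error` is illustrative; the t critical value is used because the standard deviation is treated as a sample estimate.

```python
import math
from scipy import stats


def margin_of_error(sd, n, confidence=0.95):
    """Margin of error for a mean, using the t critical value with n - 1 df."""
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return t_crit * sd / math.sqrt(n)


sd = 18.0  # hypothetical sample standard deviation, in seconds
sizes = (50, 500, 5000)
margins = [margin_of_error(sd, n) for n in sizes]

for n, m in zip(sizes, margins):
    print(f"n = {n:>4}: margin of error = {m:.2f} seconds")
```

Because the margin of error scales with the inverse square root of n, a tenfold increase in sample size shrinks it by a factor of roughly 3.2, not 10.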

## Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

## Ans
-------
Interpretation:
A z-score of 1 means that the data point (75) is 1 standard deviation above the mean (70) in the population. In other words, it tells you how far the data point is from the average (mean) in terms of standard deviations. A positive z-score indicates that the data point is above the mean, while a negative z-score would indicate that it's below the mean.


```python
from scipy.stats import norm

# Define the values
x = 75  # Data point value
mu = 70  # Population mean
sigma = 5  # Population standard deviation

# Calculate the z-score
z = (x - mu) / sigma
print(f"The z-score is: {z}")

# Calculate the percentage of values below a z-score of 1
percent_below = norm.cdf(z) * 100

print(f"The percentage of values below a z-score of 1 is: {percent_below:.2f}%")
```

```text
The z-score is: 1.0
The percentage of values below a z-score of 1 is: 84.13%
```


## Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

**Step 1: Define the Null Hypothesis (H0) and Alternative Hypothesis (H1):**

- Null Hypothesis (H0): The new weight loss drug is not significantly effective; it does not lead to a significant reduction in weight. In mathematical terms, H0: μ = μ0 (where μ0 represents the population mean weight loss with no effect, typically μ0 = 0 in this context).

- Alternative Hypothesis (H1): The new weight loss drug is significantly effective; it leads to a significant reduction in weight. In mathematical terms, H1: μ ≠ μ0 (indicating a two-tailed test because we're interested in whether the drug has any effect, either positive or negative).

**Step 2: Set the Significance Level (α):**

You've mentioned a 95% confidence level, so the significance level (α) is 0.05. This means you want 95% confidence in your results, leaving 5% for the possibility of making a Type I error (rejecting the null hypothesis when it's true).

**Step 3: Collect Data and Calculate Test Statistic:**

You already have the sample data:
- Sample Mean (x̄) = 6 pounds
- Sample Standard Deviation (s) = 2.5 pounds
- Sample Size (n) = 50

Now, calculate the test statistic (t-statistic) using the formula for a one-sample t-test:

$[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} ]$

Substitute the values:

$[ t = \frac{6 - 0}{2.5 / \sqrt{50}} = \frac{6}{2.5 / 7.0711} \approx 16.9706 ]$

**Step 4: Calculate the Degrees of Freedom:**

For a one-sample t-test, degrees of freedom (df) is equal to (n - 1). In this case, df = 50 - 1 = 49.

**Step 5: Find the Critical Value:**

With a significance level of 0.05 and 49 degrees of freedom, you can find the critical t-value from a t-table or calculator. For a two-tailed test, you'll find the critical values at the 0.025 and 0.975 percentiles.

In a standard t-table, the critical t-values for a 95% confidence interval with 49 degrees of freedom are approximately -2.0096 and 2.0096.

**Step 6: Compare the Test Statistic and Critical Value:**

Compare the calculated t-statistic (16.9706) with the critical t-values (-2.0096 and 2.0096). Since the calculated t-statistic is much larger than the critical values, you reject the null hypothesis.

**Step 7: Make a Conclusion:**

Based on the test results, you can conclude that the new weight loss drug is significantly effective at a 95% confidence level. The sample data provides strong evidence that the drug leads to a significant reduction in weight.

```python
import numpy as np
from scipy.stats import t

# Null hypothesis: the drug is not significantly effective
# Alternative hypothesis: the drug is significantly effective
alpha = 0.05  # significance level
null_hypothesis = "the drug is not significantly effective"
alternative_hypothesis = "the drug is significantly effective"

mu = 0            # mean weight loss under the null hypothesis
sample_mean = 6
sample_std = 2.5
n = 50
df = n - 1  # degrees of freedom
sqrt_n = np.sqrt(n)

# Calculate the t-score and two-tailed p-value
# (1 - cdf rounds to 0.0 for a statistic this large; the true p-value is tiny but nonzero)
t_score = (sample_mean - mu) / (sample_std / sqrt_n)
p_value = 2 * (1 - t.cdf(abs(t_score), df))

# Compare p-value with alpha and make a conclusion
print(f"t-statistic: {t_score:.4f}")
print(f"p-value: {p_value}")
if p_value < alpha:
    print(f"Reject the null hypothesis. {alternative_hypothesis}.")
else:
    print(f"Fail to reject the null hypothesis. {null_hypothesis}.")
```

```text
t-statistic: 16.9706
p-value: 0.0
Reject the null hypothesis. the drug is significantly effective.
```


## Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

## Ans
-----
To calculate the 95% confidence interval for the true proportion of people who are satisfied with their job, you can use the following formula for the confidence interval of a population proportion:

$[ \text{Confidence Interval} = \hat{p} \pm Z \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} ]$

Where:
- $ (\hat{p})$ is the sample proportion (in this case, 65% or 0.65).
- $ (Z)$ is the critical value for the desired confidence level (for a 95% confidence interval, \(Z\) is approximately 1.96, which corresponds to the standard normal distribution).
- $ (n)$ is the sample size (500 in this case).

Now, let's calculate the confidence interval:

$[ \text{Confidence Interval} = 0.65 \pm 1.96 \cdot \sqrt{\frac{0.65 \cdot (1 - 0.65)}{500}} ]$

Calculations:
- $( \sqrt{\frac{0.65 \cdot (1 - 0.65)}{500}} = \sqrt{0.000455} )$ is approximately 0.02133 (rounded to five decimal places).

Now, plug in the values:

$[ \text{Confidence Interval} = 0.65 \pm 1.96 \cdot 0.02133 ]$

Now, calculate the upper and lower bounds of the confidence interval:

Lower Bound:
$[ 0.65 - 1.96 \cdot 0.02133 \approx 0.65 - 0.0418 = 0.6082 ]$


Upper Bound:
$[ 0.65 + 1.96 \cdot 0.02133 \approx 0.65 + 0.0418 = 0.6918 ]$

So, the 95% confidence interval for the true proportion of people who are satisfied with their job is approximately (0.6082, 0.6918), or (60.82%, 69.18%). This means that we can be 95% confident that the true proportion falls within this range based on the survey results.

```python
import math

p = 0.65  # sample proportion
n = 500  # sample size
z_alpha = 1.96  # z-score for 95% confidence level

# Calculate the standard error
se = math.sqrt((p * (1 - p)) / n)

# Calculate the margin of error
me = z_alpha * se

# Calculate the confidence interval
lower_bound = p - me
upper_bound = p + me

# Print the results
print(f"The 95% confidence interval for the proportion of people who are satisfied with their job is ({lower_bound*100:.2f}%, {upper_bound*100:.2f}%)")
```

```text
The 95% confidence interval for the proportion of people who are satisfied with their job is (60.82%, 69.18%)
```


## Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

## Ans
---------

**Step 1: Define the Null Hypothesis (H0) and Alternative Hypothesis (H1):**

- Null Hypothesis (H0):  H0: μA - μB = 0, where μA represents the population mean for Sample A and μB represents the population mean for Sample B.

- Alternative Hypothesis (H1): H1: μA - μB ≠ 0, indicating a two-tailed test because you want to test if there is any difference, either positive or negative.

**Step 2: Set the Significance Level (α):**

The question specifies a significance level of 0.01, so α = 0.01. This represents the probability of making a Type I error (rejecting the null hypothesis when it's true).

**Step 3: Collect Data and Calculate Test Statistic:**

- Sample A: 
  - Mean (x̄A) = 85
  - Standard Deviation (sA) = 6
- Sample B:
  - Mean (x̄B) = 82
  - Standard Deviation (sB) = 5

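The question does not give the sample sizes, so assume that both samples have the same size, nA = nB = 30.

Now, calculate the test statistic (t-statistic) using the formula for an independent two-sample t-test:

$[ t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}} ]$

Substitute the values:

$[ t = \frac{85 - 82}{\sqrt{\frac{6^2}{30} + \frac{5^2}{30}}} = \frac{3}{\sqrt{2.0333}} \approx \frac{3}{1.4259} \approx 2.104 ]$

With equal sample sizes, this statistic is identical to the pooled-variance t-statistic, so the pooled degrees of freedom are used in the next step.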

**Step 4: Find the Degrees of Freedom and Critical Value:**

Degrees of Freedom:
\[ df = nA + nB - 2 = 30 + 30 - 2 = 58 \]

Critical Value:

Since the significance level is 0.01 and the test is two-tailed, find the critical t-value for α/2 = 0.005 and df = 58. From a t-table or calculator, this critical value is approximately 2.663.

**Step 5: Compare the Test Statistic and Critical Value:**

If the absolute value of the t-statistic is greater than the critical t-value, reject the null hypothesis. Here |t| = 2.104 < 2.663, so the test statistic does not fall in the rejection region.

**Step 6: Make a Conclusion:**

Fail to reject the null hypothesis. At the 0.01 significance level, and with the assumed sample sizes of 30 per group, the data do not show a significant difference in student performance between the two teaching methods. The p-value of about 0.040 would be significant at α = 0.05 but not at α = 0.01.


```python
import numpy as np
from scipy.stats import t

# Sample A
x1 = 85
s1 = 6
n1 = 30  # assumed sample size

# Sample B
x2 = 82
s2 = 5
n2 = 30  # assumed sample size

# Set up null and alternative hypotheses
# H0: mu1 = mu2 (the means are equal)
# Ha: mu1 != mu2 (the means are not equal)
alpha = 0.01
null_hypothesis = "mu1 = mu2 (the means are equal)"
alternative_hypothesis = "mu1 != mu2 (the means are not equal)"

# Calculate pooled standard deviation
sp = np.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
# Calculate t-statistic
t_stat = (x1 - x2) / (sp * np.sqrt(1 / n1 + 1 / n2))

# Calculate degrees of freedom
df = n1 + n2 - 2

# Calculate critical t-value (two-tailed)
t_crit = t.ppf(1 - alpha / 2, df)

# Calculate two-tailed p-value
p_value = 2 * (1 - t.cdf(abs(t_stat), df))

# Print results
print(f"t-statistic: {t_stat:.3f}")
print(f"Degrees of freedom: {df}")
print(f"Critical t-value: {t_crit:.3f}")
print(f"p-value: {p_value:.3f}")

if abs(t_stat) > t_crit:
    print("Reject null hypothesis.")
    print(alternative_hypothesis)
else:
    print("Fail to reject null hypothesis.")
    print(null_hypothesis)
```

```text
t-statistic: 2.104
Degrees of freedom: 58
Critical t-value: 2.663
p-value: 0.040
Fail to reject null hypothesis.
mu1 = mu2 (the means are equal)
```


## Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

## Ans
-----
$[ \text{Confidence Interval} = \bar{X} \pm Z \cdot \frac{\sigma}{\sqrt{n}} ]$

Where:
- $(\bar{X})$ is the sample mean (65 in this case).
- \(Z\) is the critical value corresponding to the desired confidence level (90% or 0.90).
- $(\sigma)$ is the population standard deviation (8 in this case).
- \(n\) is the sample size (50 in this case).

**Step 1: Find the Critical Value (Z) for a 90% Confidence Interval:**

You can find the critical value (Z) from the standard normal distribution table or using a calculator. For a 90% confidence interval, the critical value is approximately 1.645.

**Step 2: Plug in the Values and Calculate:**

Now, you can plug in the values into the formula and calculate the confidence interval:

$[ \text{Confidence Interval} = 65 \pm 1.645 \cdot \frac{8}{\sqrt{50}} ]$

Calculations:
- $( \frac{8}{\sqrt{50}} )$ is approximately 1.1314 (rounded to four decimal places).

Now, calculate the upper and lower bounds of the confidence interval:

Lower Bound:
$[ 65 - 1.645 \cdot 1.1314 \approx 65 - 1.8611 \approx 63.1389 ]$

Upper Bound:
$[ 65 + 1.645 \cdot 1.1314 \approx 65 + 1.8611 \approx 66.8611 ]$

So, the 90% confidence interval for the true population mean is approximately (63.14, 66.86).

The cell below uses the t critical value (df = 49, about 1.677) instead of z = 1.645, which gives a slightly wider interval. Because the population standard deviation is known here, the z-based interval above is the standard choice.

```python
import math
from scipy.stats import t

pop_mean = 60  # given in the question; not needed to build the interval
pop_std = 8
n = 50
sample_mean = 65
confidence_level = 0.90

# Find the critical t-value
df = n - 1
t_crit = t.ppf((1+confidence_level)/2, df)
print(f't statistic for {n} samples with {confidence_level*100:.0f}% confidence is : {t_crit:.4f}')

# Calculate the margin of error
margin_of_error = t_crit * (pop_std / math.sqrt(n))

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

# Print the results
print(f"The {confidence_level*100:.0f}% confidence interval for the true population mean is ({lower_bound:.3f}, {upper_bound:.3f})")
```

```text
t statistic for 50 samples with 90% confidence is : 1.6766
The 90% confidence interval for the true population mean is (63.103, 66.897)
```


## Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

## Ans
------

**Step 1: Define the Null Hypothesis (H0) and Alternative Hypothesis (H1):**

- Null Hypothesis (H0): Caffeine has no significant effect on reaction time; the mean reaction time with caffeine ($\mu$) is equal to the mean reaction time without caffeine ($\mu_0$). In mathematical terms, H0: $(\mu = \mu_0)$.

- Alternative Hypothesis (H1): Caffeine has a significant effect on reaction time; the mean reaction time with caffeine ($\mu$) is not equal to the mean reaction time without caffeine ($\mu_0$). In mathematical terms, H1: $(\mu \neq \mu_0)$.

This is a two-tailed test because you want to test if there is any difference, either positive or negative.

**Step 2: Set the Significance Level (\(\alpha\)):**

The question specifies a confidence level of 90%, so the significance level is $(\alpha = 1 - 0.90 = 0.10)$. This represents the probability of making a Type I error (rejecting the null hypothesis when it's true).

**Step 3: Collect Data and Calculate Test Statistic:**

You have the following data for the sample:
- Sample Mean ($\bar{X}$) = 0.25 seconds
- Sample Standard Deviation ($s$) = 0.05 seconds
- Sample Size ($n$) = 30

Now, calculate the test statistic (t-statistic) using the formula for a one-sample t-test:

$[ t = \frac{(\bar{X} - \mu_0)}{\frac{s}{\sqrt{n}}} ]$

Substitute the values. The question does not state the baseline mean reaction time $\mu_0$, so the code below assumes $\mu_0 = 0.3$ seconds:

$[ t = \frac{(0.25 - 0.30)}{\frac{0.05}{\sqrt{30}}} \approx \frac{-0.05}{0.00913} \approx -5.477 ]$

**Step 4: Find the Degrees of Freedom and Critical Value:**

For a one-sample t-test, degrees of freedom (df) is equal to $n - 1$. In this case, df = 30 - 1 = 29.

Next, find the critical t-values based on the significance level and degrees of freedom. Since it's a two-tailed test, use $\alpha/2 = 0.10/2 = 0.05$ in each tail. With df = 29, the critical values are approximately ±1.699.

**Step 5: Compare the Test Statistic and Critical Values:**

If the absolute value of the t-statistic is greater than the critical t-value, reject the null hypothesis. Here |t| = 5.477 > 1.699, so the test statistic falls in the rejection region.

**Step 6: Make a Conclusion:**

Reject the null hypothesis. Under the assumed baseline of 0.3 seconds, caffeine has a significant effect on reaction time at the 0.10 significance level; the sample mean of 0.25 seconds indicates faster reactions.



```python
import math
import scipy.stats as stats


null_hypothesis = "Caffeine has no significant effect on reaction time"
alternative_hypothesis = "Caffeine has a significant effect on reaction time"

sample_mean = 0.25     # sample mean
sample_std_dev = 0.05  # sample standard deviation
n = 30                 # sample size
pop_mean = 0.3         # population mean under null hypothesis (assumed; not given in the question)
alpha = 0.1            # significance level

# Calculate the t-statistic
t_stat = (sample_mean - pop_mean) / (sample_std_dev / math.sqrt(n))

# Calculate the critical t-value (two-tailed)
t_crit = stats.t.ppf(1-alpha/2, n-1)

# Calculate the confidence interval
margin_of_error = t_crit * (sample_std_dev / math.sqrt(n))
lower_ci = sample_mean - margin_of_error
upper_ci = sample_mean + margin_of_error

# Print the results
print(f"t-statistic: {t_stat:.3f}")
print(f"t-critical value: {t_crit:.3f}")
print(f"90% Confidence Interval: ({lower_ci:.3f}, {upper_ci:.3f})")

# Determine if the null hypothesis should be rejected or not
if abs(t_stat) > t_crit:
    print("Reject the null hypothesis")
    print(alternative_hypothesis)
else:
    print("Fail to reject the null hypothesis")
    print(null_hypothesis)
```

```text
t-statistic: -5.477
t-critical value: 1.699
90% Confidence Interval: (0.234, 0.266)
Reject the null hypothesis
Caffeine has a significant effect on reaction time
```
