---
#**Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.**

- **T-test:** A t-test is used when the population standard deviation is unknown and has to be estimated from the sample, which matters most when the sample is small (typically fewer than 30 observations). It can compare a sample mean to a hypothesized value, or the means of two samples, to determine whether the difference is statistically significant.
  Example: Comparing the exam scores of two groups of students (one group taught with method A and another with method B).

- **Z-test:** A z-test is used when the sample size is large (typically greater than 30) and the population standard deviation is known. It compares a sample mean to a population mean to determine if there is a significant difference.
  Example: Testing whether the average height of a sample of individuals is significantly different from the population average height.
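
For the one-sample case, the two test statistics have the same form and differ only in which standard deviation is used and which reference distribution applies. As a sketch, with \( \mu_0 \) denoting the hypothesized mean and \( s \) the sample standard deviation (notation introduced here for illustration):
\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}, \qquad t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
The z statistic is compared against the standard normal distribution, and the t statistic against a t distribution with \( n - 1 \) degrees of freedom.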

---
#**Q2: Differentiate between one-tailed and two-tailed tests.**

- **One-tailed test:** A one-tailed test is used when the direction of the difference or effect is specified before conducting the test. It examines if the sample statistic falls significantly above or below a specified value in a single direction.
- **Two-tailed test:** A two-tailed test is used when the direction of the difference or effect is not specified before conducting the test. It examines whether the sample statistic differs significantly from a specified value in either direction.
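
The practical consequence can be seen by computing both p-values for the same test statistic. The sketch below uses a hypothetical z statistic of 1.8 (a value chosen only for illustration) and the standard normal distribution from Python's standard library:

```python
from statistics import NormalDist

# Hypothetical z statistic, chosen only to illustrate the difference.
z = 1.8
alpha = 0.05

std_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed (upper) p-value: P(Z >= z)
p_one_tailed = 1 - std_normal.cdf(z)

# Two-tailed p-value: P(|Z| >= |z|)
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

print(f"one-tailed p = {p_one_tailed:.4f}, reject: {p_one_tailed < alpha}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject: {p_two_tailed < alpha}")
```

With this statistic the one-tailed test rejects at the 5% level while the two-tailed test does not, which is why the direction has to be fixed before looking at the data.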

---
#**Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.**

- **Type 1 error:** A Type 1 error occurs when the null hypothesis is rejected even though it is actually true. It is a false positive, and its probability is the significance level \( \alpha \) of the test.
  Example: Concluding that a new drug is effective (rejecting the null hypothesis) when it actually has no effect.

- **Type 2 error:** A Type 2 error occurs when the null hypothesis is not rejected even though it is actually false. It is a false negative, and its probability is usually written \( \beta \); the power of the test is \( 1 - \beta \).
  Example: Failing to conclude that a new drug is effective (not rejecting the null hypothesis) when it actually has a positive effect.
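
A small simulation can make the Type 1 error rate concrete. The sketch below is illustrative only: it assumes normally distributed data with a known standard deviation of 1, generates samples for which the null hypothesis (mean 0) is true by construction, and counts how often a two-tailed z-test at the 5% level rejects anyway.

```python
import random
from statistics import NormalDist

rng = random.Random(0)   # fixed seed so the run is repeatable
alpha = 0.05
n = 30                   # observations per simulated sample
trials = 2000            # number of simulated experiments
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value

false_positives = 0
for _ in range(trials):
    # The null hypothesis is true by construction: mean 0, sigma 1.
    sample = [rng.gauss(0, 1) for _ in range(n)]
    sample_mean = sum(sample) / n
    z = sample_mean / (1 / n ** 0.5)  # z-test with known sigma = 1
    if abs(z) > z_crit:
        false_positives += 1

type1_rate = false_positives / trials
print(f"Observed Type 1 error rate: {type1_rate:.3f}")
```

The observed rate should land close to the chosen significance level of 0.05, since every rejection here is a false positive.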

---
#**Q4: Explain Bayes's theorem with an example.**

Bayes's theorem calculates the probability of an event occurring given prior knowledge of conditions related to the event. It is expressed as:
\[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} \]
Where:
- \( P(A|B) \) is the probability of event A given that event B has occurred.
- \( P(B|A) \) is the probability of event B given that event A has occurred.
- \( P(A) \) and \( P(B) \) are the probabilities of events A and B, respectively.

Example: Suppose we want to find the probability of a person having a certain disease given that they test positive. Let:
- \( P(D) \) be the probability of having the disease.
- \( P(Pos|D) \) be the probability of testing positive given that the person has the disease.
- \( P(Pos) \) be the probability of testing positive.
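
Applying the theorem to these quantities gives the probability of disease after a positive test. The expansion of \( P(Pos) \) uses the law of total probability, with \( \neg D \) (not having the disease) introduced here as extra notation:
\[ P(D|Pos) = \frac{P(Pos|D) \times P(D)}{P(Pos)} \]
\[ P(Pos) = P(Pos|D) \times P(D) + P(Pos|\neg D) \times P(\neg D) \]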

---
#**Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.**

A confidence interval is a range of values, computed from sample statistics, that is constructed so that a stated proportion of such intervals (the confidence level, for example 95%) would contain the true population parameter under repeated sampling. Its width reflects both the variability in the data and the sample size.

Example: Calculating a 95% confidence interval for the population mean weight of apples in a basket, given a sample mean weight of 150 grams and a sample standard deviation of 10 grams.
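
The example does not state a sample size, so the sketch below assumes 100 apples (a hypothetical value) and uses the large-sample normal approximation. For a small sample, a t critical value with \( n - 1 \) degrees of freedom would be the more appropriate choice.

```python
from statistics import NormalDist

sample_mean = 150.0   # grams, from the example
sample_std = 10.0     # grams, from the example
n = 100               # ASSUMED sample size; not given in the example
confidence_level = 0.95

# Two-sided critical value of the standard normal, about 1.96
z_crit = NormalDist().inv_cdf((1 + confidence_level) / 2)
margin_of_error = z_crit * sample_std / n ** 0.5
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the interval is roughly 148.04 to 151.96 grams.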

---
#**Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.**

Example: Suppose a test for a certain disease is 99% accurate. If 1% of the population has the disease, what is the probability that a person has the disease given that they test positive?
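
One possible solution, as a sketch: "99% accurate" is read here as meaning that both the sensitivity \( P(Pos|D) \) and the specificity \( P(Neg|\neg D) \) are 0.99. That reading is an assumption, since the problem does not say which error rate the figure refers to.

```python
prevalence = 0.01     # P(D), from the problem
sensitivity = 0.99    # P(Pos | D), ASSUMED from "99% accurate"
specificity = 0.99    # P(Neg | not D), ASSUMED from "99% accurate"

false_positive_rate = 1 - specificity               # P(Pos | not D)
p_pos = (sensitivity * prevalence
         + false_positive_rate * (1 - prevalence))  # P(Pos), total probability
posterior = sensitivity * prevalence / p_pos        # P(D | Pos), Bayes' theorem

print(f"P(disease | positive) = {posterior:.2f}")
```

Under these assumptions the answer is about 50%. Because the disease is rare, false positives among the healthy 99% of the population are about as numerous as true positives among the 1% who have the disease.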

---
#**Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.**

A 95% confidence interval for the population mean is calculated as \(\bar{x} \pm (1.96 \times \frac{\sigma}{\sqrt{n}})\), where \(\bar{x}\) is the sample mean, \(\sigma\) is the population standard deviation, and \(n\) is the sample size.


In [4]:
sample_mean = 50
std_dev = 5
sample_size = 100  # assumed; the question does not specify a sample size

margin_of_error = 1.96 * (std_dev / (sample_size ** 0.5))
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print("95% Confidence Interval:", (lower_bound, upper_bound))

95% Confidence Interval: (49.02, 50.98)

Interpretation: With the sample size of 100 assumed in the code, we can be 95% confident that the population mean lies between 49.02 and 50.98. Intervals constructed this way would contain the true mean in about 95% of repeated samples.


---
#**Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.**

The margin of error in a confidence interval is the half-width of the interval: the amount added to and subtracted from the sample statistic to obtain the upper and lower bounds. It depends on the confidence level, the variability of the data, and the sample size.

The margin of error is inversely proportional to the square root of the sample size. As the sample size increases, the margin of error decreases, because a larger sample provides more information and reduces sampling variability. Halving the margin of error requires roughly four times as many observations.

Example: Suppose you want to estimate the average height of students in a school. With a small sample of 20 students, the margin of error might be relatively large, say ±5 inches. If you increase the sample to 200 students, the margin of error shrinks by a factor of \( \sqrt{10} \approx 3.16 \), to about ±1.6 inches, giving a more precise estimate of the population mean height.
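
The square-root relationship can be checked numerically. In this sketch the helper `margin_of_error` and the standard deviation of 4 inches are hypothetical, chosen only for illustration; the ratio between the two margins depends only on the sample sizes.

```python
def margin_of_error(std_dev, n, z=1.96):
    """Half-width of a z-based confidence interval for a mean."""
    return z * std_dev / n ** 0.5

std_dev = 4.0  # inches; HYPOTHETICAL value chosen for illustration

moe_small = margin_of_error(std_dev, 20)
moe_large = margin_of_error(std_dev, 200)

print(f"n=20:  +/-{moe_small:.2f} inches")
print(f"n=200: +/-{moe_large:.2f} inches")
print(f"ratio: {moe_small / moe_large:.2f}")  # equals sqrt(200 / 20)
```

A tenfold increase in sample size reduces the margin of error by a factor of about 3.16, not by a factor of 10.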

---
#**Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.**

The z-score formula is:
\[ z = \frac{x - \mu}{\sigma} \]
where \( x \) is the data point value, \( \mu \) is the population mean, and \( \sigma \) is the population standard deviation.

Given:
- \( x = 75 \)
- \( \mu = 70 \)
- \( \sigma = 5 \)

\[ z = \frac{75 - 70}{5} = 1 \]

Interpretation: The data point value of 75 is 1 standard deviation above the mean.
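
If the population can be assumed to be normally distributed (an assumption the question does not state), the z-score also converts to a percentile. A minimal sketch using the standard library:

```python
from statistics import NormalDist

x, mu, sigma = 75, 70, 5
z = (x - mu) / sigma

# Fraction of a normal population lying below x (normality is ASSUMED).
share_below = NormalDist().cdf(z)

print(f"z = {z:.1f}, share of values below x = {share_below:.4f}")
```

Under that assumption, a value of 75 is higher than roughly 84% of the population.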

---
#**Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.**

In [5]:
import scipy.stats as stats

sample_mean = 6
population_mean = 0  # Null hypothesis: The drug is not effective (no weight loss)
sample_std_dev = 2.5
sample_size = 50
confidence_level = 0.95

t_statistic = (sample_mean - population_mean) / (sample_std_dev / (sample_size ** 0.5))
p_value = stats.t.sf(abs(t_statistic), df=sample_size-1) * 2

if p_value < (1 - confidence_level):
    print("Reject null hypothesis: The weight loss drug is significantly effective.")
else:
    print("Fail to reject null hypothesis: There is no evidence to suggest that the weight loss drug is effective.")

Reject null hypothesis: The weight loss drug is significantly effective.


---
#**Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.**


In [6]:
import statsmodels.stats.proportion as proportion

sample_proportion = 0.65
sample_size = 500
confidence_level = 0.95

conf_interval = proportion.proportion_confint(sample_proportion * sample_size, sample_size, alpha=(1 - confidence_level))

print("95% Confidence Interval for Proportion of Satisfied People:", conf_interval)

95% Confidence Interval for Proportion of Satisfied People: (0.6081925393809212, 0.6918074606190788)
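
The same interval can be reproduced by hand with the normal-approximation (Wald) formula \( \hat{p} \pm z \sqrt{\hat{p}(1 - \hat{p}) / n} \), which is the default method (`method='normal'`) of `proportion_confint`. The sketch below uses only the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

p_hat = 0.65
n = 500
confidence_level = 0.95

# Two-sided critical value of the standard normal, about 1.96
z_crit = NormalDist().inv_cdf((1 + confidence_level) / 2)
standard_error = (p_hat * (1 - p_hat) / n) ** 0.5
lower = p_hat - z_crit * standard_error
upper = p_hat + z_crit * standard_error

print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

The result agrees with the statsmodels output to four decimal places: roughly 60.8% to 69.2%.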


---
#**Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.**

In [7]:
import scipy.stats as stats

mean_A = 85
std_dev_A = 6
n_A = 30  # assumed; the question does not give sample sizes
mean_B = 82
std_dev_B = 5
n_B = 30  # assumed; the question does not give sample sizes
significance_level = 0.01

# Unpooled standard error of the difference in means
t_statistic = (mean_A - mean_B) / ((std_dev_A**2 / n_A) + (std_dev_B**2 / n_B))**0.5
# Two-tailed p-value with the conservative choice df = min(n_A, n_B) - 1
p_value = stats.t.sf(abs(t_statistic), df=min(n_A, n_B)-1) * 2

# Reject only if the p-value is below the significance level itself
if p_value < significance_level:
    print("Reject null hypothesis: There is a significant difference in student performance between the two teaching methods.")
else:
    print("Fail to reject null hypothesis: There is no significant difference in student performance between the two teaching methods.")

Fail to reject null hypothesis: There is no significant difference in student performance between the two teaching methods.
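
As a cross-check on the summary statistics in this question, SciPy's `ttest_ind_from_stats` runs a two-sample t-test directly from means, standard deviations, and sample sizes. This sketch uses Welch's variant (`equal_var=False`) and assumes 30 observations per group, as in the cell above.

```python
from scipy import stats

# Sample sizes of 30 per group are ASSUMED; the question does not give them.
result = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,
    mean2=82, std2=5, nobs2=30,
    equal_var=False,  # Welch's t-test: no equal-variance assumption
)

alpha = 0.01
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```

With these inputs the t statistic is about 2.10 and the two-tailed p-value is roughly 0.04. That is above 0.01, so the null hypothesis is not rejected at the 0.01 significance level, although it would be at 0.05.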


---
#**Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.**


In [8]:
import scipy.stats as stats

sample_mean = 65
population_std_dev = 8
sample_size = 50
confidence_level = 0.90

margin_of_error = stats.norm.ppf((1 + confidence_level) / 2) * (population_std_dev / (sample_size**0.5))
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print("90% Confidence Interval for Population Mean:", (lower_bound, upper_bound))

90% Confidence Interval for Population Mean: (63.13906055411732, 66.86093944588268)


---
#**Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.**

In [9]:
import scipy.stats as stats

# Given data
sample_mean = 0.25
population_mean = 0  # Null value; the question gives no baseline reaction time, so 0 is only a placeholder
sample_std_dev = 0.05
sample_size = 30
confidence_level = 0.90

# Step 3: Calculate the t-statistic
t_statistic = (sample_mean - population_mean) / (sample_std_dev / (sample_size ** 0.5))

# Step 4: Determine the degrees of freedom
degrees_of_freedom = sample_size - 1

# Step 5: Find the critical t-value
alpha = 1 - confidence_level
critical_t_value = stats.t.ppf(1 - alpha/2, df=degrees_of_freedom)  # for a two-tailed test

# Step 6: Compare the t-statistic with the critical t-value
if abs(t_statistic) > critical_t_value:
    print("Reject the null hypothesis: Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis: There is no significant effect of caffeine on reaction time.")


Reject the null hypothesis: Caffeine has a significant effect on reaction time.
