### [Q1.] What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

[Ans]

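| Aspect | t-test | z-test |
| ------ | ------ | ------ |
| Sample Size | Typically used when the sample size is small (n < 30). | Used when the sample size is large (n ≥ 30). |
| Population Std Dev | Population standard deviation is unknown and is estimated from the sample. | Population standard deviation is known. |
| Distribution | The test statistic follows a t-distribution with n - 1 degrees of freedom. | The test statistic follows a standard normal distribution. |
| Usage | Used when the sample is small and its spread must be estimated from the data. | Used when the population parameters are known and the sample is large. |
| Example | Comparing the mean exam score of a class of 15 students against a target score, when only the sample standard deviation is available. | Checking whether the mean weight of 200 packaged items matches the labelled weight, when the process standard deviation is known from long-run records. |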
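
The n ≥ 30 rule of thumb can be illustrated by comparing critical values. The sketch below, under the assumption of a two-sided test at the 5% level and with illustrative sample sizes that are not taken from the table, shows the t critical value approaching the z critical value as n grows.

```python
import scipy.stats as stats

# Two-sided 95% critical value of the standard normal distribution (z-test)
z_critical = stats.norm.ppf(0.975)

# Illustrative sample sizes (assumed for this sketch)
sample_sizes = [5, 15, 30, 100, 1000]

# Matching critical values of the t-distribution with n - 1 degrees of freedom
t_criticals = [stats.t.ppf(0.975, df=n - 1) for n in sample_sizes]

for n, t_crit in zip(sample_sizes, t_criticals):
    print(f"n = {n:4d}: t critical = {t_crit:.3f}, z critical = {z_critical:.3f}")
```

For small samples the t critical value is noticeably larger, which makes the t-test more conservative; for large samples the two tests give nearly the same answer.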

### [Q2.] Differentiate between one-tailed and two-tailed tests.

[Ans]

| **ONE-TAILED TEST** | **TWO-TAILED TEST** |
| ------------------- | ----------------- |
| A one-tailed test evaluates whether the sample mean is significantly greater or less than a known value or another sample mean, in only one direction. | A two-tailed test evaluates whether the sample mean is significantly different from a known value or another sample mean, in both directions (either greater or less). |
| If you're testing whether a new drug is better than the current one (and not worse), you'd use a one-tailed test. | If you're testing whether a new drug has a different effect (could be better or worse) than the current one, you'd use a two-tailed test. |
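
The practical difference shows up in the p-value. The following is a minimal sketch with a hypothetical test statistic and degrees of freedom (both assumed for illustration): the same result can be significant in a one-tailed test and not significant in a two-tailed test.

```python
import scipy.stats as stats

t_stat = 1.8   # hypothetical test statistic
df = 24        # hypothetical degrees of freedom
alpha = 0.05

# One-tailed (upper tail): probability of a statistic at least this large
p_one_tailed = stats.t.sf(t_stat, df)

# Two-tailed: probability of a statistic at least this extreme in either direction
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)

print(f"One-tailed p-value: {p_one_tailed:.4f} -> reject H0: {p_one_tailed < alpha}")
print(f"Two-tailed p-value: {p_two_tailed:.4f} -> reject H0: {p_two_tailed < alpha}")
```

The direction of a one-tailed test should be chosen before looking at the data; otherwise the reported significance level is not valid.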

### [Q3.] Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

[Ans]

| **TYPE 1 ERROR** | **TYPE 2 ERROR** |
| ---------------- | ---------------- |
| We reject the null hypothesis when in reality it is true (a false positive). | We fail to reject the null hypothesis when in reality it is false (a false negative). |
| The test results suggest that the new drug is more effective than the placebo, but in reality, the drug has no additional effect. |  The test results suggest that the new drug is not more effective than the placebo, but in reality, the drug does have a positive effect. |
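
Both error rates can be estimated by simulation. The sketch below assumes normally distributed data, a one-sample t-test, and illustrative settings (seed, number of experiments, sample size, and a true effect of 0.5) that are not part of the original answer.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments = 5000   # assumed number of simulated studies
n_per_sample = 25      # assumed sample size per study

# Case 1: the null hypothesis is true (true mean is 0).
# Every rejection here is a Type 1 error.
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n_per_sample))
p_null = stats.ttest_1samp(null_samples, popmean=0.0, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)

# Case 2: the null hypothesis is false (true mean is 0.5).
# Every failure to reject here is a Type 2 error.
effect_samples = rng.normal(loc=0.5, scale=1.0, size=(n_experiments, n_per_sample))
p_effect = stats.ttest_1samp(effect_samples, popmean=0.0, axis=1).pvalue
type2_rate = np.mean(p_effect >= alpha)

print(f"Estimated Type 1 error rate: {type1_rate:.3f}")
print(f"Estimated Type 2 error rate: {type2_rate:.3f}")
```

The Type 1 rate should land close to the chosen alpha, while the Type 2 rate depends on the effect size and the sample size.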

### [Q4.] Explain Bayes's theorem with an example.

[Ans]

**Bayes's Theorem** is a fundamental concept in probability theory that describes how to update the probability of a hypothesis based on new evidence. It calculates the probability of an event based on prior knowledge of conditions related to the event.

Imagine you’re trying to figure out the probability that it rained, given that you see traffic on your way to work. Let’s assume that the chance of rain on any given day is 30%, and the likelihood of traffic when it rains is 80%. However, even when it doesn’t rain, there’s still a 40% chance of traffic. Now, using Bayes’s Theorem, we can calculate the probability that it rained, knowing there is traffic.
First, we calculate the total probability of traffic, which takes into account both when it rains and when it doesn't. This comes out to 52%. Applying Bayes’s formula, we find that the probability it rained, given that you see traffic, is about 46%. In other words, seeing traffic gives us some evidence that it might have rained, but it’s still less than a 50-50 chance because traffic can happen even without rain.
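
The arithmetic in this example can be checked with a few lines of Python. The variable names are chosen for this sketch; the probabilities are the ones stated above.

```python
p_rain = 0.30                   # prior probability of rain
p_traffic_given_rain = 0.80     # likelihood of traffic when it rains
p_traffic_given_no_rain = 0.40  # likelihood of traffic when it does not rain

# Law of total probability: P(traffic)
p_traffic = (p_traffic_given_rain * p_rain
             + p_traffic_given_no_rain * (1 - p_rain))

# Bayes's theorem: P(rain | traffic)
p_rain_given_traffic = p_traffic_given_rain * p_rain / p_traffic

print(f"P(traffic) = {p_traffic:.2f}")
print(f"P(rain | traffic) = {p_rain_given_traffic:.4f}")
```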

### [Q5.] What is a confidence interval? How to calculate the confidence interval, explain with an example.

[Ans]

A confidence interval is a range of values, derived from sample data, that is likely to contain the true population parameter (like the mean) with a certain level of confidence. The confidence interval gives an estimate of where this parameter lies, along with the level of uncertainty.

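To calculate a confidence interval for a mean when the population standard deviation is known, we use the formula:

$$CI = \bar{x} \pm Z \cdot \left( \frac{\sigma}{\sqrt{n}} \right)$$

where $\bar{x}$ is the sample mean, $Z$ is the critical value for the chosen confidence level (1.96 for 95%), $\sigma$ is the population standard deviation, and $n$ is the sample size.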
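
As a worked example, the sketch below applies the formula to hypothetical numbers (a sample mean of 100, a known population standard deviation of 15, and a sample of 36 observations, all assumed for illustration).

```python
import numpy as np
import scipy.stats as stats

sample_mean = 100      # hypothetical sample mean
population_std = 15    # hypothetical known population standard deviation
sample_size = 36       # hypothetical sample size
confidence = 0.95

# Two-sided critical value for the chosen confidence level
z = stats.norm.ppf(1 - (1 - confidence) / 2)

# Standard error of the mean and the resulting margin of error
standard_error = population_std / np.sqrt(sample_size)
margin_of_error = z * standard_error

lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"95% confidence interval: ({lower_bound:.2f}, {upper_bound:.2f})")
```

With these numbers the standard error is 15 / 6 = 2.5, so the interval extends about 4.9 units on either side of the sample mean.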



### [Q6.] Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

[Ans]

**Problem :** Suppose a disease affects 1% of the population. A test for the disease is 90% accurate, meaning if a person has the disease, the test will be positive 90% of the time (true positive). If a person does not have the disease, the test will be negative 90% of the time (true negative). If a person tests positive, what is the probability that they actually have the disease?

**Solution :**
A = Event that the person has the disease.<br>
B = Event that the person tests positive.

P(A) = 0.01<br>
P(B|A) = 0.90<br>
P(B|A') = 0.10<br>
P(A') = 0.99<br>

P(A|B) = P(A) * P(B|A) / P(B)

P(B) = P(A) * P(B|A) + P(B|A') * P(A')<br>
P(B) = (0.01 * 0.90) + (0.10 * 0.99)<br>
P(B) = 0.108

P(A|B) = 0.01 * 0.90 / 0.108<br>
P(A|B) = 0.0833

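Thus, the probability that the person actually has the disease is 8.33%. The value is low despite the 90% accuracy because the disease is rare, so false positives from the healthy 99% outnumber true positives.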
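
The same calculation can be wrapped in a small helper. `posterior_given_positive` is a hypothetical function name introduced for this sketch; the second call uses an assumed 10% prevalence to show how strongly the prior drives the result.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem."""
    false_positive_rate = 1 - specificity
    # Law of total probability: P(positive)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Values from the problem above: 1% prevalence, 90% sensitivity, 90% specificity
posterior_rare = posterior_given_positive(0.01, 0.90, 0.90)

# Assumed for comparison: the same test with 10% prevalence
posterior_common = posterior_given_positive(0.10, 0.90, 0.90)

print(f"P(disease | positive), 1% prevalence:  {posterior_rare:.4f}")
print(f"P(disease | positive), 10% prevalence: {posterior_common:.4f}")
```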

### [Q7.] Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

[Ans]

In [2]:
import numpy as np
import scipy.stats as stats

mean = 50
std_dev = 5
sample_size = 30  # assumed; the question does not state a sample size

z = stats.norm.ppf(0.975)  # two-sided 95% critical value
margin_of_error = z * (std_dev / np.sqrt(sample_size))

lower_bound = mean - margin_of_error
upper_bound = mean + margin_of_error

print(f"95% confidence interval: ({lower_bound:.2f}, {upper_bound:.2f})")

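95% confidence interval: (48.21, 51.79)

**Interpretation :** With a sample size of 30, we are 95% confident that the true population mean lies between 48.21 and 51.79. If the sampling were repeated many times, about 95% of the intervals built this way would contain the true mean.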


### [Q8.] What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

[Ans]

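**Margin of Error :** The margin of error is the half-width of a confidence interval: the amount added to and subtracted from the sample estimate to obtain the interval's bounds. It equals the critical value multiplied by the standard error, and it indicates the extent of uncertainty surrounding the estimate.

**Effect of sample size :** The margin of error decreases as the sample size increases, because the standard error σ/√n shrinks as n grows. For example, a poll of 2,000 voters has a smaller margin of error than a poll of 200 voters drawn from the same population.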

In [4]:
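# Example: 95% margin of error for std = 5 and n = 30
std = 5
n = 30

z = stats.norm.ppf(0.975)  # two-sided 95% critical value
margin_of_error = z * std / np.sqrt(n)
print(f"Margin of Error: {margin_of_error:.2f}")

Margin of Error: 1.79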
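
To see the effect of sample size directly, the sketch below computes the 95% margin of error, taken here as the critical value times the standard error, for several sample sizes. The standard deviation of 5 matches the example; the list of sample sizes is assumed for illustration.

```python
import numpy as np
import scipy.stats as stats

std = 5
z = stats.norm.ppf(0.975)  # two-sided 95% critical value

# Illustrative sample sizes (assumed for this sketch)
sample_sizes = [30, 100, 1000]

# Margin of error = critical value * standard error
margins = [z * std / np.sqrt(n) for n in sample_sizes]

for n, margin in zip(sample_sizes, margins):
    print(f"n = {n:4d}: margin of error = {margin:.2f}")
```

Because the margin of error scales with 1/√n, quadrupling the sample size only halves it.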


### [Q9.] Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

[Ans]

In [5]:
mean = 70
std = 5
x = 75

z = (x - mean) / std
print(f"Z-Score : {z}")

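Z-Score : 1.0

**Interpretation :** A z-score of 1.0 means the data point lies exactly one standard deviation above the population mean.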
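
If the population is assumed to be normally distributed (an assumption the question does not state), the z-score can also be converted to a percentile, as in this short sketch.

```python
import scipy.stats as stats

mean = 70
std = 5
x = 75

z = (x - mean) / std

# Fraction of a normal population that falls below this data point
percentile = stats.norm.cdf(z)

print(f"Z-Score : {z}")
print(f"Share of the population below x: {percentile:.2%}")
```

Under that assumption, a value of 75 is higher than roughly 84% of the population.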


### [Q10.] In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

[Ans]

In [7]:
# Null Hypothesis      : The drug has no effect (mean weight loss = 0).
# Alternate Hypothesis : The drug has an effect (mean weight loss != 0).

sample_size = 50
std = 2.5
confidence_level = 0.95
alpha = 1 - confidence_level
sample_mean = 6

t_critical = stats.t.ppf(1-alpha/2, df=sample_size-1)
# The standard error of the mean is std / sqrt(n)
t_statistics = sample_mean / (std/np.sqrt(sample_size))

if(abs(t_statistics) > t_critical):
    print("Reject null hypothesis")
else:
    print("Fail to reject null hypothesis")


Reject null hypothesis

The t-statistic (about 16.97) is far larger than the critical value (about 2.01), so the drug is significantly effective at the 95% confidence level.


### [Q11.] In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

[Ans]

In [3]:
import numpy as np
import scipy.stats as stats
n_survey = 500
p_satisfied = 0.65

z = stats.norm.ppf(0.975)

margin_of_error = z * (np.sqrt((p_satisfied * (1 - p_satisfied)) / n_survey))

lower_bound = p_satisfied - margin_of_error
upper_bound = p_satisfied + margin_of_error

print(f"95% Confidence Interval : ({lower_bound:.2f}, {upper_bound:.2f})")

95% Confidence Interval : (0.61, 0.69)


### [Q12.] A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

[Ans]

In [11]:
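# Null Hypothesis      : The two teaching methods have the same mean score.
# Alternate Hypothesis : The two teaching methods have different mean scores.

alpha = 0.01
std_A = 6
std_B = 5
mean_A = 85
mean_B = 82
n_A = 30  # assumed; the question does not state the sample sizes
n_B = 30  # assumed

pooled_std = np.sqrt(((n_A - 1) * std_A**2 + (n_B - 1) * std_B**2) / (n_A + n_B - 2))

t_critical = stats.t.ppf(1-alpha/2, df=n_A + n_B - 2)

t_stats = (mean_A - mean_B) / (pooled_std * np.sqrt(1/n_A + 1/n_B))

if(abs(t_stats) > t_critical):
    print("Reject Null Hypothesis")
else:
    print("Fail to reject Null Hypothesis")

Fail to reject Null Hypothesis

With 30 students assumed in each group, the t-statistic (about 2.10) is below the critical value (about 2.66), so the difference between the two teaching methods is not significant at the 0.01 level.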
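
As a cross-check, SciPy can run the same pooled two-sample t-test directly from summary statistics with `stats.ttest_ind_from_stats`. The sketch keeps the assumption of 30 students per group, which the question does not specify.

```python
import scipy.stats as stats

alpha = 0.01
n_A = 30  # assumed sample size for method A
n_B = 30  # assumed sample size for method B

# Pooled-variance (Student's) two-sample t-test from summary statistics
result = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=n_A,
    mean2=82, std2=5, nobs2=n_B,
    equal_var=True,
)

print(f"t-statistic: {result.statistic:.3f}")
print(f"p-value    : {result.pvalue:.4f}")
print("Reject Null Hypothesis" if result.pvalue < alpha
      else "Fail to reject Null Hypothesis")
```

Comparing the p-value with alpha is equivalent to comparing the absolute t-statistic with the two-sided critical value. Passing `equal_var=False` would run Welch's t-test instead, which does not assume equal variances.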


### [Q13.] A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

[Ans]

In [17]:
std = 8
sample_size = 50
sample_mean = 65
ci = 0.90
alpha = 1 - ci


z = stats.norm.ppf(1-alpha/2)

margin_of_error = z * (std/np.sqrt(sample_size))

lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"90% Confidence Interval : ({lower_bound:.2f}, {upper_bound:.2f})")

90% Confidence Interval : (63.14, 66.86)


### [Q14.] In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

[Ans]

In [26]:
sample_mean = 0.25
sample_std = 0.05
sample_size = 30
ci = 0.90
alpha = 1 - ci

# The question gives no baseline reaction time, so the null-hypothesis mean is taken as 0 (an assumption)
t_statistics = sample_mean / (sample_std/np.sqrt(sample_size))
t_critical = stats.t.ppf(1-alpha/2, df=sample_size-1)

if(abs(t_statistics) > t_critical):
    print("Reject the Null Hypothesis")
else:
    print("Fail to reject the Null Hypothesis")

Reject the Null Hypothesis
