# Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.


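Both t-tests and z-tests are statistical tests used to compare means (a sample mean against a hypothesized value, or two sample means against each other). They differ mainly in what is known about the population and in the reference distribution they use:
1. A t-test is used when the population standard deviation is unknown and has to be estimated from the sample, whereas a z-test is used when the population standard deviation is known.
2. A t-test is the usual choice when the sample size is small (a common rule of thumb is n < 30), whereas a z-test is used when the sample size is large (n of 30 or more), because the t-distribution approaches the standard normal distribution as n grows.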
 
Here's an example scenario where a t-test would be used:

Suppose you want to test whether there is a significant difference in the average weight of two breeds of dogs, Labrador Retrievers and Golden Retrievers. You randomly select 10 dogs from each breed and weigh them. You find that the mean weight of Labrador Retrievers is 70 pounds with a standard deviation of 5 pounds, and the mean weight of Golden Retrievers is 75 pounds with a standard deviation of 8 pounds.

Since the sample size is small (n = 10), and the standard deviation of the population is unknown, you would use a t-test to determine whether the difference in means is statistically significant. 
 
Here's an example scenario where a z-test would be used:

Suppose you want to test whether a new drug is effective in reducing blood pressure. You randomly select 500 patients with high blood pressure and divide them into two groups: a treatment group and a control group. The treatment group receives the new drug, while the control group receives a placebo. After a month, you measure the blood pressure of each patient and calculate the mean blood pressure for each group.

Assume that the standard deviation of the population is known to be 10 mmHg. Since the sample size is large (n = 250 for each group), you can use a z-test to determine whether the difference in means is statistically significant. 
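
As a rough sketch, both scenarios can be run from summary statistics with SciPy. The dog-weight numbers come from the text. The blood-pressure group means (138 and 141 mmHg) and the helper `two_sample_z` are hypothetical, introduced only for illustration, and the t-test uses Welch's unequal-variance form as an assumption, since the two sample standard deviations differ.

```python
import math
from scipy.stats import norm, ttest_ind_from_stats

# t-test: dog weights, sigma unknown, n = 10 per breed (numbers from the text).
# equal_var=False selects Welch's t-test (an assumption, since 5 != 8).
t_stat, t_pvalue = ttest_ind_from_stats(
    mean1=70, std1=5, nobs1=10,
    mean2=75, std2=8, nobs2=10,
    equal_var=False,
)

def two_sample_z(mean1, mean2, sigma, n1, n2):
    """Two-sided z-test for a difference in means with a known common sigma."""
    std_error = sigma * math.sqrt(1 / n1 + 1 / n2)
    z = (mean1 - mean2) / std_error
    p_value = 2 * norm.sf(abs(z))
    return z, p_value

# z-test: blood pressure, sigma = 10 mmHg known, n = 250 per group.
# The group means below are hypothetical; the text does not give them.
z_stat, z_pvalue = two_sample_z(138.0, 141.0, sigma=10, n1=250, n2=250)

print(f"t = {t_stat:.3f}, p = {t_pvalue:.3f}")
print(f"z = {z_stat:.3f}, p = {z_pvalue:.4f}")
```

With these inputs the t-test does not reach significance at the 5% level, while the z-test on the hypothetical means does, which mostly reflects the much larger sample in the second scenario.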

# Q2: Differentiate between one-tailed and two-tailed tests.


In hypothesis testing, we try to reject a null hypothesis by analyzing our dataset. This involves formulating an alternative hypothesis that competes with the null hypothesis. The alternative hypothesis may state that the parameter is different from, less than, or greater than the value claimed by the null hypothesis.

A one-tailed test may be either left-tailed or right-tailed. A left-tailed test is used when the alternative hypothesis states that the true value of the parameter specified in the null hypothesis is less than the null hypothesis claims. A right-tailed test is used when the alternative hypothesis states that the true value of the parameter specified in the null hypothesis is greater than the null hypothesis claims.

A two-tailed test is used to determine whether a parameter (for example, a mean) is significantly different from a hypothesized value, in either direction. In a two-tailed test, the alternative hypothesis states that the true value of the parameter specified in the null hypothesis is not equal to the value the null hypothesis claims.
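
The difference shows up directly in how the p-value is computed from the same test statistic. The sketch below assumes a z statistic (standard normal reference distribution); the helper name `tail_p_values` is hypothetical.

```python
from scipy.stats import norm

def tail_p_values(z):
    """Return left-, right-, and two-tailed p-values for a z statistic."""
    return {
        "left": norm.cdf(z),          # H1: parameter < null value
        "right": norm.sf(z),          # H1: parameter > null value
        "two": 2 * norm.sf(abs(z)),   # H1: parameter != null value
    }

p = tail_p_values(1.96)
for tail, value in p.items():
    print(f"{tail}-tailed p-value: {value:.4f}")
```

For the same statistic, the two-tailed p-value is twice the smaller one-tailed p-value, so a one-tailed test rejects more easily in the direction it looks at.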

# Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.


In statistics, a Type I error occurs when we reject a null hypothesis that is actually true and a Type II error occurs when we fail to reject a null hypothesis that is actually false. 

For example, suppose you decide to get tested for COVID-19 based on mild symptoms, with the null hypothesis being that you are not infected. There are two errors that could potentially occur:

Type I error (false positive): the test result says you have coronavirus, but you actually don’t.

Type II error (false negative): the test result says you don’t have coronavirus, but you actually do.
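
The two error rates can also be quantified. The sketch below assumes a right-tailed one-sample z-test with known standard deviation; the function `error_rates` and the numbers passed to it are hypothetical and only illustrate the trade-off.

```python
import math
from scipy.stats import norm

def error_rates(alpha, true_shift, sigma, n):
    """Type I and Type II error rates of a right-tailed one-sample z-test.

    true_shift is the real difference between the population mean and the
    value claimed by the null hypothesis.
    """
    z_crit = norm.ppf(1 - alpha)
    # Under H1 the z statistic is centred at true_shift * sqrt(n) / sigma.
    noncentrality = true_shift * math.sqrt(n) / sigma
    type_1 = alpha                              # P(reject H0 | H0 true)
    type_2 = norm.cdf(z_crit - noncentrality)   # P(fail to reject H0 | H0 false)
    return type_1, type_2

type_1, type_2 = error_rates(alpha=0.05, true_shift=2, sigma=10, n=100)
print(f"Type I rate: {type_1:.2f}, Type II rate: {type_2:.2f}")
```

The Type I rate is fixed by the chosen significance level, while the Type II rate depends on the true effect and shrinks as the sample size grows.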

# Q4:  Explain Bayes's theorem with an example.


Bayes's theorem is a fundamental concept in probability theory and statistics that describes how to update the probability of a hypothesis (an event or proposition) based on new evidence or information.

\begin{align*}
P(A|B) &= \frac{P(B|A) \cdot P(A)}{P(B)}
\end{align*}

Where:
* P(A|B) is the posterior probability of hypothesis A given evidence B
* P(B|A) is the likelihood: the probability of observing evidence B given that hypothesis A is true
* P(A) is the prior probability of hypothesis A
* P(B) is the marginal (overall) probability of observing evidence B


##### Bayes's Theorem Example: Medical Test for a Rare Disease

Suppose there's a rare disease that affects 1\% of the population. You go to a doctor for a test that can detect the disease with 95\% accuracy. If you test positive, what is the probability that you actually have the disease?

Let's define the events:
\begin{align*}
A & : \text{You have the disease.} \\
B & : \text{You test positive for the disease.}
\end{align*}

Given information:
\begin{align*}
P(A) &= 0.01 \quad \text{(1\% chance of having the disease)} \\
P(B|A) &= 0.95 \quad \text{(95\% accuracy of the test when you have the disease)} \\
P(B|\neg A) &= 0.05 \quad \text{(5\% false positive rate of the test)}
\end{align*}

We want to calculate $P(A|B)$, the probability of having the disease given that you tested positive.

Using Bayes's theorem:
\begin{align*}
P(A|B) &= \frac{P(B|A) \cdot P(A)}{P(B)}
\end{align*}

The denominator $P(B)$ can be calculated using the law of total probability:
\begin{align*}
P(B) &= P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A) \\
&= (0.95 \cdot 0.01) + (0.05 \cdot 0.99) \\
&= 0.0095 + 0.0495 \\
&= 0.059
\end{align*}

Now, plug the values into Bayes's theorem:
\begin{align*}
P(A|B) &= \frac{0.95 \cdot 0.01}{0.059} \\
&\approx 0.161
\end{align*}

So, even if you test positive for the disease, the probability that you actually have the disease is only about 16\%. This demonstrates how the prior probability (prevalence of the disease) and the accuracy of the test affect the final probability.

Bayes's theorem allows us to update our beliefs in a systematic way based on new evidence, making it a powerful tool in decision-making under uncertainty.
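
The calculation above can be checked with a few lines of Python. This is a minimal sketch; the function name `posterior` and its parameter names are hypothetical.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes's theorem."""
    # Law of total probability: overall chance of a positive test.
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

p_disease = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(disease | positive) = {p_disease:.3f}")
```

Raising the prior (for example, testing only people with symptoms) raises the posterior noticeably, even with the same test.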


# Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.



When we estimate a population parameter from a sample, the estimate may be off because of the randomness of the sample. To account for this randomness, we report an interval together with a degree of confidence: a confidence interval is a range of values that, at a chosen confidence level (for example 95%), is expected to contain the true parameter.

To calculate a confidence interval, we use these formulas:

* lower bound = sample estimate - margin of error

* upper bound = sample estimate + margin of error

Where the margin of error for a two-sided interval is:

##### Z-test

$
\text{Margin of Error} = z_{1-\alpha/2} \times \frac{\sigma}{\sqrt{n}}
$

Where:

1. $z_{1-\alpha/2}$ is the z-score corresponding to the desired confidence level.
2. $\sigma$ is the population standard deviation.
3. $n$ is the sample size.


##### T-test
$
\text{Margin of Error} = t_{1-\alpha/2,\, n-1} \times \frac{s}{\sqrt{n}}
$

Where:

1. $t_{1-\alpha/2,\, n-1}$ is the critical value from the t-table for the desired confidence level, with $n-1$ degrees of freedom.
2. $s$ is the sample standard deviation.
3. $n$ is the sample size.
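
As a worked example of both formulas, the sketch below uses hypothetical numbers (a sample mean of 100, a standard deviation of 15, and n = 36) and hypothetical helper names `z_interval` and `t_interval`. The same standard deviation is treated once as a known population value and once as a sample estimate, purely to compare the two intervals.

```python
import math
from scipy.stats import norm, t

def z_interval(sample_mean, pop_std, n, confidence=0.95):
    """Two-sided confidence interval when the population std is known."""
    alpha = 1 - confidence
    margin = norm.ppf(1 - alpha / 2) * pop_std / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

def t_interval(sample_mean, sample_std, n, confidence=0.95):
    """Two-sided confidence interval when the std is estimated from the sample."""
    alpha = 1 - confidence
    margin = t.ppf(1 - alpha / 2, n - 1) * sample_std / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

z_low, z_high = z_interval(100, 15, 36)
t_low, t_high = t_interval(100, 15, 36)
print(f"z interval: ({z_low:.2f}, {z_high:.2f})")
print(f"t interval: ({t_low:.2f}, {t_high:.2f})")
```

The t interval comes out slightly wider, which reflects the extra uncertainty from estimating the standard deviation.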


# Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.



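Sample problem: how likely is a dangerous fire when we see smoke, i.e. what is P(Fire|Smoke)?

* P(Fire) = 1%: dangerous fires are rare
* P(Smoke) = 10%: smoke is fairly common
* P(Smoke|Fire) = 90%: 90% of dangerous fires make smoke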

In [1]:
# P(Fire) = p1
p1 = 0.01
# P(Smoke) = p2
p2 = 0.1
# P(Smoke|Fire) = p3
p3 = 0.9
# P(Fire|Smoke) = p4
p4 = p1*p3/p2
  
print(f"Probability of dangerous Fire when there is Smoke: {p4}")

Probability of dangerous Fire when there is Smoke: 0.09000000000000001


# Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.


In [2]:
import math
from scipy.stats import t

In [3]:
def interval_estimate( sample_mean, sample_std, sample_size ):
    """Estimate the 95% confidence interval of the population mean from the sample mean, standard deviation, and sample size."""
    alpha = 0.05
    
    # critical value for a two-tailed interval
    t_crit = t.ppf(1-alpha/2, sample_size-1 )
    
    # standard error of the mean
    std_error = sample_std/math.sqrt(sample_size)
    
    # margin of error for a 95% confidence interval
    margin_error = t_crit * std_error
    
    # lower and upper bounds of the CI
    lower_bound = sample_mean - margin_error
    upper_bound = sample_mean + margin_error
    
    # return the interval estimate
    return (lower_bound,upper_bound)

In [4]:
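interval_estimate(50, 5, 30)  # the question gives no sample size; n = 30 is assumed

(48.1329693162095, 51.8670306837905)

Interpretation: assuming a sample size of 30, we are 95% confident that the true population mean lies between about 48.13 and 51.87. In other words, if we repeated the sampling many times, about 95% of the intervals built this way would contain the true mean.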
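
As a cross-check, SciPy's `t.interval` should give the same bounds. This sketch again assumes n = 30, which the question does not state.

```python
import math
from scipy.stats import t

sample_mean, sample_std, sample_size = 50, 5, 30  # n = 30 is an assumption

# t.interval takes the confidence level, the degrees of freedom,
# the centre (loc) and the standard error (scale).
low, high = t.interval(
    0.95,
    sample_size - 1,
    loc=sample_mean,
    scale=sample_std / math.sqrt(sample_size),
)
print((low, high))
```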

# Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.



The margin of error in a confidence interval is the half-width of the interval: the amount added to and subtracted from the sample estimate to obtain the upper and lower bounds. A larger sample size generally results in a smaller margin of error, as larger samples tend to better represent the population and reduce the impact of random variation.

For a two-tailed t-test, the margin of error is given by this formula:

$
\text{Margin of Error} = t_{1-\alpha/2,\, n-1} \times \frac{s}{\sqrt{n}}
$

where $s$ is the sample standard deviation. Mathematically, as $n$ becomes larger, the margin of error becomes smaller, because $n$ appears under the square root in the denominator (and the critical value $t_{1-\alpha/2,\, n-1}$ also shrinks slightly).

For example

In [5]:
interval_estimate(50,5,30)

(48.1329693162095, 51.8670306837905)

Now let's increase the sample size to 100. We can see that the confidence interval becomes narrower:

In [6]:
interval_estimate(50,5,100)

(49.00789152424566, 50.99210847575434)
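
To see the trend more directly, the margin of error can be computed for a range of sample sizes. The helper `margin_of_error` and the list of sizes are hypothetical, chosen only for illustration, with the sample standard deviation fixed at 5.

```python
import math
from scipy.stats import t

def margin_of_error(sample_std, n, alpha=0.05):
    """Margin of error of a two-sided t-based confidence interval."""
    return t.ppf(1 - alpha / 2, n - 1) * sample_std / math.sqrt(n)

sizes = [30, 100, 1000, 10000]
margins = [margin_of_error(5, n) for n in sizes]
for n, m in zip(sizes, margins):
    print(f"n = {n:>5}: margin of error = {m:.3f}")
```

Because of the square root, roughly four times as many observations are needed to halve the margin of error.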

# Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.



$$z = \frac{x - \mu}{\sigma}$$

Where:
* $x$ is the data point
* $\mu$ is the population mean
* $\sigma$ is the population standard deviation

In [7]:
z_score = (75-70)/5
print(z_score)

1.0


A z-score of 1 means that the data point lies exactly 1 standard deviation above the population mean, so a value of 75 is above average for this population but not unusually extreme.
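
If we additionally assume the population is normally distributed (the question does not say so), the z-score can be converted into a percentile:

```python
from scipy.stats import norm

z_score = (75 - 70) / 5
# Fraction of a normal population that lies below this data point.
percentile = norm.cdf(z_score)
print(f"z = {z_score}, share of population below: {percentile:.4f}")
```

Under that assumption, roughly 84% of the population falls below 75.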

# Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.


In [8]:
Ho = "The drug is not significantly effective" # population mean weight loss = 0
Ha = "The drug is effective"  # population mean weight loss > 0 (right-tailed test)

# we assume the null hypothesis is true at first
µ = 0

# sample statistics
s_mean = 6
s_std = 2.5
n = 50
alpha = 0.05

# t-statistic
t_stat = (s_mean - µ )/(s_std/math.sqrt(n))

# critical value for a right-tailed test
t_crit = t.ppf( 1-alpha, n-1 )


print(f"t_crit = {t_crit}, t_stat = {t_stat}")
if t_stat > t_crit:
    print("Reject Ho.", Ha)
else:
    print("Fail to reject Ho.", Ho)



t_crit = 1.6765508919142629, t_stat = 16.970562748477143
Reject Ho. The drug is effective
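
An equivalent way to reach the decision is through a p-value. The sketch below assumes a right-tailed alternative (mean weight loss greater than 0) and recomputes the statistic from the summary numbers in the question.

```python
import math
from scipy.stats import t

s_mean, s_std, n = 6, 2.5, 50
alpha = 0.05

t_stat = (s_mean - 0) / (s_std / math.sqrt(n))
# Right-tailed p-value: probability of a statistic at least this large under Ho.
p_value = t.sf(t_stat, n - 1)

print(f"t_stat = {t_stat:.3f}, p_value = {p_value:.3g}")
print("Reject Ho" if p_value < alpha else "Fail to reject Ho")
```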


# Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.


In [9]:
from scipy.stats import norm  # standard normal distribution for the z critical value

sample_proportion = 0.65
sample_size = 500
alpha = 0.05

# critical z value for a two-sided 95% interval (positive, about 1.96)
z_score = norm.ppf(1 - alpha/2)

# Calculate standard error
standard_error = math.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)

# Calculate margin of error
margin_of_error = z_score * standard_error

# Calculate confidence interval
lower_bound = sample_proportion - margin_of_error
upper_bound = sample_proportion + margin_of_error

# Print the confidence interval
print(f"95% Confidence Interval: ({lower_bound:.4f}, {upper_bound:.4f})")

95% Confidence Interval: (0.6082, 0.6918)
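
The cell above uses the normal-approximation (Wald) form of the interval, which needs `norm` from `scipy.stats`. As a point of comparison, here is a sketch of the Wilson score interval, a different technique that is often preferred for proportions; the function name `wilson_interval` is hypothetical.

```python
import math
from scipy.stats import norm

def wilson_interval(p_hat, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = norm.ppf(1 - (1 - confidence) / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

low, high = wilson_interval(0.65, 500)
print(f"Wilson 95% interval: ({low:.4f}, {high:.4f})")
```

With n = 500 and a proportion well away from 0 and 1, the Wilson interval is very close to the Wald interval; the two differ more for small samples or extreme proportions.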

# Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.


In [None]:
Ho = "The two teaching methods do not differ in mean student performance"
Ha = "The two teaching methods differ in mean student performance"

# sample A (the question gives no sample sizes; n = 30 per group is assumed)
mean1 = 85
std1 = 6
n1 = 30
# sample B
mean2 = 82
std2 = 5
n2 = 30

alpha = 0.01

# Welch-Satterthwaite degrees of freedom for two samples with unequal variances
df = ((std1**2/n1 + std2**2/n2 )**2)/( (std1**2/n1)**2/(n1-1) + (std2**2/n2)**2/(n2-1) )

# t-statistic
t_stat = (mean1 - mean2)/math.sqrt( std1**2/n1 + std2**2/n2 )

# critical value for a two-tailed test
t_crit = t.ppf( 1-alpha/2, df)

print(f"t-stat = {t_stat}, t_crit = {t_crit}")
if t_stat < -t_crit or t_stat > t_crit:
    print("Reject Ho.", Ha)
else:
    print("Fail to reject Ho.", Ho)
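
The manual Welch calculation can be cross-checked with SciPy's `ttest_ind_from_stats`, which works directly on summary statistics. As in the cell above, the sample sizes (30 per group) are an assumption, since the question does not provide them.

```python
from scipy.stats import ttest_ind_from_stats

alpha = 0.01

# equal_var=False requests Welch's t-test (unequal variances).
result = ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,   # sample A, n assumed
    mean2=82, std2=5, nobs2=30,   # sample B, n assumed
    equal_var=False,
)

print(f"t-stat = {result.statistic:.4f}, p-value = {result.pvalue:.4f}")
print("Reject Ho" if result.pvalue < alpha else "Fail to reject Ho")
```

Under the assumed sample sizes the p-value falls between 0.01 and 0.05, so the difference would count as significant at the 5% level but not at the 1% level the question asks for.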



# Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.


In [None]:
from scipy.stats import norm

In [None]:
# population
mean = 60
std = 8

# sample
s_mean = 65
n = 50

alpha = 0.1

# critical value for a two-sided interval
z_crit = norm.ppf( 1-alpha/2 )

# the interval is centered on the sample mean, not on the stated population mean
lower_bound = s_mean - z_crit*std/math.sqrt(n)
upper_bound = s_mean + z_crit*std/math.sqrt(n)

print(f"The 90% confidence interval for the true population mean is {(lower_bound,upper_bound)}")
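
For reference, a confidence interval for a mean is centered on the sample mean (65 here), with the known population standard deviation supplying the standard error. A short sketch using SciPy's `norm.interval` under that reading of the question:

```python
import math
from scipy.stats import norm

pop_std, s_mean, n = 8, 65, 50

# norm.interval takes the confidence level, the centre (loc)
# and the standard error of the mean (scale).
low, high = norm.interval(0.90, loc=s_mean, scale=pop_std / math.sqrt(n))
print(f"90% confidence interval: ({low:.3f}, {high:.3f})")
```

The stated population mean of 60 lies outside this interval, which suggests the sample is not consistent with that value at the 90% confidence level.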


# Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

According to research, the average reaction time of a person is around 250 milliseconds (0.25 seconds); this value is used as the population mean under the null hypothesis.

In [None]:
Ho = "The caffeine has no effect on reaction time"  # population mean = 0.25
Ha = "The caffeine has a significant effect on reaction time"  # population mean != 0.25

# population mean under the null hypothesis
µ = 0.25
# sample statistics
mean = 0.25
std = 0.05
n = 30 

alpha = 0.1

# critical value for a two-tailed test
t_crit = t.ppf( 1-alpha/2, n-1)

# t-statistic
t_stat = (mean - µ)/(std/math.sqrt(n))

print(f"t-stat = {t_stat}, t_crit = {t_crit}")
if t_stat < -t_crit or t_stat > t_crit:
    print("Reject Ho.", Ha)
else:
    print("Fail to reject Ho.", Ho)
