## Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

Both t-tests and z-tests are statistical tests used to make inferences about population parameters based on sample data. 

* T-Test:
A t-test is used when the population standard deviation is unknown and must be estimated from the sample. This is the usual situation in practice and is especially common with small samples (a frequently quoted rule of thumb is n < 30). It relies on the t-distribution, which has heavier tails than the normal distribution to account for the extra uncertainty introduced by estimating the standard deviation from the sample.

* Z-Test:
A z-test is appropriate when the population standard deviation is known, or as an approximation when the sample size is large (typically n ≥ 30). It is based on the standard normal distribution. For large samples, the central limit theorem makes the sampling distribution of the sample mean approximately normal, and the t-distribution converges to the standard normal as the degrees of freedom grow, so the two tests give nearly identical results.
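The convergence of the t-distribution toward the standard normal can be seen directly in the critical values. A minimal sketch using SciPy, comparing two-sided 95% critical values for a few arbitrarily chosen degrees of freedom:

```python
import scipy.stats as stats

# Two-sided 95% critical values: t-distribution at several degrees of freedom vs. the standard normal
z_crit = stats.norm.ppf(0.975)
t_crits = {df: stats.t.ppf(0.975, df) for df in (5, 10, 29, 100, 1000)}

for df, t_crit in t_crits.items():
    print(f"df = {df:>4}: t critical = {t_crit:.3f}   (z critical = {z_crit:.3f})")
```

The t critical value shrinks from about 2.571 at 5 degrees of freedom toward the z value of 1.960, which is why the choice between the two tests matters mainly for small samples.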

#### Example:

* T-Test:
Imagine a pharmaceutical company is testing a new drug to lower blood pressure. They recruit a small sample of 20 individuals with hypertension and measure their blood pressure before and after taking the drug. Since the sample size is small, and the population standard deviation of blood pressure is unknown, a paired t-test would be appropriate to compare the means of the paired samples (before and after the drug) to determine if the drug has a statistically significant effect on reducing blood pressure.

* Z-Test:
Consider an example where we are conducting a survey to estimate the average height of all students in a large university. We collect data from a random sample of 200 students, and we happen to know the population standard deviation of height from previous research. In this case, a z-test would be suitable for testing whether the mean height of our sample differs significantly from a hypothesized (for example, previously reported) population mean.
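A minimal sketch of both tests in Python, assuming SciPy is available. The blood-pressure readings are synthetic (randomly generated with a fixed seed), and the height numbers (sample mean 171 cm, hypothesized mean 170 cm, known σ = 8 cm) are hypothetical values chosen only for illustration. `z_test_mean` is a helper defined here, not a SciPy function.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)

# Paired t-test: synthetic before/after systolic readings for 20 hypothetical patients
before = rng.normal(loc=150, scale=12, size=20)
after = before - rng.normal(loc=5, scale=6, size=20)  # simulated average drop of about 5 mmHg
t_res = stats.ttest_rel(before, after)
print(f"Paired t-test: t = {t_res.statistic:.3f}, p = {t_res.pvalue:.4f}")

def z_test_mean(sample_mean, mu0, sigma, n):
    """One-sample two-sided z-test when the population standard deviation sigma is known."""
    z = (sample_mean - mu0) / (sigma / np.sqrt(n))
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value

# Hypothetical height survey: n = 200 students, known sigma = 8 cm
z, p = z_test_mean(sample_mean=171, mu0=170, sigma=8, n=200)
print(f"z-test: z = {z:.3f}, p = {p:.4f}")
```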

## Q2: Differentiate between one-tailed and two-tailed tests.

##### One-Tailed Test
A one-tailed test is based on a directional hypothesis, so the rejection region lies in only one tail of the sampling distribution. It tests whether a population parameter is greater than (or, alternatively, less than) a hypothesized value, but not both, and it uses a single critical value.

* Example: The effect of participating in a coding competition on students' fear level, when the hypothesis is specifically that participation reduces fear.

##### Two-Tailed Test
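A two-tailed test is based on a nondirectional hypothesis, so the rejection region is split between both tails of the sampling distribution. It checks whether a parameter differs from the hypothesized value in either direction (greater or less) and uses two critical values; at significance level α, each tail holds α/2. It is the default choice in null hypothesis testing when there is no prior reason to expect a particular direction.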

* Example: The effect of passing a new bill on farmers' loans, when the bill could either increase or decrease them.
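The practical difference shows up in the critical values and p-values. A minimal sketch for a z statistic at α = 0.05; the statistic z = 1.8 is a hypothetical value chosen for illustration:

```python
import scipy.stats as stats

alpha = 0.05
z = 1.8  # hypothetical test statistic

# Critical values: the one-tailed test puts all of alpha in one tail, the two-tailed test splits it
one_tailed_crit = stats.norm.ppf(1 - alpha)       # about 1.645
two_tailed_crit = stats.norm.ppf(1 - alpha / 2)   # about 1.960

# p-values for the same statistic
p_one_tailed = stats.norm.sf(z)            # H1: parameter > hypothesized value
p_two_tailed = 2 * stats.norm.sf(abs(z))   # H1: parameter != hypothesized value

print(f"one-tailed: critical = {one_tailed_crit:.3f}, p = {p_one_tailed:.4f}")
print(f"two-tailed: critical = {two_tailed_crit:.3f}, p = {p_two_tailed:.4f}")
```

With z = 1.8, the one-tailed test rejects H0 at α = 0.05 (p ≈ 0.036) while the two-tailed test does not (p ≈ 0.072).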

## Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

##### Type 1 Error
A Type 1 error occurs when the null hypothesis (H0) is true but is nevertheless rejected. A Type 1 error is often called a false positive, and its probability is the significance level α.

##### Type 2 Error
A Type 2 error occurs when the null hypothesis is false but the test fails to reject it. A Type 2 error is also known as a false negative; its probability is denoted β, and 1 − β is the power of the test.

* Example: Consider the null hypothesis "A man is not guilty of a crime."

Then in this case:

1. Type 1 error (False Positive):
He is convicted of the crime even though he is innocent.
2. Type 2 error (False Negative):
He is acquitted even though he actually committed the crime, so a guilty person goes free.
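Both error rates can be estimated by simulation. This is a minimal sketch assuming a one-sample t-test at α = 0.05 with n = 20 normally distributed observations; the true mean of 0.5 under the alternative is an arbitrary choice for illustration.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(42)
alpha, n, n_sims = 0.05, 20, 10_000

# Type 1 error: H0 (mean = 0) is true; count how often the t-test wrongly rejects it
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
p_null = stats.ttest_1samp(null_samples, popmean=0.0, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)

# Type 2 error: H0 is false (true mean = 0.5); count how often the test fails to reject it
alt_samples = rng.normal(loc=0.5, scale=1.0, size=(n_sims, n))
p_alt = stats.ttest_1samp(alt_samples, popmean=0.0, axis=1).pvalue
type2_rate = np.mean(p_alt >= alpha)

print(f"Estimated Type 1 error rate: {type1_rate:.3f} (alpha = {alpha})")
print(f"Estimated Type 2 error rate: {type2_rate:.3f} (power = {1 - type2_rate:.3f})")
```

The Type 1 rate should come out close to α, while the Type 2 rate depends on the effect size and sample size.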


## Q4: Explain Bayes's theorem with an example.

Bayes's Theorem is a fundamental concept in probability theory and statistics that describes how to update our beliefs about a hypothesis or event based on new evidence. It combines prior knowledge with new information to give us a more accurate estimate of the probability of the hypothesis being true. The formula for Bayes's Theorem is:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

* Example:

Suppose there's a rare disease that affects 1% of the population. You want to know whether someone has the disease based on a positive test result. The test returns a positive result 95% of the time when the person actually has the disease, and it gives a false positive (indicating disease when the person is healthy) 10% of the time.

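Let's define the events:
* A: The person has the disease.
* B: The test result is positive.

We are looking to find P(A∣B), the probability that a person has the disease given a positive test result.

Using Bayes's Theorem, with $\neg A$ denoting "the person does not have the disease":

$$P(A) = 0.01, \quad P(B \mid A) = 0.95, \quad P(B \mid \neg A) = 0.10, \quad P(\neg A) = 1 - P(A) = 0.99$$

The denominator follows from the law of total probability:

$$P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A) = 0.95 \cdot 0.01 + 0.10 \cdot 0.99 = 0.1085$$

Now we can plug these values into the formula:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{0.0095}{0.1085} \approx 0.0876$$

So, even with a positive test result, the probability of actually having the disease is only about 8.8%.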
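The same calculation as a small reusable Python function. `bayes_posterior` and its parameter names are introduced here for illustration only.

```python
def bayes_posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) via Bayes's theorem.

    prior: P(A), base rate of the condition
    sensitivity: P(B | A), probability of a positive test given the condition
    false_positive_rate: P(B | not A), probability of a positive test without the condition
    """
    # Law of total probability for the denominator P(B)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.10)
print(f"P(disease | positive) = {posterior:.4f}")  # prints 0.0876
```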

## Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

A confidence interval is a range of values that is used to estimate the true value of a population parameter, such as a population mean or proportion, along with a specified level of confidence. It provides a measure of the uncertainty associated with our sample estimate.
Calculating a Confidence Interval for a Population Mean (using t-distribution):

Given a sample mean $\bar{x}$, a sample standard deviation $s$, a sample size $n$, and a desired level of confidence (usually expressed as a percentage, such as 95% or 99%), you can calculate the confidence interval using the following formula:

$$\text{Confidence Interval} = \bar{x} \pm \text{Margin of Error}$$

where the margin of error is the critical value from the t-distribution for the specified level of confidence, with $n - 1$ degrees of freedom, multiplied by the standard error of the mean:

$$\text{Margin of Error} = t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$$

```python
# Example: a random sample of 30 students is taken to estimate the average height of all students
# in a university. The sample mean height is 165 cm and the sample standard deviation is 8 cm.
# We want to calculate a 95% confidence interval for the population mean height.

import numpy as np
import scipy.stats as stats

sample_mean = 165
sample_std = 8
sample_size = 30
confidence_level = 0.95

# Critical value from the t-distribution (two-sided, so use the (1 + 0.95) / 2 = 0.975 quantile)
degrees_of_freedom = sample_size - 1
critical_value = stats.t.ppf((1 + confidence_level) / 2, df=degrees_of_freedom)

# Margin of error = critical value * standard error of the mean
margin_of_error = critical_value * (sample_std / np.sqrt(sample_size))

# Calculate the confidence interval
confidence_interval_lower = sample_mean - margin_of_error
confidence_interval_upper = sample_mean + margin_of_error

print(f"95% Confidence Interval: ({confidence_interval_lower:.2f}, {confidence_interval_upper:.2f})")
```

95% Confidence Interval: (162.01, 167.99)
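The critical value must be a fixed quantile of the t-distribution, not a random draw from it. As a cross-check, SciPy's `scipy.stats.t.interval` returns the interval directly from the confidence level, degrees of freedom, mean and standard error. The confidence level is passed positionally here because the keyword name differs between SciPy versions.

```python
import numpy as np
import scipy.stats as stats

# Same summary statistics as the example: mean 165 cm, s = 8 cm, n = 30
standard_error = 8 / np.sqrt(30)
lower, upper = stats.t.interval(0.95, 29, loc=165, scale=standard_error)
print(f"95% Confidence Interval: ({lower:.2f}, {upper:.2f})")  # (162.01, 167.99)
```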


## Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

* Sample Problem:
Suppose we are interested in a certain medical condition, and we know that in a given population, 10% of individuals have this condition. We also know that a specific test for this condition has a false positive rate of 5% and a true positive rate of 90%. If a randomly selected individual tests positive for the condition, what is the probability that they actually have it?

```python
# Prior probability of having the condition
P_condition = 0.10

# Probability of testing positive given that the person has the condition
P_positive_given_condition = 0.90

# Probability of testing positive given that the person does not have the condition (false positive rate)
P_positive_given_no_condition = 0.05

# Probability of not having the condition
P_no_condition = 1 - P_condition

# Calculate the denominator (P(positive))
P_positive = (P_positive_given_condition * P_condition) + (P_positive_given_no_condition * P_no_condition)

# Calculate the posterior probability using Bayes' Theorem
P_condition_given_positive = (P_positive_given_condition * P_condition) / P_positive

print("Probability of having the condition given a positive test result:", P_condition_given_positive)
```


Probability of having the condition given a positive test result: 0.6666666666666667
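The analytical answer can be checked with a Monte Carlo simulation of a large hypothetical population with the stated rates. A minimal sketch; the population size and random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people = 1_000_000

# Simulate who has the condition (10% prevalence)
has_condition = rng.random(n_people) < 0.10

# Simulate test results: 90% true positive rate, 5% false positive rate
positive = np.where(has_condition,
                    rng.random(n_people) < 0.90,
                    rng.random(n_people) < 0.05)

# Fraction of positive testers who actually have the condition
estimate = has_condition[positive].mean()
print(f"Simulated P(condition | positive) = {estimate:.4f}")
```

The simulated fraction should land close to the exact value of 2/3.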


## Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

```python
import numpy as np
import scipy.stats as stats

sample_mean = 50
sample_std = 5
sample_size = 30  # The question does not give a sample size; n = 30 is assumed here
confidence_level = 0.95

# Calculate the critical value from the t-distribution
degrees_of_freedom = sample_size - 1
critical_value = stats.t.ppf((1 + confidence_level) / 2, df=degrees_of_freedom)

# Calculate the margin of error
margin_of_error = critical_value * (sample_std / np.sqrt(sample_size))

# Calculate the confidence interval
confidence_interval_lower = sample_mean - margin_of_error
confidence_interval_upper = sample_mean + margin_of_error

confidence_interval = (confidence_interval_lower, confidence_interval_upper)

print("95% Confidence Interval:", confidence_interval)
```


95% Confidence Interval: (48.1329693162095, 51.8670306837905)


* Interpretation:
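With 95% confidence, the true population mean lies between approximately 48.13 and 51.87, assuming a sample size of 30. Strictly speaking, if we repeated the sampling many times and built an interval the same way each time, about 95% of those intervals would contain the true population mean.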
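The "95% of intervals" interpretation can be checked by simulation. A minimal sketch, assuming a hypothetical normal population whose true mean and standard deviation match the example (50 and 5) and samples of size 30:

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(7)
true_mean, true_std, n, n_sims = 50, 5, 30, 10_000  # hypothetical population matching the example
t_crit = stats.t.ppf(0.975, df=n - 1)

# Draw many samples and build a 95% t-interval from each one
samples = rng.normal(true_mean, true_std, size=(n_sims, n))
means = samples.mean(axis=1)
stds = samples.std(axis=1, ddof=1)
margins = t_crit * stds / np.sqrt(n)

# Coverage: fraction of intervals that contain the true mean
covered = (means - margins <= true_mean) & (true_mean <= means + margins)
coverage = covered.mean()
print(f"Empirical coverage: {coverage:.3f}")
```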

## Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

The margin of error in a confidence interval is the half-width of the interval: the amount added to and subtracted from a sample estimate (such as a sample mean or proportion) to account for the uncertainty in estimating the true population parameter.

In mathematical terms, the margin of error is the product of a critical value (from the t-distribution or the standard normal z-distribution) and the standard error of the sample estimate. For a mean, the standard error is σ/√n when the population standard deviation σ is known, or s/√n when it is estimated by the sample standard deviation s.

* Sample size and the margin of error move in opposite directions: as the sample size increases, the margin of error decreases, and vice versa. Because the standard error scales with 1/√n, the margin of error is inversely proportional to the square root of the sample size, so quadrupling the sample size roughly halves it. A larger sample provides more information about the population, which leads to a more precise estimate of the population parameter.

* Example:
Let's consider an example to illustrate how sample size affects the margin of error:

Suppose we are conducting a survey to estimate the average age of all residents in a city, and we have two scenarios: one with a sample size of 100 and another with a sample size of 500.

1. Scenario 1: Sample Size of 100
   * Sample Mean Age: 40 years
   * Sample Standard Deviation: 10 years
   * Confidence Level: 95%

2. Scenario 2: Sample Size of 500
   * Sample Mean Age: 40 years
   * Sample Standard Deviation: 10 years
   * Confidence Level: 95%

For both scenarios, the sample mean and standard deviation are the same. However, due to the larger sample size in Scenario 2, the margin of error will be smaller. This means that the confidence interval in Scenario 2 will be narrower than in Scenario 1.
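Plugging the two scenarios into the margin-of-error formula makes the difference concrete. A minimal sketch, assuming a t critical value with n − 1 degrees of freedom (a z critical value of 1.96 would give almost the same numbers at these sample sizes):

```python
import numpy as np
import scipy.stats as stats

sample_mean, sample_std, confidence_level = 40, 10, 0.95
margins = {}

for n in (100, 500):
    # Margin of error = t critical value (n - 1 df) * standard error s / sqrt(n)
    critical_value = stats.t.ppf((1 + confidence_level) / 2, df=n - 1)
    margins[n] = critical_value * sample_std / np.sqrt(n)
    print(f"n = {n}: margin of error = {margins[n]:.2f} years, "
          f"95% CI = ({sample_mean - margins[n]:.2f}, {sample_mean + margins[n]:.2f})")
```

With these numbers the margin of error shrinks from about 1.98 years to about 0.88 years.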

## Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

```python
data_point = 75
population_mean = 70
population_std = 5

# Calculate the z-score
z_score = (data_point - population_mean) / population_std

print("Z-Score:", z_score)
```

Z-Score: 1.0


* Interpretation:
The calculated z-score of 1.0 indicates that the data point with a value of 75 lies exactly 1 standard deviation above the population mean. In other words, it exceeds the population average by one standard deviation (5 units).
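If the population is assumed to be normally distributed (the question does not state this), the z-score can be converted into a percentile with the standard normal CDF. A minimal sketch:

```python
import scipy.stats as stats

z_score = (75 - 70) / 5

# Proportion of a normal population lying below the data point
percentile = stats.norm.cdf(z_score)
print(f"About {percentile:.1%} of the population lies below 75")  # about 84.1%
```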

## Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

```python
import numpy as np
import scipy.stats as stats

sample_size = 50
sample_mean = 6
sample_std = 2.5
confidence_level = 0.95

# H0: mean weight loss = 0 (no effect); H1: mean weight loss > 0 (one-tailed)
population_mean = 0
degrees_of_freedom = sample_size - 1

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))

# Critical t-value for a one-tailed test at alpha = 1 - 0.95 = 0.05
critical_value = stats.t.ppf(confidence_level, df=degrees_of_freedom)

# Compare the t-statistic with the critical t-value
if t_statistic > critical_value:
    print("Reject the null hypothesis. The drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis. The drug is not significantly effective.")
```


Reject the null hypothesis. The drug is significantly effective.
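The same decision can be expressed as a p-value. A short sketch using the same summary statistics and the same one-tailed alternative (mean weight loss greater than 0):

```python
import numpy as np
import scipy.stats as stats

sample_size, sample_mean, sample_std = 50, 6, 2.5
t_statistic = (sample_mean - 0) / (sample_std / np.sqrt(sample_size))

# One-tailed p-value for H1: mean weight loss > 0
p_value = stats.t.sf(t_statistic, df=sample_size - 1)
print(f"t = {t_statistic:.2f}, one-tailed p-value = {p_value:.2e}")
```

A p-value below 0.05 leads to the same conclusion as comparing t with the critical value.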


## Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

```python
import numpy as np
import scipy.stats as stats

sample_size = 500
sample_proportion = 0.65
confidence_level = 0.95

# Calculate the standard error of the proportion
standard_error = np.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)

# Critical z-value for a two-sided interval
critical_value = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error
margin_of_error = critical_value * standard_error

# Calculate the confidence interval
confidence_interval_lower = sample_proportion - margin_of_error
confidence_interval_upper = sample_proportion + margin_of_error

confidence_interval = (confidence_interval_lower, confidence_interval_upper)

print("95% Confidence Interval for Proportion:", confidence_interval)
```


95% Confidence Interval for Proportion: (0.6081925393809212, 0.6918074606190788)


* Interpretation:
With 95% confidence, the true proportion of people who are satisfied with their job lies between approximately 0.6082 and 0.6918 (about 60.8% to 69.2%), based on the given sample data.
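The interval above is the normal-approximation (Wald) interval. The sketch below first checks the common rule of thumb for that approximation (at least about 10 expected successes and failures), then computes the Wilson score interval for comparison. The Wilson interval is a different method, often recommended for proportions; with n = 500 the two agree closely.

```python
import numpy as np
import scipy.stats as stats

n, p_hat, confidence_level = 500, 0.65, 0.95
z = stats.norm.ppf((1 + confidence_level) / 2)

# Rule-of-thumb check for the normal approximation
print(f"n*p = {n * p_hat:.0f}, n*(1-p) = {n * (1 - p_hat):.0f}")

# Wilson score interval
denominator = 1 + z**2 / n
center = (p_hat + z**2 / (2 * n)) / denominator
half_width = z * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denominator
wilson_lower, wilson_upper = center - half_width, center + half_width
print(f"Wilson 95% interval: ({wilson_lower:.4f}, {wilson_upper:.4f})")
```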

## Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

```python
import numpy as np
import scipy.stats as stats

# Given data for Sample A (the question does not state sample sizes; n = 30 per group is assumed)
sample_mean_a = 85
sample_std_a = 6
sample_size_a = 30

# Given data for Sample B
sample_mean_b = 82
sample_std_b = 5
sample_size_b = 30

alpha = 0.01

# Pooled standard deviation (assumes equal population variances)
pooled_std = np.sqrt(((sample_size_a - 1) * sample_std_a**2 + (sample_size_b - 1) * sample_std_b**2) / (sample_size_a + sample_size_b - 2))

# Two-sample t-statistic
t_statistic = (sample_mean_a - sample_mean_b) / (pooled_std * np.sqrt(1/sample_size_a + 1/sample_size_b))

# Calculate the degrees of freedom
degrees_of_freedom = sample_size_a + sample_size_b - 2

# Critical t-value for a two-tailed test at alpha = 0.01
critical_value = stats.t.ppf(1 - alpha / 2, df=degrees_of_freedom)

# Compare |t| with the critical t-value
if np.abs(t_statistic) > critical_value:
    print("Reject the null hypothesis. There is a significant difference in student performance between the two teaching methods.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in student performance between the two teaching methods.")
```


Fail to reject the null hypothesis. There is no significant difference in student performance between the two teaching methods.
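SciPy can run the same pooled two-sample t-test directly from summary statistics with `scipy.stats.ttest_ind_from_stats`. This sketch keeps the assumed sample sizes of 30 per group, since the question does not state them.

```python
import scipy.stats as stats

# Sample sizes of 30 per group are assumed (not given in the question)
result = stats.ttest_ind_from_stats(mean1=85, std1=6, nobs1=30,
                                    mean2=82, std2=5, nobs2=30,
                                    equal_var=True)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
print("Reject H0 at alpha = 0.01" if result.pvalue < 0.01 else "Fail to reject H0 at alpha = 0.01")
```

Passing `equal_var=False` runs Welch's t-test instead, which does not assume equal population variances.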


## Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

```python
import numpy as np
import scipy.stats as stats

sample_mean = 65
population_mean = 60  # Given in the question, but not needed to compute the interval
population_std = 8
sample_size = 50
confidence_level = 0.90

# Critical z-value (the population standard deviation is known, so the normal distribution is used)
critical_value = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the standard error of the mean
standard_error = population_std / np.sqrt(sample_size)

# Calculate the margin of error
margin_of_error = critical_value * standard_error

# Calculate the confidence interval
confidence_interval_lower = sample_mean - margin_of_error
confidence_interval_upper = sample_mean + margin_of_error

confidence_interval = (confidence_interval_lower, confidence_interval_upper)

print("90% Confidence Interval for Population Mean:", confidence_interval)
```


90% Confidence Interval for Population Mean: (63.13906055411732, 66.86093944588268)


* Interpretation:
With 90% confidence, the true population mean lies between approximately 63.14 and 66.86, based on the given sample data.
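As a cross-check, `scipy.stats.norm.interval` gives the same z-based interval in one call. This sketch also checks whether the stated population mean of 60 falls inside the interval.

```python
import numpy as np
import scipy.stats as stats

# z-based 90% interval with known sigma = 8, n = 50, sample mean 65
lower, upper = stats.norm.interval(0.90, loc=65, scale=8 / np.sqrt(50))
print(f"90% CI: ({lower:.2f}, {upper:.2f})")
print("Population mean 60 inside interval:", lower <= 60 <= upper)
```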

## Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

```python
import numpy as np
import scipy.stats as stats

sample_mean = 0.25
population_mean = 0  # Placeholder null value: the question gives no baseline (no-caffeine) reaction time
sample_std = 0.05
sample_size = 30
confidence_level = 0.90

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))

# Calculate the degrees of freedom
degrees_of_freedom = sample_size - 1

# Critical t-value for a two-tailed test at alpha = 1 - 0.90 = 0.10
critical_value = stats.t.ppf(1 - (1 - confidence_level) / 2, df=degrees_of_freedom)

# Compare |t| with the critical t-value for a two-tailed test
if np.abs(t_statistic) > critical_value:
    print("Reject the null hypothesis. Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis. There is no significant effect of caffeine on reaction time.")
```


Reject the null hypothesis. Caffeine has a significant effect on reaction time.
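Because the null value of 0 is only a placeholder, a 90% confidence interval for the mean reaction time is arguably more informative. If a baseline (no-caffeine) mean were available, rejecting H0 at the 10% level two-sided would be equivalent to that baseline falling outside this interval. A minimal sketch using the same summary statistics:

```python
import numpy as np
import scipy.stats as stats

sample_mean, sample_std, sample_size = 0.25, 0.05, 30
standard_error = sample_std / np.sqrt(sample_size)

# Two-sided p-value against the same placeholder null value of 0
t_statistic = (sample_mean - 0) / standard_error
p_value = 2 * stats.t.sf(abs(t_statistic), df=sample_size - 1)

# 90% confidence interval for the mean reaction time
lower, upper = stats.t.interval(0.90, sample_size - 1, loc=sample_mean, scale=standard_error)
print(f"t = {t_statistic:.2f}, p = {p_value:.2e}")
print(f"90% CI for mean reaction time: ({lower:.4f}, {upper:.4f}) seconds")
```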
