## Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

### T-Test:
#### A t-test is used when the population standard deviation is unknown and has to be estimated from the sample. This is the usual situation with small samples (typically n < 30), where that estimate is least reliable.
#### It is appropriate when you are comparing a sample mean to a hypothesized population mean or when you are comparing two sample means.
#### The test statistic in a t-test follows a t-distribution, which has heavier tails than the normal distribution and approaches it as the degrees of freedom increase.

### Example Scenario for a T-Test:
#### Suppose a pharmaceutical company develops a new drug and wants to test its effectiveness in reducing blood pressure. They randomly select 20 participants and measure their blood pressure before and after taking the drug. The company wants to determine whether there is a significant decrease in blood pressure after taking the drug. Because the sample is small and the population standard deviation of the before-after differences is unknown, a paired t-test (a one-sample t-test on the differences) would be appropriate for this scenario.
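A paired t-test for this design can be sketched with SciPy's `scipy.stats.ttest_rel`. The readings below are simulated with an assumed average drop of about 5 mmHg; they are hypothetical and only stand in for the study's real data. The `alternative="greater"` argument (available in SciPy 1.6 and later) makes the test one-sided, matching the question of whether blood pressure decreases.

```python
import numpy as np
from scipy import stats

# Simulated (hypothetical) readings for 20 participants; these are NOT real study data.
rng = np.random.default_rng(42)
before = rng.normal(loc=140, scale=10, size=20)       # systolic BP before the drug (mmHg)
after = before - rng.normal(loc=5, scale=4, size=20)  # assumed average drop of about 5 mmHg

# Paired t-test: H0 mean(before - after) = 0, H1 mean(before - after) > 0
result = stats.ttest_rel(before, after, alternative="greater")
print(f"t = {result.statistic:.2f}, one-sided p = {result.pvalue:.4g}")
```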

### Z-Test:
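#### A z-test is used when the population standard deviation is known, typically together with a large sample size (n ≥ 30). With large samples it is also often used as an approximation when the standard deviation is estimated, because the t-distribution is then very close to the normal distribution.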
#### It is appropriate when you are comparing a sample mean to a known population mean or when you are comparing two sample means.
#### The test statistic in a z-test follows a standard normal distribution (z-distribution).

### Example Scenario for a Z-Test:
#### Suppose you work for a manufacturing company that claims its product has an average weight of 500 grams. You take a random sample of 50 products and find that their average weight is 505 grams. If the population standard deviation of product weights is known (for example, from long-run production records), you can use a z-test to determine whether the sample mean of 505 grams is significantly different from the claimed population mean of 500 grams.
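A minimal sketch of this z-test, assuming a known population standard deviation of 15 g (a hypothetical value; the scenario does not give one), using Python's standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

# Values from the example; sigma is an ASSUMED known population standard deviation (hypothetical).
claimed_mean = 500.0  # grams
sample_mean = 505.0   # grams
sample_size = 50
sigma = 15.0          # grams, hypothetical

# One-sample z statistic
standard_error = sigma / sample_size ** 0.5
z = (sample_mean - claimed_mean) / standard_error

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```

With the assumed σ = 15 g, z ≈ 2.36 and the two-sided p-value is about 0.018, so the difference would be significant at α = 0.05; a larger assumed σ would weaken that conclusion.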

## Q2: Differentiate between one-tailed and two-tailed tests.

### One-Tailed Test:

#### In a one-tailed test, the null hypothesis is tested against an alternative in one specific direction. The alternative hypothesis states either that the parameter is greater than a certain value or that it is less than that value, but not both.
#### The critical region is located on one side of the distribution (either the left tail or the right tail), depending on the specific direction specified in the alternative hypothesis.
#### A one-tailed test is used when you are interested in determining whether the sample data deviates in only one specific direction.

### Two-Tailed Test:

#### In a two-tailed test, the null hypothesis is tested against the possibility of any significant difference, regardless of direction. The alternative hypothesis is generally formulated to check whether the sample data is significantly different from a certain value.
#### The critical region is divided into two parts, located in both tails of the distribution.
#### A two-tailed test is used when you are interested in determining whether the sample data deviates significantly in either direction from the null hypothesis.
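The difference in where the critical region sits can be made concrete with the standard normal distribution. The sketch below assumes a z-test at α = 0.05 and a hypothetical observed statistic of z = 1.8, and uses Python's standard-library `statistics.NormalDist` to compare one-tailed and two-tailed critical values and p-values.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# One-tailed (right tail): all of alpha sits in one tail
crit_one_tailed = std_normal.inv_cdf(1 - alpha)
# Two-tailed: alpha is split equally between the two tails
crit_two_tailed = std_normal.inv_cdf(1 - alpha / 2)

# The same hypothetical observed statistic evaluated both ways
z_observed = 1.8
p_one_tailed = 1 - std_normal.cdf(z_observed)
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z_observed)))

print(f"critical z: one-tailed {crit_one_tailed:.3f}, two-tailed {crit_two_tailed:.3f}")
print(f"p-value for z = {z_observed}: one-tailed {p_one_tailed:.4f}, two-tailed {p_two_tailed:.4f}")
```

With these numbers the one-tailed p-value (about 0.036) is below 0.05 while the two-tailed p-value (about 0.072) is not, which is why the choice of test has to be fixed before looking at the data.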

## Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

### Type 1 Error (False Positive):
#### A Type 1 error occurs when the null hypothesis is incorrectly rejected when it is actually true. In other words, you conclude there is a significant effect or difference when there isn't one in reality. The probability of committing a Type 1 error is denoted by the symbol alpha and is also called the significance level.

### Example Scenario for Type 1 Error:
#### Suppose a pharmaceutical company is testing a new drug to determine if it reduces cholesterol levels. The null hypothesis states that the drug has no effect on cholesterol levels. Purely through random sampling variability, the company's data happen to show a drop large enough that they reject the null hypothesis and claim the drug is effective, even though it is not. This is a Type 1 error. When the test's assumptions hold, its probability is set by the chosen alpha, not by the sample size.

### Type 2 Error (False Negative):
#### A Type 2 error occurs when the null hypothesis is not rejected even though it is actually false. In other words, you fail to detect a significant effect or difference that actually exists. The probability of committing a Type 2 error is denoted by the symbol beta.

### Example Scenario for Type 2 Error:
#### Continuing from the previous example, suppose the new drug actually does reduce cholesterol levels, but the company's study fails to show a statistically significant difference due to a small sample size or other factors. In this case, the null hypothesis is not rejected, even though it should have been rejected. This is a Type 2 error.
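A small Monte Carlo simulation can illustrate both error types. This sketch assumes normally distributed data, a two-sided one-sample t-test at α = 0.05, studies of 10 participants each, and a hypothetical true effect of 0.5 standard deviations for the Type 2 scenario; it uses NumPy and `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_studies, n_per_study = 10_000, 10

# Scenario 1: H0 is true (the drug has no effect, true mean change = 0).
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_studies, n_per_study))
p_null = stats.ttest_1samp(null_samples, popmean=0.0, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)  # fraction of false positives

# Scenario 2: H0 is false (hypothetical true effect of 0.5 standard deviations).
effect_samples = rng.normal(loc=0.5, scale=1.0, size=(n_studies, n_per_study))
p_effect = stats.ttest_1samp(effect_samples, popmean=0.0, axis=1).pvalue
type2_rate = np.mean(p_effect >= alpha)  # fraction of real effects that were missed

print(f"Estimated Type 1 error rate: {type1_rate:.3f} (alpha = {alpha})")
print(f"Estimated Type 2 error rate: {type2_rate:.3f}")
```

The estimated Type 1 rate should land close to 0.05, while the Type 2 rate for this small, hypothetical study design comes out around 0.7, showing how an underpowered study can miss a real effect.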

## Q4: Explain Bayes's theorem with an example.

#### Bayes's Theorem is a fundamental concept in probability theory and statistics that describes how to update the probability of a hypothesis based on new evidence. It provides a mathematical framework for incorporating new information into our beliefs or understanding about a situation.

The formula for Bayes's Theorem is as follows:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Where:


##### P(A∣B) is the posterior probability of event A given event B has occurred.
##### P(B∣A) is the likelihood of event B occurring given that event A has occurred.
##### P(A) is the prior probability of event A.
##### P(B) is the probability of event B.

Let's explain Bayes's Theorem with a classic example: medical testing.

Suppose you are concerned about having a rare disease and decide to take a medical test. Let's define the following events:


##### A: Having the disease (the hypothesis you want to test).
##### B: Testing positive on the medical test.


We know the following probabilities:

##### P(A): The prior probability of having the disease (which might be very low since it's a rare disease).
##### P(B∣A): The probability of testing positive given that you actually have the disease (this is the test's sensitivity).
##### P(B∣¬A): The probability of testing positive given that you don't have the disease (this is the test's false positive rate).


Now, you want to find out the probability that you actually have the disease given that you tested positive, P(A∣B).

Using Bayes's Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

We need to calculate P(B). By the law of total probability, it is the sum of two cases: testing positive and having the disease, and testing positive and not having the disease:


$$P(B) = P(B \mid A)\, P(A) + P(B \mid \neg A)\, P(\neg A)$$

Where P(¬A) = 1 − P(A) is the probability of not having the disease.

Now, you can substitute the values you know into the equation to calculate P(A∣B), which represents the updated probability of having the disease after testing positive.
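The substitution step can be sketched in Python. The numbers below (1% prevalence, 99% sensitivity, 5% false positive rate) are hypothetical values chosen only to illustrate the calculation; they are not given in the text.

```python
# Hypothetical inputs, chosen only for illustration.
p_disease = 0.01            # P(A): prevalence of the rare disease
sensitivity = 0.99          # P(B | A)
false_positive_rate = 0.05  # P(B | not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_positive = sensitivity * p_disease + false_positive_rate * (1 - p_disease)

# Bayes's theorem: P(A | B) = P(B|A) P(A) / P(B)
p_disease_given_positive = sensitivity * p_disease / p_positive

print(f"P(positive) = {p_positive:.4f}")
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")
```

With these assumed inputs, a positive result raises the probability of disease from 1% to only about 16.7%, because false positives from the large healthy group outnumber the true positives.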

## Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

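#### A confidence interval is a range of plausible values for a population parameter, computed from a sample. The confidence level (for example, 95%) is the proportion of such intervals that would contain the true parameter if the sampling procedure were repeated many times. When the population standard deviation is known, the interval for a mean is: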

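$$\text{Confidence Interval} = \bar{x} \pm Z \times \frac{\sigma}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, $Z$ is the critical value (Z-Score) for the chosen confidence level, $\sigma$ is the population standard deviation, and $n$ is the sample size.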


Suppose a company wants to estimate the average time it takes for customers to complete a specific task on their website. They collect a random sample of 50 customers and record their completion times. The sample mean completion time is 25.6 seconds, and the population standard deviation is known to be 4.2 seconds. The company wants to calculate a 95% confidence interval for the true average completion time.

1. Determine the Z-Score for the 95% confidence level. For a 95% confidence level, the critical value (Z-Score) is approximately 1.96 (you can find this value in a standard normal distribution table).

2. Plug the values into the formula:
$$\text{Confidence Interval} = 25.6 \pm 1.96 \times \frac{4.2}{\sqrt{50}}$$

3. Calculate the margin of error:
$$\text{Margin of Error} = 1.96 \times \frac{4.2}{\sqrt{50}} \approx 1.96 \times 0.594 \approx 1.164$$

4. Calculate the confidence interval:
Lower bound = 25.6 − 1.164 = 24.436
Upper bound = 25.6 + 1.164 = 26.764

So, the 95% confidence interval for the true average completion time is approximately 24.436 to 26.764 seconds. The 95% describes the procedure: if the company repeated this sampling many times, about 95% of the intervals constructed this way would contain the true average completion time for all customers.
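The four steps can be checked in Python with only the standard library. `statistics.NormalDist().inv_cdf` supplies the exact critical value (about 1.95996) instead of the rounded 1.96, so the last digit can differ slightly from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 25.6  # seconds
sigma = 4.2         # known population standard deviation (seconds)
n = 50
confidence = 0.95

# Two-sided critical value; about 1.96 for 95% confidence
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin_of_error = z * sigma / sqrt(n)
lower, upper = sample_mean - margin_of_error, sample_mean + margin_of_error
print(f"MOE = {margin_of_error:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```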

## Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

Sample Problem:
Suppose a factory produces light bulbs, and it's known that 5% of the bulbs produced are defective. A quality control test is performed, which is 90% accurate at detecting defective bulbs and 95% accurate at identifying non-defective bulbs. If a randomly selected light bulb fails the quality control test, what is the probability that it is actually defective?

Solution:
Let's define the events:

##### D: The event that a randomly selected bulb is defective.
##### ¬D: The event that a randomly selected bulb is not defective.
##### T: The event that a bulb fails the quality control test.

We are asked to find P(D∣T), the probability that a bulb is defective given that it fails the quality control test.

We know the following probabilities:

##### P(D)=0.05 (prior probability of a bulb being defective)
##### P(¬D)=1−P(D)=0.95 (prior probability of a bulb not being defective)
##### P(T∣D)=0.90 (probability of failing the test given that the bulb is defective)
##### P(T∣¬D)=1−0.95=0.05 (probability of failing the test given that the bulb is not defective)

Using Bayes' Theorem:

$$P(D \mid T) = \frac{P(T \mid D)\, P(D)}{P(T)}$$

We need to calculate P(T). By the law of total probability, it is the sum of two cases: failing the test and being defective, and failing the test and not being defective:

$$P(T) = P(T \mid D)\, P(D) + P(T \mid \neg D)\, P(\neg D)$$

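Substitute the values into the formula to calculate P(D∣T):

$$P(D \mid T) = \frac{0.90 \times 0.05}{0.90 \times 0.05 + 0.05 \times 0.95} = \frac{0.045}{0.0925} \approx 0.486$$

So, the probability that a bulb is actually defective given that it fails the quality control test is approximately 0.486, or 48.6%. Even though the test is fairly accurate, fewer than half of the failing bulbs are defective, because defective bulbs are rare (5%) and many failures come from the much larger pool of good bulbs.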
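The same calculation can be wrapped in a small reusable function. The `posterior` helper below is a hypothetical name introduced for illustration; it applies the law of total probability for the denominator and then Bayes's theorem.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) via Bayes's theorem, with P(E) from the law of total probability."""
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence


# Values from the light-bulb problem
p_defective_given_fail = posterior(prior=0.05, p_evidence_given_h=0.90, p_evidence_given_not_h=0.05)
print(f"P(D | T) = {p_defective_given_fail:.4f}")
```

If a bulb failed a second, independent test, the returned value could be passed back in as the new `prior`; that independence is an assumption the problem does not state.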

## Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

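```python
import scipy.stats as stats

# Given data. The question does not state the sample size, so n = 30 is an assumption,
# and the standard deviation of 5 is treated as a known population value (z-interval).
sample_mean = 50
population_stddev = 5
sample_size = 30
confidence_level = 0.95

# Standard error of the mean
standard_error = population_stddev / (sample_size ** 0.5)

# Two-sided critical z-value for the desired confidence level
z_score = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# Margin of error
margin_of_error = z_score * standard_error

# Confidence interval bounds
confidence_interval_lower = sample_mean - margin_of_error
confidence_interval_upper = sample_mean + margin_of_error

# Print the results
print("95% Confidence Interval:")
print("Lower Bound:", confidence_interval_lower)
print("Upper Bound:", confidence_interval_upper)
```

```
95% Confidence Interval:
Lower Bound: 48.210805856282846
Upper Bound: 51.789194143717154
```

Under these assumptions, the 95% confidence interval for the population mean is approximately 48.21 to 51.79. If the sampling procedure were repeated many times, about 95% of the intervals constructed this way would contain the true population mean; a larger sample would give a narrower interval.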
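If the standard deviation of 5 is instead the sample standard deviation (the more common reading when only sample summaries are given), a t-interval with n − 1 degrees of freedom is the textbook choice. A sketch using SciPy's `stats.t.interval`, still under the assumed sample size of 30:

```python
from scipy import stats

# n = 30 is an assumption; the question omits the sample size.
sample_mean, sample_stddev, sample_size = 50, 5, 30
standard_error = sample_stddev / sample_size ** 0.5

# t-based interval with n - 1 degrees of freedom (SD estimated from the sample)
t_lower, t_upper = stats.t.interval(0.95, sample_size - 1, loc=sample_mean, scale=standard_error)
print(f"95% t-interval: ({t_lower:.3f}, {t_upper:.3f})")
```

The t-interval (roughly 48.13 to 51.87) is slightly wider than the z-interval because the t-distribution has heavier tails.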


## Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

#### The margin of error (MOE) in a confidence interval is the amount added to and subtracted from the point estimate to form the interval, i.e., half the interval's width. It reflects the sampling uncertainty of the estimate at the chosen confidence level: the true parameter lies within one MOE of the estimate for the stated proportion (e.g., 95%) of repeated samples, not with certainty. A smaller MOE means a more precise estimate.

### Sample size has a direct impact on the margin of error:

#### Inverse Relationship with Sample Size: As the sample size increases, the margin of error decreases, holding other factors constant. The margin of error is proportional to 1/√n, so quadrupling the sample size halves it. Larger samples provide more information about the population, leading to a more precise estimate.

#### Direct Relationship with Variability: The margin of error is also affected by the variability of the data. If the data points in the sample are more spread out (higher variability), the margin of error will be larger for a given sample size.

Example:
Let's consider an example to illustrate how sample size affects the margin of error.

Suppose a political polling organization wants to estimate the proportion of voters who support a particular candidate in an upcoming election. They conduct two separate surveys with different sample sizes:

Survey 1:

##### Sample Size (n) = 500
##### Proportion of Support (p) = 0.60 (60%)
##### Confidence Level = 95%

Survey 2:

##### Sample Size (n) = 1000
##### Proportion of Support (p) = 0.60 (60%)
##### Confidence Level = 95%

Using the formula for the margin of error for a proportion:

$$\text{Margin of Error} = Z \times \sqrt{\frac{p(1-p)}{n}}$$


For a 95% confidence level, the Z-Score is approximately 1.96.

Survey 1:
$$\text{MOE}_1 = 1.96 \times \sqrt{\frac{0.60 \times (1 - 0.60)}{500}} \approx 0.043$$

Survey 2:
$$\text{MOE}_2 = 1.96 \times \sqrt{\frac{0.60 \times (1 - 0.60)}{1000}} \approx 0.030$$

##### In this example, Survey 2 has a larger sample size (n = 1000) than Survey 1 (n = 500). As a result, Survey 2 has a smaller margin of error (about 0.030) than Survey 1 (about 0.043). The estimate of the proportion of voters who support the candidate is therefore more precise in Survey 2, giving a narrower range of plausible values around the estimate.
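The effect of sample size can be tabulated with a short function. `proportion_moe` is a hypothetical helper implementing the normal-approximation formula above, with z = 1.96 for 95% confidence.

```python
from math import sqrt


def proportion_moe(p, n, z=1.96):
    """Margin of error for a sample proportion under the normal approximation."""
    return z * sqrt(p * (1 - p) / n)


for n in (500, 1000, 2000, 4000):
    print(f"n = {n:>4}: MOE = {proportion_moe(0.60, n):.4f}")
```

Going from n = 500 to n = 2000 (four times as many respondents) cuts the margin of error exactly in half.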


## Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

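```python
# Given values
data_point = 75
population_mean = 70
population_stddev = 5

# z-score: how many standard deviations the data point lies from the mean
z_score = (data_point - population_mean) / population_stddev

# Print the result
print("Z-Score:", z_score)
```

```
Z-Score: 1.0
```

A z-score of 1.0 means the value 75 lies exactly one population standard deviation above the population mean of 70.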
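To turn the z-score into a percentile, one can use the standard normal CDF. This assumes the population is normally distributed, which the question does not state; the sketch uses Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

z = (75 - 70) / 5
# Proportion of a normal population lying below this value (normality is an assumption)
percentile = NormalDist().cdf(z)
print(f"z = {z}, about {percentile:.1%} of values fall below 75")
```

Under that assumption, about 84.1% of the population would score below 75.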


## Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

```python
import scipy.stats as stats

# Given values
sample_mean = 6          # average weight loss (pounds)
null_mean = 0            # H0: the drug produces no average weight loss
sample_stddev = 2.5
sample_size = 50
confidence_level = 0.95  # corresponds to a significance level of alpha = 0.05
degrees_of_freedom = sample_size - 1

# One-sample t-statistic
t_statistic = (sample_mean - null_mean) / (sample_stddev / (sample_size ** 0.5))

# Two-sided critical t-value
critical_t_value = stats.t.ppf(1 - (1 - confidence_level) / 2, degrees_of_freedom)

# Compare |t| with the critical value
if abs(t_statistic) > critical_t_value:
    print("Reject the null hypothesis: The drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis: There is no significant evidence that the drug is effective.")
```

```
Reject the null hypothesis: The drug is significantly effective.
```
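Because "effective" is a directional claim, a one-sided alternative (mean weight loss greater than 0) is arguably the closer match. This sketch computes the one-sided p-value and critical value with `scipy.stats.t.sf` and `scipy.stats.t.ppf`, using the same summary statistics.

```python
from scipy import stats

sample_mean, null_mean, sample_stddev, sample_size = 6, 0, 2.5, 50
df = sample_size - 1
alpha = 0.05

t_statistic = (sample_mean - null_mean) / (sample_stddev / sample_size ** 0.5)

# One-sided alternative H1: mean weight loss > 0
p_one_sided = stats.t.sf(t_statistic, df)
critical_one_sided = stats.t.ppf(1 - alpha, df)
print(f"t = {t_statistic:.2f}, critical t = {critical_one_sided:.3f}, one-sided p = {p_one_sided:.2e}")
```

The t-statistic is about 16.97, far beyond either the one-sided or the two-sided critical value, so here the conclusion does not depend on the choice of tails.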


## Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

```python
# Given values
sample_proportion = 0.65
sample_size = 500
confidence_level = 0.95
z_score = 1.96  # Z-Score for 95% confidence level

# Calculate the standard error
standard_error = (sample_proportion * (1 - sample_proportion) / sample_size) ** 0.5

# Calculate the margin of error
margin_of_error = z_score * standard_error

# Calculate the confidence interval
confidence_interval_lower = sample_proportion - margin_of_error
confidence_interval_upper = sample_proportion + margin_of_error

# Print the results
print("95% Confidence Interval:")
print("Lower Bound:", confidence_interval_lower)
print("Upper Bound:", confidence_interval_upper)
```

```
95% Confidence Interval:
Lower Bound: 0.608191771144905
Upper Bound: 0.6918082288550951
```
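The interval above uses the normal (Wald) approximation. As a cross-check, the sketch below verifies the usual rule of thumb (at least 10 successes and 10 failures) and computes a Wilson score interval, a different method often recommended for proportions; it is an alternative, not the method used in the original calculation.

```python
from math import sqrt

p_hat, n, z = 0.65, 500, 1.96

# Rule-of-thumb check for the normal (Wald) approximation
successes, failures = n * p_hat, n * (1 - p_hat)
print("Normal approximation reasonable:", successes >= 10 and failures >= 10)

# Wilson score interval
denom = 1 + z**2 / n
center = (p_hat + z**2 / (2 * n)) / denom
half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
wilson_lower, wilson_upper = center - half_width, center + half_width
print(f"Wilson 95% CI: ({wilson_lower:.4f}, {wilson_upper:.4f})")
```

Here the two methods agree to about two decimal places (roughly 0.607 to 0.691 for Wilson), as expected for a large sample with a proportion well away from 0 and 1.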


## Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

```python
import scipy.stats as stats

# Given values for sample A
sample_mean_a = 85
sample_stddev_a = 6
sample_size_a = 30  # Assumed sample size (not given in the question)

# Given values for sample B
sample_mean_b = 82
sample_stddev_b = 5
sample_size_b = 35  # Assumed sample size (not given in the question)

# Significance level
alpha = 0.01

# Degrees of freedom
degrees_of_freedom = sample_size_a + sample_size_b - 2

# Pooled standard deviation (assumes equal population variances)
pooled_variance = ((sample_size_a - 1) * sample_stddev_a**2
                   + (sample_size_b - 1) * sample_stddev_b**2) / degrees_of_freedom
pooled_stddev = pooled_variance ** 0.5

# Two-sample t-statistic
t_statistic = (sample_mean_a - sample_mean_b) / (pooled_stddev * (1/sample_size_a + 1/sample_size_b)**0.5)

# Two-sided critical t-value
critical_t_value = stats.t.ppf(1 - alpha/2, degrees_of_freedom)

# Perform the hypothesis test
if abs(t_statistic) > critical_t_value:
    print("Reject the null hypothesis: There is a significant difference in student performance between the two teaching methods.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference in student performance between the two teaching methods.")
```

```
Fail to reject the null hypothesis: There is no significant difference in student performance between the two teaching methods.
```

With the assumed sample sizes, the pooled t-statistic is about 2.20 with 63 degrees of freedom, below the two-sided critical value of about 2.66 at α = 0.01, so the observed 3-point difference is not significant at this level.
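As a cross-check, SciPy can run the two-sample t-test directly from summary statistics with `scipy.stats.ttest_ind_from_stats`. The sketch reuses the assumed sample sizes of 30 and 35 and reports both the pooled-variance test and Welch's test (`equal_var=False`), which drops the equal-variance assumption.

```python
from scipy import stats

# Summary statistics; the sample sizes 30 and 35 are assumptions (not given in the question).
pooled = stats.ttest_ind_from_stats(85, 6, 30, 82, 5, 35, equal_var=True)
welch = stats.ttest_ind_from_stats(85, 6, 30, 82, 5, 35, equal_var=False)

alpha = 0.01
for name, res in (("pooled", pooled), ("Welch", welch)):
    decision = "reject H0" if res.pvalue < alpha else "fail to reject H0"
    print(f"{name}: t = {res.statistic:.3f}, p = {res.pvalue:.4f} -> {decision}")
```

Under these assumptions both versions give p-values of roughly 0.03, below 0.05 but above the required 0.01, so neither rejects the null hypothesis at α = 0.01.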


## Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

```python
# Given values
sample_mean = 65
population_stddev = 8
sample_size = 50
confidence_level = 0.90
z_score = 1.645  # Z-Score for 90% confidence level

# Calculate the standard error
standard_error = population_stddev / (sample_size ** 0.5)

# Calculate the margin of error
margin_of_error = z_score * standard_error

# Calculate the confidence interval
confidence_interval_lower = sample_mean - margin_of_error
confidence_interval_upper = sample_mean + margin_of_error

# Print the results
print("90% Confidence Interval:")
print("Lower Bound:", confidence_interval_lower)
print("Upper Bound:", confidence_interval_upper)
```

```
90% Confidence Interval:
Lower Bound: 63.13889495191701
Upper Bound: 66.86110504808299
```
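The hard-coded 1.645 can be derived rather than looked up. The sketch below computes the critical value with `statistics.NormalDist` and also checks whether the population mean of 60 stated in the question falls inside the interval.

```python
from statistics import NormalDist

sample_mean, sigma, n = 65, 8, 50
population_mean = 60  # value stated in the question
confidence = 0.90

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
margin = z * sigma / n ** 0.5
lower, upper = sample_mean - margin, sample_mean + margin

print(f"z = {z:.4f}, 90% CI = ({lower:.3f}, {upper:.3f})")
print("Stated population mean of 60 inside interval:", lower <= population_mean <= upper)
```

It does not: the interval (about 63.14 to 66.86) lies entirely above 60, which, if σ = 8 is correct, suggests the sample is unlikely to have come from a population with mean 60.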


## Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

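```python
import scipy.stats as stats

# Given values
sample_mean = 0.25        # mean reaction time with caffeine (seconds)
hypothesized_mean = 0.25  # PLACEHOLDER baseline: the question gives no no-caffeine value
sample_stddev = 0.05
sample_size = 30
confidence_level = 0.90   # corresponds to a significance level of alpha = 0.10
degrees_of_freedom = sample_size - 1

# One-sample t-statistic (exactly 0 here because the baseline equals the sample mean)
t_statistic = (sample_mean - hypothesized_mean) / (sample_stddev / (sample_size ** 0.5))

# Two-sided critical t-value
critical_t_value = stats.t.ppf(1 - (1 - confidence_level) / 2, degrees_of_freedom)

# Perform the hypothesis test
if abs(t_statistic) > critical_t_value:
    print("Reject the null hypothesis: Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis: No significant effect of caffeine on reaction time was detected.")
```

```
Fail to reject the null hypothesis: No significant effect of caffeine on reaction time was detected.
```

Because the hypothesized mean is set equal to the sample mean, the t-statistic is 0 by construction and the test cannot reject; the conclusion reflects that placeholder rather than evidence about caffeine. A meaningful test needs a baseline reaction time without caffeine (for example, from a control group or published norms), which the question does not provide.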
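Because the result depends entirely on the baseline, it helps to wrap the test in a small function and try more than one value. In this sketch `one_sample_t_from_stats` is a hypothetical helper (not a SciPy function), and 0.27 s is a hypothetical no-caffeine baseline used only for illustration.

```python
from scipy import stats


def one_sample_t_from_stats(mean, stddev, n, mu0):
    """Two-sided one-sample t-test from summary statistics; returns (t, p)."""
    t = (mean - mu0) / (stddev / n ** 0.5)
    p = 2 * stats.t.sf(abs(t), n - 1)
    return t, p


alpha = 0.10  # significance level matching the 90% confidence level
for mu0 in (0.25, 0.27):  # 0.27 s is a HYPOTHETICAL no-caffeine baseline
    t, p = one_sample_t_from_stats(0.25, 0.05, 30, mu0)
    print(f"mu0 = {mu0}: t = {t:.3f}, p = {p:.4f}, reject H0: {p < alpha}")
```

With the placeholder baseline the t-statistic is 0 and p = 1; with the hypothetical 0.27 s baseline, t ≈ −2.19 and p falls below 0.10, so that assumed comparison would be significant.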
