### Question 1

The t-test and the z-test are both used for hypothesis testing, but they apply under different conditions:

T-test:
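1. A t-test is used when the population standard deviation is unknown and has to be estimated from the sample. This matters most when the sample size is small (typically less than 30).
2. The t-test assumes that the population is approximately normally distributed. The t-distribution has heavier tails than the normal distribution, which accounts for the extra uncertainty introduced by estimating the standard deviation.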

Example Scenario: If you want to compare the average test scores of two groups of students (where the sample sizes are small), you would use a t-test.

Z-test:
1. A z-test is used when the population standard deviation is known, or when the sample size is large (typically greater than 30) so that the sample standard deviation is a reliable estimate of it.
2. The z-test assumes that the sample mean is normally distributed. This holds when the population itself is normal or, by the central limit theorem, when the sample is large.

Example Scenario: If you want to compare the average height of a sample of 100 adults to the known average height of the general population, and the population standard deviation is also known, you would use a z-test.
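The practical difference between the two tests shows up in their critical values. The sketch below compares the two-sided 95% critical value of the standard normal distribution with that of the t-distribution for a few sample sizes. The sample sizes are chosen only for illustration.

```python
from scipy import stats

confidence_level = 0.95
upper_quantile = (1 + confidence_level) / 2

# Two-sided critical value from the standard normal distribution
z_crit = stats.norm.ppf(upper_quantile)

# Two-sided critical values from the t-distribution for several sample sizes
# (illustrative sizes, not taken from any particular data set)
t_crits = {n: stats.t.ppf(upper_quantile, df=n - 1) for n in (5, 10, 30, 100, 1000)}

print(f"z critical value: {z_crit:.4f}")
for n, t_crit in t_crits.items():
    print(f"n = {n:>4}: t critical value = {t_crit:.4f}")
```

The t critical value is always larger than the z critical value and approaches it as the sample size grows, which is why the two tests give nearly the same answer for large samples.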

### Question 2

One-tailed test:
1. In a one-tailed test, the hypothesis is directional, and we are only interested in whether the sample mean is significantly greater than or less than the hypothesized value (one direction).
2. The critical region is only on one side of the distribution (either the left or right tail).

Example Scenario: Testing whether a new drug significantly increases the average response time (one-tailed in the positive direction).


Two-tailed test:
1. In a two-tailed test, the hypothesis is non-directional, and we are interested in whether the sample mean is significantly different from the hypothesized value (both directions).
2. The critical region is divided between the two tails of the distribution.

Example Scenario: Testing whether a new treatment significantly affects the average weight (two-tailed, as we want to see if it increases or decreases weight).
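To make the difference concrete, the sketch below computes the critical values and p-values for both kinds of test at a 5% significance level. The observed test statistic of 1.8 is a hypothetical value chosen for illustration.

```python
from scipy import stats

alpha = 0.05
z_observed = 1.8  # hypothetical test statistic, chosen for illustration

# One-tailed (right tail): all of alpha sits in one tail
one_tailed_crit = stats.norm.ppf(1 - alpha)

# Two-tailed: alpha is split evenly between the two tails
two_tailed_crit = stats.norm.ppf(1 - alpha / 2)

# Corresponding p-values for the observed statistic
p_one_tailed = stats.norm.sf(z_observed)
p_two_tailed = 2 * stats.norm.sf(abs(z_observed))

print(f"One-tailed: critical z = {one_tailed_crit:.4f}, p = {p_one_tailed:.4f}")
print(f"Two-tailed: critical z = {two_tailed_crit:.4f}, p = {p_two_tailed:.4f}")
```

With this statistic the one-tailed test rejects the null hypothesis while the two-tailed test does not, so the direction of the hypothesis has to be fixed before looking at the data.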

### Question 3

Type 1 Error (False Positive):
1. A Type 1 error occurs when we reject the null hypothesis when it is actually true.
2. It means claiming an effect or relationship exists when it doesn't. The probability of making this error is the significance level, alpha.

Example Scenario: A medical test incorrectly diagnoses a healthy person as having a disease (false positive).

Type 2 Error (False Negative):
1. A Type 2 error occurs when we fail to reject the null hypothesis when it is actually false.
2. It means not detecting an effect or relationship that does exist. The probability of making this error is called beta, and 1 - beta is the power of the test.

Example Scenario: A medical test fails to diagnose a person with a disease when they actually have it (false negative).
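The two error rates can be computed for a concrete test. The sketch below uses a hypothetical one-sided z-test: the null mean, the assumed true mean, the standard deviation, and the sample size are all made-up values for illustration.

```python
from scipy import stats

# Hypothetical one-sided z-test: H0: mu = 100 versus H1: mu > 100
mu_null = 100
mu_true = 103   # assumed true mean, used only to evaluate the Type 2 error
sigma = 10      # assumed known population standard deviation
n = 50
alpha = 0.05    # Type 1 error rate, fixed in advance by the analyst

standard_error = sigma / n ** 0.5

# Reject H0 when the sample mean exceeds this cutoff
cutoff = mu_null + stats.norm.ppf(1 - alpha) * standard_error

# Type 2 error rate: probability that the sample mean stays below the cutoff
# even though the true mean is mu_true
beta = stats.norm.cdf(cutoff, loc=mu_true, scale=standard_error)
power = 1 - beta

print(f"Rejection cutoff for the sample mean: {cutoff:.3f}")
print(f"Type 1 error rate (alpha): {alpha:.3f}")
print(f"Type 2 error rate (beta):  {beta:.3f}")
print(f"Power (1 - beta):          {power:.3f}")
```

Lowering alpha moves the cutoff further from the null mean, which raises beta. For a fixed sample size the two error rates trade off against each other.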

### Question 4

Bayes' Theorem is a mathematical formula for updating the probability of an event in light of new evidence. It is expressed in terms of conditional probabilities:

P(A|B) = (P(B|A) * P(A)) / P(B)

Where:
P(A|B) is the probability of event A occurring given that event B has occurred.
P(B|A) is the probability of event B occurring given that event A has occurred.
P(A) and P(B) are the marginal probabilities of events A and B, each considered without regard to the other. P(A) is also called the prior probability of A.

Example:
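Suppose there is a medical test for a disease that detects 95% of true cases (P(Positive Test|Disease) = 0.95) and has a 2% false positive rate (P(Positive Test|No Disease) = 0.02). The prevalence of the disease in the population is 0.01% (P(Disease) = 0.0001). Applying the formula gives P(Disease|Positive Test) = (0.95 * 0.0001) / ((0.95 * 0.0001) + (0.02 * 0.9999)) ≈ 0.0047, so even after a positive result the probability of having the disease is below 0.5%, because the disease is so rare.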
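The numbers in this example can be plugged into Bayes' Theorem directly. The sketch below does so with a small helper function. The function name `posterior_probability` and its parameter names are illustrative choices, not part of any library.

```python
def posterior_probability(prior, true_positive_rate, false_positive_rate):
    """Return P(Disease | Positive Test) using Bayes' Theorem."""
    # Total probability of a positive test, over diseased and healthy people
    p_positive = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / p_positive


p_disease = 0.0001              # prevalence of 0.01%
p_positive_given_disease = 0.95
p_positive_given_healthy = 0.02

posterior = posterior_probability(
    p_disease, p_positive_given_disease, p_positive_given_healthy
)

print(f"P(Disease | Positive Test) = {posterior:.4f}")
```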

### Question 5

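A confidence interval is a range of values, computed from sample data, that is likely to contain the true value of a population parameter at a stated confidence level. It is used in inferential statistics to quantify the uncertainty in a sample estimate. A 95% confidence level means that about 95% of the intervals constructed this way from repeated samples would contain the true parameter.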

Example:
Suppose you want to estimate the average height of students in a school. You take a random sample of 100 students and measure their heights. The sample mean height is 165 cm, and the sample standard deviation is 5 cm.

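To calculate a 95% confidence interval for the population mean height, you can use the t-distribution, since the population standard deviation is unknown. With a sample of 100 the result is nearly identical to using the normal distribution. The standard error is 5 / sqrt(100) = 0.5 cm and the critical t-value for 99 degrees of freedom is about 1.984, so the interval is 165 ± 1.984 * 0.5, or approximately (164.01, 165.99) cm.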
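The interval for this example can be computed with `scipy.stats.t.interval`. This is a minimal sketch: the confidence level and degrees of freedom are passed positionally because the name of the first parameter differs between SciPy versions.

```python
from scipy import stats

sample_mean = 165      # cm
sample_std_dev = 5     # cm
sample_size = 100
confidence_level = 0.95

# Standard error of the mean
standard_error = sample_std_dev / sample_size ** 0.5

# t-based interval centred on the sample mean with n - 1 degrees of freedom
lower_bound, upper_bound = stats.t.interval(
    confidence_level, sample_size - 1, loc=sample_mean, scale=standard_error
)

print(f"95% Confidence Interval: ({lower_bound:.2f}, {upper_bound:.2f})")
```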

### Question 6

Bayes' Theorem is used to update the probability of an event based on new evidence. Let's consider a simple example:

Suppose there is a disease that affects 1% of the population (P(Disease) = 0.01). A medical test for the disease detects 95% of true cases (P(Positive Test|Disease) = 0.95) and has a 2% false positive rate (P(Positive Test|No Disease) = 0.02).

Now, let's calculate the probability of having the disease given a positive test result (P(Disease|Positive Test)) using Bayes' Theorem:

P(Disease|Positive Test) = (P(Positive Test|Disease) * P(Disease)) / P(Positive Test)

P(Disease|Positive Test) = (0.95 * 0.01) / ((0.95 * 0.01) + (0.02 * (1 - 0.01)))

P(Disease|Positive Test) = 0.0095 / 0.0293 ≈ 0.32

Interpretation: If a person receives a positive test result, there is about a 32% chance that they actually have the disease. The probability is this low because healthy people far outnumber sick people, so false positives outnumber true positives.
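The same result can be checked by counting people instead of multiplying probabilities. The sketch below assumes a hypothetical population of 10,000. The population size is arbitrary and does not affect the ratio.

```python
population = 10_000             # hypothetical population size
prevalence = 0.01
true_positive_rate = 0.95
false_positive_rate = 0.02

diseased = population * prevalence
healthy = population - diseased

# Expected number of positive tests in each group
true_positives = diseased * true_positive_rate
false_positives = healthy * false_positive_rate

# Share of all positive tests that come from people with the disease
posterior = true_positives / (true_positives + false_positives)

print(f"True positives:  {true_positives:.0f}")
print(f"False positives: {false_positives:.0f}")
print(f"P(Disease | Positive Test) = {posterior:.2f}")
```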

In [4]:
# Question 7

import scipy.stats as stats

sample_mean = 50
sample_std_dev = 5
sample_size = 30
confidence_level = 0.95

# Calculate standard error of the mean
SE = sample_std_dev / (sample_size ** 0.5)

# Calculate critical value for 95% confidence level and 29 degrees of freedom
critical_value = stats.t.ppf((1 + confidence_level) / 2, df=sample_size - 1)

# Calculate margin of error
ME = critical_value * SE

# Calculate the confidence interval
lower_bound = sample_mean - ME
upper_bound = sample_mean + ME

print(f"95% Confidence Interval: ({lower_bound:.2f}, {upper_bound:.2f})")


95% Confidence Interval: (48.13, 51.87)


In [6]:
# Question 8
"""
The margin of error (ME) is half the width of a confidence interval: the maximum
expected difference between the sample estimate and the true population parameter
at the chosen confidence level.
It shrinks as the sample size grows, in proportion to 1 / sqrt(n), so the sample
of 2000 below has half the margin of error of the sample of 500.
"""

import scipy.stats as stats

sample_proportion_1 = 0.55
sample_size_1 = 2000
confidence_level = 0.95

# Calculate standard error of the proportion for sample 1
SE_1 = (sample_proportion_1 * (1 - sample_proportion_1) / sample_size_1) ** 0.5

# Calculate critical value for 95% confidence level
critical_value = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate margin of error for sample 1
ME_1 = critical_value * SE_1

sample_proportion_2 = 0.55
sample_size_2 = 500

# Calculate standard error of the proportion for sample 2
SE_2 = (sample_proportion_2 * (1 - sample_proportion_2) / sample_size_2) ** 0.5

# Calculate margin of error for sample 2
ME_2 = critical_value * SE_2

print(f"Margin of Error for Sample 1: {ME_1:.4f}")
print(f"Margin of Error for Sample 2: {ME_2:.4f}")


Margin of Error for Sample 1: 0.0218
Margin of Error for Sample 2: 0.0436


In [7]:
# Question 9

x = 75
population_mean = 70
population_std_dev = 5

z_score = (x - population_mean) / population_std_dev
print(f"The z-score for the data point is: {z_score:.2f}")


The z-score for the data point is: 1.00
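A z-score can be turned into a percentile. The sketch below assumes the population is normally distributed, which the calculation above does not state.

```python
from scipy import stats

z_score = 1.00  # value computed above: (75 - 70) / 5

# Fraction of a normal population that falls below this z-score
# (assumes the population is normally distributed)
share_below = stats.norm.cdf(z_score)

print(f"Share of the population below the data point: {share_below:.4f}")
```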


In [13]:
# Question 10

import scipy.stats as stats

sample_mean = 6
sample_std_dev = 2.5
sample_size = 50
confidence_level = 0.95

# Calculate the standard error of the mean
SE = sample_std_dev / (sample_size ** 0.5)

# Calculate the t-score
t_score = (sample_mean - 0) / SE

# Find the critical t-value
critical_t = stats.t.ppf((1 + confidence_level) / 2, df=sample_size - 1)

# Make a decision
if abs(t_score) > critical_t:
    print("The weight loss drug is significantly effective at a 95% confidence level.")
else:
    print("There is no significant evidence that the weight loss drug is effective at a 95% confidence level.")

The weight loss drug is significantly effective at a 95% confidence level.


In [14]:
# Question 11

import scipy.stats as stats

sample_size = 500
proportion_satisfied = 0.65
confidence_level = 0.95

# Calculate the standard error of the proportion
SE = (proportion_satisfied * (1 - proportion_satisfied) / sample_size) ** 0.5

# Find the critical z-value
critical_z = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the confidence interval
lower_bound = proportion_satisfied - critical_z * SE
upper_bound = proportion_satisfied + critical_z * SE

print(f"The 95% confidence interval for the proportion of people satisfied with their job is "
      f"({lower_bound:.4f}, {upper_bound:.4f})")


The 95% confidence interval for the proportion of people satisfied with their job is (0.6082, 0.6918)


In [15]:
# Question 12

import scipy.stats as stats

sample_mean_A = 85
sample_std_dev_A = 6
sample_size_A = 30

sample_mean_B = 82
sample_std_dev_B = 5
sample_size_B = 30

confidence_level = 0.99

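# Calculate pooled standard deviation, weighting each variance by its degrees of freedom
# (reduces to the simple average of the variances when the sample sizes are equal)
pooled_std_dev = (((sample_size_A - 1) * sample_std_dev_A ** 2
                   + (sample_size_B - 1) * sample_std_dev_B ** 2)
                  / (sample_size_A + sample_size_B - 2)) ** 0.5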

# Calculate the standard error of the difference in means
SE_diff = pooled_std_dev * ((1 / sample_size_A) + (1 / sample_size_B)) ** 0.5

# Calculate the t-score
t_score = ((sample_mean_A - sample_mean_B) - 0) / SE_diff

# Find the critical t-value
critical_t = stats.t.ppf((1 + confidence_level) / 2, df=sample_size_A + sample_size_B - 2)

# Make a decision
if abs(t_score) > critical_t:
    print("There is a significant difference in student performance between the two teaching methods "
          "at a 99% confidence level.")
else:
    print("There is no significant difference in student performance between the two teaching methods "
          "at a 99% confidence level.")


There is no significant difference in student performance between the two teaching methods at a 99% confidence level.
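The manual calculation can be cross-checked with `scipy.stats.ttest_ind_from_stats`, which runs a two-sample t-test from summary statistics. This is a sketch that reuses the numbers from the cell above and assumes equal population variances, as the pooled calculation does.

```python
from scipy import stats

# Summary statistics from the cell above
result = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=30,
    mean2=82, std2=5, nobs2=30,
    equal_var=True,  # pooled-variance t-test, matching the manual calculation
)

alpha = 1 - 0.99  # significance level matching the 99% confidence level

print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```

The p-value is above 0.01, which agrees with the decision reached by comparing the t-score to the critical value.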


In [16]:
# Question 13

import scipy.stats as stats

population_mean = 60  # given in the question; not used in the interval calculation
population_std_dev = 8
sample_size = 50
sample_mean = 65
confidence_level = 0.90

# Calculate the standard error of the mean
SE = population_std_dev / (sample_size ** 0.5)

# Find the critical z-value
critical_z = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the confidence interval
lower_bound = sample_mean - critical_z * SE
upper_bound = sample_mean + critical_z * SE

print(f"The 90% confidence interval for the true population mean is ({lower_bound:.2f}, {upper_bound:.2f})")


The 90% confidence interval for the true population mean is (63.14, 66.86)


In [17]:
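# Question 14

import scipy.stats as stats

sample_mean = 0.25
sample_std_dev = 0.05
sample_size = 30
confidence_level = 0.90

# Null-hypothesis mean. NOTE: this equals the sample mean, so the t-score below
# is 0 by construction. Replace it with the baseline reaction time the test is
# meant to compare against.
hypothesized_mean = 0.25

# Calculate the standard error of the mean
SE = sample_std_dev / (sample_size ** 0.5)

# Calculate the t-score
t_score = (sample_mean - hypothesized_mean) / SE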

# Find the critical t-value
critical_t = stats.t.ppf((1 + confidence_level) / 2, df=sample_size - 1)

# Make a decision
if abs(t_score) > critical_t:
    print("Caffeine has a significant effect on reaction time at a 90% confidence level.")
else:
    print("There is no significant evidence that caffeine has an effect on reaction time at a 90% confidence level.")


There is no significant evidence that caffeine has an effect on reaction time at a 90% confidence level.
