# Hypothesis Testing

### 1.	A drug company is testing a new medication that is designed to lower cholesterol levels in patients. The company wants to be sure that the medication is effective, but also wants to avoid false claims. What is the risk of making a type 1 error in this scenario, and how could the company minimize this risk?

Ans) A Type 1 error, also called a false positive, occurs when the null hypothesis is rejected even though it is true. Here the null hypothesis is that the medication has no effect on cholesterol, so a Type 1 error would mean claiming that the drug lowers cholesterol levels when it does not.

The drug company can use a number of tactics to reduce the risk of Type 1 errors, including:

a.	Choosing an appropriate sample size: A power analysis can be used to choose a sample size that gives the study enough statistical power to detect real differences in cholesterol levels. A larger sample does not lower the Type 1 error rate, which is fixed by the significance level. It lowers the Type 2 error rate and makes the estimated effect more precise, so a significant result is more likely to reflect a real effect.

b.	Strict significance level: Instead of using the standard significance level (alpha) of 0.05, the company may choose to use a strict significance level (alpha) for their hypothesis test, such as 0.01 or 0.005. The likelihood of rejecting the null hypothesis when it is true is decreased as a result.

c.	Using a placebo control group: The company can include a control group that receives a placebo rather than the active medication. Comparing outcomes in the medication group with those in the placebo group separates the medication's actual effect from any placebo effect.

d.	Conduct a well-designed study: A well-designed study can reduce potential biases and increase the reliability of the results by adhering to strict protocols, such as proper randomization, blinding, and control groups.

e.	Seek independent validation: The company can work with outside experts or regulatory bodies to carry out an impartial review of the study's findings. This reduces bias and gives greater confidence in the accuracy and reliability of the findings.

f.	Repeat the study: Carrying out several independent studies on various populations can offer more solid proof of the drug's efficacy and support the preliminary findings.
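
The effect of a stricter significance level can be checked with a small simulation. The sketch below is not part of the original analysis: it assumes a one-sided Z-test with a known standard deviation, draws samples for which the null hypothesis is true, and counts how often the test wrongly rejects. The function names, the sample size of 30, and the 2,000 trials are illustrative choices, and `statistics.NormalDist` from the standard library stands in for a statistics package.

```python
import random
from statistics import NormalDist, fmean

def simulate_z_statistics(n_trials, n, mu=0.0, sigma=1.0, seed=0):
    """Z statistics of sample means drawn while the null hypothesis is true."""
    rng = random.Random(seed)
    z_values = []
    for _ in range(n_trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z_values.append((fmean(sample) - mu) / (sigma / n ** 0.5))
    return z_values

def false_positive_rate(z_values, alpha):
    """Share of null-hypothesis samples that a one-sided test rejects at level alpha."""
    critical = NormalDist().inv_cdf(1 - alpha)
    return sum(z > critical for z in z_values) / len(z_values)

z_values = simulate_z_statistics(n_trials=2000, n=30)
rate_05 = false_positive_rate(z_values, alpha=0.05)
rate_01 = false_positive_rate(z_values, alpha=0.01)
print(rate_05, rate_01)
```

Under these assumptions both rates should land near their nominal alpha, which is the sense in which lowering alpha lowers the Type 1 risk.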

### 2.	A court is trying to determine whether a defendant is guilty or innocent of a crime. What is the risk of making a type 2 error in this scenario, and how could the court minimize this risk?

Ans) In a criminal trial the null hypothesis is that the defendant is innocent. A Type 2 error, also known as a false negative, occurs when the court fails to reject this null hypothesis even though it is false, that is, when a guilty defendant is acquitted. The court can reduce this risk by improving the quality and quantity of the evidence it sees: conducting a thorough investigation, gathering and evaluating all pertinent evidence, allowing expert testimony when necessary, and running a fair and impartial trial procedure. Safeguards such as giving the defense enough resources and placing the burden of proof on the prosecution ("innocent until proven guilty") mainly guard against the opposite, Type 1 error of convicting an innocent defendant, so the court has to balance the two risks rather than drive either one to zero.

### 3.	A marketing company is testing the effectiveness of a new advertising campaign. The company wants to be sure that the campaign is effective, but also wants to avoid spending money on an ineffective campaign. What is the risk of making a type 1 error in this scenario, and how could the company minimize this risk?

Ans) A Type 1 error, or false positive, would occur if the company concludes that the campaign is effective when it is not. The danger is that the company keeps spending time and money on an ineffective campaign. The company can reduce this risk in several ways: doing thorough market research and audience analysis before launching the campaign, setting clear, measurable goals for success, putting appropriate tracking and measurement mechanisms in place to assess the campaign's impact, running A/B tests or control group comparisons, and gathering feedback from the target audience. With these measures the company obtains more precise data and can make better-informed decisions about the campaign.
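
A control-group comparison of this kind is often analysed with a two-proportion Z-test. The following is a minimal sketch under stated assumptions: the conversion counts are made-up numbers for illustration only, the helper name `two_proportion_z_test` is hypothetical, and the test is one-sided (the campaign group converts better than the control group).

```python
from statistics import NormalDist

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """One-sided test of H0: p_b <= p_a against H1: p_b > p_a."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under H0 both groups share one proportion, so pool them for the standard error
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical data: 200 of 5000 control visitors converted, 260 of 5000 campaign visitors
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
alpha = 0.01  # stricter than 0.05 to limit the Type 1 risk
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The campaign is declared effective only if the p-value falls below the chosen alpha, so alpha is the Type 1 error rate the company accepts.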

### 4.	A researcher is testing a hypothesis that a new drug can cure a particular disease. The researcher wants to be sure that the drug is effective, but also wants to avoid false claims. What is the risk of making a type 1 error in this scenario, and how could the researcher minimize this risk?

Ans) A Type 1 error, also known as a false positive, occurs when the researcher concludes that the drug cures the disease when it does not. The risk is that false claims create unjustified enthusiasm or lead to outcomes that harm patients. The researcher can reduce this risk by running rigorous, well-planned clinical trials with suitable sample sizes, including placebo control groups, fixing the significance level and success criteria before the data are analysed and reporting confidence intervals, using blinding and randomization wherever possible, seeking independent validation of the results, and publishing the findings in reputable peer-reviewed journals. These practices increase the study's validity and reliability and make exaggerated claims about the drug's efficacy less likely.

### 5.	A teacher is trying to determine whether a new teaching method is effective in improving student performance. The teacher wants to be sure that the method is effective, but also wants to avoid false claims. What is the risk of making a type 1 error in this scenario, and how could the teacher minimize this risk?

Ans) A Type 1 error, also known as a false positive, would occur if the teacher concludes that the new teaching method improves student performance when it does not. The risk is that exaggerated claims encourage the adoption of an ineffective method and possibly harm students' academic progress. The teacher can reduce this risk by running controlled experiments or comparative studies with control groups, using reliable and valid assessments to measure student performance, making sure the sample is large enough for adequate statistical power, considering longitudinal studies to examine the method's long-term effects, gathering input from students and colleagues, and sharing the findings for peer review. With these measures the teacher can gather trustworthy data, assess the method's efficacy, and reduce the chance of overstating the results of an ineffective method.
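
How large a sample is "big enough" can be estimated with the usual sample-size formula for a one-sided Z-test, n = ((z_(1-α) + z_(power)) · σ / δ)², where δ is the smallest improvement worth detecting. The sketch below is illustrative only: the 5-point improvement, the standard deviation of 10, and the 80% power target are assumed values, and `required_sample_size` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def required_sample_size(effect, sigma, alpha=0.05, power=0.80):
    """Smallest n for a one-sided Z-test to detect a mean shift of `effect`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sigma / effect) ** 2)

# Assumed values: detect a 5-point gain when scores have a standard deviation of 10
n_standard = required_sample_size(effect=5, sigma=10, alpha=0.05)
n_strict = required_sample_size(effect=5, sigma=10, alpha=0.01)
print(n_standard, n_strict)
```

A stricter alpha calls for more students to keep the same power, which is the trade-off between Type 1 and Type 2 errors.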

### 6.	The average height of students in a school is 68 inches with a standard deviation of 4 inches. What is the z score of a student who is 72 inches tall?

Ans) We can use the following formula to find the z-score of a student who is 72 inches tall:

z = (x - μ) / σ

where:
- x is the value we want to convert to a z-score (72 inches in this case)
- μ is the population mean (the students' average height, 68 inches)
- σ is the population standard deviation (4 inches)

When the values are plugged in, we get:

z = (72 - 68) / 4

Simplifying:

z = 4 / 4 = 1

Therefore, a student who is 72 inches tall has a z-score of 1.

The z-score measures how many standard deviations a value lies from the mean. In this instance, the student's height of 72 inches is one standard deviation above the school's average height.

In [1]:
mean_height = 68
stdev_height = 4
student_height = 72
Z_height = (student_height-mean_height)/stdev_height
Z_height

1.0
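
A z-score can also be read as a percentile when the data are roughly normal. The problem does not say that heights are normally distributed, so the snippet below treats that as an assumption. It uses the standard library's `statistics.NormalDist` to turn the z-score of 1 into the share of students expected to be shorter than 72 inches.

```python
from statistics import NormalDist

z_height = (72 - 68) / 4
# Share of students expected to be shorter, if heights are normally distributed
share_shorter = NormalDist().cdf(z_height)
print(f"{share_shorter:.1%}")
```

Under that assumption a z-score of 1 sits at roughly the 84th percentile.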

### 7.	A company wants to compare the salaries of its employees with those of other companies in the same industry. The mean salary in the industry is 60,000 USD per year with a standard deviation of 5,000 USD. If the company's mean salary is 65,000 USD per year, what is the z score for the company?

Ans) To calculate the z-score for the company's mean salary, we can use the formula:

z = (x - μ) / σ

where:
- x is the value we want to convert to a z-score 
(the company's mean salary, 65,000 USD per year in this case)
- μ is the population mean 
(mean salary in the industry, 60,000 USD per year)
- σ is the population standard deviation 
(5,000 USD per year)

Plugging in the values, we have:

z = (65,000 - 60,000) / 5,000

Simplifying the equation:

z = 5,000 / 5,000

Therefore, the z-score for the company's mean salary is 1.

The z-score measures how many standard deviations an individual value or group's value is from the mean. In this case, the company's mean salary of 65,000 USD per year is 1 standard deviation above the mean salary in the industry.

In [2]:
mean_salary = 60000
stdev_salary = 5000
company_salary = 65000
Z_company = (company_salary - mean_salary)/stdev_salary
Z_company

1.0

### 8.	A university has a policy that any student who scores below the 25th percentile on an admission test cannot be admitted. If the test scores are normally distributed with a mean of 75 and a standard deviation of 10, what is the minimum score required for admission?

Ans) To find the minimum score required for admission, we need the value at the 25th percentile of the normal distribution.

A score x and its z-score are related by the z-score formula:

z = (x − μ) / σ

where:
x is the value we want to find,
μ is the mean of the distribution (75 in this case), and
σ is the standard deviation of the distribution (10 in this case).

From standard normal distribution tables or a calculator, the z-score corresponding to the 25th percentile is approximately −0.674. Substituting:

−0.674 = (x − 75) / 10

Solving for x:

−6.74 = x − 75
∴ x ≈ 68.26

Therefore, the minimum score required for admission is approximately 68.26.

In [3]:
from scipy import stats
mean_score = 75
stdev_score = 10
Z_25th_percentile = stats.norm.ppf(0.25)
min_score = Z_25th_percentile * stdev_score + mean_score
round(min_score, 2)

68.26
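
The same cutoff can be cross-checked without a z-table. This sketch uses `statistics.NormalDist` from the Python standard library as a stand-in for `scipy.stats.norm`, and the variable names are illustrative. It asks the distribution for its 25th percentile directly, then converts the result back into a probability as a round-trip check.

```python
from statistics import NormalDist

scores = NormalDist(mu=75, sigma=10)
cutoff = scores.inv_cdf(0.25)      # score at the 25th percentile
share_below = scores.cdf(cutoff)   # round trip back to a probability
print(round(cutoff, 2), round(share_below, 2))
```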

### 9.	A company produces light bulbs with a mean life of 1000 hours and a standard deviation of 100 hours. If the company wants to provide a warranty for bulbs that last in the top 10% of the distribution, what is the minimum life required for the warranty?

Ans) To find the minimum life required for the warranty, we need the value at the 90th percentile of the distribution, assuming bulb life is normally distributed.

A bulb life x and its z-score are related by the z-score formula:

z = (x − μ) / σ

where:
x is the value we want to find,
μ is the mean of the distribution (1000 hours in this case), and
σ is the standard deviation of the distribution (100 hours in this case).

From standard normal distribution tables or a calculator, the z-score corresponding to the 90th percentile is approximately 1.28. Substituting:

1.28 = (x − 1000) / 100

Solving for x:

1.28 × 100 = x − 1000
∴ x ≈ 1128

Therefore, the minimum life required for the warranty is 1128 hours.

In [4]:
mean_lightbulb_life_in_hrs = 1000
stdev_lightbulb_life_in_hrs = 100
#top 10% of distribution corresponds to Z score of cumulative prob. of 90%
Z_90th_percentile = stats.norm.ppf(0.9)
min_lightbulb_life_for_warranty = Z_90th_percentile * stdev_lightbulb_life_in_hrs + mean_lightbulb_life_in_hrs
round(min_lightbulb_life_for_warranty)

1128

### 10.	A teacher wants to know how well her students did on a test compared to the average performance of students across the country. The average score on the test across the country is 75 with a standard deviation of 10. If the teacher's class has an average score of 80, what is the z score for the class?

Ans) To calculate the z-score for the teacher's class, we can use the formula:
z = (x − μ) / σ
where:
x is the value we want to calculate the z-score for (80 in this case),
μ is the mean of the distribution (75, the average score across the country), and
σ is the standard deviation of the distribution (10, the standard deviation across the country).
z = (80 − 75) / 10 = 0.5
Therefore, the z-score for the teacher's class with an average score of 80 is 0.5.

In [5]:
avg_score = 75
stdev_avg_score = 10
class_score = 80
Z_class_score = (class_score - avg_score) / stdev_avg_score

In [6]:
Z_class_score

0.5
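
The calculation above treats the class average like a single score. If the class size were known, the class average could instead be compared against the standard error of the mean, σ / √n. The problem gives no class size, so the value of 25 students below is purely hypothetical and serves only to show how the z-score would change.

```python
import math

national_mean = 75
national_stdev = 10
class_mean = 80
class_size = 25  # hypothetical: the problem does not give the class size

# Standard error of a class average drawn from the national distribution
standard_error = national_stdev / math.sqrt(class_size)
z_class_mean = (class_mean - national_mean) / standard_error
print(z_class_mean)
```

With this assumed class size the class average would lie 2.5 standard errors above the national mean, a much stronger signal than the z-score of 0.5 for an individual score.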

# More Hypothesis Testing

### Problem Statement 1: Blood glucose levels for obese patients have a mean of 100 with a standard deviation of 15. A researcher thinks that a diet high in raw cornstarch will have a positive effect on blood glucose levels. A sample of 36 patients who have tried the raw cornstarch diet have a mean glucose level of 108. Test the hypothesis that the raw cornstarch had an effect or not.

In [7]:
# One-tailed (upper-tail) Z-test at alpha = 0.05
# H0: mean glucose on the diet is 100; H1: mean glucose on the diet is greater than 100
Z_95th_percentile = stats.norm.ppf(0.95)  # critical value for alpha = 0.05, about 1.645
mean_glucose = 100
stdev_mean_glucose = 15  # population standard deviation
mean_glucose_with_cornstarch = 108
sample_size = 36
# Z statistic of the sample mean, using the standard error stdev / sqrt(n)
Z_mean_glucose_with_cornstarch = (mean_glucose_with_cornstarch - mean_glucose) / (stdev_mean_glucose / sample_size**0.5)
print(Z_mean_glucose_with_cornstarch > Z_95th_percentile)
# The Z statistic (3.2) exceeds the critical value, so we reject the null hypothesis at alpha = 0.05:
# the sample gives evidence that the raw cornstarch diet raised the patients' mean glucose level

True
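
The same test can be expressed as a p-value instead of a comparison with a critical value. The sketch below recomputes the Z statistic with the standard library's `statistics.NormalDist` and reports both a one-tailed and a two-tailed p-value. The two-tailed version is an added illustration for the case where the question is read as "any effect" rather than "an increase", and the variable names are not from the original notebook.

```python
from statistics import NormalDist

population_mean = 100
population_stdev = 15
sample_mean = 108
n = 36

z = (sample_mean - population_mean) / (population_stdev / n ** 0.5)
p_one_tailed = 1 - NormalDist().cdf(z)             # H1: mean is greater than 100
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))  # H1: mean differs from 100
print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.5f}, two-tailed p = {p_two_tailed:.5f}")
```

Both p-values fall well below 0.05, so the conclusion does not depend on which alternative hypothesis is chosen.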


### Problem Statement 2: In one state, 52% of the voters are Republicans, and 48% are Democrats. In a second state, 47% of the voters are Republicans, and 53% are Democrats. Suppose a simple random sample of 100 voters are surveyed from each state. What is the probability that the survey will show a greater percentage of Republican voters in the second state than in the first state?

In [8]:
# Find the probability that the sample proportion of Republicans in the second state (population 47%)
# exceeds that in the first state (population 52%), using the normal approximation
# to the difference between two independent sample proportions
# The approximation needs n*π >= 5 and n*(1-π) >= 5 in each state
n1 = 100
state1_rep = 0.52
n2 = 100
state2_rep = 0.47
print(n1 * state1_rep >= 5 and n1 * (1 - state1_rep) >= 5,
      n2 * state2_rep >= 5 and n2 * (1 - state2_rep) >= 5)
# both conditions are satisfied, so the normal approximation can be used below

True True


In [9]:
pooled_estimate = (state1_rep*n1 + state2_rep*n2) / (n1 + n2)
pooled_estimate

0.495

In [10]:
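# pooled standard error; with both population proportions known, the unpooled form
# sqrt(π1*(1-π1)/n1 + π2*(1-π2)/n2) is the more direct choice and gives nearly the same result
z_states_rep = (state2_rep - state1_rep) / (pooled_estimate*(1-pooled_estimate)*((1/n1) + (1/n2)))**(0.5)
z_states_rep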

-0.7071421391774789

In [11]:
prob = stats.norm.cdf(z_states_rep)
print(f'{round(prob * 100, 2)}%')

23.97%
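
Because both population proportions are given, the sampling distribution of the difference can also be built without pooling. This sketch is an alternative to the cells above rather than the notebook's own method. It models the difference in sample proportions as normal with mean π1 − π2 and variance π1(1 − π1)/n1 + π2(1 − π2)/n2, and uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

n1, p1 = 100, 0.52   # first state
n2, p2 = 100, 0.47   # second state

mean_diff = p1 - p2
se_diff = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
# The survey shows more Republicans in the second state when the difference falls below zero
prob_unpooled = NormalDist(mu=mean_diff, sigma=se_diff).cdf(0)
print(f"{prob_unpooled:.1%}")
```

The result should be close to the pooled figure, about 24%, since the two proportions are both near 0.5.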


### Problem Statement 3: You take the SAT and score 1100. The mean score for the SAT is 1026 and the standard deviation is 209. How well did you score on the test compared to the average test taker?

In [12]:
your_score = 1100
mean_score = 1026
stdev_mean_score = 209
Z_your_score = (your_score - mean_score) / stdev_mean_score
percentile_your_score = stats.norm.cdf(Z_your_score)
round(percentile_your_score, 2)
# your score is at about the 64th percentile: assuming scores are normally distributed, you scored higher than
# roughly 64% of test takers, which is 14 percentage points above the median (50th percentile)

0.64