1. A researcher conducts a two-mean hypothesis test with a significance level of 0.05. If the power of the test is 0.80, what can be concluded about the probability of correctly rejecting the null hypothesis?

R. There is a 0.80 probability of correctly rejecting the null hypothesis.

The power of a statistical test is the probability of correctly rejecting the null hypothesis when it is indeed false. In other words, it measures the ability of the test to detect a true effect or difference.

Given a power of 0.80, the probability of a Type II error (failing to reject a false null hypothesis) is β = 1 − 0.80 = 0.20.

In practical terms, this suggests that if there truly is a difference between the means being tested (i.e., the null hypothesis is false), there is an 80% chance that the statistical test will correctly identify this difference and reject the null hypothesis at the chosen significance level of 0.05.

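Therefore, a power of 0.80 means the test detects a true effect of the assumed size about four times out of five. The significance level of 0.05 is a separate quantity: it is the probability of rejecting the null hypothesis when it is actually true.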
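
To make the definition concrete, here is a minimal sketch of how power can be computed for a two-sided, two-sample z-test under a normal approximation. The function name `two_sample_z_power`, the effect size of 0.5, and the group size of 63 are illustrative assumptions, not values from the question; they were chosen so that the power comes out close to 0.80.

```python
from math import sqrt
from scipy.stats import norm

def two_sample_z_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (mu1 - mu2) / sigma.
    Assumes equal group sizes and a known, common sigma (hypothetical setup).
    """
    z_crit = norm.ppf(1 - alpha / 2)
    shift = abs(effect_size) * sqrt(n_per_group / 2)
    # P(reject H0 | H1 true): probability mass beyond either critical value
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

power = two_sample_z_power(effect_size=0.5, n_per_group=63)
print(f"Power: {power:.3f}")
print(f"Type II error rate (beta): {1 - power:.3f}")
```

With an effect size of zero the same function returns the significance level, which illustrates that alpha is the rejection probability when the null hypothesis is true.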

2. A pharmaceutical company is conducting a clinical trial to test the efficacy of a new drug for lowering blood pressure. Previous studies suggest that the standard deviation of blood pressure reduction with the existing drug is 12 mmHg. The company wants to detect a mean reduction of at least 6 mmHg with the new drug, with a power of 0.85 at a significance level of 0.05. How many patients should be enrolled in the trial?
R. 29

In [None]:
from math import ceil
from scipy.stats import norm

# Given data
alpha = 0.05  # significance level
power = 0.85  # desired power
sigma = 12    # standard deviation of blood pressure reduction (from previous studies)
mu1 = 0       # mean reduction under the null hypothesis (no effect)
mu2 = 6       # smallest mean reduction the trial should detect

# Calculate critical values. A one-sided test is assumed because the goal is
# to detect a reduction; this matches the stated answer of 29.
# A two-sided test would use norm.ppf(1 - alpha/2) and give 36.
Z_alpha = norm.ppf(1 - alpha)
Z_beta = norm.ppf(power)

# Calculate sample size (one-sample normal-approximation formula)
n = ((Z_alpha + Z_beta)**2 * (sigma**2)) / (mu1 - mu2)**2

# Round up so the achieved power is not below the target
sample_size = ceil(n)
print("Number of patients needed:", sample_size)
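
The required sample size depends on whether the test is treated as one-sided or two-sided. The sketch below compares the two, assuming a one-sample normal approximation with known sigma; the helper name `required_n` is hypothetical and not part of the original notebook.

```python
from math import ceil
from scipy.stats import norm

def required_n(delta, sigma, alpha=0.05, power=0.85, two_sided=False):
    """Sample size for a one-sample z-test to detect a mean shift of delta."""
    z_alpha = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    n = ((z_alpha + z_beta) * sigma / delta) ** 2
    return ceil(n)  # round up so the achieved power is at least the target

n_one_sided = required_n(delta=6, sigma=12)
n_two_sided = required_n(delta=6, sigma=12, two_sided=True)
print("One-sided:", n_one_sided)
print("Two-sided:", n_two_sided)
```

Under these assumptions the one-sided calculation gives 29 and the two-sided calculation gives 36.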

3. A study compares the blood pressure levels of individuals before and after a dietary intervention. Which statistical test is most appropriate for analyzing this paired data?

R. Paired t-test

For analyzing paired data, where each individual is measured before and after an intervention, the most appropriate statistical test is the paired t-test.

The paired t-test is used to determine whether there is a significant difference between the means of two related groups. Here, it assesses whether blood pressure levels differ before and after the dietary intervention within the same individuals.

The paired t-test is preferred in this scenario because each individual serves as their own control. Working with within-person differences removes between-person variability, which increases the sensitivity of the test compared to an independent two-sample t-test. The test assumes that the differences between paired observations are approximately normally distributed.
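
As an illustration, the sketch below runs a paired t-test with `scipy.stats.ttest_rel` and shows that it is equivalent to a one-sample t-test on the within-person differences. The blood pressure readings are made-up values for eight hypothetical participants, used only to demonstrate the call.

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

# Hypothetical systolic readings (mmHg) for the same 8 people; illustrative only
before = np.array([142, 138, 150, 145, 160, 155, 148, 152])
after = np.array([136, 135, 144, 141, 151, 150, 146, 145])

# Paired t-test on the two related samples
result = ttest_rel(before, after)

# Equivalent: one-sample t-test on the within-person differences against 0
diff_result = ttest_1samp(before - after, popmean=0.0)

print("Paired t statistic:", result.statistic)
print("P-value:", result.pvalue)
print("Matches one-sample test on differences:",
      np.isclose(result.statistic, diff_result.statistic))
```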

4. The mean salary of 30 employees in Company X is $52,000 with a standard deviation of $3,000, while the mean salary of 35 employees in Company Y is $54,500 with a standard deviation of $2,500. At a significance level of 0.05, conduct a two-sample test of whether the mean salaries differ.

In [None]:
import numpy as np
from scipy.stats import t

# Given data for Company X
mean_X = 52000
std_X = 3000
n_X = 30

# Given data for Company Y
mean_Y = 54500
std_Y = 2500
n_Y = 35

# Significance level
alpha = 0.05

# Calculate Welch's t statistic (the two variances are not assumed equal)
t_statistic = (mean_X - mean_Y) / np.sqrt((std_X**2 / n_X) + (std_Y**2 / n_Y))

# Calculate degrees of freedom (Welch-Satterthwaite approximation)
df = ((std_X**2 / n_X + std_Y**2 / n_Y)**2) / ((std_X**2 / n_X)**2 / (n_X - 1) + (std_Y**2 / n_Y)**2 / (n_Y - 1))

# Calculate the two-tailed critical value
critical_value = t.ppf(1 - alpha/2, df)

# Calculate the two-tailed p-value (t.sf is the survival function, 1 - cdf)
p_value = 2 * t.sf(np.abs(t_statistic), df)

print("Test statistic:", t_statistic)
print("Critical value:", critical_value)
print("P-value:", p_value)

# Make a decision
if np.abs(t_statistic) > critical_value:
    print("Reject the null hypothesis. There is a significant difference in the mean salaries between Company X and Company Y.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in the mean salaries between Company X and Company Y.")
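
The manual calculation can be cross-checked with SciPy's `ttest_ind_from_stats`, which runs a two-sample t-test directly from summary statistics. This is a sketch of one way to verify the result; `equal_var=False` selects Welch's test, matching the unequal-variance formula used in the manual calculation.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the question
result = ttest_ind_from_stats(
    mean1=52000, std1=3000, nobs1=30,   # Company X
    mean2=54500, std2=2500, nobs2=35,   # Company Y
    equal_var=False,  # Welch's test: do not pool the variances
)

print("Test statistic:", result.statistic)
print("P-value:", result.pvalue)
```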


5. A researcher claims that the average IQ of a population is 110. A sample of 60 individuals has an average IQ of 115 with a standard deviation of 12. Conduct a one-sample t-test of the researcher's claim. What is the p-value?

In [None]:
from scipy.stats import t

# Given data
sample_mean = 115
population_mean = 110
sample_std = 12
sample_size = 60

# Calculate the test statistic
t_statistic = (sample_mean - population_mean) / (sample_std / (sample_size ** 0.5))

# Calculate degrees of freedom
df = sample_size - 1

# Calculate the p-value (two-tailed test); t.sf is the survival function, 1 - cdf
p_value = 2 * t.sf(abs(t_statistic), df)

print("Test statistic:", t_statistic)
print("Degrees of freedom:", df)
print("P-value:", p_value)

# Make a decision
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is sufficient evidence to suggest that the average IQ of the population is not 110.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to suggest that the average IQ of the population is not 110.")
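
A complementary view of the same test is a 95% confidence interval for the population mean: a two-sided test at a significance level of 0.05 rejects the claimed value exactly when that value falls outside the interval. The sketch below computes the interval from the summary statistics in the question; it assumes the sample is approximately normal or large enough for the t-distribution to apply.

```python
from math import sqrt
from scipy.stats import t

sample_mean, sample_std, sample_size = 115, 12, 60
standard_error = sample_std / sqrt(sample_size)

# 95% confidence interval for the population mean (df = n - 1)
low, high = t.interval(0.95, sample_size - 1, loc=sample_mean, scale=standard_error)

print(f"95% CI: ({low:.2f}, {high:.2f})")
print("Claimed mean of 110 inside interval:", low <= 110 <= high)
```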
