# Statistics Advance-4 Assignment

# Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

# Answer-1-
# T-Test: The t-test is used to determine whether there is a significant difference between the means of two groups, or between a sample mean and a reference value. It is the usual choice when the sample size is small or the population standard deviation is unknown.

In [1]:
import scipy.stats as stats
import numpy as np

In [2]:
group1 = np.array([82, 87, 88, 76, 93])
group2 = np.array([75, 78, 80, 85, 77])

In [3]:
t_statistic, p_value = stats.ttest_ind(group1, group2)

In [4]:
print("T-statistic:", t_statistic)
print("P-value:", p_value)

T-statistic: 1.8493049574428708
P-value: 0.1015849415220146


# Z-Test:
# The z-test is used to determine if there is a significant difference between a sample mean and a known population mean when the population standard deviation is known, or when the sample size is large (typically n > 30).

# A t-test and a z-test are statistical hypothesis tests used to determine if there is a significant difference between the means of two populations or to assess a sample mean in relation to a population mean. The main difference between them lies in the circumstances under which they are appropriate and the assumptions they make about the data.

In [5]:
sample_mean = 105 
population_mean = 100  
population_std = 15  
sample_size = 50  

In [6]:
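z_statistic = (sample_mean - population_mean) / (population_std / (sample_size ** 0.5))
# Two-tailed p-value: probability of a |Z| at least this large under H0.
# (stats.norm.cdf alone gives the area to the left of z, which is not a p-value here.)
p_value = 2 * stats.norm.sf(abs(z_statistic))

print("Z-statistic:", z_statistic)
print("P-value: {:.4f}".format(p_value))

Z-statistic: 2.3570226039551585
P-value: 0.0184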


# Q2: Differentiate between one-tailed and two-tailed tests.

# Answer-2-One-tailed and two-tailed tests are terms used in hypothesis testing to define the regions of interest in a distribution and determine the direction of the hypothesis being tested.

# One-Tailed Test: In a one-tailed test, the critical region lies on only one side of the distribution, either the left or the right, depending on the direction specified in the research hypothesis.
# It is used when there's a specific expectation or interest in only one direction of the effect (increase or decrease).
# Commonly used when researchers are interested in whether a parameter is significantly greater than or less than a particular value.
# The null hypothesis (H0) represents no effect or no difference.
# The alternative hypothesis (H1) specifies the direction of the effect.
# Examples: Testing if a new drug increases average survival time (right-tailed test). Checking if a change in a website's design decreases bounce rates (left-tailed test).
# Two-Tailed Test: In a two-tailed test, the critical region is split between both tails of the distribution, allowing for assessment of the effect in both directions (increase or decrease).
# It is used when there's an interest in whether the effect is significant in either direction.
# Commonly used when researchers want to determine if a parameter is significantly different from a particular value, but the direction of the difference is not specified.
# The null hypothesis (H0) represents no effect or no difference.
# The alternative hypothesis (H1) suggests that there is a difference, without specifying the direction.
# Examples: Testing if a new drug has any effect on average survival time (two-tailed test).
# Checking if a change in a website's design has any effect on bounce rates (two-tailed test).
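To make the distinction concrete, here is a minimal sketch of how one z statistic yields different p-values depending on which tail(s) are tested. It uses only the standard-library `statistics.NormalDist` rather than SciPy; the function name `z_test_p_values` and the example value 2.357 (roughly the z statistic from the z-test example in Answer 1) are illustrative assumptions.

```python
from statistics import NormalDist

def z_test_p_values(z):
    """Return left-, right-, and two-tailed p-values for a z statistic."""
    std_normal = NormalDist()        # mean 0, standard deviation 1
    left = std_normal.cdf(z)         # P(Z <= z): left-tailed test
    right = 1.0 - left               # P(Z >= z): right-tailed test
    two_sided = 2.0 * min(left, right)
    return {"left": left, "right": right, "two_sided": two_sided}

p = z_test_p_values(2.357)
for name, value in p.items():
    print(f"{name}: {value:.4f}")
```

For a positive z, the right-tailed p-value is half the two-tailed one, which is why the direction of the test should be fixed before looking at the data.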

# Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.

# Answer-3-In hypothesis testing, Type I and Type II errors are the two ways a decision about the null hypothesis, made from sample data, can be wrong.

# Type I Error: A Type I error occurs when you reject a true null hypothesis. In other words, you conclude that there is a significant effect or difference when, in fact, there is no such effect or difference.
# The significance level (often denoted as α) is the probability of making a Type I error when the null hypothesis is true. Common values for α are 0.05 or 0.01, representing a 5% or 1% chance of a Type I error, respectively.
# Example Scenario: Suppose a pharmaceutical company is testing a new drug to determine if it reduces blood pressure. The null hypothesis (H0) is that the drug has no effect on blood pressure. If, based on the study, the researchers conclude that the drug does reduce blood pressure (rejecting H0) when in reality it doesn't, that is a Type I error.

# Type II Error: A Type II error occurs when you fail to reject a false null hypothesis. In this case, there is a true effect or difference in the population, but your test fails to detect it, so the null hypothesis is retained even though it is false.
# The probability of a Type II error is denoted as β. Power of the test (1 - β) is a measure of the test's ability to detect a true effect when it exists.
# Example Scenario: Continuing with the drug example, suppose the new drug does, in fact, reduce blood pressure. However, the study fails to detect this effect, and the researchers conclude that there is no evidence the drug works (fail to reject H0). That is a Type II error.
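The two error rates can be estimated by simulation. The sketch below is one possible illustration, not part of the original assignment: it repeatedly draws normal samples, runs a two-tailed z-test with known σ, and counts rejections. The function name `rejection_rate`, the effect size of 0.5, the sample size of 30, and the seed are all assumptions chosen for the example.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, null_mean=0.0, sigma=1.0, n=30,
                   alpha=0.05, trials=2000, seed=0):
    """Share of simulated two-tailed z-tests that reject H0: mean == null_mean."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - null_mean) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# H0 is true here, so every rejection is a Type I error (rate should be near alpha).
type_1_rate = rejection_rate(true_mean=0.0)
# H0 is false here, so every rejection is a correct detection (this is the power).
power = rejection_rate(true_mean=0.5)
type_2_rate = 1 - power  # failures to reject a false H0

print(f"Estimated Type I error rate: {type_1_rate:.3f}")
print(f"Estimated power:             {power:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

Lowering `alpha` reduces the Type I rate but, for a fixed sample size, raises the Type II rate; increasing `n` is the usual way to reduce both.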

# Q4: Explain Bayes's theorem with an example.

# Answer-4-Bayes' Theorem is a fundamental principle in probability theory and statistics that describes the probability of an event based on prior knowledge of related events. It's often used to update beliefs or probabilities as new evidence becomes available. The theorem is named after Thomas Bayes, an 18th-century statistician and theologian.
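In symbols, for a hypothesis $A$ and evidence $B$ with $P(B) > 0$, the theorem is usually written as follows. The second expression expands $P(B)$ with the law of total probability, which is the form a diagnostic-test example needs.

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad
P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)
$$

Here $P(A)$ is the prior, $P(B \mid A)$ the likelihood, $P(B)$ the marginal likelihood, and $P(A \mid B)$ the posterior.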

In [7]:
def bayes_theorem(prior_prob, likelihood, marginal_likelihood):
    posterior_prob = (likelihood * prior_prob) / marginal_likelihood
    return posterior_prob

In [8]:
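prior_prob_disease = 0.01  # P(disease): 1% of the population has the disease
likelihood_positive_given_disease = 0.95  # P(positive | disease): test sensitivity
likelihood_positive_given_no_disease = 0.05  # P(positive | no disease): false positive rate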

# Calculate marginal likelihood: P(B)
marginal_likelihood = (likelihood_positive_given_disease * prior_prob_disease) + \
                      (likelihood_positive_given_no_disease * (1 - prior_prob_disease))

In [9]:
posterior_prob_disease_given_positive_test = bayes_theorem(prior_prob_disease, likelihood_positive_given_disease, marginal_likelihood)

print("Probability of having the disease given a positive test result:", posterior_prob_disease_given_positive_test)

Probability of having the disease given a positive test result: 0.16101694915254236
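The posterior is low mainly because the prior is low. As a rough sketch of that dependence, the snippet below keeps the same test characteristics (95% sensitivity, 5% false positive rate) and varies only the prior. The function name `posterior_given_positive` and the extra priors 0.10 and 0.50 are hypothetical values added for illustration.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    # Marginal probability of a positive result (law of total probability)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Same test as in the example; only the prior (prevalence) changes.
posteriors = {
    prior: posterior_given_positive(prior, sensitivity=0.95, false_positive_rate=0.05)
    for prior in (0.01, 0.10, 0.50)
}
for prior, post in posteriors.items():
    print(f"prior = {prior:.2f} -> posterior = {post:.4f}")
```

With a 1% prior most positive results are false positives; as the prior rises, the same positive result becomes far more convincing.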


# Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

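# Answer-5-A confidence interval (CI) is a range of values, computed from sample data, that estimates an unknown population parameter. The confidence level (for example 95%) describes the procedure: if the sampling were repeated many times, about that percentage of the resulting intervals would contain the true parameter. The higher the confidence level, the wider the interval, because it has to cover a greater range of plausible values. For a mean, the interval is the sample mean plus or minus the critical value times the standard error.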

In [10]:
import math

In [11]:
sample_mean = 170 
sample_std = 5  
sample_size = 50 
confidence_level = 0.95  

In [12]:
standard_error = sample_std / math.sqrt(sample_size)
critical_z = stats.norm.ppf((1 + confidence_level) / 2)
margin_of_error = critical_z * standard_error

In [13]:
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print("Confidence Interval: ({:.2f}, {:.2f})".format(lower_bound, upper_bound))

Confidence Interval: (168.61, 171.39)
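The "95%" describes how often the procedure captures the true mean over many repeated samples, not the probability for any single interval. A small simulation can illustrate this; it is only a sketch under stated assumptions: the true mean of 170 and σ of 5 are borrowed from the example and treated as known population values, and the function name `ci_coverage`, the number of trials, and the seed are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

def ci_coverage(true_mean=170.0, sigma=5.0, n=50, confidence=0.95,
                trials=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_crit * sigma / n ** 0.5  # sigma is treated as known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = fmean(sample)
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / trials

coverage = ci_coverage()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to the nominal 0.95, with some simulation noise.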


# Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.

# Answer-6-Suppose you draw a card at random from a standard 52-card deck. The deck has 4 kings (Event A: drawing a king) and 12 face cards, namely the jacks, queens, and kings (Event B: drawing a face card). You are told the card is a face card. What is the probability that it is a king? Every king is a face card, so P(B|A) = 1.

In [14]:
P_A = 4/52          # prior: probability of drawing a king
P_B_given_A = 1     # every king is a face card
P_B = 12/52         # probability of drawing a face card

In [15]:
P_A_given_B = (P_B_given_A * P_A) / P_B

print("Probability of drawing a king given a face card: {:.2f}".format(P_A_given_B))

Probability of drawing a king given a face card: 0.33


# Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.

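# Answer-7-The question does not give a sample size. The cell below divides by math.sqrt(1), i.e. it treats n as 1, so the standard error equals the standard deviation and the interval is 50 ± 1.96 × 5. Interpretation: under that assumption we are 95% confident that the true mean lies between 40.20 and 59.80. With a larger sample the standard error, and therefore the interval, would be smaller.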

In [16]:
sample_mean = 50  
population_std = 5  
confidence_level = 0.95 
critical_z = 1.96 

In [17]:
margin_of_error = critical_z * (population_std / math.sqrt(1))

In [18]:
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print("95% Confidence Interval: ({:.2f}, {:.2f})".format(lower_bound, upper_bound))

95% Confidence Interval: (40.20, 59.80)
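Because the question gives no sample size, the interval above is the widest case (n = 1). The sketch below shows how the same mean and standard deviation produce narrower intervals as n grows. The helper name `mean_ci` and the sample sizes 25 and 100 are hypothetical, and the critical value comes from the standard library's `NormalDist` instead of the hard-coded 1.96.

```python
from statistics import NormalDist

def mean_ci(sample_mean, std_dev, n, confidence=0.95):
    """z-based confidence interval for a mean: mean ± z * std_dev / sqrt(n)."""
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_crit * std_dev / n ** 0.5
    return sample_mean - margin, sample_mean + margin

# n = 1 reproduces the interval above; 25 and 100 are illustrative sample sizes.
intervals = {n: mean_ci(50, 5, n) for n in (1, 25, 100)}
for n, (low, high) in intervals.items():
    print(f"n = {n:>3}: ({low:.2f}, {high:.2f})")
```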


# Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

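# Answer-8-The margin of error (MOE) is the half-width of a confidence interval: the largest amount by which the sample estimate (e.g., the sample mean) is expected to differ from the true population parameter at the chosen confidence level. For a mean it equals the critical value times the standard error, z × σ / √n, so it shrinks in proportion to 1/√n as the sample size grows. In the example below, raising the sample size from 100 to 500 with the same standard deviation cuts the margin of error from $1960.00 to $876.54.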

In [19]:
standard_deviation = 10000  
confidence_level = 0.95  
sample_size_scenario1 = 100
sample_size_scenario2 = 500
critical_z = 1.96

In [20]:
margin_of_error_scenario1 = critical_z * (standard_deviation / math.sqrt(sample_size_scenario1))
margin_of_error_scenario2 = critical_z * (standard_deviation / math.sqrt(sample_size_scenario2))

print("Margin of Error for Scenario 1: ${:.2f}".format(margin_of_error_scenario1))
print("Margin of Error for Scenario 2: ${:.2f}".format(margin_of_error_scenario2))

Margin of Error for Scenario 1: $1960.00
Margin of Error for Scenario 2: $876.54
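The same formula can be inverted to ask how large a sample is needed for a target margin of error: n = (z × σ / E)², rounded up. The sketch below assumes the standard deviation of 10000 from the scenario above; the function name `required_sample_size` and the $500 target are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

def required_sample_size(std_dev, target_margin, confidence=0.95):
    """Smallest n whose z-based margin of error is at most target_margin."""
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z_crit * std_dev / target_margin) ** 2)

n_needed = required_sample_size(std_dev=10000, target_margin=500)
print("Sample size needed for a $500 margin of error:", n_needed)
```

Because n grows with the square of 1/E, halving the margin of error requires roughly four times as many observations.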


# Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

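# Answer-9-The z-score is (75 - 70) / 5 = 1.0, so the data point lies exactly one standard deviation above the population mean.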

In [22]:
data_point = 75
population_mean = 70
population_std = 5
z_score = (data_point - population_mean) / population_std

In [23]:
z_score

1.0
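If the population is additionally assumed to be normally distributed (the question does not say so), the z-score can be turned into a percentile. A minimal sketch using the standard library's `NormalDist`:

```python
from statistics import NormalDist

# Assumption: the population is normal with the stated mean and standard deviation.
population = NormalDist(mu=70, sigma=5)
z_score = (75 - population.mean) / population.stdev
share_below = population.cdf(75)  # P(X <= 75)

print(f"z-score: {z_score:.1f}")
print(f"Share of the population below 75: {share_below:.4f}")
```

Under that assumption, a value one standard deviation above the mean sits at about the 84th percentile.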

# Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

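# Answer-10-H0: the mean weight loss is 0 (the drug has no effect). H1: the mean weight loss differs from 0. A 95% confidence level corresponds to α = 0.05, split across both tails in the two-tailed test below.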

In [25]:
sample_mean = 6  
population_mean_null = 0  
sample_std = 2.5 
sample_size = 50  
confidence_level = 0.95  
t_statistic = (sample_mean - population_mean_null) / (sample_std / math.sqrt(sample_size))
critical_t_value = stats.t.ppf(1 - (1 - confidence_level) / 2, sample_size - 1)

In [26]:
if abs(t_statistic) > critical_t_value:
    print("Reject the null hypothesis. The drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis. The drug is not significantly effective.")

print("T-statistic:", t_statistic)
print("Critical t-value:", critical_t_value)

Reject the null hypothesis. The drug is significantly effective.
T-statistic: 16.970562748477143
Critical t-value: 2.009575234489209


# Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.

# Answer-11-

In [27]:
sample_proportion = 0.65  
sample_size = 500  
confidence_level = 0.95  
z_score = 1.96  
margin_of_error = z_score * math.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)

lower_bound = sample_proportion - margin_of_error
upper_bound = sample_proportion + margin_of_error

In [28]:
print("95% Confidence Interval: ({:.4f}, {:.4f})".format(lower_bound, upper_bound))

95% Confidence Interval: (0.6082, 0.6918)


# Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

# Answer-12-

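In [29]:
# Sample sizes are not given in the question; 30 and 25 are assumed here.
mean_A = 85  
std_A = 6  
sample_size_A = 30  

In [30]:
mean_B = 82  
std_B = 5 
sample_size_B = 25  
alpha = 0.01
# Two-sample t statistic with an unpooled (Welch-style) standard error
t_statistic = (mean_A - mean_B) / math.sqrt((std_A**2 / sample_size_A) + (std_B**2 / sample_size_B))
# Pooled degrees of freedom; the Welch-Satterthwaite value for these inputs is also about 53
degrees_of_freedom = sample_size_A + sample_size_B - 2
# Two-tailed critical value at significance level alpha
critical_t_value = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)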

In [31]:
if abs(t_statistic) > critical_t_value:
    print("Reject the null hypothesis. There is a significant difference in student performance between the teaching methods.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in student performance between the teaching methods.")

print("T-statistic:", t_statistic)
print("Critical t-value:", critical_t_value)

Fail to reject the null hypothesis. There is no significant difference in student performance between the teaching methods.
T-statistic: 2.0225995873897262
Critical t-value: 2.6718226362410027


# Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.

# Answer-13-

In [32]:
sample_mean = 65  
population_mean = 60  
population_std = 8  
sample_size = 50  
confidence_level = 0.90  
z_score = 1.645  

In [33]:
margin_of_error = z_score * (population_std / math.sqrt(sample_size))
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

In [34]:
print("90% Confidence Interval: ({:.2f}, {:.2f})".format(lower_bound, upper_bound))

90% Confidence Interval: (63.14, 66.86)


# Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

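# Answer-14-The question gives no baseline (no-caffeine) reaction time to test against. The cell below sets the null-hypothesis mean equal to the sample mean (0.25 s), so the t-statistic is 0 by construction and the test cannot reject H0. To run a meaningful test, replace population_mean_null with the known baseline reaction time.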

In [35]:
sample_mean = 0.25  
population_mean_null = 0.25  
sample_std = 0.05  
sample_size = 30  
confidence_level = 0.90  
t_statistic = (sample_mean - population_mean_null) / (sample_std / math.sqrt(sample_size))
degrees_of_freedom = sample_size - 1
critical_t_value = stats.t.ppf(1 - (1 - confidence_level) / 2, degrees_of_freedom)

In [36]:
if abs(t_statistic) > critical_t_value:
    print("Reject the null hypothesis. Caffeine has a significant effect on reaction time.")
else:
    print("Fail to reject the null hypothesis. Caffeine does not have a significant effect on reaction time.")

print("T-statistic:", t_statistic)
print("Critical t-value:", critical_t_value)

Fail to reject the null hypothesis. Caffeine does not have a significant effect on reaction time.
T-statistic: 0.0
Critical t-value: 1.6991270265334972


# Completed Assignment