In [None]:
# Q1
"""
Key Differences:
Feature                         t-Test                                z-Test
Sample Size                     Small (n < 30)                        Large (n ≥ 30)
Population Standard Deviation   Unknown (estimated from sample)       Known
Distribution                    Uses t-distribution (heavier tails)   Uses normal distribution
Example Scenarios:

t-Test Example:
A researcher wants to compare the average test scores of two small groups of students (n = 15 in each group) to determine if a new teaching method is effective.
Since the sample size is small and the population standard deviation is unknown, a t-test is appropriate.

z-Test Example:
A manufacturer knows the population standard deviation of product weights and wants to test whether a new machine is producing items with the same mean weight.
If the sample size is large (n = 50), and the population standard deviation is known, a z-test would be suitable. """
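
The "heavier tails" row of the table can be made concrete by comparing critical values. The following is a minimal sketch, assuming a two-sided significance level of 0.05 and a few illustrative sample sizes (neither is taken from the question itself); it uses `scipy.stats.norm.ppf` and `scipy.stats.t.ppf`.

```python
from scipy import stats

alpha = 0.05  # assumed two-sided significance level, for illustration only

# Two-sided critical value from the standard normal distribution (z-test).
z_crit = stats.norm.ppf(1 - alpha / 2)

# Two-sided critical values from the t-distribution for several sample sizes.
# Degrees of freedom for a one-sample t-test are n - 1.
t_crits = {n: stats.t.ppf(1 - alpha / 2, df=n - 1) for n in (5, 15, 30, 100)}

print(f"z critical value: {z_crit:.3f}")
for n, t_crit in t_crits.items():
    print(f"n = {n:>3}: t critical value = {t_crit:.3f}")
```

The t critical values are always larger than the z critical value and shrink toward it as n grows, which is why the two tests give similar answers for large samples.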



In [None]:
# Q2

""" Differentiation Between One-Tailed and Two-Tailed Tests :
In statistical hypothesis testing, the choice between one-tailed and two-tailed tests is crucial as it influences the interpretation of results, the power of the test,
and the overall conclusions drawn from data analysis. Below is a comprehensive differentiation between these two types of tests.

One-Tailed Tests :
Definition :
A one-tailed test is a statistical test that evaluates whether a parameter (such as a mean) is either greater than or less than a specified value, but not both. This means
that the critical region for rejecting the null hypothesis is located entirely in one tail of the distribution.

Characteristics
Directional Hypothesis: The alternative hypothesis specifies a direction (e.g., greater than or less than). For example:
Right-tailed test: H1:μ>μ0
Left-tailed test: H1:μ<μ0
Critical Region: The rejection area lies in only one tail of the distribution.
Power: A one-tailed test generally has more power to detect an effect in the specified direction because the entire alpha level (significance level) is allocated to that tail. It has no power to detect an effect in the opposite direction.
Sample Size: It typically requires a smaller sample than a two-tailed test to achieve the same power, provided the true effect lies in the hypothesized direction.
Applications :
One-tailed tests are often used when researchers have a strong theoretical basis or prior evidence suggesting that an effect can only occur in one direction. Examples include:

Testing if a new drug improves recovery rates compared to an existing treatment.
Evaluating whether a marketing campaign increases sales beyond a certain threshold.
Two-Tailed Tests
Definition :
A two-tailed test assesses whether a parameter differs from a specified value in either direction, so it can detect an effect that is greater than or less than that value.

Characteristics :
Non-Directional Hypothesis: The alternative hypothesis does not specify a direction. For example:H1:μ≠μ0
Critical Region: The rejection areas are split between both tails of the distribution, with half of alpha allocated to each tail.
Power: For the same significance level and sample size, a two-tailed test has less power than a one-tailed test to detect an effect in a given direction, because only α/2 is allocated to each tail. It therefore generally needs a larger sample to reach the same power.
Conservativeness: They are considered more conservative because a result in either direction must be more extreme to reach significance, but they can detect effects regardless of direction.
Applications
Two-tailed tests are commonly used when researchers do not have strong prior expectations about the direction of an effect or when it is important to detect effects in both directions. Examples include:

Comparing two different medical treatments where either could be superior.
Testing changes in website conversion rates without prior knowledge about which change might be better. """
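
The effect of splitting alpha between two tails can be seen by computing both p-values for the same test statistic. This is a small sketch under stated assumptions: the z statistic of 1.8 and the 0.05 significance level are hypothetical values chosen for illustration.

```python
from scipy import stats

z = 1.8       # hypothetical observed z statistic
alpha = 0.05  # assumed significance level

# Right-tailed test (H1: mu > mu0): area in the upper tail only.
p_right = stats.norm.sf(z)

# Two-tailed test (H1: mu != mu0): area in both tails.
p_two = 2 * stats.norm.sf(abs(z))

reject_one_tailed = p_right < alpha
reject_two_tailed = p_two < alpha

print(f"One-tailed p-value: {p_right:.4f}, reject H0: {reject_one_tailed}")
print(f"Two-tailed p-value: {p_two:.4f}, reject H0: {reject_two_tailed}")
```

The two-tailed p-value is exactly twice the one-tailed one, so the same data can be significant under a directional hypothesis and not under a non-directional one.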


In [None]:
# Q3

""" Understanding Type 1 and Type 2 Errors in Hypothesis Testing
Hypothesis testing is a fundamental aspect of statistical inference, allowing researchers to make decisions about populations based on sample data. Within this framework,
 two critical concepts arise: Type 1 errors and Type 2 errors. These errors are essential for understanding the reliability and validity of statistical conclusions.

Hypothesis Testing Framework
In hypothesis testing, researchers begin by formulating two competing hypotheses:

Null Hypothesis (H₀): This hypothesis posits that there is no effect or no difference; it serves as a default position that indicates no change or relationship.
Alternative Hypothesis (H₁ or Hₐ): This hypothesis suggests that there is an effect or a difference; it represents the claim the researcher seeks evidence for.
The goal of hypothesis testing is to determine whether there is sufficient evidence in the sample data to reject the null hypothesis in favor of the alternative hypothesis.

Type 1 Error (False Positive)
A Type 1 error occurs when the null hypothesis is rejected when it is actually true. This error represents a false positive result, leading researchers to conclude that there is
an effect or difference when none exists. The probability of making a Type 1 error is denoted by alpha (α), which is typically set at levels such as 0.05 or 0.01. This means that,
when the null hypothesis is actually true, there is a 5% or 1% chance, respectively, of incorrectly rejecting it.

Example Scenario for Type 1 Error
Consider a clinical trial testing a new medication intended to lower blood pressure. The null hypothesis (H₀) states that the medication has no effect on blood pressure compared to
a placebo. After conducting the trial and analyzing the results, researchers find statistically significant evidence suggesting that the medication does lower blood pressure (p < α).
Consequently, they reject H₀ and conclude that the medication works.

However, suppose in reality, the medication has no actual effect on blood pressure; thus, they have committed a Type 1 error by falsely concluding its efficacy.

Type 2 Error (False Negative)
Conversely, a Type 2 error occurs when the null hypothesis is not rejected when it is actually false. This error represents a false negative result, leading researchers to conclude
that there is no effect or difference when one truly exists. The probability of making a Type 2 error is denoted by beta (β). The power of a test, which reflects its ability to
detect an effect if one exists, can be calculated as (1 - β).

Example Scenario for Type 2 Error
Continuing with our clinical trial example, let’s say that after analyzing data from another similar study involving different participants but using the same medication and
placebo setup, researchers fail to find statistically significant evidence against H₀ (p > α). They conclude that there is insufficient evidence to suggest that the medication
lowers blood pressure and thus do not reject H₀.

However, if in reality, the medication does indeed lower blood pressure but their study lacked sufficient power due to small sample size or variability in response rates among
participants, they have committed a Type 2 error by failing to recognize its effectiveness.

Implications of Errors in Research
Both types of errors carry significant implications for research findings:

Type 1 Errors can lead to unnecessary treatments being approved or implemented based on false claims of effectiveness.
Type 2 Errors can prevent beneficial treatments from being recognized and utilized effectively within medical practice or other fields.
Researchers must carefully consider their significance levels and ensure adequate sample sizes to minimize these errors while balancing risks associated with both types."""
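
The relationship between α, β, and power can be worked out directly for a simple case. The sketch below assumes a left-tailed z-test with a known standard deviation; the function name `power_left_tailed_z` and all numbers (a true shift of -5, σ = 10, n = 25) are hypothetical and only illustrate the blood pressure scenario.

```python
import numpy as np
from scipy import stats

def power_left_tailed_z(true_shift, sigma, n, alpha=0.05):
    """Power of a left-tailed z-test when the true mean differs from mu0 by true_shift."""
    se = sigma / np.sqrt(n)         # standard error of the sample mean
    z_crit = stats.norm.ppf(alpha)  # reject H0 when z < z_crit
    # Under the true mean, the z statistic is normal with mean true_shift / se.
    return stats.norm.cdf(z_crit - true_shift / se)

alpha = 0.05
power = power_left_tailed_z(true_shift=-5.0, sigma=10.0, n=25, alpha=alpha)
beta = 1 - power  # probability of a Type 2 error

# With no true effect, the rejection probability equals alpha (the Type 1 error rate).
type1_rate = power_left_tailed_z(true_shift=0.0, sigma=10.0, n=25, alpha=alpha)

print(f"Type 1 error rate: {type1_rate:.3f}")
print(f"Power: {power:.3f}, Type 2 error rate (beta): {beta:.3f}")
```

Increasing `n` in this sketch raises the power and lowers β while α stays fixed, which is the trade-off the last paragraph describes.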



In [None]:
# Q4

"""Bayes’s Theorem : Bayes’s theorem is a fundamental concept in probability theory and statistics that describes how to update the probability of a hypothesis based on new
evidence. Named after the Reverend Thomas Bayes, this theorem provides a mathematical framework for reasoning about uncertainty and making decisions in the presence of incomplete
information. It is widely used across various fields, including statistics, finance, medicine, and machine learning."""

In [1]:
def bayes_theorem(P_D, P_T_given_D, P_T_given_not_D):
    P_not_D = 1 - P_D
    P_T = (P_T_given_D * P_D) + (P_T_given_not_D * P_not_D)

    P_D_given_T = (P_T_given_D * P_D) / P_T
    return P_D_given_T

P_D = 0.001
P_T_given_D = 0.9
P_T_given_not_D = 0.05

result = bayes_theorem(P_D, P_T_given_D, P_T_given_not_D)
print(f"Probability of actually having the disease given a positive test: {result:.4f}")


Probability of actually having the disease given a positive test: 0.0177


In [None]:
# Q5

""" A confidence interval (CI) is a statistical tool used to estimate the range within which a population parameter, such as a mean or proportion, is likely to fall, based on
sample data. It provides an interval estimate rather than a point estimate, thereby reflecting the uncertainty inherent in sampling. A 95% confidence level means that, over
repeated sampling, about 95% of the intervals constructed this way would contain the true parameter. The concept of confidence intervals is fundamental in inferential statistics,
allowing researchers to make probabilistic statements about population parameters based on sample statistics."""

In [2]:
import scipy.stats as stats
import numpy as np

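def confidence_interval(mean, std_dev, n, confidence=0.95):
    # z-based interval for a mean; assumes std_dev is known or n is large.
    z_score = stats.norm.ppf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin_error = z_score * (std_dev / np.sqrt(n))      # z * standard error
    return (mean - margin_error, mean + margin_error)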

mean = 170
std_dev = 10
n = 100
confidence = 0.95

ci_lower, ci_upper = confidence_interval(mean, std_dev, n, confidence)
print(f"95% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")


95% Confidence Interval: (168.04, 171.96)


In [3]:
# Q6

def bayes_theorem(P_A, P_B_given_A, P_B_given_not_A):
    P_not_A = 1 - P_A
    P_B = (P_B_given_A * P_A) + (P_B_given_not_A * P_not_A)

    P_A_given_B = (P_B_given_A * P_A) / P_B
    return P_A_given_B

P_Spam = 0.2
P_Offer_given_Spam = 0.7
P_Offer_given_Not_Spam = 0.1

P_Spam_given_Offer = bayes_theorem(P_Spam, P_Offer_given_Spam, P_Offer_given_Not_Spam)
print(f"Probability that an email is spam given it contains 'offer': {P_Spam_given_Offer:.4f}")


Probability that an email is spam given it contains 'offer': 0.6364


In [4]:
# Q7

import scipy.stats as stats
import numpy as np

def confidence_interval(mean, std_dev, n, confidence=0.95):
    z_score = stats.norm.ppf(1 - (1 - confidence) / 2)
    margin_error = z_score * (std_dev / np.sqrt(n))
    return (mean - margin_error, mean + margin_error)

mean = 50
std_dev = 5
n = 30

ci_lower, ci_upper = confidence_interval(mean, std_dev, n)
print(f"95% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")


95% Confidence Interval: (48.21, 51.79)
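
If the standard deviation of 5 is a sample estimate rather than a known population value, a t-based interval is the usual alternative for n = 30. The sketch below is an assumption about how the data might be interpreted, not part of the original answer; `t_confidence_interval` and `z_confidence_interval` are illustrative helper names.

```python
import numpy as np
from scipy import stats

def z_confidence_interval(mean, std_dev, n, confidence=0.95):
    """Interval based on the normal distribution (standard deviation treated as known)."""
    z_score = stats.norm.ppf(1 - (1 - confidence) / 2)
    margin = z_score * std_dev / np.sqrt(n)
    return mean - margin, mean + margin

def t_confidence_interval(mean, sample_std, n, confidence=0.95):
    """Interval based on the t-distribution with n - 1 degrees of freedom."""
    t_score = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    margin = t_score * sample_std / np.sqrt(n)
    return mean - margin, mean + margin

z_lower, z_upper = z_confidence_interval(50, 5, 30)
t_lower, t_upper = t_confidence_interval(50, 5, 30)

print(f"z-based 95% CI: ({z_lower:.2f}, {z_upper:.2f})")
print(f"t-based 95% CI: ({t_lower:.2f}, {t_upper:.2f})")
```

The t-based interval is slightly wider because the t-distribution accounts for the extra uncertainty in estimating the standard deviation.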


In [5]:
# Q8

import numpy as np
import scipy.stats as stats

def margin_of_error(std_dev, n, confidence=0.95):
    z_score = stats.norm.ppf(1 - (1 - confidence) / 2)
    return z_score * (std_dev / np.sqrt(n))

std_dev = 1.5
confidence = 0.95

sample_sizes = [25, 50, 100, 200]
for n in sample_sizes:
    me = margin_of_error(std_dev, n, confidence)
    print(f"Sample size: {n}, Margin of Error: {me:.3f}")



Sample size: 25, Margin of Error: 0.588
Sample size: 50, Margin of Error: 0.416
Sample size: 100, Margin of Error: 0.294
Sample size: 200, Margin of Error: 0.208
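
The same formula can be inverted to find the sample size needed for a target margin of error, n = (z · σ / E)², rounded up. This is a minimal sketch; the target margin of 0.25 and the helper name `required_sample_size` are hypothetical.

```python
import math
from scipy import stats

def required_sample_size(std_dev, target_margin, confidence=0.95):
    """Smallest n whose z-based margin of error does not exceed target_margin."""
    z_score = stats.norm.ppf(1 - (1 - confidence) / 2)
    return math.ceil((z_score * std_dev / target_margin) ** 2)

std_dev = 1.5
target_margin = 0.25  # hypothetical target, chosen for illustration
n_required = required_sample_size(std_dev, target_margin)

# Check the margin of error actually achieved at that sample size.
achieved = stats.norm.ppf(0.975) * std_dev / math.sqrt(n_required)
print(f"Required sample size: {n_required}, achieved margin of error: {achieved:.3f}")
```

Because the margin of error scales with 1/√n, halving the margin requires roughly four times as many observations.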


In [6]:
# Q9

def calculate_z_score(X, mean, std_dev):
    return (X - mean) / std_dev

X = 75
mean = 70
std_dev = 5

z_score = calculate_z_score(X, mean, std_dev)
print(f"Z-score: {z_score:.2f}")


Z-score: 1.00
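
A z-score is often easier to interpret as a percentile. The sketch below assumes the underlying values are normally distributed, which the original question does not state, and uses `scipy.stats.norm.cdf` to convert the z-score computed above.

```python
from scipy import stats

def calculate_z_score(X, mean, std_dev):
    return (X - mean) / std_dev

z_score = calculate_z_score(75, 70, 5)

# Proportion of a normal population that falls below this value (assumes normality).
percentile = stats.norm.cdf(z_score)

print(f"Z-score: {z_score:.2f}")
print(f"Proportion of values below X: {percentile:.4f}")
```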


In [7]:
# Q10

import scipy.stats as stats
import numpy as np

sample_mean = 6
pop_mean = 0
std_dev = 2.5
n = 50

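# One-sample t statistic: (sample mean - hypothesized mean) / standard error
t_stat = (sample_mean - pop_mean) / (std_dev / np.sqrt(n))

alpha = 0.05
df = n - 1
# Right-tailed critical value (H1: mean > pop_mean)
t_critical = stats.t.ppf(1 - alpha, df)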

print(f"t-Statistic: {t_stat:.2f}")
print(f"Critical t-Value: {t_critical:.3f}")

if t_stat > t_critical:
    print("Reject the null hypothesis: The drug is significantly effective.")
else:
    print("Fail to reject the null hypothesis: No significant effect detected.")


t-Statistic: 16.97
Critical t-Value: 1.677
Reject the null hypothesis: The drug is significantly effective.
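
The same decision can be reached with a p-value instead of a critical value. This is a sketch that reuses the numbers from the cell above and computes the right-tailed p-value with `scipy.stats.t.sf`; comparing it against alpha is equivalent to comparing the statistic against the critical value.

```python
import numpy as np
from scipy import stats

sample_mean, pop_mean, std_dev, n = 6, 0, 2.5, 50
alpha = 0.05

t_stat = (sample_mean - pop_mean) / (std_dev / np.sqrt(n))

# Right-tailed p-value: probability of a t value at least this large under H0.
p_value = stats.t.sf(t_stat, df=n - 1)

reject_by_p = p_value < alpha
reject_by_critical = t_stat > stats.t.ppf(1 - alpha, df=n - 1)

print(f"p-Value: {p_value:.3e}")
print(f"Reject H0 (p-value rule): {reject_by_p}")
print(f"Reject H0 (critical-value rule): {reject_by_critical}")
```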


In [8]:
# Q11

import scipy.stats as stats
import numpy as np

n = 500
p_hat = 0.65
z_score = stats.norm.ppf(0.975)

margin_error = z_score * np.sqrt((p_hat * (1 - p_hat)) / n)

lower_bound = p_hat - margin_error
upper_bound = p_hat + margin_error

# Print results
print(f"95% Confidence Interval: ({lower_bound:.4f}, {upper_bound:.4f})")
print(f"95% Confidence Interval as Percentage: ({lower_bound*100:.2f}%, {upper_bound*100:.2f}%)")


95% Confidence Interval: (0.6082, 0.6918)
95% Confidence Interval as Percentage: (60.82%, 69.18%)


In [9]:
# Q12

import scipy.stats as stats
import numpy as np

mean_A = 85
std_A = 6
n_A = 30

mean_B = 82
std_B = 5
n_B = 30

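# equal_var=False requests Welch's t-test, which does not assume equal variances.
t_stat, p_value = stats.ttest_ind_from_stats(mean_A, std_A, n_A, mean_B, std_B, n_B, equal_var=False)

alpha = 0.01
# Note: n_A + n_B - 2 is the pooled-variance degrees of freedom. Welch's test uses the
# Welch-Satterthwaite degrees of freedom instead, so this critical value is an approximation.
df = n_A + n_B - 2
t_critical = stats.t.ppf(1 - alpha/2, df)  # two-tailed critical value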

print(f"t-Statistic: {t_stat:.2f}")
print(f"Critical t-Value: {t_critical:.3f}")
print(f"p-Value: {p_value:.4f}")

if abs(t_stat) > t_critical:
    print("Reject the null hypothesis: Significant difference in teaching methods.")
else:
    print("Fail to reject the null hypothesis: No significant difference detected.")


t-Statistic: 2.10
Critical t-Value: 2.663
p-Value: 0.0399
Fail to reject the null hypothesis: No significant difference detected.
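
Because the cell above calls `ttest_ind_from_stats` with `equal_var=False` (Welch's test), the matching degrees of freedom come from the Welch-Satterthwaite formula rather than `n_A + n_B - 2`. The following is a sketch of that calculation; `welch_df` is an illustrative helper name, not a SciPy function.

```python
from scipy import stats

def welch_df(std_1, n_1, std_2, n_2):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    v1 = std_1 ** 2 / n_1  # squared standard error of group 1
    v2 = std_2 ** 2 / n_2  # squared standard error of group 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n_1 - 1) + v2 ** 2 / (n_2 - 1))

std_A, n_A = 6, 30
std_B, n_B = 5, 30
alpha = 0.01

df_welch = welch_df(std_A, n_A, std_B, n_B)
df_pooled = n_A + n_B - 2

# Two-tailed critical values under each choice of degrees of freedom.
t_critical_welch = stats.t.ppf(1 - alpha / 2, df_welch)
t_critical_pooled = stats.t.ppf(1 - alpha / 2, df_pooled)

print(f"Welch df: {df_welch:.2f}, pooled df: {df_pooled}")
print(f"Critical t (Welch): {t_critical_welch:.3f}")
print(f"Critical t (pooled): {t_critical_pooled:.3f}")
```

With equal group sizes and similar standard deviations the two values are close, so the conclusion at α = 0.01 does not change here, but the difference can matter when the groups are unbalanced.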


In [10]:
# Q13

import scipy.stats as stats
import numpy as np

sample_mean = 65
pop_std = 8
n = 50
z_score = stats.norm.ppf(0.95)  # 90% two-sided interval leaves 5% in each tail

margin_error = z_score * (pop_std / np.sqrt(n))

lower_bound = sample_mean - margin_error
upper_bound = sample_mean + margin_error

print(f"90% Confidence Interval: ({lower_bound:.2f}, {upper_bound:.2f})")


90% Confidence Interval: (63.14, 66.86)


In [11]:
# Q14

import scipy.stats as stats
import numpy as np

sample_mean = 0.25
pop_mean = 0.30
std_dev = 0.05
n = 30

t_stat = (sample_mean - pop_mean) / (std_dev / np.sqrt(n))

alpha = 0.10
df = n - 1
t_critical = stats.t.ppf(alpha, df)  # left-tailed critical value (H1: mean < pop_mean)

print(f"t-Statistic: {t_stat:.2f}")
print(f"Critical t-Value: {t_critical:.3f}")

if t_stat < t_critical:
    print("Reject the null hypothesis: Caffeine significantly reduces reaction time.")
else:
    print("Fail to reject the null hypothesis: No significant effect detected.")


t-Statistic: -5.48
Critical t-Value: -1.311
Reject the null hypothesis: Caffeine significantly reduces reaction time.
