QTS.1

The primary difference between a t-test and a z-test lies in the 
scenarios they are best suited for and the assumptions they make about the population.

**T-test:**  
- **Scenario:** It's used when the sample size is small (typically less than 30) or 
when the population standard deviation is unknown.
- **Example Scenario:** You might use a t-test to compare the mean blood pressure of 
two groups of 20 individuals each who are undergoing different treatments.

**Z-test:**  
- **Scenario:** It's employed when the sample size is large (usually greater than 30) 
and when the population standard deviation is known.
- **Example Scenario:** If you're analyzing the mean height of a sample of 500 
people and have access to the standard deviation of the entire population, a z-test 
would be appropriate for comparing the average height before and after a dietary intervention.

In summary, the choice between a t-test and a z-test hinges on the sample size and whether 
the population standard deviation is known. T-tests are more flexible for smaller samples 
and when the population standard deviation is unknown, while z-tests are suitable for larger
samples with a known population standard deviation.
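
One way to see why the distinction matters is to compare the critical values the two tests use. The sketch below assumes a two-sided test at a 5% significance level and a hypothetical sample of 20 observations; the variable names are illustrative and not part of the original question.

```python
from scipy import stats

alpha = 0.05  # assumed significance level (two-sided test)
n = 20        # hypothetical small sample size

# z-test: critical value from the standard normal distribution
z_crit = stats.norm.ppf(1 - alpha / 2)

# t-test: critical value from Student's t with n - 1 degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

print(f"z critical value: {z_crit:.3f}")
print(f"t critical value (df={n - 1}): {t_crit:.3f}")
```

The t critical value is the larger of the two, which reflects the extra uncertainty from estimating the standard deviation from the sample. The gap shrinks as the sample size grows.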

QTS.2

In a one-tailed test, significance is tested in one direction only (either greater than or 
less than the hypothesized value), so the entire significance level sits in one tail. In a 
two-tailed test, significance is assessed in both directions, with the significance level 
split between the two tails. One-tailed tests have more power to detect an effect in the 
specified direction but cannot detect one in the opposite direction, while two-tailed tests 
detect effects in either direction.
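
The difference shows up directly in the p-value. The following is a minimal sketch, assuming a hypothetical z statistic of 1.8 and a right-tailed alternative for the one-tailed case; it uses only the standard library.

```python
from statistics import NormalDist

z = 1.8  # hypothetical test statistic, chosen for illustration

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# One-tailed (right-tailed): probability of a value at least this large
p_one_tailed = 1 - std_normal.cdf(z)

# Two-tailed: probability of a value at least this extreme in either direction
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

print(f"one-tailed p-value: {p_one_tailed:.4f}")
print(f"two-tailed p-value: {p_two_tailed:.4f}")
```

With these assumed numbers, the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, so the same data can lead to different decisions depending on which test was planned.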

QTS.3

Type 1 error occurs when you reject a true null hypothesis. An example could be 
in a criminal trial: the null hypothesis is that the defendant is innocent, and a 
Type 1 error is wrongly convicting an innocent person (rejecting a true null).

Type 2 error happens when you fail to reject a false null hypothesis. For instance,
in medical testing, the null hypothesis is that the patient is healthy, and a Type 2 
error occurs when a sick person is mistakenly considered healthy (the false null is 
not rejected) based on a test result.
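
Both error rates can be put into numbers for a simple case. The sketch below assumes a right-tailed z-test with a known standard deviation; the null mean, true mean, standard deviation, and sample size are all hypothetical values chosen for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()

alpha = 0.05   # Type 1 error rate, fixed by the chosen significance level
mu_0 = 0.0     # mean under the null hypothesis (hypothetical)
mu_true = 0.5  # actual mean when the null is false (hypothetical)
sigma = 1.0    # known population standard deviation (hypothetical)
n = 25         # sample size (hypothetical)

# Reject the null when the z statistic exceeds this critical value
z_crit = std_normal.inv_cdf(1 - alpha)

# When the null is false, the z statistic is centred on this shift instead of 0
shift = (mu_true - mu_0) / (sigma / n ** 0.5)

# Type 2 error rate: probability the statistic still lands below the critical value
beta = std_normal.cdf(z_crit - shift)
power = 1 - beta

print(f"Type 1 error rate (alpha): {alpha:.3f}")
print(f"Type 2 error rate (beta):  {beta:.3f}")
print(f"Power (1 - beta):          {power:.3f}")
```

Lowering alpha pushes the critical value up, which raises beta; for a fixed sample size the two error rates trade off against each other.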

QTS.4

Bayes's theorem is a fundamental concept in probability theory that 
describes how to update the probability of an event based on new 
information or evidence. It's expressed as P(A|B) = [P(B|A) * P(A)] / P(B),
where P(A|B) is the probability of event A given that event B has occurred.

An example: Let's say you want to determine the probability that someone 
has a certain disease. The overall occurrence of the disease in the 
population (prior probability) is 5%. You have a test that detects the 
disease in 90% of people who have it and has a 10% false positive rate. If 
an individual tests positive, Bayes's theorem revises the probability of 
actually having the disease by combining the test result with the prior probability.
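
The example above can be worked through numerically. This sketch reads "90% accurate in detecting it" as the test's sensitivity, which is an interpretation rather than something the example states outright; the variable names are illustrative.

```python
prior = 0.05                # P(disease): prevalence in the population
sensitivity = 0.90          # P(positive | disease), assumed reading of "90% accurate"
false_positive_rate = 0.10  # P(positive | no disease)

# P(positive): total probability of a positive result over both groups
p_positive = sensitivity * prior + false_positive_rate * (1 - prior)

# Bayes's theorem: P(disease | positive)
posterior = sensitivity * prior / p_positive

print(f"P(disease | positive test) = {posterior:.3f}")
```

Under these assumptions the posterior is roughly 32%: a positive result raises the probability well above the 5% prior, but most positive results still come from the much larger healthy group.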

QTS.5

A confidence interval is a range of values that likely contains the true 
value of a population parameter. It provides a measure of the uncertainty 
associated with estimating population parameters from sample data. 

To calculate a confidence interval:
1. **Gather a sample**: Collect data from a population.
2. **Calculate sample statistics**: Find the sample mean and standard deviation.
3. **Choose a confidence level**: Typically 90%, 95%, or 99%.
4. **Determine the margin of error**: Based on the chosen confidence level and sample data.
5. **Compute the interval**: Construct the range around the sample mean using the margin of error.

For example, suppose you want to estimate the average height of students in
a school. You measure the heights of 100 randomly selected students and find the 
sample mean height to be 65 inches with a standard deviation of 3 inches. With 
a 95% confidence level, the margin of error is 1.96 * 3 / sqrt(100), or about 
0.59 inches, so the confidence interval is approximately (64.4, 65.6) inches. 
This means that you're 95% confident that the true average height of all 
students in the school falls within this range.
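
The five steps above can be sketched in code using the figures from the height example. This version uses the normal critical value as a large-sample approximation, which is an assumption; a t critical value would give a slightly wider interval.

```python
from statistics import NormalDist

# Steps 1-2: sample statistics from the height example
sample_mean = 65.0  # inches
sample_std = 3.0    # inches
n = 100             # number of students measured

# Step 3: confidence level
confidence = 0.95

# Step 4: margin of error = critical value * standard error
z_crit = NormalDist().inv_cdf((1 + confidence) / 2)
margin_of_error = z_crit * sample_std / n ** 0.5

# Step 5: interval around the sample mean
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"{confidence:.0%} confidence interval: ({lower:.2f}, {upper:.2f})")
```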

QTS.6

Bayes' Theorem allows us to update the probability of an event 
occurring based on new evidence or information. Here's an example problem
and a solution using Python:

**Problem:**
Suppose that a certain medical test detects a disease 95% of the time 
when it's actually present, and the disease occurs in 1% of the population.
The problem does not state a false positive rate, so the solution assumes 
the test also wrongly flags 5% of healthy people. If a person tests positive 
for the disease, what is the probability that they actually have the disease?

**Solution using Bayes' Theorem:**

In [3]:
# Prior probability of having the disease
prior_probability_disease = 0.01

# Probability of a positive test given that the person has the disease
probability_positive_given_disease = 0.95

# Probability of a positive test given that the person does not have the disease
# This is the false positive rate (1 - specificity)
probability_positive_given_no_disease = 0.05

# Probability of not having the disease
prior_probability_no_disease = 1 - prior_probability_disease

# Using Bayes' Theorem to calculate the probability of having the disease given a positive test
true_positive = prior_probability_disease * probability_positive_given_disease
false_positive = prior_probability_no_disease * probability_positive_given_no_disease
probability_disease_given_positive = true_positive / (true_positive + false_positive)

print(f"Probability of having the disease given a positive test: {probability_disease_given_positive}")



Probability of having the disease given a positive test: 0.16101694915254236


QTS.7

In [4]:
sample_mean = 50  # Sample mean
sample_std = 5  # Sample standard deviation
sample_size = 50  # Assumed: no sample size is given; 50 reproduces the output below
confidence = 0.95  # Confidence level

# For a normal distribution, use the z-score (1.96 for 95% confidence)
z_score = 1.96  # 95% confidence level

# Calculate the margin of error (standard error divides by sqrt of the sample size)
margin_of_error = z_score * (sample_std / (sample_size ** 0.5))

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"Confidence Interval for the sample data: ({lower_bound}, {upper_bound})")


Confidence Interval for the sample data: (48.614070708874365, 51.385929291125635)


QTS.8


The margin of error is the half-width of a confidence interval: the largest 
amount by which the sample estimate is expected to differ from the true 
population parameter at the chosen confidence level. It's influenced by
sample variability, sample size, and the chosen confidence level. A smaller 
margin of error indicates a more precise estimate of the population parameter.

Sample size inversely affects the margin of error: for a mean, the margin of 
error is proportional to 1/sqrt(n), so a larger sample size results in a smaller 
margin of error. This is because as the sample size increases, the variability 
in the sample mean decreases, leading to a more precise estimate of the 
population parameter.

For instance, consider estimating the average time spent on a website by users.
If you survey 50 users and find a sample mean time of 4 minutes with a margin 
of error of ±0.5 minutes, increasing the sample size to 500 would shrink the 
margin of error by a factor of about sqrt(10), to roughly ±0.16 minutes, assuming 
similar variability. The larger sample provides a more precise estimate,
narrowing the range in which the true average time spent on the website likely falls.
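
The effect of sample size can be checked with a short calculation. The sketch below assumes a hypothetical standard deviation of 1.8 minutes, picked so that 50 users give a margin of error close to ±0.5 minutes; `margin_of_error` is an illustrative helper, not a library function.

```python
from statistics import NormalDist


def margin_of_error(std_dev, n, confidence=0.95):
    """Margin of error for a mean, using a normal critical value."""
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)
    return z_crit * std_dev / n ** 0.5


std_dev = 1.8  # minutes, hypothetical value chosen for illustration

moe_50 = margin_of_error(std_dev, 50)
moe_500 = margin_of_error(std_dev, 500)

print(f"n = 50:  margin of error = ±{moe_50:.2f} minutes")
print(f"n = 500: margin of error = ±{moe_500:.2f} minutes")
print(f"ratio: {moe_50 / moe_500:.2f}")
```

The ratio equals the square root of 10, so a tenfold increase in sample size cuts the margin of error to a little under a third of its original value.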

QTS.9

In [8]:
## calculation of a z-score
datapt = 75    # observed data point
pop_mean = 70  # population mean
pop_std = 5    # population standard deviation

z_score = (datapt - pop_mean) / pop_std
print(f"z_score:{z_score}")

z_score:1.0


QTS.10

In [9]:
from scipy import stats

size = 50      # sample size
s_mean = 6     # sample mean
s_std = 2.5    # sample standard deviation
alpha = 0.05   # significance level (95% confidence)
pop_mean = 0   # population mean under the null hypothesis

## hypothesis
## null: the drug has no effect, pop_mean = 0
## alternate: the drug has an effect, pop_mean not equal to zero (tested at alpha = 0.05)

## calculation of the t statistic
t_score = (s_mean - pop_mean) / (s_std / (size ** 0.5))

## critical t value for a two tailed test
## use the upper tail (1 - alpha/2) so the critical value is positive
cri_t = stats.t.ppf(1 - alpha / 2, df=size - 1)

if abs(t_score) > cri_t:
    print("rejected the null hypothesis")
else:
    print("fail to reject the null hypothesis")


rejected the null hypothesis


QTS.11

In [15]:
import math

# Given data
sample_proportion = 0.65  # 65% as a decimal
confidence = 0.95  # 95% confidence level
sample_size = 500
z_critical = 1.96  # Z-score for a 95% confidence interval

# Calculate the margin of error
margin_of_error = z_critical * math.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)

# Calculate the confidence interval
lower_bound = sample_proportion - margin_of_error
upper_bound = sample_proportion + margin_of_error

# Convert bounds to percentages
lower_bound_percent = lower_bound * 100
upper_bound_percent = upper_bound * 100

print(f"95% Confidence Interval: ({lower_bound_percent}%, {upper_bound_percent}%)")


95% Confidence Interval: (60.8191771144905%, 69.18082288550951%)


QTS.12

In [18]:
import scipy.stats as stats

# Sample A data
mean_A = 85
std_dev_A = 6
sample_size_A = 30

# Sample B data
mean_B = 82
std_dev_B = 5
sample_size_B = 30

# Performing the t-test
t_stat, p_value = stats.ttest_ind_from_stats(mean_A, std_dev_A, sample_size_A, mean_B, std_dev_B, sample_size_B)

# Significance level
alpha = 0.01

# Check for statistical significance
if p_value < alpha:
    print("The difference in performance between the teaching methods is statistically significant.")
else:
    print("There's no statistically significant difference in performance between the teaching methods.")


There's no statistically significant difference in performance between the teaching methods.


QTS.13

In [30]:
from scipy import stats

pop_mean = 60
pop_std = 8
samp_mean = 65
samp_size = 50
ci = 0.90

# ppf (the inverse CDF) turns a probability into a z value; cdf does the reverse
z_critical = stats.norm.ppf((1 + ci) / 2)

# margin of error = critical value * standard error
moe = z_critical * (pop_std / (samp_size ** 0.5))

low_ci = samp_mean - moe
high_ci = samp_mean + moe

print(f"90% confidence interval for the population mean: low_ci:{low_ci:.2f}, high_ci:{high_ci:.2f}")


90% confidence interval for the population mean: low_ci:63.14, high_ci:66.86


QTS.14

In [1]:
from scipy import stats

# Given data
sample_mean = 0.25  # Sample mean reaction time
sample_std = 0.05   # Standard deviation
sample_size = 30     # Sample size
hypothetical_mean = 0  # Hypothetical population mean under the null hypothesis

# Calculate the t-statistic
t_statistic = (sample_mean - hypothetical_mean) / (sample_std / (sample_size ** 0.5))

# Degrees of freedom
degrees_of_freedom = sample_size - 1

# Calculate the critical t-value at a 90% confidence level
alpha = 0.1  # 1 - Confidence level
critical_t = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)

# Calculate the p-value
p_value = stats.t.sf(abs(t_statistic), degrees_of_freedom) * 2  # Two-tailed test

# Output results
print(f"Calculated t-statistic: {t_statistic}")
print(f"Critical t-value: {critical_t}")
print(f"P-value: {p_value}")

# Decision
if abs(t_statistic) > critical_t:
    print("Null hypothesis rejected. Caffeine has a significant effect on reaction time.")
else:
    print("Failed to reject the null hypothesis. Caffeine may not have a significant effect on reaction time.")


Calculated t-statistic: 27.386127875258307
Critical t-value: 1.6991270265334972
P-value: 2.8325244885113353e-22
Null hypothesis rejected. Caffeine has a significant effect on reaction time.
