## Question 1

Estimation statistics involves using sample data to make inferences or estimates about population parameters. These estimates help researchers and analysts draw conclusions about a population based on limited information from a sample. There are two main types of estimates: point estimates and interval estimates.


Point estimate:

A point estimate is a single numerical value that is used to approximate a population parameter. It is derived from sample data and serves as the best guess or single value that is assumed to be close to the true, unknown population parameter. For example, if you want to estimate the average height of a population based on a sample, the sample mean would be the point estimate for the population mean.


Interval Estimate:

An interval estimate, on the other hand, provides a range of values within which the true population parameter is likely to lie. It recognizes the uncertainty inherent in making estimates based on a sample and provides a measure of the precision or confidence associated with the estimate. The most common form of interval estimate is the confidence interval, which specifies a range of values and an associated level of confidence that the true parameter falls within that range. For example, a 95% confidence interval for the average height might be from 65 inches to 75 inches.

## Question 2

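In [1]:
import math

def estimate_population_mean(sample_mean, sample_std_dev, sample_size, z_score=1.96):
    # The sample mean itself is the point estimate of the population mean.
    standard_error = sample_std_dev / math.sqrt(sample_size)
    margin_error = z_score * standard_error  # z = 1.96 corresponds to a 95% CI
    # The interval estimate is the point estimate plus or minus the margin of error.
    return sample_mean, (sample_mean - margin_error, sample_mean + margin_error)

sample_mean = 70
sample_std_dev = 5
sample_size = 100

point_estimate, (lower, upper) = estimate_population_mean(sample_mean, sample_std_dev, sample_size)

print("estimated population mean : ", point_estimate)
print(f"95% confidence interval : ({lower:.2f}, {upper:.2f})")

estimated population mean :  70
95% confidence interval : (69.02, 70.98)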


## Question 3

Hypothesis testing is a statistical method used to make inferences about a population based on a sample of data. The process involves formulating a hypothesis about a population parameter, collecting data, and then using statistical techniques to determine whether the observed data provides enough evidence to reject or fail to reject the null hypothesis.

Null Hypothesis (H0): This is a statement that there is no significant difference, effect, or relationship. It often represents the status quo or a default assumption.

Alternative Hypothesis (H1): This is a statement that contradicts the null hypothesis, suggesting a significant difference, effect, or relationship.

Hypothesis testing allows researchers and analysts to make informed decisions about population parameters based on sample data. It provides a systematic approach to evaluating evidence and drawing conclusions. In fields such as manufacturing, hypothesis testing is used to ensure product quality. By testing samples, manufacturers can make inferences about the quality of the entire production batch.

## Question 4

Let's formulate a hypothesis test of whether the average weight of male college students is greater than the average weight of female college students.

Null Hypothesis (H0) :

The average weight of male college students is equal to or less than the average weight of female college students.

Alternate Hypothesis (H1) : 

The average weight of male college students is greater than the average weight of female college students.
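
The same pair of hypotheses can be written in symbols. Here the symbols \mu_m and \mu_f are notation introduced for illustration only; they stand for the population mean weights of male and female college students.

$$
H_0:\ \mu_m \le \mu_f \qquad \text{versus} \qquad H_1:\ \mu_m > \mu_f
$$

Because H1 points in one direction only, this is a one-tailed (right-tailed) test.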

## Question 5

In [2]:
import numpy as np
from scipy import stats

def Hypothesis__test(sample_1,sample_2,alpha=0.05):
    t_stats,p_val=stats.ttest_ind(sample_1,sample_2,equal_var=False)
    
    reject_null=p_val<alpha
    
    result={
        't-statistics':t_stats,
        'p_value':p_val,
        'alpha':alpha,
        'reject null':reject_null
        
    }
    return result

np.random.seed(42)
sample_1=np.random.normal(loc=150,scale=10,size=30)
sample_2=np.random.normal(loc=140,scale=8,size=30)
result=Hypothesis__test(sample_1,sample_2)

print("two sample T-test result")
print(f"T-statistics: {result['t-statistics']}")
print(f"P-value: {result['p_value']}")
print(f"significance level(alpha): {result['alpha']}")
print(f"Reject Null hypothesis: {result['reject null']}")

two sample T-test result
T-statistics: 4.2606586660578865
P-value: 7.871665679251151e-05
significance level(alpha): 0.05
Reject Null hypothesis: True



## Question 6

Null Hypothesis (H0): This is a statement that there is no significant difference, effect, or relationship. It often represents the status quo or a default assumption.

Alternative Hypothesis (H1): This is a statement that contradicts the null hypothesis, suggesting a significant difference, effect, or relationship.

For example:

Null Hypothesis (H0) :

The average weight of male college students is equal to or less than the average weight of female college students.

Alternate Hypothesis (H1) :

The average weight of male college students is greater than the average weight of female college students.

## Question 7

The steps involved in hypothesis testing are :

1. Stating the Null Hypothesis (H0) :

This is a statement that there is no significant difference, effect, or relationship. It often represents the status quo or a default assumption.

2. Stating the Alternate hypothesis (H1) :

This is a statement that contradicts the null hypothesis, suggesting a significant difference, effect, or relationship.

3. Significance level (alpha) :

This is the threshold for determining statistical significance. Common choices include 0.05 or 0.01, representing a 5% or 1% chance, respectively, of rejecting the null hypothesis when it is true.

4. Test statistics and P-value calculation :

A test statistic is calculated from the sample data, and a p-value is associated with it. The p-value represents the probability of obtaining results at least as extreme as the observed results if the null hypothesis is true.
 
5. Decision Rule :
 
 Based on the p-value and the chosen significance level, a decision is made to either reject or fail to reject the null hypothesis.

6. Drawing conclusions :

The results of the hypothesis test lead to a conclusion about the validity of the null hypothesis and, indirectly, the support for the alternative hypothesis.
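
The six steps above can be sketched in code. This is a minimal illustration, assuming a two-sided one-sample t-test; the simulated data, the hypothesized mean of 5.0, and all variable names are hypothetical and not taken from any of the questions.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 25 simulated measurements (values are illustrative only).
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.2, scale=0.5, size=25)

# Steps 1-2: H0: mu = 5.0 versus H1: mu != 5.0 (two-sided).
hypothesized_mean = 5.0

# Step 3: choose the significance level before looking at the data.
alpha = 0.05

# Step 4: compute the test statistic and its p-value.
t_statistic, p_value = stats.ttest_1samp(sample, popmean=hypothesized_mean)

# Step 5: apply the decision rule.
reject_null = p_value < alpha

# Step 6: state the conclusion.
print(f"t = {t_statistic:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Fixing alpha before computing the statistic keeps the decision rule independent of the observed data.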

## Question 8

In hypothesis testing, the p-value is a measure that helps assess the evidence against a null hypothesis. It quantifies the probability of observing a test statistic as extreme as, or more extreme than, the one computed from the sample data, assuming that the null hypothesis is true.

The p-value is a probability that ranges between 0 and 1. A low p-value (typically below a chosen significance level, such as 0.05) indicates that the observed data is unlikely under the assumption of the null hypothesis. A high p-value suggests that the observed data is consistent with the null hypothesis. The p-value is not the probability of the null hypothesis being true or false. It is the probability of observing the data or more extreme data, assuming the null hypothesis is true.
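
As a rough sketch of how a p-value follows from a test statistic, the snippet below computes a two-sided p-value from a t-statistic. The helper name `two_sided_p_value` and the input values are assumptions made for illustration.

```python
from scipy.stats import t

def two_sided_p_value(t_statistic, dof):
    """Probability of a t value at least as extreme as |t_statistic| under H0."""
    # sf is the survival function (1 - cdf); doubling it covers both tails.
    return 2 * t.sf(abs(t_statistic), df=dof)

# Hypothetical inputs chosen only for illustration.
p_small = two_sided_p_value(3.0, dof=20)  # far in the tail -> small p-value
p_large = two_sided_p_value(0.5, dof=20)  # near the centre -> large p-value

print(f"t = 3.0: p = {p_small:.4f}")
print(f"t = 0.5: p = {p_large:.4f}")
```

A statistic far from zero gives a small p-value, while one near zero gives a p-value close to 1.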

## Question 9

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

deg_of_freedom=10

x=np.linspace(-5,5,1000)
pdf_vals=t.pdf(x,df=deg_of_freedom)

plt.plot(x,pdf_vals,label="Student's t-distribution (df=10)")
plt.xlabel('x')
plt.ylabel('Probability density')
plt.legend()
plt.grid(True)
plt.show()

## Question 10

In [None]:
import numpy as np
from scipy import stats

def Hypothesis_testing(sample_1,sample_2,alpha=0.05):
    
    t_statistics,P_value=stats.ttest_ind(sample_1,sample_2,equal_var=False)
    reject_null=P_value<alpha
    
    results={
        'T_statistics':t_statistics,
        'alpha':alpha,
        'P_value':P_value,
        'reject null':reject_null
    }
    return results
np.random.seed(42)
sample1=np.random.normal(loc=10,scale=2,size=40)
sample2=np.random.normal(loc=10,scale=2,size=40)
ttest_res=Hypothesis_testing(sample1,sample2)

print("Two-Sample T-test Results")
print(f"T-statistic result: {ttest_res['T_statistics']}")
print(f"P-value: {ttest_res['P_value']}")
print(f"Significance Value: {ttest_res['alpha']}")
print(f"Reject Null Hypothesis: {ttest_res['reject null']}")


## Question 11

Student's t-distribution, often simply referred to as the t-distribution, is a probability distribution that arises in the context of estimating the population mean of a normally distributed population when the sample size is small and the population standard deviation is unknown. It is named after William Sealy Gosset, who published under the pseudonym "Student."

The t-distribution is similar in shape to the normal distribution but has heavier tails. As the sample size increases, the t-distribution approaches the normal distribution. The key parameter that defines the shape of the t-distribution is the degrees of freedom (df).


The t-distribution is commonly used in the following scenarios:

1. When dealing with small sample sizes (typically less than 30) and the population standard deviation is unknown. In this situation the sample standard deviation is itself a noisy estimate, so the normal distribution is not a good approximation.

2. In applications involving confidence intervals and hypothesis testing for the population mean.

3. When comparing the means of two independent groups, especially when the sample sizes are small and the assumption of equal variances may not be valid.

4. When robustness to outliers matters: because the t-distribution has heavier tails than the normal distribution, models based on it are less influenced by extreme observations.
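
The convergence towards the normal distribution can be checked numerically. The sketch below compares two-sided 95% critical values; the particular degrees of freedom are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

# Two-sided 95% critical value under the standard normal distribution.
z_critical = norm.ppf(0.975)

# Degrees of freedom chosen only for illustration.
t_criticals = {dof: t.ppf(0.975, df=dof) for dof in (5, 10, 30, 100, 1000)}

for dof, t_critical in t_criticals.items():
    print(f"df = {dof:>4}: t critical = {t_critical:.3f}")
print(f"normal   : z critical = {z_critical:.3f}")
```

The t critical values are larger than the normal one at small df (reflecting the heavier tails) and shrink towards it as df grows.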

## Question 12

The t-statistic is a measure that quantifies how far a sample estimate (such as the sample mean) is from the hypothesized population parameter, expressed in terms of the standard error of the estimate. It is commonly used in hypothesis testing and confidence interval construction when dealing with small sample sizes and unknown population standard deviations.

The formula for the t-statistic is given by:

t = (X - mu)/(s/sqrt(n))

Where:

1. t is the t-statistic
2. X is the sample mean
3. mu is the population mean under the null hypothesis
4. s is the sample standard deviation
5. n is the sample size
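
A minimal sketch of the formula in code, cross-checked against `scipy.stats.ttest_1samp`. The function name `t_statistic` and the data values are hypothetical, chosen only to exercise the formula.

```python
import numpy as np
from scipy import stats

def t_statistic(sample, hypothesized_mean):
    """One-sample t-statistic: (X - mu) / (s / sqrt(n))."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    sample_mean = sample.mean()
    sample_std = sample.std(ddof=1)  # ddof=1 gives the sample standard deviation
    return (sample_mean - hypothesized_mean) / (sample_std / np.sqrt(n))

# Hypothetical measurements, used only to exercise the formula.
data = [9.8, 10.4, 10.1, 9.6, 10.9, 10.3, 9.9, 10.6]
manual_t = t_statistic(data, hypothesized_mean=10.0)
scipy_t = stats.ttest_1samp(data, popmean=10.0).statistic

print(f"manual t = {manual_t:.4f}, scipy t = {scipy_t:.4f}")
```

Note the `ddof=1` argument: NumPy's default `std` divides by n rather than n - 1, which would not match the sample standard deviation s in the formula.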


## Question 13

The sample mean(X) revenue given to us is $500

The sample standard deviation(s) value is $50

given sample size(n)=50

desired confidence level=95%

The formula for calculation of confidence interval is :

confidence interval=(X- z* s/sqrt(n) , X+ z* s/sqrt(n))

For a 95% confidence interval the corresponding z-value is 1.96. Therefore the confidence interval is :

(500 - 1.96 * 50/sqrt(50) , 500 + 1.96 * 50/sqrt(50))

This results in the range (486.14 , 513.86)

In [None]:
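import math
from scipy.stats import norm

sample_size=50
std_sample=50

alpha=0.05
sample_mean=500


def Confidence_Interval(sample_mean,sample_std,sample_size,alpha):
    # two-sided critical value derived from alpha (about 1.96 for alpha=0.05)
    z_critical=norm.ppf(1-alpha/2)
    margin_error=z_critical*sample_std/math.sqrt(sample_size)
    return ( sample_mean-margin_error, sample_mean+margin_error )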


Confidence_range=Confidence_Interval(sample_mean,std_sample,sample_size,alpha)

print(Confidence_range)

## Question 14

H0:  The null hypothesis is that the mean decrease in blood pressure is equal to 10 mmHg.

H1:  The alternative hypothesis is that the mean decrease in blood pressure is not equal to 10 mmHg.

It's a two-tailed test because the mean decrease in blood pressure could be greater or less than 10 mmHg.

As the population standard deviation is unknown, we use the t-test.

Given :

Sample Mean(X)=8

hypothetical mean(mu) =10

sample standard deviation (s)= 3

sample size(n)=100

degree of freedom = n-1 =99


The formula for t-test is given as:

t= (X - mu)/(s/sqrt(n))

Therefore, t = (8 - 10)/(3/sqrt(100)) = -2/0.3

t = -6.67

We look up the critical t value for a two-tailed test at the 0.05 significance level with 99 degrees of freedom: the value is (+-)1.984 (approximate).
The absolute value of our calculated statistic (6.67) is far greater than the critical value, so we reject the null hypothesis and conclude that the mean decrease in blood pressure differs significantly from the hypothesized 10 mmHg; the sample mean of 8 mmHg suggests the true mean decrease is lower.

In [None]:
from scipy.stats import t 
import numpy as np

sample_mean=8
hypo_mean=10
sample_std=3
sample_size=100
significance_level=0.05
t_stats=(sample_mean-hypo_mean)/(sample_std/np.sqrt(sample_size))

dof=sample_size-1

critical_value=t.ppf(1-significance_level/2,df=dof)

if np.abs(t_stats)>critical_value:
    print("reject the null hypothesis")
else:
    print("Failed to reject the null hypothesis")

## Question 15

Population mean (mu): 5 pounds

Sample mean (X): 4.8 pounds

Population standard deviation (std): 0.5 pounds

Sample size (n): 25

Significance level (alpha): 0.01

The null hypothesis is that the true mean weight is equal to 5 pounds.

The alternative hypothesis is that the true mean weight is less than 5 pounds.

z = (X - mu)/(std/sqrt(n))

z = (4.8 - 5)/(0.5/sqrt(25))

z = -2

The critical z-value for a left-tailed test at the 0.01 significance level is -2.33 (approximate). Since the z-statistic (-2) is greater than the critical value (-2.33), it does not fall in the rejection region, so we fail to reject the null hypothesis and conclude that the evidence is not sufficient to suggest the true mean weight is less than 5 pounds.

In [None]:
import numpy as np
from scipy.stats import norm

population_mean = 5  
sample_mean = 4.8  
population_std_dev = 0.5
sample_size = 25 
significance_level = 0.01 


z_statistic = (sample_mean - population_mean)/(population_std_dev / np.sqrt(sample_size))
critical_value = norm.ppf(significance_level)


print(f"z-statistic: {z_statistic}")
print(f"Critical value: {critical_value}")

# Test the hypothesis
if z_statistic < critical_value:
    print("Reject the null hypothesis. ")
else:
    print("Fail to reject the null hypothesis.")
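
The same decision can be reached through the p-value instead of the critical value. The sketch below reuses the numbers from this question and assumes the left-tailed alternative stated above.

```python
from math import sqrt
from scipy.stats import norm

z_statistic = (4.8 - 5) / (0.5 / sqrt(25))  # same inputs as the test above
p_value = norm.cdf(z_statistic)             # left-tail probability P(Z <= z)

print(f"z = {z_statistic:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < 0.01 else "Fail to reject H0")
```

The left-tail probability is about 0.023, which is larger than 0.01, so this approach also fails to reject the null hypothesis.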


## Question 16

To test the hypothesis that the population means for the two groups are equal, we can use a two-sample t-test for independent samples. The null hypothesis (H0) is that the population means are equal (u1 = u2), and the alternative hypothesis (H1) is that the population means are not equal (u1!=u2).

X1=80 and X2=75 are the sample means of two groups. s1=10 and s2=8 are the sample standard deviations for the two groups and n1=30 and n2=40 are the sample sizes.

In [None]:
import numpy as np
from scipy.stats import t

# 1st group
mean_1 = 80
std_dev_1 = 10
sample_size_1 = 30

# 2nd group
mean_2 = 75
std_dev_2 = 8
sample_size_2 = 40

significance_level = 0.01

t_statistic = (mean_1 - mean_2) / np.sqrt((std_dev_1**2 / sample_size_1) + (std_dev_2**2 / sample_size_2))

# Welch-Satterthwaite approximation for the degrees of freedom
df = ((std_dev_1**2 / sample_size_1) + (std_dev_2**2 / sample_size_2))**2 / \
     (((std_dev_1**2 / sample_size_1)**2 / (sample_size_1 - 1)) + ((std_dev_2**2 / sample_size_2)**2 / (sample_size_2 - 1)))

critical_value = t.ppf(1 - significance_level / 2, df=df)

print(f"t-statistic: {t_statistic}")
print(f"Critical value: {critical_value}")

if np.abs(t_statistic) > critical_value:
    print("Reject the null hypothesis. There is enough evidence to suggest a significant difference in means.")
else:
    print("Fail to reject the null hypothesis. The evidence is not sufficient to suggest a significant difference in means.")
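
As a cross-check, SciPy can run the same Welch test directly from summary statistics with `scipy.stats.ttest_ind_from_stats`. This is a sketch under the same assumptions as above (independent samples, unequal variances, significance level 0.01).

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the problem statement.
result = ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=30,
    mean2=75, std2=8, nobs2=40,
    equal_var=False,  # Welch's t-test: no equal-variance assumption
)

print(f"t-statistic: {result.statistic:.4f}")
print(f"p-value: {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < 0.01 else "Fail to reject H0")
```

Comparing the p-value with 0.01 is equivalent to comparing the absolute t-statistic with the two-tailed critical value.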


## Question 17

The sample mean(X) given to us is 4

The sample standard deviation(s) value is 1.5

given sample size(n)=50

desired confidence level=99%

The formula for calculation of confidence interval is :

confidence interval=(X- z* s/sqrt(n) , X+ z* s/sqrt(n))

For a 99% confidence interval the corresponding z-value is 2.576. Therefore the confidence interval is :

(4 - 2.576 * 1.5/sqrt(50) , 4 + 2.576 * 1.5/sqrt(50))

This results in the range (3.454 , 4.546)

In [None]:
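import math
from scipy.stats import norm

sample_size=50
std_sample=1.5

alpha=0.01
sample_mean=4


def Confidence_Interval(sample_mean,sample_std,sample_size,alpha):
    # two-sided critical value derived from alpha (about 2.576 for alpha=0.01)
    z_critical=norm.ppf(1-alpha/2)
    margin_error=z_critical*sample_std/math.sqrt(sample_size)
    return ( sample_mean-margin_error, sample_mean+margin_error )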


Confidence_range=Confidence_Interval(sample_mean,std_sample,sample_size,alpha)

print(Confidence_range)
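
Since the standard deviation here comes from the sample, an interval based on the t-distribution is a reasonable alternative to the z-based one. The sketch below is an illustration of that option, not part of the original solution, and it reuses the same inputs.

```python
import math
from scipy.stats import t

sample_mean, sample_std, sample_size, alpha = 4, 1.5, 50, 0.01

# Critical value from the t-distribution with n - 1 degrees of freedom.
t_critical = t.ppf(1 - alpha / 2, df=sample_size - 1)
margin = t_critical * sample_std / math.sqrt(sample_size)

print(f"t critical value: {t_critical:.3f}")
print(f"99% t-based interval: ({sample_mean - margin:.3f}, {sample_mean + margin:.3f})")
```

With 49 degrees of freedom the t critical value is slightly larger than 2.576, so the resulting interval is a little wider than the z-based one.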