Reference
https://www.geeksforgeeks.org/understanding-hypothesis-testing/?ref=lbp

Hypothesis testing is a statistical method for making decisions from experimental data. A hypothesis is an assumption about a population parameter. The test evaluates two mutually exclusive statements about a population, the null hypothesis and the alternative hypothesis, to determine which one is better supported by the sample data.

**Example:** You claim that the class average is 30, or that boys are taller than girls. Each of these is an assumption, and we need a statistical procedure to assess whether the data support it. A statistical test cannot prove the claim, but it tells us how consistent the sample is with it.

**Need for Hypothesis Testing**

Hypothesis testing is an important procedure in statistics because it gives a principled way to decide between two mutually exclusive statements about a population when only sample data are available.

**Python Implementation of Hypothesis Testing**

We will use the SciPy library to compute the z-score and p-value for our sample dataset. SciPy is a Python library for scientific and numerical computing. In this code, we create a function `hypothesis_test` that takes the sample dataset, `pop_mean` (the population mean stated by the null hypothesis), the level of significance (`alpha`), and the type of test (`two_tailed`, which selects a two-tailed or a one-tailed test).

In [None]:
import numpy as np
from scipy.stats import norm

# scipy.stats.norm is a normal continuous random variable; we use its
# cumulative distribution function (cdf) to turn a z-score into a p-value.

# We define a function that runs a z-test on a sample. It takes 4 arguments:
# the sample, the population mean under the null hypothesis, the level of
# significance (alpha), and a flag that selects a two-tailed or one-tailed test.
# The null hypothesis is that the population mean equals pop_mean.
def hypothesis_test(sample, pop_mean,
					alpha=0.05, two_tailed=True):
	# len sample dataset
	n = len(sample)
	# mean and standard-deviation of dataset
	sample_mean = np.mean(sample)
	sample_std = np.std(sample, ddof=1)

	# Calculate the test statistic
	z = (sample_mean - pop_mean) / (sample_std / np.sqrt(n))

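	# Calculate the p-value based on the test type.
	# norm.cdf(x) is the standard normal cumulative distribution function.
	if two_tailed:
		# Probability of a statistic at least as extreme as |z| in either tail
		p_value = 2 * (1 - norm.cdf(abs(z)))
	else:
		# One-tailed: use the tail on the same side as the observed z.
		# Note that the direction is taken from the data here, not fixed in advance.
		if z < 0:
			p_value = norm.cdf(z)
		else:
			p_value = 1 - norm.cdf(z)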

	# Determine whether to reject or fail to
	# reject the null hypothesis
	if p_value < alpha:
		result = "reject"
	else:
		result = "fail to reject"

	return z, p_value, result
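
In symbols, the function above computes the following quantities. The notation is introduced here for illustration and does not appear in the original code: $\bar{x}$ is the sample mean, $s$ the sample standard deviation (with `ddof=1`), $n$ the sample size, $\mu_0$ the hypothesized population mean (`pop_mean`), and $\Phi$ the standard normal CDF (`norm.cdf`).

```latex
z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad
p_{\text{two-tailed}} = 2\,\bigl(1 - \Phi(|z|)\bigr), \qquad
p_{\text{one-tailed}} =
\begin{cases}
\Phi(z), & z < 0 \\
1 - \Phi(z), & z \ge 0
\end{cases}
```

The null hypothesis is rejected when the resulting p-value is smaller than `alpha`.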


**Evaluate Hypothesis Function on Sample Dataset**


To evaluate our hypothesis test function, we will create a sample dataset of 20 points drawn from a normal distribution with mean 4.5 and standard deviation 2. Our null hypothesis is that the population mean equals 5.

In [None]:
np.random.seed(0)
sample = np.random.normal(loc=4.5, scale=2, size=20)
pop_mean = 5.0

# Test the null hypothesis that
# the population mean is equal to 5.0.
# alpha and two_tailed keep their default values (0.05 and True)
z, p_value, result = hypothesis_test(sample, pop_mean)

print(f"Test statistic: {z:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Result: {result} null hypothesis at alpha=0.05")


Test statistic: 1.6372
P-value: 0.1016
Result: fail to reject null hypothesis at alpha=0.05


In the above example, we get a p-value of 0.1016 from the dataset, which is greater than our level of significance (alpha), 0.05. Hence we fail to reject the null hypothesis that the population mean is 5.0.
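
Because the standard deviation is estimated from only 20 observations, a one-sample t-test is a common alternative to the z-test. The following is a minimal cross-check under that assumption, not part of the original walkthrough: it regenerates the same sample and runs SciPy's `ttest_1samp` on it. The names `t_stat`, `t_p_value`, and `z_manual` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Recreate the same sample as in the example above
np.random.seed(0)
sample = np.random.normal(loc=4.5, scale=2, size=20)
pop_mean = 5.0

# Two-sided one-sample t-test against the hypothesized mean
t_stat, t_p_value = ttest_1samp(sample, popmean=pop_mean)

# The z statistic from the manual formula, for comparison
z_manual = (np.mean(sample) - pop_mean) / (
    np.std(sample, ddof=1) / np.sqrt(len(sample))
)

print(f"t statistic:    {t_stat:.4f}")
print(f"z statistic:    {z_manual:.4f}")
print(f"t-test p-value: {t_p_value:.4f}")
```

The two statistics are computed from the same formula, so they should agree. Only the reference distribution differs: the t distribution with 19 degrees of freedom has heavier tails than the normal, so the t-test p-value is expected to be somewhat larger than the z-test p-value.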

Whenever we reject or fail to reject a null hypothesis, the decision may still be wrong, because it is based on a sample rather than on the whole population. Depending on which wrong decision we make, the error falls into one of two types.

**Error in Hypothesis Testing**

Type I error: We reject the null hypothesis although it is true. The probability of a Type I error is denoted by alpha, the level of significance.
Type II error: We fail to reject the null hypothesis although it is false. The probability of a Type II error is denoted by beta.
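
Both error rates can be estimated by simulation. The sketch below is illustrative only: the helper `rejection_rate` and its parameters (`n=50`, `sigma=2.0`, `trials=5000`, and the alternative mean of 4.5) are assumptions chosen for this example. It repeats a two-tailed z-test many times and counts how often the null hypothesis is rejected, once when the null is true and once when it is false.

```python
import numpy as np
from scipy.stats import norm


def rejection_rate(true_mean, null_mean, n=50, sigma=2.0,
                   alpha=0.05, trials=5000, seed=0):
    """Fraction of simulated two-tailed z-tests that reject the null."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated sample of size n
    samples = rng.normal(loc=true_mean, scale=sigma, size=(trials, n))
    means = samples.mean(axis=1)
    stds = samples.std(axis=1, ddof=1)
    z = (means - null_mean) / (stds / np.sqrt(n))
    # norm.sf(x) is the survival function, equal to 1 - norm.cdf(x)
    p_values = 2 * norm.sf(np.abs(z))
    return float(np.mean(p_values < alpha))


# Null hypothesis is true: every rejection is a Type I error
type_1_rate = rejection_rate(true_mean=5.0, null_mean=5.0)

# Null hypothesis is false: every failure to reject is a Type II error
power = rejection_rate(true_mean=4.5, null_mean=5.0)
type_2_rate = 1 - power

print(f"Estimated Type I error rate (alpha): {type_1_rate:.3f}")
print(f"Estimated Type II error rate (beta): {type_2_rate:.3f}")
```

The estimated Type I rate should land close to the chosen alpha, while the Type II rate depends on the sample size and on how far the true mean is from the hypothesized one.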