<a id="lib"></a>
# 1. Import Libraries

**Let us import the required libraries.**

In [1]:
import scipy.stats as stats
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

<a id="defn"></a>
# 3. Test of Hypothesis

Hypothesis testing is the process of using sample data drawn from a population to evaluate the validity of a claim about that population. A statistical test is a rule used to decide whether the claim is accepted or rejected.

**Examples of hypothesis:**

        1. One can get 'A' grade if the attendance in the class is more than 75%.
        2. A probiotic drink can improve the immunity of a person. 

<a id="types"></a>
## 3.1 Types of Hypothesis

`Null Hypothesis`: The null hypothesis is the claim suggesting 'no difference'. It is denoted as H<sub>0</sub>.

`Alternative Hypothesis`: It is the hypothesis that is tested against the null hypothesis. The decision to reject or not reject H<sub>0</sub> is based on how likely the observed sample would be if H<sub>0</sub> were true. It is denoted by H<sub>a</sub> or H<sub>1</sub>.



<a id="test_type"></a>
# 4. Types of Test

The hypothesis test is used to validate the claim given by the null hypothesis. The types of tests are based on the nature of the alternative hypothesis. 

<a id="2tailed"></a>
## 4.1 Two Tailed Test

A two tailed test considers whether the value of the population parameter is either less than or greater than (i.e. not equal to) a specific value. <br>
If we test the population mean ($\mu$) against a specific value ($\mu_{0}$), the null hypothesis is: $H_{0}: \mu = \mu_{0}$. 

The alternative hypothesis for the two tailed test is given as: $H_{1}: \mu \neq \mu_{0}$
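The rejection region of a two tailed test can be sketched as follows, assuming the test statistic follows a standard normal distribution under $H_{0}$. The significance level `alpha = 0.05` is a hypothetical value chosen only for illustration.

```python
from scipy import stats

alpha = 0.05  # hypothetical significance level, chosen for illustration

# A two tailed test splits alpha equally between the lower and upper tails
lower_crit = stats.norm.ppf(alpha / 2)
upper_crit = stats.norm.ppf(1 - alpha / 2)

print(f"Reject H0 if Z < {lower_crit:.3f} or Z > {upper_crit:.3f}")
```

Because the normal distribution is symmetric, the two critical values differ only in sign.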

#### Example:

A company that produces tennis balls claims that the diameter of a tennis ball is 2.625 inches on average. To test the company's claim, a statistical test can be performed considering the hypothesis:

                    Null Hypothesis: Mean diameter = 2.625 inches
                    Alternative Hypothesis: Mean diameter ≠ 2.625 inches

<a id="1tailed"></a>
## 4.2 One Tailed Test

A one tailed test considers whether the value of the population parameter is less than or greater than (but not both) a specific value. <br>
If we test the population mean ($\mu$) against a specific value ($\mu_{0}$) with the null hypothesis $H_{0}: \mu \leq \mu_{0}$ and the alternative hypothesis $H_{1}: \mu > \mu_{0}$, the one tailed test is known as a `right-tailed test`.

If we test the population mean ($\mu$) against a specific value ($\mu_{0}$) with the null hypothesis $H_{0}: \mu \geq \mu_{0}$ and the alternative hypothesis $H_{1}: \mu < \mu_{0}$, the one tailed test is known as a `left-tailed test`.
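For a one tailed test the whole significance level sits in a single tail. The sketch below assumes a test statistic that is standard normal under $H_{0}$, with a hypothetical `alpha = 0.05`.

```python
from scipy import stats

alpha = 0.05  # hypothetical significance level, chosen for illustration

# Right-tailed test: all of alpha lies in the upper tail
right_crit = stats.norm.ppf(1 - alpha)

# Left-tailed test: all of alpha lies in the lower tail
left_crit = stats.norm.ppf(alpha)

print(f"Right-tailed: reject H0 if Z > {right_crit:.3f}")
print(f"Left-tailed:  reject H0 if Z < {left_crit:.3f}")
```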


### Example:

**1.** The company's annual quality report of machines states that a lathe machine works efficiently for at most 8 months on average after servicing. The production manager claims that after the special tuxan servicing, the machine works efficiently for more than 8 months. To test the production manager's claim, consider the hypothesis:

                    Null Hypothesis: Machine efficiency ≤ 8 months
                    Alternative Hypothesis: Machine efficiency > 8 months

This is an example of a **right-tailed test**. 

**2.** A railway authority claims that all the trains on the Chicago-Seattle route run at a speed of at least 54 mph on average. A customer forum declares that there are various records from passengers claiming that the speed of the trains is less than what the railway has claimed. In this scenario, a statistical test can be performed to test the claim of the customer forum considering the hypothesis:

                    Null Hypothesis: Speed ≥ 54 mph
                    Alternative Hypothesis: Speed < 54 mph

This is an example of a **left-tailed test**. 

<a id="eg"></a>
# 5. Hypothesis Tests with Z Statistic

Let us perform one sample Z test for the population mean. We compare the population mean with a specific value. The sample is assumed to be taken from a population following a normal distribution.

To check the normality of the data, a test for normality is used. The `Shapiro-Wilk Test` is one of the methods used to check the normality. The hypothesis of the test is given as:
<p style='text-indent:25em'> <strong> H<sub>0</sub>:  The data is normally distributed </strong> </p>
<p style='text-indent:25em'> <strong> H<sub>1</sub>:  The data is not normally distributed </strong> </p>

The `shapiro()` function from the `scipy.stats` module performs the Shapiro-Wilk normality test. 
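A minimal sketch of the normality check is shown below. The sample is hypothetical: 40 values drawn from a normal distribution with a fixed seed, used only to show how the result of `shapiro()` is read.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: 40 draws from a normal distribution (seeded for reproducibility)
rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=3, scale=0.25, size=40)

# shapiro() returns the test statistic W and the p-value
stat, p_value = stats.shapiro(sample)
print(stat, p_value)

alpha = 0.05  # hypothetical significance level
if p_value < alpha:
    print("Reject H0: the data do not appear to be normally distributed")
else:
    print("Fail to reject H0: no evidence against normality")
```

A p-value above the significance level does not prove normality; it only means the test found no evidence against it.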

The null and alternative hypothesis of Z-test is given as:
<p style='text-indent:25em'> <strong> $H_{0}: \mu = \mu_{0}$ or $\mu \geq \mu_{0}$ or $\mu \leq \mu_{0}$</strong></p>
<p style='text-indent:25em'> <strong> $H_{1}: \mu \neq \mu_{0}$ or $\mu < \mu_{0}$ or $\mu > \mu_{0}$</strong></p>

Consider a normal population with known standard deviation $\sigma$, from which we take a sample of size $n$.
The test statistic for the one sample Z-test is given as:
<p style='text-indent:25em'> <strong> $Z = \frac{\overline{X} -  \mu_{0}}{\frac{\sigma}{\sqrt{n}}}$</strong></p>

Where, <br>
$\overline{X}$: Sample mean<br>
$\mu_{0}$: Specified (hypothesized) mean<br>
$\sigma$: Population standard deviation<br>
$n$: Sample size
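The statistic and its p-value can be wrapped in a small helper. This is only a sketch: the function name `one_sample_z_test` and its `tail` argument are hypothetical and not part of scipy, and the numbers in the usage line are made up for illustration.

```python
from math import sqrt
from scipy import stats

def one_sample_z_test(x_bar, mu_0, sigma, n, tail="two"):
    """One sample Z test with a known population standard deviation.

    Hypothetical helper: `tail` is 'two', 'left' or 'right' and selects
    the form of the alternative hypothesis.
    """
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    if tail == "two":
        p_value = 2 * stats.norm.sf(abs(z))   # both tails
    elif tail == "left":
        p_value = stats.norm.cdf(z)           # lower tail
    elif tail == "right":
        p_value = stats.norm.sf(z)            # upper tail
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, p_value

# Made-up numbers: sample mean 51 from n = 36, sigma = 3, tested against mu_0 = 50
z, p_value = one_sample_z_test(x_bar=51, mu_0=50, sigma=3, n=36, tail="two")
print(z, p_value)
```

The null hypothesis is rejected when the returned p-value is smaller than the chosen significance level.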





### Example:

#### 1. A car manufacturing company claims that the mileage of their new car is 25 kmph with a standard deviation of 2.5 kmph. A random sample of 45 cars was drawn and their mileage was recorded as per the standard procedure. The sample mean mileage was 24 kmph. Is there evidence that the mean mileage differs from 25 kmph? (Assume the data are normally distributed.) Use α = 0.01.

In [None]:
# Ho : mu_mil = 25
# Ha : mu_mil != 25

In [None]:
# one sample z test(two tailed)

In [2]:
x_bar = 24
n= 45
sigma = 2.5
mu = 25
z = (x_bar-mu)/(sigma/n**0.5)
print(z)

-2.6832815729997477


In [6]:
pval = stats.norm.sf(abs(z))*2
pval

0.007290358091535638

In [7]:
sig_lvl=0.01
if pval<sig_lvl:
    print('Ha is selected')
else:
    print('Ho is selected')

Ha is selected


In [None]:
# The average mileage is not 25kmph
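The same decision can be cross-checked with the critical value approach, which compares $|Z|$ with the two tailed critical value instead of comparing the p-value with α. This is a sketch that reuses the numbers of the mileage example; the variable names are illustrative.

```python
from math import sqrt
from scipy import stats

x_bar, mu_0, sigma, n, alpha = 24, 25, 2.5, 45, 0.01

# Test statistic of the one sample Z test
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two tailed critical value: alpha / 2 in each tail
z_crit = stats.norm.ppf(1 - alpha / 2)

# Reject H0 when the statistic falls beyond the critical value in either tail
reject_h0 = abs(z) > z_crit
print(f"|z| = {abs(z):.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Both approaches lead to the same conclusion, since $|Z|$ exceeds the critical value exactly when the two tailed p-value is below α.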

## Practice:

#### 2. The average calories in a slice of bread of the brand 'Alphas' are 82 with a standard deviation of 15. An experiment is conducted to test the claim of the dietitians that the calories in a slice of bread are not as per the manufacturer's specification. A sample of 40 slices of bread is taken and the mean calories recorded are 95. Test the claim of the dietitians at a significance level (⍺) of 0.05. (Assume the data are normally distributed.)

In [8]:
# Ho : mu_cal = 82
# Ha : mu_cal != 82

In [9]:
# one sample z test(two tailed)

In [10]:
x_bar = 95
n= 40
sigma = 15
mu = 82
z = (x_bar-mu)/(sigma/n**0.5)
print(z)

5.4812812776251905


In [11]:
pval = stats.norm.sf(abs(z))*2
pval

4.222565249683579e-08

In [12]:
sig_lvl=0.05  # significance level stated in the problem
if pval<sig_lvl:
    print('Ha is selected')
else:
    print('Ho is selected')

Ha is selected


In [None]:
# The average calories per slice is not 82.

#### 3. The label of a typhoid vaccine on the market states that it contains 3 mg of ascorbic acid, with a standard deviation of 1.2 mg. A research team claims that the vaccines contain less than 3 mg of the acid. The amount of ascorbic acid was recorded for a random sample of 40 vaccines. Test the claim of the research team using the sample data at a significance level (⍺) of 0.05. Assume the data are normally distributed.

    acid_amt = [2.57, 3.06, 3.28 , 3.24, 2.79, 3.40, 3.36, 3.07, 2.46, 3.03, 3.05, 2.94, 3.46, 3.19, 3.09, 2.81, 3.13, 2.88, 
                2.76, 2.75, 3.17, 2.89, 2.54, 3.18, 3.08, 2.60, 3.06, 3.13, 3.11, 3.08, 2.93, 2.90, 3.06, 2.97, 3.24, 2.86, 
                2.87, 3.18, 3, 2.95]

In [None]:
# Ho : mu_acid >= 3
# Ha : mu_acid < 3

In [None]:
# one sample z test(one tailed -left tailed)

In [14]:
acid_amt = [2.57, 3.06, 3.28 , 3.24, 2.79, 3.40, 3.36, 3.07, 2.46, 3.03, 3.05, 2.94, 3.46, 3.19, 3.09, 2.81, 3.13, 2.88, 
            2.76, 2.75, 3.17, 2.89, 2.54, 3.18, 3.08, 2.60, 3.06, 3.13, 3.11, 3.08, 2.93, 2.90, 3.06, 2.97, 3.24, 2.86, 
            2.87, 3.18, 3, 2.95]
x_bar = np.mean(acid_amt)
n= len(acid_amt)
sigma = 1.2
mu = 3
z = (x_bar-mu)/(sigma/n**0.5)
print(z)

0.015811388300842496


In [15]:
pval = stats.norm.cdf(z)
pval

0.5063075684886019

In [16]:
sig_lvl=0.05
if pval<sig_lvl:
    print('Ha is selected')
else:
    print('Ho is selected')

Ho is selected


In [None]:
# There is not enough evidence that the average acid content is less than 3 mg.

## Practice

#### 4. A sample of 900 PVC pipes is found to have an average thickness of 12.5 mm. The sample comes from a normal population with a population standard deviation of 1 mm. Is there any evidence that the PVC pipe thickness is less than 13 mm? Test the hypothesis at the 5% level of significance.

In [None]:
# Ho : mu_thickness >= 13
# Ha : mu_thickness < 13

In [None]:
# one sample z test(one tailed -left tailed)

In [17]:
x_bar = 12.5
n= 900
sigma = 1
mu = 13
z = (x_bar-mu)/(sigma/n**0.5)
print(z)

-15.0


In [18]:
pval = stats.norm.cdf(z)
pval

3.6709661993126986e-51

In [19]:
sig_lvl=0.05
if pval<sig_lvl:
    print('Ha is selected')
else:
    print('Ho is selected')

Ha is selected


In [None]:
# The average thickness is less than 13.

#### 5. A company claims that the average weight of a ball is greater than 120 grams. A sample of 75 balls is taken and their average weight is 123 grams. Assume the data come from a normal population with a population standard deviation of 3 grams. Perform the hypothesis test at the 90% confidence level.

In [None]:
# Ho : mu_weight <= 120
# Ha : mu_weight > 120

In [None]:
# one sample z test(one tailed -right tailed)

In [20]:
x_bar = 123
n= 75
sigma = 3
mu = 120
z = (x_bar-mu)/(sigma/n**0.5)
print(z)

8.660254037844387


In [21]:
pval = stats.norm.sf(z)
pval

2.353570295070176e-18

In [22]:
sig_lvl=0.1
if pval<sig_lvl:
    print('Ha is selected')
else:
    print('Ho is selected')

Ha is selected


In [None]:
# The average weight is greater than 120.
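When a problem states a confidence level instead of a significance level, the two are related by α = 1 − confidence level. The sketch below makes that conversion explicit for the ball weight problem and compares the statistic with the right tailed critical value; the variable names are illustrative.

```python
from scipy import stats

confidence_level = 0.90
alpha = 1 - confidence_level  # significance level implied by the confidence level

# Right tailed critical value at this significance level
z_crit = stats.norm.ppf(1 - alpha)

z = 8.660254037844387  # test statistic computed for the ball weights
print(f"alpha = {alpha:.2f}, critical value = {z_crit:.3f}, reject H0: {z > z_crit}")
```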



#### 6. A farmer claims that the average weight of a watermelon is greater than 5 kg. A sample of 55 watermelons is chosen and their average weight is found to be 6.2 kg. Assume the data come from a normal population with a population standard deviation of 1.5 kg. Perform the hypothesis test at the 95% confidence level.

In [23]:
# Ho : mu_weight <= 5
# Ha : mu_weight > 5

In [24]:
# one sample z test(one tailed -right tailed)

In [25]:
x_bar = 6.2
n= 55
sigma = 1.5
mu = 5
z = (x_bar-mu)/(sigma/n**0.5)
print(z)

5.9329587896765315


In [26]:
pval = stats.norm.sf(z)
pval

1.4876177437412462e-09

In [28]:
sig_lvl=0.05
if pval<sig_lvl:
    print('Ha is selected')
else:
    print('Ho is selected')

Ha is selected


In [None]:
# The average weight is greater than 5.

<a id="2z"></a>
## 2.2 Two Sample Z Test

Let us perform a two sample Z test for the population mean. We compare the means of two independent populations. The samples are assumed to be taken from populations that follow a normal distribution, and the population standard deviations ($\sigma_{1}$, $\sigma_{2}$) are assumed to be known; they need not be equal.

The `Shapiro-Wilk Test` is used to check the normality of the data.



The null and alternative hypothesis of two sample Z-test is given as:

<p style='text-indent:25em'> <strong> $H_{0}: \mu_{1} - \mu_{2} = \mu_{0}$ or $\mu_{1} - \mu_{2} \geq \mu_{0}$ or $\mu_{1} -\mu_{2} \leq \mu_{0}$</strong></p>
<p style='text-indent:25em'> <strong> $H_{1}: \mu_{1} - \mu_{2} \neq \mu_{0} $ or $\mu_{1} - \mu_{2} < \mu_{0}$ or $\mu_{1} -\mu_{2} > \mu_{0}$</strong></p>



The test statistic for two sample Z-test is given as:
<p style='text-indent:25em'> <strong> $Z = \frac{(\overline{X_{1}} - \overline{X_{2}})  - \mu_{0}} {\sqrt{\frac{\sigma_{1}^{2}}{n_{1}} + \frac{\sigma_{2}^{2}}{n_{2}}}}$</strong></p>

Where, <br>
$\overline{X_{1}}$, $\overline{X_{2}}$ : Mean of both the samples<br>
$\mu_{0}$: Mean difference given in the null hypothesis<br>
$\sigma_{1}, \sigma_{2}$: Standard deviation of both the populations<br>
$n_{1}, n_{2}$: Size of samples from both the populations
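The two sample statistic can be sketched in the same way. The helper name `two_sample_z_test` and its `tail` argument are hypothetical, not a scipy function, and the numbers in the usage line are made up for illustration.

```python
from math import sqrt
from scipy import stats

def two_sample_z_test(x1_bar, x2_bar, sigma1, sigma2, n1, n2, mu_0=0, tail="two"):
    """Two sample Z test with known population standard deviations.

    Hypothetical helper: `mu_0` is the mean difference under H0 and
    `tail` is 'two', 'left' or 'right'.
    """
    # Standard error of the difference between the two sample means
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = ((x1_bar - x2_bar) - mu_0) / se
    if tail == "two":
        p_value = 2 * stats.norm.sf(abs(z))
    elif tail == "left":
        p_value = stats.norm.cdf(z)
    elif tail == "right":
        p_value = stats.norm.sf(z)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, p_value

# Made-up numbers: means 52 and 50, sigmas 4 and 3, sample sizes 64 and 36
z, p_value = two_sample_z_test(52, 50, sigma1=4, sigma2=3, n1=64, n2=36)
print(z, p_value)
```

Setting `mu_0=0` tests whether the two population means are equal, which is the most common case.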




#### 1. A study was carried out to understand the amount of haemoglobin in blood for males and females. Random samples of 160 males and 180 females have means of 13 g/dl and 15 g/dl, respectively. The two populations have standard deviations of 4.1 g/dl for male donors and 3.5 g/dl for female donors. Can it be said that the population mean haemoglobin concentrations are the same for men and women? Use α = 0.01. Assume the data are normally distributed.

In [29]:
# Ho : mu_m(mu1)=mu_f(mu2)
# Ha : mu_m(mu1)!=mu_f(mu2)


# Ho : mu1-mu2=0
# Ha : mu1-mu2!=0

In [None]:
# Two sample z test (two tailed)


In [30]:
x1_bar = 13
x2_bar = 15
sigma1 = 4.1
sigma2=3.5
n1 = 160
n2 = 180

num = (x1_bar-x2_bar)-0
den = np.sqrt( (sigma1**2/n1) + (sigma2**2/n2) )
z = num/den
print(z)

-4.806830552525058


In [31]:
pval = stats.norm.sf(abs(z))*2
pval

1.5334185117556497e-06

In [32]:
sig_lvl=0.01
if pval<sig_lvl:
    print('Ha is selected')
else:
    print('Ho is selected')

Ha is selected


In [None]:
# The haemoglobin concentration of male is not equal to haemoglobin concentration of female

## Practice

#### 2. The average sales of 25 items in shop A is Rs. 15000 with a population standard deviation of Rs. 2000. The average sales of 20 items in shop B is Rs. 14000 with a population standard deviation of Rs. 1500. Perform a hypothesis test to check whether sales in shop A are greater than in shop B at the 5% significance level. Assume the data are normally distributed.

In [33]:
# Ho : mu_A(mu1) <= mu_B(mu2)
# Ha : mu_A(mu1) > mu_B(mu2)


# Ho : mu1-mu2 <= 0
# Ha : mu1-mu2 > 0

In [None]:
# Two sample z test (one tailed- right)


In [34]:
x1_bar = 15000
x2_bar = 14000
sigma1 = 2000
sigma2=1500
n1 = 25
n2 = 20

num = (x1_bar-x2_bar)-0
den = np.sqrt( (sigma1**2/n1) + (sigma2**2/n2) )
z = num/den
print(z)

1.9156525704423026


In [35]:
pval = stats.norm.sf(z)
pval

0.02770466665184583

In [36]:
sig_lvl=0.05
if pval<sig_lvl:
    print('Ha is selected')
else:
    print('Ho is selected')

Ha is selected


In [None]:
# The avg sales in shop A is greater than average sale in shop B.