# Statistics for Analytics and Data Science: Hypothesis Testing and Z-Test vs. T-Test

Source: https://www.analyticsvidhya.com/blog/2020/06/statistics-analytics-hypothesis-testing-z-test-t-test/

SUBHASH MEENA, JUNE 18, 2020

## Fundamentals of Hypothesis Testing

Let’s take an example to understand the concept of hypothesis testing. A person is on trial for a criminal offense, and the judge needs to deliver a verdict. The court starts from the assumption that the person is innocent, which plays the role of the null hypothesis. There are four possible combinations:

- First Case: The person is innocent and the judge identifies the person as innocent (correct decision)
- Second Case: The person is innocent and the judge identifies the person as guilty (Type I error)
- Third Case: The person is guilty and the judge identifies the person as innocent (Type II error)
- Fourth Case: The person is guilty and the judge identifies the person as guilty (correct decision)
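The four cases map directly onto hypothesis-testing outcomes once we take "the person is innocent" as the null hypothesis and "judged guilty" as rejecting it. The small sketch below encodes that mapping; the function name `classify_verdict` and its labels are illustrative, not part of the original article.

```python
def classify_verdict(is_guilty: bool, judged_guilty: bool) -> str:
    """Map a (truth, verdict) pair to its hypothesis-testing outcome.

    H0: the person is innocent. Judging the person guilty = rejecting H0.
    """
    if not is_guilty and not judged_guilty:
        return "Correct: fail to reject a true H0"
    if not is_guilty and judged_guilty:
        return "Type I error: reject a true H0"
    if is_guilty and not judged_guilty:
        return "Type II error: fail to reject a false H0"
    return "Correct: reject a false H0"


# Print all four combinations in the same order as the list above.
for is_guilty in (False, True):
    for judged_guilty in (False, True):
        print(is_guilty, judged_guilty, "->", classify_verdict(is_guilty, judged_guilty))
```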

### Steps to Perform Hypothesis Testing

- Set the null and alternative hypotheses
- Set the significance level ($\alpha$)
- Compute the test statistic and its p-value
- Make a decision

If the p-value is less than the significance level $\alpha$, we reject the null hypothesis; otherwise, we fail to reject the null hypothesis.
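The four steps and the decision rule can be sketched as a two-sided one-sample Z-test in plain Python. This is a minimal illustration under stated assumptions: the function name `one_sample_z_test`, the sample values, the hypothesized mean `mu0` and the "known" population standard deviation `sigma` are all made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Hypothetical helper: two-sided one-sample Z-test with known population SD."""
    # Step 1: hypotheses. H0: mu == mu0, H1: mu != mu0.
    # Step 2: significance level is `alpha`.
    # Step 3: test statistic and its two-sided p-value from the standard normal.
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: decision.
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision


# Hypothetical data: 10 measurements, H0 says the mean is 50, sigma assumed to be 4.
sample = [52.1, 49.8, 53.4, 51.0, 50.7, 54.2, 48.9, 52.8, 51.5, 53.0]
z, p, decision = one_sample_z_test(sample, mu0=50, sigma=4)
print(f"z = {z:.3f}, p = {p:.4f}, {decision}")
```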

## What is the Z Test?

We use the Z-test when:
- We know the population variance, or
- We do not know the population variance, but the sample size n is greater than or equal to 30.
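For reference, the one-sample Z statistic compares the sample mean $\bar{x}$ with the hypothesized mean $\mu_0$, scaled by the standard error $\sigma/\sqrt{n}$. When $\sigma$ is unknown and $n \ge 30$, the sample standard deviation $s$ is substituted for it. For a two-sided test, the p-value comes from the standard normal CDF $\Phi$:

$$
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}, \qquad p = 2\left(1 - \Phi(|z|)\right)
$$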

## What is the T Test?

We use the T-test when both of the following hold:
- We do not know the population variance
- Our sample size is less than 30
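The rule of thumb from the two lists above, together with a T-test for the small-sample case, can be sketched as follows. `choose_test` is a hypothetical helper that simply encodes the article's rule, and the two samples are invented. The article does not say which two-sample T-test variant it has in mind; as an assumption, this uses Welch's version (`scipy.stats.ttest_ind` with `equal_var=False`), which does not require the groups to share a variance. The statistic is also computed by hand to show what SciPy calculates.

```python
from math import sqrt
from statistics import mean, stdev

from scipy import stats


def choose_test(population_variance_known: bool, n: int) -> str:
    """Hypothetical helper encoding the article's Z-test vs. T-test rule."""
    if population_variance_known or n >= 30:
        return "z-test"
    return "t-test"


# Two small hypothetical samples (n < 30, population variances unknown).
group_a = [12.1, 14.3, 11.8, 13.5, 12.9, 15.0, 13.2]
group_b = [10.4, 11.9, 10.8, 12.2, 11.1, 10.0]

print(choose_test(population_variance_known=False, n=len(group_a)))  # prints t-test

# Welch's t statistic by hand: difference in means over the standard error
# built from each group's sample variance (ddof=1, as statistics.stdev uses).
se = sqrt(stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b))
t_manual = (mean(group_a) - mean(group_b)) / se

# The same test via SciPy; equal_var=False selects Welch's version.
result = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {result.statistic:.3f} (manual: {t_manual:.3f}), p = {result.pvalue:.4f}")
```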

## Case Study: Hypothesis Testing for Coronavirus using Python

```
!wget https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/06-28-2020.csv
```

```
!mv 06-28-2020.csv corona_virus.csv
```

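```python
import numpy as np
import pandas as pd

# Assumes Corona_Updated.csv = daily report + a 'Temprature' column (not in the raw JHU file).
corona = pd.read_csv('Corona_Updated.csv')
# 0 = temperature below 24, 1 = 24 or above
corona['Temp_Cat'] = corona['Temprature'].apply(lambda x: 0 if x < 24 else 1)
corona_t = corona[['Confirmed', 'Temp_Cat']]
```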
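Because `Corona_Updated.csv` is not produced by the download cells above, here is a hypothetical stand-in showing what the temperature binning does. The numbers are invented; only the column names `Confirmed`, `Temprature` (spelled as in the original code) and `Temp_Cat` come from the article. Rows below 24 become category 0 and rows at or above 24 become category 1. A missing temperature also lands in category 1, because `NaN < 24` evaluates to `False`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Corona_Updated.csv (made-up numbers).
corona = pd.DataFrame({
    "Confirmed": [120, 3400, 560, 78, 910],
    "Temprature": [18.5, 23.9, 24.0, 31.2, np.nan],
})

# Same rule as the article: below 24 -> 0, otherwise -> 1.
# The NaN row is assigned 1 because NaN < 24 is False.
corona["Temp_Cat"] = corona["Temprature"].apply(lambda x: 0 if x < 24 else 1)
print(corona)
```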

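```python
import numpy as np
from scipy.stats import norm

def TwoSampZ(X1, X2, sigma1, sigma2, N1, N2):
    """Two-sided two-sample Z-test: X = group means, sigma = SDs, N = group sizes."""
    ovr_sigma = np.sqrt(sigma1**2/N1 + sigma2**2/N2)  # standard error of X1 - X2
    z = (X1 - X2)/ovr_sigma
    pval = 2 * norm.sf(np.abs(z))  # sf(x) = 1 - cdf(x), more accurate in the tails
    return z, pval
```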
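`TwoSampZ` computes the two-sample Z statistic for $H_0: \mu_1 = \mu_2$. In the case study, the sample standard deviations $s_1, s_2$ stand in for the population values, which the large-sample ($n \ge 30$) rule allows, and the p-value is two-sided:

$$
z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad p = 2\left(1 - \Phi(|z|)\right)
$$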

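```python
# Group 1: regions at or above 24; group 0: regions below 24.
d1 = corona_t.loc[corona_t['Temp_Cat'] == 1, 'Confirmed']
d2 = corona_t.loc[corona_t['Temp_Cat'] == 0, 'Confirmed']

m1, m2 = d1.mean(), d2.mean()
sd1, sd2 = d1.std(), d2.std()  # sample standard deviations (ddof=1)
n1, n2 = d1.shape[0], d2.shape[0]

# H0: both groups have the same mean number of confirmed cases.
z, p = TwoSampZ(m1, m2, sd1, sd2, n1, n2)

z_score = np.round(z, 8)
p_val = np.round(p, 6)

alpha = 0.05
# Compare the unrounded p-value with alpha; rounding is only for display.
if p < alpha:
    Hypothesis_Status = 'Reject Null Hypothesis : Significant'
else:
    Hypothesis_Status = 'Fail to reject Null Hypothesis : Not Significant'

print(z_score)
print(p_val)
print(Hypothesis_Status)
```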
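To run the same pipeline end to end without the original CSV, the sketch below builds a hypothetical `Confirmed`/`Temp_Cat` table from seeded random numbers, applies a two-sample Z-test written with the standard library's `NormalDist`, and cross-checks it against SciPy's Welch T-test. The data and the function name `two_sample_z` are illustrative assumptions. With a few hundred rows per group, both tests produce the same statistic and nearly identical p-values, which is the reasoning behind using the Z-test for $n \ge 30$.

```python
from math import sqrt
from statistics import NormalDist

import numpy as np
import pandas as pd
from scipy import stats


def two_sample_z(x1, x2):
    """Hypothetical helper: two-sided two-sample Z-test using sample SDs."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    se = sqrt(np.var(x1, ddof=1) / len(x1) + np.var(x2, ddof=1) / len(x2))
    z = (x1.mean() - x2.mean()) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical data: 200 "cold" (0) and 200 "hot" (1) regions, made-up case counts.
rng = np.random.default_rng(seed=0)
corona_t = pd.DataFrame({
    "Confirmed": np.concatenate([rng.normal(1000, 300, 200), rng.normal(950, 300, 200)]),
    "Temp_Cat": [0] * 200 + [1] * 200,
})

d1 = corona_t.loc[corona_t["Temp_Cat"] == 1, "Confirmed"]
d2 = corona_t.loc[corona_t["Temp_Cat"] == 0, "Confirmed"]

z, p_z = two_sample_z(d1, d2)
welch = stats.ttest_ind(d1, d2, equal_var=False)

alpha = 0.05
status = "Reject H0: significant" if p_z < alpha else "Fail to reject H0: not significant"
print(f"z = {z:.4f}, p = {p_z:.4f} -> {status}")
print(f"Welch t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```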