### Types of Statistical Analysis

1. One-Sample Analysis
2. Two-Sample Analysis

### Steps in Hypothesis Testing

**STEP #1**: Set up the Null-Hypothesis $H_0$ and Alternate-Hypothesis $H_a$  
**STEP #2**: Choose a Distribution, test-statistic and Significance-level $(\alpha)$  
**STEP #3**: Select Tail type: Left / Right / Two-tailed  
**STEP #4**: Compute $\text{p-value}$  
**STEP #5**: Compare $\text{p-value}$ with $\alpha$: reject $H_0$ if $\text{p-value} < \alpha$, otherwise fail to reject $H_0$
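The five steps can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `one_sample_z_test`, its parameters, and the numbers in the usage lines are hypothetical, and it assumes a normal sampling distribution with a known population standard deviation.

```python
import numpy as np
from scipy import stats


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="right"):
    """Hypothetical helper: one-sample z-test with known population sigma."""
    # STEP 2: test statistic = z-score of the sample mean under H0
    se = sigma / np.sqrt(n)  # standard error of the mean
    z = (sample_mean - mu0) / se

    # STEP 3 and STEP 4: the tail type decides which area is the p-value
    if tail == "right":
        p_value = stats.norm.sf(z)  # P(Z > z)
    elif tail == "left":
        p_value = stats.norm.cdf(z)  # P(Z < z)
    elif tail == "two":
        p_value = 2 * stats.norm.sf(abs(z))  # both tails
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")

    # STEP 5: reject H0 only when the p-value is below alpha
    return float(z), float(p_value), bool(p_value < alpha)


# Illustrative numbers only (not from the case study):
# H0: mean = 100, sigma = 15, a sample of 36 with mean 105.
z, p_value, reject_h0 = one_sample_z_test(105, mu0=100, sigma=15, n=36)
print(round(z, 4), round(p_value, 4), reject_h0)
```

Returning the decision as "reject" versus "do not reject" mirrors STEP #5: a large p-value never proves $H_0$, it only means the data are not strong enough to reject it.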

### Application of CLT in Hypothesis Testing

### Critical Value

## Examples

### Marketing Case study

Suppose there is a Retail Store Chain that sells Shampoo bottles:

This chain has **2000 stores** across India.  

The parameters for weekly sales of the shampoo bottle were reported as:
- Mean: 1800
- Standard deviation: 100

This was calculated by analyzing a lot of historical data.  
As a Manager / Owner / Data Scientist, you want to increase these sales, to generate more revenue.

**Q1. What are the techniques at your disposal?**

- Hire a marketing team

But there is an important factor to consider. These marketing teams/firms are not cheap, and would add a significant cost.  
It stands to reason that you would not straightaway hand over all 2000 stores to them.  
You would want an assurance that their work actually does impact the sales, and generate enough revenue that it is feasible to hire them.

**Q2. How would you get that assurance?**

Perhaps you can allot them a few stores, and analyze the sales parameters (Mean and Standard deviation).  
If the results are good after a couple of weeks, you can then hire them for all 2000 stores.

You decide to do this experiment with 2 competing marketing firms:

**Firm A**

- Worked on **50 stores**
- Sold an **average** **1850** bottles of shampoo

**Firm B**

- Worked on **5 stores**
- Sold an **average** **1900** bottles of shampoo

**Q1. Which firm gave better results?**

Firm B's average sales are clearly higher, but it worked on far fewer stores than Firm A.  
With a population standard deviation of 100, the average over only 5 stores can vary a lot, so Firm B's increase may simply be due to chance.
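To get a feel for how much chance alone can move the average, here is a rough simulation sketch. It assumes that weekly store sales are normally distributed with mean 1800 and standard deviation 100 and that no marketing effect exists; the normality assumption, the seed, the number of trials, and all variable names are illustrative choices, not part of the case study.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
mu, sigma = 1800, 100
n_trials = 20_000  # number of simulated "experiments" (arbitrary choice)


def simulate_sample_means(n_stores):
    # Assumption: each store's weekly sales ~ Normal(mu, sigma), no firm effect
    sales = rng.normal(mu, sigma, size=(n_trials, n_stores))
    return sales.mean(axis=1)


means_5 = simulate_sample_means(5)    # groups the size of Firm B's
means_50 = simulate_sample_means(50)  # groups the size of Firm A's

# The spread of the sample mean shrinks roughly like sigma / sqrt(n)
print(means_5.std(), sigma / np.sqrt(5))
print(means_50.std(), sigma / np.sqrt(50))

# How often does chance alone produce an average as high as each firm's?
frac_b = (means_5 >= 1900).mean()
frac_a = (means_50 >= 1850).mean()
print(frac_b, frac_a)
```

Under these assumptions, an average of 1900 over 5 stores turns up by chance noticeably more often than an average of 1850 over 50 stores, even though 1900 is the larger number.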

**Q2. How do we quantify this and determine if it is just by chance or if it is actually statistically significant?**

When we talk about statistical significance, the term *significance level* comes to mind.  
Since this is a big decision that would affect revenue, you want to be very sure (99% confidence) about your decision, i.e. $\alpha = 0.01$  
So, we need to apply the framework we saw and conduct a hypothesis test for each firm to see whether its results are statistically significant.

#### Hypothesis Testing on Firm A

##### STEP #1

$H_0: \mu = 1800$   
$H_a: \mu > 1800$

##### STEP #2

Distribution: Normal Distribution  
Test Statistic: z-score of the sample mean (observed sample mean: 1850)  
Significance level: 0.01

In [1]:
alpha = 0.01

##### STEP #3

Since we have to find $P(\bar{x} > 1850)$, we need to perform a right-tailed test.

##### STEP #4

Compute $\text{p-value}$

In [2]:
import numpy as np
from scipy import stats

In [3]:
mu = 1800
sigma = 100
n = 50

In [4]:
# Find: P(x > 1850)
x = 1850

In [5]:
se = sigma / np.sqrt(n)  # Standard error
se.round(4).item()

14.1421

In [6]:
# x = mu + (z * se), so z = (x - mu) / se
# For the sampling distribution of the mean, the spread is the standard error.
z = (x - mu) / se
z.round(4).item()

3.5355

In [7]:
p_x_gt_1850 = 1 - stats.norm.cdf(z)
p_value = p_x_gt_1850.round(4).item()
p_value

0.0002

##### STEP #5

Compare $\text{p-value}$ with $\alpha$

In [8]:
if p_value < alpha:
    print("Reject the null hypothesis in favor of the alternate hypothesis.")
else:
    print("Fail to reject the null hypothesis; insufficient evidence for the alternate hypothesis.")

Reject the null hypothesis in favor of the alternate hypothesis.


The probability of observing a sample mean at least this large, given that the null hypothesis is true, is very low, so we reject the null hypothesis in favor of the alternate hypothesis.

#### Business Insights

1. The very low p-value of 0.0002 indicates that the observed data are not consistent with the null hypothesis.
2. If Firm A had no effect, there would be only about a 0.02% chance (well under 1%) of seeing average sales of 1850 or more across 50 stores.
3. Since $1 - \text{p-value} = 99.98\%$ exceeds our 99% confidence requirement, the improvement in average sales from 1800 to 1850 under Marketing Firm A is statistically significant.

#### Hypothesis Testing on Firm B

##### STEP #1

$H_0: \mu = 1800$   
$H_a: \mu > 1800$

##### STEP #2

Distribution: Normal Distribution  
Test Statistic: z-score of the sample mean (observed sample mean: 1900)  
Significance level: 0.01

In [9]:
alpha = 0.01

##### STEP #3

Since we have to find $P(\bar{x} > 1900)$, we need to perform a right-tailed test.

##### STEP #4

Compute $\text{p-value}$

In [10]:
import numpy as np
from scipy import stats

In [11]:
mu = 1800
sigma = 100
n = 5

In [12]:
# Find: P(x > 1900)
x = 1900

In [13]:
se = sigma / np.sqrt(n)  # Standard error
se.round(4).item()

44.7214

In [14]:
# x = mu + (z * se), so z = (x - mu) / se
# For the sampling distribution of the mean, the spread is the standard error.
z = (x - mu) / se
z.round(4).item()

2.2361

In [15]:
p_x_gt_1900 = 1 - stats.norm.cdf(z)
p_value = p_x_gt_1900.round(4).item()
p_value

0.0127

##### STEP #5

Compare $\text{p-value}$ with $\alpha$

In [16]:
if p_value < alpha:
    print("Reject the null hypothesis in favor of the alternate hypothesis.")
else:
    print("Fail to reject the null hypothesis; insufficient evidence for the alternate hypothesis.")

Fail to reject the null hypothesis; insufficient evidence for the alternate hypothesis.


In [19]:
0.0127 * 100, 100 - 0.0127 * 100

(1.27, 98.73)

There is a 1.27% chance of observing a sample mean at least this large given that the null hypothesis is true. This is higher than the threshold of 1%, so we fail to reject the null hypothesis.

#### Business Insights

1. The p-value of 0.0127 is above $\alpha = 0.01$, so there is not enough evidence to reject the null hypothesis at the 1% significance level.
2. If Firm B had no effect, there would still be about a 1.27% chance of seeing average sales of 1900 or more across only 5 stores.
3. Since $1 - \text{p-value} = 98.73\%$ falls short of our 99% confidence requirement, we cannot conclude that the rise in average sales from 1800 to 1900 is due to Marketing Firm B.

#### Conclusion

1. Firm B: p-value of 0.0127 ($1 - \text{p-value} = 98.73\%$). The rise in average sales from 1800 to 1900 across 5 stores is not significant at the 1% level.
2. Firm A: p-value of 0.0002 ($1 - \text{p-value} = 99.98\%$). The rise in average sales from 1800 to 1850 across 50 stores is significant at the 1% level.
3. Because Firm B's 98.73% is slightly below our 99% target for confirming a marketing firm's effectiveness in improving average sales, while Firm A's 99.98% clears it, we will proceed with Firm A rather than Firm B.
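The same decision can be reached with a critical value instead of a p-value. The sketch below is one possible way to do this, assuming the same normal model used above: it converts $\alpha = 0.01$ into a critical z-score and then into the minimum sample mean each firm would need to reach. The names `z_crit`, `cutoff`, and `results` are illustrative.

```python
import numpy as np
from scipy import stats

mu, sigma, alpha = 1800, 100, 0.01

# Critical z for a right-tailed test: the point with area alpha to its right
z_crit = stats.norm.ppf(1 - alpha)

results = {}
for firm, n, sample_mean in [("A", 50, 1850), ("B", 5, 1900)]:
    se = sigma / np.sqrt(n)       # standard error for this firm's sample size
    cutoff = mu + z_crit * se     # smallest sample mean that rejects H0
    reject_h0 = sample_mean > cutoff
    results[firm] = (float(cutoff), bool(reject_h0))
    print(f"Firm {firm}: cutoff = {cutoff:.2f}, observed = {sample_mean}, "
          f"reject H0 = {reject_h0}")
```

With these inputs the cutoff is roughly 1833 bottles for 50 stores and roughly 1904 bottles for 5 stores, so Firm A's 1850 clears its cutoff while Firm B's 1900 falls just short, which agrees with the p-value comparison.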