# Determining Significance

Now that we’ve practiced simulating data for an A/B test, let’s actually run a Chi-Square test for each simulated dataset and consider the decision we would make based on the outcome.

If we were really running this test, we would use the data to decide whether to keep the control (old) email subject or switch to the name (new) one. To make that decision, we can use a significance threshold. For example, with a significance threshold of 0.05, we “reject the null hypothesis” for any p-value less than 0.05. Here, the null hypothesis is that the two email subjects have the same open rate, so rejecting it means concluding that there is a significant difference between the open rates, and therefore that we **should** switch to the email subject that uses the recipient’s first name. If the p-value is 0.05 or greater, we do not reject the null hypothesis and would keep the control subject.

We can use the following Python statement to record whether a particular p-value is significant or not, based on a threshold of 0.05:

```python
result = ('significant' if pval < 0.05 else 'not significant')
print(result)
```
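
As a quick illustration of the decision rule, the sketch below applies the same comparison to two fixed contingency tables instead of simulated data. The helper name `label_significance` and the counts in `tables` are made up for this example and are not part of the exercise code.

```python
from scipy.stats import chi2_contingency

def label_significance(pval, threshold=0.05):
    # Hypothetical helper: same comparison as the statement above,
    # with the threshold passed in as an argument.
    return 'significant' if pval < threshold else 'not significant'

# Made-up counts for illustration: rows are [control, name],
# columns are [opened, not opened], with 50 recipients per group.
tables = {
    "large gap": [[20, 30], [35, 15]],
    "small gap": [[25, 25], [27, 23]],
}

labels = {}
for name, table in tables.items():
    chi2, pval, dof, expected = chi2_contingency(table, correction=False)
    labels[name] = label_significance(pval)
    print(name, round(pval, 4), labels[name])
```

With these counts, the "large gap" table is labeled `'significant'` and the "small gap" table is labeled `'not significant'`. Because the comparison is a strict `<`, a p-value of exactly 0.05 would be labeled `'not significant'`.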



## Instructions

1. The code from the previous exercises is provided for you in `script.py`. This code generates a simulated dataset named `sim_data` and then runs a Chi-Square test on that data, saving the p-value as `pval`.

    An additional variable named `significance_threshold` has been defined for you in `script.py`; it holds the significance threshold for the test. After the p-value calculation, add a line of code that uses `significance_threshold` to determine whether the p-value is `'significant'` or `'not significant'`. Save the result as `result` and print it out.

    <details>
        <summary>Stuck? Get a hint</summary>
    
    Use the same code as in the narrative, but replace the hard-coded 0.05 with the `significance_threshold` variable.
    </details>

2. Press “Run” a few times until you see both a `'significant'` and a `'not significant'` result. Note that the result can change from run to run, because each run samples a new group of 100 recipients.


```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# pre-set values
significance_threshold = 0.05
sample_size = 100
lift = .3
control_rate = .5
name_rate = (1 + lift) * control_rate

# simulate a dataset
sample_control = np.random.choice(['yes', 'no'], size=int(sample_size/2), p=[control_rate,1-control_rate])
sample_name = np.random.choice(['yes', 'no'], size=int(sample_size/2), p=[name_rate, 1-name_rate])

group = ['control']*int(sample_size/2) + ['name']*int(sample_size/2)
outcome = list(sample_control) + list(sample_name)
sim_data = {"Email": group, "Opened": outcome}
sim_data = pd.DataFrame(sim_data)

# run a chi-square test
ab_contingency = pd.crosstab(np.array(sim_data.Email), np.array(sim_data.Opened))
chi2, pval, dof, expected = chi2_contingency(ab_contingency, correction=False)
print("P Value:")
print(pval)

# determine significance here:
result = None

print("Result:")
print(result)
```
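
Instead of pressing “Run” repeatedly, one could repeat the simulation in a loop and count how often each result appears. The following is a minimal sketch under a few assumptions: it draws the number of opens per group directly with a binomial distribution rather than using `np.random.choice` and `pd.crosstab` as the exercise code does, and the seed and the value of `n_simulations` are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Seeded generator so the sketch is repeatable (the seed is arbitrary).
rng = np.random.default_rng(0)

# Same pre-set values as the exercise code.
significance_threshold = 0.05
sample_size = 100
lift = .3
control_rate = .5
name_rate = (1 + lift) * control_rate

n_per_group = sample_size // 2
n_simulations = 1000  # assumed number of repetitions

results = []
for _ in range(n_simulations):
    # Number of recipients in each group who opened the email.
    control_opens = rng.binomial(n_per_group, control_rate)
    name_opens = rng.binomial(n_per_group, name_rate)

    # Contingency table: rows are groups, columns are [opened, not opened].
    table = [
        [control_opens, n_per_group - control_opens],
        [name_opens, n_per_group - name_opens],
    ]
    chi2, pval, dof, expected = chi2_contingency(table, correction=False)
    results.append('significant' if pval < significance_threshold
                   else 'not significant')

share_significant = results.count('significant') / n_simulations
print("Share of significant results:")
print(share_significant)
```

The printed share is the fraction of simulated experiments in which we would have switched to the new subject line, even though every simulated dataset comes from the same underlying open rates.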
