# Uses of A/B testing
In the video, you saw how Electronic Arts used A/B testing on their website when launching SimCity 5. One version of the page showed an advertisement for a discount, and one version did not. Half the users saw one version of the page, and the other half saw the second version of the page.

What is the main reason to use an A/B test?

## ANSWER:

* It provides a way to check outcomes of competing scenarios and decide which way to proceed.

````
A/B testing lets you compare scenarios to see which best achieves some goal.
````

# Calculating the sample mean

The late_shipments dataset contains supply chain data on the delivery of medical supplies. Each row represents one delivery of a part. The late column denotes whether or not the part was delivered late. A value of "Yes" means that the part was delivered late, and a value of "No" means the part was delivered on time.

Let's begin our analysis by calculating a point estimate (or sample statistic), namely the proportion of late shipments.

late_shipments is available, and pandas is loaded as pd.

## Instructions 

* Print the late_shipments dataset.
* Calculate the proportion of late shipments in the sample; that is, the mean of cases where the late column is "Yes".

In [None]:
# Print the late_shipments dataset
print(late_shipments)

# Calculate the proportion of late shipments
late_prop_samp = (late_shipments['late'] == "Yes").mean()

# Print the results
print(late_prop_samp)

# Calculating a z-score
Since variables have arbitrary ranges and units, we need to standardize them. For example, a hypothesis test that gave different answers if the variables were in Euros instead of US dollars would be of little value. Standardization avoids that.

One standardized value of interest in a hypothesis test is called a z-score. To calculate it, you need three numbers: the sample statistic (point estimate), the hypothesized statistic, and the standard error of the statistic (estimated from the bootstrap distribution).
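
As a formula, writing $\hat{p}$ for the sample statistic, $p_0$ for the hypothesized value, and $\mathrm{SE}(\hat{p})$ for the standard error (symbols chosen here for illustration, not taken from the course), the z-score is:

$$
z = \frac{\hat{p} - p_0}{\mathrm{SE}(\hat{p})}
$$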

The sample statistic is available as late_prop_samp.

late_shipments_boot_distn is a bootstrap distribution of the proportion of late shipments, available as a list.

pandas and numpy are loaded with their usual aliases.
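
For context, a bootstrap distribution like late_shipments_boot_distn could be built roughly as follows. This is a minimal sketch on made-up data: the simulated `late` array, the `bootstrap_proportions` helper, and the resample count are all assumptions for illustration, not the course's actual code.

```python
import numpy as np

# Hypothetical stand-in for late_shipments["late"]: 1000 deliveries, about 6% late
rng = np.random.default_rng(seed=42)
late = rng.choice(["Yes", "No"], size=1000, p=[0.06, 0.94])

def bootstrap_proportions(values, n_resamples, rng):
    """Resample with replacement and record the proportion of "Yes" each time."""
    is_late = (values == "Yes")
    boot = []
    for _ in range(n_resamples):
        resample = rng.choice(is_late, size=len(is_late), replace=True)
        boot.append(resample.mean())
    return boot

boot_distn = bootstrap_proportions(late, n_resamples=2000, rng=rng)

# The standard deviation of the bootstrap distribution estimates the standard error
std_error = np.std(boot_distn, ddof=1)
print(std_error)
```

Each resample has the same size as the original sample, so the spread of the resampled proportions approximates how much the sample proportion would vary from one sample to another.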

## Instructions

* Hypothesize that the proportion of late shipments is 6%.
* Calculate the standard error from the standard deviation of the bootstrap distribution.
* Calculate the z-score.

In [None]:
# Hypothesize that the proportion is 6%
late_prop_hyp = 0.06

# Calculate the standard error
std_error = np.std(late_shipments_boot_distn, ddof=1)

# Find z-score of late_prop_samp
z_score = (late_prop_samp - late_prop_hyp) / std_error

# Print z_score
print(z_score)

# Criminal trials and hypothesis tests
In the video, you saw how hypothesis testing follows a similar process to criminal trials.

Which of the following correctly matches up a criminal trial with properties of a hypothesis test?

## ANSWER:

* Just as the defendant is initially assumed not guilty, the null hypothesis is first assumed to be true.

# Left tail, right tail, two tails
Hypothesis tests are used to determine whether the sample statistic lies in the tails of the null distribution. However, the way that the alternative hypothesis is phrased affects which tail(s) we are interested in.

## Two Tails
* Is there a difference between the voting preferences of 40-year-olds and 80-year-olds?
* Should we expect Slack and Zoom to have dissimilar mean numbers of employees over the last three years?

## Left Tails
* Are grapes lower in popularity than raisins, on average?
* Is there evidence to conclude that Belgian workers tend to have lower salaries than Italian workers?

## Right Tails
* Do hamburgers have more calories than hot dogs, on average?
* Does there tend to be more than 12 fluid ounces of soda per can?
* Do cats tend to live longer than dogs?

````
Top tail choices! The tails of the distribution that are relevant depend on whether the alternative hypothesis refers to "greater than", "less than", or "differences between."
````

# Calculating p-values
In order to determine whether to choose the null hypothesis or the alternative hypothesis, you need to calculate a p-value from the z-score.

You'll now return to the late shipments dataset and the proportion of late shipments.

The null hypothesis, $H_0$, is that the proportion of late shipments is six percent.

The alternative hypothesis, $H_A$, is that the proportion of late shipments is greater than six percent.

The observed sample statistic, late_prop_samp, the hypothesized value, late_prop_hyp (6%), and the bootstrap standard error, std_error, are available. norm from scipy.stats has also been loaded without an alias.

## Question

What type of test should be used for this alternative hypothesis?

## Possible Answers

* Two-tailed

* Left-tailed

* Right-tailed

* It doesn't matter; any one will do.

* A hypothesis test isn't appropriate to answer this question.

## ANSWER:

* Right-tailed


````
A two-tailed test is appropriate when the alternative hypothesis talks about differences between the sample statistic and the null statistic.

A left-tailed test is appropriate when the alternative hypothesis talks about the sample statistic being less than the null statistic.
````

In [None]:
# Calculate the z-score of late_prop_samp
z_score = (late_prop_samp - late_prop_hyp) / std_error

# Calculate the p-value
p_value = 1 - norm.cdf(z_score, loc=0, scale=1)
                 
# Print the p-value
print(p_value) 
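
The calculation above is for a right-tailed test. As a rough sketch of how it would change for the other tail choices, the helper below covers all three cases under a standard normal null distribution. The function `p_value_from_z` and its `alternative` argument are hypothetical names introduced here, not part of scipy or the course.

```python
from scipy.stats import norm

def p_value_from_z(z_score, alternative):
    """Return the p-value for a z-score under a standard normal null distribution.

    alternative is a hypothetical argument: "less", "greater", or "two-sided".
    """
    if alternative == "less":        # left-tailed: area to the left of z
        return norm.cdf(z_score)
    if alternative == "greater":     # right-tailed: area to the right of z
        return 1 - norm.cdf(z_score)
    if alternative == "two-sided":   # two-tailed: area beyond |z| in both tails
        return 2 * (1 - norm.cdf(abs(z_score)))
    raise ValueError(f"unknown alternative: {alternative}")

# Compare the three tail choices for one illustrative z-score
for alt in ("less", "greater", "two-sided"):
    print(alt, p_value_from_z(1.5, alt))
```

For the same z-score, the left-tailed and right-tailed p-values sum to one, which is why choosing the tail to match the alternative hypothesis matters.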

# Decisions from p-values
The p-value, denoted here as $p$, measures the strength of the evidence against the null hypothesis: the smaller it is, the stronger the evidence. By comparing the p-value to the significance level, $\alpha$, you can make a decision about which hypothesis to support.

Which of the following is the correct conclusion from the decision rule for a significance level $\alpha$?

## ANSWER:

* If $p \leq \alpha$, reject $H_0$.

````
If the p-value is less than or equal to the significance level, you reject the null hypothesis.
````
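
The decision rule can be written as a small function. This is only an illustrative sketch: `decide` is a hypothetical helper, and the p-values passed to it are made up rather than taken from the late shipments data.

```python
def decide(p_value, alpha):
    """Apply the decision rule: reject H0 when p <= alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

# Hypothetical p-values compared against a 5% significance level
print(decide(0.03, alpha=0.05))
print(decide(0.19, alpha=0.05))
```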

# Calculating a confidence interval
If you give a single estimate of a population parameter, you are bound to be wrong by some amount. For example, the hypothesized proportion of late shipments was 6%. Even if the evidence is consistent with the null hypothesis that the proportion of late shipments equals 6%, the proportion in any new sample of shipments is likely to be a little different due to sampling variability. Consequently, it's a good idea to state a confidence interval. That is, you say, "we are 95% 'confident' that the proportion of late shipments is between A and B" (for some values of A and B).

Sampling in Python [demonstrated](https://campus.datacamp.com/courses/sampling-in-python/pull-your-data-up-by-its-bootstraps-4?ex=10) two methods for calculating confidence intervals. Here, you'll use quantiles of the bootstrap distribution to calculate the confidence interval.

late_prop_samp and late_shipments_boot_distn are available; pandas and numpy are loaded with their usual aliases.

## Instructions

* Calculate a 95% confidence interval from late_shipments_boot_distn using the quantile method, storing the lower and upper bounds as lower and upper.

## Question:
* Does the confidence interval match up with the conclusion to stick with the original assumption that 6% is a reasonable value for the unknown population parameter?

In [None]:
# Calculate 95% confidence interval using quantile method
lower = np.quantile(late_shipments_boot_distn, 0.025)
upper = np.quantile(late_shipments_boot_distn, 0.975)

# Print the confidence interval
print((lower, upper))

## ANSWER:

* Yes, since 0.06 is included in the 95% confidence interval and we failed to reject $H_0$ due to a large p-value, the results are similar.

````
When the confidence level of the interval equals one minus the significance level, a hypothesized population parameter that falls within the confidence interval means you should fail to reject the null hypothesis.
````
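
The exercise uses the quantile method. The other approach is often called the standard error method; assuming that is the second method meant here, the sketch below compares the two on a made-up bootstrap distribution. The simulated `boot_distn` and the `point_estimate` value are hypothetical stand-ins for late_shipments_boot_distn and late_prop_samp.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical bootstrap distribution, standing in for late_shipments_boot_distn
rng = np.random.default_rng(seed=0)
boot_distn = rng.normal(loc=0.061, scale=0.0075, size=5000)

# Hypothetical point estimate, standing in for late_prop_samp
point_estimate = 0.061
std_error = np.std(boot_distn, ddof=1)

# Quantile method: cut off 2.5% of the bootstrap distribution in each tail
lower_q = np.quantile(boot_distn, 0.025)
upper_q = np.quantile(boot_distn, 0.975)

# Standard error method: inverse CDF of a normal centered on the point estimate
lower_se = norm.ppf(0.025, loc=point_estimate, scale=std_error)
upper_se = norm.ppf(0.975, loc=point_estimate, scale=std_error)

print((lower_q, upper_q))
print((lower_se, upper_se))
```

When the bootstrap distribution is roughly normal, as it is by construction here, the two intervals should be close to each other.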

# Type I and type II errors

For hypothesis tests and for criminal trials, there are two states of truth and two possible outcomes. Two combinations are correct test outcomes, and there are two ways it can go wrong.

The errors are known as false positives (or "type I errors"), and false negatives (or "type II errors").

## Instructions

* Match the scenarios to the appropriate error type, or to "Not an error" for correct decisions.

## False Positive
* Finding the defendant guilty when in fact the defendant was innocent.
* Rejecting the null hypothesis when in fact the null hypothesis is true.

## False Negative
* Finding the defendant not guilty when in fact the defendant did commit the crime.
* Failing to reject the null hypothesis when in fact the null hypothesis is false.

## Not An Error
* Finding the defendant not guilty when in fact the defendant was innocent.
* Finding the defendant guilty when in fact the defendant did commit the crime.
* Rejecting the null hypothesis when in fact the null hypothesis is false.
* Failing to reject the null hypothesis when in fact the null hypothesis is true.
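
To make the false positive case concrete, the simulation below draws many samples from a population where the null hypothesis is true and counts how often a right-tailed test rejects it anyway. It is a sketch under stated assumptions: the sample size, number of simulations, and seed are arbitrary, and the standard error comes from the hypothesized proportion rather than from a bootstrap distribution.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=1)
alpha = 0.05
p_true = 0.06      # the null hypothesis is true by construction
n = 1000           # shipments per simulated sample (assumed)
n_sims = 5000      # number of simulated samples (assumed)

# Simulate the count of late shipments per sample, then the sample proportion
p_hats = rng.binomial(n, p_true, size=n_sims) / n

# z-scores using the standard error implied by the null hypothesis
std_error = np.sqrt(p_true * (1 - p_true) / n)
z_scores = (p_hats - p_true) / std_error

# Right-tailed p-values, as in the late shipments test
p_values = 1 - norm.cdf(z_scores)

# Every rejection here is a false positive, because H0 is true
false_positive_rate = np.mean(p_values <= alpha)
print(false_positive_rate)
```

The printed rate should land near the significance level, since $\alpha$ is the false positive rate you agree to tolerate when the null hypothesis is true.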
