# Statistical Experiments and Significance Testing

![](https://affiliatemarketingvietnam.com/wp-content/uploads/2019/08/ab-testing.png)

# A/B Testing

-   An **A/B test** is an experiment with **two groups** to establish
    which of two treatments, products, procedures, or the like is
    **superior**. Often one of the two treatments is the standard
    existing treatment, or no treatment. If a standard (or no) treatment
    is used, it is called the **control**. A typical hypothesis is that
    treatment is better than control.
-   A/B tests are common in web design and marketing, since results are
    so readily measured. Some examples of A/B testing include:
    -   Testing two soil treatments to determine which produces better
        seed germination
    -   Testing two therapies to determine which suppresses cancer more
        effectively
    -   Testing two prices to determine which yields more net profit
    -   Testing two web headlines to determine which produces more
        clicks (Figure 3-2)
    -   Testing two web ads to determine which generates more
        conversions
-   You also need to pay attention to the test statistic or metric you
    use to compare group A to group B. Perhaps the most common metric in
    data science is a binary variable: click or no-click, buy or don’t
    buy, fraud or no fraud, and so on. The metric can also be a
    continuous variable (purchase amount, profit, etc.) or a count
    (e.g., days in hospital, pages visited). For instance, you may be
    interested not in conversion but in revenue per page view.

# Hypothesis Tests / Significance Tests

-   Purpose: help you learn whether random chance might be responsible
    for an observed effect.
-   Null hypothesis: The hypothesis that chance is to blame.
-   Alternative hypothesis: Counterpoint to the null (what you hope to
    prove).
-   One-way test: Hypothesis test that counts chance results only in one
    direction.
-   Two-way test: Hypothesis test that counts chance results in two
    directions.
   ![](https://statisticsguruonline.com/wp-content/uploads/2019/01/hypothesis-testing-steps.png)

-   In a properly designed A/B test, you collect data on treatments A
    and B in such a way that any observed difference between A and B
    must be due to either:
    -   Random chance in assignment of subjects
    -   A true difference between A and B
-   A statistical hypothesis test is further analysis of an A/B test, or
    any randomized experiment, to assess whether random chance is a
    reasonable explanation for the observed difference between groups A
    and B.
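
One way to assess whether chance alone could explain an A/B difference is a permutation test: pool the outcomes, reshuffle the group labels many times, and see how often the reshuffled difference is at least as large as the observed one. The sketch below uses only the standard library. The conversion counts (45 of 500 for A, 30 of 500 for B) and the function name `permutation_pvalue` are hypothetical and chosen purely for illustration.

```python
import random


def permutation_pvalue(conv_a, n_a, conv_b, n_b, n_perm=2000, seed=0):
    """One-sided p-value for 'A converts better than B' via label reshuffling."""
    rng = random.Random(seed)
    total_conv = conv_a + conv_b
    # 1 = conversion, 0 = no conversion, pooled across both groups
    outcomes = [1] * total_conv + [0] * (n_a + n_b - total_conv)
    observed = conv_a / n_a - conv_b / n_b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(outcomes)
        perm_a = sum(outcomes[:n_a])      # conversions randomly assigned to "A"
        perm_b = total_conv - perm_a      # the rest go to "B"
        if perm_a / n_a - perm_b / n_b >= observed:
            hits += 1
    return observed, hits / n_perm


# HYPOTHETICAL data: 45/500 conversions for A, 30/500 for B
observed_diff, p_value = permutation_pvalue(45, 500, 30, 500)
print(f"observed difference = {observed_diff:.3f}, p-value = {p_value:.3f}")
```

A small p-value suggests that random assignment alone rarely produces a gap this large; it does not by itself say the difference is practically important.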

## Null Hypothesis vs Alternative Hypothesis

-   Hypothesis tests by their nature involve not just a null hypothesis,
    but also an offsetting alternative hypothesis. Here are some
    examples:
    -   Null = “no difference between the means of group A and group
        B”; alternative = “A is different from B” (could be bigger or
        smaller)
    -   Null = “A ≤ B”; alternative = “A > B”
    -   Null = “B is not X% greater than A”; alternative = “B is X%
        greater than A”
-   Taken together, the null and alternative hypotheses must account for
    all possibilities. The nature of the null hypothesis determines the
    structure of the hypothesis test.

## One-Way, Two-Way Hypothesis Test

![](https://saylordotorg.github.io/text_introductory-statistics/section_12/ecf5f771ca148089665859c88d8679df.jpg)

![](https://www.statisticshowto.com/wp-content/uploads/2013/08/t-score-vs.-z-score.png)
![](https://slidetodoc.com/presentation_image_h/beab8589e6bd4cba06ca4251cfe4930c/image-57.jpg)
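
To make the one-way versus two-way distinction concrete, here is a minimal sketch that converts a z-statistic into both kinds of p-value under a standard normal null. The value `z = 1.8` and the helper name `z_test_pvalues` are assumptions for illustration only.

```python
from statistics import NormalDist


def z_test_pvalues(z):
    """Return (upper-tail, lower-tail, two-tailed) p-values for a z-statistic."""
    std_normal = NormalDist()          # mean 0, standard deviation 1
    upper = 1 - std_normal.cdf(z)      # one-way test, alternative: effect > 0
    lower = std_normal.cdf(z)          # one-way test, alternative: effect < 0
    two_sided = 2 * min(upper, lower)  # two-way test counts both directions
    return upper, lower, two_sided


# HYPOTHETICAL test statistic
upper, lower, two_sided = z_test_pvalues(1.8)
print(f"upper-tail p = {upper:.4f}, two-tailed p = {two_sided:.4f}")
```

With this statistic the one-way p-value falls below 0.05 while the two-way p-value does not, which is why the direction of the test should be fixed before looking at the data.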

# Statistical Significance and P-Values

-   **Statistical significance** is how statisticians measure whether an
    experiment (or even a study of existing data) yields a result more
    extreme than what chance might produce. If the result is beyond the
    realm of chance variation, it is said to be **statistically
    significant**.
-   **P-value**: Given a chance model that embodies the null hypothesis,
    the p-value is the probability of obtaining results at least as
    unusual or extreme as the observed results.
-   **Alpha**: The probability threshold of “unusualness” that chance
    results must surpass, for actual outcomes to be deemed statistically
    significant.
-   **Type 1 error**: Mistakenly concluding an effect is real (when it
    is due to chance).
-   **Type 2 error**: Mistakenly concluding an effect is due to chance
    (when it is real).
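
The link between alpha and the Type 1 error can be checked by simulation: when the null hypothesis is true, a test run at level alpha should reject in roughly a fraction alpha of repeated experiments. The sketch below assumes a two-sided z-test on samples drawn from a standard normal population. The sample size, number of experiments, and seed are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist


def type1_error_rate(alpha=0.05, n=30, n_experiments=2000, seed=1):
    """Fraction of null-true experiments in which a two-sided z-test rejects."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(n_experiments):
        # The null is true by construction: population mean 0, sigma 1
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) / (1 / n ** 0.5)
        p = 2 * std_normal.cdf(-abs(z))
        if p < alpha:
            rejections += 1            # a Type 1 error
    return rejections / n_experiments


rate = type1_error_rate()
print(f"empirical Type 1 error rate = {rate:.3f}")
```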

![](https://i.stack.imgur.com/idDTA.png)

![](https://www.conversion-uplift.co.uk/wp-content/uploads/2020/06/Z-score-table-to-calculate-proportion-within-1-standard-deviation-of-the-mean.png)

## Exercise 1
A doctor claims that the mean age at which children start to use words or talk is **12.5** months, with a population standard deviation of **0.8** months. To check this claim, we collect a sample of **50** children and find a mean age of about **12.9** months. Assuming a normal distribution, calculate the p-value for the test that the mean age at which children start to utter words is different from 12.5 months. Also, state the decision at a significance level of **1**%.
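
Because the population standard deviation is given, one reasonable approach is a two-sided one-sample z-test. The following is a sketch of that calculation using only the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 12.5     # claimed population mean (months)
sigma = 0.8    # known population standard deviation
n = 50         # sample size
xbar = 12.9    # sample mean
alpha = 0.01   # significance level

# Test statistic: distance of the sample mean from mu0 in standard errors
z = (xbar - mu0) / (sigma / sqrt(n))

# Two-sided p-value: probability of a result at least this extreme in either tail
p_value = 2 * NormalDist().cdf(-abs(z))
reject = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.5f}, reject H0: {reject}")
```

Here z is about 3.54 and the p-value is far below 1%, so under these assumptions the null hypothesis that the mean is 12.5 months would be rejected.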

## Exercise 2
A doctor claims that the mean age at which children start to use words or talk is **12.5** months. To check this claim, we collect a sample of **18** children and find a mean age of about **12.9** months with a sample standard deviation of **0.8** months. Assuming a normal distribution, calculate the p-value for the test that the mean age at which children start to utter words is different from 12.5 months. Also, state the decision at a significance level of **1**%.
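
Here the standard deviation comes from a small sample, so a one-sample t-test with n − 1 = 17 degrees of freedom is a natural choice. The sketch below stays within the standard library by integrating the t density numerically with Simpson's rule; the helpers `t_pdf` and `t_sf` are illustrative. If SciPy is available, `scipy.stats.t.sf` gives the same tail probability directly.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t, df, steps=10_000):
    """Upper-tail probability P(T > t) for t >= 0 (Simpson's rule on [0, t])."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3   # area above 0 is 0.5 by symmetry


mu0, s, n, xbar, alpha = 12.5, 0.8, 18, 12.9, 0.01

t_stat = (xbar - mu0) / (s / math.sqrt(n))
p_value = 2 * t_sf(abs(t_stat), df=n - 1)   # two-sided
reject = p_value < alpha

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

The statistic is about 2.12 and the two-sided p-value lies between 2% and 5%, so at the 1% level the null hypothesis would not be rejected.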

## Exercise 3
A battery-producing company claims that its top batteries are good, on average, for at least **65** months. A third-party source decides to verify this and tests a sample of **45** batteries. The mean life of these 45 batteries was about **63.4** months, with a standard deviation of about **3** months. Find the p-value for the test that the mean life of all such batteries is less than 65 months. What is your conclusion if the significance level is **2.5**%?
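
This is a one-sided (lower-tail) test. The sketch below uses the large-sample normal approximation, which is an assumption on our part given n = 45. A t-test with 44 degrees of freedom would give a somewhat larger p-value but leads to the same decision here.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 65.0     # claimed minimum mean life (months)
s = 3.0        # sample standard deviation
n = 45         # sample size
xbar = 63.4    # sample mean
alpha = 0.025  # significance level

# H0: mu >= 65, H1: mu < 65, so only the lower tail counts
z = (xbar - mu0) / (s / sqrt(n))
p_value = NormalDist().cdf(z)
reject = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.5f}, reject H0: {reject}")
```

The statistic is about −3.58, so the p-value is well below 2.5% and the company's claim would be rejected under these assumptions.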

# Chi-square Test

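- The chi-square test is a non-parametric (distribution-free) method for categorical (nominal) data. The test of independence assesses the relationship between two categorical variables in a contingency table, while the goodness-of-fit test compares observed counts with the counts expected under a hypothesized distribution.

## Goodness-of-Fit Test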

![](https://i.stack.imgur.com/8dd22.gif)

![](https://www.itl.nist.gov/div898/handbook/eda/section3/gif/chspdftb.gif)


- You are asked to determine whether the number of takeaway orders a restaurant receives is the same on each of the five days of the week. The restaurant sampled about 400 orders received during a four-week period and recorded the following observations:

![Chi square test sample.PNG](attachment:a830cdbb-d613-4b5f-9bfa-84acf0f6e742.PNG)

- At a significance level of 5%, can you reject the null hypothesis that the orders are equally distributed across all five weekdays?

- H0: P(Monday) = P(Tuesday) = P(Wednesday) = P(Thursday) = P(Friday) = 0.2
- H1: H0 is false
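
The goodness-of-fit calculation can be sketched as follows. The observed counts in the original table are only available as an image, so the counts below are HYPOTHETICAL placeholders that sum to 400; substitute the real values. With five categories there are 4 degrees of freedom, and for an even number of degrees of freedom the chi-square tail probability has a simple closed form, used here to avoid external libraries. If SciPy is available, `scipy.stats.chisquare` performs the same test.

```python
from math import exp

# HYPOTHETICAL observed orders for Monday..Friday (placeholders, total = 400)
observed = [92, 71, 65, 83, 89]
alpha = 0.05

n_total = sum(observed)
expected = [n_total / 5] * 5   # H0: each weekday has probability 0.2

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1         # 4 degrees of freedom

# Upper-tail probability for df = 4: exp(-x/2) * (1 + x/2)
p_value = exp(-chi2 / 2) * (1 + chi2 / 2)
reject = p_value < alpha

print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}, reject H0: {reject}")
```

With these placeholder counts the statistic is 6.75, below the 5% critical value of about 9.49 for 4 degrees of freedom, so the null hypothesis would not be rejected. The real data may lead to a different conclusion.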




# ANOVA (analysis of variance)

![](https://www.questionpro.com/blog/wp-content/uploads/2016/03/rsz_anova.jpg)

# ANOVA - Permutation Approach

-   Resampling procedure (specified here for the A-B-C-D test of 4
    groups):
    -   Combine all the data together in a single box
    -   Shuffle and draw out four resamples of nA, nB, nC, nD values
    -   Record the mean of each of the four groups
    -   Record the variance among the four group means
    -   Repeat steps 2–4 many times (say 1,000)
-   What proportion of the time did the resampled variance exceed the
    observed variance? This is the p-value.
    ![](https://www.datanovia.com/en/wp-content/uploads/dn-tutorials/r-statistics-2-comparing-groups-means/images/one-way-anova-basics.png)
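
The resampling procedure above can be sketched in a few lines. The four groups of measurements and the function name `permutation_anova` are hypothetical, chosen only to exercise the steps.

```python
import random
from statistics import mean, variance


def permutation_anova(groups, n_perm=1000, seed=0):
    """P-value: share of shuffles whose variance of group means exceeds the observed."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]           # step 1: one box
    observed = variance([mean(g) for g in groups])
    exceed = 0
    for _ in range(n_perm):                           # step 5: repeat
        rng.shuffle(pooled)                           # step 2: shuffle and redraw
        means, start = [], 0
        for size in sizes:
            means.append(mean(pooled[start:start + size]))  # step 3: group means
            start += size
        if variance(means) > observed:                # step 4: variance among means
            exceed += 1
    return observed, exceed / n_perm


# HYPOTHETICAL measurements for groups A, B, C, D
groups = [
    [1, 2, 3, 2, 1],
    [10, 11, 12, 11, 10],
    [20, 21, 22, 21, 20],
    [30, 31, 32, 31, 30],
]
observed_var, p_value = permutation_anova(groups)
print(f"observed variance of means = {observed_var:.2f}, p-value = {p_value:.3f}")
```

Because these illustrative groups are widely separated, almost no shuffle reproduces the observed spread and the p-value is close to zero.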

## One-way ANOVA
![one way ANOVA.PNG](attachment:44366509-fbb2-48d2-85a8-1d6651702ccc.PNG)

- H0: the average number of customers served per hour is the same for every employee
- HA: H0 is false (at least one employee's average differs); alpha = 0.05
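
A one-way ANOVA compares the variation between group means with the variation within groups through the F statistic. The data in the original table are only available as an image, so the sketch below uses HYPOTHETICAL customers-per-hour figures for three employees. The closed-form p-value applies only when there are exactly three groups (numerator degrees of freedom = 2). For the general case, `scipy.stats.f_oneway` is the usual tool.

```python
from statistics import mean

# HYPOTHETICAL customers served per hour, five observations per employee
groups = {
    "employee_1": [10, 12, 11, 13, 9],
    "employee_2": [14, 15, 13, 16, 12],
    "employee_3": [11, 10, 12, 9, 13],
}
alpha = 0.05

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)
k, n_total = len(groups), len(all_values)

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

df_between, df_within = k - 1, n_total - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Upper tail of F(2, df_within); valid ONLY because df_between == 2
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
reject = p_value < alpha

print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

For these placeholder numbers F is 6.0 with (2, 12) degrees of freedom and the p-value is about 0.016, so H0 would be rejected at alpha = 0.05. The real data may behave differently.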


## Two-way ANOVA

- H0: The height means of all water groups are equal
- H1: The height mean of at least one water group is different


- H0: The height means of all sun groups are equal
- H1: The height mean of at least one sun group is different


- H0: There is no interaction between the sun and water 
- H1: There is interaction between the sun and water 

significance level = 0.05

Reference:
Internet and University of Derby