Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [2]:
NAME = "Moughel Mohamed Souhail"
COLLABORATORS = ""

---

# Assignment 1: Fisher's and $\chi^2$-test

In [54]:
# setup
import pandas as pd
import numpy as np
import scipy as sp
from scipy.stats import fisher_exact, chi2_contingency,chi2
import math
from scipy.special import gammainc

A web authoring company uses A/B testing to select the best website design. A/B testing involves creating multiple versions (*A* and *B*) of the website in a campaign to test their effectiveness. Marketers split their audience and direct them to different versions to determine which performs better.


**Workflow of A/B testing:**
1. **Hypothesis:** Formulate a hypothesis about changes that might improve the click-through rate.
2. **Variations:** Create two versions of the original site with specific changes.
3. **Traffic Split:** Divide incoming traffic equally between the two website versions.
4. **Testing Duration:** Run the test until statistically significant results are obtained.
5. **Data Analysis:** Analyze the data to determine which version performs better.

When comparing two phenomena (like the efficacy of drugs), we use statistical tests. For comparing, e.g., means of two continuous variables, Student's t-test is used. When we observe counts of occurrences of phenomena, we usually use the $\chi^2$-test. 
However, Fisher's exact test is more suitable when the observed counts are small and in the form of a four-field contingency table. 

A/B testing allows the company to take a scientific approach to marketing.
The company prepared two versions of the website, denoted as *A* and *B*. Users who visit the web page are shown one of the two new versions of the page. We will compare the efficacy of the versions *A* and *B*. We observed the number of clicks within the site.
The following contingency table summarizes the results. In the table, $N$ denotes the total number of visitors, $a$ is the number of visitors who saw version *A* and followed a link within *A*, $b$ is the number of visitors of version *A* who did not follow any link within the website, ...


#### Table 1
|          | Click YES | Click NO | Row sum |
|----------|-----------|----------|---------|
| Sample A |   $a$     |   $b$    | $a+b$   |
| Sample B |   $c$     |   $d$    | $c+d$   |
| **Sum**  |  $a+c$    |  $b+d$   | $N$     |
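
In practice, such a table is usually tabulated from a raw visit log. The following is a minimal sketch, assuming a hypothetical log with one row per visitor; the column names `version` and `clicked` and the counts are made up for illustration.

```python
import pandas as pd

# Hypothetical raw visit log: one row per visitor (made-up data for illustration).
visits = pd.DataFrame({
    "version": ["A"] * 14 + ["B"] * 10,
    "clicked": ["yes"] * 4 + ["no"] * 10 + ["yes"] * 7 + ["no"] * 3,
})

# Cross-tabulate version against click outcome; reorder the columns so that
# "yes" comes first, matching the layout of Table 1.
counts = pd.crosstab(visits["version"], visits["clicked"])[["yes", "no"]]

contingency = counts.to_numpy()   # 2-by-2 array [[a, b], [c, d]]
print(counts)
print("N =", contingency.sum())
```

The resulting 2-by-2 array has the same layout as the tables used in the rest of this notebook.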

At first, we must formulate the so-called null hypothesis.

> Samples $A$ and $B$ were taken from the same probability distribution, and the differences between them
> were caused by accidents only. In other words, the efficacies of both versions $A$ and $B$ are the same. 
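
One way to make the null hypothesis concrete is to simulate it: if the version has no effect, the clicks are spread over the visitors at random. The sketch below assumes hypothetical counts (14 visitors for *A*, 10 for *B*, 11 clicks in total) and a fixed random seed; it is an illustration, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical counts: 14 visitors saw A, 10 saw B, and 11 of the 24 clicked.
n_a, n_b, n_clicks = 14, 10, 11
clicked = np.array([1] * n_clicks + [0] * (n_a + n_b - n_clicks))

def simulate_a(n_sim=20_000):
    """Return simulated values of a (clicks in sample A) under the null hypothesis."""
    a_values = np.empty(n_sim, dtype=int)
    for i in range(n_sim):
        shuffled = rng.permutation(clicked)   # clicks assigned to visitors at random
        a_values[i] = shuffled[:n_a].sum()    # the first n_a visitors form sample A
    return a_values

a_sim = simulate_a()
print("mean of a:", a_sim.mean())             # close to n_a * n_clicks / N
print("estimated P(a = 4):", np.mean(a_sim == 4))
```

The simulated frequencies approximate the exact probabilities that Task 1 asks you to derive.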

## Task 1 
Assuming that the null hypothesis is true, **derive the formula for computing the probability** that the table with results will have the same values as in *Table 1* for given values $a$, $b$, $c$, and $d$ ($N=a+b+c+d$). 

**Remarks:**

* Stating that the probability corresponds to the hypergeometric probability distribution is insufficient. You should explain the meaning of all binomial coefficients or factorials in the formulas you use!
* You can write the complete derivation of the formula for the probability of *Table 1* by hand and submit it as a scanned picture (or an image captured by, e.g., a mobile phone) in a separate file.
* You can get at most **3 points** for this part.


We compute the numerator and the denominator separately.
First, we count the number of ways to distribute the $N$ elements among the four cells so that the cells contain exactly $a$, $b$, $c$, and $d$ elements, as in the table.

$$\binom{N}{a,b,c,d} = \binom{N}{a} \binom{N-a}{b} \binom{N-a-b}{c} \binom{N-a-b-c}{d} =
\binom{N}{a} \binom{N-a}{b} \binom{N-a-b}{c} \cdot 1 $$

Each binomial coefficient chooses from the elements left over by the previous ones.
After choosing $a$, $b$, and $c$ elements, exactly $N-a-b-c = d$ elements remain, so the last coefficient equals $\binom{d}{d} = 1$.

Then
$$\binom{N}{a,b,c,d} = \frac{N!}{a!(N-a)!} \cdot \frac{(N-a)!}{b!(N-a-b)!} \cdot \frac{(N-a-b)!}{c!(N-a-b-c)!} = \frac{N!}{a!b!c!d!}$$
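
The identity above can be checked numerically. This is a small sketch with arbitrary example counts; the helper names are hypothetical.

```python
from math import comb, factorial

def multinomial_via_binomials(a, b, c, d):
    """Choose the a, b, c and d elements one group after another."""
    n = a + b + c + d
    return comb(n, a) * comb(n - a, b) * comb(n - a - b, c) * comb(d, d)

def multinomial_via_factorials(a, b, c, d):
    """Closed form N! / (a! b! c! d!)."""
    n = a + b + c + d
    return factorial(n) // (factorial(a) * factorial(b) * factorial(c) * factorial(d))

# Arbitrary example counts, chosen only for illustration.
example = (4, 10, 7, 3)
print(multinomial_via_binomials(*example) == multinomial_via_factorials(*example))
```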

Next, we count all possible outcomes.
To do so, we compute the number of ways to arrange the $N$ elements so that the row sums are $(a+b)$ and $(c+d)$ and the column sums are $(a+c)$ and $(b+d)$. Under the null hypothesis, the assignment of visitors to rows (versions) is independent of their assignment to columns (click or no click), so the two counts multiply.

We carry out the computation for the row sums; the column sums are handled in the same way:
$$\binom{N}{(a+b),(c+d)} = \binom{N}{a+b}\binom{N-(a+b)}{c+d} = \frac{N!}{(a+b)!(N-(a+b))!} \cdot \frac{(N-a-b)!}{(c+d)!(N-a-b-c-d)!} = \frac{N!}{(a+b)!(c+d)!}$$

Analogously, for the column sums we get
$$\binom{N}{(a+c),(b+d)} = \frac{N!}{(a+c)!(b+d)!}$$

Now we divide the number of ways to obtain the table with cells $a$, $b$, $c$, $d$ by the number of ways to obtain the marginal row and column sums:

$$ p = \frac{\frac{N!}{a!b!c!d!}}{\frac{N!}{(a+b)!(c+d)!} \cdot \frac{N!}{(a+c)!(b+d)!}} = \frac{(a+b)!(c+d)!(a+c)!(b+d)!}{N!a!b!c!d!} $$


The probability of observing the contingency table as in Table 1 is given by the hypergeometric probability:


$$P(X=a) = \frac{\binom{a+b}{a} \cdot \binom{c+d}{c}}{\binom{N}{a+c}}$$


Given that:
- $N$ is the total number of observations,
- $a$, $b$, $c$, $d$ are the observed frequencies as described in Table 1,
- $r_1 = a + b$ and $r_2 = c + d$ are the marginal frequencies of the rows,
- $s_1 = a + c$ and $s_2 = b + d$ are the marginal frequencies of the columns.

Here's the derivation of the formula:

1. The number of ways to choose $a$ successes (click YES) out of the $r_1$ observations in the first row is given by
$$\binom{r_1}{a}$$
2. The number of ways to choose $c$ successes (click YES) out of the $r_2$ observations in the second row is given by
$$\binom{r_2}{c}$$
3. The number of ways to choose all $a+c$ successes (click YES) out of the $N$ total observations is given by
$$\binom{N}{a+c}$$

Therefore, the probability $P(X=a)$ can be expressed as:

$$P(X=a) = \frac{\binom{r_1}{a} \cdot \binom{r_2}{c}}{\binom{N}{a+c}}$$

This is the formula for computing the probability of observing the contingency table as in Table 1 for given values $a$, $b$, $c$, and $d$, and it is the same as the one we saw in the lecture.
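
As a sanity check, the factorial form and the binomial form can be compared numerically with each other and with `scipy.stats.hypergeom`. This is a sketch with hypothetical helper names; the table with all cells equal to 5 is only an example.

```python
from math import comb, factorial
from scipy.stats import hypergeom

def prob_factorials(a, b, c, d):
    """(a+b)! (c+d)! (a+c)! (b+d)! / (N! a! b! c! d!)"""
    n = a + b + c + d
    num = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    den = factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d)
    return num / den

def prob_binomials(a, b, c, d):
    """C(a+b, a) * C(c+d, c) / C(N, a+c)"""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

a, b, c, d = 5, 5, 5, 5
# scipy parametrisation: hypergeom.pmf(k, M, n, N) with population size M,
# n "success" items in the population and N draws.
p_scipy = hypergeom.pmf(a, a + b + c + d, a + c, a + b)
print(prob_factorials(a, b, c, d), prob_binomials(a, b, c, d), p_scipy)
```

All three values should agree up to floating-point rounding.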

## Task 2
In a campaign, the following counts were observed:

#### *Table 2*
|          | Click YES | Click NO | Row sum |
|----------|-----------|----------|---------|
| Sample A |   4       |   10     | 14      |
| Sample B |   7       |   3      | 10      |
| **Sum**  |  11       |   13     | 24      |

Implement the function `table_probability(t)` that computes the probability of a table in the form of *Table 1* (`t` is a 2-by-2 numpy array with the values $a$, $b$, $c$ and $d$ in two rows and two columns), assuming the null hypothesis is valid and using the formula you have derived in **Task 1**.

In [104]:
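def table_probability(table):
    """Probability of a 2-by-2 table under the null hypothesis (all margins fixed)."""
    (a, b), (c, d) = [[int(x) for x in row] for row in table]  # exact Python ints
    n = a + b + c + d
    # Formula from Task 1: (a+b)! (c+d)! (a+c)! (b+d)! / (N! a! b! c! d!)
    numerator = (math.factorial(a + b) * math.factorial(c + d)
                 * math.factorial(a + c) * math.factorial(b + d))
    denominator = (math.factorial(n) * math.factorial(a) * math.factorial(b)
                   * math.factorial(c) * math.factorial(d))
    return numerator / denominator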



Using the function, calculate the probability of *Table 2*. Additionally, your implementation should pass all the tests below and some additional hidden tests (**1 point for this part**).

In [10]:
table = np.array([[5,5], [5,5]])
assert np.isclose(table_probability(table), 0.343718)
table_2 = np.array([[4, 10], [7, 3]])
print(f"{table_2} has probability {table_probability(table_2)}")


[[ 4 10]
 [ 7  3]] has probability 0.04812222371786243
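
Exact factorials grow very quickly, so for large counts it can be convenient to work with logarithms instead. The following variant is a sketch of that idea using `math.lgamma`; the name `table_probability_log` is hypothetical and the function is not required by the assignment.

```python
import math

def table_probability_log(table):
    """Probability of a 2-by-2 table with fixed margins, computed via log-factorials."""
    (a, b), (c, d) = table
    n = a + b + c + d

    def log_fact(k):
        return math.lgamma(k + 1)   # log(k!)

    log_p = (log_fact(a + b) + log_fact(c + d) + log_fact(a + c) + log_fact(b + d)
             - log_fact(n) - log_fact(a) - log_fact(b) - log_fact(c) - log_fact(d))
    return math.exp(log_p)

print(table_probability_log([[5, 5], [5, 5]]))
print(table_probability_log([[4, 10], [7, 3]]))
```

The results agree with the factorial-based values up to floating-point rounding.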


## Task 3

The difference between versions *A* and *B* of the website is evident. Is this difference statistically significant? That is, assuming that both samples *A* and *B* are from the same probability distribution, what is the probability that two samples differ to the same or even higher extent? If this probability is small, e.g., at most $\alpha=0.05$, we can state with high confidence $(1−\alpha)=0.95$ that the null hypothesis is not valid. Based on the marginal sums ($a+b$, $c+d$, $a+c$, and $b+d$), we can easily compute that the expected value of the field $a$ is $14 \cdot 11 / 24 \approx 6.42$. The notion of "differing to the same or even greater extent" can be understood in two ways:

1. one-sided &ndash; only the values of $a$ that lie on the same side of the expected value as the observed $a=4$; in our case, the values 1, 2, 3, and 4, or
2. two-sided &ndash; all the values of $a$ such that $|a−6.42|\ge 6.42−4$; in our case, the values 1, 2, 3, 4, 9, 10, and 11.
  
In case 1, we use a one-tailed test; in case 2, we use a two-tailed test.
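
The two notions can be made explicit by listing the feasible values of $a$ for fixed marginal sums. The sketch below uses a hypothetical table chosen only for illustration, and the helper name `tail_values` is an assumption.

```python
import numpy as np

def tail_values(table):
    """List the values of a that count as 'at least as extreme' as the observed one."""
    a, b, c, d = (int(x) for x in np.asarray(table).ravel())
    row1, col1, col2 = a + b, a + c, b + d
    n = a + b + c + d
    expected = row1 * col1 / n
    # Values of a for which all four cells stay non-negative with fixed margins.
    feasible = range(max(0, row1 - col2), min(row1, col1) + 1)
    if a <= expected:
        one_sided = [k for k in feasible if k <= a]
    else:
        one_sided = [k for k in feasible if k >= a]
    two_sided = [k for k in feasible if abs(k - expected) >= abs(a - expected)]
    return expected, one_sided, two_sided

# Hypothetical table, chosen only to illustrate the two notions.
expected, one_sided, two_sided = tail_values([[8, 2], [1, 5]])
print(expected)     # 5.625
print(one_sided)    # [8, 9]
print(two_sided)    # [3, 8, 9]
```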

Answer the following question (**1 point**)

> **Question:** In general (with sufficient data), which of the four combinations of tests {one-tailed, two-tailed} × {Fisher's test, χ2-test} are meaningful?

All four combinations can be meaningful, but which one is appropriate depends on the situation:

Two-tailed χ² test: This is the most common choice when there is sufficient data and we want to assess whether the distributions of two categories differ significantly (here, versions A and B of the website). It considers deviations from the expected value in both directions (greater or lower than expected).

Two-tailed Fisher's exact test: This is an alternative to the χ² test that is particularly useful for small sample sizes. It provides an exact probability for the observed table, whereas the χ² test relies on an approximation. With sufficient data, the χ² approximation becomes accurate, which makes Fisher's exact test less necessary.

One-tailed χ² test: This is only appropriate when there is a strong prior expectation that the difference lies in one specific direction (e.g., version A is expected to have a higher click-through rate than B). Because the χ² statistic squares the deviations, it does not distinguish directions by itself; for a 2×2 table, the direction has to be taken from the sign of the observed deviation. The test ignores deviations in the other direction.

One-tailed Fisher's exact test: Like the one-tailed χ² test, this is only useful with a strong directional hypothesis, and mainly for small sample sizes. With sufficient data, a two-tailed test is generally preferred.
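
For a 2×2 table, a directional version of the χ² test can be sketched through the signed square root of the statistic, which is approximately standard normal under the null hypothesis. The helper below is an illustration under that assumption, applied to the counts of *Table 2*; it is not the method required by the assignment.

```python
import numpy as np
from scipy.stats import chi2, norm

def chi2_one_and_two_sided(table):
    """Uncorrected chi-square test for a 2-by-2 table with a directional variant."""
    observed = np.asarray(table, dtype=float)
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    stat = ((observed - expected) ** 2 / expected).sum()
    p_two_sided = chi2.sf(stat, df=1)
    # With one degree of freedom, the statistic is the square of a standard normal z;
    # the sign of z is taken from the deviation of the upper-left cell.
    z = np.sign(observed[0, 0] - expected[0, 0]) * np.sqrt(stat)
    p_one_sided = norm.sf(abs(z))   # tail in the observed direction
    return stat, z, p_two_sided, p_one_sided

stat, z, p2, p1 = chi2_one_and_two_sided([[4, 10], [7, 3]])
print(f"chi2 = {stat:.4f}, z = {z:.4f}")
print(f"two-sided p = {p2:.4f}, one-sided p = {p1:.4f}")
```

In this construction the one-sided p-value is half of the two-sided one.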


Using function `table_probability`, implement the function `Fisher_p_value(table, alternative)` that for the contingency table `table` of dimension 2$\times$2 computes the p-value of Fisher's test. The parameter `alternative` can have only two values:
1. For `alternative` equal to 'two-tailed', the function returns the p-value of the two-tailed Fisher's test.
2. For `alternative` equal to 'one-tailed', the function returns the p-value of the one-tailed Fisher's test, i.e., the probability that 
   the observed counts are as seen in `table` or are more extreme &ndash; further from the expected ones *in the same 
   direction* from the expected counts as `table`. If `table` contains exactly the expected counts, we will consider the direction for the values in the upper left corner less or equal to $a$.

In [211]:
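def Fisher_p_value(table, alternative='two-tailed'):
    if alternative not in ('two-tailed', 'one-tailed'):
        raise ValueError("alternative must be 'two-tailed' or 'one-tailed'")

    a, b = table[0]
    c, d = table[1]
    observed_probability = table_probability(table)

    # Marginal sums, which stay fixed for every table we enumerate.
    total_a = a + c          # column sum "Click YES"
    total_b = b + d          # column sum "Click NO"
    row1_total = a + b
    row2_total = c + d
    expected_a = total_a * row1_total / (row1_total + row2_total)

    p_value = 0.0
    # Enumerate every feasible value of the upper-left cell.
    for a_prime in range(max(0, row1_total - total_b), min(row1_total, total_a) + 1):
        b_prime = row1_total - a_prime
        c_prime = total_a - a_prime
        d_prime = total_b - b_prime
        current_prob = table_probability([[a_prime, b_prime], [c_prime, d_prime]])

        if alternative == 'two-tailed':
            # Tables at most as probable as the observed one; the small relative
            # tolerance guards against floating-point ties.
            if current_prob <= observed_probability * (1 + 1e-7):
                p_value += current_prob
        elif (a <= expected_a and a_prime <= a) or (a > expected_a and a_prime >= a):
            # One-tailed: same direction from the expected count as the observed table.
            p_value += current_prob

    return p_value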

Now, we can compute p-values for both alternatives of the Fisher's test for *Table 2*. In the following cell, the function `Fisher_p_value` will be tested (including some hidden tests) and applied on `table_2` (**2 points** of the score).

In [212]:
t = np.array([[5,5], [5,5]])
alternative = 'two-tailed'
p = Fisher_p_value(t, alternative)
print(f"p-value of {alternative} Fisher's test for the table\n{t=}\n is {p}")
assert np.isclose(p, 1.0)

print('_'*30)

t = np.array([[5,5], [5,5]])
alternative = 'one-tailed'
p = Fisher_p_value(t, alternative)
print(f"p-value of {alternative} Fisher's test for the table\n{t=}\n is {p}")
assert np.isclose(p, 0.6718591)

print('='*20)

t = np.array([[38,5], [20,9]])
alternative = 'one-tailed'
p = Fisher_p_value(t, alternative)
print(f"p-value of {alternative} Fisher's test for the table\n{t=}\n is {p}")
assert np.isclose(p, 0.042128934)

alternative = 'two-tailed'
p = Fisher_p_value(t, alternative)
print(f"p-value of {alternative} Fisher's test for the table\n{t=}\n is {p}")
assert np.isclose(p, 0.0667809135)

print("="*60 + "\n" + "="*60)

# Example usage on an illustrative table (note: these counts differ from Table 2)
example_table = np.array([[10, 5], [15, 20]])
one_tailed_Fisher_p_value = Fisher_p_value(example_table, 'one-tailed')
two_tailed_Fisher_p_value = Fisher_p_value(example_table, 'two-tailed')
print(f"For the table\n{example_table}")
print(f"The p-value of the Fisher's one-tailed test is {one_tailed_Fisher_p_value}")
print(f"The p-value of the Fisher's two-tailed test is {two_tailed_Fisher_p_value}")

p-value of two-tailed Fisher's test for the table
t=array([[5, 5],
       [5, 5]])
 is 1.0
______________________________
p-value of one-tailed Fisher's test for the table
t=array([[5, 5],
       [5, 5]])
 is 0.6718591006516703
p-value of one-tailed Fisher's test for the table
t=array([[38,  5],
       [20,  9]])
 is 0.04212893437210189
p-value of two-tailed Fisher's test for the table
t=array([[38,  5],
       [20,  9]])
 is 0.06678091352515701
For the table
[[10  5]
 [15 20]]
The p-value of the Fisher's one-tailed test is 0.10826710971531815
The p-value of the Fisher's two-tailed test is 0.21653421943063633
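
The one-tailed p-values can be cross-checked against the hypergeometric distribution in `scipy.stats`. This is a sketch with a hypothetical helper name; it follows the direction convention described above (values less than or equal to $a$ when $a$ does not exceed its expected value).

```python
from scipy.stats import hypergeom

def one_tailed_via_hypergeom(table):
    """One-tailed Fisher p-value from the hypergeometric CDF / survival function."""
    (a, b), (c, d) = [[int(x) for x in row] for row in table]
    n_total, n_yes, n_row1 = a + b + c + d, a + c, a + b
    dist = hypergeom(n_total, n_yes, n_row1)   # a ~ Hypergeom(M, n, N)
    if a <= n_row1 * n_yes / n_total:
        return dist.cdf(a)        # P(X <= a)
    return dist.sf(a - 1)         # P(X >= a)

print(one_tailed_via_hypergeom([[38, 5], [20, 9]]))
print(one_tailed_via_hypergeom([[5, 5], [5, 5]]))
```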


While ignoring the requirement that the χ2-test can be used only if all expected counts in the contingency table are at least 5, compute the χ2-statistic and its corresponding p-value for all meaningful combinations of the χ2-test for *Table 2*.

For computing the χ2-test, use suitable functions from the Python `scipy` module like `scipy.stats.chi2.pdf()`, `scipy.stats.chi2.cdf()`, `scipy.stats.chi2.sf()`, and `scipy.stats.chi2.isf()`. Your code should end by storing the value of the χ2-statistic of any of the meaningful combinations in the variable `x2_stat` and its corresponding p-value in the variable `x2_p_value`.
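
The relationship between these functions can be sketched as follows; the statistic `x = 4.0` is an arbitrary value used only for illustration.

```python
from scipy.stats import chi2

df = 1          # degrees of freedom of a 2-by-2 table
alpha = 0.05    # significance level used in this assignment

critical = chi2.isf(alpha, df)       # value x with P(X > x) = alpha
print(f"critical value: {critical:.4f}")

x = 4.0                               # an arbitrary statistic, for illustration only
print(chi2.sf(x, df))                 # upper tail P(X > x), i.e. the p-value
print(1 - chi2.cdf(x, df))            # the same quantity via the CDF
```

A statistic above the critical value corresponds to a p-value below `alpha`.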

In [130]:

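table_2 = np.array([[4, 10], [7, 3]])
observed = table_2

# Marginal sums and the total number of observations
row_totals = np.sum(observed, axis=1)
col_totals = np.sum(observed, axis=0)
total = np.sum(observed)

# Expected counts under the null hypothesis: (row sum * column sum) / N
expected = np.outer(row_totals, col_totals) / total

# Pearson's chi-square statistic (no continuity correction)
chi2_stat = np.sum((observed - expected)**2 / expected)

# A 2-by-2 table has (2 - 1) * (2 - 1) = 1 degree of freedom
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(chi2_stat, df)  # upper-tail probability P(X >= chi2_stat)

x2_stat = chi2_stat
x2_p_value = p_value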

In the following cell, your results (`x2_stat` and `x2_p_value`) will be evaluated (**1 point**).

In [131]:

print(f"{x2_stat=}")
print(f"{x2_p_value=}")


x2_stat=4.032767232767235
x2_p_value=0.04462468800071265


Of course, `scipy` contains functions for computing Fisher's exact test and χ2-test. Compute the above tests for *Table 2* using the functions `scipy.stats.fisher_exact()` and `scipy.stats.chi2_contingency()`  (**1 point**).

In [67]:
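from scipy.stats import fisher_exact, chi2_contingency
import numpy as np

# Named differently from the custom Fisher_p_value above so that it is not overwritten.
def scipy_fisher_test(table, alternative: str):
    """Fisher's exact test via scipy; returns (odds ratio, p-value)."""
    if alternative == 'two-tailed':
        odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
    elif alternative == 'one-tailed':
        # Pick the tail on the same side of the expected count as the observed a.
        (a, b), (c, d) = table
        expected_a = (a + b) * (a + c) / (a + b + c + d)
        direction = "less" if a <= expected_a else "greater"
        odds_ratio, p_value = fisher_exact(table, alternative=direction)
    else:
        raise ValueError("Invalid alternative. Use 'two-tailed' or 'one-tailed'.")

    return odds_ratio, p_value

def chi2_test(table):
    # correction=False disables Yates' continuity correction, which scipy
    # applies by default to 2-by-2 tables.
    statistic, p_value, _, _ = chi2_contingency(table, correction=False)
    return statistic, p_value

table_2 = np.array([[4, 10], [7, 3]])

print("\nFisher's exact test  :")
odds_ratio, p_value = scipy_fisher_test(table_2, "two-tailed")
print("Odds ratio:", odds_ratio)
print("p-value:", p_value)

print("\nChi-square test:")
chi2_stat, p_value_chi2 = chi2_test(table_2)
print("Chi-square:", chi2_stat)
print("p-value:", p_value_chi2)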


Fisher's exact test  :
Odds ratio: 0.17142857142857143
p-value: 0.0953021941041863

Chi-square test:
Chi-square: 4.032767232767235
p-value: 0.04462468800071265
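
A one-sided Fisher's test is also available in scipy through the `alternative` parameter. The sketch below assumes that the tail of interest is the one containing the observed table (here `"less"`, because the observed $a=4$ lies below its expected value) and compares the result with a direct sum of table probabilities.

```python
from math import comb
import numpy as np
from scipy.stats import fisher_exact

table_2 = np.array([[4, 10], [7, 3]])

# The observed a = 4 is below its expected value, so the relevant tail is "less".
_, p_less = fisher_exact(table_2, alternative="less")

# Direct sum over the feasible values a = 1, ..., 4 with the margins held fixed.
p_direct = sum(comb(14, k) * comb(10, 11 - k) for k in range(1, 5)) / comb(24, 11)

print(p_less, p_direct)
```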


## Final evaluation

1. State explicitly whether we can or cannot reject the null hypothesis for each test you have performed for *Table 2*.
2. Further, compare the results obtained with your implementation of the tests and the results obtained when using the functions 
   `scipy.stats.fisher_exact()` and `scipy.stats.chi2_contingency()`. If there are any differences, explain them (**1 point**). 


- Null Hypothesis ($H_0$): The efficacy of both versions A and B is the same.
- Alternative Hypothesis ($H_1$): The efficacy of version A is different from version B.

1. Using Fisher's exact test:
   * Odds ratio: 0.1714
   * p-value: 0.0953 (two-tailed)

**Conclusion: Since the p-value (0.0953) is greater than the significance level (0.05), we fail to reject the null hypothesis. The data do not show a statistically significant difference in click-through rates between versions A and B.**

2. Using the Chi-square test:
   * Chi-square statistic: 4.0328
   * p-value: 0.0446

**Conclusion: Since the p-value (0.0446) is less than the significance level (0.05), we reject the null hypothesis. There is a statistically significant difference in click-through rates between versions A and B.**

### Final Evaluation:
1. Fisher's exact test:
* We fail to reject the null hypothesis as the p-value is greater than the significance level (0.05).

2. Chi-square test:
* We reject the null hypothesis as the p-value is less than the significance level (0.05).

### Comparison of Results:

There is no discrepancy between the results obtained with the custom implementation and the results obtained using `scipy.stats.fisher_exact()` and `scipy.stats.chi2_contingency()`.

Fisher's exact test (two-tailed):
* Custom implementation: p-value = 0.0953
* `scipy.stats.fisher_exact()`: p-value = 0.0953

Both implementations yield the same p-value.

Chi-square test:
* Custom implementation: p-value = 0.0446
* `scipy.stats.chi2_contingency()`: p-value = 0.0446

Both implementations yield the same p-value because `chi2_contingency()` was called with `correction=False`. By default, the function applies Yates' continuity correction to 2×2 tables, which lowers the statistic and raises the p-value compared with the uncorrected custom computation.

The two tests lead to different conclusions at $\alpha = 0.05$: Fisher's exact test does not reject the null hypothesis, whereas the $\chi^2$-test does. One expected count in *Table 2* is below 5 ($10 \cdot 11 / 24 \approx 4.58$), so the $\chi^2$ approximation is questionable here and the result of Fisher's exact test is the more trustworthy one.
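
The effect of Yates' continuity correction on *Table 2* can be sketched as follows. The manual formula is the textbook form of the correction and is given here as an illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

table_2 = np.array([[4, 10], [7, 3]])

stat_plain, p_plain, _, expected = chi2_contingency(table_2, correction=False)
stat_yates, p_yates, _, _ = chi2_contingency(table_2)   # correction=True is the default

# Manual Yates statistic: shrink each |observed - expected| by 0.5 before squaring.
# (Applicable here because every |observed - expected| exceeds 0.5.)
manual_yates = np.sum((np.abs(table_2 - expected) - 0.5) ** 2 / expected)

print(f"uncorrected: chi2 = {stat_plain:.4f}, p = {p_plain:.4f}")
print(f"Yates:       chi2 = {stat_yates:.4f}, p = {p_yates:.4f}")
print(f"manual Yates chi2 = {manual_yates:.4f}")
```

The corrected statistic is smaller than the uncorrected one, so its p-value is larger.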






