# Advanced Modeling

<hr>

## Non-parametric statistical testing

Parametric hypothesis testing assumes that the data follow a known underlying distribution (*e.g. Gaussian for t-tests*). Non-parametric statistical testing can still give directional conclusions, such as whether one treatment outperforms another, even when the underlying distributions are unknown.

Here are a few non-parametric tests:

- **McNemar's Test** (*yes/no data*)

    Compares paired yes/no responses. For example, two competing treatments for a virus are each tested on the same 100 samples, and treatments A and B record 61 and 61 successes respectively. Is B better?

    Both treatments are applied to all 100 samples and the outcomes are tabulated in the 2x2 table below, where $b$ and $c$ count the discordant pairs (the samples on which the two treatments disagree). With $p_b$ and $p_c$ the probabilities of the two discordant outcomes, the hypotheses and the test statistic are as follows:

    $H_0: p_b = p_c$

    $H_1 : p_b \neq p_c$

    $\chi^2 = \frac{(b-c)^2}{b+c}$

    If $b+c$ is sufficiently large ($\geq 25$), the statistic approximately follows a chi-square distribution with 1 degree of freedom under $H_0$.

    Otherwise, consider an exact binomial test, where $b$ is compared to a binomial distribution with size parameter $n = b + c$ and $p = 0.5$. Taking $b$ as the larger of the two discordant counts, the two-sided p-value is:

    $\text{two-sided p-value} = 2 \sum_{i=b}^{n} \binom{n}{i} 0.5^i (1-0.5)^{n-i}$
    
    
    
| | Treatment B +ve | Treatment B -ve | Total |
| --------------- | --------------- | --------------- | --- |
| Treatment A +ve | a | b | a+b |
| Treatment A -ve | c | d | c+d |
| Total           | a+c | b+d | N |
    
    

<img alt="McNemar Test" src="assets/mcnemar.png" width="400" >
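The McNemar procedure above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function name `mcnemar`, the `exact_threshold` parameter and the counts in the usage lines are all hypothetical, and the chi-square tail uses the identity $P(\chi^2_1 > x) = \text{erfc}(\sqrt{x/2})$, which holds only for 1 degree of freedom.

```python
from math import comb, erfc, sqrt


def mcnemar(b: int, c: int, exact_threshold: int = 25) -> tuple[float, float, str]:
    """Return (statistic, two-sided p-value, method) from the discordant counts b and c."""
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs: the test is undefined")
    if n >= exact_threshold:
        # Chi-square approximation with 1 degree of freedom.
        stat = (b - c) ** 2 / n
        # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
        return stat, erfc(sqrt(stat / 2)), "chi-square"
    # Exact binomial test: under H0 each discordant pair is a fair coin flip.
    k = max(b, c)  # use the larger count so the upper tail is the relevant one
    tail = sum(comb(n, i) for i in range(k, n + 1)) * 0.5 ** n
    return float(k), min(1.0, 2 * tail), "exact binomial"


# Hypothetical discordant counts, chosen only to exercise both branches.
print(mcnemar(10, 30))  # b + c = 40, so the chi-square approximation is used
print(mcnemar(5, 15))   # b + c = 20, so the exact binomial test is used
```

Only the discordant cells $b$ and $c$ enter the test; the concordant cells $a$ and $d$ carry no information about which treatment is better.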


- **Wilcoxon Signed Rank Test for Medians** (*numeric data*)

    Similarly, no particular distribution family is assumed; the only assumption is that the distribution is continuous and symmetric. The test answers whether the median of the distribution differs from a specific value $m$.
    
    Given responses $y_1, \dots, y_n$:
    
    1. Rank $\vert y_1 - m\vert, \dots, \vert{y_n - m}\vert$ from smallest to largest
    2. Add up all ranks where $y_i > m$ such that $W = \sum_{y_i > m} \text{rank}(\vert y_i - m \vert)$
    3. Compute the p-value of $W$ under the null hypothesis that the median equals $m$
    
    The test can also be applied to paired samples, to test whether the differences within pairs have median zero. Given pairs $(y_1, z_1), \dots, (y_n, z_n)$ from observations $y$ and $z$, rank $\vert y_1 - z_1 \vert, \dots, \vert y_n - z_n \vert$ and add up the ranks where $y_i > z_i$.
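The three steps above can be sketched as follows. This is an illustrative sketch only: the names `average_ranks` and `wilcoxon_signed_rank` and the sample data are hypothetical, ties are given average ranks, zero differences are dropped (one common convention), and the p-value uses a large-sample normal approximation without tie correction, which is rough for small $n$.

```python
from math import erfc, sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from smallest (rank 1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_signed_rank(y: list[float], m: float = 0.0) -> tuple[float, float]:
    """Return (W, approximate two-sided p-value) for H0: the median of y equals m."""
    diffs = [v - m for v in y if v != m]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all observations equal m: the test is undefined")
    ranks = average_ranks([abs(d) for d in diffs])  # step 1: rank |y_i - m|
    w = sum(r for d, r in zip(diffs, ranks) if d > 0)  # step 2: sum ranks where y_i > m
    # Step 3: normal approximation to the null distribution of W.
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w, erfc(abs(w - mean) / sd / sqrt(2))


# Hypothetical data; for paired samples, pass the differences y_i - z_i with m = 0.
print(wilcoxon_signed_rank([1.5, 2.5, 3.5, 4.5, -0.5], m=0.0))
```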


- **Mann-Whitney test** (*two independent samples, not paired samples*)

    Given independent observations $y_1, \dots, y_n$ and $z_1, \dots, z_m$:
    
    1. Rank all observations together: $y_1, \dots, y_n$, $z_1, \dots, z_m$
    2. Compute $U$, the smaller of the two adjusted rank sums, and find its significance
    
        $U = \min\{U_y, U_z\}$
        
        where
        
        $U_y = \sum_{i=1}^{n} \text{rank}(y_i) - \frac{n(n+1)}{2}$
        
        $U_z = \sum_{j=1}^{m} \text{rank}(z_j) - \frac{m(m+1)}{2}$
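A minimal sketch of the Mann-Whitney computation above, using only the standard library. The function names and sample data are hypothetical, ties receive average ranks, and the significance step uses a large-sample normal approximation (mean $nm/2$, variance $nm(n+m+1)/12$) without tie or continuity correction, so treat the p-value as a rough guide for small samples.

```python
from math import erfc, sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from smallest (rank 1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(y: list[float], z: list[float]) -> tuple[float, float]:
    """Return (U, approximate two-sided p-value) for two independent samples."""
    n, m = len(y), len(z)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(y) + list(z))  # step 1: rank all observations together
    u_y = sum(ranks[:n]) - n * (n + 1) / 2    # adjusted rank sum for y
    u_z = sum(ranks[n:]) - m * (m + 1) / 2    # adjusted rank sum for z
    u = min(u_y, u_z)                         # step 2: the smaller of the two
    mean = n * m / 2
    sd = sqrt(n * m * (n + m + 1) / 12)
    return u, min(1.0, erfc(abs(u - mean) / sd / sqrt(2)))


# Hypothetical samples.
print(mann_whitney_u([1, 3, 5], [2, 4, 6]))
```

A useful sanity check is that $U_y + U_z = nm$, so a value of $U$ near $0$ means the two samples barely overlap.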
            
****

## Bayesian Modeling

Bayesian modeling is based on conditional probability, via Bayes' rule (theorem). It is useful when the overall distribution of something is known or can be estimated, but very little data is available.

$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$

Example: *Estimating the strength difference between two sports teams*

Suppose the difference in points scored is approximately normally distributed: $X \sim N(m+h, \sigma^2)$

- $h$: home court advantage
- $m$: true difference in the teams' strength (unknown)
- $\sigma^2$: variance

Start with a prior distribution on the real difference between the two teams, $m$, such that:

$m \sim N(0, \tau^2)$, i.e. the prior is centered on no difference between the teams

Given the observed data $x$, the point difference in a game, compute the posterior distribution of the true difference between the two teams:

$P(M = m | X = x) = \frac{P(X=x|M=m)P(M=m)}{P(X=x)}$

The posterior distribution is then approximately normal:

$M \mid X = x \sim N\left(\frac{\tau^2}{\tau^2 + \sigma^2} (x-h), \frac{\tau^2\sigma^2}{\tau^2 + \sigma^2}\right)$
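A sketch of where this posterior comes from, assuming a single observed game and treating $h$, $\sigma^2$ and $\tau^2$ as known. Multiply the likelihood by the prior and collect the terms in $m$:

$P(M = m | X = x) \propto \exp\left(-\frac{(x-h-m)^2}{2\sigma^2}\right) \exp\left(-\frac{m^2}{2\tau^2}\right)$

$\propto \exp\left(-\frac{1}{2}\left[\left(\frac{1}{\sigma^2} + \frac{1}{\tau^2}\right) m^2 - \frac{2(x-h)}{\sigma^2} m\right]\right)$

This is the kernel of a normal density in $m$. Its precision is $\frac{1}{\sigma^2} + \frac{1}{\tau^2} = \frac{\tau^2 + \sigma^2}{\tau^2\sigma^2}$, so the variance is $\frac{\tau^2\sigma^2}{\tau^2 + \sigma^2}$, and the mean is the variance times $\frac{x-h}{\sigma^2}$, which gives $\frac{\tau^2}{\tau^2 + \sigma^2}(x-h)$.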

We can determine the probability of one team being better than the other by integrating the posterior density over $m$ from $0$ to $\infty$: the area under the curve to the right of $0$ is the probability that the true difference $m$ is positive.

$P(\text{one team is better} | X = x) = \int_{0}^{\infty} P(M = m | X = x) \, dm$
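Because the posterior is normal, this integral has a closed form: it equals $\Phi(\mu / s)$, where $\mu$ and $s^2$ are the posterior mean and variance and $\Phi$ is the standard normal CDF. The sketch below illustrates this with the standard library; the function names and every numeric value are hypothetical and chosen only for illustration.

```python
from math import erf, sqrt


def posterior(x: float, h: float, sigma2: float, tau2: float) -> tuple[float, float]:
    """Posterior mean and variance of m after observing one point difference x."""
    shrink = tau2 / (tau2 + sigma2)  # weight placed on the data versus the prior mean of 0
    return shrink * (x - h), shrink * sigma2


def prob_positive(mean: float, var: float) -> float:
    """P(M > 0) for M ~ N(mean, var), i.e. the normal density integrated from 0 to infinity."""
    return 0.5 * (1 + erf(mean / sqrt(2 * var)))


# Hypothetical inputs: a 7-point observed difference, a 3-point home advantage,
# and equal game-to-game and prior variances of 4.
mean, var = posterior(x=7.0, h=3.0, sigma2=4.0, tau2=4.0)
print(mean, var, prob_positive(mean, var))
```

With equal variances the posterior mean lands halfway between the prior mean of $0$ and the adjusted observation $x - h$; a larger $\tau^2$ (a vaguer prior) pulls it closer to the data.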


****

## Communities in Graphs

****

## Neural Networks and Deep Learning

****

## Competitive Models

<hr>

# Basic code
A `minimal, reproducible example`