# Grubbs's test
In statistics, Grubbs's test or the Grubbs test (named after Frank E. Grubbs, who published the test in 1950[1]), also known as the maximum normalized residual test or extreme studentized deviate test, is a test used to detect outliers in a univariate data set assumed to come from a normally distributed population.
## Definition
Grubbs's test is based on the assumption of normality. That is, one should first verify that the data can be reasonably approximated by a normal distribution before applying the Grubbs test.

Grubbs's test detects one outlier at a time. This outlier is expunged from the dataset and the test is iterated until no outliers are detected. However, multiple iterations change the probabilities of detection, and the test should not be used for sample sizes of six or fewer since it frequently tags most of the points as outliers.

Grubbs's test is defined for the following hypotheses:

* $H_0$: There are no outliers in the data set
* $H_a$: There is exactly one outlier in the data set

The Grubbs test statistic is defined as:

\begin{equation}
    G =  \frac{\displaystyle\max_{i=1,\ldots, N}\left \vert Y_i - \bar{Y}\right\vert}{s}
\end{equation}

with $\overline {Y}$ and $s$ denoting the sample mean and standard deviation, respectively. The Grubbs test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation.
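As a minimal sketch of the statistic above, assuming the data are held in a NumPy array, $G$ can be computed as follows. The values in `y` are made up purely for illustration. Note that `ddof=1` is required to obtain the sample standard deviation, since NumPy's `std` defaults to the population form.

```python
import numpy as np

# Hypothetical data, chosen only to illustrate the computation
y = np.array([2.0, 3.0, 4.0, 5.0, 16.0])

y_bar = y.mean()
s = y.std(ddof=1)                  # sample standard deviation (N - 1 in the denominator)
deviations = np.abs(y - y_bar)     # |Y_i - Y_bar| for every observation

G = deviations.max() / s           # largest deviation in units of s, about 1.754
suspect = y[deviations.argmax()]   # observation that attains the maximum (16.0)
```

Whether this value of $G$ indicates an outlier depends on the critical value for the given $N$ and $\alpha$.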

This is the two-sided test, for which the hypothesis of no outliers is rejected at significance level $\alpha$ if

\begin{equation}
    G > \frac{N-1}{\sqrt{N}} \sqrt{\frac{t_{\alpha/(2N),N-2}^2}{N - 2 + t_{\alpha/(2N),N-2}^2}}
\end{equation}

with $t_{\alpha/(2N),N-2}$ denoting the upper critical value of the t-distribution with $N-2$ degrees of freedom and a significance level of $\alpha/(2N)$.

### One-sided case
The Grubbs test can also be defined as a one-sided test, in which case $\alpha/(2N)$ is replaced with $\alpha/N$ in the critical value. To test whether the minimum value is an outlier, the test statistic is

\begin{equation}
    G=\frac{\overline{Y}-Y_{min}}{s}
\end{equation}

with $Y_{min}$ denoting the minimum value. To test whether the maximum value is an outlier, the test statistic is

\begin{equation}
    G=\frac{Y_{max}-\overline{Y}}{s}
\end{equation}

with $Y_{max}$ denoting the maximum value.
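The critical values for the two-sided and one-sided variants come from the same expression; only the tail probability passed to the t-distribution changes. The helper below is an illustrative sketch: the name `grubbs_critical_value` and its parameters are assumptions made for this example, not part of any library. It relies on `scipy.stats.t.ppf` for the quantile.

```python
import numpy as np
from scipy import stats


def grubbs_critical_value(n, alpha=0.05, two_sided=True):
    """Critical value of G for sample size n at significance level alpha.

    Hypothetical helper written for this example, not a SciPy function.
    """
    if n < 3:
        raise ValueError("n must be at least 3 (the t-distribution needs n - 2 >= 1)")
    # Two-sided test uses alpha/(2N); one-sided test uses alpha/N
    tail = alpha / (2 * n) if two_sided else alpha / n
    t = stats.t.ppf(1 - tail, n - 2)          # upper critical value of Student's t
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))


g_two = grubbs_critical_value(4, alpha=0.05, two_sided=True)    # about 1.481
g_one = grubbs_critical_value(4, alpha=0.05, two_sided=False)   # about 1.463
```

The one-sided threshold is lower because the whole of $\alpha$ is spent on a single tail.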

In [6]:
from scipy import stats

In [45]:
n=4 # sample size; at alpha=0.05 the one-sided critical value is about 1.46
alpha=0.05

In [52]:
t_two=stats.t.ppf(1-alpha/(2*n), n-2)
t_two

8.860200034654257

In [53]:
#### first part

In [54]:
a=(n-1)/n**(1/2)
a

1.5

In [55]:
#### second part

In [56]:
b=((t_two**2)/(n-2+t_two**2))**(1/2)
b

0.9874999999999998

### Combined

In [57]:
a*b

1.4812499999999997

In [70]:
import numpy as np
x=np.array([2,3,5,6,1,5,3,2,6,8,6,9,10])

## Python function

In [71]:
import numpy as np
from scipy import stats

In [88]:
x=np.array([0,3,5,3,12,8,10,3,1,4,3,6,5,7,8,9,2,3,4,5,3])

In [112]:
def grubbs_test(x, alpha=0.05, type="two_side", side=None):
    # x: the data set to test
    # alpha: significance level of the test (confidence level is 1 - alpha, e.g. 90% if alpha=0.1); default 0.05
    # type: "two_side" (default) or "one_side"
    # side: only used when type="one_side"; "right" tests the maximum value, "left" the minimum value

    x = np.asarray(x, dtype=float)
    n = len(x)          # number of observations
    mean = x.mean()
    s = x.std(ddof=1)   # sample standard deviation, as required by the definition of G

    if type == "two_side":
        t = stats.t.ppf(1 - alpha / (2 * n), n - 2)  # upper critical value of Student's t
        suspect = x[np.argmax(np.abs(x - mean))]     # point farthest from the mean
        g = abs(suspect - mean) / s
    elif type == "one_side":
        t = stats.t.ppf(1 - alpha / n, n - 2)        # upper critical value of Student's t
        if side == "right":
            suspect = x.max()
            g = (suspect - mean) / s
        elif side == "left":
            suspect = x.min()
            g = (mean - suspect) / s
        else:
            raise ValueError('side must be "right" or "left" when type="one_side"')
    else:
        raise ValueError('type must be "two_side" or "one_side"')

    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))  # critical value of G

    # H_0 is rejected only when G exceeds the critical value
    if g > g_crit:
        print(f"H_a: there is exactly one outlier in the data set: {suspect}")
    else:
        print("H_0: at this significance level, no outlier is detected")

In [113]:
grubbs_test(x=x, type="one_side", side="left")

H_0: at this significance level, no outlier is detected
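
The iterative procedure described in the Definition section (remove the flagged point, then test again) can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name `grubbs_iterative`, the `min_size` guard, and the sample values are all hypothetical, and repeated testing changes the effective significance level, as noted earlier.

```python
import numpy as np
from scipy import stats


def grubbs_iterative(x, alpha=0.05, min_size=7):
    """Repeat the two-sided Grubbs's test, removing one flagged point per pass.

    Hypothetical helper: stops when no outlier is detected or when fewer
    than min_size points remain (the test is unreliable for six or fewer).
    Returns (remaining_data, removed_values).
    """
    data = np.asarray(x, dtype=float)
    removed = []
    while len(data) >= min_size:
        n = len(data)
        dev = np.abs(data - data.mean())
        s = data.std(ddof=1)            # sample standard deviation
        if s == 0:                      # all values identical, nothing to flag
            break
        g = dev.max() / s
        t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
        if g <= g_crit:                 # fail to reject H_0: stop iterating
            break
        idx = dev.argmax()
        removed.append(data[idx])
        data = np.delete(data, idx)
    return data, removed


# Hypothetical sample with one gross error (50.0)
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 50.0, 10.1]
clean, outliers = grubbs_iterative(sample)
```

With this made-up sample, the first pass flags 50.0 and the second pass finds nothing further, so the loop stops with nine points remaining.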
