# AsaPy

### Asa Analysis

#### Analysis.hypothesis

    """ Performs hypothesis testing.

    Args:
        df : (Pandas DataFrame)
            Input data (must contain at least two distributions).
        alpha : (float, optional)
            Significance level: the p-value cutoff used to decide whether or not to reject H0. Defaults to 0.05.
        verbose : (bool, optional)
            Whether to display detailed messages. Defaults to False.

    Raises:
        ValueError: Input variable is empty.
        ValueError: Input data must contain at least two distributions.

    Returns:
        (Pandas DataFrame): Indicates which distributions are statistically similar.
    
    .. seealso::
    
        `pingouin.homoscedasticity <https://pingouin-stats.org/build/html/generated/pingouin.homoscedasticity.html#pingouin.homoscedasticity>`_: test for equality of variances.

        `pingouin.normality <https://pingouin-stats.org/build/html/generated/pingouin.normality.html#pingouin.normality>`_: normality test.

        `scipy.stats.f_oneway <https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html>`_: one-way ANOVA.

        `scipy.stats.tukey_hsd <https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.tukey_hsd.html>`_: Tukey's HSD test for equality of means.

        `scipy.stats.kruskal <https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html>`_: Kruskal-Wallis H-test for independent samples.

        `scikit_posthocs.posthoc_conover <https://scikit-posthocs.readthedocs.io/en/latest/tutorial.html>`_: Conover's test.

    """
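
The functions listed under "seealso" and the outputs in the cells below suggest a decision flow: check normality, check equality of variances, then pick an omnibus test. The following is a minimal sketch of such a flow using SciPy only. The function name `choose_test`, the use of `scipy.stats.normaltest`, and the exact branching (in particular, falling back to Welch ANOVA whenever variances differ) are assumptions made for illustration and are not taken from AsaPy's source.

```python
import numpy as np
from scipy import stats


def choose_test(samples, alpha=0.05):
    """Pick an omnibus test from normality and equal-variance checks.

    Illustrative only: the branching is an assumption inferred from the
    example outputs, not AsaPy's actual implementation.
    Returns a (test_name, p_value) tuple; p_value is None for Welch ANOVA,
    which SciPy does not provide directly.
    """
    # Normality: every sample must pass (p > alpha) to be treated as Gaussian.
    normal = all(stats.normaltest(s).pvalue > alpha for s in samples)
    # Homoscedasticity: Levene's test across all samples at once.
    equal_var = stats.levene(*samples).pvalue > alpha

    if not equal_var:
        # Assumption: unequal variances always lead to Welch ANOVA.
        return "welch_anova", None
    if normal:
        return "anova", stats.f_oneway(*samples).pvalue
    return "kruskal", stats.kruskal(*samples).pvalue


# Synthetic data: three Gaussian samples with very different spreads.
rng = np.random.default_rng(0)
samples = [rng.normal(0.0, scale, 200) for scale in (1.0, 10.0, 100.0)]
print(choose_test(samples))
```

When the omnibus test rejects H0, a post hoc test (Tukey's HSD after ANOVA, Conover's test after Kruskal-Wallis) would then identify which pairs differ, as the later cells show.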

In [1]:
import asapy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

In [2]:
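# Alias the class so later cells can call Analysis.hypothesis directly.
Analysis = asapy.Analysis

# Load the ANOVA example dataset.
df = pd.read_csv('datasets/ANOVA.csv')

# Run the hypothesis tests and print each intermediate result.
output = Analysis.hypothesis(df, verbose=True)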

Normality test
                W      pval  normal
-------  --------  --------  --------
class_0  1.13577   0.566723  True
class_1  1.06326   0.587647  True
class_2  0.480087  0.786594  True
class_3  0.670844  0.715036  True
class_4  0.838907  0.657406  True
Conclusion: Distributions can be considered Gaussian (normal).

Homoscedasticity test
               W      pval  equal_var
------  --------  --------  -----------
levene  0.606885  0.657845  True
Conclusion: Distributions have statistically similar variances (homoscedasticity).

ANOVA test
statistic = 0.9651033688405953, pvalue = 0.4262383534762838
Conclusion: Statistically, the samples correspond to the same distribution (ANOVA).

      dist1    dist2  same?
--  -------  -------  -------
 0        0        1  True
 1        0        2  True
 2        0        3  True
 3        0        4  True
 4        1        2  True
 5        1        3  True
 6        1        4  True
 7        2        3  True
 8        2        4  True

In [3]:
df = pd.read_csv('./datasets/Tukey.csv')
output = Analysis.hypothesis(df, verbose = True)

Normality test
                W      pval  normal
-------  --------  --------  --------
class_0  1.13577   0.566723  True
class_1  1.06326   0.587647  True
class_2  0.480087  0.786594  True
class_3  0.670844  0.715036  True
class_4  0.421267  0.810071  True
Conclusion: Distributions can be considered Gaussian (normal).

Homoscedasticity test
              W      pval  equal_var
------  -------  --------  -----------
levene  0.25236  0.908208  True
Conclusion: Distributions have statistically similar variances (homoscedasticity).

ANOVA test
statistic = 3.7967247638257873, pvalue = 0.004724714999604516
Conclusion: Statistically, the samples are different distributions (ANOVA).

Tukey test

Tukey's HSD Pairwise Group Comparisons (95.0% Confidence Interval)
Comparison  Statistic  p-value  Lower CI  Upper CI
 (0 - 1)      0.240     0.998    -1.865     2.345
 (0 - 2)     -0.010     1.000    -2.115     2.095
 (0 - 3)     -0.730     0.877    -2.835     1.375
 (0 - 4)     -2.350     0.020
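
The `same?` tables printed in these examples pair distributions by index. As a rough sketch of how such a table could be derived from Tukey's HSD, the snippet below compares each pairwise p-value from `scipy.stats.tukey_hsd` against `alpha`. The helper `pairwise_same` and the synthetic data are hypothetical and are not part of AsaPy.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats


def pairwise_same(samples, alpha=0.05):
    """Tabulate, for every pair of samples, whether Tukey's HSD fails to separate them."""
    result = stats.tukey_hsd(*samples)
    rows = [
        # result.pvalue is a square matrix indexed by sample position.
        {"dist1": i, "dist2": j, "same?": bool(result.pvalue[i, j] > alpha)}
        for i, j in combinations(range(len(samples)), 2)
    ]
    return pd.DataFrame(rows)


# Synthetic data: four samples around 0 and one shifted far away (mean 5).
rng = np.random.default_rng(0)
samples = [rng.normal(0.0, 1.0, 50) for _ in range(4)]
samples.append(rng.normal(5.0, 1.0, 50))

table = pairwise_same(samples)
print(table)
```

Five samples yield ten pairs, and every pair that involves the shifted sample should be flagged as different.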

In [4]:
df = pd.read_csv('./datasets/Kruskal.csv')
output = Analysis.hypothesis(df, verbose=True)

Normality test
            W         pval  normal
----  -------  -----------  --------
col1  52.7721  3.47286e-12  False
col2  48.0014  3.77241e-11  False
col3  14.3566  0.00076295   False
col4  58.1361  2.37637e-13  False
col5  42.1236  7.1281e-10   False
Conclusion: At least one distribution does not resemble a Gaussian (normal) distribution.

Homoscedasticity test
               W      pval  equal_var
------  --------  --------  -----------
levene  0.833985  0.503964  True
Conclusion: Distributions have statistically similar variances (homoscedasticity).

Kruskal test
statistic = 5.577362980773645, pvalue = 0.2330124429695932
Conclusion: Statistically, the samples correspond to the same distribution (Kruskal-Wallis).

      dist1    dist2  same?
--  -------  -------  -------
 0        0        1  True
 1        0        2  True
 2        0        3  True
 3        0        4  True
 4        1        2  True
 5        1        3  True
 6        1        4  True
 7        2        3  

In [5]:
df = pd.read_csv('./datasets/Conover.csv')
output = Analysis.hypothesis(df, verbose = True)

Normality test
             W         pval  normal
----  --------  -----------  --------
col1   74.7177  5.96007e-17  False
col2   31.6041  1.3717e-07   False
col3   40.6985  1.45356e-09  False
col4   10.2107  0.00606431   False
col5  212.599   6.8361e-47   False
Conclusion: At least one distribution does not resemble a Gaussian (normal) distribution.

Homoscedasticity test
              W       pval  equal_var
------  -------  ---------  -----------
levene  2.03155  0.0888169  True
Conclusion: Distributions have statistically similar variances (homoscedasticity).

Kruskal test
statistic = 182.22539784431183, pvalue = 2.480716493859747e-38
Conclusion: Statistically, the samples correspond to different distributions (Kruskal-Wallis).

Conover test
              1             2             3             4             5
1  1.000000e+00  3.280180e-04  8.963739e-01  1.632161e-08  6.805120e-21
2  3.280180e-04  1.000000e+00  5.316246e-04  3.410392e-02  2.724152e-35
3  8.963739e-01  5.316246e-

In [6]:
df = pd.read_csv('./datasets/Welch_ANOVA.csv')
output = Analysis.hypothesis(df, verbose = True)

Normality test
                   W         pval  normal
----------  --------  -----------  --------
Unnamed: 0   34.6736  2.95619e-08  False
col1         74.7177  5.96007e-17  False
col2         31.6041  1.3717e-07   False
col3         40.6985  1.45356e-09  False
col4         10.2107  0.00606431   False
col5        212.599   6.8361e-47   False
Conclusion: At least one distribution does not resemble a Gaussian (normal) distribution.

Homoscedasticity test
              W        pval  equal_var
------  -------  ----------  -----------
levene  7.21779  1.4219e-06  False
Conclusion: Distributions have statistically different variances (heteroscedasticity).

One-Way Welch ANOVA test
      dist1    dist2  same?
--  -------  -------  -------
 0        0        1  False
 1        0        2  False
 2        0        3  False
 3        0        4  False
 4        0        5  False
 5        1        2  False
 6        1        3  False
 7        1        4  False
 8        1        5  False
 9
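
The `Unnamed: 0` row in the normality table of this last example suggests that a saved index column in the CSV was read as data and tested as a sixth distribution. Assuming that is the case, passing `index_col=0` to `pandas.read_csv` keeps that column out of the analysis. The CSV text below is a hypothetical stand-in, not the contents of `Welch_ANOVA.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file written by DataFrame.to_csv(),
# which stores the row index as an unnamed first column by default.
csv_text = ",col1,col2\n0,1.5,2.5\n1,3.5,4.5\n"

# Default read: the index column becomes a data column named "Unnamed: 0".
with_index = pd.read_csv(io.StringIO(csv_text))

# Treat the first column as the index so only real data columns remain.
without_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_index.columns))     # ['Unnamed: 0', 'col1', 'col2']
print(list(without_index.columns))  # ['col1', 'col2']
```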