# Tutorial: Permutation Testing using `acore`

In this notebook we will demonstrate how to use acore's permutation testing functions on metagenomics data collected by [Ju and colleagues (2018)](https://doi.org/10.1038/s41396-018-0277-8).

The samples were collected from wastewater treatment plant influent and effluent.

## Downloading the dataset

The analysed samples can be downloaded via the MGnify API.
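Both download URLs in the next cell appear to share one layout: the MGnify study accession, the pipeline version, and a file named after the ENA study accession (`ERP…`). The hypothetical helper `go_abundance_url` below simply rebuilds that layout from its parts. The pattern is inferred from these two URLs only, so check the file names MGnify actually lists for a study before using it elsewhere.

```python
def go_abundance_url(study: str, ena_study: str, version: str = "4.1") -> str:
    """Build a MGnify GO-abundance download URL (hypothetical helper).

    Assumes the URL layout of the two URLs used in this tutorial; other
    studies or pipeline versions may name their files differently.
    """
    base = "https://www.ebi.ac.uk/metagenomics/api/v1"
    filename = f"{ena_study}_GO_abundances_v{version}.tsv"
    return f"{base}/studies/{study}/pipelines/{version}/file/{filename}"


# rebuild the effluent URL used in the next cell
print(go_abundance_url("MGYS00005058", "ERP112696"))
```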

In [5]:
import requests
from io import StringIO
import pandas as pd

# GO abundance tables from MGnify: effluent (enf) and influent (inf)
enf_url = 'https://www.ebi.ac.uk/metagenomics/api/v1/studies/MGYS00005058/pipelines/4.1/file/ERP112696_GO_abundances_v4.1.tsv'
inf_url = 'https://www.ebi.ac.uk/metagenomics/api/v1/studies/MGYS00005056/pipelines/4.1/file/ERP111072_GO_abundances_v4.1.tsv'

# init dict to hold one dataframe per URL
data = {}
for url in [enf_url, inf_url]:
    print(f"Processing {url}")
    # download; the timeout stops the cell from hanging if the API is unresponsive
    response = requests.get(url, timeout=60)
    if response.status_code == 200:
        # wrap the decoded text in a file-like object for pandas
        file_like = StringIO(response.content.decode('utf-8'))
        # parse the tab-separated table
        df = pd.read_csv(file_like, sep='\t')
        # store it under its URL
        data[url] = df
    else:
        print(response)
        print(url, "download skipped")

# sanity check
data[enf_url].head()

Processing https://www.ebi.ac.uk/metagenomics/api/v1/studies/MGYS00005058/pipelines/4.1/file/ERP112696_GO_abundances_v4.1.tsv
Processing https://www.ebi.ac.uk/metagenomics/api/v1/studies/MGYS00005056/pipelines/4.1/file/ERP111072_GO_abundances_v4.1.tsv


|   | GO | description | category | ERR2985255 | ERR2985256 | ERR2985257 | ERR2985258 | ERR2985259 | ERR2985260 | ERR2985261 | ... | ERR2985269 | ERR2985270 | ERR2985271 | ERR2985272 | ERR2985273 | ERR2985274 | ERR2985275 | ERR2985276 | ERR2985277 | ERR2985278 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GO:0000001 | mitochondrion inheritance | biological process | 0 | 0 | 0 | 0 | 0 | 3 | 0 | ... | 1 | 0 | 1 | 1 | 1 | 2 | 0 | 0 | 2 | 2 |
| 1 | GO:0000002 | mitochondrial genome maintenance | biological process | 0 | 9 | 7 | 0 | 2 | 7 | 4 | ... | 15 | 10 | 0 | 0 | 0 | 5 | 0 | 4 | 11 | 1 |
| 2 | GO:0000012 | single strand break repair | biological process | 0 | 3 | 1 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 2 | 1 | 2 | 1 | 1 |
| 3 | GO:0000015 | phosphopyruvate hydratase complex | cellular component | 477 | 384 | 500 | 188 | 226 | 232 | 301 | ... | 203 | 432 | 152 | 341 | 319 | 520 | 290 | 349 | 270 | 292 |
| 4 | GO:0000030 | mannosyltransferase activity | molecular function | 189 | 200 | 57 | 67 | 39 | 129 | 181 | ... | 70 | 42 | 58 | 73 | 92 | 159 | 67 | 179 | 58 | 91 |
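
Each row of this table is one GO term: the first three columns annotate it, and every `ERR…` column holds the counts for one sequencing run. The sketch below builds a tiny stand-in table (two terms and two runs, values copied from the first rows above) to show one way of separating annotations from counts by indexing on the GO ID. The `^ERR` column filter assumes every count column is an ENA run accession, as it is here.

```python
from io import StringIO

import pandas as pd

# a two-term, two-run excerpt in the same tab-separated layout as the MGnify file
tsv_text = (
    "GO\tdescription\tcategory\tERR2985255\tERR2985256\n"
    "GO:0000001\tmitochondrion inheritance\tbiological process\t0\t0\n"
    "GO:0000002\tmitochondrial genome maintenance\tbiological process\t0\t9\n"
)
df = pd.read_csv(StringIO(tsv_text), sep="\t")

# index by GO ID and keep only the per-run count columns
counts = df.set_index("GO").filter(regex=r"^ERR")
print(counts)

# one term's counts across all runs, as a NumPy array
print(counts.loc["GO:0000002"].to_numpy())  # -> [0 9]
```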


For this demo we will only look at [GO:0030655 (beta-lactam antibiotic catabolic process)](https://gowiki.tamu.edu/wiki/index.php/Category:GO:0030655_!_beta-lactam_antibiotic_catabolic_process).

In [21]:
term = 'GO:0030655'

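# select the row for this GO term, drop the annotation columns,
# and take that single row as a 1-D NumPy array of per-sample counts
inf_go = data[inf_url].query("GO == @term").drop(columns=['GO', 'description', 'category']).to_numpy()[0]
enf_go = data[enf_url].query("GO == @term").drop(columns=['GO', 'description', 'category']).to_numpy()[0]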

inf_go, enf_go

(array([ 54, 101,  69,  52,  61,  68, 401,  72,  67,  80,  87,  49,  63,
         53, 345, 131,  67,  69,  42,  67, 107, 126,  63,  94]),
 array([  23,   87, 1026,   63,  311,   47,   71,  121,  109,  334,   40,
         111,  111,   31,   21,  919,   58,   95,   98,  120,   96,   84,
          26,   89]))
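
A paired test works on per-pair differences, so both arrays must have the same length, and position *i* in `inf_go` must refer to the sample matched with position *i* in `enf_go`. Here the pairing simply follows the column order of the two MGnify tables, which is an assumption worth checking against the study's sample metadata. The sketch below copies the printed arrays and computes the differences and their mean.

```python
import numpy as np

# counts printed above for GO:0030655 (influent, then effluent)
inf_go = np.array([54, 101, 69, 52, 61, 68, 401, 72, 67, 80, 87, 49, 63,
                   53, 345, 131, 67, 69, 42, 67, 107, 126, 63, 94])
enf_go = np.array([23, 87, 1026, 63, 311, 47, 71, 121, 109, 334, 40,
                   111, 111, 31, 21, 919, 58, 95, 98, 120, 96, 84,
                   26, 89])

# a paired comparison needs exactly one effluent value per influent value
assert inf_go.shape == enf_go.shape

diffs = inf_go - enf_go  # influent minus effluent, pair by pair
print(inf_go.mean(), enf_go.mean())  # about 99.5 and 170.46
print(diffs.mean())                  # about -70.96
```

The mean difference, 99.5 - 170.46 ≈ -70.96, matches the `observed` value that `paired_permutation` reports further down.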

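Next, import the paired permutation test from `acore` and, optionally, create a seeded random number generator so the results are reproducible.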

In [29]:
from acore.permutation_test import paired_permutation

# optional for reproducibility
import numpy as np
rng = np.random.default_rng(12345)
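
Seeding matters because two NumPy generators created with the same seed produce the same stream of random draws. The quick check below uses NumPy alone; the random ±1 signs are only an illustration of the kind of draws a paired permutation test might make, not a statement about how `acore` draws its permutations.

```python
import numpy as np

# two independent generators seeded identically
rng_a = np.random.default_rng(12345)
rng_b = np.random.default_rng(12345)

# identical seeds give identical draws, so a test that takes its
# permutations from either generator would see the same permutations
flips_a = rng_a.choice([-1, 1], size=10)
flips_b = rng_b.choice([-1, 1], size=10)
print(np.array_equal(flips_a, flips_b))  # True
```

A generator's state advances every time it is used, so if `paired_permutation` draws from the `rng` it is given (as the parameter suggests), re-running the test cell without first re-running this cell will generally give a slightly different p-value.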

In [31]:
result = paired_permutation(inf_go, enf_go, metric='mean', n_permutations=10000, rng=rng)

result

{'metric': <function mean at 0x130d17df0>,
 'observed': np.float64(-70.95833333333333),
 'p_value': np.float64(0.2664)}
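
Reading the result: `observed` is the mean of the paired differences (influent minus effluent), about -71, so for this term the effluent counts are on average higher (note that the counts are not normalised for sequencing depth in this tutorial). A `p_value` of 0.2664 indicates that roughly a quarter of the permutations gave a statistic at least as extreme as the observed one, so the difference is not significant at the conventional 0.05 level.

To illustrate what such a test does, here is a minimal sign-flip paired permutation test in plain NumPy. It is a sketch, not acore's implementation: the function name `sign_flip_test`, the two-sided absolute-value comparison, and the plain fraction used as the p-value are all assumptions, so its p-value need not equal acore's exactly.

```python
import numpy as np


def sign_flip_test(x, y, n_permutations=10_000, rng=None):
    """Two-sided paired permutation test of the mean difference (hypothetical sketch).

    Under the null hypothesis the two labels within each pair are
    exchangeable, so each difference x_i - y_i is equally likely to have
    either sign. Randomly flipping signs builds the null distribution.
    """
    # accepts None, an integer seed, or an existing Generator
    rng = np.random.default_rng(rng)
    diffs = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    observed = diffs.mean()

    # one row of random +1/-1 signs per permutation
    signs = rng.choice([-1.0, 1.0], size=(n_permutations, diffs.size))
    null_stats = (signs * diffs).mean(axis=1)

    # fraction of permuted statistics at least as extreme as the observed one
    p_value = np.mean(np.abs(null_stats) >= abs(observed))
    return {"observed": observed, "p_value": p_value}


# the GO:0030655 counts printed earlier (influent, then effluent)
inf_go = np.array([54, 101, 69, 52, 61, 68, 401, 72, 67, 80, 87, 49, 63,
                   53, 345, 131, 67, 69, 42, 67, 107, 126, 63, 94])
enf_go = np.array([23, 87, 1026, 63, 311, 47, 71, 121, 109, 334, 40,
                   111, 111, 31, 21, 919, 58, 95, 98, 120, 96, 84,
                   26, 89])

print(sign_flip_test(inf_go, enf_go, rng=12345))
```

Flipping the sign of a difference is the same as swapping the influent and effluent labels within that pair, which is why this scheme respects the pairing. With 24 pairs there are 2^24 (16,777,216) distinct sign patterns, so 10,000 random draws approximate the exact null distribution rather than enumerating it.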