# Meth_Comp API usage

## Import module

In [1]:
# Import main module 
from pycoMeth.Meth_Comp import Meth_Comp

# Optionally import Jupyter helper functions
from pycoMeth.common import head, jhelp

## Getting help

In [23]:
jhelp(Meth_Comp)

**Meth_Comp** (aggregate_fn_list, ref_fasta_fn, output_tsv_fn, output_bed_fn, max_missing, min_diff_llr, sample_id, verbose, quiet, progress, kwargs)

Compare methylation values for each CpG position or interval between n samples and perform a statistical test to evaluate whether the positions are significantly different. For 2 samples a Mann-Whitney test is performed; otherwise, multiple samples are compared with a Kruskal-Wallis test. P-values are adjusted for multiple testing using the Benjamini-Hochberg procedure for controlling the false discovery rate.
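To make the multiple-testing correction mentioned above concrete, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions, not part of pycoMeth, and the package's own implementation may differ.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Hypothetical helper for illustration only (not a pycoMeth function).
    """
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases (monotonicity)
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, one per tested site
raw_pvalues = [0.01, 0.04, 0.03, 0.005]
adj_pvalues = benjamini_hochberg(raw_pvalues)
```

Each raw p-value is scaled by `n / rank`, and the running minimum keeps the adjusted values ordered consistently with the raw ones. For real analyses, an established library routine is usually preferable to a hand-written version.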

---

* **aggregate_fn_list** (required) [list(str)]

A list of output TSV files, corresponding to the samples to compare, generated by either CpG_Aggregate or Interval_Aggregate.

* **ref_fasta_fn** (required) [str]

Reference file used for alignment in Fasta format (ideally already indexed with samtools faidx)

* **output_tsv_fn** (default: None) [str]

Path to write a more extensive result report in TSV format (at least 1 output file is required)

* **output_bed_fn** (default: None) [str]

Path to write a summary result file in BED format (at least 1 output file is required)

* **max_missing** (default: 0) [int]

Maximum number of samples that may be missing at a site for the test to still be performed

* **min_diff_llr** (default: 2) [int]

Minimal llr boundary for negative and positive median llr. The test is only performed if at least one sample has a median llr above the boundary (methylated) and at least one sample has a median llr below the negative boundary (unmethylated)

* **sample_id** (default: "") [str]

Sample ID to be used for the BED track header

* **verbose** (default: False) [bool]

* **quiet** (default: False) [bool]

* **progress** (default: False) [bool]

* **kwargs**
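The `max_missing` and `min_diff_llr` options described above act as filters applied to each site before testing. The sketch below shows one way such a filter could work. The helper `site_status`, its return labels, and the inclusive comparisons are assumptions made for illustration, not pycoMeth's actual code.

```python
from statistics import median


def site_status(llr_by_sample, n_expected, max_missing=0, min_diff_llr=2):
    """Classify one site from a dict {sample label: list of llr values}.

    Hypothetical helper: every sample present in the dict is assumed
    to have at least one llr value at this site.
    """
    # Samples expected but absent at this site
    n_missing = n_expected - len(llr_by_sample)
    if n_missing > max_missing:
        return "insufficient_samples"

    medians = [median(values) for values in llr_by_sample.values()]
    # Assumption: boundaries are inclusive and symmetric around 0
    has_methylated = any(m >= min_diff_llr for m in medians)
    has_unmethylated = any(m <= -min_diff_llr for m in medians)
    if not (has_methylated and has_unmethylated):
        return "insufficient_effect_size"
    return "valid"


# Made-up llr values: 3 of 4 expected samples cover this site
example_site = {"S1": [3.0, 4.0, 5.0], "S2": [-3.0, -4.0], "S3": [0.5]}
status = site_status(example_site, n_expected=4, max_missing=1)
```

Under these assumptions, a site is tested only when enough samples cover it and the sample medians fall on both sides of the boundary, which matches the "insufficient samples" and "insufficient effect size" categories reported in the run summaries.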



## Example usage

#### Usage with CpG Aggregate output

In [2]:
ff = Meth_Comp(
    aggregate_fn_list=[
        "./data/CpG_Aggregate_sample_1.tsv",
        "./data/CpG_Aggregate_sample_2.tsv",
        "./data/CpG_Aggregate_sample_3.tsv",
        "./data/CpG_Aggregate_sample_4.tsv"],
    ref_fasta_fn="./data/yeast.fa",
    output_bed_fn="./results/CpG_Yeast.bed",
    output_tsv_fn="./results/CpG_Yeast.tsv",
    sample_id="CpG_Yeast",
    max_missing=1,
    min_diff_llr=1,
    progress=True)

head("./results/CpG_Yeast.tsv")
head("./results/CpG_Yeast.bed")

## Checking options and input files ##
## Parsing files ##
	Reading input files header and checking consistancy between headers
	Starting asynchronous file parsing
	:   1%|          | 120k/16.8M [00:00<00:07, 2.11M bytes/s]
	Results summary
		Sites with insufficient samples: 1,672
		Sites with insufficient effect size: 5
		Valid sites: 1


JSONDecodeError: Extra data: line 1 column 6 (char 5)

#### Usage with Interval Aggregate output

In [3]:
ff = Meth_Comp(
    aggregate_fn_list=[
        "./data/Medaka_sample_1_CGI_aggregate.tsv",
        "./data/Medaka_sample_2_CGI_aggregate.tsv",
        "./data/Medaka_sample_3_CGI_aggregate.tsv",
        "./data/Medaka_sample_4_CGI_aggregate.tsv"],
    ref_fasta_fn="./data/medaka_toplevel.fa",
    output_bed_fn="./results/CGI_Medaka.bed",
    output_tsv_fn="./results/CGI_Medaka.tsv",
    sample_id="CGI_Medaka",
    max_missing=1,
    min_diff_llr=2,
    progress=True)

head("./results/CGI_Medaka.tsv")
head("./results/CGI_Medaka.bed")

## Checking options and input files ##
## Parsing files ##
	Reading input files header and checking consistancy between headers
	Starting asynchronous file parsing
	: 100%|██████████| 105M/105M [00:10<00:00, 9.80M bytes/s] 
	Adjust pvalues
	Writing output file
	: 100%|██████████| 801/801 [00:00<00:00, 13.4k sites/s]
	Results summary
		Sites with insufficient effect size: 238,121
		Sites with insufficient samples: 22,543
		Valid sites: 801
		Sites with significant pvalue (< 0.01): 758
		Sites with significant FDR adj pvalue (< 0.01): 756
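The summary above counts sites whose FDR-adjusted p-value is below 0.01. As a rough sketch, the same selection could be reproduced from the TSV report using the `adj_pvalue` column shown in its header. The helper `significant_sites` is hypothetical, the report is assumed to be tab-delimited with a header line, and the rows below are made up for illustration.

```python
import csv
import io


def significant_sites(handle, alpha=0.01):
    """Return the rows whose adj_pvalue is below alpha.

    Hypothetical helper; assumes a tab-delimited file with a header
    line containing an 'adj_pvalue' column.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["adj_pvalue"]) < alpha]


# Made-up rows using a subset of the report's column names
example_tsv = (
    "chromosome\tstart\tend\tn_samples\tpvalue\tadj_pvalue\n"
    "1\t100\t200\t4\t0.0001\t0.002\n"
    "1\t500\t650\t3\t0.02\t0.05\n"
)
hits = significant_sites(io.StringIO(example_tsv))

# With a real report, an open file handle can be passed instead, e.g.:
# with open("./results/CGI_Medaka.tsv") as fh:
#     hits = significant_sites(fh)
```

Reading the values as strings and converting only `adj_pvalue` keeps the sketch independent of the other columns, whose exact formats are not shown here.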


chromosome start   end     n_samples pvalue                 statistic          adj_pvalue             neg_med pos_med ambiguous_med labels    med_llr_list                raw_llr_list