# Aggregate API usage

## Import module

In [2]:
# Import main module 
from pycoMeth.Aggregate import Aggregate

# Optionally import Jupyter helper functions
from pycoMeth.common import head, jhelp

## Getting help

In [3]:
jhelp(Aggregate)

**Aggregate** (input_fn, fasta_index, output_bed_fn, output_tsv_fn, min_depth, sample_id, min_llr, kwargs)

Calculate methylation frequency at genomic CpG sites from the output of nanopolish call-methylation

---

* **input_fn** (required) [str]

Path to a nanopolish call-methylation TSV output file

* **fasta_index** (required) [str]

FASTA index file obtained with samtools faidx, needed for coordinate sorting

* **output_bed_fn** (default: "") [str]

Path to write a summary result file in BED format (at least 1 output file is required in CLI mode)

* **output_tsv_fn** (default: "") [str]

Path to write a more extensive result report in TSV format (at least 1 output file is required in CLI mode)

* **min_depth** (default: 10) [int]

Minimum number of reads covering a site for it to be reported

* **sample_id** (default: "") [str]

Sample ID to be used for the bed track header

* **min_llr** (default: 2) [float]

Minimum log likelihood ratio to consider a site significantly methylated or unmethylated

* **kwargs**

Allows passing extra options such as verbose, quiet and progress
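
To make the roles of `min_llr` and `min_depth` more concrete, here is a minimal sketch of how the per-read log likelihood ratios of one site could be turned into read counts and a median. This is not pycoMeth's actual code: the helper `summarize_site` is hypothetical, the LLR values are only illustrative, and treating both thresholds as inclusive is an assumption.

```python
from statistics import median


def summarize_site(llr_values, min_llr=2.0, min_depth=10):
    """Hypothetical helper: summarize the per-read LLRs of a single CpG site.

    Assumption: a read counts as methylated when llr >= min_llr, as
    unmethylated when llr <= -min_llr, and as ambiguous otherwise.
    """
    # Assumption: sites covered by fewer than min_depth reads are skipped
    if len(llr_values) < min_depth:
        return None

    methylated = sum(1 for llr in llr_values if llr >= min_llr)
    unmethylated = sum(1 for llr in llr_values if llr <= -min_llr)

    return {
        "methylated_reads": methylated,
        "unmethylated_reads": unmethylated,
        "ambiguous_reads": len(llr_values) - methylated - unmethylated,
        "median_llr": round(median(llr_values), 3),
    }


# Illustrative per-read LLR values for one site covered by 10 reads
llrs = [-9.63, -5.51, -5.64, -5.44, 1.06, -2.23, -0.47, 0.53, -2.29, -2.38]
site = summarize_site(llrs)
print(site)
```

Raising `min_llr` in this sketch moves reads from the methylated and unmethylated groups into the ambiguous group, while raising `min_depth` drops low-coverage sites entirely.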



## Example usage

#### Basic interactive usage

If no output file is given, Aggregate keeps all the results in a pandas DataFrame, accessible through the `df` attribute of the returned object.

In [4]:
ff = Aggregate(
    input_fn="./data/sample_1.tsv",
    fasta_index="./data/ref.fa.fai",
    progress=True)

display(ff.df.head())

## Checking arguments ##
## Parsing methylation_calls file ##
	Starting to parse file Nanopolish methylation call file
	: 100%|█████████▉| 60.4M/60.4M [00:04<00:00, 14.1M bytes/s]
	Filtering out low coverage sites
	Sorting by coordinates
	Processing valid sites found and write to file
	: 100%|██████████| 804/804 [00:01<00:00, 657 sites/s]  
## Results summary ##
	Lines Parsed: 605,248
	Total Valid Lines: 605,248
	Initial Sites: 229,389
	Low Count Sites: 228,585
	Valid Sites Found: 804
	Total Sites Written: 804
	Unmethylated sites: 589
	Ambiguous sites: 215


| | chromosome | start | end | strand | methylated_reads | unmethylated_reads | ambiguous_reads | sequence | num_motifs | median_llr | llr_list |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | chr-VIII | 138415 | 138416 | . | 0 | 7 | 3 | GGTCTCGCTTT | 1 | -2.335 | [-9.63, -5.51, -5.64, -5.44, 1.06, -2.23, -0.4... |
| 1 | chr-VIII | 138429 | 138430 | . | 0 | 8 | 2 | AGCTTCGAGGA | 1 | -5.055 | [-3.64, -5.14, -4.97, 1.16, -0.43, -9.53, -2.0... |
| 2 | chr-VIII | 212351 | 212352 | . | 0 | 8 | 4 | TGGGGCGACAT | 1 | -2.95 | [-3.14, -6.06, -9.1, 0.53, 0.17, -11.61, -2.48... |
| 3 | chr-VIII | 212392 | 212393 | . | 1 | 5 | 6 | ATTAACGTATA | 1 | -1.87 | [-6.91, -1.82, 0.21, -4.89, -3.07, 3.09, -1.92... |
| 4 | chr-VIII | 212457 | 212461 | . | 0 | 8 | 4 | AGAATCGTCGATTA | 2 | -4.155 | [-6.33, 0.08, -3.48, -0.33, -1.71, -13.56, -4.... |
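
The results summary reports sites as unmethylated or ambiguous. As a rough sketch of how such a site-level label could follow from `median_llr`, the snippet below compares the median against `min_llr`. The helper `label_site` is hypothetical, and the rule itself (inclusive comparison of the median with the threshold) is an assumption rather than a description of pycoMeth internals. The input values are the `median_llr` entries of the five rows displayed above.

```python
from collections import Counter


def label_site(median_llr, min_llr=2.0):
    """Hypothetical helper: label a site from its median LLR.

    Assumption: the median is compared with +/- min_llr, bounds included.
    """
    if median_llr >= min_llr:
        return "methylated"
    if median_llr <= -min_llr:
        return "unmethylated"
    return "ambiguous"


# median_llr values of the five sites displayed above
median_llrs = [-2.335, -5.055, -2.95, -1.87, -4.155]

labels = [label_site(value) for value in median_llrs]
counts = Counter(labels)
print(labels)
print(counts)
```

Under these assumptions only the fourth site (median LLR of -1.87) falls inside the ambiguous band between `-min_llr` and `min_llr`.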


#### Basic usage with file output

In [5]:
ff = Aggregate(
    input_fn="./data/sample_1.tsv",
    fasta_index="./data/ref.fa.fai",
    output_bed_fn="./results/sample_1.bed",
    output_tsv_fn="./results/sample_1.tsv")

head("./results/sample_1.tsv")
head("./results/sample_1.bed")

## Checking arguments ##
## Parsing methylation_calls file ##
	Starting to parse file Nanopolish methylation call file
	Filtering out low coverage sites
	Sorting by coordinates
	Processing valid sites found and write to file
		Start writing BED output
		Start writing TSV output
## Results summary ##
	Lines Parsed: 605,248
	Total Valid Lines: 605,248
	Initial Sites: 229,389
	Low Count Sites: 228,585
	Valid Sites Found: 804
	Total Sites Written: 804
	Unmethylated sites: 589
	Ambiguous sites: 215


chromosome start  end    strand methylated_reads unmethylated_reads ambiguous_reads sequence       num_motifs median_llr llr_list                                                                      
chr-VIII   138415 138416 .      0                7                  3               GGTCTCGCTTT    1          -2.335     -9.63;-5.51;-5.64;-5.44;1.06;-2.23;-0.47;0.53;-2.29;-2.38                     
chr-VIII   138429 138430 .      0                8                  2               AGCTTCGAGGA    1          -5.055     -3.64;-5.14;-4.97;1.16;-0.43;-9.53;-2.08;-8.07;-9.1;-5.42                     
chr-VIII   212351 212352 .      0                8                  4               TGGGGCGACAT    1          -2.95      -3.14;-6.06;-9.1;0.53;0.17;-11.61;-2.48;-3.4;-2.76;0.66;0.13;-12.44           
chr-VIII   212392 212393 .      1                5                  6               ATTAACGTATA    1          -1.87      -6.91;-1.82;0.21;-4.89;-3.07;3.09;-1.92;1.83;-1.12;-2.98;-3.0;-1.82           
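
For downstream processing outside of pycoMeth, the TSV report can be read back with the standard library. The sketch below is only an illustration: `read_aggregate_tsv` is a hypothetical helper, and it assumes that the file on disk is tab-delimited with the header row shown above (the `head` helper pads columns with spaces for display) and that `llr_list` holds semicolon-separated values.

```python
import csv
import io

INT_FIELDS = (
    "start",
    "end",
    "methylated_reads",
    "unmethylated_reads",
    "ambiguous_reads",
    "num_motifs",
)


def read_aggregate_tsv(handle):
    """Hypothetical helper: yield one dict per site from a TSV report.

    Assumptions: tab-delimited input with a header row, and an llr_list
    column containing semicolon-separated floats.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        for field in INT_FIELDS:
            row[field] = int(row[field])
        row["median_llr"] = float(row["median_llr"])
        row["llr_list"] = [float(v) for v in row["llr_list"].split(";")]
        yield row


# Small in-memory example mirroring the first site of the report above
example = io.StringIO(
    "chromosome\tstart\tend\tstrand\tmethylated_reads\tunmethylated_reads\t"
    "ambiguous_reads\tsequence\tnum_motifs\tmedian_llr\tllr_list\n"
    "chr-VIII\t138415\t138416\t.\t0\t7\t3\tGGTCTCGCTTT\t1\t-2.335\t"
    "-9.63;-5.51;-5.64;-5.44;1.06;-2.23;-0.47;0.53;-2.29;-2.38\n"
)

sites = list(read_aggregate_tsv(example))
print(sites[0]["chromosome"], sites[0]["start"], len(sites[0]["llr_list"]))
```

With a real report, the same function can be given an open file handle, for example `open("./results/sample_1.tsv", newline="")`.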
