# Freq_meth_calculate usage

Calculate methylation frequency at genomic CpG sites from the output of `nanopolish call-methylation`.

## Options

* **input_fn**

Path to a `nanopolish call-methylation` TSV output file (read access required). In command line mode, the output of `nanopolish call-methylation` can also be piped directly into `Freq_meth_calculate`.

* **outdir**

Path to an existing directory where all the output files generated by `Freq_meth_calculate` are written (write access required). If the directory does not exist, an error is raised.

* **outprefix**

Prefix for all the files generated by the program.

Filtering options:

* **min_llr**

Minimal log likelihood ratio (in absolute value) between the methylated and unmethylated states required to make a call. Calls below this threshold are considered ambiguous.

* **min_depth**

Minimal number of reads with a non-ambiguous call required for a genomic position to be written in the output files.

* **min_meth_freq**

Minimal methylation frequency required for a genomic position to be written in the output files.

* **split_group**

In CpG-rich regions it is sometimes hard to distinguish which k-mer is methylated. In this case `nanopolish` outputs a group containing multiple CpGs (or other motifs). This option splits each group into individual genomic sites, which can be useful for visualisation in a genome browser.

* **motif**

Specify the motif type, matching the one used for `nanopolish call-methylation`. This is only useful if the `split_group` option is also selected.

!!! note "Methylation motif to sequence correspondence"
    * cpg → CG
    * gpc → GC
    * dam → GATC
    * dcm → CCAGG
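
As a rough illustration of what `split_group` does with the motif names listed above, the sketch below splits one group into single sites. `split_group_sites` and its `flank` parameter are hypothetical names, not part of NanopolishComp, and the 11-base window reported per site is an assumption inferred from the example outputs in this document.

```python
import re

# Motif name to sequence, as listed in the note above
MOTIF_SEQ = {"cpg": "CG", "gpc": "GC", "dam": "GATC", "dcm": "CCAGG"}


def split_group_sites(start, sequence, motif="cpg", flank=5):
    """Split a motif group into single sites (hypothetical helper).

    Assumes `sequence` starts `flank` bases before the first motif,
    and that each site is reported as a 1-base interval.
    """
    motif_seq = MOTIF_SEQ[motif]
    window = 2 * flank + 1
    sites = []
    # Zero-width lookahead so that overlapping occurrences are also found
    for match in re.finditer(f"(?={motif_seq})", sequence):
        offset = match.start() - flank
        # Skip occurrences whose window would not fit in the group sequence
        if offset < 0 or offset + window > len(sequence):
            continue
        site_start = start + offset
        sites.append((site_start, site_start + 1, sequence[offset:offset + window]))
    return sites


# Group taken from the example TSV output of this document
for site in split_group_sites(465335, "CAGCACGACGGAGT"):
    print(site)
# (465335, 465336, 'CAGCACGACGG')
# (465338, 465339, 'CACGACGGAGT')
```

The two sites match the coordinates and sequences shown in the `--split_group` example output, but the real implementation may handle edge cases differently.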


## Output files format

`Freq_meth_calculate` generates 2 files: a standard BED file and a tabulated file containing extra information.

#### BED file

Standard genomic BED6 (https://genome.ucsc.edu/FAQ/FAQformat.html#format1). The score corresponds to the methylation frequency multiplied by 1000. The file is sorted by coordinates and can be rendered with a genome browser such as [IGV](https://software.broadinstitute.org/software/igv/).
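
A minimal sketch of how one BED6 line could be built from a site record. `bed6_line` is a hypothetical helper. The column order and the zero-padded 6-digit score follow the example output in this document; truncating the score rather than rounding it is an assumption.

```python
def bed6_line(chrom, start, end, site_id, meth_freq, strand):
    """Format one site as a tab-separated BED6 line (hypothetical helper)."""
    # Score = methylation frequency x 1000, truncated to an integer (assumption)
    score = int(meth_freq * 1000)
    return f"{chrom}\t{start}\t{end}\t{site_id}\t{score:06d}\t{strand}"


# Site taken from the example TSV output of this document
print(bed6_line("chr-XII", 465335, 465339, 217103, 0.545455, "-"))
```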

#### Tabulated TSV file

Unlike the BED file, positions in the tabulated report are ordered by decreasing methylation frequency.

The file contains the following fields:

* **chrom / start / end / strand**: Genomic coordinates of the motif, or of the group of motifs if `split_group` was not selected.
* **site_id**: Unique integer identifier of the genomic position.
* **methylated_reads / unmethylated_reads / ambiguous_reads**: Number of reads at a given genomic location with a higher likelihood of being methylated, a higher likelihood of being unmethylated, or an ambiguous methylation call.
* **sequence**: -5 to +5 sequence of the motif, or of the group of motifs if `split_group` was not selected.
* **num_motifs**: Number of motifs in the group.
* **meth_freq**: Methylation frequency (out of non-ambiguous calls).
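
The fields above can be read back with Python's standard `csv` module. The sketch below parses two rows copied from the example run in this document and recomputes `meth_freq` from the read counts. `meth_freq_from_counts` is a hypothetical helper, and returning 0.0 when there are no non-ambiguous calls is an assumption.

```python
import csv
import io

# Header and two rows copied from the example TSV output in this document
tsv_text = (
    "chrom\tstart\tend\tstrand\tsite_id\tmethylated_reads\tunmethylated_reads"
    "\tambiguous_reads\tsequence\tnum_motifs\tmeth_freq\n"
    "chr-XII\t465335\t465339\t-\t217103\t6\t5\t29\tCAGCACGACGGAGT\t2\t0.545455\n"
    "chr-XII\t460773\t460774\t+\t216597\t17\t20\t77\tCTCATCGTATA\t1\t0.459459\n"
)


def meth_freq_from_counts(methylated, unmethylated):
    """Fraction of methylated reads among non-ambiguous calls."""
    called = methylated + unmethylated
    return methylated / called if called else 0.0


rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
for row in rows:
    freq = meth_freq_from_counts(
        int(row["methylated_reads"]), int(row["unmethylated_reads"])
    )
    # Recomputed value next to the value stored in the file
    print(row["site_id"], f"{freq:.6f}", row["meth_freq"])
```

Ambiguous reads are left out of the denominator, which is why a site with 6 methylated, 5 unmethylated and 29 ambiguous reads has a frequency of 6 / 11.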

#### Log YAML file

The program also generates a simple log file containing the site and position counts, formatted in YAML.

Example file content:

```
Read sites summary:
    total: 605,248
    unmethylated: 571,328
    methylated: 33,920
Genomic positions summary:
    total: 340,081
    low_coverage: 339,355
    low_meth_freq: 698
    valid: 28
```
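
The filtering options can be read as a two-stage process: classify each read call by its log likelihood ratio, then decide whether the position is kept. The sketch below is one possible reading, not NanopolishComp's actual code. `classify_call` and `site_status` are hypothetical names; the sign convention (positive ratio means methylated), the inclusive comparisons and the order of the checks (coverage first, then frequency) are assumptions. The default thresholds are those listed in the API help. The status labels mirror the `low_coverage`, `low_meth_freq` and `valid` counters of the log above, which in the example add up to the total number of positions (339,355 + 698 + 28 = 340,081).

```python
from collections import Counter


def classify_call(llr, min_llr=2.5):
    """Label one read call from its log likelihood ratio (hypothetical helper)."""
    if llr >= min_llr:
        return "methylated"
    if llr <= -min_llr:
        return "unmethylated"
    return "ambiguous"


def site_status(llrs, min_llr=2.5, min_depth=10, min_meth_freq=0.05):
    """Decide whether a genomic position passes the filters (hypothetical helper)."""
    calls = Counter(classify_call(llr, min_llr) for llr in llrs)
    # Only non-ambiguous calls count towards the depth
    depth = calls["methylated"] + calls["unmethylated"]
    if depth < min_depth:
        return "low_coverage"
    if calls["methylated"] / depth < min_meth_freq:
        return "low_meth_freq"
    return "valid"


# Toy values, not real data: 6 methylated, 5 unmethylated, 29 ambiguous calls
toy_llrs = [3.0] * 6 + [-3.0] * 5 + [0.1] * 29
print(site_status(toy_llrs))
```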


## Bash command line usage

### Command line help

```bash
# Load local bashrc and activate virtual environment
source ~/.bashrc
workon Nanopolish_0.11.1

NanopolishComp Freq_meth_calculate --help
```

```
usage: NanopolishComp Freq_meth_calculate [-h] [-i INPUT_FN] [-o OUTDIR]
                                          [-p OUTPREFIX] [-l MIN_LLR]
                                          [-d MIN_DEPTH] [-f MIN_METH_FREQ]
                                          [-s] [-m {cpg,gpc,dam,dcm}]
                                          [-v | -q]

Calculate methylation frequency at genomic CpG sites from the output of
nanopolish call-methylation

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Increase verbosity (default: False)
  -q, --quiet           Reduce verbosity (default: False)

Input/Output options:
  -i INPUT_FN, --input_fn INPUT_FN
                        Path to a nanopolish call_methylation tsv output file.
                        If not specified read from std input
  -o OUTDIR, --outdir OUTDIR
                        Path to the output folder (default: ./)
  -p OUTPREFIX, --outprefix OUTPREFIX
                        text outpr
```

### Example usage

#### From an existing `nanopolish call-methylation` output file

```bash
# Load local bashrc and activate virtual environment
source ~/.bashrc
workon Nanopolish_0.11.1

NanopolishComp Freq_meth_calculate --verbose -i data/freq_meth_calculate/methylation_calls.tsv -o ./output/freq_meth_calculate/

head ./output/freq_meth_calculate/out_freq_meth_calculate.bed
head ./output/freq_meth_calculate/out_freq_meth_calculate.tsv
```

```
track name='out' description='Methylation frequency track generated with NanopolishComp' useScore=1
chr-XII	451856	451857	216740	000090	-
chr-XII	451976	451980	216746	000071	-
chr-XII	452137	452138	216753	000050	-
chr-XII	452217	452221	216756	000272	-
chr-XII	452331	452341	216384	000062	+
chr-XII	452632	452633	216391	000066	+
chr-XII	452654	452655	216769	000100	-
chr-XII	453086	453087	216781	000055	-
chr-XII	453336	453337	216790	000066	-
chrom	start	end	strand	site_id	methylated_reads	unmethylated_reads	ambiguous_reads	sequence	num_motifs	meth_freq
chr-XII	465335	465339	-	217103	6	5	29	CAGCACGACGGAGT	2	0.545455
chr-XII	456198	456202	-	216869	7	7	32	CAGCACGACGGAGT	2	0.500000
chr-XII	460773	460774	+	216597	17	20	77	CTCATCGTATA	1	0.459459
chr-XII	457104	457105	+	216519	6	8	43	AACTACGAGCT	1	0.428571
chr-XII	461274	461275	+	216614	5	7	85	TTATCCGAATG	1	0.416667
chr-XII	463406	463407	+	216682	6	12	29	TCTTTCGCCCC	1	0.333333
chr-XII	457887	457888	-	216914	6	12	53	GCTGCCGGAAA	1	0.333333
chr-XII

Options summary
	package_name: NanopolishComp
	package_version: 0.6.1
	timestamp: 2019-05-02 10:36:29.585341
	quiet: False
	verbose: True
	motif: cpg
	split_group: False
	min_meth_freq: 0.05
	min_depth: 10
	min_llr: 2.5
	outprefix: out
	outdir: ./output/freq_meth_calculate/
	input_fn: data/freq_meth_calculate/methylation_calls.tsv

## Checking arguments ##
Test input file readability
Testing output dir writability
## Parsing methylation_calls file ##
Starting to parse file methylation_calls file
0 lines [00:00, ? lines/s]3972 lines [00:00, 39719.66 lines/s]7974 lines [00:00, 39807.26 lines/s]11925 lines [00:00, 39714.71 lines/s]15922 lines [00:00, 39789.10 lines/s]19677 lines [00:00, 39088.44 lines/s]22922 lines [00:00, 36697.98 lines/s]26157 lines [00:00, 29681.15 lines/s]29386 lines [00:00, 30416.83 lines/s]33098 lines [00:00, 32157.87 lines/s]36729 lines [00:01, 33298.15 lines/s]40558 lines [00:01, 34651.22 lines/s]44481 lines [00:01, 35905.91 lines/s]48332 lines [00:0
```

#### From standard input (the same can be done by piping from `nanopolish call-methylation`)

In this example the `split_group` option is also selected, so motif groups are split into individual positions.

```bash
# Load local bashrc and activate virtual environment
source ~/.bashrc
workon Nanopolish_0.11.1

cat data/freq_meth_calculate/methylation_calls.tsv | NanopolishComp Freq_meth_calculate -o ./output/freq_meth_calculate/ --split_group

head ./output/freq_meth_calculate/out_freq_meth_calculate.bed
head ./output/freq_meth_calculate/out_freq_meth_calculate.tsv
head ./output/freq_meth_calculate/out_freq_meth_calculate.log
```

```
track name='out' description='Methylation frequency track generated with NanopolishComp' useScore=1
chr-XII	451856	451857	289557	000090	-
chr-XII	451976	451977	289563	000071	-
chr-XII	451979	451980	289564	000071	-
chr-XII	452137	452138	289574	000050	-
chr-XII	452217	452218	289577	000272	-
chr-XII	452220	452221	289578	000272	-
chr-XII	452331	452332	288999	000062	+
chr-XII	452340	452341	289000	000062	+
chr-XII	452632	452633	289008	000066	+
chrom	start	end	strand	site_id	methylated_reads	unmethylated_reads	ambiguous_reads	sequence	num_motifs	meth_freq
chr-XII	465335	465336	-	290122	6	5	29	CAGCACGACGG	2	0.545455
chr-XII	465338	465339	-	290123	6	5	29	CACGACGGAGT	2	0.545455
chr-XII	456198	456199	-	289760	7	7	32	CAGCACGACGG	2	0.500000
chr-XII	456201	456202	-	289761	7	7	32	CACGACGGAGT	2	0.500000
chr-XII	460773	460774	+	289338	17	20	77	CTCATCGTATA	1	0.459459
chr-XII	457104	457105	+	289216	6	8	43	AACTACGAGCT	1	0.428571
chr-XII	461274	461275	+	289359	5	7	85	TTATCCGAATG	1	0.416667
chr-XII	463406	4

## Checking arguments ##
Test input file readability
Testing output dir writability
## Parsing methylation_calls file ##
Starting to parse file methylation_calls file
0 lines [00:00, ? lines/s]3660 lines [00:00, 36594.36 lines/s]7305 lines [00:00, 36549.98 lines/s]11022 lines [00:00, 36732.82 lines/s]14590 lines [00:00, 36404.52 lines/s]18237 lines [00:00, 36423.08 lines/s]21489 lines [00:00, 35155.99 lines/s]25229 lines [00:00, 35798.62 lines/s]28738 lines [00:00, 35580.30 lines/s]32470 lines [00:00, 36084.52 lines/s]36035 lines [00:01, 35952.29 lines/s]39662 lines [00:01, 36044.87 lines/s]43309 lines [00:01, 36170.40 lines/s]46878 lines [00:01, 35059.47 lines/s]50474 lines [00:01, 35324.45 lines/s]53989 lines [00:01, 35260.27 lines/s]57580 lines [00:01, 35452.52 lines/s]61173 lines [00:01, 35592.97 lines/s]64912 lines [00:01, 36113.56 lines/s]68522 lines [00:01, 35292.55 lines/s]72054 lines [00:02, 33361.66 lines/s]75414 lines [00:02, 30677.86 lines/s]78612 lin
```

## Python API usage

### Import the package

```python
# Import main program
from NanopolishComp.Freq_meth_calculate import Freq_meth_calculate

# Import helper functions
from NanopolishComp.common import jhelp, head
```

### Python API help

```python
jhelp(Freq_meth_calculate)
```

---

**NanopolishComp.Freq_meth_calculate.__init__**

Calculate methylation frequency at genomic CpG sites from the output of nanopolish call-methylation

---

* **input_fn** *: str (required)*

Path to a nanopolish call_methylation tsv output file

* **outdir** *: str (default = ./)*

Path to the output folder

* **outprefix** *: str (default = out)*

text outprefix for all the files generated

* **min_llr** *: float (default = 2.5)*

Log likelihood ratio threshold

* **min_depth** *: int (default = 10)*

Minimal number of reads covering a site to be reported

* **min_meth_freq** *: float (default = 0.05)*

Minimal methylation frequency of a site to be reported

* **split_group** *: bool (default = False)*

If True, multi motif groups (sequence with close motifs) are split in individual site

* **motif** *: {cpg,gpc,dam,dcm} (default = cpg)*

Methylation motif type

* **verbose** *: bool (default = False)*

Increase verbosity

* **quiet** *: bool (default = False)*

Reduce verbosity



### Example usage

#### Basic settings

```python
f = Freq_meth_calculate(
    input_fn="./data/freq_meth_calculate/methylation_calls.tsv",
    outdir="./output/freq_meth_calculate/")

head("./output/freq_meth_calculate/out_freq_meth_calculate.log")
head("./output/freq_meth_calculate/out_freq_meth_calculate.tsv")
head("./output/freq_meth_calculate/out_freq_meth_calculate.bed")
```

```
## Checking arguments ##
Test input file readability
Testing output dir writability
## Parsing methylation_calls file ##
Starting to parse file methylation_calls file
605248 lines [00:18, 32549.90 lines/s]
Filtering out positions with low coverage or methylation frequency
100%|██████████| 340081/340081 [00:00<00:00, 517704.63 positions/s]
## Write output files ##
Writing bed file
Writing tsv file
Writing log file


General options:
package_name: NanopolishComp
package_version: 0.6.1
timestamp: 2019-05-02 10:37:09.200694
quiet: False
verbose: False
motif: cpg
split_group: False
min_meth_freq: 0.05
min_depth: 10

chrom   start  end    strand site_id methylated_reads unmethylated_reads ambiguous_reads sequence       num_motifs meth_freq
chr-XII 465335 465339 -      217103  6                5                  29              CAGCACGACGGAGT 2          0.545455
chr-XII 456198 456202 -      216869  7                7                  32              CAGCACGACGGAGT 2          0.500000
chr-XII 460773 460774 +      216597  17               20                 77              CTCATCGTATA    1          0.459459
chr-XII 457104 457105 +      216519  6                8                  43              AACTACGAGCT    1
```

#### Split motif groups into individual genomic positions

```python
f = Freq_meth_calculate(
    input_fn="./data/freq_meth_calculate/methylation_calls.tsv",
    outdir="./output/freq_meth_calculate/",
    split_group=True)

head("./output/freq_meth_calculate/out_freq_meth_calculate.log")
head("./output/freq_meth_calculate/out_freq_meth_calculate.tsv")
head("./output/freq_meth_calculate/out_freq_meth_calculate.bed")
```

```
## Checking arguments ##
Test input file readability
Testing output dir writability
## Parsing methylation_calls file ##
Starting to parse file methylation_calls file
605248 lines [00:20, 28854.33 lines/s]
Filtering out positions with low coverage or methylation frequency
100%|██████████| 455542/455542 [00:00<00:00, 510214.98 positions/s]
## Write output files ##
Writing bed file
Writing tsv file
Writing log file


General options:
package_name: NanopolishComp
package_version: 0.6.1
timestamp: 2019-05-02 10:37:28.561040
quiet: False
verbose: False
motif: cpg
split_group: True
min_meth_freq: 0.05
min_depth: 10

chrom   start  end    strand site_id methylated_reads unmethylated_reads ambiguous_reads sequence    num_motifs meth_freq
chr-XII 465335 465336 -      630203  6                5                  29              CAGCACGACGG 2          0.545455
chr-XII 465338 465339 -      630204  6                5                  29              CACGACGGAGT 2          0.545455
chr-XII 456198 456199 -      629841  7                7                  32              CAGCACGACGG 2          0.500000
chr-XII 456201 456202 -      629842  7                7                  32              CACGACGGAGT 2          0.500
```

#### Changing filtering thresholds (not recommended)

```python
f = Freq_meth_calculate(
    input_fn="./data/freq_meth_calculate/methylation_calls.tsv",
    outdir="./output/freq_meth_calculate/",
    min_llr=1,
    min_depth=20,
    min_meth_freq=0.3)

head("./output/freq_meth_calculate/out_freq_meth_calculate.log")
head("./output/freq_meth_calculate/out_freq_meth_calculate.tsv")
head("./output/freq_meth_calculate/out_freq_meth_calculate.bed")
```

```
## Checking arguments ##
Test input file readability
Testing output dir writability
## Parsing methylation_calls file ##
Starting to parse file methylation_calls file
605248 lines [00:19, 31538.44 lines/s]
Filtering out positions with low coverage or methylation frequency
100%|██████████| 340081/340081 [00:00<00:00, 539066.80 positions/s]
## Write output files ##
Writing bed file
Writing tsv file
Writing log file


General options:
package_name: NanopolishComp
package_version: 0.6.1
timestamp: 2019-05-02 10:37:50.556724
quiet: False
verbose: False
motif: cpg
split_group: False
min_meth_freq: 0.3
min_depth: 20

chrom   start  end    strand site_id methylated_reads unmethylated_reads ambiguous_reads sequence       num_motifs meth_freq
chr-XII 461541 461542 +      1012246 26               3                  52              AATTCCGAGGG    1          0.896552
chr-XII 455556 455557 +      1012101 16               6                  20              AGATCCGTTGT    1          0.727273
chr-XII 462606 462607 +      1012278 21               9                  28              AATTCCGGGGT    1          0.700000
chr-XII 458208 458209 -      1012544 25               15                 37              TTAAACGCAAA    1
```