# Freq_meth_calculate usage

Calculate methylation frequency at genomic CpG sites from the output of `nanopolish call-methylation`.

## Output file formats

`Freq_meth_calculate` can generate two files: a standard BED file and a tabulated TSV file containing extra information.

#### BED file

Standard genomic BED6 (https://genome.ucsc.edu/FAQ/FAQformat.html#format1). The score corresponds to the methylation frequency multiplied by 1000. The file is sorted by coordinates and can be rendered with a genome browser such as [IGV](https://software.broadinstitute.org/software/igv/).
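A minimal sketch for reading this file back in Python, assuming the six tab-separated columns visible in the example output further down this page (chrom, start, end, site_id as the name, zero-padded score, strand) and a leading `track` line. `BedRecord`, `parse_bed_line` and `meth_freq_from_score` are hypothetical helpers, not part of NanopolishComp. In the examples the score appears to be truncated rather than rounded (a frequency of 0.153846 is written as `000153`), so dividing by 1000 only recovers the frequency to about three decimals.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    """One BED6 line (hypothetical helper type, not part of NanopolishComp)."""
    chrom: str
    start: int
    end: int
    name: str   # holds the site_id in the example outputs
    score: int  # methylation frequency x 1000
    strand: str


def parse_bed_line(line: str) -> Optional[BedRecord]:
    """Parse one tab-separated BED6 line; return None for track or blank lines."""
    if not line.strip() or line.startswith("track"):
        return None
    chrom, start, end, name, score, strand = line.rstrip("\n").split("\t")
    # int() accepts the zero-padded scores, e.g. "000100" -> 100
    return BedRecord(chrom, int(start), int(end), name, int(score), strand)


def meth_freq_from_score(score: int) -> float:
    """Convert a BED score back to an approximate methylation frequency."""
    return score / 1000


# First data line of the example BED output on this page
record = parse_bed_line("chr-VIII\t213382\t213386\t1\t000100\t-")
print(record.chrom, record.start, record.end, meth_freq_from_score(record.score))
```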

#### Tabulated TSV file

Unlike the BED file, the tabulated report orders positions by decreasing methylation frequency.

The file contains the following fields:

* **chromosome / start / end / strand**: Genomic coordinates of the motif, or of the group of motifs if split_group was not selected.
* **site_id**: Unique integer identifier of the genomic position.
* **methylated_reads / unmethylated_reads / ambiguous_reads**: Number of reads at the site with a higher likelihood of being methylated, a higher likelihood of being unmethylated, or an ambiguous methylation call.
* **sequence**: Sequence from -5 to +5 around the motif, or around the group of motifs if split_group was not selected.
* **num_motifs**: Number of motifs in the group.
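* **meth_freq**: Methylation frequency. In the example outputs below it equals methylated_reads / (methylated_reads + unmethylated_reads + ambiguous_reads), i.e. ambiguous calls are counted in the denominator.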
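To check a report programmatically, a sketch using only the standard library `csv` module can recompute meth_freq from the three read counts. The two rows are copied from the first example TSV output on this page, where the reported value matches methylated_reads divided by all reads at the site, ambiguous ones included. `read_report` and `recompute_meth_freq` are hypothetical helper names.

```python
import csv
import io

# Header and two data rows copied from the first example TSV output on this page
EXAMPLE_TSV = (
    "chromosome\tstart\tend\tstrand\tsite_id\tmethylated_reads\tunmethylated_reads\t"
    "ambiguous_reads\tsequence\tnum_motifs\tmeth_freq\n"
    "chr-VIII\t213382\t213386\t-\t1\t1\t7\t2\tTTCTTCGCCGACTG\t2\t0.100000\n"
    "chr-VIII\t213427\t213428\t-\t2\t1\t0\t10\tTTTCTCGCAAA\t1\t0.090909\n"
)


def read_report(handle):
    """Yield each TSV row as a dict keyed by the header fields."""
    yield from csv.DictReader(handle, delimiter="\t")


def recompute_meth_freq(row):
    """Recompute meth_freq from the read counts of one TSV row (a dict)."""
    meth = int(row["methylated_reads"])
    unmeth = int(row["unmethylated_reads"])
    ambig = int(row["ambiguous_reads"])
    # Ambiguous reads are included in the denominator, as in the example outputs
    return meth / (meth + unmeth + ambig)


for row in read_report(io.StringIO(EXAMPLE_TSV)):
    print(row["site_id"], row["meth_freq"], round(recompute_meth_freq(row), 6))
```

With a real report, pass an open file handle (`open(path, newline="")`) to `read_report` instead of the `StringIO` buffer.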

## Bash command line usage

### Command line help

```bash
# Load local bashrc and activate virtual environment
source ~/.bashrc
workon NanopolishComp

NanopolishComp Freq_meth_calculate --help
```

```
usage: NanopolishComp Freq_meth_calculate [-h] [-i INPUT_FN]
                                          [-b OUTPUT_BED_FN]
                                          [-t OUTPUT_TSV_FN] [-l MIN_LLR]
                                          [-d MIN_DEPTH] [-f MIN_METH_FREQ]
                                          [-v | -q]

Calculate methylation frequency at genomic CpG sites from the output of
nanopolish call-methylation

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Increase verbosity (default: False)
  -q, --quiet           Reduce verbosity (default: False)

Input/Output options:
  -i INPUT_FN, --input_fn INPUT_FN
                        Path to a nanopolish call_methylation tsv output file.
                        If not specified read from std input
  -b OUTPUT_BED_FN, --output_bed_fn OUTPUT_BED_FN
                        Path to write a summary result file in BED format
                        (default: )
  -t OUTPUT_TSV_FN, --
```
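When driving the command line tool from a Python pipeline script, the argument list can be assembled from the flags shown in the help text (`-i`, `-b`, `-t`, `--verbose`) and the long threshold options used in the examples on this page (`--min_llr`, `--min_depth`, `--min_meth_freq`). This is only a sketch: `build_freq_meth_cmd` is a hypothetical helper, and the commented `subprocess` call assumes `NanopolishComp` is on your `PATH`.

```python
import shlex


def build_freq_meth_cmd(input_fn=None, output_bed_fn=None, output_tsv_fn=None,
                        min_llr=None, min_depth=None, min_meth_freq=None,
                        verbose=False):
    """Assemble an argv list for `NanopolishComp Freq_meth_calculate`.

    Options left as None are omitted, so the tool falls back to its own
    defaults. Without input_fn, the tool reads from standard input.
    """
    cmd = ["NanopolishComp", "Freq_meth_calculate"]
    if verbose:
        cmd.append("--verbose")
    optional = [
        ("-i", input_fn), ("-b", output_bed_fn), ("-t", output_tsv_fn),
        ("--min_llr", min_llr), ("--min_depth", min_depth),
        ("--min_meth_freq", min_meth_freq),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_freq_meth_cmd(
    input_fn="data/freq_meth_calculate/methylation_calls.tsv",
    output_bed_fn="./output/freq_meth_calculate/out_freq_meth_calculate.bed",
    verbose=True,
)
print(shlex.join(cmd))

# To run it, or to stream nanopolish calls through stdin when input_fn is None:
# import subprocess
# with open("methylation_calls.tsv") as calls:
#     subprocess.run(build_freq_meth_cmd(output_bed_fn="out.bed"), stdin=calls, check=True)
```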

### Example usage

#### From an existing nanopolish call-methylation output file

```bash
# Load local bashrc and activate virtual environment
source ~/.bashrc
workon NanopolishComp

NanopolishComp Freq_meth_calculate --verbose \
    -i data/freq_meth_calculate/methylation_calls.tsv \
    -b ./output/freq_meth_calculate/out_freq_meth_calculate.bed \
    -t ./output/freq_meth_calculate/out_freq_meth_calculate.tsv

head ./output/freq_meth_calculate/out_freq_meth_calculate.bed
head ./output/freq_meth_calculate/out_freq_meth_calculate.tsv
```

```
track name='nanopolish_methylation' description='Methylation frequency track generated with nanopolish/NanopolishComp' useScore=1
chr-VIII	213382	213386	1	000100	-
chr-VIII	213427	213428	2	000090	-
chr-VIII	214579	214580	25	000153	-
chr-VIII	214610	214611	27	000083	-
chr-VIII	214455	214459	39	000100	+
chr-XII	451675	451676	47	000066	+
chr-XII	451856	451857	50	000066	+
chr-XII	451872	451873	51	000062	+
chr-XII	452083	452084	60	000058	+
chromosome	start	end	strand	site_id	methylated_reads	unmethylated_reads	ambiguous_reads	sequence	num_motifs	meth_freq
chr-VIII	213382	213386	-	1	1	7	2	TTCTTCGCCGACTG	2	0.100000
chr-VIII	213427	213428	-	2	1	0	10	TTTCTCGCAAA	1	0.090909
chr-VIII	214579	214580	-	25	2	1	10	GTTACCGCAGG	1	0.153846
chr-VIII	214610	214611	-	27	1	8	3	CACCCCGTTGG	1	0.083333
chr-VIII	214455	214459	+	39	1	6	3	AGAATCGTCGATTA	2	0.100000
chr-XII	451675	451676	+	47	1	7	7	TTACCCGGATC	1	0.066667
chr-XII	451856	451857	+	50	1	0	14	TACCCCGTTGT	1	0.066667
chr-XII	451872	451873	+	51	1	6	9	TAAGTC
```

```
## Options summary ##
	package_name: NanopolishComp
	package_version: 0.6.2
	timestamp: 2019-08-12 18:04:41.340804
	quiet: False
	verbose: True
	min_meth_freq: 0.05
	min_depth: 10
	min_llr: 2.5
	output_tsv_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.tsv
	output_bed_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.bed
	input_fn: data/freq_meth_calculate/methylation_calls.tsv
## Checking arguments ##
	Testing input file readability
	Check output file
		Output results in bed format
		Output results in tsv format
## Parsing methylation_calls file ##
	Write output file header
	Starting to parse file Nanopolish methylation call file
	Processing_valid site found
## Results summary ##
	total read lines: 605,248
	Total sites: 340,081
	Low coverage sites: 339,082
	Low methylation sites: 907
	Valid sites: 92
```
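In all three runs on this page the three summary categories add up to the total number of sites (for example 339,082 + 907 + 92 = 340,081), and every reported site passes both thresholds when its depth is counted as methylated + unmethylated + ambiguous reads; site 1 above, with exactly 10 reads, is still reported at `min_depth: 10`. The sketch below reproduces that bookkeeping under assumptions the documentation does not state: coverage is tested before methylation frequency, and a frequency equal to `min_meth_freq` is kept. `classify_site` is a hypothetical name, and the last two count tuples are made up for illustration.

```python
from collections import Counter


def classify_site(meth, unmeth, ambig, min_depth=10, min_meth_freq=0.05):
    """Bin one site into a summary category (hypothetical reimplementation).

    Depth counts every read at the site, ambiguous ones included. Assumed
    order: coverage is checked first, then methylation frequency.
    """
    depth = meth + unmeth + ambig
    if depth < min_depth:
        return "low_coverage"
    if meth / depth < min_meth_freq:
        return "low_methylation"
    return "valid"


# (methylated, unmethylated, ambiguous) read counts: the first two come from
# sites 1 and 2 of the example above, the last two are made up
sites = [(1, 7, 2), (1, 0, 10), (0, 3, 1), (0, 12, 4)]
summary = Counter(classify_site(*counts) for counts in sites)
print(dict(summary))
```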


#### Changing filtering thresholds (not recommended)

```bash
# Load local bashrc and activate virtual environment
source ~/.bashrc
workon NanopolishComp

NanopolishComp Freq_meth_calculate --verbose \
    -i data/freq_meth_calculate/methylation_calls.tsv \
    -b ./output/freq_meth_calculate/out_freq_meth_calculate.bed \
    -t ./output/freq_meth_calculate/out_freq_meth_calculate.tsv \
    --min_llr 2 --min_depth 15 --min_meth_freq 0.01

head ./output/freq_meth_calculate/out_freq_meth_calculate.bed
head ./output/freq_meth_calculate/out_freq_meth_calculate.tsv
```

```
track name='nanopolish_methylation' description='Methylation frequency track generated with nanopolish/NanopolishComp' useScore=1
chr-XII	451675	451676	0	000066	+
chr-XII	451856	451857	3	000133	+
chr-XII	451872	451873	4	000062	+
chr-XII	451915	451916	6	000062	+
chr-XII	451940	451941	7	000062	+
chr-XII	452083	452084	13	000117	+
chr-XII	452137	452138	16	000058	+
chr-XII	452238	452239	20	000058	+
chr-XII	452331	452341	24	000055	+
chromosome	start	end	strand	site_id	methylated_reads	unmethylated_reads	ambiguous_reads	sequence	num_motifs	meth_freq
chr-XII	451675	451676	+	0	1	8	6	TTACCCGGATC	1	0.066667
chr-XII	451856	451857	+	3	2	0	13	TACCCCGTTGT	1	0.133333
chr-XII	451872	451873	+	4	1	7	8	TAAGTCGTATA	1	0.062500
chr-XII	451915	451916	+	6	1	10	5	CAATTCGCCAG	1	0.062500
chr-XII	451940	451941	+	7	1	1	14	CTTTCCGCCAA	1	0.062500
chr-XII	452083	452084	+	13	2	8	7	TCCAGCGGATG	1	0.117647
chr-XII	452137	452138	+	16	1	1	15	TTATCCGAATG	1	0.058824
chr-XII	452238	452239	+	20	1	4	12	GCTCACGTTCC	1	0.058824
chr
```

```
## Options summary ##
	package_name: NanopolishComp
	package_version: 0.6.2
	timestamp: 2019-08-12 18:07:10.885082
	quiet: False
	verbose: True
	min_meth_freq: 0.01
	min_depth: 15
	min_llr: 2.0
	output_tsv_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.tsv
	output_bed_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.bed
	input_fn: data/freq_meth_calculate/methylation_calls.tsv
## Checking arguments ##
	Testing input file readability
	Check output file
		Output results in bed format
		Output results in tsv format
## Parsing methylation_calls file ##
	Write output file header
	Starting to parse file Nanopolish methylation call file
	Processing_valid site found
## Results summary ##
	total read lines: 605,248
	Total sites: 340,081
	Low coverage sites: 339,180
	Low methylation sites: 518
	Valid sites: 383
```


## Python API usage

### Import the package

```python
# Import main program
from NanopolishComp.Freq_meth_calculate import Freq_meth_calculate

# Import helper functions
from NanopolishComp.common import jhelp, head
```
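Outside a notebook, or without `NanopolishComp.common.head`, a plain-text preview of an output file needs only the standard library. `first_lines` and `head_file` are hypothetical helpers; unlike the notebook helper, which appears to render the TSV as aligned columns in the last example on this page, they return the raw lines.

```python
from itertools import islice
from typing import Iterable, List


def first_lines(lines: Iterable[str], n: int = 10) -> List[str]:
    """Return up to n lines from any iterable of text lines, without newlines."""
    return [line.rstrip("\n") for line in islice(lines, n)]


def head_file(path: str, n: int = 10) -> List[str]:
    """Read only the first n lines of a text file (hypothetical stand-in for head)."""
    with open(path) as handle:
        return first_lines(handle, n)


# Usage with the BED file written in the examples on this page:
# for line in head_file("./output/freq_meth_calculate/out_freq_meth_calculate.bed"):
#     print(line)
```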

### Python API help

```python
jhelp(Freq_meth_calculate)
```

---

**`NanopolishComp.Freq_meth_calculate.__init__`**

Calculate methylation frequency at genomic CpG sites from the output of nanopolish call-methylation

---

* **input_fn** *: str (required)*

Path to a nanopolish call_methylation tsv output file

* **output_bed_fn** *: str (default = )*

Path to write a summary result file in BED format

* **output_tsv_fn** *: str (default = )*

Path to write a more extensive result report in TSV format

* **min_llr** *: float (default = 2.5)*

Log likelihood ratio threshold

* **min_depth** *: int (default = 10)*

Minimum number of reads covering a site for it to be reported

* **min_meth_freq** *: float (default = 0.05)*

Minimum methylation frequency of a site for it to be reported

* **verbose** *: bool (default = False)*

Increase verbosity

* **quiet** *: bool (default = False)*

Reduce verbosity
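`min_llr` controls how each read-level call is binned before the read counts are summed. A minimal sketch, assuming a symmetric threshold, the convention commonly used with nanopolish output: a positive log likelihood ratio favours methylation, a negative one favours the unmethylated state, and a ratio whose absolute value is below the threshold counts as ambiguous. `call_from_llr` and the ratio values are hypothetical. It also shows why lowering `min_llr`, as in the "not recommended" examples, moves reads out of the ambiguous bin.

```python
from collections import Counter


def call_from_llr(llr, min_llr=2.5):
    """Bin one read-level log likelihood ratio (assumed symmetric threshold)."""
    if llr >= min_llr:
        return "methylated"
    if llr <= -min_llr:
        return "unmethylated"
    return "ambiguous"


# Hypothetical log likelihood ratios for four reads covering one site
llrs = [3.1, -0.8, 1.4, -2.7]
for threshold in (2.5, 1.0):
    counts = Counter(call_from_llr(x, threshold) for x in llrs)
    print(threshold, dict(counts))
```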



### Example usage

#### Basic setting

```python
f = Freq_meth_calculate(
    input_fn="./data/freq_meth_calculate/methylation_calls.tsv",
    output_bed_fn="./output/freq_meth_calculate/out_freq_meth_calculate.bed",
    verbose=True)

head("./output/freq_meth_calculate/out_freq_meth_calculate.bed")
```

```
## Options summary ##
	package_name: NanopolishComp
	package_version: 0.6.2
	timestamp: 2019-08-12 18:10:36.091502
	quiet: False
	verbose: True
	min_meth_freq: 0.05
	min_depth: 10
	min_llr: 2.5
	output_tsv_fn: 
	output_bed_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.bed
	input_fn: ./data/freq_meth_calculate/methylation_calls.tsv
## Checking arguments ##
	Testing input file readability
	Check output file
		Output results in bed format
## Parsing methylation_calls file ##
	Write output file header
	Starting to parse file Nanopolish methylation call file
	Processing_valid site found
```


#### Changing filtering thresholds (not recommended)

```python
f = Freq_meth_calculate(
    input_fn="./data/freq_meth_calculate/methylation_calls.tsv",
    output_tsv_fn="./output/freq_meth_calculate/out_freq_meth_calculate.tsv",
    min_llr=1,
    min_depth=20,
    min_meth_freq=0.3)

head("./output/freq_meth_calculate/out_freq_meth_calculate.tsv")
```

```
## Checking arguments ##
## Parsing methylation_calls file ##
	Starting to parse file Nanopolish methylation call file
	Processing_valid site found
## Results summary ##
	total read lines: 605,248
	Total sites: 340,081
	Low coverage sites: 339,249
	Low methylation sites: 821
	Valid sites: 11
```


```
chromosome start  end    strand site_id methylated_reads unmethylated_reads ambiguous_reads sequence       num_motifs meth_freq 
chr-XII    455556 455557 +      1089    16               6                  20              AGATCCGTTGT    1          0.380952  
chr-XII    461541 461542 +      1234    26               3                  52              AATTCCGAGGG    1          0.320988  
chr-XII    462606 462607 +      1266    21               9                  28              AATTCCGGGGT    1          0.362069  
chr-XII    464693 464694 +      1327    15               4                  15              AGATCCGTTGT    1          0.441176  
chr-XII    454376 454377 -      1418    10               7                  16              TCTTTCGGGTC    1          0.303030  
chr-XII    456198 456202 -      1464    17               13                 16              CAGCACGACGGAGT 2          0.369565  
chr-XII    458208 458209 -      1516    25               15                 37              TTAAA
```