Methylated DNA motif discovery tool


This tool allows users to find methylated motifs in CpG context and also motifs that contain modified bases.

Version 0.07




To run mEpigram: Use script.
Also, the package includes several tools to help preprocess your data before using the program.

mEpigram Pipeline:

You can use the included pipeline to run mEpigram. It outputs motifs and their enrichment scores.

For a quick test of the pipeline, go to the program's main directory and execute:


python testfiles/data_typeE/test.typeE.faa testfiles/data_typeE/background_typeE-5.tsv testfiles/data_typeE/graphE-5mer/ typeE -o test.typeE


python testfiles/data_typeEF/test.typeEF.faa testfiles/data_typeEF/background_typeEF-5.tsv testfiles/data_typeEF/graphEF-5mer/ typeEF -o test.typeEF

To use k=8 instead, pass background_typeE-8.tsv (which you need to generate) and graphE-8mer (which you can download from our website) in place of the 5-mer files. You should then be able to find several highly enriched m-motifs.

*Note: This pipeline must be executed in the mEpigram main directory. For more information, execute: python -h

mEpigram preprocessing scripts:

  1. Insert methylation information into the genome. The input is assumed to be in BED format by default; WIG format can be used with --wig. In BED format, each line contains the chromosome name, the start location (0-based index), and the end location (start location + 1). The reference genome should be provided as a directory, with each chromosomal sequence in a separate file named after its chromosome. An output directory will be created to hold the new genome with methylation information.

    python -f input.bed -r reference_genome_directory -o methyl_ref_genomeA


    python --typeEF -f input.bed -r reference_genome_directory -o methyl_ref_genomeA

  2. Make methylated sequences from BED files and the methylated genome created above:

    python -f input.bed -r methyl_ref_genomeA -o output.faa

  3. Make the background model: count the number of k-mers in the genome. This might take a while (a few hours on the whole human genome), but you only need to do it once per reference genome. Example: count the number of 5-mers in the sample genome:

    python -gd testfiles/samplegenome/ -k 5 -m typeE


    python -gd testfiles/samplegenome/ -k 5 -m typeEF
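
To make the preprocessing steps above more concrete, here is a minimal sketch of the idea behind steps 1 and 3: marking methylated cytosines at BED positions and then counting k-mers over the resulting sequence. It is not the code of the included scripts. The function names are hypothetical, the use of the letter E for a methylated C is an assumption made for illustration, and only positions where the reference base is C are changed; the real scripts may handle strands, WIG input, and output files differently.

```python
from collections import Counter


def parse_bed_positions(lines):
    """Map chromosome name -> set of 0-based start positions from BED-like lines."""
    positions = {}
    for line in lines:
        fields = line.split()
        # Skip blank, malformed, or header lines (e.g. "track ...").
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        positions.setdefault(fields[0], set()).add(int(fields[1]))
    return positions


def mark_methylated(sequence, starts, symbol="E"):
    """Replace C with `symbol` at the given 0-based positions.

    The symbol "E" is an assumed encoding for a methylated cytosine.
    Positions that are out of range or do not hold a C are left unchanged.
    """
    bases = list(sequence)
    for pos in starts:
        if 0 <= pos < len(bases) and bases[pos] in "Cc":
            bases[pos] = symbol
    return "".join(bases)


def count_kmers(sequences, k):
    """Count every overlapping k-mer across an iterable of sequence strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Toy example: one chromosome with two methylated sites.
bed = ["chr1\t1\t2", "chr1\t5\t6"]
genome = {"chr1": "ACGTACGT"}

sites = parse_bed_positions(bed)
methyl = {name: mark_methylated(seq, sites.get(name, ()))
          for name, seq in genome.items()}
counts = count_kmers(methyl.values(), 2)
```

In this toy run, `methyl["chr1"]` holds the sequence with both CpG cytosines marked, and `counts` is the kind of k-mer tally a background model would be built from.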

Motif scanning:

To locate matches to your motifs, use the motif scanning tool. The program takes a FASTA file, a motif PWM file, and a background file that gives the background base composition. The background is optional but recommended: if it is not provided, the program assumes an equal nucleotide distribution.

python [options] -f fastafile -m motiffile -o output_file -b backgroundBaseComposition

*Note: The script should be executed in the main mEpigram directory.
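
To illustrate why the background matters, here is a minimal sketch of log-odds PWM scoring, a common way to scan sequences with a motif. It is not taken from the scanner itself: the function names, the pseudocount, the PWM layout (a list of per-position dictionaries), and the use of E for a methylated C are all assumptions made for this example. When no background is given, the sketch falls back to a uniform distribution over the PWM's alphabet, which mirrors the "equal nucleotide distribution" default described above only loosely.

```python
import math


def log_odds_score(window, pwm, background=None, pseudocount=1e-3):
    """Sum of log2(motif probability / background probability) over a window."""
    if background is None:
        # Assumed default: uniform over the letters used by the PWM.
        alphabet = sorted(pwm[0])
        background = {base: 1.0 / len(alphabet) for base in alphabet}
    score = 0.0
    for column, base in zip(pwm, window):
        bg = background.get(base)
        if not bg:
            # Unknown letter (e.g. N): this window cannot be scored.
            return float("-inf")
        score += math.log((column.get(base, 0.0) + pseudocount) / bg, 2)
    return score


def best_match(sequence, pwm, background=None):
    """Return (start, score) of the highest-scoring window, or None if too short."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = log_odds_score(sequence[start:start + width], pwm, background)
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Toy two-column motif that prefers "EG" (a methylated CpG in the assumed encoding).
pwm = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.05, "E": 0.80},
    {"A": 0.05, "C": 0.05, "G": 0.80, "T": 0.05, "E": 0.05},
]
# Hypothetical background in which methylated C is rare.
skewed = {"A": 0.29, "C": 0.20, "G": 0.20, "T": 0.29, "E": 0.02}

uniform_hit = best_match("ACGTEGAA", pwm)
skewed_hit = best_match("ACGTEGAA", pwm, skewed)
```

Both calls pick the same window here, but the score under the skewed background is higher because a rare base matching the motif is stronger evidence than a common one. This is the reason a background that reflects the actual base composition is preferable to the uniform default.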

Making motif LOGOs:

You can generate both types of logo using the script. Use the flag --typeEF to generate logos for typeEF motifs.

Data used in the manuscript: