This tool allows users to find methylated motifs in CpG context and also motifs that contain modified bases.
mEpigram requires a graph of possible k-mer interactions to function. Download the graphs here:
(Optional) To generate motif logos, install WebLOGO on your computer.
To run mEpigram, use the mepigram_wrapper.py script.
Also, the package includes several tools to help preprocess your data before using the program.
You can use the included pipeline to run mEpigram; it outputs motifs and their enrichment scores.
To quickly test the pipeline, go to the program's main directory and execute:
python mepigram_wrapper.py testfiles/data_typeE/test.typeE.faa testfiles/data_typeE/background_typeE-5.tsv testfiles/data_typeE/graphE-5mer/ typeE -o test.typeE
python mepigram_wrapper.py testfiles/data_typeEF/test.typeEF.faa testfiles/data_typeEF/background_typeEF-5.tsv testfiles/data_typeEF/graphEF-5mer/ typeEF -o test.typeEF
If you instead use k=8 by supplying background_typeE-8.tsv (which you need to generate) and graphE-8mer (which you can download from our website), you should be able to find several highly enriched m-motifs.
*Note: This pipeline must be executed in the mepigram main directory. For more information, execute: python mepigram_wrapper.py -h
mEpigram preprocessing scripts:
Insert methylation information into the genome. The input is assumed to be in BED format by default; WIG format can be used with --wig. In BED format, each line contains the chromosome name, the start location (0-based index), and the start location + 1. An output directory will be created to hold the new genome with methylation information. The reference genome should be provided as a directory, with each chromosomal sequence contained in a separate file, labeled by its chromosome name.
python modifyReference.py -f input.bed -r reference_genome_directory -o methyl_ref_genomeA
python modifyReference.py --typeEF -f input.bed -r reference_genome_directory -o methyl_ref_genomeA
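As an illustration of the BED layout described above, the following minimal sketch formats methylated positions as three-column lines (chromosome, 0-based start, start + 1). The helper name cpg_positions_to_bed and the example coordinates are hypothetical, and tab-separated columns are assumed from the usual BED convention; this is not part of the mEpigram package.

```python
def cpg_positions_to_bed(positions):
    """Format (chromosome, 0-based position) pairs as 3-column BED lines.

    Hypothetical helper for illustration only; not shipped with mEpigram.
    """
    lines = []
    for chrom, start in positions:
        # Columns: chromosome name, start (0-based), end = start + 1.
        # Tab separation is assumed from the standard BED convention.
        lines.append(f"{chrom}\t{start}\t{start + 1}")
    return lines


# Made-up example coordinates
bed_lines = cpg_positions_to_bed([("chr1", 10468), ("chr1", 10470)])
print("\n".join(bed_lines))
```

The printed lines could be saved to a file and passed to modifyReference.py with -f.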
Make methylated sequences from bed files and the genome above:
python bedToFasta.py -f input.bed -r methyl_ref_genomeA -o output.faa
Make background model: Count the occurrences of each k-mer in the genome. This might take a while (a few hours for the whole human genome), but you only need to do this once per reference genome. Example: Count the 5-mers in the sample genome:
python bgModel.py -gd testfiles/samplegenome/ -k 5 -m typeE
python bgModel.py -gd testfiles/samplegenome/ -k 5 -m typeEF
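To make the idea of k-mer counting concrete, here is a minimal sliding-window sketch. It is not the implementation used by bgModel.py: the function name count_kmers is hypothetical, only one strand is counted, and the example sequence assumes that a methylated cytosine is written as the extra letter E in a typeE genome.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer in a single sequence (one strand only)."""
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        counts[sequence[i:i + k]] += 1
    return counts


# Toy sequence; "E" standing for a methylated C is an assumption
counts = count_kmers("ACEGACEGA", 5)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

Counting a whole genome amounts to repeating this over every chromosome file and summing the results, which is why the step takes a long time on large genomes.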
To identify locations of matches using your motifs, you can use the motif scanning tool. The program takes a FASTA file, a motif PWM file, and a background file that states background base composition. Although the background is optional, it is recommended that you use the appropriate background as the program will assume equal nucleotide distribution if the background is not provided.
python motifscannerA.py [options] -f fastafile -m motiffile -o output_file -b backgroundBaseComposition
*Note: The motifscannerA.py file should be executed in the main mEpigram directory
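The role of the background can be illustrated with a generic log-odds scan. This is only a sketch under stated assumptions, not the scoring used by motifscannerA.py: the function scan_sequence, the in-memory PWM layout (a list of per-position probability dictionaries), and the toy values are all hypothetical. When no background is given, the sketch falls back to an equal distribution over the alphabet, mirroring the default described above.

```python
import math


def scan_sequence(sequence, pwm, background=None):
    """Return (start, log-odds score) for every window of the motif's width.

    pwm is a list of dicts mapping each base to a non-zero probability.
    All characters in sequence are assumed to be in the PWM alphabet.
    """
    alphabet = sorted(pwm[0])
    if background is None:
        # No background supplied: assume equal base frequencies
        background = {base: 1.0 / len(alphabet) for base in alphabet}
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(
            math.log2(column[base] / background[base])
            for column, base in zip(pwm, window)
        )
        hits.append((start, score))
    return hits


# Toy two-position motif that prefers "AC"
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
hits = scan_sequence("TACG", pwm)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

Supplying a background that reflects the real base composition changes the scores: a base that is common in the genome contributes less evidence for a match than a rare one.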
Making motif LOGOs:
You can generate both types of logo using the makeLOGO.py script. Use the flag --typeEF to generate typeEF motifs.