Methyl-Analyzer is a Python package that analyzes genome-wide DNA methylation
data produced by the methylation mapping analysis by paired-end sequencing
(Methyl-MAPS) method.
1. Requirements
===============
* Python
- python 2.7
- numpy
- pyfasta
- pysam
One easy way to install the above packages is to use 'pip'. For example,
$ sudo pip install numpy
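After installing, it can help to confirm that the required modules are
importable. The snippet below is a minimal sketch and is not part of
Methyl-Analyzer; the helper name 'missing_modules' is hypothetical.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Module names taken from the requirements list above.
required = ["numpy", "pyfasta", "pysam"]
print("Missing modules: {0}".format(missing_modules(required)))
```

An empty list means all three packages were found by the interpreter that ran
the check.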
2. Installation
===============
2.1 Download Methyl-Analyzer from:
http://github.com/epigenomics/methylmaps
or clone the repository via git:
git clone git://github.com/epigenomics/methylmaps.git
2.2 In the methylmaps directory, run the following commands:
$ python setup.py build
$ sudo python setup.py install
All data analysis scripts are installed into the Python bin directory.
3. Pre-analysis data preparation
================================
Compile the annotation files required by the data analysis pipeline.
3.1 CpG/RE/McrBC sites
* Input:
1) chromosome name
2) fasta sequence for the chromosome
* Example:
$ parse_sites.py chr11 chr11.fa
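To illustrate what locating sites in a chromosome sequence involves, here is a
minimal sketch in plain Python. It is not the implementation used by
'parse_sites.py', whose output format is not described here. The helper names
are hypothetical, and HpaII (recognition sequence CCGG) is used only as an
example of a restriction enzyme site.

```python
def read_fasta_sequence(lines):
    """Join the sequence lines of a single-record FASTA file, upper-cased."""
    return "".join(
        line.strip() for line in lines if not line.startswith(">")
    ).upper()


def find_motif(seq, motif):
    """Return 0-based start positions of every occurrence of `motif`."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping occurrences are also found.
        start = seq.find(motif, start + 1)
    return positions


# Toy input standing in for a real file such as chr11.fa.
fasta_lines = [">chrTest", "ACGTCCGGAT", "cgCG"]
seq = read_fasta_sequence(fasta_lines)

cpg_sites = find_motif(seq, "CG")      # CpG dinucleotides
hpaii_sites = find_motif(seq, "CCGG")  # example restriction enzyme site

print(cpg_sites)    # [1, 5, 10, 12]
print(hpaii_sites)  # [4]
```

For a real chromosome, the lines would come from the opened FASTA file rather
than an in-memory list.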
4. Data Analysis Pipeline
=========================
4.1 Parsing paired-end reads
* Input:
1) Paired-end reads sequenced on the SOLiD platform.
Note: the current parser works only for SOLiD mate-pair format.
2) Chromosome mapping file that maps the chromosome IDs used in the
mate-pair format file to the formal chromosome names.
* Example:
$ parse_mates.py --out_dir /path/to/frag_dir human.cmap \
re_reads.mates
4.2 Filtering methylated/unmethylated fragments
* Input:
1) Parameters used in the filtering process. See the example parameter file
at data/filter_para.
2) CpG/RE/McrBC annotation files (generated by 'parse_sites.py')
3) Parsed RE fragments (by 'parse_mates.py')
4) Parsed McrBC fragments (by 'parse_mates.py')
* Example:
$ filter.py --out_dir /path/to/filter_dir --para filter_para chr1 \
/path/to/anno_dir chr1_re chr1_mcrbc
4.3 Estimating CpG methylation probability
* Input:
1) Global methylation level obtained by the LUMA assay
2) Methylated/unmethylated information collected by 'filter.py'
* Example:
$ score.py --out_dir /path/to/meth_dir 0.71 chr1 247249719 \
/path/to/filter_dir/methdata_chr1
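Steps 4.2 and 4.3 are run once per chromosome, so they lend themselves to a
small driver script. The sketch below only builds the command lines, using the
flags and argument order shown in the examples above. Several things are
assumptions: the helper name 'build_chromosome_commands' is hypothetical, the
file names are assumed to follow the 'chr1_re', 'chr1_mcrbc', and
'methdata_chr1' patterns for every chromosome, the fragment files are assumed
to sit in the 'parse_mates.py' output directory, and the third positional
argument of 'score.py' is assumed to be the chromosome length (247249719
matches the hg18 length of chr1).

```python
def build_chromosome_commands(chrom, chrom_length, luma_level,
                              anno_dir, frag_dir, filter_dir, meth_dir,
                              para_file="filter_para"):
    """Return the filter.py and score.py argument lists for one chromosome."""
    filter_cmd = [
        "filter.py", "--out_dir", filter_dir, "--para", para_file, chrom,
        anno_dir,
        # Assumed naming pattern for the parsed RE and McrBC fragments.
        "{0}/{1}_re".format(frag_dir, chrom),
        "{0}/{1}_mcrbc".format(frag_dir, chrom),
    ]
    score_cmd = [
        "score.py", "--out_dir", meth_dir, str(luma_level), chrom,
        str(chrom_length),  # assumed to be the chromosome length
        "{0}/methdata_{1}".format(filter_dir, chrom),
    ]
    return [filter_cmd, score_cmd]


commands = build_chromosome_commands(
    "chr1", 247249719, 0.71,
    "/path/to/anno_dir", "/path/to/frag_dir",
    "/path/to/filter_dir", "/path/to/meth_dir",
)
for cmd in commands:
    print(" ".join(cmd))
```

To actually run the steps, each list could be passed to 'subprocess.call' in
order, stopping if the filtering step returns a non-zero exit code.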
4.4 Alternative procedure
Run the all-in-one script to perform the data analysis from beginning to end.
* Input:
1) Parameters required for the pipeline. See example at
data/pipeline_paras_hg18
2) RE reads in the mate-pair format
3) McrBC reads in the mate-pair format
4) Output directory
* Example:
$ run_pipeline.py --format mates pipeline_paras_hg18 re_reads.mates \
mcrbc_reads.mates /path/to/out_dir --run
5. Visualization
================
Create BED, Microarray, and Wiggle format tracks to visually inspect DNA
methylation profiles.
* BED tracks for CpG/RE/McrBC sites: use 'create_cpg_track.py'
* BED tracks for RE/McrBC fragments: use 'create_frag_track.py'
* Microarray tracks for DNA methylation profiles: use 'create_microarray_track.py'
* Wiggle tracks for DNA methylation profiles: use 'create_wiggle_track.py'
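For orientation, the sketch below shows what BED and Wiggle track lines look
like in general. It is not the code behind the scripts listed above and makes
no claim about their exact output. The function names, track name, and
methylation values are hypothetical. BED uses 0-based start and exclusive end
coordinates, while variableStep Wiggle positions are 1-based.

```python
def cpg_sites_to_bed(chrom, positions, name="CpG"):
    """Format 0-based CpG start positions as 4-column BED lines."""
    # Each CpG spans two bases, so the exclusive end is start + 2.
    return [
        "{0}\t{1}\t{2}\t{3}".format(chrom, pos, pos + 2, name)
        for pos in positions
    ]


def scores_to_wiggle(chrom, scores, track_name="methylation"):
    """Format {0-based position: value} as a variableStep Wiggle track."""
    lines = [
        'track type=wiggle_0 name="{0}"'.format(track_name),
        "variableStep chrom={0}".format(chrom),
    ]
    for pos in sorted(scores):
        # Wiggle positions are 1-based, hence the +1 shift.
        lines.append("{0} {1}".format(pos + 1, scores[pos]))
    return lines


bed_lines = cpg_sites_to_bed("chr11", [1, 5])
wig_lines = scores_to_wiggle("chr11", {5: 0.25, 1: 0.9})

print("\n".join(bed_lines))
print("\n".join(wig_lines))
```

Both formats can be loaded as custom tracks in a genome browser for visual
inspection.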
6. Citation
===========
Xin Y, Ge Y, Haghighi F. Methyl-Analyzer - Whole Genome DNA Methylation
Profiling. Bioinformatics. 2011 Jun 17. [Epub ahead of print]