📣 Imputation recovers partially observed methylation patterns, enabling the analysis of methylation heterogeneity at a large proportion of regions genome-wide, and also estimates methylation levels accurately.
Ya-Ting Sabrina Chang, Ming-Ren Yen, Pao-Yang Chen (2022). BSImp: imputing partially observed methylation patterns for evaluating methylation heterogeneity. Frontiers in Bioinformatics, Research Topic in Computational Methods for Analysis of DNA Methylation Data. https://doi.org/10.3389/fbinf.2022.815289
- python 2.7 +
- pandas package 0.24 +
- pysam package 0.16.0.1 +
- joblib package
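Before running the tool, it can help to confirm that the packages listed above are installed and to see which versions are present. The snippet below is a minimal sketch, not part of BSImp: the helper name `installed_versions` is hypothetical, and it assumes Python 3.8 or later (for `importlib.metadata`) and that the distribution names match the package names listed above.

```python
from importlib import metadata  # standard library from Python 3.8 onwards

# Package names taken from the requirements list above (assumed to match
# the names under which the packages are installed).
REQUIRED = ("pandas", "pysam", "joblib")


def installed_versions(names=REQUIRED):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Compare the printed versions against the minimums listed above (pandas 0.24, pysam 0.16.0.1).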
pip install MeHscr
pip3 install MeHscr
git clone https://github.com/britishcoffee/BSImp.git
cd BSImp
wget https://raw.githubusercontent.com/britishcoffee/BSImp/main/bsimp.py
mkdir MeHdata
mkdir myinputfiles
3. Place the .bam and .bam.bai files of all samples for which you wish to obtain methylation heterogeneity profiles into the folder MeHdata/ or myinputfiles/
scp [directory_to_bamfiles_of_all_samples].bam* ./MeHdata
# or within MeHdata/
ln -s [directory_to_bamfiles_of_all_samples].bam* ./
scp [directory_to_reference_genome].fa* ./MeHdata
# or within MeHdata/
ln -s [directory_to_reference_genome].fa* ./
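Since each sample needs both a .bam and a .bam.bai file in the input folder, a quick check before starting a long run may save time. The following is a minimal sketch under stated assumptions: the helper names `bams_missing_index` and `reference_files` are hypothetical, and it assumes index files are named by appending `.bai` to the full .bam file name, and that reference files match `*.fa*` as in the commands above.

```python
from pathlib import Path


def bams_missing_index(folder):
    """Return sorted names of .bam files in `folder` that lack a .bam.bai file."""
    folder = Path(folder)
    return sorted(
        bam.name
        for bam in folder.glob("*.bam")
        if not bam.with_name(bam.name + ".bai").exists()
    )


def reference_files(folder):
    """Return sorted names of files in `folder` matching *.fa* (reference genome)."""
    return sorted(path.name for path in Path(folder).glob("*.fa*"))


if __name__ == "__main__":
    input_folder = "MeHdata"  # or "myinputfiles"
    print("BAM files without index:", bams_missing_index(input_folder))
    print("Reference files found:", reference_files(input_folder))
```

An empty first list and a non-empty second list suggest the folder is set up as described.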
$ python bsimp.py -h
usage: bsimp.py [-h] [-w WINDOWSIZE] [-c CORES] [--CG] [--CHG] [--CHH]
[-mC MINDEPTH] [-f FOLDERNAME] [--opt]
optional arguments:
-h, --help show this help message and exit
-w WINDOWSIZE, --windowsize WINDOWSIZE
number of CGs
-c CORES, --cores CORES
number of cores
--CG Include genomic context CG
--CHG Include genomic context CHG
--CHH Include genomic context CHH
-mC MINDEPTH, --mindepth MINDEPTH
Minimum depth per cytosine
-f FOLDERNAME, --foldername FOLDERNAME
Folder name of the location of input files
--opt Output original count of patterns
-mML MINML, --minML MINML
minimum methylation level for the consideration of examination of windows for CHG and CHH contexts
# 'CG' only with window size of 4 cytosines and 4 cores parallel processing (default minimum depth for output is 4 reads at a cytosine)
python bsimp.py -w 4 -c 4 --CG
# 'CG', 'CHG' and 'CHH' with window size of 4 cytosines, minimum depth for output of 8 reads
# at a cytosine and 8 cores parallel processing
python bsimp.py -w 4 -c 8 --CG --CHG --CHH -mC 8 -f MeHdata
# 'CG', 'CHG' and 'CHH' with window size of 4 cytosines, minimum depth for output of 8 reads
# at a cytosine, 8 cores parallel processing, minimum methylation level of 0.05 for CHG/CHH
# outputs, and output of original counts of methylation patterns (prior to imputation)
python bsimp.py -w 4 -c 8 --CG --CHG --CHH -mC 8 -f MeHdata -mML 0.05 --opt
The programme is running at folder "/MeHdata"
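When running several samples or parameter combinations, it may be convenient to assemble the command line in Python rather than typing it by hand. The sketch below uses only the flags shown in the help text above; the function names `build_bsimp_command` and `run_bsimp`, their parameter names and their default values are hypothetical wrappers, not part of BSImp, and the sketch assumes `bsimp.py` sits in the current working directory.

```python
import subprocess
import sys

ALLOWED_CONTEXTS = ("CG", "CHG", "CHH")


def build_bsimp_command(windowsize=4, cores=4, contexts=("CG",), mindepth=None,
                        foldername=None, min_ml=None, output_original=False,
                        script="bsimp.py"):
    """Assemble the argument list for bsimp.py from the documented options."""
    unknown = [c for c in contexts if c not in ALLOWED_CONTEXTS]
    if unknown:
        raise ValueError(f"unsupported context(s): {unknown}")
    cmd = [sys.executable, script, "-w", str(windowsize), "-c", str(cores)]
    cmd += [f"--{c}" for c in contexts]       # --CG, --CHG, --CHH
    if mindepth is not None:
        cmd += ["-mC", str(mindepth)]         # minimum depth per cytosine
    if foldername is not None:
        cmd += ["-f", foldername]             # folder holding the input files
    if min_ml is not None:
        cmd += ["-mML", str(min_ml)]          # minimum methylation level (CHG/CHH)
    if output_original:
        cmd.append("--opt")                   # output original pattern counts
    return cmd


def run_bsimp(**options):
    """Run bsimp.py with the given options; raises if the tool exits non-zero."""
    return subprocess.run(build_bsimp_command(**options), check=True)
```

For example, `build_bsimp_command(cores=8, contexts=("CG", "CHG", "CHH"), mindepth=8, foldername="MeHdata", min_ml=0.05, output_original=True)` reproduces the arguments of the last command above.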
Format descriptions: (1) chrom: chromosome (2) pos: (starting cytosine) position for methylation patterns and position for read copy number (3)-(18) pxx: copy number of methylation pattern
- p01: '0000' - UUUU - copy number of methylation pattern: all unmethylated
- p02: '1000' - MUUU
- p03: '0100' - UMUU
- p04: '1100' - MMUU
- p05: '0010' - UUMU
- p06: '1010' - MUMU
- p07: '0110' - UMMU
- p08: '1110' - MMMU
- p09: '0001' - UUUM
- p10: '1001' - MUUM
- p11: '0101' - UMUM
- p12: '1101' - MMUM
- p13: '0011' - UUMM
- p14: '1011' - MUMM
- p15: '0111' - UMMM
- p16: '1111' - MMMM - copy number of methylation pattern: all methylated
(19) M: # of methylated C/G (20) UM: # of unmethylated C/G (T/A) (21) strand: f(orward)/r(everse)
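The pattern columns above follow a regular order: the column number minus one, written in binary with the first cytosine as the least significant bit. The sketch below reproduces that mapping and also computes a methylation level from the M and UM columns. It rests on assumptions: the helper names are hypothetical, the `pNN` naming for window sizes other than 4 is extrapolated from the table, and M / (M + UM) is the conventional definition of methylation level rather than a confirmed description of how BSImp computes it.

```python
def pattern_columns(windowsize=4):
    """Map column names (p01, p02, ...) to pattern strings such as '1000'.

    The ordering matches the table above for a window of 4 cytosines;
    other window sizes are an extrapolation.
    """
    columns = {}
    for index in range(2 ** windowsize):
        # Binary representation, reversed so the first cytosine is leftmost.
        bits = format(index, f"0{windowsize}b")[::-1]
        columns[f"p{index + 1:02d}"] = bits
    return columns


def to_mu(bits):
    """Convert a '0'/'1' pattern string to the U/M notation used above."""
    return bits.replace("1", "M").replace("0", "U")


def methylation_level(m, um):
    """Return M / (M + UM), or None when no reads are counted."""
    total = m + um
    return m / total if total else None


if __name__ == "__main__":
    for name, bits in pattern_columns().items():
        print(name, bits, to_mu(bits))
```

Applied to an output row, `methylation_level(row["M"], row["UM"])` would give the fraction of methylated calls at that position, assuming the columns are named as in the format description.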
Sabrina- 📧 ytchang.sabrina@gmail.com