britishcoffee/BSImp

BSImp:

📣 BSImp imputes partially observed methylation patterns, which makes a large proportion of regions genome-wide available for the analysis of methylation heterogeneity, and it also estimates methylation levels accurately.

Publication

Ya-Ting Sabrina Chang, Ming-Ren Yen, Pao-Yang Chen (2022). BSImp: imputing partially observed methylation patterns for evaluating methylation heterogeneity. Frontiers in Bioinformatics, Research Topic in Computational Methods for Analysis of DNA Methylation Data. https://doi.org/10.3389/fbinf.2022.815289

Pipeline

System requirements

  • Python 2.7+
  • pandas package 0.24+
  • pysam package 0.16.0.1+
  • joblib package

These requirements can be installed by running one of the following commands:

pip install MeHscr
pip3 install MeHscr

Genome screening of imputation and methylation profiling

1. Download the example folder or only the script bsimp.py

git clone https://github.com/britishcoffee/BSImp.git
cd BSImp
# or download only the script
wget https://raw.githubusercontent.com/britishcoffee/BSImp/main/bsimp.py

2. (Optional) Create a folder named "MeHdata", or any name you like, in the same directory

mkdir MeHdata
mkdir myinputfiles

3. Place the .bam and .bam.bai files of all samples for which you want methylation heterogeneity profiles into the folder MeHdata/ or myinputfiles/

scp [directory_to_bamfiles_of_all_samples].bam* ./MeHdata
# or within MeHdata/
ln -s [directory_to_bamfiles_of_all_samples].bam* ./

4. Also place the .fa and .fa.fai files of the reference genome into the same folder

scp [directory_to_reference_genome].fa* ./MeHdata
# or within MeHdata/
ln -s [directory_to_reference_genome].fa* ./
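
Before running the program, it can help to confirm that every input file in the folder has its index next to it. The following is a minimal sketch, not part of BSImp: the helper name `find_missing_indexes` is hypothetical, and it assumes the naming used in steps 3 and 4 (`sample.bam` with `sample.bam.bai`, `genome.fa` with `genome.fa.fai`).

```python
import os


def find_missing_indexes(folder):
    """Return the index file names that are expected but absent from `folder`.

    Hypothetical helper, assuming the naming convention from steps 3 and 4:
    every *.bam needs a *.bam.bai and every *.fa needs a *.fa.fai.
    """
    names = set(os.listdir(folder))
    missing = []
    for name in sorted(names):
        if name.endswith(".bam") and name + ".bai" not in names:
            missing.append(name + ".bai")
        elif name.endswith(".fa") and name + ".fai" not in names:
            missing.append(name + ".fai")
    return missing


# Example call (the folder name follows step 2):
#     for name in find_missing_indexes("MeHdata"):
#         print("missing index file: " + name)
```

If an index is reported missing, `samtools index` (for .bam) and `samtools faidx` (for .fa) are the usual tools for creating one.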

5. Run the program bsimp.py (see examples below)

Usage
$ python bsimp.py -h
usage: bsimp.py [-h] [-w WINDOWSIZE] [-c CORES] [--CG] [--CHG] [--CHH]
                [-mC MINDEPTH] [-f FOLDERNAME] [--opt]

optional arguments:
  -h, --help            show this help message and exit
  -w WINDOWSIZE, --windowsize WINDOWSIZE
                        number of CGs
  -c CORES, --cores CORES
                        number of cores
  --CG                  Include genomic context CG
  --CHG                 Include genomic context CHG
  --CHH                 Include genomic context CHH
  -mC MINDEPTH, --mindepth MINDEPTH
                        Minimum depth per cytosine
  -f FOLDERNAME, --foldername FOLDERNAME
                        Folder name of the location of input files
  --opt                 Output original count of patterns
  -mML MINML, --minML MINML
                        minimum methylation level for the consideration of examination of windows for CHG and CHH contexts
Examples
# 'CG' only, with a window size of 4 cytosines and 4 cores for parallel processing (the default minimum depth for output is 4 reads per cytosine)
    python bsimp.py -w 4 -c 4 --CG
# 'CG', 'CHG' and 'CHH', with a window size of 4 cytosines, a minimum depth for output of 8 reads
# per cytosine, and 8 cores for parallel processing
    python bsimp.py -w 4 -c 8 --CG --CHG --CHH -mC 8 -f MeHdata
# Same as the previous example, plus a minimum methylation level of 0.05 for CHG/CHH windows
# and output of the original counts of methylation patterns (prior to imputation)
    python bsimp.py -w 4 -c 8 --CG --CHG --CHH -mC 8 -f MeHdata -mML 0.05 --opt
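
When bsimp.py is launched from another Python script, the command line can be assembled programmatically. The sketch below is an illustration under stated assumptions: `build_bsimp_command` and its parameter names are hypothetical, and it only emits the flags listed in the help text above.

```python
def build_bsimp_command(window, cores, contexts=("CG",), min_depth=None,
                        folder=None, min_ml=None, original_counts=False,
                        script="bsimp.py"):
    """Build an argument list for bsimp.py (hypothetical helper).

    Only the flags shown in `python bsimp.py -h` are used.
    """
    allowed = ("CG", "CHG", "CHH")
    cmd = ["python", script, "-w", str(window), "-c", str(cores)]
    for context in contexts:
        if context not in allowed:
            raise ValueError("unknown context: %s" % context)
        cmd.append("--" + context)
    if min_depth is not None:
        cmd += ["-mC", str(min_depth)]
    if folder is not None:
        cmd += ["-f", folder]
    if min_ml is not None:
        cmd += ["-mML", str(min_ml)]
    if original_counts:
        cmd.append("--opt")
    return cmd


# Reproduces the first example: CG only, window of 4 cytosines, 4 cores.
print(" ".join(build_bsimp_command(4, 4)))
```

The returned list can be passed to `subprocess.check_call` as-is, which avoids shell quoting problems in folder names.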

The programme is running at folder "/MeHdata"

Example of one output file

Format descriptions: (1) chrom: chromosome; (2) pos: position of the starting cytosine for methylation patterns, and position for read copy number; (3)-(18) pxx: copy number of each methylation pattern

  • p01: '0000' - UUUU - copy number of methylation pattern: all unmethylated
  • p02: '1000' - MUUU
  • p03: '0100' - UMUU
  • p04: '1100' - MMUU
  • p05: '0010' - UUMU
  • p06: '1010' - MUMU
  • p07: '0110' - UMMU
  • p08: '1110' - MMMU
  • p09: '0001' - UUUM
  • p10: '1001' - MUUM
  • p11: '0101' - UMUM
  • p12: '1101' - MMUM
  • p13: '0011' - UUMM
  • p14: '1011' - MUMM
  • p15: '0111' - UMMM
  • p16: '1111' - MMMM - copy number of methylation pattern: all methylated

(19) M: number of methylated C/G; (20) UM: number of unmethylated C/G (T/A); (21) strand: f(orward)/r(everse)
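
The sixteen pattern columns follow a regular encoding: the pattern number minus one, written in binary with the first cytosine of the window as the least significant bit. The sketch below illustrates that mapping and a conventional methylation level computed from the M and UM columns. The function names are hypothetical, the formula M / (M + UM) is the usual definition rather than something confirmed from the BSImp source, and window sizes other than 4 are an assumption.

```python
def pattern_bits(index, window=4):
    """Map a 1-based pattern number (p01..p16 for window=4) to its bit string.

    The first cytosine is the least significant bit, matching the listing
    above (p02 = '1000', p09 = '0001').
    """
    if not 1 <= index <= 2 ** window:
        raise ValueError("pattern index out of range")
    value = index - 1
    return "".join(str((value >> i) & 1) for i in range(window))


def to_um(bits):
    """Translate a bit string into U (unmethylated) / M (methylated) letters."""
    return bits.replace("0", "U").replace("1", "M")


def methylation_level(m, um):
    """Conventional methylation level M / (M + UM); None when there is no coverage."""
    total = m + um
    if total == 0:
        return None
    return float(m) / total


# Illustrative values only, not taken from real BSImp output.
for index in (1, 2, 9, 16):
    bits = pattern_bits(index)
    print("p%02d %s %s" % (index, bits, to_um(bits)))
print(methylation_level(6, 2))
```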

Contact

Sabrina- 📧 ytchang.sabrina@gmail.com
