djhshih/rgsam

rgsam

Infer read-group information from read names in a SAM or FASTQ file.

Installation

make
make install

Usage

usage: rgsam [command]

commands:
  collect    collect read-group information from SAM or FASTQ file
  split      split SAM or FASTQ file based on read-group
  tag        tag reads in SAM file with read-group field
  qnames     list supported read name formats
  version    print version

Read-group identifier (ID) and platform unit (PU) are inferred from read names according to supported read name formats:

{
  "illumina-1.0": {
    "format": "@{flowcell}-{instrument}:{lane}:{tile}:{x}:{y}#{sample}/{pair}",
    "example": "@HWUSI-EAS100R:6:73:941:1973#0/1"
  },
  "illumina-1.8": {
    "format": "@{instrument}:{run}:{flowcell}:{lane}:{tile}:{x}:{y}",
    "example": "@EAS139:136:FC706VJ:2:2104:15343:197393"
  },
  "broad-1.0": {
    "format": "@{flowcell,5}:{barcode}:{lane}:{tile}:{x}:{y}",
    "example": "@H0164ALXX140820:2:1101:10003:23460"
  }
}
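
As a rough illustration of what the format strings above describe, the sketch below pulls the flowcell and lane fields out of an illumina-1.8 read name. The function name `illumina18_flowcell_lane` is hypothetical and not part of rgsam, and how rgsam actually combines these fields into ID and PU is not stated here, so this only shows the field extraction.

```shell
# Hypothetical helper (not part of rgsam): print the flowcell and lane
# fields of an illumina-1.8 read name, separated by a space.
illumina18_flowcell_lane() {
  # Strip the leading "@", then split on ":". Per the illumina-1.8 format,
  # field 3 is the flowcell and field 4 is the lane.
  printf '%s\n' "${1#@}" | awk -F: 'NF >= 7 { print $3, $4 }'
}

# Example read name taken from the illumina-1.8 entry above.
illumina18_flowcell_lane "@EAS139:136:FC706VJ:2:2104:15343:197393"
```

Names with fewer than seven colon-separated fields produce no output, which makes it easy to spot reads that do not follow this format.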

Platform (PL) defaults to illumina.

Sample (SM) and library identifier (LB) may be inferred from input file name.

Files with reads from more than one sample or library are not supported.

To split BAM or SAM files that already contain proper @RG header lines and reads tagged with a read-group field (e.g. RG:Z:H1), use samtools instead:

samtools view -r <rg_id> <in.bam>
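
To find the values to pass as <rg_id>, the read-group IDs can be read from the @RG header lines. The sketch below is one possible way to do this; `list_rg_ids` is a hypothetical helper, not part of rgsam or samtools, and it assumes tab-delimited SAM header text on standard input.

```shell
# Hypothetical helper (not part of rgsam or samtools): read SAM header
# text on stdin and print the ID of every @RG line, one per line.
list_rg_ids() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^ID:/) print substr($i, 4)   # drop the "ID:" prefix
  }'
}

# Usage with a real file (requires samtools):
#   samtools view -H in.bam | list_rg_ids
```

Each printed ID can then be passed to samtools view -r to extract the reads of one read group.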

Example

Suppose we have a BAM file with no read-group data. We first infer the set of read groups with:

samtools view sample.bam | rgsam collect -s sample -o rg.txt

Now we can tag the reads with read-group information (any existing read-group tags will be replaced).

samtools view -h sample.bam | 
  rgsam tag -r rg.txt |
  samtools view -b - > sample.rg.bam

Note that we use the -h flag of samtools view to ensure that other header data are preserved (any existing @RG will be replaced).
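
One way to sanity-check the tagged output is to count reads per RG tag. The sketch below assumes SAM text on standard input; `count_rg_tags` is a hypothetical helper and not an rgsam command.

```shell
# Hypothetical helper (not part of rgsam): count alignment records per
# RG:Z: tag value in SAM text read from stdin.
count_rg_tags() {
  awk -F'\t' '
    /^@/ { next }                       # skip header lines
    {
      rg = "(untagged)"
      for (i = 12; i <= NF; i++)        # optional fields start at column 12
        if ($i ~ /^RG:Z:/) rg = substr($i, 6)
      n[rg]++
    }
    END { for (rg in n) print rg "\t" n[rg] }
  ' | LC_ALL=C sort
}

# Usage with the file produced above (requires samtools):
#   samtools view sample.rg.bam | count_rg_tags
```

If tagging worked as intended, no reads should be reported under "(untagged)".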
