RG handling #83

JJBio · 2019-08-13T08:00:19Z

So, I have multiple samples with multiple lanes and libraries. Some samples have a simple design: a single library sequenced on one or more lanes.

@RG ID:FLOWCELL1.LANE1  PL:ILLUMINA LB:Lib1 SM:Sample1
@RG ID:FLOWCELL1.LANE2  PL:ILLUMINA LB:Lib1 SM:Sample1

Some samples have a more complex design: multiple libraries with different insert sizes with multiple lanes each.

@RG ID:FLOWCELL1.LANE1  PL:ILLUMINA LB:Lib1 SM:Sample2
@RG ID:FLOWCELL1.LANE2  PL:ILLUMINA LB:Lib2 SM:Sample2
@RG ID:FLOWCELL1.LANE3  PL:ILLUMINA LB:Lib2 SM:Sample2

I mapped each read group separately with bwa mem -M -R and defined the read groups in the BAM files as well as I could (it's public data), resulting in:

FLOWCELL1.LANE1.sample1.bam
FLOWCELL1.LANE2.sample1.bam
FLOWCELL1.LANE1.sample2.bam
FLOWCELL1.LANE2.sample2.bam
FLOWCELL1.LANE3.sample2.bam
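The per-read-group mapping described above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual script: the `rg_line` helper and the FASTQ and reference file names in the usage comment are hypothetical. It relies on `bwa mem -R` accepting literal `\t` escapes, which bwa converts to tabs in the output header.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the @RG line for `bwa mem -R` from ID, LB and SM.
# The \t sequences stay literal here; bwa turns them into tabs in the SAM header.
rg_line() {
  local id=$1 lib=$2 sample=$3
  printf '@RG\\tID:%s\\tPL:ILLUMINA\\tLB:%s\\tSM:%s' "$id" "$lib" "$sample"
}

# Usage (file names are placeholders):
#   bwa mem -M -R "$(rg_line FLOWCELL1.LANE1 Lib1 Sample1)" ref.fa \
#       FLOWCELL1.LANE1_R1.fq.gz FLOWCELL1.LANE1_R2.fq.gz \
#     | samtools sort -o FLOWCELL1.LANE1.sample1.bam -
```

Keeping ID unique per lane while LB and SM are shared is what lets later tools group lanes by library and libraries by sample.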

Then I fed the BAM files belonging to the same individual sample to MarkDuplicates, which not only marks duplicates but also merges the BAM files (keeping the RG tags). So I now have:

sample1.bam
sample2.bam
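The per-sample merge through MarkDuplicates could look roughly like this. It is a sketch under stated assumptions: the `markdup_cmd` helper and the `picard.jar` path are hypothetical, the FLOWCELL.LANE.sample.bam naming from the list above is assumed, and the helper only prints the command so it can be inspected before running. Picard's MarkDuplicates accepts several inputs and writes a single output, which is what produces one BAM per sample.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a Picard MarkDuplicates command that
# merges every RG-level BAM of one sample into <sample>.bam.
# Assumes the FLOWCELL.LANE.<sample>.bam naming used above.
markdup_cmd() {
  local sample=$1 dir=${2:-.}
  local -a inputs=()
  local bam
  for bam in "$dir"/*."$sample".bam; do
    [ -e "$bam" ] || continue          # glob matched nothing
    inputs+=("I=$bam")                 # one I= per read-group BAM
  done
  # Legacy Picard argument syntax; newer releases also accept -I/-O/-M.
  echo java -jar picard.jar MarkDuplicates "${inputs[@]}" \
    "O=$sample.bam" "M=$sample.metrics.txt"
}
```

For example, `markdup_cmd sample2 /path/to/bams` would list the three sample2 lane BAMs as inputs and `sample2.bam` as the merged output.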

In this thread you mentioned that I should try smoove and follow these steps when using it:

1. call each sample separately
2. merge the calls into one set of SV sites
3. genotype those sites for each sample
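The three steps above map onto smoove subcommands. The sketch below is assumed to follow the population-calling example in the smoove README (including its extra `smoove paste` step that combines the per-sample VCFs); the flags should be checked against `smoove <subcommand> -h` for the installed version. The `run` wrapper, the `DRY_RUN` variable and the `smoove_population` name are hypothetical conveniences, not smoove features, and one merged BAM per sample is assumed as input.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: echo the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Arguments: reference FASTA, exclude BED, then one merged BAM per sample.
smoove_population() {
  local fasta=$1 bed=$2 bam sample
  shift 2
  # 1. call each sample separately
  for bam in "$@"; do
    sample=$(basename "$bam" .bam)
    run smoove call --outdir results-smoove/ --exclude "$bed" --name "$sample" \
      --fasta "$fasta" -p 1 --genotype "$bam"
  done
  # 2. merge the per-sample calls into one set of SV sites (merged.sites.vcf.gz)
  run smoove merge --name merged -f "$fasta" --outdir ./ results-smoove/*.genotyped.vcf.gz
  # 3. genotype those sites for each sample
  for bam in "$@"; do
    sample=$(basename "$bam" .bam)
    run smoove genotype -d -x -p 1 --name "$sample-joint" --outdir results-genotyped/ \
      --fasta "$fasta" --vcf merged.sites.vcf.gz "$bam"
  done
  # 4. paste the per-sample genotyped VCFs into one multi-sample VCF
  run smoove paste --name cohort results-genotyped/*.vcf.gz
}
```

Running it once with `DRY_RUN=1` prints the six commands for a two-sample cohort without touching any data.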

So how should I handle the different RGs now?
In the other thread you also mentioned that I should NOT merge things:

Each “-pe” will get its own histogram, so separating out the different libraries with different properties is a good move. Do not merge things back together. Each sample should have its own “-pe” and “-sr”.

I am not merging anything across samples; I am only merging RGs that belong to the same sample (multiple libraries and lanes). But yes, in some cases I do merge different libraries from the same biosample (it's SRA data).

If I understood correctly, I should not feed in the merged BAM files (I created them because GATK, in contrast, needs them) but rather the individual RG-level BAM files. So do I feed all the RG-level BAM files from all samples to smoove in one go? (I have a small cohort, well below 40 samples.)

smoove call -x --name my-cohort --exclude $bed --fasta $fasta -p $threads --genotype /path/to/*.bam

So *.bam would be:

FLOWCELL1.LANE1.sample1.bam
FLOWCELL1.LANE2.sample1.bam
FLOWCELL1.LANE1.sample2.bam
FLOWCELL1.LANE2.sample2.bam
FLOWCELL1.LANE3.sample2.bam

Does the tool recognise that different RGs belong to the same sample, and then produce a multi-sample file containing only Sample1 and Sample2 (rather than one entry per RG)?

Sorry for the long post...
Thanks so much!! I very much appreciate your help!


brentp · 2019-08-13T13:54:41Z

I would just run smoove on the merged files (sample1.bam and sample2.bam for your examples). The insert size distribution may be a bit different, but this will not matter in most cases.
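Before running smoove on the merged files, it may be worth checking that each merged BAM carries exactly one SM value, so that each input file corresponds to one sample. A minimal sketch, assuming the header is dumped with `samtools view -H`; `rg_samples` and `single_sample` are hypothetical helper names, and the `smoove call` line in the usage comment is the small-cohort command from the question applied to the merged BAMs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the distinct SM (sample) values found in
# @RG header lines read on stdin.
rg_samples() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print substr($i, 4)
  }' | sort -u
}

# Succeeds only if the header on stdin names exactly one sample.
single_sample() {
  [ "$(rg_samples | wc -l)" -eq 1 ]
}

# Usage:
#   for bam in sample1.bam sample2.bam; do
#     samtools view -H "$bam" | single_sample || echo "check SM tags in $bam" >&2
#   done
#   smoove call -x --name my-cohort --exclude $bed --fasta $fasta -p $threads \
#     --genotype sample1.bam sample2.bam
```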

JJBio · 2019-08-14T07:55:23Z

Thank you for your input. I have one sample, though, where the insert sizes vary dramatically (from 200 to 800 bp) depending on the library. Is it still OK to run smoove on the merged file in this case?

brentp · 2019-08-14T13:13:32Z

It's not ideal, but it's what will work. You'll just lose a little resolution for smaller events that have discordant reads but no splitters, which should be fairly rare anyway.
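To see how far apart the libraries of such a sample actually are, a rough per-library insert-size summary can be computed from the alignments. This sketch assumes SAM text with its header on stdin (for example from `samtools view -h -f 66 sample.bam`, which keeps first-in-pair reads in proper pairs so each pair is counted once) and reports the mean absolute TLEN per LB. The function name is hypothetical, and a mean is a cruder summary than the median that Picard CollectInsertSizeMetrics reports.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mean |TLEN| per library (LB) from SAM text on stdin.
# Read groups are mapped to libraries via the @RG header lines.
lib_insert_means() {
  awk -F'\t' '
    $1 == "@RG" {                      # remember ID -> LB for each read group
      id = ""; lb = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/) id = substr($i, 4)
        if ($i ~ /^LB:/) lb = substr($i, 4)
      }
      lib[id] = lb
      next
    }
    /^@/ { next }                      # skip other header lines
    {
      rg = ""
      for (i = 12; i <= NF; i++)       # optional tags start at column 12
        if ($i ~ /^RG:Z:/) rg = substr($i, 6)
      t = ($9 < 0) ? -$9 : $9          # TLEN sign depends on mate order
      if ((rg in lib) && t > 0) { sum[lib[rg]] += t; n[lib[rg]]++ }
    }
    END { for (l in n) printf "%s\t%.1f\n", l, sum[l] / n[l] }
  ' | sort
}
```

Libraries whose means differ by hundreds of bases are the case discussed here, where a single merged histogram blurs the discordant-read signal for small events.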

JJBio · 2019-08-16T07:00:01Z

Thanks for your input!!

JJBio mentioned this issue Aug 13, 2019

RG handeling arq5x/lumpy-sv#312

Open

brentp closed this as completed Aug 13, 2019
