RG handling #83

Closed
JJBio opened this issue Aug 13, 2019 · 4 comments

JJBio commented Aug 13, 2019

I have multiple samples with multiple lanes and libraries. Some samples have a simple design: one library sequenced on one or more lanes.

@RG ID:FLOWCELL1.LANE1  PL:ILLUMINA LB:Lib1 SM:Sample1
@RG ID:FLOWCELL1.LANE2  PL:ILLUMINA LB:Lib1 SM:Sample1

Some samples have a more complex design: multiple libraries with different insert sizes with multiple lanes each.

@RG ID:FLOWCELL1.LANE1  PL:ILLUMINA LB:Lib1 SM:Sample2
@RG ID:FLOWCELL1.LANE2  PL:ILLUMINA LB:Lib2 SM:Sample2
@RG ID:FLOWCELL1.LANE3  PL:ILLUMINA LB:Lib2 SM:Sample2

I mapped each read group separately with bwa mem -M -R, defining the read groups in the BAM files as well as I could (it's public data), resulting in:

FLOWCELL1.LANE1.sample1.bam
FLOWCELL1.LANE2.sample1.bam
FLOWCELL1.LANE1.sample2.bam
FLOWCELL1.LANE2.sample2.bam
FLOWCELL1.LANE3.sample2.bam
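
For readers reproducing the per-read-group mapping step, here is a minimal sketch of building the string passed to `bwa mem -R`. The helper name `rg_string`, the reference `ref.fa`, and the FASTQ file names are hypothetical; the field values follow the @RG lines shown above. bwa expects the two-character sequence `\t` between fields and converts it to tabs in the output.

```shell
# Build a read-group header string for `bwa mem -R`.
# Fields are separated by a literal backslash followed by "t", not by real tabs.
rg_string() {
  # $1 = read group ID, $2 = library, $3 = sample
  printf '@RG\\tID:%s\\tPL:ILLUMINA\\tLB:%s\\tSM:%s\n' "$1" "$2" "$3"
}

# Show the string for one lane of Sample2.
rg_string FLOWCELL1.LANE2 Lib2 Sample2

# Intended use (ref.fa and the FASTQ names are placeholders, not from this thread):
# bwa mem -M -R "$(rg_string FLOWCELL1.LANE2 Lib2 Sample2)" ref.fa r1.fq.gz r2.fq.gz > FLOWCELL1.LANE2.sample2.sam
```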

Then I fed the BAM files that belong to the same sample to MarkDuplicates, which not only marks duplicates but also merges the BAM files (keeping the RG tags). So I now have:

sample1.bam
sample2.bam
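
One way to confirm that a merged BAM kept its read groups, and that they all carry the same SM value, is to list the @RG lines from the header. The sketch below assumes samtools is installed; the helper name `rg_samples` is made up for illustration, and it reads header text from stdin so it can be tried without a BAM file.

```shell
# Print "read group ID <TAB> sample" for every @RG line in SAM header text on stdin.
rg_samples() {
  awk -F'\t' '/^@RG/ {
    id = ""; sm = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^ID:/) id = substr($i, 4)
      if ($i ~ /^SM:/) sm = substr($i, 4)
    }
    print id "\t" sm
  }'
}

# Demo on a hand-written header that mirrors the Sample2 design above.
printf '@HD\tVN:1.6\n@RG\tID:FLOWCELL1.LANE1\tPL:ILLUMINA\tLB:Lib1\tSM:Sample2\n@RG\tID:FLOWCELL1.LANE2\tPL:ILLUMINA\tLB:Lib2\tSM:Sample2\n' | rg_samples

# On a real file: samtools view -H sample2.bam | rg_samples
```

If every line shows the same sample name in the second column, the merged file represents a single sample made of several read groups.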

In another thread you suggested I try smoove and follow these steps:

  • call each sample separately
  • merge the calls into one set of SV sites
  • genotype those sites for each sample
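
The three steps above can be sketched as a dry run that only prints the commands. The command shapes follow the larger-cohort example in the smoove README as I recall it, so treat the flags as something to verify with `smoove <subcommand> -h`. The function name `smoove_plan`, the output directories, and the input file names are placeholders, and the final `smoove paste` step (combining per-sample VCFs) is not part of the list quoted above.

```shell
# Dry run: print, but do not execute, a per-sample smoove workflow.
# $1 = exclude BED, $2 = reference FASTA, remaining arguments = one BAM per sample.
smoove_plan() {
  bed=$1; fasta=$2; shift 2

  # 1. Call each sample separately.
  for bam in "$@"; do
    s=$(basename "$bam" .bam)
    echo "smoove call --outdir results-smoove/ --exclude $bed --name $s --fasta $fasta -p 1 --genotype $bam"
  done

  # 2. Merge the calls into one set of SV sites.
  echo "smoove merge --name merged -f $fasta --outdir ./ results-smoove/*.genotyped.vcf.gz"

  # 3. Genotype those sites for each sample.
  for bam in "$@"; do
    s=$(basename "$bam" .bam)
    echo "smoove genotype -d -x -p 1 --name $s-joint --outdir results-genotyped/ --fasta $fasta --vcf merged.sites.vcf.gz $bam"
  done

  # Extra step (assumption, see lead-in): paste per-sample VCFs into one cohort VCF.
  echo "smoove paste --name cohort results-genotyped/*.vcf.gz"
}

smoove_plan exclude.bed ref.fa sample1.bam sample2.bam
```

Printing the plan first makes it easy to check that each sample name appears exactly once before anything is run.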

So how do I handle the different RGs now?
In the other thread you also mentioned that I should NOT merge things.

Each “-pe” will get its own histogram, so separating out the different libraries with different properties is a good move. Do not merge things back together. Each sample should have its own “-pe” and “-sr”.

I am not merging anything across samples. I am only merging RGs that belong to the same sample (multiple libs and lanes). But yes - in some cases I do merge different libs but from the same biosample (it's SRA data).

If I understood correctly, I should not feed in the merged BAM files (I created them because GATK, in contrast, needs them) but rather the individual RG-level BAM files. So do I feed all the RG-level BAM files across all samples to smoove in one go? (I have a small cohort, well below 40 samples.)

smoove call -x --name my-cohort --exclude $bed --fasta $fasta -p $threads --genotype /path/to/*.bam

so *.bam would be

FLOWCELL1.LANE1.sample1.bam
FLOWCELL1.LANE2.sample1.bam
FLOWCELL1.LANE1.sample2.bam
FLOWCELL1.LANE2.sample2.bam
FLOWCELL1.LANE3.sample2.bam

Does the tool recognise the different RGs belonging to the same sample and then produce a multi-sample file containing only sample 1 and sample 2 (and not the RG levels)?

Sorry for the long post...
Thanks so much!! I very much appreciate your help!

brentp (Owner) commented Aug 13, 2019

I would just run smoove on the merged files (sample1.bam and sample2.bam for your examples). The insert size distribution may be a bit different, but this will not matter in most cases.

brentp closed this as completed Aug 13, 2019

JJBio (Author) commented Aug 14, 2019

Thank you for your input. I have one sample, though, where the insert sizes vary dramatically (from 200 to 800) depending on the library. Is it still OK to run smoove on the merged file in this case?

brentp (Owner) commented Aug 14, 2019

It's not ideal, but it's what will work. You'll just lose a little resolution for smaller events that have discordant reads but no splitters, which should be fairly rare anyway.
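
To gauge how far apart the libraries in a merged BAM really are, a rough per-read-group summary of template lengths can help. This is only a sketch: it takes the mean of positive TLEN values (SAM column 9) per RG tag, which is cruder than the insert-size histogram an SV caller builds. The helper name `insert_by_rg` is hypothetical, and the usage line assumes samtools is installed.

```shell
# Per read group: number of records with positive TLEN, and their mean TLEN.
# Reads SAM alignment records (tab-separated) from stdin.
insert_by_rg() {
  awk -F'\t' '$1 !~ /^@/ && $9 > 0 {
    rg = "none"                      # label for records without an RG tag
    for (i = 12; i <= NF; i++) if ($i ~ /^RG:Z:/) rg = substr($i, 6)
    n[rg]++; sum[rg] += $9
  }
  END { for (rg in n) printf "%s\t%d\t%.0f\n", rg, n[rg], sum[rg] / n[rg] }' | sort
}

# Usage on a real file (properly paired, primary alignments only):
# samtools view -f 2 -F 0x900 sample2.bam | insert_by_rg
```

Each output line is read group, record count, and mean template length, so libraries with very different insert sizes show up as clearly different values in the third column.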

JJBio (Author) commented Aug 16, 2019

Thanks for your input!!
