RG handling #83
Comments
I would just run smoove on the merged files ( |
Thank you for your input. I have one sample, though, where the insert sizes vary dramatically (from 200 to 800) depending on the library. Is it still OK to run smoove on the merged file in this case? |
It's not ideal, but it's what will work. You'll just lose a little resolution for smaller events that have discordant reads but not splitters, which should be fairly rare anyway. |
Thanks for your input!! |
So, I have multiple samples with multiple lanes and libraries. Some samples have a simple design: one library sequenced on one or more lanes.
Some samples have a more complex design: multiple libraries with different insert sizes, each sequenced on multiple lanes.
I have mapped each read group separately with
bwa mem -M -R
and defined the read groups as well as I could (it's public data) in the bam file, resulting in
Then I fed the bam files that belong to the same individual sample to MarkDuplicates, which not only marks the duplicates but also merges the bam files (keeping the RG tags). So I now have
In this thread you mentioned I should try smoove and do the following steps when using smoove.
So how do I go about this now, given the different RGs?
In the other thread you also mentioned that I should NOT merge things.
I am not merging anything across samples. I am only merging RGs that belong to the same sample (multiple libs and lanes). But yes, in some cases I do merge different libs, though always from the same biosample (it's SRA data).
If I understood correctly, I should not feed in the merged bam files (I created them because GATK, in contrast, needs them) but the individual RG-level bam files instead. So, do I feed all the RG-level bam files from all samples to smoove in one go? (I have a small cohort, way below 40 samples.)
smoove call -x --name my-cohort --exclude $bed --fasta $fasta -p $threads --genotype /path/to/*.bam
so *.bam would be
Does the tool recognise the different RGs belonging to the same sample and then produce a multi-sample file containing only sample 1 and sample 2 (and not the RG levels)?
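To make the question concrete: in the SAM format each `@RG` header line carries an `ID` tag and, usually, an `SM` (sample) tag, so one way to check what a merged bam file declares is to group the RG IDs by their `SM` value. The snippet below is only a minimal sketch of that check. The header text, sample names and RG IDs are made up for illustration, the function name is hypothetical, and it says nothing about how smoove itself resolves samples internally.

```python
from collections import defaultdict


def read_groups_by_sample(header_text):
    """Map each SM (sample) value to the list of RG IDs that carry it."""
    groups = defaultdict(list)
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # @RG fields are tab-separated TAG:VALUE pairs; split on the first colon only.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        sample = fields.get("SM", "<missing SM>")
        groups[sample].append(fields.get("ID", "<missing ID>"))
    return dict(groups)


# Hypothetical header text, in the form printed by `samtools view -H merged.bam`.
example_header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:lib1_lane1\tSM:sample1\tLB:lib1\tPL:ILLUMINA",
    "@RG\tID:lib1_lane2\tSM:sample1\tLB:lib1\tPL:ILLUMINA",
    "@RG\tID:lib2_lane1\tSM:sample1\tLB:lib2\tPL:ILLUMINA",
    "@RG\tID:lib1_lane1_s2\tSM:sample2\tLB:lib1\tPL:ILLUMINA",
])

samples = read_groups_by_sample(example_header)
for sample, rg_ids in samples.items():
    print(sample, rg_ids)
```

If every RG that belongs to one individual shows up under a single `SM` value, the header at least describes the design I intend (several RGs, one sample). Whether smoove then reports one column per sample is exactly what I am asking.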
Sorry for the long post...
Thanks so much!! I very much appreciate your help!