IdentifyVariants #182
Comments
Ok, it's a bit more complicated than my initial comment. I've added an example/gridss_separate.sh script so you can a) see what each step does, and b) avoid running the full GRIDSS pipeline when you don't need to. The problem with just running independent assemblies is that the assembly.bam records per-sample support according to input ordinal, not input name. This means that if you merge all the assembly BAMs together, variant calling will treat every assembly as coming from the first sample.
your second batch would look like:
and so on. It is important that the input file and label ordering match for every batch! By using empty.bam (a BAM file containing zero reads) as the input file for each input not included in the batch, we don't incorporate those samples into the assembly, but we keep all the assembly BAM files consistent with each other.
The other issue is that assembly contig names will be reused across batches. We can solve this by prepending the batch name to every read name within each assembly BAM. Do this before invoking gridss.SoftClipToSplitReads on the assembly BAM so you don't have to worry about split reads.
So the pipeline would look something like:
|
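The two bookkeeping steps described in the comment above (keeping a fixed input order with empty.bam placeholders, and prefixing assembly read names with the batch name) can be sketched in Python. This is only an illustration under stated assumptions, not the commands from example/gridss_separate.sh: the helper names `batch_inputs` and `prefix_read_names`, the `_` separator, and the sample and file names are all hypothetical, and the renaming works on SAM text lines rather than on a BAM file directly.

```python
from typing import Dict, Iterable, List, Sequence

EMPTY_BAM = "empty.bam"  # assumed path to a BAM file containing zero reads


def batch_inputs(all_samples: Sequence[str], batch: Sequence[str],
                 bam_for: Dict[str, str]) -> List[str]:
    """Return one input path per sample, in the same order for every batch.

    Samples outside the batch are replaced by EMPTY_BAM, so each sample
    keeps the same input ordinal in every assembly.
    """
    unknown = set(batch) - set(all_samples)
    if unknown:
        raise ValueError(f"batch contains unknown samples: {sorted(unknown)}")
    in_batch = set(batch)
    return [bam_for[s] if s in in_batch else EMPTY_BAM for s in all_samples]


def prefix_read_names(sam_lines: Iterable[str], batch_name: str) -> List[str]:
    """Prepend batch_name to the read name (first column) of each SAM record.

    Header lines (starting with '@') and blank lines pass through unchanged.
    """
    out = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            out.append(line)
        else:
            qname, sep, rest = line.partition("\t")
            out.append(f"{batch_name}_{qname}{sep}{rest}")
    return out
```

For example, with samples s1 to s4 and a second batch made of s3 and s4, `batch_inputs` would give `empty.bam, empty.bam, s3.bam, s4.bam`, so s3 is the third input in every batch. The label list would stay in the full sample order for every batch as well.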
Hi - Is there a way I could do this if I've already run the entire pipeline on each of my 200+ samples? Thanks |
If you've already run the entire pipeline on 200 samples, then you'll have 200 VCFs. Are you wanting to go back to make a single VCF with 200 samples in it? What is your use case? |
Hi,
I would like to call CNVs on a population of 200 individuals. Ideally this
dataset would enable me to make statements about frequencies of various
structural variants. (I am working with plant data which is sometimes
tricky because there are lots of duplications.) I ran the pipeline
individually because I thought I had too many samples for the multi-sample
option. I saw this post #182 and then understood
that the batch method you describe might be more appropriate. However, given
that I already have per-sample VCFs is there a way you'd recommend to merge
the individuals? (unfortunately I don't have the intermediate files). Or
would you recommend re-running with the batch method? Also, how long should
I expect a single batch to take if I run 40 samples with an average of 16x
coverage?
Thanks,
Zoe
…On Sun, Oct 27, 2019 at 7:54 PM Daniel Cameron ***@***.***> wrote:
If you've already run the entire pipeline on 200 samples, then you'll have
200 VCFs. Are you wanting to go back to make a single VCF with 200 samples
in it? What is your use case?
--
Zoe Lye
Ph.D. candidate
12 Waverly Place
New York University
New York, NY 10003
|
Hi,
I would like to do a "merge call" using IdentifyVariants, as suggested in #111. I have learned that IdentifyVariants takes breakend assemblies as required inputs (which should be merged beforehand) and coordinate-sorted BAM files as optional inputs. So my questions are:
Thank you very much!
Songtao Gui