confused about when/how to demux in iso-seq workflow #167
Comments
Hi @kmattioli , Your listed steps (1)-(9) look good to me! |
Yep! Here are the first 10 lines of
And for good measure, here's
(There are 3 more analogous |
Hi @kmattioli , You've already pooled the data, so you need to provide a "joint" classify report (flnc.report.csv). Can you concatenate all the [sample].flnc.report.csv files (remember to keep only one header line) and pass that to demux? |
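For anyone following along, the concatenation suggested above could be sketched as below. This is only an illustration, not part of Cupcake: the function name `concat_reports` and the output file name are made up, and it assumes every per-sample report shares the same header row.

```python
import csv


def concat_reports(report_paths, out_path):
    """Concatenate CSV reports, writing the shared header row only once.

    Returns the number of data rows written.
    """
    expected_header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out_handle:
        writer = csv.writer(out_handle, lineterminator="\n")
        for path in report_paths:
            with open(path, newline="") as in_handle:
                reader = csv.reader(in_handle)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to copy
                if expected_header is None:
                    # First non-empty report: keep its header for the output.
                    expected_header = header
                    writer.writerow(header)
                elif header != expected_header:
                    raise ValueError(f"{path} has a different header: {header}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Hypothetical usage (file names are placeholders):
# concat_reports(["sample1.flnc.report.csv", "sample2.flnc.report.csv"],
#                "joint_flnc_report.csv")
```

The header check is there so that a report from a different tool version fails loudly instead of producing a silently misaligned file.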
Hi @Magdoll - that worked!! Thank you for your guidance! |
Hi @Magdoll -- sorry, one more demuxing related question! I ran the
Where But now I see this error:
Is this the correct way to demux fusions? |
Hi @kmattioli , Yeah, that's my fault for requiring the IDs must look like Can you do me a favor and test this? Change this line 13 from
to
If that works I'll update the code in the GitHub repo -Liz |
Hi @Magdoll - That didn't work, returned the following error:
So I did some regex testing and the following line 13 appears to work on my end for both non-fusion and fusion transcript de-muxing:
However, I then ended up with a new error in the fusion de-muxing that I can't trace back. The PB ID is parsed correctly but now the error is:
Fusions 1-29 above it appeared to work fine, so not sure what the issue with this one is.... |
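As a point of reference for the regex discussion above, here is a rough sketch of a pattern that accepts both ID styles. It is not the line from the Cupcake source and not the line tested in this thread. The ID shapes are assumptions: `PB.<number>.<number>` for non-fusion transcripts and `PBfusion.<number>` (optionally with a second `.<number>`) for fusions. `PBID_PATTERN` and `parse_pbid` are hypothetical names.

```python
import re

# Assumed ID shapes (not taken from the Cupcake source):
#   non-fusion: PB.<locus>.<isoform>, e.g. PB.12.3
#   fusion:     PBfusion.<number>, optionally followed by .<number>
# The lookahead requires the ID to end at "|", whitespace, or end of string.
PBID_PATTERN = re.compile(r"(PB\.\d+\.\d+|PBfusion\.\d+(?:\.\d+)?)(?=\||\s|$)")


def parse_pbid(header):
    """Return the PB/PBfusion ID at the start of a FASTA header, or None."""
    match = PBID_PATTERN.match(header.lstrip(">"))
    return match.group(1) if match else None
```

Escaping the dots matters here: an unescaped `.` matches any character, so a loose pattern can accept IDs it was never meant to.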
Hi @kmattioli well crap, now I realize you're giving it fusions - I'm not sure fusions have ever been run through demux. Without testing, I can't be sure it works properly. Are you willing to share data confidentially so I can make some code changes? If so, please share an email address so I can send a file request. Thanks, |
No problem! kaia.mattioli [at] gmail.com |
request sent!
|
Hi @kmattioli , I fixed the PBfusion ID format support issue but also noticed there was an actual error because -Liz |
Thank you @Magdoll! I received the files. One question -- is there a reason why that fusion fails to demux? I didn't do anything special to the read_stat.txt file -- just got it from running the standard fusion finder script. |
Hi @kmattioli , There were two reasons it failed (and I fixed/addressed both in Cupcake v26.2.0 and above). The first is the sequence IDs were expected to The second is that you had a sequence, PBfusion.32, that was only present in the fasta and not in the read_stat files. I'm now letting the script throw a warning but not fail, so you can still get counts for the other sequences. Does this make sense? |
NOTE: please do not use Cupcake v26.0.0 or v26.1.0, only v26.2.0 or above (in case I find another bug in that ID format matching). |
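To make the two failure modes in this thread concrete, here is a minimal sketch of demux-style counting. It is not the Cupcake implementation. It assumes read_stat.txt is tab-delimited with `id` and `pbid` columns and that the classify report is a CSV with `id` and `primer` columns; `demux_counts` is a hypothetical name.

```python
import csv
import sys
from collections import Counter, defaultdict


def demux_counts(read_stat_path, report_path, fasta_ids):
    """Count reads per (isoform, primer); warn on isoforms with no reads."""
    # Map read ID -> primer (sample) from the joint classify report.
    with open(report_path, newline="") as handle:
        primer_of = {row["id"]: row["primer"] for row in csv.DictReader(handle)}

    counts = defaultdict(Counter)
    with open(read_stat_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            read_id = row["id"]
            if read_id not in primer_of:
                # This is what a single-sample report triggers on pooled data.
                raise KeyError(f"{read_id} is in read_stat but not in the classify report")
            counts[row["pbid"]][primer_of[read_id]] += 1

    # Warn, but do not fail, for sequences present only in the fasta.
    for pbid in fasta_ids:
        if pbid not in counts:
            print(f"WARNING: {pbid} has no reads in read_stat; skipping", file=sys.stderr)
    return {pbid: dict(counts[pbid]) for pbid in fasta_ids if pbid in counts}
```

With a per-sample report, reads from the other samples are missing from `primer_of`, which gives the hard error; an ID that appears only in the fasta gives the warning instead.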
Yup, makes sense! I was just wondering if it was expected for |
Hi, I used the same pipeline as @kmattioli to process my Iso-seq data, but I hit an error when running demux_isoseq_with_genome.py.
It warns that every transcript is missing from read_stat.txt. The transcript IDs in read_stat.txt and the IDs in filtered.rep.fa are indeed different, even though both files are outputs of the pipeline. What's the problem with my code?
clustered.hq.fasta.collapsed.filtered.rep.fa from the get_abundance_post_collapse.py after collapse_isoforms_by_sam.py, Thanks! |
Hello,
I'm a bit confused about when and how to properly demux in the Iso-seq workflow in the command line. Wondering what the best practice is to get from multiplexed ccs reads all the way through to HQ isoforms + abundances per sample.
The tl;dr is: should I demux when running the Iso-seq workflow itself using lima as alluded to in the tutorial here? And if so, how do I get from the mapped, collapsed, filtered reads (which are aggregated across samples) to demuxed abundances? Or should I perform Iso-seq without demuxing (so skip the lima step), and demux later, using the demux_isoseq_with_genome.py script as described here?
More info:
I have data with 5 pooled samples (and thus 5 unique primers). I followed the steps outlined in the tutorial here, which are:
1. Generate CCS reads from the subreads.bam file [output file: ccs.bam]
2. lima [output files: several fl.primer_5p--*_3p.bam depending on primers present]
3. isoseq3 refine [output files: several *flnc.bam and *flnc.report.csv, one per demuxed sample]
4. Create all.fofn [list of above files] (I combined inputs per the documentation here)
5. isoseq3 cluster on all.fofn (so no longer de-muxed) [output file: clustered.hq.fasta (and others)]
6. minimap2 [output file: clustered.hq.fasta.sam]
7. collapse_isoforms_by_sam.py [output file: clustered.hq.collapsed.rep.fa and others]
8. filter_by_count.py [output files now end in min_fl_2]
9. filter_away_subset.py on the ABOVE prefix (so on the files already filtered to min_fl_2) [output files now end in filtered]
Now, according to your answer here, I should run the demux script. But I only have a flnc.report.csv for each demuxed sample from step 3 above, and it seems like I need the overall (not-demuxed) report to run demux_isoseq_with_genome.py. If I run the following command, for example:
I get an error:
...ostensibly because this transcript from the read_stat.txt file (from all samples) is not in the sample1.flnc.report.csv file?
I think I'm missing something about how to run this pipeline correctly, but I'm not sure what.
Thanks so much in advance!