Using intermediate output for RNA-Bloom2 transcriptome assembly? #54

patrickaoude · 2024-08-07T14:47:11Z

Hi,

I am hoping to perform transcriptome assembly using both Nanopore long-read sequencing data and Illumina short-read sequencing data. It appears RATTLE only accepts long-read data, so I was hoping to use the error-corrected long reads produced by running its first two steps, cluster and correct.

I then wanted to use these corrected reads with RNA-Bloom2, which permits assembly using both long and short reads.

My questions for you are:

- Is this general approach sound or is there some oversight I might be making in such an approach mixing tools?
- Should I also use the uncorrected.fq in addition to the corrected.fq for downstream results?
- Would you recommend changing -r, --min-reads from the default of 5 to something like 2 in order to correct as many reads as possible?

Thanks for your time and any help you can provide. If this approach doesn't seem sound, can you recommend any other long-read correction method that works without an existing reference genome?

Thanks,
Patrick

EduEyras · 2024-08-12T01:06:41Z

Hi Patrick, mixing tools is fine. The RATTLE pipeline is modular and flexible precisely so that you can mix and match tools and use it in whatever way is most convenient. What you propose could be a good approach. Yes, you can use the uncorrected and corrected reads together for your next analysis step.
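As a minimal sketch of using both read sets together, the two files named in the question (corrected.fq and uncorrected.fq) could be concatenated into a single long-read FASTQ before being handed to the downstream assembler. The helper name `merge_fastq` and the output file name are hypothetical, and the sketch assumes both inputs are plain, uncompressed FASTQ files.

```python
from pathlib import Path


def merge_fastq(inputs, output):
    """Concatenate uncompressed FASTQ files into one file.

    A newline is appended after any input that does not end with one,
    so the last record of one file never runs into the first record
    of the next.
    """
    output = Path(output)
    with output.open("wb") as out:
        for path in inputs:
            last_byte = b"\n"  # an empty input needs no extra newline
            with Path(path).open("rb") as src:
                # Copy in 1 MiB chunks so large read sets are not held in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            if last_byte != b"\n":
                out.write(b"\n")
    return output
```

For example, `merge_fastq(["corrected.fq", "uncorrected.fq"], "rattle_long_reads.fq")` would write a combined file that can then be supplied as the long-read input of the assembler.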

…

--min-reads is set to 5 because we observed that at least 5 reads clustered together and compared with each other were needed for a reliable correction. If you change it to 2, reads would be corrected based on a comparison of just two reads. That is still possible, but I don't know whether it would be reliable enough. Please let me know how it goes.

Cheers,
Eduardo
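To get a feel for this trade-off, one rough sketch is to estimate what fraction of reads sits in clusters that meet a given threshold. The cluster sizes below are invented for illustration, the function name is hypothetical, and the `>=` comparison is an assumption based on the "at least 5 reads" wording above rather than a statement of how RATTLE applies the option internally.

```python
def fraction_in_eligible_clusters(cluster_sizes, min_reads):
    """Return the fraction of reads that belong to clusters of at least min_reads reads."""
    total = sum(cluster_sizes)
    if total == 0:
        return 0.0
    eligible = sum(size for size in cluster_sizes if size >= min_reads)
    return eligible / total


# Hypothetical cluster sizes (reads per cluster), 30 reads in total.
example_sizes = [12, 7, 4, 3, 2, 1, 1]

for threshold in (5, 2):
    share = fraction_in_eligible_clusters(example_sizes, threshold)
    print(f"min_reads={threshold}: {share:.2f} of reads in eligible clusters")
```

With these made-up sizes, lowering the threshold from 5 to 2 raises the share of reads in eligible clusters from about 63% to about 93%, but the extra reads would be corrected against very few neighbours, which is the reliability concern raised above.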

