How many short reads do I need? And how can I get chimera-trimmed sequences without low-quality trimming? #75

Closed
52teth opened this issue Aug 25, 2016 · 2 comments


@52teth

52teth commented Aug 25, 2016

Hi thackl,
Sorry to bother you, but I have several questions about proovread usage.
Q1. I have read the proovread introduction; the short read section says "The recommended coverage for short reads data is around 30-50X and should be specified with --coverage." How can I work out the coverage of my short reads? I have used "normalize-by-median.py" from the khmer package to normalize the short reads. Is the coverage the same as the --cutoff value in normalize-by-median.py (when the median k-mer coverage level is above this number, the read is not kept)?
Q2. My long reads are the Quiver-polished output of ICE, so they are all full-length isoform sequences. Is there a way to get corrected reads with only chimeras trimmed, but without low-quality bases trimmed?
Q3. I used the normalized short reads to correct my long consensus reads after ICE/Quiver. The command I used is:
./proovread -l ./LR/data/split.001.fq -s ./SR/normalize/interleaved.fastq --prefix ./LR/result/split.001 --threads 6 --coverage=50 --overwrite --no-sampling
Here are the statistics:
[Wed Aug 24 17:51:32 2016] Running mode: sr
[Wed Aug 24 17:51:49 2016] Running task bwa-sr-1
[Wed Aug 24 18:44:18 2016] Masked : 61.6%
[Wed Aug 24 18:44:18 2016] Running task bwa-sr-2
[Wed Aug 24 19:35:54 2016] Masked : 69.0%
[Wed Aug 24 19:35:54 2016] Running task bwa-sr-3
[Wed Aug 24 20:17:53 2016] Masked : 71.6%
[Wed Aug 24 20:17:53 2016] Running task bwa-sr-finish
[Wed Aug 24 20:27:13 2016] Masked : 60.7%
Does this mean that I do not have enough short reads?

Thanks a lot for your help!

@thackl
Contributor

thackl commented Aug 26, 2016

Q1. If I understand correctly, you are working with RNA-seq reads, not genomic data? In that case the best way is to run proovread with --no-sampling, and make no assumptions about coverage.
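For genomic data, nominal coverage is usually taken as the total number of sequenced bases divided by the size of the target (for example, the genome size). The sketch below illustrates that arithmetic only. The function names are hypothetical, it assumes a plain 4-line-per-record FASTQ file, and it is not part of proovread or khmer. For RNA-seq data, expression levels differ between transcripts, so a single coverage number like this is not meaningful, which is why making no assumptions about coverage is suggested here.

```python
from typing import Iterable


def count_fastq_bases(lines: Iterable[str]) -> int:
    """Sum sequence lengths in a FASTQ stream.

    Assumption: every record is exactly 4 lines (no wrapped sequences).
    """
    total = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            total += len(line.strip())
    return total


def nominal_coverage(read_bases: int, target_bases: int) -> float:
    """Nominal coverage = sequenced bases / target size (e.g. genome size)."""
    if target_bases <= 0:
        raise ValueError("target_bases must be positive")
    return read_bases / target_bases


if __name__ == "__main__":
    # Tiny in-memory example; with a real file, pass an open file handle instead.
    records = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGTAC", "+", "IIIIII"]
    bases = count_fastq_bases(records)
    print(bases, nominal_coverage(bases, 5))
```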

Q2. It is possible. I have drafted a new FAQ section on chimeras etc., for now available here:
https://github.com/BioInf-Wuerzburg/proovread/tree/doc/refactor_and_new_faqs_TH#chimeras-siamaeras-and-so-on
later here:
https://github.com/BioInf-Wuerzburg/proovread#chimeras-siamaeras-and-so-on
Let me know if the explanations and commands help.

Q3. Your reads are probably enough but not well used because of --coverage settings. This should improve with --no-sampling.
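To compare the masked percentages between runs (for example, before and after changing the sampling settings), the log lines can be pulled out with a small script. This is a minimal sketch that assumes the log format matches the lines quoted earlier in this thread ("Running task NAME" followed by "Masked : NN.N%"); the function name is hypothetical and the format may differ in other proovread versions.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line shape, taken from the log quoted in this thread:
#   [Wed Aug 24 18:44:18 2016] Masked : 61.6%
LOG_LINE = re.compile(r"^\[(?P<time>[^\]]+)\]\s+(?P<msg>.*)$")
PERCENT = re.compile(r"([\d.]+)%")


def masked_by_task(log_lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (task name, masked percent) pairs in log order."""
    results = []
    task = None
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a timestamped log line
        msg = match.group("msg")
        if msg.startswith("Running task "):
            task = msg[len("Running task "):].strip()
        elif msg.startswith("Masked") and task is not None:
            pct = PERCENT.search(msg)
            if pct:
                results.append((task, float(pct.group(1))))
    return results


if __name__ == "__main__":
    log = [
        "[Wed Aug 24 17:51:32 2016] Running mode: sr",
        "[Wed Aug 24 17:51:49 2016] Running task bwa-sr-1",
        "[Wed Aug 24 18:44:18 2016] Masked : 61.6%",
    ]
    for name, pct in masked_by_task(log):
        print(f"{name}\t{pct}")
```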

@52teth
Author

52teth commented Aug 26, 2016

Thanks a lot! It really does help 👍

@thackl thackl closed this as completed Oct 4, 2016