UMI removal #29
Comments
Hi Camille, No need to apologise. Comments are always very welcome. You raise two issues:
Regards,
Tom Smith, PhD
Retaining the UMI on the read is not possible: if you were to leave the UMI on the read, your reads would not map.

I think adding quality filtering to extract is probably a good idea in the medium to long term. Until then, I'd recommend filtering on the UMI quality before trimming the UMI.

I have previously started thinking about a tool for moving UMIs around a BAM record. Certain software expects the UMI in certain places, usually either a particular place in the read name or sometimes a particular tag. For example, iCLIPro requires the UMI to be encoded as [...] I don't think a specific column in the BAM file is a good idea, as it would break compatibility with the BAM format standard.

Both of these are improvements that could be made to the software in future versions, but I don't think leaving the UMI on the read will ever be possible.
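As a rough illustration of filtering on UMI quality before the UMI is trimmed, here is a minimal Python sketch. It is not part of UMI-Tools. It assumes the UMI occupies the first `umi_length` bases of each read and that qualities are Phred+33 encoded. The function names and the `(name, sequence, quality)` record layout are hypothetical, chosen only for this example.

```python
def umi_min_quality(quality, umi_length, phred_offset=33):
    """Return the lowest Phred score among the first `umi_length` bases."""
    return min(ord(char) - phred_offset for char in quality[:umi_length])


def filter_by_umi_quality(records, umi_length, min_quality, phred_offset=33):
    """Yield (name, sequence, quality) records whose UMI bases all reach min_quality.

    Assumption: the UMI is the first `umi_length` bases of the read.
    """
    for name, sequence, quality in records:
        if len(quality) < umi_length:
            continue  # read is too short to contain a full UMI
        if umi_min_quality(quality, umi_length, phred_offset) >= min_quality:
            yield name, sequence, quality


# Toy records: 'I' is Q40 and '#' is Q2 under Phred+33.
records = [
    ("read1", "ACGTAGGCTA", "IIIIIIIIII"),
    ("read2", "TTGCAGGCTA", "I#IIIIIIII"),  # second UMI base is low quality
]
kept = list(filter_by_umi_quality(records, umi_length=4, min_quality=20))
```

Reads that pass this filter would then go through UMI extraction as usual, so the trimmed reads still map.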
Hi,

If a filtering option could be implemented, it would be very useful. The idea of moving the UMI to a BAM field is to make the output compatible with a previously designed pipeline, in particular with its quality filtering. As suggested by @IanSudbery, the best approach for now is to do this filtering before UMI extraction.

Thank you both for your answers and advice.
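For readers who want the UMI in a BAM field rather than in the read name, one standards-compatible option is an optional tag instead of a new column. The sketch below is not a UMI-Tools feature. It works on a single SAM text line and assumes the UMI was appended to the read name after a final underscore. It writes the UMI to the `RX` tag, which the SAM optional-fields specification reserves for UMI sequence bases. The function name `move_umi_to_tag` is hypothetical.

```python
def move_umi_to_tag(sam_line, separator="_", tag="RX"):
    """Move a UMI from the end of the read name into an optional SAM tag.

    Assumption: the read name looks like '<name><separator><UMI>'.
    Header lines and names without a separator are returned unchanged.
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # SAM header line, nothing to do
    fields = line.split("\t")
    name, found_separator, umi = fields[0].rpartition(separator)
    if not found_separator or not umi:
        return line  # no UMI found in the read name
    fields[0] = name
    fields.append(f"{tag}:Z:{umi}")  # optional fields go after the 11 mandatory columns
    return "\t".join(fields)


example = "read1_ACGT\t0\tchr1\t100\t60\t6M\t*\t0\t0\tAGGCTA\tIIIIII"
converted = move_umi_to_tag(example)
```

Because the UMI ends up in an optional tag, the 11 mandatory SAM columns are untouched and the record stays valid for standard tools.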
@casimonet The latest version of UMI-Tools (v0.2.0) now allows reads to be filtered out during the extraction stage using the quality scores (see the options --quality-threshold and --quality-encoding).
Hi,
Sorry to open another issue. The fact that the UMI is removed from the sequence is a problem for downstream analyses. One example is building a BAM file in which reads are filtered out when their UMI contains low-quality bases. Another is simply keeping a column with the UMI in the BAM file, which can be more practical for certain downstream analyses than having the UMI in the read name.
Is there no way to avoid the read being truncated?
Thank you,
Camille