AssertionError: not all umis are the same length(!): 4 - 5 #461
Comments
By default, It looks like you have output from cellranger, with UMI in bam tag If you add the following to your command,
Also note that running
Thank you so much for your quick response. I tried these options and the dedup ran fine.
Hi @iammrtza. In the 3 lines above, the UMIs are all different lengths (the UMI is TCCCCGCCC in the first line), so the error message appears to be correct. Overall, your UMI lengths appear to range from 4 to 12. They all need to be the same length. What command did you use to extract the UMIs from the fastq?
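A quick way to confirm a mix of UMI lengths is to tally them before running dedup. This is only a minimal sketch: it assumes the UMI is the last field of the read name after the separator ("_" here), and the read names other than the TCCCCGCCC one are made up for illustration.

```python
from collections import Counter

def umi_length_counts(read_names, separator="_"):
    """Count how many read names carry a UMI of each length.

    Assumes the UMI is the last separator-delimited field of the read name.
    """
    return Counter(len(name.split(separator)[-1]) for name in read_names)

# Hypothetical read names; only the TCCCCGCCC UMI comes from the thread.
names = ["read1_TCCCCGCCC", "read2_ACGT", "read3_ACGTACGTACGT"]

counts = umi_length_counts(names)
print(sorted(counts.items()))  # [(4, 1), (9, 1), (12, 1)]
```

More than one key in the result means the UMIs are not all the same length, which is the condition the assertion checks.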
Hi Tom, My Illumina read structure is depicted here. After removing adapter/junk (using cutadapt), I grepped for the "common sequence"; what was left was the UMI, which I added to the read headers in the FASTA file. Then I removed the "common sequence" using cutadapt again.
Is it OK if I add dummy letters (e.g. A) to the shorter UMIs so they reach the maximum length of 12? (In my Illumina reads, the longest UMI should be 12.)
I think your strategy is probably suboptimal. As far as I understand what you're doing, the potential issues are:
The read structure is identical with respect to the common sequence and UMI length, so you can use UMI-tools uses the
This will require the UMIs to be 12 nt. If your reads are not long enough to get through the sRNA, common sequence and UMI, you may find that many of your reads don't match the regex. This will be reported in the log file.
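To illustrate why a fixed-length pattern rejects short reads, here is a rough sketch using Python's standard `re` module rather than UMI-tools itself. The common sequence `ACGTACGTAC` and the example reads are hypothetical placeholders, and the read layout (sRNA, then common sequence, then a 12 nt UMI) is assumed from the description above. The group names `discard_1` and `umi_1` follow the UMI-tools naming convention for regex patterns.

```python
import re

# Hypothetical placeholder; substitute the real common sequence from your library.
COMMON = "ACGTACGTAC"

# Assumed layout: sRNA insert, then the common sequence, then exactly 12 nt of UMI.
pattern = re.compile(
    r"^(?P<insert>.+)(?P<discard_1>" + COMMON + r")(?P<umi_1>.{12})$"
)

def split_read(seq):
    """Return (insert, umi) if the read matches the pattern, else None."""
    match = pattern.match(seq)
    if match is None:
        return None
    return match.group("insert"), match.group("umi_1")

full_read = "TTGGCCAATT" + COMMON + "AAACCCGGGTTT"   # UMI fully sequenced
short_read = "TTGGCCAATT" + COMMON + "AAACCC"        # read ended mid-UMI

print(split_read(full_read))   # ('TTGGCCAATT', 'AAACCCGGGTTT')
print(split_read(short_read))  # None
```

Reads that end part-way through the UMI are dropped instead of being kept with a truncated UMI, so every UMI that survives has the same length.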
Just to add, it looks like @IanSudbery has previously addressed this exact question here: https://www.biostars.org/p/9469084/
Hello,
I am trying to run UMI-tools to remove duplicate reads based on UMI, but I am getting the error "AssertionError: not all umis are the same length(!): 4 - 5". Can anyone suggest how to resolve this error?
I am using the following command:
umi_tools dedup --stdin=filtered.bam --umi-separator=":" --log=LOGFILE --output-stats=stats.txt -S output.bam > OUTFILE
This is the head of my bam file:
A00609:116:H7JCGDSXY:1:2171:13937:29825 16 1 10534 255 3S96M2D21M31S * 0 0 CGCAGTACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGACGCCCCCATGTACTCTGCGTTGATACCACTGCTT FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF,FFFFFFFFF,FFFFFFFFFFFFFFFFFF:FFFF:FF:FFFFFFF:FFFFFFF:FFFFF:FFFF NH:i:1 HI:i:1 AS:i:103 nM:i:3 RE:A:I xf:i:0 li:i:0 CR:Z:CCGTGAGAGAACGCGT CY:Z::FF:F:FF:FFF,F:F CB:Z:CCGTGAGAGAACGCGT-1 UR:Z:TTTTCCGCACTT UY:ZG:Z:cells:0:1:H7JCGDSXY:1
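One plausible reading of the "4 - 5" in the error can be sketched as follows. This assumes that, with read-name extraction, the UMI is taken as the last field of the read name after splitting on the separator. With `--umi-separator=":"` on a standard Illumina read name, that field is the cluster y-coordinate, which is 4 or 5 digits long, while the real UMI sits in the `UR` tag. The record is abbreviated to its read name and three of its tags; the helper names are illustrative only.

```python
# Read name and a few optional fields copied from the record above.
read_name = "A00609:116:H7JCGDSXY:1:2171:13937:29825"
tags = ["CR:Z:CCGTGAGAGAACGCGT", "CB:Z:CCGTGAGAGAACGCGT-1", "UR:Z:TTTTCCGCACTT"]

def umi_from_read_name(name, separator):
    """Assumed read-name behaviour: the UMI is the last separator-delimited field."""
    return name.split(separator)[-1]

def umi_from_tag(tag_fields, tag):
    """Return the value of a SAM optional field (TAG:TYPE:VALUE), or None if absent."""
    for field in tag_fields:
        name, _type, value = field.split(":", 2)
        if name == tag:
            return value
    return None

name_umi = umi_from_read_name(read_name, ":")
tag_umi = umi_from_tag(tags, "UR")

print(name_umi, len(name_umi))  # 29825 5
print(tag_umi, len(tag_umi))    # TTTTCCGCACTT 12
```

If this is what is happening, the UMI-tools options for reading UMIs from BAM tags (`--extract-umi-method=tag` with `--umi-tag`, plus `--per-cell` and `--cell-tag` for per-cell deduplication) are the place to look. Which tag to use for cellranger output is a choice to check against the documentation.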
I am getting the following error:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/bin/umi_tools", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/umi_tools/dedup.py", line 329, in main
    reads, umis, umi_counts = processor(
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/umi_tools/network.py", line 419, in __call__
    clusters = self.UMIClusterer(counts, threshold)
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/umi_tools/network.py", line 367, in __call__
    assert max(len_umis) == min(len_umis), (
AssertionError: not all umis are the same length(!): 4 - 5