ValueError: not enough values to unpack (expected 3, got 2) #273

Closed
ashwinikumarkulkarni opened this issue Aug 25, 2018 · 13 comments
@ashwinikumarkulkarni

Hello! I am getting an error while running COUNT command for single cell RNA-seq analysis.

Here is my command:
umi_tools count --per-gene --gene-tag=XT --per-cell --stdin=Input_STAR_Aligned_Assigned_Sorted.bam --stdout=Counts.tsv.gz --log=Counts.log --error=Counts.err --wide-format-cell-counts
Here is the error I get:
Traceback (most recent call last):
  File "/home2/akulk1/.local/bin/umi_tools", line 11, in <module>
    sys.exit(main())
  File "/home2/akulk1/.local/lib/python3.6/site-packages/umi_tools/umi_tools.py", line 59, in main
    module.main(sys.argv)
  File "/home2/akulk1/.local/lib/python3.6/site-packages/umi_tools/count.py", line 155, in main
    gene, cell, gene_count = line.strip().split("\t")
ValueError: not enough values to unpack (expected 3, got 2)

Thank you in advance for the help!

@TomSmithCGAT
Member

Hi @ashwinikumarkulkarni. What version do you have installed? umi_tools --version

@TomSmithCGAT TomSmithCGAT self-assigned this Aug 25, 2018
@ashwinikumarkulkarni
Author

ashwinikumarkulkarni commented Aug 25, 2018

Thank you @TomSmithCGAT for your reply!
I have the following version installed:
UMI-tools version: 0.5.4

Surprisingly, the same version worked for data from previous sequencing runs!
For my most recent sequencing run, it gave the error message above.

Thanks!

@ashwinikumarkulkarni
Author

@TomSmithCGAT Did you get a chance to find a solution for the error?

@TomSmithCGAT
Member

@ashwinikumarkulkarni - Apologies for the delayed reply.

I can't reproduce the error and can't find any explanation after going back over the relevant code. Would it be possible to share your input (preferably a subset of the data which still yields the same error)?

@rebeccagj

rebeccagj commented Sep 19, 2018

@TomSmithCGAT, I am getting the same error on an install of umi-tools (UMI-tools version: 0.5.4) that I've used successfully before. I am going to see if I can subset my data, reproduce the error, and send it to you.

@rebeccagj

Archive.zip

@TomSmithCGAT
Member

Thanks @rebeccagj - I'll look into this now

@TomSmithCGAT
Member

TomSmithCGAT commented Sep 20, 2018

It looks like the issue is that the XT tag can contain an empty string. Note this read has been assigned a gene (XS tag)

samtools view subset2.bam|grep A00269:51:H7Y2TDMXX:1:2227:7337:36652_CGCTATCTCCTGCCAT_CAGTTTTGCC|cut -f1,16-20
A00269:51:H7Y2TDMXX:1:2227:7337:36652_CGCTATCTCCTGCCAT_CAGTTTTGCC	XS:Z:Assigned	XN:i:1	XT:Z:
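
To check a whole file for reads like this one, something along the following lines could be run over the text output of samtools view. This is only a sketch: the function name `empty_tag_reads` and the two example records are hypothetical, and it assumes standard tab-separated SAM records where the optional TAG:TYPE:VALUE fields start at column 12.

```python
def empty_tag_reads(sam_lines, tag="XT"):
    """Return names of reads that carry `tag` with an empty value.

    `sam_lines` is any iterable of SAM text lines, for example the
    output of `samtools view`. Hypothetical helper, not part of umi_tools.
    """
    names = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines if present
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns 1-11 are mandatory; optional tags follow as TAG:TYPE:VALUE.
        for field in fields[11:]:
            parts = field.split(":", 2)
            if len(parts) == 3 and parts[0] == tag and parts[2].strip() == "":
                names.append(fields[0])
                break
    return names


# Two made-up records for illustration: read1 has an empty XT tag.
example = [
    "read1\t16\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tXS:Z:Assigned\tXN:i:1\tXT:Z:\n",
    "read2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tFFFF\tXS:Z:Assigned\tXN:i:1\tXT:Z:GeneA\n",
]
flagged = empty_tag_reads(example)
print(flagged)
```

Reading from `sys.stdin` instead of the `example` list would let the same function sit at the end of a samtools view pipe.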

In count.py, we first write out the counts to a tempfile. This tempfile is then parsed and re-summarised to generate the final counts per cell. This is where the error occurs. The read above generates the following line in the tempfile. Note the leading space:
" \tCGCTATCTCCTGCCAT\t1"

Line 155:
gene, cell, gene_count = line.strip().split("\t")
throws an error because strip() removes all leading whitespace, including the tab that delimits the empty gene field, leaving
"CGCTATCTCCTGCCAT\t1", which is split into just 2 elements, rather than the expected 3.
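
The failure can be reproduced in a few lines of plain Python. This is a minimal sketch rather than the count.py code itself; the `line` value is the tempfile line quoted above, with a trailing newline assumed.

```python
# Tempfile line for a read whose gene name is empty (trailing newline assumed).
line = " \tCGCTATCTCCTGCCAT\t1\n"

# strip() with no arguments removes every leading and trailing whitespace
# character. A tab counts as whitespace, so the first delimiter is lost.
fields = line.strip().split("\t")
print(fields)  # ['CGCTATCTCCTGCCAT', '1']

message = None
try:
    gene, cell, gene_count = fields
except ValueError as error:
    message = str(error)
    print(message)

# Removing only the trailing newline keeps all three fields.
safe_fields = line.rstrip("\n").split("\t")
print(safe_fields)  # [' ', 'CGCTATCTCCTGCCAT', '1']
```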

@IanSudbery - Three options below. My preference is 2. What are your thoughts?

  1. Throw an error when parsing genes with an empty string, since we can't de-duplicate without correct gene information.
  2. Identify reads where the assigned gene is an empty string and treat these as unassigned. We already log "Gene skipped - matches regex" events; we would also log "Gene skipped - empty gene name".
  3. Switch .strip() for .rstrip() to only remove the trailing newline and report counts for empty gene names. I can't justify this but it's an option.

@IanSudbery
Member

I favour skipping them. Perhaps with a warning the first time.

@TomSmithCGAT
Member

OK, I've implemented option 2 above, with a warning the first time.
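
For readers following along, the general shape of "skip, warn once, and count the event" might look like the sketch below. It is not the code on the branch: the function name `drop_empty_genes`, the logger name, and the (read_name, gene, cell) tuple layout are all assumptions made for illustration.

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("count_sketch")  # hypothetical logger name

EMPTY_EVENT = "Read skipped - gene string is empty"


def drop_empty_genes(reads):
    """Split reads into those kept and a tally of skipped events.

    `reads` is an iterable of (read_name, gene, cell) tuples; this layout
    is an assumption, not the structure used inside count.py.
    """
    kept = []
    events = Counter()
    for read_name, gene, cell in reads:
        if not gene.strip():
            # Warn only for the first offending read, then just count.
            if events[EMPTY_EVENT] == 0:
                logger.warning(
                    "Assigned gene is empty string. First such read: %s",
                    read_name,
                )
            events[EMPTY_EVENT] += 1
            continue
        kept.append((read_name, gene, cell))
    return kept, events


# Made-up reads: two of the four have an empty or whitespace-only gene.
reads = [
    ("read1", "GeneA", "CGCTATCTCCTGCCAT"),
    ("read2", "", "CGCTATCTCCTGCCAT"),
    ("read3", " ", "CGCTATCTCCTGCCAT"),
    ("read4", "GeneB", "CGCTATCTCCTGCCAT"),
]
kept, events = drop_empty_genes(reads)
print(len(kept), dict(events))
```

Filtering before anything is written to the tempfile means the later line.strip().split("\t") step never sees an empty gene field.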

@rebeccagj & @ashwinikumarkulkarni - Could you try installing from the {TS}-DebugSkipCountUnassigned branch to confirm that this resolves the issue?

@rebeccagj, using the above branch, the following command now works with your data subset. Note the inclusion of the --assigned-status-tag option. This option ensures count.py uses the correct tag to identify the reads which are unassigned - XS for featureCounts. This option should have been available in v0.5.4 but has been lingering on this branch for a while (see #191).

umi_tools count --per-gene --gene-tag=XT --assigned-status-tag=XS --per-cell --stdin=subset2.bam --stdout=Counts.tsv.gz --log=Counts.log --error=Counts.err

The end of my Counts.log contains the following lines warning about the empty gene string and logging events:

2018-09-20 10:20:23,881 WARNING Assigned gene is empty string. First such read:
                                A00269:51:H7Y2TDMXX:1:2227:7337:36652_CGCTATCTCCTGCCAT_CAGTTTTGCC	16	JH602052	13450606	255	91M	*	0	0	AGGTGTCATTTGGGTTAAAACTTCGGAATCCACAGCAATCTAAATTTCTCTGGATATCATTTCGAGCACTTGCGGTATTGTTCCAACCAAC	FFFFFFF:FFFFF:FFFF,FFFFFFF,FFFFFFFFFFFF,,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF::FFFF:F,	NH:i:1	HI:i:1	AS:i:89	nM:i:0	XS:Z:Assigned	XN:i:1	XT:Z:
2018-09-20 10:20:28,547 INFO Input Reads: 408256
2018-09-20 10:20:28,547 INFO Read skipped, no tag: 211262
2018-09-20 10:20:28,547 INFO Read skipped - gene string is empty: 7
2018-09-20 10:20:28,547 INFO Number of reads counted: 193056

Thanks again for providing the input data so I could reproduce the error.

@TomSmithCGAT
Member

@rebeccagj & @ashwinikumarkulkarni - Did either of you have a chance to confirm the branch resolves the issue?

@rebeccagj

@TomSmithCGAT I haven't yet, but ought to have some time to do so later this week. So sorry for the delay!

@TomSmithCGAT
Member

No need to apologise! I've merged the branch into master now anyway, so you can install from the master branch instead. Please re-open the issue if this problem hasn't been resolved.

4 participants