flextaxd error: "ValueError: not enough values to unpack (expected 2, got 1)" #70

Open
morien opened this issue Feb 8, 2024 · 2 comments

morien commented Feb 8, 2024

I'm attempting to follow along with this part of the tutorial/wiki, to get a better understanding of how to create my own custom DB. Things are okay until I get to the database creation step:

# flextaxd -db 16S_database.db -tf GTDB_arc_bact_taxo_tree_unique.txt -tt CanSNPer --genomeid2taxid g2id.txt --dump --dbprogram kraken2 -o taxonomy --verbose --logs logs/zenodo
2024-02-07 18:08:45,291 custom_taxonomy_databases [INFO ]  FlexTaxD logging initiated!
Warning: 16S_database.db already exists, overwrite? (y/n): y
2024-02-07 18:08:49,303 custom_taxonomy_databases [INFO ]  Loading module: ReadTaxonomyCanSNPer
2024-02-07 18:08:49,352 DatabaseConnection [INFO ]  16S_database.db opened successfully.
2024-02-07 18:08:49,353 ReadTaxonomyCanSNPer [INFO ]  GTDB_arc_bact_taxo_tree_unique.txt
2024-02-07 18:08:49,353 ReadTaxonomyCanSNPer [INFO ]  Fetching root name from file
2024-02-07 18:08:49,353 ReadTaxonomyCanSNPer [INFO ]  Adding, cellular organism node
2024-02-07 18:08:49,354 ReadTaxonomyCanSNPer [INFO ]  Adding root node root!
2024-02-07 18:08:49,355 custom_taxonomy_databases [INFO ]  Parse taxonomy
2024-02-07 18:08:49,355 ReadTaxonomyCanSNPer [INFO ]  Parse CanSNP tree file
2024-02-07 18:08:49,902 ReadTaxonomyCanSNPer [INFO ]  New taxonomy ids assigned 12929
Traceback (most recent call last):
  File "/home/nnnnnn/mambaforge/lib/python3.9/site-packages/flextaxd/modules/ReadTaxonomy.py", line 153, in parse_genomeid2taxid
    genomeid,taxid = row.strip().split("\t")
ValueError: not enough values to unpack (expected 2, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nnnnnn/mambaforge/bin/flextaxd", line 8, in <module>
    sys.exit(main())
  File "/home/nnnnnn/mambaforge/lib/python3.9/site-packages/flextaxd/custom_taxonomy_databases.py", line 330, in main
    read_obj.parse_genomeid2taxid(args.genomeid2taxid)
  File "/home/nnnnnn/mambaforge/lib/python3.9/site-packages/flextaxd/modules/ReadTaxonomy.py", line 156, in parse_genomeid2taxid
    genomeid,taxid,reference = row.strip().split("\t")
ValueError: not enough values to unpack (expected 3, got 1)

Here's the first few lines of my two input files:

# head g2id.txt 
GB_GCA_000010565.1      Pelotomaculum thermopropionicum
GB_GCA_000018565.1      Herpetosiphon aurantiacus
GB_GCA_000024525.1      Spirosoma linguale
GB_GCA_000091165.1      Methylomirabilis oxyfera_B
GB_GCA_000146855.1      Peptoanaerobacter margaretiae
GB_GCA_000147015.1      Zinderia insecticola
GB_GCA_000163995.1      Campylobacter_D jejuni_A
GB_GCA_000165065.1      Longicatena sp000165065
GB_GCA_000166295.1      Marinobacter adhaerens
GB_GCA_000168735.1      Endoriftia persephone
# head GTDB_arc_bact_taxo_tree_unique.txt
root;Archaea;Aenigmatarchaeota;Aenigmatarchaeia;Aenigmatarchaeales;Aenigmatarchaeaceae;Aenigmatarchaeum;Aenigmatarchaeum_subterraneum
root;Archaea;Aenigmatarchaeota;Aenigmatarchaeia;CG10238-14;CG10238-14;CG10238-14;CG10238-14_sp002789635
root;Archaea;Aenigmatarchaeota;Aenigmatarchaeia;CG10238-14;CG10238-14;RBG-16-49-10;RBG-16-49-10_sp001784635
root;Archaea;Aenigmatarchaeota;Aenigmatarchaeia;CG10238-14;EX4484-224;EX4484-224;EX4484-224_sp002254545
root;Archaea;Aenigmatarchaeota;Aenigmatarchaeia;CG10238-14;SCSR01;SCSR01;SCSR01_sp004297575
root;Archaea;Aenigmatarchaeota;Aenigmatarchaeia;GW2011-AR5;GCA-2688965;GCA-2688965;GCA-2688965_sp002688965
root;Archaea;Aenigmatarchaeota;Aenigmatarchaeia;GW2011-AR5;GW2011-AR5;GW2011-AR5;GW2011-AR5_sp000806115
root;Archaea;Aenigmatarchaeota;Aenigmatarchaeia;GW2011-AR5;GW2011-AR5;GW2011-AR5;GW2011-AR5_sp10154u
root;Archaea;Aenigmatarchaeota;Aenigmatarchaeia;QMZP01;QMZP01;QMZP01;QMZP01_sp003663225
root;Archaea;Aenigmatarchaeota;Aenigmatarchaeia;QMZP01;QMZP01;QMZY01;QMZY01_sp003663415

I'd like to use this tool so any help is greatly appreciated

Collaborator

davve2 commented Feb 13, 2024

Hi Morien,

It looks like a header line may be the problem (if headers are included in the files). If not, I think the best option is for you to supply the head of your files as text files, so that we can replicate the error locally. The error itself says that the program finds too few columns separated by . What do you use for separation in your files? The default separator is \t
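To make the check above concrete: the traceback shows the parser first splitting each row on "\t" into two fields and then retrying with three, so any row without a tab will fail both attempts. Below is a minimal sketch of a pre-check that mimics that split. The helper name `find_bad_rows` is hypothetical and not part of flextaxd; the two sample rows are taken from the posted `head` output, one written with a tab and one with spaces for illustration.

```python
def find_bad_rows(lines, sep="\t", allowed=(2, 3)):
    """Return (line_number, field_count) for rows that would fail to unpack.

    Hypothetical helper, not part of flextaxd: it only mirrors the
    row.strip().split("\t") call visible in the traceback.
    """
    bad = []
    for number, row in enumerate(lines, start=1):
        fields = row.strip().split(sep)
        if len(fields) not in allowed:
            bad.append((number, len(fields)))
    return bad


sample = [
    "GB_GCA_000010565.1\tPelotomaculum thermopropionicum\n",  # tab-separated
    "GB_GCA_000018565.1      Herpetosiphon aurantiacus\n",    # spaces only
]
print(find_bad_rows(sample))  # [(2, 1)]

# To check a real file (path assumed to match the one passed to flextaxd):
# with open("g2id.txt") as handle:
#     print(find_bad_rows(handle))
```

A row reported with a field count of 1 is one that contains no tab at all, which matches the "expected 2, got 1" message; blank lines are reported the same way.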

davve2 self-assigned this Feb 13, 2024
davve2 added the question (Further information is requested) label Feb 13, 2024
Author

morien commented Feb 14, 2024

g2id.txt.gz
GTDB_arc_bact_taxo_tree_unique.txt.gz
Okay great. Yes, the default separator is \t and that's what I see reflected in my input files. Should it be . instead? Here are my input files (entire files, gzipped).
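Since tabs and runs of spaces look identical in a terminal, one way to confirm which separator a file really contains is to print the raw representation of its first lines. This is a minimal sketch, assuming the attachments are ordinary gzip-compressed text files; `preview_raw` is a hypothetical helper name, not a flextaxd function.

```python
import gzip


def preview_raw(path, n=3):
    """Return repr() of the first n lines so a tab shows up as \\t.

    Hypothetical helper: opens .gz files with gzip, anything else as plain text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # zip with range(n) stops after n lines without reading the whole file
        return [repr(line) for _, line in zip(range(n), handle)]


# Example call (file name taken from the attachment above):
# for line in preview_raw("g2id.txt.gz"):
#     print(line)
```

If the printed lines show `\t` between the genome ID and the name, the separator is a tab; if they show a run of plain spaces, the file is space-separated and a split on "\t" would yield a single field.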
