NA ambiguous in recursive_dbscan #349
I'm getting the same error in several metagenomes: 4 out of 5 failed with this message. The fifth one had no markers and so was similarly killed at the binning step. The workaround suggested seems reasonable to me, but I'm still curious how a completeness of NA pops up in the first place.
I spent a lot of time today debugging the PRs: Autometa/autometa/binning/recursive_dbscan.py Lines 190 to 191 in 3ae76dc
It seems like the "failed to recover clusters" error only occurs after this modification, so I think it might be masking the real issue (i.e. the NAs are a clue that something changed upstream?). It's probably going to take comparing the intermediate results/DataFrames when using both pandas 1.5 and 2.1.

I rebased dev onto main and created a new branch that has the biopython changes and pandas pinned to 1.5; feel free to work off that branch.

Sort of related: there is a tests/environment.yml that the unit tests run on (if using the Makefile). IMO this needs to go away; the tests should pull only from the main ./autometa-env.yml file and then pip install the pytest dependencies within the make command.
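When comparing intermediate DataFrames across pandas versions, a per-column summary of dtype and missing-value count can help narrow down where the NAs first appear. The sketch below is only an illustration: `summarize_missing` and the example columns are hypothetical and not part of Autometa.

```python
import numpy as np
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and number of missing values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
        }
    )


# Hypothetical intermediate table: one nullable column (pd.NA) and one
# plain float column (np.nan), to show that both kinds are counted.
example = pd.DataFrame(
    {
        "completeness": pd.array([95.0, None], dtype="Float64"),
        "purity": [100.0, np.nan],
    }
)
summary = summarize_missing(example)
print(summary)
```

Running the same summary on the table saved under each pandas version and diffing the two outputs would show whether a column changed dtype or gained missing values between versions.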
related: #350
I am working with autometa 2.2.2 and have the same error when running autometa-binning. conda list gives this:
Name            Version   Build         Channel
_libgcc_mutex   0.1       conda_forge   conda-forge
There are some general issues throughout Autometa (I don't know how pervasive) where recent changes to pandas could cause problems. The issue mentioned here appears to occur when a recursive dbscan iteration comes up with no clusters. A fix is in progress, and a no-promises fix can be installed in the interim via pip:

For devs: Part of the issue is that pandas changed how NAs are handled, and this project isn't the only one that's had issues: https://pandas.pydata.org/docs/user_guide/missing_data.html#na-in-a-boolean-context

I found at least one case where division by 0 coerces to np.nan, and these values are then mixed in a DataFrame with pd.NA, which may cause issues. The whole code base may need to be checked.
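To make the difference between the two missing-value markers concrete, here is a minimal sketch of how `np.nan` and `pd.NA` behave when a comparison is used as an `if` condition. The helper name `compare_and_branch` is hypothetical and only exists for this illustration.

```python
import numpy as np
import pandas as pd


def compare_and_branch(value, threshold):
    """Use `value >= threshold` as a condition and report what happened."""
    try:
        # With pd.NA the comparison itself returns pd.NA, and evaluating
        # that result as a boolean raises TypeError.
        return "taken" if value >= threshold else "skipped"
    except TypeError as err:
        return f"TypeError: {err}"


print(compare_and_branch(25.0, 20.0))    # ordinary float
print(compare_and_branch(np.nan, 20.0))  # comparison is silently False
print(compare_and_branch(pd.NA, 20.0))   # comparison cannot be used as a bool
```

The `np.nan` case quietly takes the false branch, while the `pd.NA` case fails with the same message as the traceback in this thread, which is why mixing the two in one DataFrame can change control flow in ways that are hard to spot.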
@imonteroo, just wanted to reach out because you seem to be actively using Autometa. No promises, but you can try the interim install in the comment above.
@chasemc Thank you for your advice. You are right when you say that I am actively using autometa, and I do not use any cluster. That could be the problem. Unfortunately the error persists after installing hotfix-pandas-na. Well, it is a bit different:
autometa-binning --kmers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.bacteria.kmers.embedded.tsv --coverages /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.coverages.tsv --gc-content /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.gc.content.tsv --markers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.markers.tsv --clustering-method dbscan --completeness 20 --purity 95 --cov-stddev-limit 25 --gc-stddev-limit 5 --taxonomy /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.taxonomy.tsv --output-binning /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.binning.tsv --output-main /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.main.tsv --starting-rank superkingdom --rank-filter superkingdom --rank-name-filter bacteria
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.utilities: Reading/merging 4 contig annotation files
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.utilities: merged annotations shape: (13923, 15)
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.utilities: superkingdom filtered to bacteria taxonomy. shape: (5959, 15)
[06/11/2024 11:10:26 AM INFO] root: Selected clustering method: dbscan
[06/11/2024 11:10:26 AM INFO] autometa.binning.recursive_dbscan: Using dbscan clustering method
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.recursive_dbscan: Using ranks: superkingdom, phylum, class, order, family, genus, species
[06/11/2024 11:10:26 AM INFO] autometa.binning.recursive_dbscan: Examining superkingdom: 1 unique taxa (5,959 contigs)
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: superkingdom : bacteria : (5959, 15)
Traceback (most recent call last):
File "/media/microviable/d/miniconda3/envs/autometa_env/bin/autometa-binning", line 10, in <module>
sys.exit(main())
^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 882, in main
main_out = taxon_guided_binning(
^^^^^^^^^^^^^^^^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 660, in taxon_guided_binning
clusters_df = get_clusters(
^^^^^^^^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 495, in get_clusters
clustered_df, unclustered_df = clusterer(
^^^^^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
if median_completeness >= best_median:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
It looks like the Make sure to do the e.g.
conda activate /media/microviable/d/miniconda3/envs/autometa_env
pip install git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na
I did, but it is no better:
(autometa_env) microviable@microviable:~$ pip install git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na
Collecting git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na
Cloning https://github.com/KwanLab/Autometa.git (to revision hotfix-pandas-na) to /tmp/pip-req-build-12x6zl4f
Running command git clone --filter=blob:none --quiet https://github.com/KwanLab/Autometa.git /tmp/pip-req-build-12x6zl4f
Running command git checkout -b hotfix-pandas-na --track origin/hotfix-pandas-na
Switched to a new branch 'hotfix-pandas-na'
Branch 'hotfix-pandas-na' set up to track remote branch 'hotfix-pandas-na' from 'origin'.
Resolved https://github.com/KwanLab/Autometa.git to commit f7f99ea7d9c644e7fd963a5b00e7b3a3618de1c1
Preparing metadata (setup.py) ... done
(autometa_env) microviable@microviable:~$ autometa-binning --kmers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.bacteria.kmers.embedded.tsv --coverages /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.coverages.tsv --gc-content /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.gc.content.tsv --markers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.markers.tsv --clustering-method dbscan --completeness 20 --purity 95 --cov-stddev-limit 25 --gc-stddev-limit 5 --taxonomy /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.taxonomy.tsv --output-binning /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.binning.tsv --output-main /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.main.tsv --starting-rank superkingdom --rank-filter superkingdom --rank-name-filter bacteria
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.utilities: Reading/merging 4 contig annotation files
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.utilities: merged annotations shape: (13923, 15)
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.utilities: superkingdom filtered to bacteria taxonomy. shape: (5959, 15)
[06/12/2024 09:09:52 AM INFO] root: Selected clustering method: dbscan
[06/12/2024 09:09:52 AM INFO] autometa.binning.recursive_dbscan: Using dbscan clustering method
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.recursive_dbscan: Using ranks: superkingdom, phylum, class, order, family, genus, species
[06/12/2024 09:09:52 AM INFO] autometa.binning.recursive_dbscan: Examining superkingdom: 1 unique taxa (5,959 contigs)
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: superkingdom : bacteria : (5959, 15)
Traceback (most recent call last):
File "/media/microviable/d/miniconda3/envs/autometa_env/bin/autometa-binning", line 10, in <module>
sys.exit(main())
^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 882, in main
main_out = taxon_guided_binning(
^^^^^^^^^^^^^^^^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 660, in taxon_guided_binning
clusters_df = get_clusters(
^^^^^^^^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 495, in get_clusters
clustered_df, unclustered_df = clusterer(
^^^^^^^^^^
File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
if median_completeness >= best_median:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
My bad, the package version isn't bumped in the branch yet so you need to add If the install is successful
should show
and not:
Thank you so much. It works.
It should be fixed in the latest update, v2.2.3 (#361).
Autometa/autometa/binning/recursive_dbscan.py
Line 190 in 5e3250c
I am getting this 'NA' error -
if I protect it. I think this will work, but I'm still testing. I assume getting an NA value means just skip it anyway?
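One way such a guard could look is sketched below. It assumes, as the question above does, that a missing median should simply be skipped. The names `median_completeness` and `best_median` come from the traceback in this thread; `should_update_best` is a hypothetical helper and is not the fix that shipped in Autometa.

```python
import pandas as pd


def should_update_best(median_completeness, best_median):
    """Return True only when both values are present and the new median
    is at least as good as the current best."""
    # pd.isna is True for both pd.NA and np.nan, so either kind of
    # missing value is skipped instead of reaching the comparison.
    if pd.isna(median_completeness) or pd.isna(best_median):
        return False
    return bool(median_completeness >= best_median)


print(should_update_best(pd.NA, 20.0))
print(should_update_best(35.0, 20.0))
```

Skipping hides the missing value rather than explaining it, so this kind of guard is best paired with a log message or a check on why the iteration produced no completeness value.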