NA ambiguous in recursive_dbscan #349

Closed
hyphaltip opened this issue Nov 12, 2023 · 12 comments

Labels
bug Something isn't working

Comments

@hyphaltip
Contributor

hyphaltip commented Nov 12, 2023

if median_completeness >= best_median:

I am getting this 'NA' error -

... site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
    if median_completeness >= best_median:
  File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

If I guard the comparison as shown below, I think it will work, but I am still testing. I assume that getting an NA value means the iteration can simply be skipped anyway?

if pd.isna(median_completeness):
    median_completeness = 0
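
For anyone who wants to reproduce the failure in isolation, here is a minimal sketch. It assumes only pandas; the variable names mirror the traceback above, and the values are made up for illustration rather than taken from Autometa.

```python
import pandas as pd

best_median = 0.0
median_completeness = pd.NA  # stand-in for the value that triggers the error

# Comparing against pd.NA gives neither True nor False; it gives pd.NA back.
comparison = median_completeness >= best_median

# Using that result in an `if` calls bool() on it, which raises TypeError.
try:
    if comparison:
        pass
    raised = False
except TypeError as err:
    raised = True
    message = str(err)

# The guard from this comment: replace a missing value before comparing.
if pd.isna(median_completeness):
    median_completeness = 0

guarded = median_completeness >= best_median
```

Note that the guard turns a missing completeness into a real 0, which then takes part in the comparison like any other score.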
@samche42
Collaborator

samche42 commented Feb 8, 2024

I'm getting the same error in several metagenomes - 4 out of 5 metagenomes failed with this message. The fifth one had no markers and so was similarly killed at the binning step. The work-around suggested seems reasonable to me, but I'm still curious how a completeness of NA pops up in the first place?

@samche42 samche42 added the bug Something isn't working label Feb 8, 2024
This was referenced Feb 10, 2024
@chasemc
Member

chasemc commented Feb 22, 2024

image

I spent a lot of time today debugging the PRs:
main currently passes all tests when run inside the current autometa Docker image (which is why the tests don't need to be changed; there is a regression)
your modifications to fix biopython were good, but the change you made to dbscan also needs to be done for hdbscan (I might have that
the other errors...
FAILED 😰 tests/unit_tests/test_recursive_dbscan.py::test_taxon_guided_binning - TypeError: boolean value of NA is ambiguous
FAILED 😰 tests/unit_tests/test_recursive_dbscan.py::test_get_clusters[dbscan] - TypeError: boolean value of NA is ambiguous
FAILED 😰 tests/unit_tests/test_recursive_dbscan.py::test_get_clusters[hdbscan] - TypeError: boolean value of NA is ambiguous
FAILED 😰 tests/unit_tests/test_recursive_dbscan.py::test_recursive_dbscan_main - TypeError: boolean value of NA is ambiguous
... are due to the upgrade of pandas from 1.5 to 2.1, or whatever the current version is
i.e., if you pin pandas to 1.5, all the tests pass; if you upgrade to 2.1 (possibly any version between 1.5 and the current one), those tests fail
I'm not familiar enough with all of pandas' breaking changes to be able to point to the specific function that is leading to this
It seems like you were trying to bypass the error in PR #356:

if pd.isna(median_completeness):
    median_completeness = 0

It seems like the "failed to recover clusters" error only occurs after this modification, so I think it might be masking the real issue (i.e., the NAs are a clue that something changed upstream).
It's probably going to take comparing the intermediate results/DFs when using both pandas 1.5 and 2.1
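
One way to approach that comparison is sketched below. It is only an illustration: `summarize_missing` and `diff_summaries` are hypothetical helpers, the two small frames stand in for intermediate results captured under each pandas version, and how those frames are captured from each environment is left open.

```python
import numpy as np
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and number of missing values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
        }
    )


def diff_summaries(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Keep only the columns whose dtype or missing-value count differs."""
    merged = summarize_missing(old).join(
        summarize_missing(new), lsuffix="_old", rsuffix="_new", how="outer"
    )
    changed = (merged["dtype_old"] != merged["dtype_new"]) | (
        merged["n_missing_old"] != merged["n_missing_new"]
    )
    return merged[changed]


# Made-up stand-ins for the same intermediate table under two pandas versions.
old_df = pd.DataFrame({"completeness": [90.0, 0.0], "purity": [95.0, 100.0]})
new_df = pd.DataFrame({"completeness": [90.0, np.nan], "purity": [95.0, 100.0]})

report = diff_summaries(old_df, new_df)
print(report)
```

Columns that appear in the report are the first places to look for where a missing value enters the pipeline.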

I rebased dev onto main and created a new branch that has the biopython changes and pandas pinned to 1.5, feel free to work off that branch

Sort of related: there is a tests/environment.yml that the unit tests run on (if using the Makefile). IMO this needs to go away; the tests should only pull from the main ./autometa-env.yml file and then pip install the pytest-related packages within the make command

@chasemc
Member

chasemc commented Feb 22, 2024

related: #350

@imonteroo

imonteroo commented Jun 6, 2024

I am working with autometa 2.2.2 and have the same error

autometa-binning \
    --kmers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.bacteria.kmers.embedded.tsv \
    --coverages /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.coverages.tsv \
    --gc-content /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.gc.content.tsv \
    --markers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.markers.tsv \
    --clustering-method dbscan \
    --completeness 20 \
    --purity 95 \
    --cov-stddev-limit 25 \
    --gc-stddev-limit 5 \
    --taxonomy /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.taxonomy.tsv \
    --output-binning /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.binning.tsv \
    --output-main /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.main.tsv \
    --starting-rank superkingdom \
    --rank-filter superkingdom \
    --rank-name-filter bacteria
[06/06/2024 09:08:51 AM DEBUG] autometa.binning.utilities: Reading/merging 4 contig annotation files
[06/06/2024 09:08:51 AM DEBUG] autometa.binning.utilities: merged annotations shape: (13923, 15)
[06/06/2024 09:08:51 AM DEBUG] autometa.binning.utilities: superkingdom filtered to bacteria taxonomy. shape: (5959, 15)
[06/06/2024 09:08:51 AM INFO] root: Selected clustering method: dbscan
[06/06/2024 09:08:51 AM INFO] autometa.binning.recursive_dbscan: Using dbscan clustering method
[06/06/2024 09:08:51 AM DEBUG] autometa.binning.recursive_dbscan: Using ranks: superkingdom, phylum, class, order, family, genus, species
[06/06/2024 09:08:51 AM INFO] autometa.binning.recursive_dbscan: Examining superkingdom: 1 unique taxa (5,959 contigs)
[06/06/2024 09:08:51 AM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: superkingdom : bacteria : (5959, 15)
Traceback (most recent call last):
  File "/media/microviable/d/miniconda3/envs/autometa_env/bin/autometa-binning", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 882, in main
    main_out = taxon_guided_binning(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 660, in taxon_guided_binning
    clusters_df = get_clusters(
                  ^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 495, in get_clusters
    clustered_df, unclustered_df = clusterer(
                                   ^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
    if median_completeness >= best_median:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "missing.pyx", line 392, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

conda list gives this:

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
_sysroot_linux-64_curr_repodata_hack 3 h69a702a_14 conda-forge
alsa-lib 1.2.11 hd590300_1 conda-forge
attrs 23.2.0 pyh71513ae_0 conda-forge
autometa 2.2.2 pyh7cba7a3_0 bioconda
beautifulsoup4 4.12.3 pyha770c72_0 conda-forge
bedtools 2.31.1 hf5e1c6e_1 bioconda
biom-format 2.1.16 py312h9a8786e_1 conda-forge
biopython 1.83 py312h98912ed_0 conda-forge
blast 2.15.0 pl5321h6f7f691_1 bioconda
boost-cpp 1.78.0 h2c5509c_4 conda-forge
bowtie2 2.5.4 he20e202_0 bioconda
brotli-python 1.1.0 py312h30efb56_1 conda-forge
bwa 0.7.18 he4a0461_0 bioconda
bzip2 1.0.8 hd590300_5 conda-forge
c-ares 1.28.1 hd590300_0 conda-forge
ca-certificates 2024.6.2 hbcca054_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cairo 1.18.0 h3faef2a_0 conda-forge
certifi 2024.2.2 pyhd8ed1ab_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
click 8.1.7 unix_pyh707e725_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
curl 8.8.0 he654da7_0 conda-forge
diamond 2.1.9 h43eeafb_0 bioconda
entrez-direct 21.6 he881be0_0 bioconda
exceptiongroup 1.2.0 pyhd8ed1ab_2 conda-forge
expat 2.6.2 h59595ed_0 conda-forge
fastqc 0.12.1 hdfd78af_0 bioconda
filelock 3.14.0 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_2 conda-forge
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
gdown 5.2.0 pyhd8ed1ab_0 conda-forge
gettext 0.22.5 h59595ed_2 conda-forge
gettext-tools 0.22.5 h59595ed_2 conda-forge
giflib 5.2.2 hd590300_0 conda-forge
graphite2 1.3.13 h59595ed_1003 conda-forge
h5py 3.11.0 nompi_py312hb7ab980_101 conda-forge
harfbuzz 8.5.0 hfac3d4d_0 conda-forge
hdf5 1.14.3 nompi_hdf9ad27_104 conda-forge
hdmedians 0.14.2 py312h085067d_6 conda-forge
hmmer 3.4 hdbdd923_1 bioconda
htslib 1.20 h81da01d_0 bioconda
icu 73.2 h59595ed_0 conda-forge
idna 3.7 pyhd8ed1ab_0 conda-forge
iniconfig 2.0.0 pyhd8ed1ab_0 conda-forge
joblib 1.4.2 pyhd8ed1ab_0 conda-forge
kart 2.5.6 hcd5855d_4 bioconda
kernel-headers_linux-64 3.10.0 h4a8ded7_14 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.2 h659d440_0 conda-forge
lcms2 2.16 hb7c19ff_0 conda-forge
ld_impl_linux-64 2.40 hf3520f5_1 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libaec 1.1.3 h59595ed_0 conda-forge
libasprintf 0.22.5 h661eb56_2 conda-forge
libasprintf-devel 0.22.5 h661eb56_2 conda-forge
libblas 3.9.0 22_linux64_openblas conda-forge
libcblas 3.9.0 22_linux64_openblas conda-forge
libcups 2.3.3 h4637d8d_4 conda-forge
libcurl 8.8.0 hca28451_0 conda-forge
libdeflate 1.20 hd590300_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libexpat 2.6.2 h59595ed_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.2.0 h77fa898_7 conda-forge
libgettextpo 0.22.5 h59595ed_2 conda-forge
libgettextpo-devel 0.22.5 h59595ed_2 conda-forge
libgfortran-ng 13.2.0 h69a702a_7 conda-forge
libgfortran5 13.2.0 hca663fb_7 conda-forge
libglib 2.80.2 hf974151_0 conda-forge
libgomp 13.2.0 h77fa898_7 conda-forge
libhwloc 2.10.0 default_h5622ce7_1001 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libidn2 2.3.7 hd590300_0 conda-forge
libjpeg-turbo 3.0.0 hd590300_1 conda-forge
liblapack 3.9.0 22_linux64_openblas conda-forge
libllvm14 14.0.6 hcd5def8_4 conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libopenblas 0.3.27 pthreads_h413a1c8_0 conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libsqlite 3.45.3 h2797004_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-ng 13.2.0 hc0a3c3a_7 conda-forge
libtiff 4.6.0 h1dd3fc0_3 conda-forge
libunistring 0.9.10 h7f98852_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libwebp-base 1.4.0 hd590300_0 conda-forge
libxcb 1.15 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.12.7 hc051c1a_0 conda-forge
libzlib 1.2.13 h4ab18f5_6 conda-forge
llvm-openmp 8.0.1 hc9558a2_0 conda-forge
llvmlite 0.42.0 py312hb06c811_1 conda-forge
lz4-c 1.9.4 hcb278e6_0 conda-forge
megahit 1.2.9 h43eeafb_5 bioconda
natsort 8.4.0 pyhd8ed1ab_0 conda-forge
ncbi-vdb 3.1.1 h4ac6f70_0 bioconda
ncurses 6.5 h59595ed_0 conda-forge
numba 0.59.1 py312hacefee8_0 conda-forge
numpy 1.26.4 py312heda63a1_0 conda-forge
openjdk 22.0.1 hb622114_0 conda-forge
openmp 8.0.1 0 conda-forge
openssl 3.3.1 h4ab18f5_0 conda-forge
packaging 24.0 pyhd8ed1ab_0 conda-forge
pandas 2.2.2 py312h1d6d2e6_1 conda-forge
parallel 20240522 ha770c72_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pcre2 10.43 hcad00b1_0 conda-forge
perl 5.32.1 7_hd590300_perl5 conda-forge
perl-archive-tar 2.40 pl5321hdfd78af_0 bioconda
perl-carp 1.50 pl5321hd8ed1ab_0 conda-forge
perl-common-sense 3.75 pl5321hd8ed1ab_0 conda-forge
perl-compress-raw-bzip2 2.201 pl5321h166bdaf_0 conda-forge
perl-compress-raw-zlib 2.202 pl5321h166bdaf_0 conda-forge
perl-encode 3.21 pl5321hd590300_0 conda-forge
perl-exporter 5.74 pl5321hd8ed1ab_0 conda-forge
perl-exporter-tiny 1.002002 pl5321hd8ed1ab_0 conda-forge
perl-extutils-makemaker 7.70 pl5321hd8ed1ab_0 conda-forge
perl-io-compress 2.201 pl5321hdbdd923_2 bioconda
perl-io-zlib 1.14 pl5321hdfd78af_0 bioconda
perl-json 4.10 pl5321hdfd78af_0 bioconda
perl-json-xs 2.34 pl5321h4ac6f70_6 bioconda
perl-list-moreutils 0.430 pl5321hdfd78af_0 bioconda
perl-list-moreutils-xs 0.430 pl5321h031d066_2 bioconda
perl-parent 0.241 pl5321hd8ed1ab_0 conda-forge
perl-pathtools 3.75 pl5321h166bdaf_0 conda-forge
perl-scalar-list-utils 1.63 pl5321h166bdaf_0 conda-forge
perl-storable 3.15 pl5321h166bdaf_0 conda-forge
perl-types-serialiser 1.01 pl5321hdfd78af_0 bioconda
pip 24.0 pypi_0 pypi
pixman 0.43.2 h59595ed_0 conda-forge
pluggy 1.5.0 pyhd8ed1ab_0 conda-forge
popt 1.16 h0b475e3_2002 conda-forge
prodigal 2.6.3 h031d066_8 bioconda
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pynndescent 0.5.12 pyhca7485f_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
pytest 8.2.1 pyhd8ed1ab_0 conda-forge
python 3.12.3 hab00c5b_0_cpython conda-forge
python-annoy 1.17.3 py312h7070661_1 conda-forge
python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge
python-tzdata 2024.1 pyhd8ed1ab_0 conda-forge
python_abi 3.12 4_cp312 conda-forge
pytz 2024.1 pyhd8ed1ab_0 conda-forge
quast 5.2.0 pypi_0 pypi
readline 8.2 h8228510_1 conda-forge
requests 2.32.3 pyhd8ed1ab_0 conda-forge
rsync 3.3.0 he6cb5fe_0 conda-forge
samtools 1.20 h50ea8bc_0 bioconda
scikit-bio 0.6.0 py312hc7c0aa3_4 conda-forge
scikit-learn 1.5.0 py312h1fcc3ea_1 conda-forge
scipy 1.13.1 py312hc2bc53b_0 conda-forge
seqkit 2.8.2 h9ee0642_0 bioconda
setuptools 70.0.0 pyhd8ed1ab_0 conda-forge
simplejson 3.19.2 pypi_0 pypi
six 1.16.0 pyh6c4a22f_0 conda-forge
soupsieve 2.5 pyhd8ed1ab_1 conda-forge
spades 4.0.0 h5fb382e_1 bioconda
sysroot_linux-64 2.17 h4a8ded7_14 conda-forge
tbb 2021.12.0 h297d8ca_1 conda-forge
threadpoolctl 3.5.0 pyhc1e730c_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
tqdm 4.66.4 pyhd8ed1ab_0 conda-forge
trimap 1.0.15 pyh5e36f6f_0 bioconda
trimmomatic 0.39 hdfd78af_2 bioconda
tsne 0.3.1 py312hf053be7_5 conda-forge
tzdata 2024a h0c530f3_0 conda-forge
umap-learn 0.5.5 py312h7900ff3_1 conda-forge
urllib3 2.2.1 pyhd8ed1ab_0 conda-forge
wget 1.21.4 hda4d442_0 conda-forge
wheel 0.43.0 pyhd8ed1ab_1 conda-forge
xorg-fixesproto 5.0 h7f98852_1002 conda-forge
xorg-inputproto 2.3.2 h7f98852_1002 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.1.1 hd590300_0 conda-forge
xorg-libsm 1.2.4 h7391055_0 conda-forge
xorg-libx11 1.8.9 h8ee46fc_0 conda-forge
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxfixes 5.0.3 h7f98852_1004 conda-forge
xorg-libxi 1.7.10 h7f98852_0 conda-forge
xorg-libxrender 0.9.11 hd590300_0 conda-forge
xorg-libxt 1.3.0 hd590300_1 conda-forge
xorg-libxtst 1.2.3 h7f98852_1002 conda-forge
xorg-recordproto 1.14.2 h7f98852_1002 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xxhash 0.8.2 hd590300_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zlib 1.2.13 h4ab18f5_6 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge

@chasemc
Member

chasemc commented Jun 10, 2024

There are some general issues throughout Autometa (I don't know how pervasive) where recent changes to pandas could cause problems.

The issue mentioned here appears to occur when a recursive dbscan iteration comes up with no clusters. A fix is in progress, and a no-promises fix can be installed in the interim via pip:
pip install git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na
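
To illustrate the situation described above, here is a hedged sketch of a selection loop that skips iterations without a usable median. It is not the code in the hotfix branch: `pick_best`, the `eps` keys, and the completeness values are all hypothetical.

```python
import pandas as pd


def pick_best(median_by_eps: dict) -> tuple:
    """Return (eps, median) with the highest non-missing median, or (None, None)."""
    best_eps, best_median = None, None
    for eps, median_completeness in median_by_eps.items():
        # An iteration that produced no clusters has no median to compare.
        if pd.isna(median_completeness):
            continue
        if best_median is None or median_completeness >= best_median:
            best_eps, best_median = eps, median_completeness
    return best_eps, best_median


# Hypothetical results: two iterations found no clusters (pd.NA and NaN).
candidates = {0.3: pd.NA, 0.5: 42.0, 0.7: float("nan"), 0.9: 61.5}
best = pick_best(candidates)
print(best)
```

Skipping a missing median keeps it out of the comparison entirely, instead of letting it compete as a score of 0.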

For devs:

Part of the issue is that pandas changed how NAs are handled, and this project isn't the only one that has had issues: https://pandas.pydata.org/docs/user_guide/missing_data.html#na-in-a-boolean-context

I found at least one case where division by zero produces np.nan, and these values are then mixed in a DataFrame with pd.NA, which may cause issues. The whole code base may need to be checked.
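
A small sketch of why mixing the two markers matters, assuming nothing beyond numpy and pandas. The values are hand-written for illustration and are not taken from an Autometa run.

```python
import numpy as np
import pandas as pd

values = [np.nan, pd.NA, 87.5]

# np.nan compares as False without complaint; pd.NA propagates as pd.NA
# and only fails later, once the result is used in an `if`.
nan_result = values[0] >= 50
na_result = values[1] >= 50

# pd.isna recognises both markers, so it can serve as a single check.
missing_flags = [bool(pd.isna(v)) for v in values]

# Normalising to one marker before any comparison avoids the mix.
normalised = [np.nan if pd.isna(v) else float(v) for v in values]
```

Because the two markers behave differently in comparisons, the same logical "missing" can either pass silently or raise, depending on which one happens to be in the cell.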

CC @jason-c-kwan @Sidduppal @shaneroesemann

@chasemc
Member

chasemc commented Jun 10, 2024

@imonteroo, just wanted to reach out because you seem to be actively using Autometa. No promises, but you can try the interim install in the comment above.

@imonteroo

imonteroo commented Jun 11, 2024

@chasemc Thank you for your advice. You are right that I am actively using autometa, and I do not use any cluster. That could be the problem.

Unfortunately, the error persists after installing hotfix-pandas-na. Well, it is a bit different:

autometa-binning     --kmers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.bacteria.kmers.embedded.tsv     --coverages /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.coverages.tsv     --gc-content /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.gc.content.tsv     --markers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.markers.tsv     --clustering-method dbscan     --completeness 20     --purity 95     --cov-stddev-limit 25     --gc-stddev-limit 5     --taxonomy /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.taxonomy.tsv     --output-binning /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.binning.tsv     --output-main /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.main.tsv     --starting-rank superkingdom     --rank-filter superkingdom     --rank-name-filter bacteria
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.utilities: Reading/merging 4 contig annotation files
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.utilities: merged annotations shape: (13923, 15)
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.utilities: superkingdom filtered to bacteria taxonomy. shape: (5959, 15)
[06/11/2024 11:10:26 AM INFO] root: Selected clustering method: dbscan
[06/11/2024 11:10:26 AM INFO] autometa.binning.recursive_dbscan: Using dbscan clustering method
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.recursive_dbscan: Using ranks: superkingdom, phylum, class, order, family, genus, species
[06/11/2024 11:10:26 AM INFO] autometa.binning.recursive_dbscan: Examining superkingdom: 1 unique taxa (5,959 contigs)
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: superkingdom : bacteria : (5959, 15)
Traceback (most recent call last):
  File "/media/microviable/d/miniconda3/envs/autometa_env/bin/autometa-binning", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 882, in main
    main_out = taxon_guided_binning(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 660, in taxon_guided_binning
    clusters_df = get_clusters(
                  ^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 495, in get_clusters
    clustered_df, unclustered_df = clusterer(
                                   ^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
    if median_completeness >= best_median:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

@chasemc
Member

chasemc commented Jun 11, 2024

It looks like the hotfix-pandas-na branch wasn't installed, because the traceback you pasted still shows the old line 190 of recursive_dbscan.

Make sure to run pip install git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na after activating the conda environment, if you didn't already.

e.g.

conda activate /media/microviable/d/miniconda3/envs/autometa_env
pip install git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na

Note: With this update I'm getting more and better clusters than with the unit test data that I have access to (we're looking into that in the meantime).

@imonteroo

I did, but nothing better

(autometa_env) microviable@microviable:~$ pip install git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na
Collecting git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na
  Cloning https://github.com/KwanLab/Autometa.git (to revision hotfix-pandas-na) to /tmp/pip-req-build-12x6zl4f
  Running command git clone --filter=blob:none --quiet https://github.com/KwanLab/Autometa.git /tmp/pip-req-build-12x6zl4f
  Running command git checkout -b hotfix-pandas-na --track origin/hotfix-pandas-na
  Switched to a new branch 'hotfix-pandas-na'
  Branch 'hotfix-pandas-na' set up to track remote branch 'hotfix-pandas-na' from 'origin'.
  Resolved https://github.com/KwanLab/Autometa.git to commit f7f99ea7d9c644e7fd963a5b00e7b3a3618de1c1
  Preparing metadata (setup.py) ... done
(autometa_env) microviable@microviable:~$ autometa-binning     --kmers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.bacteria.kmers.embedded.tsv     --coverages /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.coverages.tsv     --gc-content /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.gc.content.tsv     --markers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.markers.tsv     --clustering-method dbscan     --completeness 20     --purity 95     --cov-stddev-limit 25     --gc-stddev-limit 5     --taxonomy /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.taxonomy.tsv     --output-binning /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.binning.tsv     --output-main /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.main.tsv     --starting-rank superkingdom     --rank-filter superkingdom     --rank-name-filter bacteria
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.utilities: Reading/merging 4 contig annotation files
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.utilities: merged annotations shape: (13923, 15)
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.utilities: superkingdom filtered to bacteria taxonomy. shape: (5959, 15)
[06/12/2024 09:09:52 AM INFO] root: Selected clustering method: dbscan
[06/12/2024 09:09:52 AM INFO] autometa.binning.recursive_dbscan: Using dbscan clustering method
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.recursive_dbscan: Using ranks: superkingdom, phylum, class, order, family, genus, species
[06/12/2024 09:09:52 AM INFO] autometa.binning.recursive_dbscan: Examining superkingdom: 1 unique taxa (5,959 contigs)
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: superkingdom : bacteria : (5959, 15)
Traceback (most recent call last):
  File "/media/microviable/d/miniconda3/envs/autometa_env/bin/autometa-binning", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 882, in main
    main_out = taxon_guided_binning(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 660, in taxon_guided_binning
    clusters_df = get_clusters(
                  ^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 495, in get_clusters
    clustered_df, unclustered_df = clusterer(
                                   ^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
    if median_completeness >= best_median:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

@chasemc
Member

chasemc commented Jun 12, 2024

My bad, the package version isn't bumped in the branch yet, so you need to add --force-reinstall, which should work:
pip install --force-reinstall git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na

If the install is successful, then

head -n191 /media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py | tail -n1

should show

else:

and not:

best_median = median_completeness
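
The same check can be run from Python, which avoids typing the full site-packages path. This is a sketch using only the standard library; `line_of_module` is a hypothetical helper, and it is demonstrated on the `json` module so that it runs without Autometa installed.

```python
import importlib.util
import linecache


def line_of_module(module_name: str, lineno: int) -> str:
    """Return one line (1-indexed) of an importable module's source file."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate source for {module_name!r}")
    return linecache.getline(spec.origin, lineno).rstrip("\n")


# Demonstrated on a standard-library module so the sketch runs anywhere.
first_line = line_of_module("json", 1)
print(first_line)
```

With the conda environment active, calling `line_of_module("autometa.binning.recursive_dbscan", 191)` would correspond to the head/tail command above. Note that `find_spec` imports the parent packages of a dotted module name in order to locate it.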

@imonteroo

Thank you so much. It works

@Sidduppal
Collaborator

It should be fixed in the latest update v2.2.3 #361


5 participants