[Bug]: duckdb.InvalidInputException: Invalid Input Error: Required module 'pandas.core.arrays.arrow.dtype' #1069

Closed
bmanturner opened this issue Aug 31, 2023 · 13 comments
Labels
bug Something isn't working

Comments

@bmanturner

What happened?

I made no changes to the code. I just reinstalled my dependencies and this bug suddenly appeared.

Versions

Chroma v0.3.26
Python v3.10

Relevant log output

File "/Users/bfturner/dev/heavynl/venv/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 612, in from_documents
    return cls.from_texts(
  File "/Users/bfturner/dev/heavynl/venv/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 567, in from_texts
    chroma_collection = cls(
  File "/Users/bfturner/dev/heavynl/venv/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 125, in __init__
    self._collection = self._client.get_or_create_collection(
  File "/Users/bfturner/dev/heavynl/venv/lib/python3.10/site-packages/chromadb/api/local.py", line 141, in get_or_create_collection
    return self.create_collection(
  File "/Users/bfturner/dev/heavynl/venv/lib/python3.10/site-packages/chromadb/api/local.py", line 110, in create_collection
    res = self._db.create_collection(name, metadata, get_or_create)
  File "/Users/bfturner/dev/heavynl/venv/lib/python3.10/site-packages/chromadb/db/duckdb.py", line 94, in create_collection
    dupe_check = self.get_collection(name)
  File "/Users/bfturner/dev/heavynl/venv/lib/python3.10/site-packages/chromadb/db/duckdb.py", line 116, in get_collection
    res = self._conn.execute("""SELECT * FROM collections WHERE name = ?""", [name]).fetchall()
duckdb.InvalidInputException: Invalid Input Error: Required module 'pandas.core.arrays.arrow.dtype' failed to import, due to the following Python exception:
ModuleNotFoundError: No module named 'pandas.core.arrays.arrow.dtype'
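
The last two lines of the log show that duckdb tried to import `pandas.core.arrays.arrow.dtype` and the installed pandas did not provide it. A minimal sketch for checking this in your own environment, using only the standard library; the helper name `module_available` is made up for illustration and is not part of pandas, duckdb, or Chroma:

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located in this environment.

    Hypothetical helper for illustration only.
    """
    try:
        # find_spec imports the parent packages, then looks for the leaf module.
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package is missing or is not a package
        # (ModuleNotFoundError is a subclass of ImportError).
        return False


if __name__ == "__main__":
    name = "pandas.core.arrays.arrow.dtype"
    print(name, "importable:", module_available(name))
```

If this reports that the module is not importable, the installed pandas lacks the module named in the error, which matches the traceback above.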
@bmanturner bmanturner added the bug Something isn't working label Aug 31, 2023
@bmanturner
Author

Note: I cannot update chromadb because it is packaged with an out-of-date version of fastapi, even though fastapi is not needed to use chromadb as a Python module.

@hyunkelw

hyunkelw commented Sep 1, 2023

I second this. Since the pandas update of August 30th, my context retrieval has stopped working and no new ingestion is possible.

@rasnes

rasnes commented Sep 4, 2023

Same bug experienced here.

@adnanrizve

Having same issue

@taoli-ax

taoli-ax commented Sep 5, 2023

Aha, same issue here, caused by using Miniconda. I switched to a normal environment and the strange bug is gone.
Sharing my requirements.txt:
aiofiles==23.2.1
aiohttp==3.8.5
aiosignal==1.3.1
altair==5.1.0
annotated-types==0.5.0
anyio==3.7.1
async-timeout==4.0.3
attrs==23.1.0
backoff==2.2.1
bcrypt==4.0.1
certifi==2023.7.22
charset-normalizer==3.2.0
chroma-hnswlib==0.7.2
chromadb==0.3.26
click==8.1.7
clickhouse-connect==0.6.10
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.1.0
cycler==0.11.0
dataclasses-json==0.5.14
duckdb==0.8.1
et-xmlfile==1.1.0
exceptiongroup==1.1.3
fastapi==0.99.1
ffmpy==0.3.1
filelock==3.12.3
flatbuffers==23.5.26
fonttools==4.42.1
frozenlist==1.4.0
fsspec==2023.6.0
gpt4all==1.0.8
gradio==3.41.2
gradio_client==0.5.0
greenlet==2.0.2
h11==0.14.0
hnswlib==0.7.0
httpcore==0.17.3
httptools==0.6.0
httpx==0.24.1
huggingface-hub==0.16.4
humanfriendly==10.0
idna==3.4
importlib-resources==6.0.1
Jinja2==3.1.2
joblib==1.3.2
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
kiwisolver==1.4.5
langchain==0.0.276
langsmith==0.0.28
lz4==4.3.2
MarkupSafe==2.1.3
marshmallow==3.20.1
matplotlib==3.7.2
monotonic==1.6
mpmath==1.3.0
multidict==6.0.4
mypy-extensions==1.0.0
networkx==3.1
nltk==3.8.1
numexpr==2.8.5
numpy==1.25.2
onnxruntime==1.15.1
openpyxl==3.1.2
orjson==3.9.5
overrides==7.4.0
packaging==23.1
pandas==2.0.3
Pillow==10.0.0
posthog==3.0.2
protobuf==4.24.2
pulsar-client==3.3.0
pydantic==1.10.12
pydantic_core==2.6.3
pydub==0.25.1
PyMuPDF==1.23.2
PyMuPDFb==1.23.0
pyparsing==3.0.9
PyPika==0.48.9
pyreadline3==3.4.1
python-dateutil==2.8.2
python-dotenv==1.0.0
python-multipart==0.0.6
pytz==2023.3
PyYAML==6.0.1
referencing==0.30.2
regex==2023.8.8
requests==2.31.0
rpds-py==0.10.0
safetensors==0.3.3
scikit-learn==1.3.0
scipy==1.11.2
semantic-version==2.10.0
sentence-transformers==2.2.2
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.0
SQLAlchemy==2.0.20
starlette==0.27.0
sympy==1.12
tenacity==8.2.3
threadpoolctl==3.2.0
tokenizers==0.13.3
toolz==0.12.0
torch==2.0.1
torchvision==0.15.2
tqdm==4.66.1
transformers==4.32.1
typing-inspect==0.9.0
typing_extensions==4.7.1
tzdata==2023.3
urllib3==2.0.4
uvicorn==0.23.2
watchfiles==0.20.0
websockets==11.0.3
yarl==1.9.2
zstandard==0.21.0

@andrebadini

Same issue. Any solution?

@nsthorat

nsthorat commented Sep 5, 2023

This is a bug with duckdb and pandas 2.1.0. Until this is fixed in duckdb, you should be able to downgrade pandas to 2.0.3.

duckdb/duckdb#8738
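
To check whether an environment falls on the affected side of this boundary, something like the following could work. It is a rough sketch, not part of duckdb or Chroma: the helper names `major_minor` and `pandas_needs_downgrade` are hypothetical, and treating every pandas release from 2.1 onward as affected is an assumption (the thread only confirms 2.1.0, and a fixed duckdb release would change the answer).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(ver: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.1.0'."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def pandas_needs_downgrade(pandas_version: str) -> bool:
    """True for pandas 2.1 or newer.

    Assumption: all releases from 2.1 onward behave like 2.1.0,
    the version reported in this thread.
    """
    return major_minor(pandas_version) >= (2, 1)


if __name__ == "__main__":
    try:
        installed = version("pandas")
    except PackageNotFoundError:
        print("pandas is not installed")
    else:
        if pandas_needs_downgrade(installed):
            print(f"pandas {installed}: consider `pip install pandas==2.0.3`")
        else:
            print(f"pandas {installed}: older than 2.1, not the version reported here")
```

Comparing only the major and minor components keeps the check independent of third-party version parsers, at the cost of ignoring pre-release suffixes.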

@jeffchuber
Contributor

To all in this thread: Chroma dropped duckdb as of 0.4.0. Can you all upgrade to a newer, better-supported version?

@bmanturner
Author

If the outdated fastapi dependency were made optional, we would be able to. Ideally, server functionality would only be included when you install chromadb via pip install chromadb[web-server].

@jeffchuber
Contributor

@bmanturner we do have a client-only version if that helps? chroma-client

@bmanturner
Author

Thanks for the quick reply and suggestion @jeffchuber, but unfortunately that won't work. We use chromadb with LangChain. Our application runs chromadb locally. We have no need to run chromadb on a separate host.

@knaou

knaou commented Sep 12, 2023

I found a temporary workaround for the issue: downgrading pandas to version 1.5.3.

pip install pandas==1.5.3

@HammadB
Collaborator

HammadB commented Dec 4, 2023

In Chroma V0.4 we dropped usage of duckdb entirely.
Closing this as stale / no longer relevant.
