[Install issue]: Packaging and Deployment Issue with chromadb-client in AWS Lambda #2231

Open
marichkazb opened this issue May 21, 2024 · 4 comments
Labels
installation trouble trouble building or installing chroma

Comments

@marichkazb

What happened?

I am developing an application using the OpenAI API, combined with ChromaDB as a tool for Retrieval-Augmented Generation (RAG), to build a custom responsive chatbot powered by business data. Currently, I am deploying my application on AWS. The ChromaDB instance is deployed via CloudFormation, and my backend Python function is deployed as a Lambda function.

In my Lambda function, I need to connect to the ChromaDB instance to query the collection and retrieve data, which will later be embedded as context in a call to the OpenAI API.

chroma_client = chromadb.HttpClient(host='11.111.111.11', port=8000)

One way to include dependencies in Lambda functions is by uploading .zip packages to Lambda layers. I successfully packaged all dependencies, but encountered some issues with ChromaDB.

The original ChromaDB distribution is 107 MB when zipped, which exceeds the storage limits for both Lambda layers (50 MB max) and S3 buckets. I then discovered the smaller chromadb-client library, which can be uploaded to AWS. However, after including it, I encountered the following error, likely related to packaging. I tried building the zip on both macOS and a Linux virtual machine, with the same result.

My questions are:
1) Is the way I am handling ChromaDB in this example optimal? Am I on the right path for deploying the app on AWS?
2) Is there an official distribution of a zipped chromadb-client or chromadb that is compatible with Lambda layers? How would you recommend handling this issue?

Any help will be greatly appreciated! Thanks! 🙌🏻✨

Versions

chromadb-client 0.4.25.dev0
Python 3.10.12
MacOS 13.1/VM Linux Ubuntu

Relevant log output

Response
{
  "errorMessage": "Invalid Schema:\nmodel.schema.model-fields.fields.files.schema.default.schema.nullable.schema.union.choices.0.tagged-union[typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict].dict.values_schema.union.choices.2.tagged-union[typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict]\n  Input tag 'tuple' found using 'type' does not match any of the expected tags: 'any', 'none', 'bool', 'int', 'float', 'decimal', 'str', 'bytes', 'date', 'time', 'datetime', 'timedelta', 'literal', 'is-instance', 'is-subclass', 'callable', 'list', 'tuple-positional', 'tuple-variable', 'set', 'frozenset', 'generator', 'dict', 'function-after', 'function-before', 'function-wrap', 'function-plain', 'default', 'nullable', 'union', 'tagged-union', 'chain', 'lax-or-strict', 'json-or-python', 'typed-dict', 'model-fields', 'model', 'dataclass-args', 'dataclass', 'arguments', 'call', 'custom-error', 'json', 'url', 'multi-host-url', 'definitions', 'definition-ref', 'uuid' [type=union_tag_invalid, input_value={'type': 'tuple', 'items_...}, {'type': 
'bytes'}]}]}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.7/v/union_tag_invalid\nmodel.schema.model-fields.fields.files.schema.default.schema.nullable.schema.union.choices.0.tagged-union[typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict].dict.values_schema.union.choices.2.`tuple[..., str]`\n  Input should be a valid tuple [type=tuple_type, input_value={'type': 'tuple', 'items_...}, {'type': 'bytes'}]}]}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.7/v/tuple_type\nmodel.schema.model-fields.fields.files.schema.default.schema.nullable.schema.union.choices.0.tagged-union[typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict].dict.values_schema.union.choices.3.tagged-union[typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dic
t,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict]\n  Input tag 'tuple' found using 'type' does not match any of the expected tags: 'any', 'none', 'bool', 'int', 'float', 'decimal', 'str', 'bytes', 'date', 'time', 'datetime', 'timedelta', 'literal', 'is-instance', 'is-subclass', 'callable', 'list', 'tuple-positional', 'tuple-variable', 'set', 'frozenset', 'generator', 'dict', 'function-after', 'function-before', 'function-wrap', 'function-plain', 'default', 'nullable', 'union', 'tagged-union', 'chain', 'lax-or-strict', 'json-or-python', 'typed-dict', 'model-fields', 'model', 'dataclass-args', 'dataclass', 'arguments', 'call', 'custom-error', 'json', 'url', 'multi-host-url', 'definitions', 'definition-ref', 'uuid' [type=union_tag_invalid, input_value={'type': 'tuple', 'items_...ema': {'type': 'str'}}]}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.7/v/union_tag_invalid\nmodel.schema.model-fields.fields.files.schema.default.schema.nullable.schema.union.choices.0.tagged-union[typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict].dict.values_schema.union.choices.3.`tuple[..., str]`\n  Input should be a valid tuple [type=tuple_type, input_value={'type': 'tuple', 'items_...ema': {'type': 'str'}}]}, input_type=dict]\n    For further information visit 
https://errors.pydantic.dev/2.7/v/tuple_type\nmodel.schema.model-fields.fields.files.schema.default.schema.nullable.schema.union.choices.0.tagged-union[typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict].dict.values_schema.union.choices.4.tagged-union[typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict]\n  Input tag 'tuple' found using 'type' does not match any of the expected tags: 'any', 'none', 'bool', 'int', 'float', 'decimal', 'str', 'bytes', 'date', 'time', 'datetime', 'timedelta', 'literal', 'is-instance', 'is-subclass', 'callable', 'list', 'tuple-positional', 'tuple-variable', 'set', 'frozenset', 'generator', 'dict', 'function-after', 'function-before', 'function-wrap', 'function-plain', 'default', 'nullable', 'union', 'tagged-union', 'chain', 'lax-or-strict', 'json-or-python', 'typed-dict', 'model-fields', 'model', 'dataclass-args', 'dataclass', 'arguments', 'call', 'custom-error', 'json', 'url', 'multi-host-url', 'definitions', 'definition-ref', 'uuid' [type=union_tag_invalid, input_value={'type': 'tuple', 
'items_...tr'}, 'strict': False}]}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.7/v/union_tag_invalid\nmodel.schema.model-fields.fields.files.schema.default.schema.nullable.schema.union.choices.0.tagged-union[typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict].dict.values_schema.union.choices.4.`tuple[..., str]`\n  Input should be a valid tuple [type=tuple_type, input_value={'type': 'tuple', 'items_...tr'}, 'strict': False}]}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.7/v/tuple_type\nmodel.schema.model-fields.fields.files.schema.default.schema.nullable.schema.union.choices.0.`tuple[..., str]`\n  Input should be a valid tuple [type=tuple_type, input_value={'type': 'dict', 'keys_sc...e}]}]}, 'strict': False}, input_type=dict]\n    For further information visit 
https://errors.pydantic.dev/2.7/v/tuple_type\nmodel.schema.model-fields.fields.files.schema.default.schema.nullable.schema.union.choices.1.tagged-union[typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict].json-or-python.json_schema.list.items_schema\n  Input tag 'tuple' found using 'type' does not match any of the expected tags: 'any', 'none', 'bool', 'int', 'float', 'decimal', 'str', 'bytes', 'date', 'time', 'datetime', 'timedelta', 'literal', 'is-instance', 'is-subclass', 'callable', 'list', 'tuple-positional', 'tuple-variable', 'set', 'frozenset', 'generator', 'dict', 'function-after', 'function-before', 'function-wrap', 'function-plain', 'default', 'nullable', 'union', 'tagged-union', 'chain', 'lax-or-strict', 'json-or-python', 'typed-dict', 'model-fields', 'model', 'dataclass-args', 'dataclass', 'arguments', 'call', 'custom-error', 'json', 'url', 'multi-host-url', 'definitions', 'definition-ref', 'uuid' [type=union_tag_invalid, input_value={'type': 'tuple', 'items_..., 'strict': False}]}]}]}, input_type=dict]\n    For further information visit 
https://errors.pydantic.dev/2.7/v/union_tag_invalid\nmodel.schema.model-fields.fields.files.schema.default.schema.nullable.schema.union.choices.1.tagged-union[typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict].json-or-python.python_schema.chain.steps.1.function-wrap.schema.list.items_schema\n  Input tag 'tuple' found using 'type' does not match any of the expected tags: 'any', 'none', 'bool', 'int', 'float', 'decimal', 'str', 'bytes', 'date', 'time', 'datetime', 'timedelta', 'literal', 'is-instance', 'is-subclass', 'callable', 'list', 'tuple-positional', 'tuple-variable', 'set', 'frozenset', 'generator', 'dict', 'function-after', 'function-before', 'function-wrap', 'function-plain', 'default', 'nullable', 'union', 'tagged-union', 'chain', 'lax-or-strict', 'json-or-python', 'typed-dict', 'model-fields', 'model', 'dataclass-args', 'dataclass', 'arguments', 'call', 'custom-error', 'json', 'url', 'multi-host-url', 'definitions', 'definition-ref', 'uuid' [type=union_tag_invalid, input_value={'type': 'tuple', 'items_..., 'strict': False}]}]}]}, input_type=dict]\n    For further information visit 
https://errors.pydantic.dev/2.7/v/union_tag_invalid\nmodel.schema.model-fields.fields.files.schema.default.schema.nullable.schema.union.choices.1.tagged-union[typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict,typed-dict].json-or-python.serialization.function-wrap.schema\n  Input tag 'tuple' found using 'type' does not match any of the expected tags: 'any', 'none', 'bool', 'int', 'float', 'decimal', 'str', 'bytes', 'date', 'time', 'datetime', 'timedelta', 'literal', 'is-instance', 'is-subclass', 'callable', 'list', 'tuple-positional', 'tuple-variable', 'set', 'frozenset', 'generator', 'dict', 'function-after', 'function-before', 'function-wrap', 'function-plain', 'default', 'nullable', 'union', 'tagged-union', 'chain', 'lax-or-strict', 'json-or-python', 'typed-dict', 'model-fields', 'model', 'dataclass-args', 'dataclass', 'arguments', 'call', 'custom-error', 'json', 'url', 'multi-host-url', 'definitions', 'definition-ref', 'uuid' [type=union_tag_invalid, input_value={'type': 'tuple', 'items_..., 'strict': False}]}]}]}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.7/v/union_tag_invalid\nmodel.schema.model-fields.fields.files.schema.default.schema.nullable.schema.union.choices.1.`tuple[..., str]`\n  Input should be a valid tuple [type=tuple_type, input_value={'type': 'json-or-python'...'strict': False}]}]}]}}}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.7/v/tuple_type",
  "errorType": "SchemaError",
  "requestId": "",
  "stackTrace": [
    "  File \"/var/lang/lib/python3.10/importlib/__init__.py\", line 126, in import_module\n    return _bootstrap._gcd_import(name[level:], package, level)\n",
    "  File \"<frozen importlib._bootstrap>\", line 1050, in _gcd_import\n",
    "  File \"<frozen importlib._bootstrap>\", line 1027, in _find_and_load\n",
    "  File \"<frozen importlib._bootstrap>\", line 1006, in _find_and_load_unlocked\n",
    "  File \"<frozen importlib._bootstrap>\", line 688, in _load_unlocked\n",
    "  File \"<frozen importlib._bootstrap_external>\", line 883, in exec_module\n",
    "  File \"<frozen importlib._bootstrap>\", line 241, in _call_with_frames_removed\n",
    "  File \"/var/task/lambda_function.py\", line 3, in <module>\n    from openai import OpenAI\n",
    "  File \"/opt/python/openai/__init__.py\", line 8, in <module>\n    from . import types\n",
    "  File \"/opt/python/openai/types/__init__.py\", line 5, in <module>\n    from .edit import Edit as Edit\n",
    "  File \"/opt/python/openai/types/edit.py\", line 6, in <module>\n    from .._models import BaseModel\n",
    "  File \"/opt/python/openai/_models.py\", line 414, in <module>\n    class FinalRequestOptions(pydantic.BaseModel):\n",
    "  File \"/opt/python/pydantic/_internal/_model_construction.py\", line 202, in __new__\n    complete_model_class(\n",
    "  File \"/opt/python/pydantic/_internal/_model_construction.py\", line 549, in complete_model_class\n    schema = gen_schema.clean_schema(schema)\n",
    "  File \"/opt/python/pydantic/_internal/_generate_schema.py\", line 442, in clean_schema\n    schema = validate_core_schema(schema)\n",
    "  File \"/opt/python/pydantic/_internal/_core_utils.py\", line 568, in validate_core_schema\n    return _validate_core_schema(schema)\n"
  ]
}
@marichkazb marichkazb added the installation trouble trouble building or installing chroma label May 21, 2024
@marichkazb
Author

marichkazb commented May 21, 2024

Note: I also get the following message when trying to install chromadb-client in the Linux environment, although when I install those packages manually, the system responds that the requirement is already satisfied.

pip3 install -t ./python/ chromadb-client

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
botocore 1.34.61 requires urllib3<1.27,>=1.25.4; python_version < "3.10", but you have urllib3 2.2.1 which is incompatible.
aws-sam-cli 1.112.0 requires requests~=2.31.0, but you have requests 2.32.1 which is incompatible.

@tazarov
Contributor

tazarov commented May 22, 2024

@marichkazb, thanks for reaching out. Let me start by saying that your approach to Lambda is correct and is how many Chroma users are deploying/using Chroma in AWS.

Your original error does not seem to be an actual Chroma issue. From the trace, it appears to be related to pydantic models in the OpenAI package.
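One possible cause, which is an assumption and not something confirmed in this thread, is that the layer ships a pydantic release together with a pydantic-core build it was not released against (for example, if the two were installed in separate steps or overwritten by another package's pin). A minimal sketch for checking what is actually importable inside the Lambda environment is below; the helper name `installed_versions` and the package list are illustrative.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names are the ones mentioned in this issue; adjust as needed.
    to_check = ["pydantic", "pydantic-core", "openai", "chromadb-client"]
    for name, version in installed_versions(to_check).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this at the top of the handler (before `from openai import OpenAI`) and comparing the output against the versions resolved by a clean `pip install` on a local machine should show whether the layer contents drifted.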

Regarding the second error, this appears to be a library version conflict, which is a frequent occurrence in the fast-moving GenAI ecosystem. What are your dependencies, i.e., which packages do you have installed: just chromadb-client and the openai library?

Regarding your more specific question about AWS Lambda: while I'll admit I am not an expert in the AWS stack, my personal preference would be a Docker image over zipped dependencies. Have a look here for an example (https://github.com/erenyasarkurt/OpenAI-AWS-Lambda-Layer/blob/main/build/build.sh).

As I understand it, you can easily bake a Docker image, upload it to ECR, and use it as the basis for your Lambda. If you're interested, I'll happily provide you with a more detailed example.

@marichkazb
Author

@tazarov thank you for your time!! I’ve created a Docker image and currently use it as the basis for the Lambda function. It did indeed resolve all the dependency conflicts, thank you! 🙌🏻

Also, I was wondering whether Chroma uses any temporary files when querying the collection?

I’m using the following function, get_results, to get the context for the system prompt for OpenAI. However, it seems that within the scope of this function something attempts to write files, resulting in an error: "error": "[Errno 30] Read-only file system: '/home/sbx_user1051'". On AWS Lambda, /tmp is the only writable directory, so any other write attempt fails.

I tried setting the home environment to /tmp in the Dockerfile using ENV HOME=/tmp, but it didn’t help. If you have any ideas on how to possibly fix this, I'd really appreciate it!

import chromadb
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction


def get_results(message):
    # Connect to the remote Chroma server over HTTP.
    chroma_client = chromadb.HttpClient(host='11.11.111.11', port=8000)

    # The query text is embedded client-side with this function.
    embedding_function = SentenceTransformerEmbeddingFunction()
    chroma_collection = chroma_client.get_collection("knowledge", embedding_function=embedding_function)

    results = chroma_collection.query(query_texts=[message], n_results=5)
    # results['documents'] holds one list of documents per query text.
    retrieved_documents = results['documents'][0]
    return "".join(str(document) for document in retrieved_documents)
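One possible explanation, not confirmed anywhere in this thread: the write may come from the embedding function and not from the Chroma client itself. SentenceTransformerEmbeddingFunction loads a model through sentence-transformers, which by default caches downloaded files under the user's home directory. A minimal sketch of redirecting those caches to /tmp is below. `HF_HOME` and `SENTENCE_TRANSFORMERS_HOME` are environment variables read by huggingface_hub and sentence-transformers respectively; the helper name `redirect_model_caches` and the directory layout are hypothetical.

```python
import os


def redirect_model_caches(base_dir="/tmp/model_cache"):
    """Point Hugging Face / sentence-transformers caches at a writable directory.

    Must run before chromadb's embedding functions (and therefore
    sentence-transformers) are imported, because the libraries read
    these variables at import or first-use time.
    """
    paths = {
        "HF_HOME": os.path.join(base_dir, "huggingface"),
        "SENTENCE_TRANSFORMERS_HOME": os.path.join(base_dir, "sentence_transformers"),
    }
    for variable, path in paths.items():
        os.makedirs(path, exist_ok=True)  # /tmp is writable on Lambda
        os.environ[variable] = path
    return paths


if __name__ == "__main__":
    # Call this at the very top of lambda_function.py, before other imports.
    print(redirect_model_caches())
```

Note that a model downloaded into /tmp is fetched again on every cold start; if this turns out to be the cause, baking the model files into the Docker image and pointing the same variables at that location may be the more practical choice.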

@cbrousseauAumni

Did you figure out the /tmp folder issue, @marichkazb or @tazarov?
