NEW BACKEND! MongoDB Atlas #1883

Merged
merged 37 commits into from
Apr 29, 2024

caseyclements
Contributor

Description

This pull request adds MongoDB Atlas as a document index backend for DocArray. Below are the details of the implementation and the functionality it supports:

Simple Usage

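from docarray import BaseDoc
from docarray.index import MongoDBAtlasDocumentIndex
from docarray.typing import NdArray
import numpy as np

class MyDoc(BaseDoc):
    text: str
    embedding: NdArray[10]

# ten random documents and a random query vector of the same dimension
docs = [MyDoc(text=f'text {i}', embedding=np.random.rand(10)) for i in range(10)]
query = np.random.rand(10)
db = MongoDBAtlasDocumentIndex[MyDoc](host='localhost')
db.index(docs)
# vector search on the embedding field, returning the 10 closest matches
results = db.find(query, search_field='embedding', limit=10)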
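On the Atlas side, a vector find like the one above is presumably expressed as a $vectorSearch aggregation stage. The sketch below is illustrative only and is not taken from the backend's code: the function build_vector_search_pipeline, the default index name vector_index and the oversampling factor are hypothetical, while the 10,000 cap on numCandidates reflects Atlas's documented limit.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence

MAX_NUM_CANDIDATES = 10_000  # Atlas's documented upper bound for numCandidates


def build_vector_search_pipeline(
    query: Sequence[float],
    search_field: str,
    limit: int,
    index_name: str = "vector_index",  # hypothetical Atlas Vector Search index name
    oversampling: int = 10,  # hypothetical: candidates considered per requested result
    project_fields: Optional[Iterable[str]] = None,
) -> List[Dict[str, Any]]:
    """Sketch of an Atlas $vectorSearch pipeline returning `limit` nearest neighbours."""
    vector_stage: Dict[str, Any] = {
        "$vectorSearch": {
            "index": index_name,
            "path": search_field,  # e.g. 'embedding' in MyDoc
            "queryVector": [float(x) for x in query],  # plain floats, e.g. from a NumPy array
            # Approximate search looks at numCandidates vectors, then keeps the best `limit`.
            "numCandidates": min(limit * oversampling, MAX_NUM_CANDIDATES),
            "limit": limit,
        }
    }
    # Keep the requested fields and attach the similarity score computed by $vectorSearch.
    projection: Dict[str, Any] = {field: 1 for field in (project_fields or [])}
    projection["score"] = {"$meta": "vectorSearchScore"}
    return [vector_stage, {"$project": projection}]
```

With PyMongo, a pipeline like this would be run with collection.aggregate(pipeline).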

Supported Functionality

  • Find (vector search): nearest-neighbour search over an embedding field.
  • Filter: filtering on text and numeric fields using MongoDB query syntax.
  • Text Search: matching text fields with regular expressions.
  • Get/Del: retrieving and deleting documents by ID.
  • Subindex: subindexes for nested documents.
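The Filter and Text Search items work with MongoDB query documents. As a hedged illustration (the helper names and the numeric field price are hypothetical, since MyDoc above has no such field, and this is not the backend's own query-building code), a case-insensitive regex match can be combined with a numeric condition like this:

```python
import re
from typing import Any, Dict


def regex_text_filter(field: str, query: str, case_insensitive: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: match `query` as a literal substring of `field` via $regex."""
    # re.escape stops characters such as '.' or '*' in the query from acting as regex syntax.
    condition: Dict[str, Any] = {"$regex": re.escape(query)}
    if case_insensitive:
        condition["$options"] = "i"
    return {field: condition}


def and_filters(*filters: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: require every filter document to match, using $and."""
    return {"$and": list(filters)}


# Numeric condition in MongoDB syntax: a hypothetical 'price' field strictly below 50.
price_filter = {"price": {"$lt": 50}}
combined = and_filters(price_filter, regex_text_filter("text", "text 1"))
```

A dict like combined is the kind of filter document that PyMongo's collection.find(filter) accepts.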

Integration Tests and documentation

  • tests/index/mongo_atlas
  • docs/API_reference/doc_index/backends/mongodb.md

Coming soon

  • Implementation of QueryBuilder and Hybrid Search.

@JoanFM
Member

JoanFM commented Apr 24, 2024

Hello @caseyclements ,

Thanks a lot for this amazing contribution. May I ask you to sign off the commits so that the DCO check passes and we can merge the PR if it is accepted?

docarray/index/backends/mongodb_atlas.py (outdated, resolved)
docarray/index/backends/mongodb_atlas.py (outdated, resolved)
docarray/index/backends/mongodb_atlas.py (resolved)
WaVEV and others added 28 commits April 24, 2024 12:36
Signed-off-by: Casey Clements <casey.clements@mongodb.com>
@prakul

prakul commented Apr 24, 2024

All the comments have been addressed, @JoanFM.

Member

@JoanFM JoanFM left a comment


Can you please check how to pass the black check? You can find the steps in our CONTRIBUTING guidelines.

docarray/index/backends/mongodb_atlas.py (outdated, resolved)
docarray/index/backends/mongodb_atlas.py (outdated, resolved)
@JoanFM
Member

JoanFM commented Apr 26, 2024

We are getting these errors in the tests, caused by the type annotations; maybe my requests were not good enough.

docarray/index/backends/mongodb_atlas.py:197: in MongoDBAtlasDocumentIndex
    ) -> Tuple[list[dict], list[float]]:
E   TypeError: 'type' object is not subscriptable

@caseyclements
Contributor Author

Hi @JoanFM. Thank you for working with me on this. I created a Python 3.8 Poetry environment for better coverage. black appears to have changed its mind between versions about the ellipsis (...). Running under 3.8, it moved the ellipsis to the following line, whereas the day before, black in my 3.11 environment had moved it onto the same line! :) I changed it to pass. ¯\_(ツ)_/¯

-    class RuntimeConfig(BaseDocIndex.RuntimeConfig): ...
+    class RuntimeConfig(BaseDocIndex.RuntimeConfig):
+        ...

And now

-    class RuntimeConfig(BaseDocIndex.RuntimeConfig): ...
+    class RuntimeConfig(BaseDocIndex.RuntimeConfig):
+        pass
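Both spellings define the same empty subclass: ... is an expression statement that evaluates the Ellipsis constant and discards it, while pass is the dedicated no-op statement. The sketch below uses a hypothetical BaseRuntimeConfig stand-in for BaseDocIndex.RuntimeConfig to show that the two versions behave identically; since black's newer style collapses only ...-bodies onto the class line, switching to pass plausibly sidesteps the version-dependent formatting.

```python
class BaseRuntimeConfig:
    """Hypothetical stand-in for BaseDocIndex.RuntimeConfig."""

    default_batch_size = 10  # hypothetical inherited setting


class RuntimeConfigWithEllipsis(BaseRuntimeConfig):
    ...  # the Ellipsis literal is evaluated and discarded: a no-op body


class RuntimeConfigWithPass(BaseRuntimeConfig):
    pass  # the explicit no-op statement
```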

I'll turn to the typing issues next, running mypy under Python 3.8.

@caseyclements
Contributor Author

Sorry, I missed changing list to List. I noticed the required change while updating tuple to Tuple, but it was a busy morning. I pushed a new commit.
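For reference, the TypeError in the traceback above comes from subscripting the built-in list type, which Python supports only from 3.9 (PEP 585); on 3.8 the annotation is evaluated when the def runs and raises TypeError: 'type' object is not subscriptable. A minimal sketch of the 3.8-compatible spelling follows; split_hits and the "score" key are hypothetical and not the backend's code.

```python
from typing import Any, Dict, List, Tuple


def split_hits(hits: List[Dict[str, Any]]) -> Tuple[List[dict], List[float]]:
    """Hypothetical helper: separate raw search hits into documents and scores.

    typing.List / typing.Tuple work on Python 3.8; list[dict] in this signature
    would raise TypeError on 3.8 as soon as the def statement is executed.
    """
    docs = [{k: v for k, v in hit.items() if k != "score"} for hit in hits]
    scores = [float(hit["score"]) for hit in hits]
    return docs, scores
```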


codecov bot commented Apr 26, 2024

Codecov Report

Attention: Patch coverage is 41.11675% with 116 lines in your changes missing coverage. Please review.

Project coverage is 44.75%. Comparing base (febbdc4) to head (5c01811).

❗ Current head 5c01811 differs from pull request most recent head 3b03f06. Consider uploading reports for the commit 3b03f06 to get more accurate results

File                                        Patch %   Lines
docarray/index/backends/mongodb_atlas.py    40.10%    115 missing ⚠️
docarray/index/__init__.py                  75.00%    1 missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1883       +/-   ##
===========================================
- Coverage   84.69%   44.75%   -39.95%     
===========================================
  Files         136      137        +1     
  Lines        9263     9459      +196     
===========================================
- Hits         7845     4233     -3612     
- Misses       1418     5226     +3808     
Flag       Coverage Δ
docarray   44.75% <41.11%> (-39.95%) ⬇️

Flags with carried forward coverage won't be shown.


@caseyclements
Contributor Author

Hi @JoanFM. I can address the code coverage by adding a new action to .github/workflows/ci.yml. @prakul, we can set up the correct credentials and share a Google Doc.

@JoanFM
Member

JoanFM commented Apr 26, 2024

no need to worry about code coverage

@caseyclements
Contributor Author

Cool. What remains then?

Over the next two weeks (I'm in London this coming week), when we add the QueryBuilder, we'll also set up the testing on your end. We are already running the tests against Atlas in our own CI. Maybe we could set up a face-to-face meeting in a couple of weeks. Once we know the scope of the use cases for the indexes and the data types involved, we can optimize to get the most out of MongoDB's API.

@JoanFM
Member

JoanFM commented Apr 26, 2024

There seems to be a test timing out, but I'm not sure whether it comes from your changes.

So what is the plan for the upcoming changes from your side?

@JoanFM JoanFM merged commit f5c9ab0 into docarray:main Apr 29, 2024
34 of 38 checks passed
4 participants