New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: opensearch integration #853
feat: opensearch integration #853
Conversation
939b627
to
5e6b9b0
Compare
Hi @jay-bhambhani, thanks for your contribution, we are rally excited about additional storage options for DocArray! I approved you for CI runs, but to get it going you will have to apply I will start reviewing this PR soon, but if you have any questions in the meantime, please just ping me here! |
Hi @JohannesMessner - excited to contribute in some way! Definitely will take a look - it might take a little bit because my laptop unfortunately crapped out on me. But - just wanted to let you know so you don't think I've disappeared! |
0495583
to
35c9038
Compare
Codecov ReportBase: 88.13% // Head: 70.51% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #853 +/- ##
===========================================
- Coverage 88.13% 70.51% -17.62%
===========================================
Files 138 143 +5
Lines 7137 7462 +325
===========================================
- Hits 6290 5262 -1028
- Misses 847 2200 +1353
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a look, but I am not really familiar with OpenSearch (or even ES, for that matter), so could someone else chime in? @alaeddine-13 @violenil?
Edit: Looks like some tests are failing, maybe we let you do your thing @jay-bhambhani before we jump in. Please reach out if anything is unclear!
#'ef_search': opensearch_config.ef_search, | ||
'ef_construction': opensearch_config.ef_construction, | ||
'm': opensearch_config.m, | ||
#'encoder': opensearch_config.encoder, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about these comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added to do's for these - as a second step i think we should maybe allow for these options to be implemented as well?
thanks @JohannesMessner! will apply the feedback and continue chugging through these. one quick question - seeing this failure: hubble.excepts.AuthenticationRequiredError: -1: Jina auth token is not provided or has expired. <function PushPullMixin.push at 0x7f1321a538c0> requires login to Jina AI, please do |
I think you can ignore all the tests that fail due to this. We will probably end up moving these tests out of DocArray anyways |
b65946e
to
61f655d
Compare
Awesome! Thanks. Sorry one more - I don't know what I'm breaking with commitlint. Can I please ask for a bit of assistance? I'll update the docs accordingly as well. Thank you so much for your help! |
Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain> Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain> tests Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain> Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain> progress including more testing Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain> Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain> tests done Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain> Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain> review Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain> Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain> clean up elastic references Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain> Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain> docs Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain> Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain> setup Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain> Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain> review and fixes Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain> more testing and hopefully fixing some old tests Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain> comment out unnecessary test Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain> deciding whether to make these variables available to config or not Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain> formatting Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>
61f655d
to
c9be60d
Compare
OK! Looks like the only tests that aren't passing are related to this JINA_AUTH_TOKEN issue. If somebody could please take a look! |
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com> Signed-off-by: Jay Bhambhani <jayant.bhambhani@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from a preliminary docs check. Just a few small tweaks
|
||
# Opensearch | ||
|
||
One can use [Opensearch](https://opensearch.org/) as the document store for DocumentArray. It is useful when one wants to have faster Document retrieval on embeddings, i.e. `.match()`, `.find()`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you change from "one" to "you" throughout the doc please? To follow our updated style guidelines (updated like yesterday - they're not public yet)
|
||
### Bulk request customization | ||
|
||
You can customize how bulk requests is being sent to Opensearch when adding documents by adding additional `kwargs` on `extend` method call. See [the official Documentation](https://opensearch.org/docs/1.2/opensearch/rest-api/document-apis/bulk/) for more details. See the following code for example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can customize how bulk requests is being sent to Opensearch when adding documents by adding additional `kwargs` on `extend` method call. See [the official Documentation](https://opensearch.org/docs/1.2/opensearch/rest-api/document-apis/bulk/) for more details. See the following code for example: | |
You can customize how bulk requests are sent to Opensearch when adding Documents by adding additional `kwargs` on `extend` method call. See [the official documentation](https://opensearch.org/docs/1.2/opensearch/rest-api/document-apis/bulk/) for more details. See the following code for example: |
|
||
One can perform Approximate Nearest Neighbor Search and pre-filter results using a filter query . | ||
|
||
Consider Documents with embeddings `[0,0,0]` up to `[9,9,9]` where the document with embedding `[i,i,i]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider Documents with embeddings `[0,0,0]` up to `[9,9,9]` where the document with embedding `[i,i,i]` | |
Consider Documents with embeddings `[0,0,0]` up to `[9,9,9]` where the Document with embedding `[i,i,i]` |
``` | ||
|
||
Consider we want the nearest vectors to the embedding `[8. 8. 8.]`, with the restriction that | ||
prices must follow a filter. As an example, let's consider that retrieved documents must have `price` value lower |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ensure all occurrences of Document
or DocumentArray
are capitalized (since they are a class in DocArray)
``` | ||
```` | ||
|
||
## Config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Config | |
## Configuration |
|
||
The following configs can be set: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following configs can be set: |
This is redundant (I removed the line from other document store docs in a PR earlier today)
Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Thanks a lot for your contribution @jay-bhambhani |
Glad I could be of service! I'll keep adding and improving things as we go! Great project! |
Goals: