Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: opensearch integration #853

Merged
merged 3 commits into from Dec 12, 2022

Conversation

jay-bhambhani
Copy link
Contributor

@jay-bhambhani jay-bhambhani commented Nov 25, 2022

Goals:

  • Integrating OpenSearch with DocArray
  • check and update documentation, if required. See guide

@JohannesMessner
Copy link
Member

Hi @jay-bhambhani, thanks for your contribution, we are rally excited about additional storage options for DocArray!

I approved you for CI runs, but to get it going you will have to apply black and flake8 to your code, in order to make it uniform in terms of style. This should be very quick, and can be done automatically for every commit by installing the respective pre-commit hooks..
I think you also need to fix the indentation here before the rest will run.

I will start reviewing this PR soon, but if you have any questions in the meantime, please just ping me here!

@jay-bhambhani
Copy link
Contributor Author

done

Hi @JohannesMessner - excited to contribute in some way! Definitely will take a look - it might take a little bit because my laptop unfortunately crapped out on me. But - just wanted to let you know so you don't think I've disappeared!

@codecov-commenter
Copy link

codecov-commenter commented Dec 5, 2022

Codecov Report

Base: 88.13% // Head: 70.51% // Decreases project coverage by -17.61% ⚠️

Coverage data is based on head (35c9038) compared to base (ae49969).
Patch coverage: 0.92% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #853       +/-   ##
===========================================
- Coverage   88.13%   70.51%   -17.62%     
===========================================
  Files         138      143        +5     
  Lines        7137     7462      +325     
===========================================
- Hits         6290     5262     -1028     
- Misses        847     2200     +1353     
Flag Coverage Δ
docarray 70.51% <0.92%> (-17.62%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
docarray/array/opensearch.py 0.00% <0.00%> (ø)
docarray/array/storage/opensearch/backend.py 0.00% <0.00%> (ø)
docarray/array/storage/opensearch/find.py 0.00% <0.00%> (ø)
docarray/array/storage/opensearch/getsetdel.py 0.00% <0.00%> (ø)
docarray/array/storage/opensearch/seqlike.py 0.00% <0.00%> (ø)
docarray/array/document.py 84.28% <50.00%> (-3.22%) ⬇️
docarray/array/mixins/dataloader/helper.py 0.00% <0.00%> (-100.00%) ⬇️
docarray/array/mixins/embed.py 9.89% <0.00%> (-82.42%) ⬇️
docarray/array/mixins/evaluation.py 8.33% <0.00%> (-80.13%) ⬇️
docarray/array/mixins/reduce.py 26.92% <0.00%> (-73.08%) ⬇️
... and 60 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@JohannesMessner JohannesMessner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look, but I am not really familiar with OpenSearch (or even ES, for that matter), so could someone else chime in? @alaeddine-13 @violenil?
Edit: Looks like some tests are failing, maybe we let you do your thing @jay-bhambhani before we jump in. Please reach out if anything is unclear!

docarray/array/storage/opensearch/backend.py Outdated Show resolved Hide resolved
Comment on lines 190 to 194
#'ef_search': opensearch_config.ef_search,
'ef_construction': opensearch_config.ef_construction,
'm': opensearch_config.m,
#'encoder': opensearch_config.encoder,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about these comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added to do's for these - as a second step i think we should maybe allow for these options to be implemented as well?

@jay-bhambhani
Copy link
Contributor Author

thanks @JohannesMessner! will apply the feedback and continue chugging through these. one quick question - seeing this failure: hubble.excepts.AuthenticationRequiredError: -1: Jina auth token is not provided or has expired. <function PushPullMixin.push at 0x7f1321a538c0> requires login to Jina AI, please do jina auth login -f or set env variable `JINA_AUTH_TOKEN. how can i take care of this?

@JohannesMessner
Copy link
Member

thanks @JohannesMessner! will apply the feedback and continue chugging through these. one quick question - seeing this failure: hubble.excepts.AuthenticationRequiredError: -1: Jina auth token is not provided or has expired. <function PushPullMixin.push at 0x7f1321a538c0> requires login to Jina AI, please do jina auth login -f or set env variable `JINA_AUTH_TOKEN. how can i take care of this?

I think you can ignore all the tests that fail due to this. We will probably end up moving these tests out of DocArray anyways

@jay-bhambhani jay-bhambhani force-pushed the feat-opensearch-integration branch 4 times, most recently from b65946e to 61f655d Compare December 6, 2022 18:27
@jay-bhambhani
Copy link
Contributor Author

thanks @JohannesMessner! will apply the feedback and continue chugging through these. one quick question - seeing this failure: hubble.excepts.AuthenticationRequiredError: -1: Jina auth token is not provided or has expired. <function PushPullMixin.push at 0x7f1321a538c0> requires login to Jina AI, please do jina auth login -f or set env variable `JINA_AUTH_TOKEN. how can i take care of this?

I think you can ignore all the tests that fail due to this. We will probably end up moving these tests out of DocArray anyways

Awesome! Thanks. Sorry one more - I don't know what I'm breaking with commitlint. Can I please ask for a bit of assistance? I'll update the docs accordingly as well.

Thank you so much for your help!
Jay

Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain>
Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>

tests

Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain>
Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>

progress including more testing

Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain>
Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>

tests done

Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain>
Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>

review

Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain>
Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>

clean up elastic references

Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain>
Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>

docs

Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain>
Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>

setup

Signed-off-by: Jayant Bhambhani <jaybhambhani@pop-os.localdomain>
Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>

review and fixes

Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>

more testing and hopefully fixing some old tests

Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>

comment out unnecessary test

Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>

deciding whether to make these variables available to config or not

Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>

formatting

Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>
@jay-bhambhani
Copy link
Contributor Author

OK! Looks like the only tests that aren't passing are related to this JINA_AUTH_TOKEN issue. If somebody could please take a look!

@jay-bhambhani jay-bhambhani requested review from kdewald and JohannesMessner and removed request for JohannesMessner and kdewald December 8, 2022 21:06
Co-authored-by: Johannes Messner <44071807+JohannesMessner@users.noreply.github.com>
Signed-off-by: Jay Bhambhani <jayant.bhambhani@gmail.com>
Copy link
Member

@alexcg1 alexcg1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from a preliminary docs check. Just a few small tweaks


# Opensearch

One can use [Opensearch](https://opensearch.org/) as the document store for DocumentArray. It is useful when one wants to have faster Document retrieval on embeddings, i.e. `.match()`, `.find()`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change from "one" to "you" throughout the doc please? To follow our updated style guidelines (updated like yesterday - they're not public yet)


### Bulk request customization

You can customize how bulk requests is being sent to Opensearch when adding documents by adding additional `kwargs` on `extend` method call. See [the official Documentation](https://opensearch.org/docs/1.2/opensearch/rest-api/document-apis/bulk/) for more details. See the following code for example:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
You can customize how bulk requests is being sent to Opensearch when adding documents by adding additional `kwargs` on `extend` method call. See [the official Documentation](https://opensearch.org/docs/1.2/opensearch/rest-api/document-apis/bulk/) for more details. See the following code for example:
You can customize how bulk requests are sent to Opensearch when adding Documents by adding additional `kwargs` on `extend` method call. See [the official documentation](https://opensearch.org/docs/1.2/opensearch/rest-api/document-apis/bulk/) for more details. See the following code for example:


One can perform Approximate Nearest Neighbor Search and pre-filter results using a filter query .

Consider Documents with embeddings `[0,0,0]` up to `[9,9,9]` where the document with embedding `[i,i,i]`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Consider Documents with embeddings `[0,0,0]` up to `[9,9,9]` where the document with embedding `[i,i,i]`
Consider Documents with embeddings `[0,0,0]` up to `[9,9,9]` where the Document with embedding `[i,i,i]`

```

Consider we want the nearest vectors to the embedding `[8. 8. 8.]`, with the restriction that
prices must follow a filter. As an example, let's consider that retrieved documents must have `price` value lower
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure all occurrences of Document or DocumentArray are capitalized (since they are a class in DocArray)

```
````

## Config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Config
## Configuration

Comment on lines 391 to 392

The following configs can be set:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The following configs can be set:

This is redundant (I removed the line from other document store docs in a PR earlier today)

Signed-off-by: Jay Bhambhani <jaybhambhani@pop-os.localdomain>
@JoanFM JoanFM changed the title Feat opensearch integration feat: pensearch integration Dec 12, 2022
@JoanFM JoanFM changed the title feat: pensearch integration feat: opensearch integration Dec 12, 2022
Copy link
Member

@alexcg1 alexcg1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@JoanFM JoanFM merged commit 6ec0cf5 into docarray:main Dec 12, 2022
@JoanFM
Copy link
Member

JoanFM commented Dec 12, 2022

Thanks a lot for your contribution @jay-bhambhani

@jay-bhambhani
Copy link
Contributor Author

Glad I could be of service! I'll keep adding and improving things as we go! Great project!

@jay-bhambhani jay-bhambhani deleted the feat-opensearch-integration branch December 12, 2022 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants