Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add docs for RPS #328

Merged
merged 6 commits into from
Apr 1, 2024
Merged

add docs for RPS #328

merged 6 commits into from
Apr 1, 2024

Conversation

HenryL27
Copy link
Collaborator

@HenryL27 HenryL27 commented Apr 1, 2024

There are a couple missing links still

@HenryL27 HenryL27 requested a review from jonfritz April 1, 2024 17:23
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Copy link
Contributor

@jonfritz jonfritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start. Would sync with Alex on his dedup section - I might include that as it's own doc vs. going deeper on this page.

These search processors include:

- `debug`: prints the search response to stdout. Useful for debugging.
- `dedup`: works in conjunction with the `Sketcher` ingest transform to deduplicate search results at query-time. See the [sketch](../data_ingestion_and_preparation/transforms/sketch.md) transform for more details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put deup first? We also want to link to Alex's docs for this. We should not point to sketch - instead, point to Alex's docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can say it works in conjunction with Sketcher though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can bump it. was going with alphabetical scheme but can go by importance scheme. I couldn't find anywhere to point to other than sketch. alex confirmed this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - it's because Alex hasn't written it yet. That's where all the info is. Can you please add this in today once Alex's docs for query time NDD are written? We want users going to that page, not sketcher.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alphabetical is fine, I guess.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will re-point with a new target. Actually will make alex re-point this.

- `debug`: prints the search response to stdout. Useful for debugging.
- `dedup`: works in conjunction with the `Sketcher` ingest transform to deduplicate search results at query-time. See the [sketch](../data_ingestion_and_preparation/transforms/sketch.md) transform for more details.

These processors run in a microservice in the sycamore ecosystem called RPS. The processors running by default are configured in a config file: [remote-processor-service/config/pipelines.yml](https://github.com/aryn-ai/sycamore/blob/main/apps/remote-processor-service/config/pipelines.yml)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge the first sentence ino the first paragraph - this is repetitive.

"The default hybrid search and RAG pipelines (link to each in the docs) include remote processors:"

threshold: 0.4
```

You can add a remote processor to a search pipeline using standard OpenSearch search pipeline creation syntax:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make it clear you can only use the ones we provide in Sycamore?


### Configuring RPS

You can change the config file for RPS. Simply declare a processor, which processors are part of it, and their individual parameters. So, maybe I want a processor that dedupes and prints the search response before and after so I can compare them manually. I can declare this configuration in a new `my-pipelines.yml`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we want people doing this right now. I might remove it - unless there is a clear use case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I put in a

Warning: This is super experimental. Don't do this unless you really need to.

? Also happy to delete

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there isn't a use case we know, I'd leave this out. Our docs are already complex, let's try to keep it simple. I'd keep this text on your machine, in case we need to add it back. Unless - do you have a use case we think customers would do here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, will remove.

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
@HenryL27 HenryL27 requested a review from jonfritz April 1, 2024 21:56
Copy link
Contributor

@jonfritz jonfritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments. Thx for the iteration.

@@ -0,0 +1,83 @@
# Remote Search Processors

> 👉 This page assumes you’re familiar with OpenSearch’s concept of a search pipeline and search processor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a friendly customer experience. Instead, write a note guiding somewhere were to go to learn.

These search processors include:

- `debug`: prints the search response to stdout. Useful for debugging.
- `dedup`: works in conjunction with the `Sketcher` ingest transform to deduplicate search results at query-time. See the [sketch](../data_ingestion_and_preparation/transforms/sketch.md) transform for more details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - it's because Alex hasn't written it yet. That's where all the info is. Can you please add this in today once Alex's docs for query time NDD are written? We want users going to that page, not sketcher.

These search processors include:

- `debug`: prints the search response to stdout. Useful for debugging.
- `dedup`: works in conjunction with the `Sketcher` ingest transform to deduplicate search results at query-time. See the [sketch](../data_ingestion_and_preparation/transforms/sketch.md) transform for more details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alphabetical is fine, I guess.


### Configuring RPS

You can change the config file for RPS. Simply declare a processor, which processors are part of it, and their individual parameters. So, maybe I want a processor that dedupes and prints the search response before and after so I can compare them manually. I can declare this configuration in a new `my-pipelines.yml`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there isn't a use case we know, I'd leave this out. Our docs are already complex, let's try to keep it simple. I'd keep this text on your machine, in case we need to add it back. Unless - do you have a use case we think customers would do here?

Copy link
Contributor

@jonfritz jonfritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending a nicer note at the top. Thx!

Signed-off-by: Henry Lindeman <hmlindeman@yahoo.com>
@HenryL27 HenryL27 merged commit 5da99d4 into main Apr 1, 2024
12 checks passed
@HenryL27 HenryL27 deleted the rps-docs branch April 1, 2024 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants