rebase ingestion strategy #202
Merged
👍 Looks good to me!
- Reviewed the entire pull request up to 9649d48
- Looked at 624 lines of code in 13 files
- Took 13 minutes and 17 seconds to review

More info
- Skipped 0 files when reviewing.
- Skipped posting 13 additional comments because they didn't meet confidence threshold of 50%.
1. r2r/core/pipelines/embedding.py:33:
   - Assessed confidence: 0%
   - Comment: The generate_run_id function is correctly used here to generate a unique run id for the pipeline. This is a good practice as it ensures that each run of the pipeline has a unique identifier.
   - Reasoning: The PR introduces a new utility function generate_run_id which is used to generate a unique run id for the pipeline. This function is used in the initialize_pipeline method of the EmbeddingPipeline, EvalPipeline, and IngestionPipeline classes. The PR also modifies the process_data and parse_entry methods in the IngestionPipeline class to yield BasicDocument objects instead of returning a string. This change is reflected in the run method of the IngestionPipeline class and the run method of the EmbeddingPipeline class, which now loop over the yielded BasicDocument objects. The PR also introduces a new utility function generate_id_from_label which is used to generate a unique id from a label. This function is used in the run_client.py scripts in the examples directory to generate unique document ids and user ids. Overall, the changes in the PR seem to be correctly implemented and should improve the functionality of the codebase.
2. r2r/core/pipelines/eval.py:28:
   - Assessed confidence: 0%
   - Comment: The generate_run_id function is correctly used here to generate a unique run id for the pipeline. This is a good practice as it ensures that each run of the pipeline has a unique identifier.
   - Reasoning: The generate_run_id function is correctly used here to generate a unique run id for the pipeline. This is a good practice as it ensures that each run of the pipeline has a unique identifier.
3. r2r/core/pipelines/ingestion.py:21:
   - Assessed confidence: 0%
   - Comment: The generate_run_id function is correctly used here to generate a unique run id for the pipeline. This is a good practice as it ensures that each run of the pipeline has a unique identifier.
   - Reasoning: The generate_run_id function is correctly used here to generate a unique run id for the pipeline. This is a good practice as it ensures that each run of the pipeline has a unique identifier.
4. r2r/core/utils/base.py:8:
   - Assessed confidence: 0%
   - Comment: The generate_id_from_label function is correctly implemented and used to generate a unique id from a label. This is a good practice as it ensures that each document and user has a unique identifier.
   - Reasoning: The generate_id_from_label function is a new utility function that generates a unique id from a label. This function is used in the run_client.py scripts in the examples directory to generate unique document ids and user ids. This is a good practice as it ensures that each document and user has a unique identifier.
5. r2r/examples/academy/client.py:16:
   - Assessed confidence: 0%
   - Comment: The generate_id_from_label function is correctly used here to generate a unique user id. This is a good practice as it ensures that each user has a unique identifier.
   - Reasoning: The generate_id_from_label function is used here to generate a unique user id. This is a good practice as it ensures that each user has a unique identifier.
6. r2r/examples/academy/client.py:25:
   - Assessed confidence: 0%
   - Comment: The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
   - Reasoning: The generate_id_from_label function is used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
7. r2r/examples/academy/run_client.py:16:
   - Assessed confidence: 0%
   - Comment: The generate_id_from_label function is correctly used here to generate a unique user id. This is a good practice as it ensures that each user has a unique identifier.
   - Reasoning: The generate_id_from_label function is used here to generate a unique user id. This is a good practice as it ensures that each user has a unique identifier.
8. r2r/examples/academy/run_client.py:27:
   - Assessed confidence: 0%
   - Comment: The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
   - Reasoning: The generate_id_from_label function is used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
9. r2r/examples/basic/run_client.py:15:
   - Assessed confidence: 0%
   - Comment: The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
   - Reasoning: The generate_id_from_label function is used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
10. r2r/examples/basic/run_client.py:24:
   - Assessed confidence: 0%
   - Comment: The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
   - Reasoning: The generate_id_from_label function is used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
11. r2r/examples/basic/run_client.py:36:
   - Assessed confidence: 0%
   - Comment: The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
   - Reasoning: The generate_id_from_label function is used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
12. r2r/examples/basic/run_client.py:41:
   - Assessed confidence: 0%
   - Comment: The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
   - Reasoning: The generate_id_from_label function is used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
13. r2r/examples/basic/run_client.py:62:
   - Assessed confidence: 0%
   - Comment: The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
   - Reasoning: The generate_id_from_label function is used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
Workflow ID: wflow_ZQbAw7kXmCa23Fmb
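
For readers without the diff open, here is a minimal sketch of what the two ID helpers named in the comments above might look like. It assumes they are thin wrappers around uuid.uuid4 and uuid.uuid5, which is what this PR's summary suggests they replace. The signatures, return types, and the choice of uuid.NAMESPACE_DNS are assumptions for illustration, not the code in r2r/core/utils/base.py.

```python
import uuid


def generate_run_id() -> uuid.UUID:
    """Return a fresh random identifier for one pipeline run (assumed uuid4 wrapper)."""
    return uuid.uuid4()


def generate_id_from_label(label: str) -> uuid.UUID:
    """Derive a stable identifier from a label (assumed uuid5 wrapper).

    The namespace below is a placeholder; the real helper may use another one.
    """
    return uuid.uuid5(uuid.NAMESPACE_DNS, label)


if __name__ == "__main__":
    # Hypothetical labels, in the spirit of the example client scripts.
    print(generate_run_id())
    print(generate_id_from_label("user_id_1"))
    print(generate_id_from_label("doc_1"))
```

One point the review comments gloss over: if the helper is built on uuid.uuid5, the same label always maps to the same id. That makes it "unique per label" rather than unique per call, which is usually what document and user ids need, while run ids built on uuid.uuid4 differ on every call.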
Summary:
This PR introduces new ID generation functions, modifies the IngestionPipeline to yield BasicDocument objects, and updates the run method in r2r/main/app.py and example scripts accordingly.

Key points:
- Replaced uuid.uuid4 and uuid.uuid5 with generate_run_id and generate_id_from_label respectively in various files.
- Modified IngestionPipeline to yield BasicDocument objects instead of returning a single BasicDocument.
- Updated the run method in r2r/main/app.py to handle the change in IngestionPipeline.

Generated with ❤️ by ellipsis.dev
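
To illustrate the yield-based flow described in the summary, here is a toy sketch of a pipeline whose process_data and parse_entry methods are generators and whose run method loops over the yielded documents. The BasicDocument fields, the method signatures, and the paragraph-splitting rule are all assumptions made for the example. Only the class and method names come from this PR page.

```python
import uuid
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class BasicDocument:
    """Hypothetical stand-in; the real class's fields are not shown on this page."""
    id: uuid.UUID
    text: str


class IngestionPipeline:
    """Toy pipeline that only illustrates the generator-based flow."""

    def initialize_pipeline(self) -> None:
        # One fresh random identifier per run.
        self.run_id = uuid.uuid4()

    def parse_entry(self, entry: str) -> Iterator[BasicDocument]:
        # Assumed rule: each blank-line-separated paragraph becomes a document.
        for paragraph in entry.split("\n\n"):
            text = paragraph.strip()
            if text:
                yield BasicDocument(id=uuid.uuid5(uuid.NAMESPACE_DNS, text), text=text)

    def process_data(self, entries: Iterable[str]) -> Iterator[BasicDocument]:
        # Delegate to parse_entry so one entry can produce many documents.
        for entry in entries:
            yield from self.parse_entry(entry)

    def run(self, entries: Iterable[str]) -> List[BasicDocument]:
        self.initialize_pipeline()
        documents = []
        # Callers iterate over yielded documents instead of handling one return value.
        for document in self.process_data(entries):
            documents.append(document)
        return documents
```

Because process_data is a generator, any caller that previously expected a single return value has to iterate over the result, which is the kind of adjustment the summary attributes to the run method in r2r/main/app.py.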