
rebase ingestion strategy #202

Merged: 2 commits merged on Mar 25, 2024

@emrgnt-cmplxty (Contributor) commented Mar 25, 2024

Ellipsis 🚀 This PR description was created by Ellipsis for commit 9649d48.

Summary:

This PR introduces new ID generation functions, modifies the IngestionPipeline to yield BasicDocument objects, and updates the run method in r2r/main/app.py and the example scripts accordingly.
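The change from returning one document to yielding several can be illustrated with a small generator. This is a minimal sketch that assumes nothing about r2r's actual classes: `BasicDocument` here is a hypothetical dataclass with `id`, `text`, and `metadata` fields, `IngestionPipelineSketch` and `embed_all` are illustrative names, and splitting input on lines is an arbitrary choice made only so that one entry produces several documents.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class BasicDocument:
    # Hypothetical stand-in for r2r's BasicDocument; the real fields may differ.
    id: uuid.UUID
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class IngestionPipelineSketch:
    """Illustrative pipeline whose run() yields documents instead of returning one."""

    def parse_entry(self, label: str, raw: str) -> Iterator[BasicDocument]:
        # Assumption for illustration: each non-empty line becomes its own document.
        for i, line in enumerate(raw.splitlines()):
            if line.strip():
                yield BasicDocument(
                    id=uuid.uuid5(uuid.NAMESPACE_DNS, f"{label}-{i}"),
                    text=line.strip(),
                    metadata={"source": label},
                )

    def run(self, entries: Dict[str, str]) -> Iterator[BasicDocument]:
        # Documents are produced lazily, one entry at a time.
        for label, raw in entries.items():
            yield from self.parse_entry(label, raw)


def embed_all(documents: Iterator[BasicDocument]) -> List[str]:
    # A downstream caller now loops over the yielded documents
    # instead of handling a single return value.
    return [f"embedded:{doc.text}" for doc in documents]


docs = IngestionPipelineSketch().run({"notes.txt": "first\n\nsecond"})
print(embed_all(docs))  # ['embedded:first', 'embedded:second']
```

Yielding lets one input produce any number of documents and lets callers start on the first document before the rest are parsed; the cost is that every caller has to iterate, which matches the need to update the run method in r2r/main/app.py.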

Key points:

  • Replaced uuid.uuid4 and uuid.uuid5 with generate_run_id and generate_id_from_label respectively in various files.
  • Modified IngestionPipeline to yield BasicDocument objects instead of returning a single BasicDocument.
  • Updated the run method in r2r/main/app.py to handle the change in IngestionPipeline.
  • Updated example scripts to use the new ID generation functions.
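The first key point suggests the two helpers are thin wrappers over the standard `uuid` module. The sketch below assumes exactly that (`generate_run_id` calling `uuid.uuid4`, `generate_id_from_label` calling `uuid.uuid5`); the choice of `uuid.NAMESPACE_DNS` as the namespace and the `uuid.UUID` return type are assumptions, not details confirmed by the PR.

```python
import uuid


def generate_run_id() -> uuid.UUID:
    # Assumed behavior: a fresh random (version 4) UUID for each pipeline run.
    return uuid.uuid4()


def generate_id_from_label(label: str) -> uuid.UUID:
    # Assumed behavior: a deterministic (version 5) UUID derived from the label.
    # The namespace is an assumption; a different namespace would change
    # every ID this helper produces.
    return uuid.uuid5(uuid.NAMESPACE_DNS, label)


print(generate_run_id() == generate_run_id())  # False: random per call
print(generate_id_from_label("doc") == generate_id_from_label("doc"))  # True: same label, same ID
```

One plausible benefit of routing every call through two named helpers is that the namespace decision lives in one place, so pipelines and example scripts cannot drift apart on how label-based IDs are computed.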

Generated with ❤️ by ellipsis.dev


vercel bot commented Mar 25, 2024

The latest updates on your projects:

| Name | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| r2r-docs | 🔄 Building (Inspect) | Visit Preview | Mar 25, 2024 3:31am |

@emrgnt-cmplxty merged commit a19a86b into main on Mar 25, 2024 (1 of 2 checks passed)

@ellipsis-dev (bot) left a comment

👍 Looks good to me!

  • Reviewed the entire pull request up to 9649d48
  • Looked at 624 lines of code in 13 files
  • Took 13 minutes and 17 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 13 additional comments because they didn't meet the confidence threshold of 50%.
1. r2r/core/pipelines/embedding.py:33:
  • Assessed confidence: 0%
  • Comment:
    The generate_run_id function is correctly used here to generate a unique run id for the pipeline. This is a good practice as it ensures that each run of the pipeline has a unique identifier.
  • Reasoning:
    The PR introduces a new utility function generate_run_id which is used to generate a unique run id for the pipeline. This function is used in the initialize_pipeline method of the EmbeddingPipeline, EvalPipeline, and IngestionPipeline classes. The PR also modifies the process_data and parse_entry methods in the IngestionPipeline class to yield BasicDocument objects instead of returning a string. This change is reflected in the run method of the IngestionPipeline class and the run method of the EmbeddingPipeline class, which now loop over the yielded BasicDocument objects. The PR also introduces a new utility function generate_id_from_label which is used to generate a unique id from a label. This function is used in the run_client.py scripts in the examples directory to generate unique document ids and user ids. Overall, the changes in the PR seem to be correctly implemented and should improve the functionality of the codebase.
2. r2r/core/pipelines/eval.py:28:
  • Assessed confidence: 0%
  • Comment:
    The generate_run_id function is correctly used here to generate a unique run id for the pipeline. This is a good practice as it ensures that each run of the pipeline has a unique identifier.
3. r2r/core/pipelines/ingestion.py:21:
  • Assessed confidence: 0%
  • Comment:
    The generate_run_id function is correctly used here to generate a unique run id for the pipeline. This is a good practice as it ensures that each run of the pipeline has a unique identifier.
4. r2r/core/utils/base.py:8:
  • Assessed confidence: 0%
  • Comment:
    The generate_id_from_label function is correctly implemented and used to generate a unique id from a label. This is a good practice as it ensures that each document and user has a unique identifier.
  • Reasoning:
    The generate_id_from_label function is a new utility function that generates a unique id from a label. This function is used in the run_client.py scripts in the examples directory to generate unique document ids and user ids. This is a good practice as it ensures that each document and user has a unique identifier.
5. r2r/examples/academy/client.py:16:
  • Assessed confidence: 0%
  • Comment:
    The generate_id_from_label function is correctly used here to generate a unique user id. This is a good practice as it ensures that each user has a unique identifier.
6. r2r/examples/academy/client.py:25:
  • Assessed confidence: 0%
  • Comment:
    The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
7. r2r/examples/academy/run_client.py:16:
  • Assessed confidence: 0%
  • Comment:
    The generate_id_from_label function is correctly used here to generate a unique user id. This is a good practice as it ensures that each user has a unique identifier.
8. r2r/examples/academy/run_client.py:27:
  • Assessed confidence: 0%
  • Comment:
    The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
9. r2r/examples/basic/run_client.py:15:
  • Assessed confidence: 0%
  • Comment:
    The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
10. r2r/examples/basic/run_client.py:24:
  • Assessed confidence: 0%
  • Comment:
    The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
11. r2r/examples/basic/run_client.py:36:
  • Assessed confidence: 0%
  • Comment:
    The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
12. r2r/examples/basic/run_client.py:41:
  • Assessed confidence: 0%
  • Comment:
    The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
13. r2r/examples/basic/run_client.py:62:
  • Assessed confidence: 0%
  • Comment:
    The generate_id_from_label function is correctly used here to generate a unique document id. This is a good practice as it ensures that each document has a unique identifier.
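Review items 4 to 13 describe the example clients deriving user and document IDs from labels. Assuming `generate_id_from_label` wraps `uuid.uuid5` (here with `uuid.NAMESPACE_DNS`, itself an assumption), such IDs are reproducible rather than merely unique: rerunning a client with the same labels yields the same IDs. The `build_ids` helper and its `user:`/`doc:` label prefixes are hypothetical and do not come from `run_client.py`.

```python
import uuid
from typing import Tuple


def generate_id_from_label(label: str) -> uuid.UUID:
    # Assumed implementation: a version 5 UUID over a fixed namespace.
    return uuid.uuid5(uuid.NAMESPACE_DNS, label)


def build_ids(user_name: str, file_name: str) -> Tuple[uuid.UUID, uuid.UUID]:
    # Hypothetical labeling scheme: prefixes keep a user ID and a document ID
    # derived from the same string from coinciding.
    user_id = generate_id_from_label(f"user:{user_name}")
    document_id = generate_id_from_label(f"doc:{file_name}")
    return user_id, document_id


first_run = build_ids("alice", "paper.pdf")
second_run = build_ids("alice", "paper.pdf")
print(first_run == second_run)  # True: same labels, same IDs on every run
```

The flip side is that, under this assumption, an ID is only as unique as its label: two different documents ingested under the same label would receive the same ID.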

Workflow ID: wflow_ZQbAw7kXmCa23Fmb



@emrgnt-cmplxty deleted the feature/rebase-ingestion-strat-merged branch on March 25, 2024 at 06:05
Labels: None yet
Projects: None yet
1 participant