Move from whitelisting parsers to blacklisting #445

Merged
emrgnt-cmplxty merged 5 commits into main from Nolan/WhitelistBlacklist
Jun 13, 2024

Conversation

NolanTrem
Collaborator

@NolanTrem NolanTrem commented Jun 13, 2024

🚀 This description was created by Ellipsis for commit 6ed6881

Summary:

Switched from whitelisting to blacklisting parsers in configuration, updated relevant code, documentation, and tests.

Key points:

  • Configuration Change: Switched from selected_parsers to excluded_parsers in config.json files.
  • Code Update: Updated r2r/main/r2r_app.py to handle excluded parsers during file ingestion.
  • Class Update: Modified R2RConfig to initialize excluded_parsers.
  • Factory Update: Adjusted R2RPipeFactory to pass excluded_parsers to R2RDocumentParsingPipe.
  • Parsing Logic: Updated DocumentParsingPipe and R2RDocumentParsingPipe to use excluded_parsers.
  • Tests: Updated tests to reflect the new excluded_parsers configuration.
  • Documentation: Updated documentation in docs/pages/cookbooks/client-server.mdx, docs/pages/cookbooks/local-rag.mdx, and docs/pages/getting-started/installation.mdx to reflect the new configuration approach.
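The difference between the two configuration styles can be sketched as follows. This is a minimal illustration only: the `DocumentType` members and the two helper functions are hypothetical and are not taken from the R2R code base; only the names `selected_parsers`, `excluded_parsers`, and `DocumentType` appear in this PR.

```python
from enum import Enum


class DocumentType(Enum):
    # Hypothetical subset of file types, for illustration only.
    CSV = "csv"
    PDF = "pdf"
    TXT = "txt"
    MP4 = "mp4"


def allowed_by_whitelist(doc_type, selected_parsers):
    """Old style: a type is parsed only if it is explicitly listed."""
    return doc_type in selected_parsers


def allowed_by_blacklist(doc_type, excluded_parsers):
    """New style: every type is parsed unless it is explicitly listed."""
    return doc_type not in excluded_parsers


# With a blacklist, a newly supported type is enabled by default;
# only the excluded entries need to be maintained.
excluded = {DocumentType.MP4}
parsed = [t.name for t in DocumentType if allowed_by_blacklist(t, excluded)]
print(parsed)
```

Under a whitelist, the same result would require listing CSV, PDF, and TXT explicitly, and any type added later would stay disabled until the list was updated.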

Generated with ❤️ by ellipsis.dev


vercel bot commented Jun 13, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name | Status | Preview | Comments | Updated (UTC)
r2r-docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jun 13, 2024 8:50pm

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Reviewed everything up to 7744d7f in 1 minute and 52 seconds

More details
  • Looked at 608 lines of code in 9 files
  • Skipped 1 file when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. docs/pages/deep-dive/app.mdx:141
  • Draft comment:
    The documentation here does not list all file types that are excluded in the application's configuration. Ensure that the documentation matches the actual configuration to avoid confusion.

This issue is also present in docs/pages/deep-dive/config.mdx.

  • Reason this comment was not posted:
    Confidence of 50% on close inspection, compared to threshold of 50%.

Workflow ID: wflow_2JCrkv0VDVm3uBGq


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

❌ Changes requested. Incremental review on b05f455 in 2 minutes and 16 seconds

More details
  • Looked at 277 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_aoPdYxzIZdhDgU4e


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

            DocumentType[file_extension.upper()]
            in excluded_parsers
        ):
            logger.error(f"{file_extension} is explicitly excluded in the configuration file.")
Contributor

Consider changing the HTTP status code to 403 to more accurately reflect that the file type is forbidden by configuration, not unsupported.
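The check under discussion can be sketched roughly as below. This is an assumption-laden illustration, not the code in r2r/main/r2r_app.py: the `HTTPException` class is a stand-in for a web framework's exception, the `check_not_excluded` helper and its `status_code` parameter are hypothetical, and the `DocumentType` members are invented. The default of 415 reflects the later comments in this thread, which describe the status code being changed back from 403.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class DocumentType(Enum):
    # Hypothetical members, for illustration only.
    PDF = "pdf"
    MP4 = "mp4"


class HTTPException(Exception):
    """Stand-in for a web framework's HTTP error (hypothetical)."""

    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def check_not_excluded(file_extension, excluded_parsers, status_code=415):
    """Raise if the extension maps to a DocumentType that is excluded."""
    name = file_extension.upper()
    if (
        name in DocumentType.__members__
        and DocumentType[name] in excluded_parsers
    ):
        logger.error(
            f"{file_extension} is explicitly excluded in the configuration file."
        )
        raise HTTPException(
            status_code=status_code,
            detail=f"'{file_extension}' is excluded by configuration.",
        )


try:
    check_not_excluded("mp4", {DocumentType.MP4})
except HTTPException as exc:
    print(exc.status_code, exc.detail)
```

Keeping the status code as a parameter in the sketch makes it easy to see that the 403 versus 415 question affects only the response, not the exclusion logic itself.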

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 96964f0 in 1 minute and 35 seconds

More details
  • Looked at 219 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. docs/pages/cookbooks/local-rag.mdx:86
  • Draft comment:
    The configuration snippet still uses the old 'excluded_parsers' format. Please update this to reflect the new configuration style as per the PR changes.
  • Reason this comment was not posted:
    Confidence of 0% on close inspection, compared to threshold of 50%.

Workflow ID: wflow_QajqPvIfnvLZk7WB



Contributor

@ellipsis-dev ellipsis-dev bot left a comment

❌ Changes requested. Incremental review on 174fd90 in 1 minute and 56 seconds

More details
  • Looked at 13 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_CdaKVdI4N99Bnf1X



        ):
            logger.error(f"{file_extension} is explicitly excluded in the configuration file.")
            raise HTTPException(
                status_code=403,
Contributor

The change in HTTP status code from 415 to 403 might not be appropriate here. The original 415 status code is more suitable as it directly relates to unsupported media types, which aligns with excluding certain file types from processing. Consider reverting this change.

Suggested change
- status_code=403,
+ status_code=415,

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me! Incremental review on 6ed6881 in 1 minute and 8 seconds

More details
  • Looked at 13 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. r2r/main/r2r_app.py:521
  • Draft comment:
    The change from HTTP status code 403 to 415 is appropriate here as it more accurately reflects the nature of the error related to file type support.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The change in HTTP status code from 403 to 415 in the error handling for excluded file types is appropriate given the context. HTTP 415 (Unsupported Media Type) is more suitable for cases where the file type is not supported due to configuration settings, as opposed to HTTP 403 (Forbidden) which implies a lack of permission. This change aligns better with the nature of the error being related to file type support rather than authorization issues.
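For reference, the standard reason phrases of the two status codes debated above can be looked up with Python's standard library. This is only a small lookup sketch; the `phrases` name is introduced here for illustration.

```python
from http import HTTPStatus

# Map each debated status code to its standard reason phrase.
phrases = {code: HTTPStatus(code).phrase for code in (403, 415)}

for code, phrase in phrases.items():
    print(code, phrase)
# 403 Forbidden
# 415 Unsupported Media Type
```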

Workflow ID: wflow_2Bb5YdtvrCmPOgfi



@emrgnt-cmplxty emrgnt-cmplxty merged commit ce0665a into main Jun 13, 2024
3 checks passed
@NolanTrem NolanTrem deleted the Nolan/WhitelistBlacklist branch June 13, 2024 23:17
iCUE-Solutions pushed a commit to DeweyLearn/DeweyLearnR2R that referenced this pull request Jul 18, 2024
* Move from whitelisting parsers to blacklisting

* Check in

* Update docs

* Move from 415 to 403

* Readd, Ellipsis is whack