-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move from whitelisting parsers to blacklisting #445
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 7744d7f in 1 minute and 52 seconds
More details
- Looked at
608
lines of code in9
files - Skipped
1
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. docs/pages/deep-dive/app.mdx:141
- Draft comment:
The documentation here does not list all file types that are excluded in the application's configuration. Ensure that the documentation matches the actual configuration to avoid confusion.
This issue is also present in docs/pages/deep-dive/config.mdx
.
- Reason this comment was not posted:
Confidence of 50% on close inspection, compared to threshold of 50%.
Workflow ID: wflow_2JCrkv0VDVm3uBGq
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Incremental review on b05f455 in 2 minutes and 16 seconds
More details
- Looked at
277
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
0
drafted comments based on config settings.
Workflow ID: wflow_aoPdYxzIZdhDgU4e
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
DocumentType[file_extension.upper()] | ||
in excluded_parsers | ||
): | ||
logger.error(f"{file_extension} is explicitly excluded in the configuration file.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider changing the HTTP status code to 403 to more accurately reflect that the file type is forbidden by configuration, not unsupported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 96964f0 in 1 minute and 35 seconds
More details
- Looked at
219
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. docs/pages/cookbooks/local-rag.mdx:86
- Draft comment:
The configuration snippet still uses the old 'excluded_parsers' format. Please update this to reflect the new configuration style as per the PR changes. - Reason this comment was not posted:
Confidence of 0% on close inspection, compared to threshold of 50%.
Workflow ID: wflow_QajqPvIfnvLZk7WB
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Incremental review on 174fd90 in 1 minute and 56 seconds
More details
- Looked at
13
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
0
drafted comments based on config settings.
Workflow ID: wflow_CdaKVdI4N99Bnf1X
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
r2r/main/r2r_app.py
Outdated
): | ||
logger.error(f"{file_extension} is explicitly excluded in the configuration file.") | ||
raise HTTPException( | ||
status_code=403, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change in HTTP status code from 415 to 403 might not be appropriate here. The original 415 status code is more suitable as it directly relates to unsupported media types, which aligns with excluding certain file types from processing. Consider reverting this change.
status_code=403, | |
status_code=415, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 6ed6881 in 1 minute and 8 seconds
More details
- Looked at
13
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. r2r/main/r2r_app.py:521
- Draft comment:
The change from HTTP status code 403 to 415 is appropriate here as it more accurately reflects the nature of the error related to file type support. - Reason this comment was not posted:
Confidence changes required:0%
The change in HTTP status code from 403 to 415 in the error handling for excluded file types is appropriate given the context. HTTP 415 (Unsupported Media Type) is more suitable for cases where the file type is not supported due to configuration settings, as opposed to HTTP 403 (Forbidden) which implies a lack of permission. This change aligns better with the nature of the error being related to file type support rather than authorization issues.
Workflow ID: wflow_2Bb5YdtvrCmPOgfi
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
* Move from whitelisting parsers to blacklisting * Check in * Update docs * Move from 415 to 403 * Readd, Ellipsis is whack
Summary:
Switched from whitelisting to blacklisting parsers in configuration, updated relevant code, documentation, and tests.
Key points:
selected_parsers
toexcluded_parsers
inconfig.json
files.r2r/main/r2r_app.py
to handle excluded parsers during file ingestion.R2RConfig
to initializeexcluded_parsers
.R2RPipeFactory
to passexcluded_parsers
toR2RDocumentParsingPipe
.DocumentParsingPipe
andR2RDocumentParsingPipe
to useexcluded_parsers
.excluded_parsers
configuration.docs/pages/cookbooks/client-server.mdx
,docs/pages/cookbooks/local-rag.mdx
, anddocs/pages/getting-started/installation.mdx
to reflect the new configuration approach.Generated with ❤️ by ellipsis.dev