Feature/fix html parse bug #195

emrgnt-cmplxty · 2024-03-24T04:25:55Z

	🚀 This PR description was created by Ellipsis for commit `81b79ae`.

Summary:

This PR modifies the handling of HTML data in ingestion.py, updates the examples list in README.md, and adds a new guide for running a RAG pipeline.

Key points:

Modified process_data function in ingestion.py to handle HTML data as bytes object and encode it to 'utf-8'.
Updated README.md with reordered and updated examples list.
Added new guide end-to-end.mdx for running a RAG pipeline.

Generated with ❤️ by ellipsis.dev

vercel · 2024-03-24T04:26:02Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
r2r-docs	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Mar 24, 2024 4:27am

ellipsis-dev

👍 Looks good to me!

Reviewed the entire pull request up to 4d044c4
Looked at 119 lines of code in 3 files
Took 26 seconds to review

More info

Skipped 0 files when reviewing.
Skipped posting 1 additional comments because they didn't meet confidence threshold of 50%.

1. r2r/pipelines/basic/ingestion.py:88:

Assessed confidence : 50%
Comment:
The change from expecting a string to expecting a bytes object for HTML data seems to be in order. However, it's important to ensure that this change is reflected in all parts of the codebase that interact with this function. If there are other parts of the codebase that still pass a string as HTML data to this function, this could lead to errors.
Reasoning:
The PR author has made changes to the README.md file, removing the 'pdf chat' example and renumbering the remaining examples. They have also added a new 'end-to-end' example. The changes seem to be in order and there are no obvious issues with the changes made to the README.md file. The PR author has also added a new file 'end-to-end.mdx' which provides a step-by-step guide on how to run a RAG pipeline locally with R2R. The instructions seem clear and there are no obvious issues with the content of this file. The PR author has also made changes to the 'ingestion.py' file, specifically to the 'process_data' function. They have changed the expected type of 'entry_data' for HTML entries from 'str' to 'bytes' and added an 'encode' function call when parsing HTML data. This change seems to be in order and there are no obvious issues with it.

Workflow ID: wflow_doHtN7YwAM4Bv7Su

Not what you expected? You can customize the content of the reviews using rules. Learn more here.

ellipsis-dev

👍 Looks good to me!

Performed an incremental review on 81b79ae
Looked at 17 lines of code in 1 files
Took 46 seconds to review

More info

Skipped 0 files when reviewing.
Skipped posting 1 additional comments because they didn't meet confidence threshold of 50%.

1. r2r/pipelines/basic/ingestion.py:88:

Assessed confidence : 80%
Grade: 0%
Comment:
Consider encoding the entry_data to 'utf-8' before checking if it's a bytes object. This would ensure that the data is in the correct format before further processing.

entry_data = entry_data.encode('utf-8')
if not isinstance(entry_data, bytes):
    raise ValueError("HTML data must be a bytes object.")
return self._parse_html(entry_data)

Reasoning:
The change in the PR modifies the data type for HTML from string to bytes. However, the encoding to 'utf-8' is done after the check for bytes object. This could potentially lead to issues if the data is not in 'utf-8' format. It would be better to perform the encoding before the check.

Workflow ID: wflow_FP4kpwW1aYQZaMq9

Not what you expected? You can customize the content of the reviews using rules. Learn more here.

vercel bot deployed to Preview March 24, 2024 04:26 View deployment

ellipsis-dev bot reviewed Mar 24, 2024

View reviewed changes

fix html bug

81b79ae

emrgnt-cmplxty force-pushed the feature/fix-html-parse-bug branch from 4d044c4 to 81b79ae Compare March 24, 2024 04:27

vercel bot deployed to Preview March 24, 2024 04:27 View deployment

emrgnt-cmplxty merged commit f8de21d into main Mar 24, 2024
2 checks passed

ellipsis-dev bot reviewed Mar 24, 2024

View reviewed changes

emrgnt-cmplxty deleted the feature/fix-html-parse-bug branch March 24, 2024 06:39

iCUE-Solutions pushed a commit to DeweyLearn/DeweyLearnR2R that referenced this pull request Jul 18, 2024

fix html bug (SciPhi-AI#195)

9cb0a63

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/fix html parse bug #195

Feature/fix html parse bug #195

emrgnt-cmplxty commented Mar 24, 2024 •

edited by ellipsis-dev bot

Loading

vercel bot commented Mar 24, 2024 •

edited

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

Feature/fix html parse bug #195

Feature/fix html parse bug #195

Conversation

emrgnt-cmplxty commented Mar 24, 2024 • edited by ellipsis-dev bot Loading

Summary:

vercel bot commented Mar 24, 2024 • edited Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

emrgnt-cmplxty commented Mar 24, 2024 •

edited by ellipsis-dev bot

Loading

vercel bot commented Mar 24, 2024 •

edited

Loading