fix html bug #196

emrgnt-cmplxty · 2024-03-24T06:16:42Z

	🚀 This PR description was created by Ellipsis for commit `81b79ae`.

Summary:

This PR modifies the process_data function in the BasicIngestionPipeline class to expect HTML data as a bytes object, which is then encoded to 'utf-8' before parsing.

Key points:

Modified process_data function in BasicIngestionPipeline class in /r2r/pipelines/basic/ingestion.py.
Changed expected HTML data type from string to bytes.
HTML data is now encoded to 'utf-8' before being passed to _parse_html function.
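
A minimal sketch of what the new type check means for callers, assuming they hold the HTML as a Python `str`. `require_bytes` is a hypothetical stand-in for the guard added to `process_data`, not a function in r2r; only the error message is taken from this PR.

```python
def require_bytes(entry_data):
    """Hypothetical stand-in for the guard this PR adds to process_data."""
    if not isinstance(entry_data, bytes):
        raise ValueError("HTML data must be a bytes object.")
    return entry_data


html_text = "<p>hello</p>"

# A caller holding a str has to convert it to bytes before ingestion.
html_bytes = html_text.encode("utf-8")
accepted = require_bytes(html_bytes)

# Passing the str directly is rejected by the guard.
try:
    require_bytes(html_text)
    rejected = False
except ValueError:
    rejected = True
```

Under this check, any existing caller that still passes a string would start failing with `ValueError`.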

Generated with ❤️ by ellipsis.dev

vercel · 2024-03-24T06:16:49Z

The latest updates on your projects.

Name	Status	Preview	Updated (UTC)
r2r-docs	✅ Ready (Inspect)	Visit Preview	Mar 24, 2024 6:16am

ellipsis-dev

❌ Changes requested.

Reviewed the entire pull request up to 81b79ae
Looked at 18 lines of code in 1 file
Took 30 seconds to review

Skipped 0 files when reviewing.
Skipped posting 0 additional comments because they didn't meet confidence threshold of 50%.

Workflow ID: wflow_Yz60K8JK54K7FpZ0

ellipsis-dev · 2024-03-24T06:17:22Z

r2r/pipelines/basic/ingestion.py

-            return self._parse_html(entry_data)
+            if not isinstance(entry_data, bytes):
+                raise ValueError("HTML data must be a bytes object.")
+            return self._parse_html(entry_data.encode("utf-8"))


The change from expecting a string to expecting bytes for HTML data seems unnecessary: the value is immediately passed through `.encode("utf-8")`, an operation that applies to strings rather than bytes. This could introduce bugs. Consider reverting this change.
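
For reference, a minimal Python sketch of the str/bytes distinction behind this comment. `to_text` is a hypothetical helper, not part of r2r, and whether `_parse_html` expects `str` or `bytes` is not shown in the diff above, so this is an illustration rather than a proposed patch.

```python
def to_text(entry_data: bytes) -> str:
    """Hypothetical helper: turn UTF-8 bytes into a str."""
    # bytes -> str is a *decode*; str -> bytes is an *encode*.
    return entry_data.decode("utf-8")


raw = b"<p>caf\xc3\xa9</p>"  # UTF-8 bytes for "<p>café</p>"

# In Python 3, bytes objects have .decode() but no .encode(),
# so calling raw.encode("utf-8") would raise AttributeError.
bytes_has_encode = hasattr(raw, "encode")

text = to_text(raw)
round_trip = text.encode("utf-8")  # back to the original bytes
```

If the parser needs text, decoding the validated bytes is the usual conversion. If it accepts bytes directly, no conversion is needed after the type check.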

fix html bug

81b79ae

emrgnt-cmplxty merged commit 490719a into main Mar 24, 2024
2 checks passed

ellipsis-dev bot reviewed Mar 24, 2024

emrgnt-cmplxty deleted the feature/fix-html-parse-bug branch March 24, 2024 06:39
