perf(readability): significantly improve transformMisusedDivsIntoParagraphs #3477

Closed
wants to merge 2 commits into from

jvoisin
Collaborator

@jvoisin jvoisin commented Jun 27, 2025

jvoisin added 2 commits June 27, 2025 21:18
…graphs

- Instead of materializing the whole HTML content of every `div` tag, iterate
  over its child nodes. goquery does the materialization through a string
  builder, which results in a lot of allocations and deallocations.
- Instead of using a regex, use a switch statement, since we can match
  directly on the node.Data value.
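
To make the two points above concrete, here is a minimal sketch of the idea, not the code from this PR. The `node` type is a hypothetical, reduced stand-in for `html.Node` from `golang.org/x/net/html`; the tag list in `isBlockTag` is illustrative only; and checking only direct children (rather than all descendants) is an assumption made to keep the example short.

```go
package main

import "fmt"

// node is a hypothetical, reduced stand-in for html.Node
// (golang.org/x/net/html), keeping only the fields this sketch needs.
type node struct {
	isElement   bool   // true for element nodes, false for text nodes
	data        string // tag name for elements, text content otherwise
	firstChild  *node
	nextSibling *node
}

// isBlockTag reports whether name is a tag that should prevent a div
// from being turned into a paragraph. The list is illustrative and is
// an assumption, not taken from the PR.
func isBlockTag(name string) bool {
	switch name {
	case "a", "blockquote", "div", "dl", "img", "ol", "p", "pre", "table", "ul":
		return true
	default:
		return false
	}
}

// hasBlockChild walks the direct children of div and matches on the tag
// name, so no HTML string is built and no regex is run.
// Assumption: only direct children are inspected; matching what a regex
// over the serialized inner HTML would find requires visiting descendants.
func hasBlockChild(div *node) bool {
	for c := div.firstChild; c != nil; c = c.nextSibling {
		if c.isElement && isBlockTag(c.data) {
			return true
		}
	}
	return false
}

func main() {
	// <div>hello<span></span></div>: only inline content.
	text := &node{data: "hello"}
	text.nextSibling = &node{isElement: true, data: "span"}
	inlineDiv := &node{isElement: true, data: "div", firstChild: text}

	// <div><p></p></div>: contains a block-level child.
	blockDiv := &node{
		isElement:  true,
		data:       "div",
		firstChild: &node{isElement: true, data: "p"},
	}

	fmt.Println(hasBlockChild(inlineDiv)) // false
	fmt.Println(hasBlockChild(blockDiv))  // true
}
```

A div for which `hasBlockChild` returns false would be a candidate for conversion into a paragraph. The walk allocates nothing, whereas serializing each div to a string allocates on every call.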
@fguillot
Member

fguillot commented Jul 1, 2025

Needs a rebase.

@jvoisin
Collaborator Author

jvoisin commented Jul 1, 2025

Superseded by #3488

@jvoisin jvoisin closed this Jul 1, 2025
@jvoisin jvoisin deleted the paragraph branch July 1, 2025 13:59