Fix HIX value diff when page content has nested tags #2652
Code Climate has analyzed commit 4b21da4 and detected 0 issues on this pull request. The test coverage on the diff in this pull request is 100.0% (50% is the threshold). This pull request will bring the total coverage in the repository to 82.0% (0.0% change).
Added another small HIX-related fix for #2621
Thanks! 👍
Thank you very much. LGTM!
Short description
Using root.iter() to iterate over the paragraphs was not a good choice, because it goes "one-way" deep into the nested elements instead of staying at the level of the top-level paragraphs.
For example, if a page has the following structure:
<div><p>Some image</p><p><a href="some.image"><img src="some.image" alt=""></a></p><p> </p></div>
The iteration will proceed as follows:
And the last p-node will not be processed.
Proposed changes
Use list(root) instead of root.iter()
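To illustrate the difference between the two calls, here is a minimal sketch. It assumes the standard-library xml.etree.ElementTree rather than whatever parser the project actually uses, and a well-formed variant of the example markup (the img tag is self-closed so the XML parser accepts it). It only shows which nodes each call yields in a read-only traversal. It does not reproduce the skipped last paragraph, which depends on what the real loop does with each node.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the example markup from the description
# (assumption: <img> is self-closed so the stdlib XML parser accepts it).
MARKUP = (
    '<div><p>Some image</p>'
    '<p><a href="some.image"><img src="some.image" alt="" /></a></p>'
    '<p> </p></div>'
)

root = ET.fromstring(MARKUP)

# iter() yields the root itself and every descendant in document order,
# so nested <a> and <img> elements show up between the paragraphs.
iter_tags = [el.tag for el in root.iter()]

# list(root) yields only the direct children of the root,
# i.e. the three top-level paragraphs.
child_tags = [el.tag for el in list(root)]

print(iter_tags)   # ['div', 'p', 'p', 'a', 'img', 'p']
print(child_tags)  # ['p', 'p', 'p']
```

With list(root), a loop that is meant to handle paragraphs sees exactly one entry per top-level p-node, and the snapshot list is not affected if the loop body later modifies the tree.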
Side effects
No?
Resolved issues
Another follow-up to #2577
UPD: added another small HIX-related fix
Fixes: #2621