Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify HtmlSerializers #399

Open
cdalexndr opened this issue Nov 3, 2021 · 10 comments
Open

Unify HtmlSerializers #399

cdalexndr opened this issue Nov 3, 2021 · 10 comments

Comments

@cdalexndr
Copy link
Contributor

cdalexndr commented Nov 3, 2021

Currently there are multiple public HtmlSerializer-type classes: HtmlSerializerInnerOuterText, HtmlSerializerNormalizedText, HtmlSerializerVisibleText, the deprecated HtmlSerializer.

They share some logic and it's confusing to have multiple classes.
They should be merged into a single class, with options.

@rbri
Copy link
Member

rbri commented Nov 3, 2021

I made the tree out of the one because the handling of all the differences in one class was too complicated.

@cdalexndr
Copy link
Contributor Author

Here's my use case:
I need a html serializer that respects white-space (such as HtmlSerializerVisibleText) but also include invisible elements (like HtmlSerializerNormalizedText with ignoreMaskedElements_=false).

@rbri
Copy link
Member

rbri commented Nov 3, 2021

Ok, let me think about this, maybe there is some chance to work with a class hierarchy and some abstract methods.

@cdalexndr
Copy link
Contributor Author

cdalexndr commented Nov 3, 2021

HtmlSerializerNormalizedText and HtmlSerializerVisibleText seem to be similar, just that HtmlSerializerVisibleText has more logic such as css white-space handling and only prints visible text.
Also HtmlSerializerInnerOuterText seems to be a trimmed down version of HtmlSerializerVisibleText (duplicate code).

(Excluding HtmlSerializer that isdeprecated and should be removed in future.)

@cdalexndr
Copy link
Contributor Author

Did you worked on this issue? Can I have a go?

@rbri
Copy link
Member

rbri commented Nov 6, 2021 via email

@cdalexndr
Copy link
Contributor Author

cdalexndr commented Nov 6, 2021

Why are there two browser-like serialization?

  • DomNode.asNormalizedText() using HtmlSerializerNormalizedText
  • DomNode.getVisibleText() using HtmlSerializerVisibleText for selenium interop

Isn't the selenium representation sufficient?
Does it make sense to deprecate HtmlSerializerNormalizedText, and add a visible option to HtmlSerializerVisibleText to be also used to serialize all (including non-visible) text?

@rbri
Copy link
Member

rbri commented Nov 11, 2021

HtmlSerializer is gone

@rbri
Copy link
Member

rbri commented Nov 11, 2021

HtmlSerializerNormalizedText implements HtmlUnits way of serialization (more or less backward compatible to the old asText() method. I like to have this to have our own way to serialize.

HtmlSerializerVisibleText is there to try to implement the support needed to be in sync with the browsers (and all the strange things they do).

@cdalexndr
Copy link
Contributor Author

cdalexndr commented Nov 11, 2021

But, isn't the browser serialization, the real world usage, so the only correct way of doing things? Why is the custom htmlunit impl needed?
It would be simpler to keep only the real-world serializer and deprecate other ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants