Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Add a HtmlScrubber class #679

Merged
merged 4 commits into from Aug 30, 2019
Merged

Conversation

oliverklee
Copy link
Contributor

This class can remove parts of the HTML. (In the Emogrifier class, this
was solved with switches for the emogrification process.)

Fixes #651

This class can remove parts of the HTML. (In the `Emogrifier` class, this
was solved with switches for the emogrification process.)

Fixes #651
@oliverklee oliverklee added this to the 2.2.0 milestone Aug 29, 2019
@oliverklee oliverklee self-assigned this Aug 29, 2019
@oliverklee
Copy link
Contributor Author

I'll also appreciate feedback on the name of the class.

Copy link
Contributor

@JakeQZ JakeQZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look mostly fine, though I've commented on the assert in the test - you might want to consider a change there.

Regarding the name, I'm not sure. One name it should not have is HtmlMinifier as that would imply something different: removal of whitespace, renaming classes to one or two letter names, etc.; and may cause confusion. Other possibile nouns that spring to mind (or found with a thesaurus): tidier, cleaner, garbage-collector, reducer, squeezer, squasher, declutterer, streamliner, purifier, depolluter - though none of them strike me as being necessarily any better. I'll have a think...


$subject->removeInvisibleNodes();

self::assertNotRegExp('/display:\\s*none;?/', $subject->render());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whilst it's OK as it is, the ;? isn't necessary, and adding a possessive quantifier (\\s* -> \\s*+) would improve performance. That said, we could simply assert that HTML doesn't contain '<div', which would be a stronger assertion...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's right. I completely missed that. I'll change that in a minute.

@JakeQZ
Copy link
Contributor

JakeQZ commented Aug 30, 2019

Other possibile nouns...

A few more: de-redundantifier; pruner; eliminator.

@JakeQZ
Copy link
Contributor

JakeQZ commented Aug 30, 2019

I'll also appreciate feedback on the name of the class.

I am thinking now that HtmlPruner may be a better name. There is an analogous usage when talking about “pruning a database” – removing redundant or unwanted things – and one definition of “prune” I found is “to reduce especially by eliminating superfluous matter”, which seems very fitting.

@oliverklee
Copy link
Contributor Author

HtmlPruner sounds good. I'd like to go with that.

@oliverklee
Copy link
Contributor Author

Repushed.

Copy link
Contributor

@JakeQZ JakeQZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine except that '<div>' needs to be '<div' in the test...


$subject->removeInvisibleNodes();

self::assertNotRegExp('/display:\\s*none;?/', $subject->render());
self::assertNotContains('<div>', $subject->render());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to omit the closing > otherwise it will miss the tag with attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darn, you're right. This happens when I code before having had a decent coffee.

@oliverklee oliverklee merged commit 46ba9e0 into master Aug 30, 2019
@oliverklee oliverklee deleted the feature/html-minifier branch August 30, 2019 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants