Parse whitespace html entities as spaces and treat standard blockLike… #39

Closed
wants to merge 2 commits into from

rawrmonstar

… HTML elements as word separators when stripTags option set.

Fixes #38 by replacing whitespace HTML entities with their Unicode equivalents.
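
To make the entity fix concrete, here is a minimal sketch of the idea, not the code from this PR. The names `WHITESPACE_ENTITIES`, `decodeWhitespaceEntities`, and `countWords` are hypothetical, and the entity list is a small assumed subset. It relies on JavaScript's `\s` class matching the no-break space and the other Unicode space characters.

```javascript
// Hypothetical helper for illustration; not the library's actual implementation.
// Maps a few whitespace entity names to the Unicode characters they represent.
const WHITESPACE_ENTITIES = {
  nbsp: '\u00A0',   // no-break space
  ensp: '\u2002',   // en space
  emsp: '\u2003',   // em space
  thinsp: '\u2009', // thin space
};

// Replace each known whitespace entity with its Unicode character.
// Entities that are not in the map (e.g. &amp;) are left untouched.
function decodeWhitespaceEntities(html) {
  return html.replace(/&(nbsp|ensp|emsp|thinsp);/g, (match, name) => WHITESPACE_ENTITIES[name]);
}

// Naive word counter: a word is any run of non-whitespace characters.
function countWords(text) {
  const words = text.match(/\S+/g);
  return words ? words.length : 0;
}

console.log(countWords('one&nbsp;two'));                           // 1
console.log(countWords(decodeWhitespaceEntities('one&nbsp;two'))); // 2
```

Without decoding, the raw entity glues the two words together, because `&nbsp;` itself contains no whitespace.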

Also treats HTML block elements as word separators: <div>one</div><div>two</div> counts as two words, while <span>one</span><span>two</span> still counts as one.
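
The block vs. inline behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `BLOCK_TAGS` is an assumed, non-exhaustive list, `stripTagsBlockAware` and `countWords` are hypothetical names, and the regular expressions do not handle a `>` inside an attribute value.

```javascript
// Assumed, non-exhaustive list of block-level tag names (illustration only).
const BLOCK_TAGS = [
  'address', 'article', 'aside', 'blockquote', 'div', 'footer', 'header',
  'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li', 'ol', 'p', 'pre', 'section',
  'table', 'td', 'th', 'tr', 'ul',
];

// Matches opening and closing tags of the block elements listed above.
const BLOCK_TAG_PATTERN = new RegExp('<\\/?(?:' + BLOCK_TAGS.join('|') + ')\\b[^>]*>', 'gi');

// Block tags are replaced by a space (word separator);
// every remaining tag is treated as inline and removed without a space.
function stripTagsBlockAware(html) {
  return html
    .replace(BLOCK_TAG_PATTERN, ' ')
    .replace(/<[^>]*>/g, '');
}

// Naive word counter: a word is any run of non-whitespace characters.
function countWords(text) {
  const words = text.match(/\S+/g);
  return words ? words.length : 0;
}

console.log(countWords(stripTagsBlockAware('<div>one</div><div>two</div>')));     // 2
console.log(countWords(stripTagsBlockAware('<span>one</span><span>two</span>'))); // 1
```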

@RadLikeWhoa
Owner

First of all, sorry for replying to this PR so late.

  • Replacing whitespace entities is a great idea, I'll just see if there's a more concise way of writing the code.
  • I'm kind of torn on the other part of this PR. While treating HTML elements as separators seems like a good idea, I'm not sure if it should only apply to block elements. In the example you give, I'd expect both lines of code to return two words. I can see your reasoning though. I'll have to think about this some more.

@RadLikeWhoa RadLikeWhoa self-assigned this Feb 5, 2016
@rawrmonstar
Author

No worries. The block vs. inline distinction is there so that tags like <em> and <strong> don't break up words, since I could see something like <strong>un</strong>important. Whatever you decide is cool, my fork does the job for me. Thanks for the cool project 👍
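
For reference, the two options discussed in this thread give different counts for the `<strong>un</strong>important` case. The sketch below is only an illustration; `stripAllAsSeparators`, `stripWithoutSeparators`, and `countWords` are hypothetical names, and neither function is the library's actual stripTags handling.

```javascript
// Naive word counter: a word is any run of non-whitespace characters.
function countWords(text) {
  const words = text.match(/\S+/g);
  return words ? words.length : 0;
}

// Option A (hypothetical): every tag, block or inline, becomes a word separator.
function stripAllAsSeparators(html) {
  return html.replace(/<[^>]*>/g, ' ');
}

// Option B (hypothetical): tags are removed without inserting a separator,
// which is how inline tags behave under the block vs. inline distinction.
function stripWithoutSeparators(html) {
  return html.replace(/<[^>]*>/g, '');
}

const sample = '<strong>un</strong>important';

console.log(countWords(stripAllAsSeparators(sample)));   // 2 ("un" and "important")
console.log(countWords(stripWithoutSeparators(sample))); // 1 ("unimportant")
```

Option A matches the expectation that `<span>one</span><span>two</span>` is two words, at the cost of splitting words that are only partly emphasised.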

2 participants