Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove html comment tag before parsing #22

Merged
merged 4 commits into from Apr 7, 2012
Merged

Conversation

eguitarz
Copy link
Contributor

@eguitarz eguitarz commented Apr 6, 2012

The commit cleans html comment tag () before parsing. For some pages like https://devcenter.heroku.com/articles/custom-domains, Readability will select content with html comments. It's not convenient to read an article when there's html tags inside that.

@cantino
Copy link
Owner

cantino commented Apr 6, 2012

Thanks for your patch- I agree, removing comments is a good idea!

I'd recommend removing comments with Nokogiri instead of regular expressions. Could you try something like this?

@html.xpath('//comment()').each { |i| i.remove }

@eguitarz
Copy link
Contributor Author

eguitarz commented Apr 6, 2012

Thanks for pointing out. It works!

@eguitarz
Copy link
Contributor Author

eguitarz commented Apr 6, 2012

And it's been pushed on my fork. You can pull if you think it's all right.

@cantino
Copy link
Owner

cantino commented Apr 6, 2012

Would you be up for adding a unit test? Then I'll definitely pull it.

@eguitarz
Copy link
Contributor Author

eguitarz commented Apr 6, 2012

Pushed the test case.

cantino added a commit that referenced this pull request Apr 7, 2012
Remove html comment tag before parsing
@cantino cantino merged commit 7295b3f into cantino:master Apr 7, 2012
@cantino
Copy link
Owner

cantino commented Apr 7, 2012

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants