Remove html comment tag before parsing #22

eguitarz · 2012-04-06T02:27:08Z

This commit strips HTML comments (<!-- -->) before parsing. For some pages, like https://devcenter.heroku.com/articles/custom-domains, Readability selects content that still contains HTML comments. An article is hard to read when comment markup is left inside it.

cantino · 2012-04-06T03:09:15Z

Thanks for your patch. I agree, removing comments is a good idea!

I'd recommend removing comments with Nokogiri instead of regular expressions. Could you try something like this?

@html.xpath('//comment()').each { |i| i.remove }

eguitarz · 2012-04-06T04:21:05Z

Thanks for pointing that out. It works!

eguitarz · 2012-04-06T04:21:58Z

And it's been pushed to my fork. You can pull it if you think it looks all right.

cantino · 2012-04-06T04:23:49Z

Would you be up for adding a unit test? Then I'll definitely pull it.

eguitarz · 2012-04-06T07:13:08Z

Pushed the test case.

cantino · 2012-04-07T19:10:50Z

Thanks!

update: Will delete the HTML comments before parsing.

835fd21

eguitarz added 2 commits April 6, 2012 12:15

Revert "update: Will delete the HTML comments before parsing."

51cb479

This reverts commit 835fd21.

Remove html tags via nokogiri.

fbe7bf5

Add test case for striping html comment tags.

6acf0b6

cantino added a commit that referenced this pull request Apr 7, 2012

Merge pull request #22 from eguitarz/master

7295b3f

Remove html comment tag before parsing

cantino merged commit 7295b3f into cantino:master Apr 7, 2012
