Commit

Reimplement the clean feature using the loofah gem
knu committed Jun 28, 2016
1 parent 950afdf commit 848f1a5
Showing 5 changed files with 23 additions and 4 deletions.
1 change: 1 addition & 0 deletions Gemfile
@@ -108,6 +108,7 @@ gem 'jsonpath', '~> 0.5.6'
 gem 'kaminari', '~> 0.16.1'
 gem 'kramdown', '~> 1.3.3'
 gem 'liquid', '~> 3.0.3'
+gem 'loofah', '~> 2.0'
 gem 'mini_magick'
 gem 'multi_xml'
 gem 'nokogiri', '1.6.8'
1 change: 1 addition & 0 deletions Gemfile.lock
@@ -642,6 +642,7 @@ DEPENDENCIES
   letter_opener_web
   liquid (~> 3.0.3)
   listen (~> 3.0.5)
+  loofah (~> 2.0)
   mini_magick
   mqtt
   multi_xml
14 changes: 12 additions & 2 deletions app/models/agents/rss_agent.rb
@@ -23,6 +23,7 @@ class RssAgent < Agent
 Options:
 * `url` - The URL of the RSS feed (an array of URLs can also be used; items with identical guids across feeds will be considered duplicates).
+* `clean` - Set to `true` to sanitize `description` and `content` as HTML fragments, removing unknown/unsafe elements and attributes.
 * `expected_update_period_in_days` - How often you expect this RSS feed to change. If more than this amount of time passes without an update, the Agent will mark itself as not working.
 * `headers` - When present, it should be a hash of headers to send with the request.
 * `basic_auth` - Specify HTTP basic auth parameters: `"username:password"`, or `["username", "password"]`.
@@ -43,6 +44,7 @@ class RssAgent < Agent
     def default_options
       {
         'expected_update_period_in_days' => "5",
+        'clean' => 'false',
         'url' => "https://github.com/cantino/huginn/commits/master.atom"
       }
     end
@@ -357,8 +359,8 @@ def entry_data(entry)
         url: entry.url,
         links: entry.links,
         title: entry.title,
-        description: description,
-        content: content,
+        description: clean_fragment(description),
+        content: clean_fragment(content),
         image: entry.try(:image),
         enclosure: entry.enclosure,
         author: author,
@@ -378,5 +380,13 @@ def feed_to_events(feed)
         Event.new(payload: payload_base.merge(entry_data(entry)))
       }
     end
+
+    def clean_fragment(fragment)
+      if boolify(interpolated['clean']) && fragment.present?
+        Loofah.scrub_fragment(fragment, :prune).to_s
+      else
+        fragment
+      end
+    end
   end
 end
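
The gating logic in `clean_fragment` can be tried outside of Huginn with the rough sketch below. It is an illustration under stated assumptions, not the project's code: `boolify` here is a hypothetical stand-in written for this example, the `clean_option` and `scrubber` parameters are introduced only so the method can run without an agent or the loofah gem, and `MARK_ONLY` is a placeholder that tags its input rather than sanitizing it. In the commit itself the scrubbing step is `Loofah.scrub_fragment(fragment, :prune).to_s`.

```ruby
# Hypothetical stand-in for the boolify helper used in the commit (an
# assumption for this sketch): true/'true' => true, false/'false' => false,
# anything else => nil.
def boolify(value)
  case value
  when true, 'true' then true
  when false, 'false' then false
  end
end

# Returns the fragment untouched unless cleaning is enabled and the fragment
# is non-blank. `scrubber` is any callable that takes and returns a String;
# with the loofah gem installed it could be
#   ->(html) { Loofah.scrub_fragment(html, :prune).to_s }
def clean_fragment(fragment, clean_option, scrubber)
  blank = fragment.nil? || fragment.strip.empty?
  return fragment if blank || !boolify(clean_option)

  scrubber.call(fragment)
end

# Placeholder scrubber that only marks its input. It is NOT a sanitizer.
MARK_ONLY = ->(html) { "[scrubbed]#{html}" }

puts clean_fragment('<b>hi</b>', 'true', MARK_ONLY)   # prints "[scrubbed]<b>hi</b>"
puts clean_fragment('<b>hi</b>', 'false', MARK_ONLY)  # prints "<b>hi</b>"
```

Because the default option value is the string `'false'`, the option has to be converted to a boolean before it is tested; a bare truthiness check would treat `'false'` as enabled.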
3 changes: 1 addition & 2 deletions spec/data_fixtures/onethingwell.atom
@@ -209,8 +209,7 @@
 </item>
 <item>
   <title>Showgoers</title>
-  <description>&lt;a href="http://showgoers.tv/"&gt;Showgoers&lt;/a&gt;: &lt;blockquote&gt; &lt;p&gt;Showgoers is a Chrome browser extension to synchronize your Netflix player with someone else so that you can co-watch the same movie on different computers with no hassle. Syncing up your player is as easy as sharing a URL.&lt;/p&gt; &lt;/blockquote&gt;
-  </description>
+  <description>&lt;a href="http://showgoers.tv/" onmouseover="javascript:void(0)"&gt;Showgoers&lt;/a&gt;: &lt;blockquote&gt; &lt;p&gt;Showgoers is a Chrome browser extension to synchronize your Netflix player with someone else so that you can co-watch the same movie on different computers with no hassle. Syncing up your player is as easy as sharing a URL.&lt;/p&gt; &lt;/blockquote&gt;&lt;script&gt;some code&lt;/script&gt;</description>
   <link>http://onethingwell.org/post/125509667816</link>
   <guid>http://onethingwell.org/post/125509667816</guid>
   <pubDate>Fri, 31 Jul 2015 13:00:13 +0100</pubDate>
8 changes: 8 additions & 0 deletions spec/models/agents/rss_agent_spec.rb
@@ -223,6 +223,14 @@
       expect(event.payload['enclosure']).to eq({ "url" => "http://c.1tw.org/images/2015/itsy.png", "type" => "image/png", "length" => "48249" })
       expect(event.payload['image']).to eq("http://c.1tw.org/images/2015/itsy.png")
     end
+
+    it "sanitizes HTML content" do
+      agent.options['clean'] = true
+      agent.check
+      event = agent.events.last
+      expect(event.payload['content']).to eq('<a href="http://showgoers.tv/">Showgoers</a>: <blockquote> <p>Showgoers is a Chrome browser extension to synchronize your Netflix player with someone else so that you can co-watch the same movie on different computers with no hassle. Syncing up your player is as easy as sharing a URL.</p> </blockquote>')
+      expect(event.payload['description']).to eq('<a href="http://showgoers.tv/">Showgoers</a>: <blockquote> <p>Showgoers is a Chrome browser extension to synchronize your Netflix player with someone else so that you can co-watch the same movie on different computers with no hassle. Syncing up your player is as easy as sharing a URL.</p> </blockquote>')
+    end
   end

   describe 'logging errors with the feed url' do