Make URL normalization and absolutization on url optional
This introduces a new option, `resolve_url`, which is off by default.

For backward compatibility, a migration turns the option on for existing
WebsiteAgents whose events have a key named `url`.
knu committed Nov 2, 2016
1 parent d766f23 commit d2f895c
Showing 3 changed files with 31 additions and 6 deletions.
11 changes: 6 additions & 5 deletions app/models/agents/website_agent.rb
@@ -26,8 +26,7 @@ class WebsiteAgent < Agent
* Set the `url_from_event` option to a Liquid template to generate the url to access based on the Event. (To fetch the url in the Event's `url` key, for example, set `url_from_event` to `{{ url }}`.)
* Alternatively, set `data_from_event` to a Liquid template to use data directly without fetching any URL. (For example, set it to `{{ html }}` to use HTML contained in the `html` key of the incoming Event.)
* If you specify `merge` for the `mode` option, Huginn will retain the old payload and update it with new values.
- If a created Event has a key named `url` containing a relative URL, it is automatically resolved using the request URL as base.
+ * If you set the `resolve_url` option to true, a relative URL in each Event's `url` key will be normalized and resolved to an absolute URL using the request URL as base. It is recommended that you use the `template` option to take full control of how to transform extracted values into an Event.
# Supported Document Types
@@ -116,7 +115,7 @@ class WebsiteAgent < Agent
If a `template` option is given, it is used as a Liquid template for each event created by this Agent, instead of directly emitting the results of extraction as events. In the template, keys of extracted data can be interpolated, and some additional variables are also available as explained in the next section. For example:
"template": {
- "url": "{{ url }}",
+ "url": "{{ url | to_uri: _request_.url }}",
"title": "{{ title }}",
"description": "{{ body_text }}",
"last_modified": "{{ _response_.headers.Last-Modified | date: '%FT%T' }}"
@@ -404,8 +403,10 @@ def handle_data(body, url, existing_payload)
extracted
end

- # url may be URI, string or nil
- if (payload_url = result['url'].presence) && (url = url.presence)
+ # url may be a URI object, string or nil
+ if boolify(interpolated['resolve_url']) &&
+     (payload_url = result['url'].presence) &&
+     (url = url.presence)
begin
result['url'] = (Utils.normalize_uri(url) + Utils.normalize_uri(payload_url)).to_s
rescue URI::Error
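
The guarded branch above can be approximated outside Huginn with the standard library alone. The following is a minimal sketch, not the agent's actual code: `truthy?` stands in for Huginn's `boolify`, `resolve_payload_url` is a hypothetical helper, `URI.join` replaces `Utils.normalize_uri(...) + ...`, and returning the payload unchanged on `URI::Error` is an assumption, since the hunk is cut off before the rescue body.

```ruby
require 'uri'

# Hypothetical stand-in for Huginn's boolify: accepts true or the string "true".
def truthy?(value)
  value == true || value.to_s.strip.downcase == 'true'
end

# Hypothetical helper mirroring the condition in the hunk: the payload's `url`
# is resolved only when resolve_url is on and both URLs are present.
def resolve_payload_url(payload, request_url, options)
  payload_url = payload['url']
  return payload unless truthy?(options['resolve_url'])
  return payload if payload_url.nil? || payload_url.to_s.empty?
  return payload if request_url.nil? || request_url.to_s.empty?

  begin
    # URI.join resolves the second argument against the first (RFC 3986 style).
    payload.merge('url' => URI.join(request_url.to_s, payload_url.to_s).to_s)
  rescue URI::Error
    payload # assumption: leave the value untouched if either URL is invalid
  end
end

event = { 'url' => '/1732/', 'title' => 'Earth Temperature Timeline' }
base  = 'http://xkcd.com/archive/'

puts resolve_payload_url(event, base, 'resolve_url' => 'true')['url']
# prints "http://xkcd.com/1732/"
puts resolve_payload_url(event, base, {})['url']
# prints "/1732/" (option off, so the relative URL is kept as extracted)
```

With the option off, the same result can still be produced explicitly through the `template` option and the `to_uri` filter, which is the approach the updated documentation recommends.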
@@ -0,0 +1,19 @@
+class AddResolveUrlOptionToWebsiteAgent < ActiveRecord::Migration[5.0]
+  def up
+    Agents::WebsiteAgent.find_each do |agent|
+      keys = agent.event_keys
+      if keys.nil? || keys.include?('url')
+        agent.options['resolve_url'] = true
+      end
+      agent.save!(validate: false)
+    end
+  end
+
+  def down
+    Agents::WebsiteAgent.find_each do |agent|
+      if agent.options.delete('resolve_url')
+        agent.save!(validate: false)
+      end
+    end
+  end
+end
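
The selection rule in this migration can be tried without a database. The sketch below is an illustration under stated assumptions: `AgentStub`, `needs_resolve_url?`, `migrate_up` and `migrate_down` are hypothetical names, and a plain `Struct` replaces the ActiveRecord model. It shows that an agent is opted in when its event keys include `url`, and also when the keys are `nil`, which reads as a conservative default for agents whose keys cannot be determined.

```ruby
# Hypothetical stand-in for a WebsiteAgent record (not the real model).
AgentStub = Struct.new(:event_keys, :options)

# Same predicate as the migration's `up`: nil keys count as "may emit url".
def needs_resolve_url?(agent)
  keys = agent.event_keys
  keys.nil? || keys.include?('url')
end

def migrate_up(agents)
  agents.each { |agent| agent.options['resolve_url'] = true if needs_resolve_url?(agent) }
end

def migrate_down(agents)
  agents.each { |agent| agent.options.delete('resolve_url') }
end

agents = [
  AgentStub.new(%w[url title], {}), # emits a url key: opted in
  AgentStub.new(%w[title], {}),     # no url key: left alone
  AgentStub.new(nil, {})            # keys unknown: opted in
]

migrate_up(agents)
agents.each { |agent| p agent.options }
```

Rolling back with `migrate_down` removes the key from every stub, matching the `down` method, which deletes `resolve_url` wherever it is set.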
7 changes: 6 additions & 1 deletion spec/models/agents/website_agent_spec.rb
@@ -656,13 +656,14 @@
expect(event.payload['hovertext']).to match(/^Biologists play reverse/)
end

- it "should turn relative urls to absolute" do
+ it "should turn relative urls to absolute if resolve_url is true" do
rel_site = {
'name' => "XKCD",
'expected_update_period_in_days' => "2",
'type' => "html",
'url' => "http://xkcd.com",
'mode' => "on_change",
+ 'resolve_url' => 'true',
'extract' => {
'url' => {'css' => "#topLeft a", 'value' => "@href"},
}
@@ -1169,6 +1170,7 @@
@checker.options = @valid_options.merge(
'type' => 'json',
'data_from_event' => '{{ some_object.some_data }}',
+ 'resolve_url' => 'true',
'extract' => {
'value' => { 'path' => 'hello' },
'url' => { 'path' => 'href' },
@@ -1326,6 +1328,9 @@
'mode' => 'all',
'extract' => {
'url' => { 'css' => "a", 'value' => "@href" },
+ },
+ 'template' => {
+ 'url' => '{{ url | to_uri }}',
}
}
@checker = Agents::WebsiteAgent.new(:name => "ua", :options => @valid_options)