RESTful polling #158

Open
wants to merge 3 commits into
from

@slef

Added a RestPollingAgent that (repeatedly) polls a URL and creates an event whenever the server responds.

Stefan Langerman Added EventMachine to scheduler, the continuous! option to agents and…
… the restful polling agent.
6042463
@cantino cantino and 1 other commented on an outdated diff Feb 3, 2014
bin/schedule.rb
@@ -87,9 +89,23 @@ def run!
end
end
+ active_agents = Set.new
+ EM.run do
+ EM.add_periodic_timer(10) {
+ Agent.find_each do |agent|
@cantino
cantino Feb 3, 2014

What about removing Agents that no longer exist?

@slef
slef Feb 3, 2014

Ha! Good point :) I'll fix it.
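For illustration, the cleanup discussed here can be reduced to a set difference between the agents the EM loop is tracking and the agents that still exist. This is a minimal sketch in plain Ruby; `reconcile` and the integer ids are hypothetical and not taken from the diff.

```ruby
require 'set'

# Hypothetical helper: compare the ids tracked by the EM loop with the ids
# that still exist, and report which agents to start and which to stop.
def reconcile(active_ids, existing_ids)
  active = active_ids.to_set
  existing = existing_ids.to_set
  {
    to_start: existing - active, # present in the database, not yet running
    to_stop:  active - existing  # destroyed since the last pass
  }
end

changes = reconcile(Set[1, 2, 3], [2, 3, 4])
puts "start: #{changes[:to_start].to_a.inspect}, stop: #{changes[:to_stop].to_a.inspect}"
```

In the scheduler, the `to_stop` set would presumably be the agents on which `em_stop` gets called.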

@cantino
Owner

Thanks @slef, I left some comments. Looks like a good step forward! Can you add some specs?

@cantino
Owner

How's it going?

Stefan Langerman Robustifying
Added deactivation of destroyed continuous agents in EM. Handling http
and parsing errors in RestPollingAgent.
7c70272
@slef

The update should handle the destruction of agents, and does some error handling.
I'll try to add specs soon.

@cantino cantino and 1 other commented on an outdated diff Feb 20, 2014
@@ -36,6 +36,7 @@ gem "twitter"
gem 'twitter-stream', '>=0.1.16'
gem 'em-http-request'
gem 'weibo_2'
+gem 'cobravsmongoose'
@cantino
cantino Feb 20, 2014

CobraVsMongoose hasn't been updated since 2006. Can you use a different gem, or, ideally, just standard library tools or nokogiri?

@slef
slef Feb 21, 2014

Would it be better to use https://github.com/msievers/badgerfish ?
It is more recent but has fewer features (it only does XML -> JSON conversion), which is OK for our purpose. I think they are both quite short and use standard libraries. What do you think?

@cantino
cantino Feb 21, 2014
@slef

I thought it would be convenient to store the entire XML info in the payload as a json object and then use another agent to reformat that info. The badgerFish convention seems to be the best way of doing that in a lossless manner, see http://wiki.open311.org/index.php?title=JSON_and_XML_Conversion
In particular in my application, I need to access attributes, which are lost in simpler solutions such as Hash.from_xml(s).to_json
If we don't want to use an already existing converter, the alternatives are:

  • rewrite one from scratch -- should take about a page of code, same as the existing gems
  • save the raw XML in the payload and create another agent to extract information from there (like formatting agent but for XML)
  • add some more options to extract the relevant information directly

What do you think is best?
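To make the "about a page of code" option concrete, here is a simplified sketch of a BadgerFish-style converter built on REXML. It assumes REXML is available and ignores namespaces; `badgerfish` and `xml_to_badgerfish` are hypothetical names, and this is not the Badgerfish gem's implementation. Attributes become "@name" keys and text content goes under "$", so attributes survive the conversion.

```ruby
require 'rexml/document'
require 'json'

# Simplified BadgerFish-style conversion (no namespace handling):
# attributes become "@name" keys, text content goes under "$",
# and repeated child elements are collected into an array.
def badgerfish(element)
  node = {}
  element.attributes.each { |name, value| node["@#{name}"] = value }
  text = element.texts.map(&:value).join.strip
  node['$'] = text unless text.empty?
  element.elements.each do |child|
    converted = badgerfish(child)
    if node.key?(child.name)
      node[child.name] = [node[child.name]].flatten(1) << converted
    else
      node[child.name] = converted
    end
  end
  node
end

# Wrap the converted root element under its own name.
def xml_to_badgerfish(xml)
  root = REXML::Document.new(xml).root
  { root.name => badgerfish(root) }
end

xml = '<switch id="12" state="on"><label>Hall</label></switch>'
result = xml_to_badgerfish(xml)
puts JSON.generate(result)
```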
@cantino
Owner
@slef

Done. Btw it seems like Ox (used by Badgerfish) is faster than nokogiri. http://www.ohler.com/dev/xml_with_ruby/xml_with_ruby.html

@cantino cantino commented on the diff Feb 23, 2014
app/models/agents/rest_polling_agent.rb
+ event_description "REST event"
+
+ def default_options
+ {
+ 'url' => "http://some.polling/url",
+ 'type' => "xml"
+ }
+ end
+
+ def em_start
+ t = Time.now
+ http = EM::HttpRequest.new(options['url']).get
+ http.errback {
+ error "Failed: #{http.error}"
+ if !@destroyed
+ pp "errback again "+name
@cantino
cantino Feb 23, 2014

Should this be error?

@slef
slef Feb 24, 2014

No. This is debugging code, sorry. I wanted to see when it restarts after an error (which might not need to be logged, e.g. in case of a timeout). I'll change it.

@cantino cantino commented on the diff Feb 23, 2014
app/models/agents/rest_polling_agent.rb
+ http.errback {
+ error "Failed: #{http.error}"
+ if !@destroyed
+ pp "errback again "+name
+ if Time.now - t > 0.1
+ em_start
+ else
+ EM.add_timer(1) {
+ em_start
+ }
+ end
+ end
+ }
+ http.callback {
+ if !@destroyed
+ pp "callback again"+name
@cantino
cantino Feb 23, 2014

Maybe a log here for analytics?

@slef
slef Feb 24, 2014

Good idea. Will change here and in the timeout restart.

@cantino cantino commented on the diff Feb 23, 2014
app/models/agents/rest_polling_agent.rb
+ end
+ }
+ http.callback {
+ if !@destroyed
+ pp "callback again"+name
+ puts http.response
+ if http.response_header.status == 200
+ begin
+ if options['type'] == "json"
+ pld = JSON.parse(http.response)
+ else
+ pld = Badgerfish::Parser.new.load(http.response)
+ end
+ create_event :payload => pld
+ rescue
+ error "Parsing error"
@cantino
cantino Feb 23, 2014

Please include the error message in the error, like

rescue StandardError => e
  error "Parsing error (#{e.class}): #{e.message}"
end
@cantino
cantino Feb 23, 2014

Also, do you know what exception classes occur here? Maybe we can rescue them specifically.

@slef
slef Feb 24, 2014

Ok, will add JSON::ParserError and Ox::ParserError. I think these are the only relevant ones but I'll keep the general rescue for now just in case.
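A minimal sketch of that rescue ordering, using only the JSON standard library: the specific parser error is rescued first and the general rescue is kept as a fallback. `parse_json_body` is a hypothetical helper, and the XML branch is left out because it depends on a gem.

```ruby
require 'json'

# Hypothetical helper: parse a JSON body and return [payload, error_message].
def parse_json_body(body)
  [JSON.parse(body), nil]
rescue JSON::ParserError => e
  # Expected failure mode: the server returned something that is not JSON.
  [nil, "Parsing error (#{e.class}): #{e.message}"]
rescue StandardError => e
  # Fallback for anything unforeseen, e.g. a nil body.
  [nil, "Unexpected error (#{e.class}): #{e.message}"]
end

payload, err = parse_json_body('{"state": "on"}')
_, bad = parse_json_body('{not json')
puts payload.inspect
puts bad
```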

@cantino cantino commented on the diff Feb 23, 2014
app/models/agents/rest_polling_agent.rb
+ if options['type'] == "json"
+ pld = JSON.parse(http.response)
+ else
+ pld = Badgerfish::Parser.new.load(http.response)
+ end
+ create_event :payload => pld
+ rescue
+ error "Parsing error"
+ end
+ else
+ error "Failed #{http.response_header.status}: #{http.response_header}"
+ end
+ if Time.now - t > 0.1
+ em_start
+ else
+ EM.add_timer(1) {
@cantino
cantino Feb 23, 2014

Do you think the wait timeout (1 second) should be configurable? Since the Agent is loaded, you could do options[:poll_delay] || 1, or something.

@slef
slef Feb 24, 2014

I was actually thinking of doing a geometric increase for the wait timeout, at least in case of errors. For successful requests, I am less sure we should add any delay at all... a web polling server will normally answer only when events happen. The main thing we should detect is if the user uses a URL that is not long-polling, thereby creating a loop. Any idea how to detect that? Or can we assume the user knows what he is doing?
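As a rough sketch of the geometric increase, the delay could be computed from the number of consecutive failures and capped at a maximum. `backoff_delay` and its defaults (1 second base, doubling, 60 second cap) are illustrative assumptions, not values from the PR.

```ruby
# Hypothetical helper: delay in seconds before the next request after
# `failures` consecutive errors. No delay is added after a success.
def backoff_delay(failures, base: 1.0, factor: 2.0, max: 60.0)
  return 0.0 if failures.zero?
  [base * factor**(failures - 1), max].min
end

delays = (0..8).map { |n| backoff_delay(n) }
puts delays.inspect
```

The result could then be passed to `EM.add_timer` in place of the fixed 1 second, with the failure count reset on every successful response.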

@cantino
cantino Feb 25, 2014

I guess it could keep track of a rate (requests in the trailing 2 minutes) and rate limit it.

I wonder how common it will be to need to provide a since_id in the URL to the remote server. Do you have some example services that this Agent works well on?
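One way to sketch the trailing-window rate limit is to keep the timestamps of recent requests and refuse a new one once the window is full. `TrailingRateLimiter` is a hypothetical class, and the limit of 3 in the example is arbitrary. Timestamps are passed in explicitly so the behavior is easy to check.

```ruby
# Hypothetical sliding-window limiter: allows at most `limit` requests in the
# trailing `window` seconds (120 seconds matches the 2 minutes suggested).
class TrailingRateLimiter
  def initialize(limit:, window: 120)
    @limit = limit
    @window = window
    @stamps = []
  end

  # Returns true and records the request if it is within the limit.
  def allow?(now = Time.now.to_f)
    @stamps.reject! { |t| t <= now - @window } # drop requests outside the window
    return false if @stamps.size >= @limit
    @stamps << now
    true
  end
end

limiter = TrailingRateLimiter.new(limit: 3, window: 120)
results = [0, 1, 2, 3, 121.5].map { |t| limiter.allow?(t) }
puts results.inspect
```

A URL that is not long-polling would hit the limit almost immediately, which gives the agent a signal to log an error and back off.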

@slef
slef Feb 25, 2014

The only service I am using this on right now is my openremote server at home, which creates an event for any change of lighting, blinds and heat in the house. I imagine there are other long polling APIs out there, although it seems websockets are becoming increasingly popular. If we create a LongPollingServerAgent as well, this could be a way to transfer events between instances of Huginn.

@cantino
cantino Feb 26, 2014

Interesting. Do you know examples of any webservices that we could demonstrate this Agent on?

@slef
slef Feb 26, 2014

It seems like FriendFeed is using this for real-time updates, see http://friendfeed.com/api/documentation#realtime

@cantino
cantino Mar 1, 2014

I'm definitely down with getting it into Huginn. Once we have some specs, it should be good to go. Nice work!

@cantino cantino commented on the diff Feb 23, 2014
app/models/agents/rest_polling_agent.rb
+ em_start
+ else
+ EM.add_timer(1) {
+ em_start
+ }
+ end
+ end
+ }
+ end
+
+ def em_stop
+ @destroyed = true
+ end
+
+ def working?
+ true
@cantino
cantino Feb 23, 2014

This should be something like

event_created_within?(options['expected_update_period_in_days']) && !recent_error_logs?
@cantino cantino commented on the diff Feb 23, 2014
bin/schedule.rb
@@ -87,9 +89,30 @@ def run!
end
end
+ active_agents = Set.new
+ EM.run do
+ EM.add_periodic_timer(10) {
+ inactive_agents = Set.new(active_agents)
+ Agent.find_each do |agent|
@cantino
cantino Feb 23, 2014

I worry that this could end up being very slow for a system with many agents. Probably okay for now, but in the future we should have a way to find all Agent types that are continuous and only ask for them.

@cantino cantino commented on the diff Feb 23, 2014
bin/schedule.rb
@@ -87,9 +89,30 @@ def run!
end
end
+ active_agents = Set.new
+ EM.run do
+ EM.add_periodic_timer(10) {
@cantino
cantino Feb 23, 2014

This might be too frequent, given that it's looping all Agents.

@slef
slef Feb 24, 2014

I agree... What would be better is to have AgentsController create, update and destroy send a message to the EM loop directly, e.g. using a pipe. That way we can remove the periodic timer altogether. Any suggestions on how to do this nicely?

@cantino
cantino Feb 25, 2014

Perhaps we could check for Agents that have been created or updated since a certain date, by

Agent.where("updated_at > ?", last_check_time)

just make sure we have an index on updated_at

@cantino
Owner

Thanks @slef. Can you add some specs for your new behavior?

@slef

I'm having some problems figuring out how to write specs for the agent running inside the EM loop. A long polling server could be simulated using webmock and a Queue or an EM::Queue:

@jsonqueue = Queue.new 
stub_request(:any, 'www.jsonexample.net').to_return { |request| @jsonqueue.pop }

Then the next http request should hold until I push something on the queue.
If I do this, and call em_start on an agent, how do I check that the event was created?
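The blocking-queue idea can be tried outside EM and webmock first. The sketch below uses plain threads and a hypothetical `FakeLongPollServer`: each poll blocks until a response is pushed, the poller records what it receives, and the assertion happens after the poller thread has been joined with a timeout. It is only a stand-in for the real spec setup.

```ruby
# Stand-in for a long-polling endpoint: `poll` blocks until a body is pushed.
class FakeLongPollServer
  def initialize
    @responses = Queue.new
  end

  def push(body)
    @responses << body
  end

  def poll
    @responses.pop # blocks the caller until something is available
  end
end

server = FakeLongPollServer.new
events = Queue.new

# The poller plays the role of the agent: every response becomes an "event".
poller = Thread.new do
  2.times { events << server.poll }
end

server.push('{"light": "on"}')
server.push('{"light": "off"}')
poller.join(5) # give up after 5 seconds rather than hang the spec

received = Array.new(events.size) { events.pop }
puts received.inspect
```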

@cantino
Owner
@alias1
Collaborator

@slef Did you ever figure out a nice way to write specs for this?

@slef

Not really... I'll try to get back to this within a couple of weeks. Any ideas?

@slef slef referenced this pull request Jun 2, 2014
Open

RESTful polling #52

@alias1
Collaborator

Unfortunately not off hand. Specs are my downfall as well :P

@cantino
Owner

@dsander or I can probably put something together. Is this way of doing EM Agents how we want to proceed? Should all agents head in the EM direction and phase out DJ?

@slef

I'm not sure but it looks like EM is a bit less flexible/accurate when it comes to scheduling tasks at specific times. But for all agents that generate events from other events, webhooks, streams, etc. (basically everything but cron jobs), I don't see why they shouldn't all be continuous!.
And if we can find a way to make scheduling accurate in EM, then we could put everything in there. Right?

@dsander
Collaborator

As I stated in #52, I am not the biggest fan of EM (limited to EM-enabled libraries, no additional concurrency possible), but it is much easier to implement than using threads, so I think it is OK to start with it.
However, I still believe the EM loop should be run in a separate process (or thread with the threaded workers).

@cantino cantino referenced this pull request Aug 21, 2014
Open

Dropbox Agent #321

@Jngai

@slef are you still working on this? Looks like it's almost done, minus the spec.

@slef

I have taken a break for a while. Feel free to take a shot at it if you like.
