ezmobius / nanite
Example setup is out of date for recent RabbitMQ versions
3 comments Created 8 months ago by prb
With RabbitMQ 1.5.4, some of the commands (permissioning) in examples/rabbitconf.rb fail. These failures don't affect the functioning of the examples, but they don't exactly inspire user confidence, either... There are also some places in the examples where hardcoded paths are still in place, e.g., examples/crew.rb.
Comments
-
Problem with example section "Test Nanite (finally)"
2 comments Created 8 months ago by masonlee
Looks like the file "nanite/examples/cli.rb" got moved to "nanite/examples/simpleagent/cli.rb". This causes two problems:
First, it breaks a path in cli.rb:
"require File.dirname(__FILE__) + '/../lib/nanite'" should become "require File.dirname(__FILE__) + '/../../lib/nanite'"
Second, the tutorial code on the main webpage in the section "Test Nanite (finally)" needs to change from "cd examples; ./cli.rb" to "cd examples/simpleagent; ./cli.rb"
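The effect of the move on the relative require can be checked with a small sketch. The checkout location '/repo/nanite' and the helper name resolved_require_path are made up for illustration; only the two relative paths come from the report.

```ruby
# Resolve the path a script would require, given the script's own location.
# '/repo/nanite' is a hypothetical checkout location used only for illustration.
def resolved_require_path(script_path, relative)
  File.expand_path(File.dirname(script_path) + relative)
end

script = '/repo/nanite/examples/simpleagent/cli.rb'

# Old relative path: one level up lands inside examples/, which has no lib/.
old_path = resolved_require_path(script, '/../lib/nanite')

# New relative path: two levels up reaches the repository root.
new_path = resolved_require_path(script, '/../../lib/nanite')

puts old_path
puts new_path
```

The old path resolves under examples/, which is why the require breaks after the file was moved one directory deeper.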
Comments
kennethkalmer
Tue Apr 28 14:21:51 -0700 2009
Updated in my branch and applied by ezmobius. Could be closed now
-
Getting a weird SSL error with ruby 1.9r129 on osx - any help / ideas would be appreciated!
$ rake spec /usr/local/lib/ruby19/site_ruby/1.9.1/openssl/ssl.rb:31: [BUG] Bus Error ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-darwin9.7.0] -- control frame ---------- c:0029 p:---- s:0080 b:0080 l:000079 d:000079 CFUNC :initialize c:0028 p:---- s:0078 b:0078 l:000077 d:000077 CFUNC :new c:0027 p:0063 s:0075 b:0075 l:000074 d:000074 CLASS /usr/local/lib/ruby19/site_ruby/1.9.1/openssl/ssl.rb:31 c:0026 p:0011 s:0073 b:0073 l:000072 d:000072 CLASS /usr/local/lib/ruby19/site_ruby/1.9.1/openssl/ssl.rb:23 c:0025 p:0011 s:0071 b:0071 l:000070 d:000070 CLASS /usr/local/lib/ruby19/site_ruby/1.9.1/openssl/ssl.rb:22 c:0024 p:0045 s:0069 b:0069 l:000068 d:000068 TOP /usr/local/lib/ruby19/site_ruby/1.9.1/openssl/ssl.rb:21 c:0023 p:---- s:0067 b:0067 l:000066 d:000066 FINISH c:0022 p:---- s:0065 b:0065 l:000064 d:000064 CFUNC :require c:0021 p:0059 s:0061 b:0061 l:000060 d:000060 TOP /usr/local/lib/ruby19/site_ruby/1.9.1/openssl.rb:22 c:0020 p:---- s:0059 b:0059 l:000058 d:000058 FINISH c:0019 p:---- s:0057 b:0057 l:000056 d:000056 CFUNC :require c:0018 p:0083 s:0053 b:0053 l:000052 d:000052 TOP /Users/ippy04/Code/Samples/nanite/lib/nanite.rb:7 c:0017 p:---- s:0051 b:0051 l:000050 d:000050 FINISH c:0016 p:---- s:0049 b:0049 l:000048 d:000048 CFUNC :require c:0015 p:0084 s:0045 b:0045 l:000044 d:000044 TOP /Users/ippy04/Code/Samples/nanite/spec/spec_helper.rb:6 c:0014 p:---- s:0043 b:0043 l:000042 d:000042 FINISH c:0013 p:---- s:0041 b:0041 l:000040 d:000040 CFUNC :require c:0012 p:0039 s:0037 b:0037 l:000036 d:000036 TOP /Users/ippy04/Code/Samples/nanite/spec/actor_registry_spec.rb:1 c:0011 p:---- s:0035 b:0035 l:000034 d:000034 FINISH c:0010 p:---- s:0033 b:0033 l:000032 d:000032 CFUNC :load c:0009 p:0012 s:0029 b:0029 l:000020 d:000028 BLOCK /usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/lib/spec/runner/example_group_runner.rb:15 c:0008 p:---- s:0026 b:0026 l:000025 d:000025 FINISH c:0007 p:---- s:0024 b:0024 l:000023 d:000023 CFUNC :each c:0006 p:0036 s:0021 b:0021 
l:000020 d:000020 METHOD /usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/lib/spec/runner/example_group_runner.rb:14 c:0005 p:0097 s:0017 b:0017 l:000016 d:000016 METHOD /usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/lib/spec/runner/options.rb:107 c:0004 p:0068 s:0012 b:0012 l:000011 d:000011 METHOD /usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/lib/spec/runner/command_line.rb:9 c:0003 p:0077 s:0007 b:0006 l:002614 d:002494 EVAL /usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/bin/spec:4 c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH c:0001 p:0000 s:0002 b:0002 l:002614 d:002614 TOP :43044 --------------------------- -- Ruby level backtrace information----------------------------------------- /usr/local/lib/ruby19/site_ruby/1.9.1/openssl/ssl.rb:31:in `initialize' /usr/local/lib/ruby19/site_ruby/1.9.1/openssl/ssl.rb:31:in `new' /usr/local/lib/ruby19/site_ruby/1.9.1/openssl/ssl.rb:31:in `' /usr/local/lib/ruby19/site_ruby/1.9.1/openssl/ssl.rb:23:in `' /usr/local/lib/ruby19/site_ruby/1.9.1/openssl/ssl.rb:22:in `' /usr/local/lib/ruby19/site_ruby/1.9.1/openssl/ssl.rb:21:in `' /usr/local/lib/ruby19/site_ruby/1.9.1/openssl.rb:22:in `require' /usr/local/lib/ruby19/site_ruby/1.9.1/openssl.rb:22:in `' /Users/ippy04/Code/Samples/nanite/lib/nanite.rb:7:in `require' /Users/ippy04/Code/Samples/nanite/lib/nanite.rb:7:in `' /Users/ippy04/Code/Samples/nanite/spec/spec_helper.rb:6:in `require' /Users/ippy04/Code/Samples/nanite/spec/spec_helper.rb:6:in `' /Users/ippy04/Code/Samples/nanite/spec/actor_registry_spec.rb:1:in `require' /Users/ippy04/Code/Samples/nanite/spec/actor_registry_spec.rb:1:in `' /usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/lib/spec/runner/example_group_runner.rb:15:in `load' /usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/lib/spec/runner/example_group_runner.rb:15:in `block in load_files' /usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/lib/spec/runner/example_group_runner.rb:14:in `each' 
/usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/lib/spec/runner/example_group_runner.rb:14:in `load_files' /usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/lib/spec/runner/options.rb:107:in `run_examples' /usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/lib/spec/runner/command_line.rb:9:in `run' /usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/bin/spec:4:in `' -- C level backtrace information ------------------------------------------- 0x117042 0 ruby19 0x00117042 rb_vm_bugreport + 82 0x2c21c 1 ruby19 0x0002c21c rb_warning + 444 0x2c27b 2 ruby19 0x0002c27b rb_bug + 43 0xbd37b 3 ruby19 0x000bd37b rb_enable_interrupt + 75 0x91e0b2bb 4 libSystem.B.dylib 0x91e0b2bb _sigtramp + 43 0xffffffff 5 ??? 0xffffffff 0x0 + 4294967295 [NOTE] You may encounter a bug of Ruby interpreter. Bug reports are welcome. For details: http://www.ruby-lang.org/bugreport.html rake aborted! Command /usr/local/bin/ruby19 -I"/usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/lib" "/usr/local/lib/ruby19/gems/1.9.1/gems/rspec-1.2.7/bin/spec" "spec/actor_registry_spec.rb" "spec/actor_spec.rb" "spec/agent_spec.rb" "spec/cached_certificate_store_proxy_spec.rb" "spec/certificate_cache_spec.rb" "spec/certificate_spec.rb" "spec/cluster_spec.rb" "spec/dispatcher_spec.rb" "spec/distinguished_name_spec.rb" "spec/encrypted_document_spec.rb" "spec/job_spec.rb" "spec/local_state_spec.rb" "spec/log_spec.rb" "spec/mapper_spec.rb" "spec/packet_spec.rb" "spec/rsa_key_pair_spec.rb" "spec/secure_serializer_spec.rb" "spec/serializer_spec.rb" "spec/signature_spec.rb" "spec/static_certificate_store_spec.rb" "spec/util_spec.rb" --format specdoc --colour failed
Comments
pmamediagroup
Tue Jul 28 16:02:23 -0700 2009
same problem with 1.8.7
Is this still an issue? If not I'd like to close it, since it doesn't seem to be related to Nanite?
pmamediagroup
Wed Nov 18 08:27:29 -0800 2009
No, it's an issue with the Mac.
-
Bug in mapper.rb where options are passed as wrong parameter (size)
2 comments Created 4 months ago by joshwilsdon
In mapper.rb the request function looks like:
request(type, payload = '', opts = {}, &blk)
and then passes the first 3 arguments in the same order to:
build_deliverable(deliverable_type, type, payload, opts)
with deliverable_type == Request. Then build_deliverable passes these arguments directly to:
deliverable_type.new(type, payload, opts)
but the problem is that Request has an initialize function that looks like:
initialize(type, payload, size=nil, opts={})
so the options that were passed in to the original request get passed in as the 'size' parameter, which obviously doesn't work. This causes any :selector or :target to be ignored, for example.
In our environment, this was causing problems when we passed the :target option because it was being ignored. Changing the build_deliverable function to make the call:
deliverable_type.new(type, payload, nil, opts)
fixed the issue.
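A minimal sketch of the positional-argument mix-up described above. The Request class here is a simplified stand-in that only copies the initialize signature quoted in the report, and the two build_deliverable_* helper names are hypothetical; none of this is the actual Nanite code.

```ruby
# Stand-in for Nanite::Request with the signature quoted in the report.
# This is a simplified illustration, not the real class.
class Request
  attr_reader :type, :payload, :size, :opts

  def initialize(type, payload, size = nil, opts = {})
    @type, @payload, @size, @opts = type, payload, size, opts
  end
end

# Behaviour described in the report: opts lands in the size slot.
def build_deliverable_buggy(deliverable_type, type, payload, opts)
  deliverable_type.new(type, payload, opts)
end

# Proposed fix: pass nil explicitly for size so opts reaches the opts slot.
def build_deliverable_fixed(deliverable_type, type, payload, opts)
  deliverable_type.new(type, payload, nil, opts)
end

opts  = { :target => 'nanite-bob' }
buggy = build_deliverable_buggy(Request, '/simple/echo', 'hi', opts)
fixed = build_deliverable_fixed(Request, '/simple/echo', 'hi', opts)

p buggy.size  # the options hash ended up here
p buggy.opts  # the default empty hash, so :target is lost
p fixed.opts  # the options hash, as intended
```

An alternative design would be to drop the positional size argument and carry it inside opts, which removes this class of bug, but that would change the Request signature.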
Comments
-
eventmachine not initialized: evma_send_data_to_connection (RuntimeError)
10 comments Created 2 months ago by leomayleomay
System configuration:
ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0]
amqp (0.6.0)
eventmachine (0.12.10)
rabbitmq 1.7.0
erlang 5.6.5
I followed the steps given in "Test Nanite (finally)".
First, I created a new agent called bob, http://gist.github.com/223056
Second, I created a new mapper in another shell, http://gist.github.com/223058
Comments
Could you please install EventMachine 0.12.8 and check if that works? I'd be curious if the problem is the just recently released 0.12.10 which we haven't tried with Nanite and the AMQP library yet.
leomayleomay
Sat Oct 31 06:22:47 -0700 2009
thank you, it works
Pretty odd that it'd break with a patch release of EventMachine. Thanks for testing, I'll look into it.
The solution is to use the latest AMQP (version 0.6.5) gem, it's up on gemcutter and removes a custom patch for EventMachine. I'll update the documentation accordingly.
leomayleomay
Mon Nov 02 16:45:38 -0800 2009
great info, thank you, mattmatt
leomayleomay
Mon Nov 23 23:00:12 -0800 2009
mattmatt, the problem is still there, environment:
amqp 0.6.5
eventmachine 0.12.10
nanite 0.4.1.13
leomayleomay
Mon Nov 23 23:07:42 -0800 2009
mattmatt, since I switched mod_rails to conservative spawning mode, it works quite well right now, but I'd still like it to work under smart mode if possible.
Interesting. Either way, I usually recommend running mappers as separate processes. That reduces this kind of pain quite considerably.
leomayleomay
Tue Nov 24 19:53:28 -0800 2009
mattmatt, for now, I've extracted Nanite related code out of Rails stack, and running it on a DRB server (basically, a separate process), it's working now. Thanks for the tips.
-
prefetch option only available for mapper, not for agent
4 comments Created about 1 month ago by kingcu
The prefetch option, which is a savior when you have high volumes of intense tasks, is only available on mappers. The mapper doesn't really benefit from having a prefetch limit, as the mapper receives mostly pings and registrations, which can be processed really fast. It is the agent that is going to predominantly be affected, and it should also have the option.
I went ahead and forked nanite, made the changes and am using it in production. This finally fixed my issue with overloading my agents to the point they were unresponsive and had to be killed. The fork is available on my github page if you are interested in bringing the change upstream.
In the fork, I simply added a configuration option for the agent, including adding the command line options for the nanite-agent script. I also documented the new config option in the init method, so aside from some review, it should be ready to go.
Comments
Cool, I'll merge it later today.
Any reason why you're checking if the prefetch method is there on the AMQP connection?
That was actually brought over from the mapper.rb method. I left it in because I am assuming some versions of the AMQP gem don't have that method.
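The guard discussed in these comments can be sketched roughly as follows. The two channel classes and the apply_prefetch helper are hypothetical stand-ins written for this example; they are not the AMQP gem's classes, and the real fork may differ.

```ruby
# Hypothetical stand-ins for two channel objects: one from a library version
# that supports prefetch, one from a version that does not.
class ChannelWithPrefetch
  attr_reader :prefetch_count

  def prefetch(count)
    @prefetch_count = count
    self
  end
end

class ChannelWithoutPrefetch
end

# Apply the limit only when the option is set and the channel supports it.
# Returns true if the limit was applied, false otherwise.
def apply_prefetch(channel, options)
  count = options[:prefetch]
  return false unless count && channel.respond_to?(:prefetch)
  channel.prefetch(count)
  true
end

new_channel = ChannelWithPrefetch.new
old_channel = ChannelWithoutPrefetch.new

applied_new = apply_prefetch(new_channel, :prefetch => 1)
applied_old = apply_prefetch(old_channel, :prefetch => 1)
```

The respond_to? check keeps the agent working with library versions that lack the method, at the cost of silently ignoring the option there; logging a warning in that branch may be worth considering.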
-
Nanite mapper seems to log all requests to agents as INFO:
[Sun, 15 Nov 2009 01:34:35 -0500] INFO: SEND [result] to TEST_I
[Sun, 15 Nov 2009 01:34:35 -0500] INFO: RECV [result] from TEST_II
Is there any way to control this? The log files grow like crazy when every request is logged. Not sure if this is a logging issue or the way we are using it.
Comments
It is indeed a logging issue. There was a merge that changed the logging in a way that everything would go super-noisy even in log level info. I might change that to only use debug for dumping every request, since it's bothering me as well. It is pretty useful to follow the trail of requests, but I agree that it's not great.
Agreed, it would make sense to move it to DEBUG. Any chance of a quick fix? We are looking to move to prod in the next week. Thanks ;)
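To illustrate the proposed change, here is a sketch using Ruby's standard Logger rather than Nanite's own logging wrapper, whose internals are not shown in this thread. Once per-request messages are emitted at DEBUG, a logger set to INFO drops them.

```ruby
require 'logger'
require 'stringio'

# Write to an in-memory buffer so the result can be inspected.
io  = StringIO.new
log = Logger.new(io)
log.level = Logger::INFO

# Per-request chatter moved to DEBUG: suppressed at INFO level.
log.debug('SEND [result] to TEST_I')

# Lifecycle messages stay at INFO: still written.
log.info('[setup] starting mapper')

output = io.string
puts output
```

Setting the level to Logger::DEBUG would bring the request trail back when it is needed for troubleshooting.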
-
comparison of Array with Array failed (ArgumentError)
16 comments Created about 1 month ago by taazza
Not sure if this is an amqp error or a nanite error, I have posted it on amqp as well.
/vendor/gems/gems/amqp-0.6.5/lib/amqp/buffer.rb:252:in `min': comparison of Array with Array failed (ArgumentError)
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/cluster.rb:137:in `each'
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/cluster.rb:137:in `min'
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/cluster.rb:137:in `least_loaded'
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/cluster.rb:23:in `__send__'
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/cluster.rb:23:in `targets_for'
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/mapper.rb:198:in `send_request'
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/mapper.rb:191:in `request'
from base_prog.rb:58:in `start'
from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/em/timers.rb:51:in `call'
from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/em/timers.rb:51:in `fire'
from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `call'
from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run_machine'
from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run'
from base_prog.rb:41:in `start'
from base_prog.rb:70
This happens a lot. And when it happens it continues to happen repeatedly every couple of minutes till a restart is done. Wondering if this has to do with rabbitmq/amqp or the state of the nanite.
Any thoughts would be greatly appreciated.
Thanks!
Comments
When did you update/install your Nanite gem? The current version on gemcutter.org is 0.4.12, and I've never seen that happen.
On a second note, I'll see how I go with the AMQP 0.6.5 gem today, but still, I'd encourage you to update your Nanite installation.
We use gem bundler and the current version of nanite on gemcutter is 4.1.10 http://gemcutter.org/gems/nanite Where are you seeing 0.4.12? Am I missing something here?
The 0.4.1.2 version is right there in the list. Version 0.4.1.10 is not the official Nanite gem. I'm afraid it's the RightScale fork and it's full of custom patches for the RightScale product and not properly tested from my point of view. Please install 0.4.1.2, and I'll talk to Ezra how that version ended up on Gemcutter.
Aah... 0.4.1.2! I was looking for 0.4.1 [12] as you had mentioned earlier.
When someone installs nanite, 0.4.1.[10] gets selected by default. No worries, I will give this a shot and hopefully the problem disappears!
Thanks! Pls try and get the logging issue in as well ;) Your help and prompt responses have been very helpful! Thanks a bunch! Pls close both issues once you are done with the build & push.
I'm assuming the updated gem will be posted on gemcutter. Thanks again!
The gem on gemcutter has been updated. Let me know if there are any problems.
No such luck. Tested it out with nanite-0.4.1.13 and after running for a few hours it runs into the same problem. Exception attached below
/home/test/v_0.1/vendor/gems/gems/amqp-0.6.5/lib/amqp/buffer.rb:252:in `min': comparison of Array with Array failed (ArgumentError)
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/cluster.rb:132:in `each'
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/cluster.rb:132:in `min'
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/cluster.rb:132:in `least_loaded'
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/cluster.rb:22:in `__send__'
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/cluster.rb:22:in `targets_for'
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/mapper.rb:193:in `send_request'
from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/mapper.rb:186:in `request'
from tester.rb:58:in `start'
from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/em/timers.rb:51:in `call'
from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/em/timers.rb:51:in `fire'
from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `call'
from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run_machine'
from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run'
from tester.rb:41:in `start'
from tester.rb:70
Had to reboot the machine.
As for logging: the mapper is all set, the INFO issue has disappeared. But the agent still logs the request as INFO:
[Sat, 21 Nov 2009 03:35:44 -0500] INFO: SEND [result]
[Sat, 21 Nov 2009 03:35:44 -0500] INFO: RECV [result]
leading to big log files. Pls reopen this issue. Thx
Are you using Redis as state storage?
Somehow the status of an agent comes out as an array from the state storage. It would help me to find out what's going on if you could patch the cluster.rb at line 132 to output a[1] and b[1]. Otherwise it'd get hard for me to debug. I'll have a hard look at the data coming into the state store, but it'd be easier to figure out.
I'll look into the agent logging as well, I thought I got them all.
Nope, not using Redis. Let me patch and rebuild the gem and test it out.
I will send you the logs soon. I don't understand why I have to restart the machine for the problem to disappear. Anyway, thanks for taking a look at the issue; we are out of bandwidth to contribute at the moment.
We will pitch in soon. Thanks for all your effort/help. Cheers!
I printed the candidates variable.
When you start the mapper and everything is fine, here is what gets printed:
INFO: [ARGUMENT_ERROR_PATCH] candidates ->
nanite-SMEBARUTHI
  timestamp: 1258956831
  tags:
  status: 0.0
  services: /masala/process, /thadka/process, /lao/process, /test/execute, /vayudooth/process, /khale/process, /thadayam/process
nanite-ROJA
  timestamp: 1258956827
  tags:
  status: 0.0
  services: /masala/process, /thadka/process, /lao/process, /test/execute, /vayudooth/process, /khale/process, /thadayam/process
And when things go wrong and the array compare failed error pops up, this is what gets printed:
INFO: [ARGUMENT_ERROR_PATCH] candidates ->
nanite-SMEBARUTHI
  timestamp: 1259006907
  tags:
  status: no status [THIS SEEMS TO BE THE ISSUE - no value, instead [no status] gets printed]
  services: /masala/process, /thadka/process, /lao/process, /test/execute, /vayudooth/process, /khale/process, /thadayam/process
nanite-ROJA
  timestamp: 1259006915
  tags:
  status: 0.46
  services: /masala/process, /thadka/process, /lao/process, /test/execute, /vayudooth/process, /khale/process, /thadayam/process
Hope this helps.
Sorry for the delay on this one. The problem seems to be that your agent is incapable of executing the command uptime on the machine it's running. What operating system is it, or what happens when you fire up a small Ruby script and just put `uptime` in it? Either way, the mapper needs to be fixed to not use the status value when it's just "no status".
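One way the mapper-side fix mentioned here could look is sketched below. The data shape mirrors the candidates output quoted earlier in this thread, but the least_loaded function is a hypothetical illustration written for this example, not the code in cluster.rb.

```ruby
# Hypothetical agent state, shaped like the candidates output in this thread:
# identity => attributes.
candidates = {
  'nanite-SMEBARUTHI' => { :status => 'no status' },
  'nanite-ROJA'       => { :status => 0.46 }
}

# Pick the agent with the lowest numeric status, skipping agents whose
# status is not a number. Returns nil when no agent has a usable status.
def least_loaded(candidates)
  usable = candidates.select { |_identity, state| state[:status].is_a?(Numeric) }
  usable.min_by { |_identity, state| state[:status] }
end

p least_loaded(candidates)
```

Comparing a Float with the String 'no status' via <=> returns nil, which is what makes a block-based min fail; filtering first avoids the comparison entirely. Whether agents without a status should be skipped or treated as fully loaded is a design decision this sketch does not settle.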
Matt, we are on the Ubuntu 8.04 (Hardy) release. When we re-fire the mapper, it runs for a while before it runs into the problem again.
This repeats till we reboot the system.
Could you try overwriting the default status proc with a debug message, so I can see what the problem might be? Would be nice to fix the root cause of this. Need to change this in the agent's init.rb file, and then watch the log file when it happens again.
status_proc = lambda do
  begin
    parse_uptime(`uptime`)
  rescue
    Nanite::Log.error($!)
    'no status'
  end
end
Thanks!
-
Mapper unable to send requests from Merb (mongrel adapter)
2 comments Created 5 months ago by highandwild
I have started the mapper in config/init.rb but Nanite.request() simply returns false without any error message. The agent does not receive the request, either.
But it works fine if I test it with nanite-mapper!
Here's the code: http://gist.github.com/147641
ruby - 1.8.7
mongrel - 1.1.5
nanite - 0.4.1 (edge)
amqp - 0.6.0
rabbitmq - 1.5.5
Arch - x64
OS - Cent OS 5.1 (virtual box)
Comments
highandwild
Thu Jul 16 01:46:32 -0700 2009
Update: I have no idea how (perhaps because I downgraded to ruby 1.8.6) but it seems to finally queue the job. Now the problem is that it just queues it and it never reaches the agent ! (I'm running it inside a merb console).
Nanite.mapper.job_warden.inspect shows this:
#<Nanite::JobWarden:0x2aaaabef7790 @serializer=#<Nanite::Serializer:0x2aaaabef8208 @serializers=[Marshal, JSON, YAML]>, @jobs={"b16084315b1e87436c37493e656f6f60"=>#<Nanite::Job:0x2aaaae71b370 @results={}, @intermediate_handler=nil, @request=#<Nanite::Request:0x2aaaae71b668 @persistent=false, @reply_to="mapper-d201c7294713d6f8d4ef1daa1cf793a6", @type="/title/search_by_title", @target=nil, @token="b16084315b1e87436c37493e656f6f60", @tags=[], @selector=:least_loaded, @payload={:term=>"Harry", :id=>82523}, @from="mapper-d201c7294713d6f8d4ef1daa1cf793a6">, @intermediate_state={}, @token="b16084315b1e87436c37493e656f6f60", @targets=["nanite-67a7b8cf51fe5131cc3096a6b951331a"], @completed=nil, @pending_keys=[]>}>
And yes, I am getting a heartbeat from the agents. Also, Nanite.mapper.cluster.nanites lists the nanites!
What am I doing wrong ?
highandwild
Mon Jul 20 04:03:02 -0700 2009
Tested it with sinatra (mongrel), and it works perfectly. Also upgraded to amqp edge. Still Zero luck with merb :( I also ran merb in debug mode but can't find anything amiss...
-
load average / status function not updated with heartbeat messages when using Redis
1 comment Created 4 months ago by joshwilsdon
When using Redis, the nanite- key gets set with the load_average when the node registers, but it is not updated after that by heartbeat messages. It seems that the intention is that the heartbeat/ping sends the load average so that the load average will be put in Redis. This is not happening because the code in cluster.rb (handle_ping) is doing:
if nanite = nanites[ping.identity]
  nanite[:status] = ping.status
but nanites[ping.identity] returns an anonymous Hash, so updating it here does nothing. As such, this value is never sent to Redis. I have confirmed that hacking an update_status function into the Nanite::State class (which just updates the nanite- key in Redis) and then calling it in handle_ping as:
nanites.update_status(ping.identity, ping.status)
causes the value in Redis to be updated at every heartbeat. Was it the intended behavior for this function (handle_ping) to update Redis? The comment seems to indicate that is the case.
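The difference between mutating a returned copy and writing back to the store can be shown with a small sketch. CopyingStore is a hypothetical in-memory class that mimics a backend handing out a fresh Hash on every read, as a store that deserializes from Redis would; it is not the real Nanite::State.

```ruby
# Hypothetical store that hands out a fresh Hash on every read, mimicking a
# backend that deserializes values from an external store such as Redis.
class CopyingStore
  def initialize
    @data = {}
  end

  def []=(identity, attrs)
    @data[identity] = attrs.dup
  end

  def [](identity)
    attrs = @data[identity]
    attrs && attrs.dup
  end

  # Explicit write-back, analogous to the update_status hack described above.
  def update_status(identity, status)
    @data[identity][:status] = status if @data[identity]
  end
end

nanites = CopyingStore.new
nanites['nanite-bob'] = { :status => 0.1 }

# Mutating the returned copy never reaches the store.
nanite = nanites['nanite-bob']
nanite[:status] = 0.9
after_mutation = nanites['nanite-bob'][:status]

# Writing through the store's own method does.
nanites.update_status('nanite-bob', 0.9)
after_update = nanites['nanite-bob'][:status]

p after_mutation
p after_update
```

With a purely in-memory Hash of Hashes the in-place mutation would work, which may be why the problem only shows up with the Redis-backed state.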
Comments
-
Suppose a cli-mapper sends a push to a client-agent:
push('/client_agent/foo', "hi")
And this client-agent sends a push (or request) to a main-agent:
push('/main_agent/foo', ["bar"])
The main agent receives 2 requests instead of one. If I have, for instance, 8 thins running, the main agent receives 10 exactly equal requests.
It looks like the requests are multiplied by the number of mappers online at that time.
Comments
I'm guessing you're not using Redis, correct? If you aren't, all mappers will get the agent's request, and therefore all of them will forward it to an agent of their own. When using Redis, only one mapper will get the request and forward it. It's probably a matter of discussion whether this is a bug or a feature. I guess it wouldn't hurt to enable exclusiveness on the request queue when not using Redis as well, but there may be situations where not all mappers know the appropriate agents yet to handle that request. That again wouldn't hurt too much when using an offline queue, since that'd eventually lead to the right agent getting the message.
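A toy model of the behaviour described in this reply, with no AMQP involved: when every mapper consumes from its own queue, each one forwards the agent's request, whereas a single shared queue hands the message to one consumer only. The deliveries function and its parameters are invented for this illustration.

```ruby
# Count how many copies of one agent request reach the main agent.
# mapper_count: number of mappers online
# shared_queue: true when all mappers consume from a single queue
def deliveries(mapper_count, shared_queue)
  received = 0
  forward  = lambda { received += 1 }

  if shared_queue
    # One shared queue: the broker hands the message to a single consumer.
    forward.call
  else
    # One queue per mapper: every mapper gets a copy and forwards it.
    mapper_count.times { forward.call }
  end

  received
end

per_mapper = deliveries(10, false)
shared     = deliveries(10, true)

p per_mapper
p shared
```

This matches the report of 10 identical requests with 10 mappers online (8 thins plus the others), under the assumption that each of them had its own request queue.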
Hi Matt, thanks for the quick reply.
Yup, I'm not using Redis. Hm, lots of options:
Use Redis, because this will really be a bug here, heh...
Try to play with a Tokyo adapter (already got it running on the server); Redis is a key/value store, right?
Or, if it's not too much to ask, could you give me some directions on how to disable this "feature" and rely on the offline queue?
Redis is key/value, that's correct. I don't have a solution for it per se. A quick fix would be to look into cluster.rb. In setup_request_queue it's using an exclusive queue when using Redis. You could try always using that, i.e. remove the shared_state? check. The offline queue is a simple command line switch (--offline-failsafe) for the nanite-agent and a parameter for the mapper class (:offline_failsafe => true). It's just an additional sanity check which you should do anyway when you're relying on your messages being delivered.
The worst case scenario should be avoidable when using the offline queue and only having one mapper pick up the message from the request queue. Let me know if that works. Maybe it's worth considering getting that fix in.
Oops, now I realize I was private messaging Matt. Sorry man.
Same issue with Redis enabled. =/
UPDATE: I've tried editing setup_request_queue in every way I could imagine, same result.
Btw, got 2 specs failing too, something about the ProxyMapper instance not being erased...
That's a bit odd. We had to fix some issues with Redis and internal timeouts, and that solved it for us; since then only one mapper gets the request and forwards it. I'd need some more log output from the mapper logs to get a better look at what's going on.
Gosh, I'm embarrassed now. There was a mapper running I didn't see (w/o Redis).
It's working fine with Redis. Really sorry, need to sleep, nanite is giving me some insomnia... (and it feels great).
Thank you Matt, I owe you a (or some) beer. Just let me know when you come to Brazil.
Just to confirm: after removing the -#{identity} and the exclusive option of the amq.queue request, it works. Only one request, and without Redis.
I'll be happy to work on a patch to make this an "option", if anyone is interested or no better solution comes to light. In the meantime, I will keep a fork to make it easy to install on my servers.
Thanks Matt, thanks all you nanite devs, you guys rock!
I've added an option to the mapper init. Well, it works; will try in production this weekend.
http://github.com/nofxx/nanite/commit/7804058cf297088f063cf5d1d2695c8b15ab71a0
Gonna write/fix the specs soon.. heh, sorry about the emacs whitespace cleanup too.
Wow, finally, looks like it's working now! ;)
It's only calling it once; offline_failsafe ensures that some agent will find the request. All good.
Was having a weird problem with some actors that use ActiveRecord: they just stopped advertising their methods. The problem was I didn't know about single-threaded... working fine now.
I've added those stones I found on my path to the wiki. Again, thanks!
Good to be on nanite hehe..
Just an update: albeit working flawlessly for weeks, on deploy something strange happens (sometimes) with the Rails mappers, if I'm not wrong:
heartbeat-19018d26cdb64d27e25c55d007e73ebb 8149
heartbeat-25941f2a7e262e05e05c2349f08ff468 8153
....
Heartbeats start to accumulate until god restarts RabbitMQ, then everything gets back to normal... heh, weird.
But heartbeat is about to be gone, right? heh
-
I ran into some problems when using nanite. I think they may be caused by my gem environment.
Can you tell me the gem versions you are running?
One problem I met:
After I start up the agent and mapper using the simple-agent example from the source code, the agent-side log stopped at:
[Fri, 06 Nov 2009 17:07:01 +0800] INFO: SEND [register] d72993a0de1aed09f42dd291e5046d3d, services: /simple/echo, /simple/time, /simple/gems, /simple/yielding, /simple/delayed, tags:
And on the mapper side the log stopped at:
[Fri, 06 Nov 2009 17:07:11 +0800] INFO: [setup] starting mapper
After a while the mapper goes down with this error:
/opt/local/lib/ruby/gems/1.8/gems/eventmachine-0.12.8/lib/eventmachine.rb:811:in `connect_server': no connection (RuntimeError)
from /opt/local/lib/ruby/gems/1.8/gems/eventmachine-0.12.8/lib/eventmachine.rb:811:in `reconnect'
from /opt/local/lib/ruby/gems/1.8/gems/amqp-0.6.0/lib/amqp/client.rb:172:in `reconnect'
from /opt/local/lib/ruby/gems/1.8/gems/amqp-0.6.0/lib/amqp/client.rb:85:in `call'
from /opt/local/lib/ruby/gems/1.8/gems/amqp-0.6.0/lib/amqp/client.rb:85:in `unbind'
from /opt/local/lib/ruby/gems/1.8/gems/eventmachine-0.12.8/lib/eventmachine.rb:995:in `call'
from /opt/local/lib/ruby/gems/1.8/gems/eventmachine-0.12.8/lib/eventmachine.rb:995:in `run_deferred_callbacks'
from /opt/local/lib/ruby/gems/1.8/gems/eventmachine-0.12.8/lib/eventmachine.rb:995:in `times'
from /opt/local/lib/ruby/gems/1.8/gems/eventmachine-0.12.8/lib/eventmachine.rb:995:in `run_deferred_callbacks'
from /opt/local/lib/ruby/gems/1.8/gems/eventmachine-0.12.8/lib/eventmachine.rb:242:in `run_machine'
from /opt/local/lib/ruby/gems/1.8/gems/eventmachine-0.12.8/lib/eventmachine.rb:242:in `run'
from simpleagent/cli.rb:21
My environment is EventMachine 0.12.8, amqp 0.6.0, and rabbitmq 1.7.1.
I wondered whether it was a RabbitMQ permission problem, so I re-installed RabbitMQ and ran "rabbitconf.rb", but the problem remains.
Here is the log from running "rabbitconf.rb":
Setting permissions for user "mapper" in vhost "/nanite" ...
...done.
Setting permissions for user "nanite" in vhost "/nanite" ...
...done.
Listing users ...
guest
mapper
nanite
...done.
Listing vhosts ...
/
/nanite
...done.
Listing permissions in vhost "/nanite" ...
mapper . . .
nanite . . .
...done.
Comments
Can you be a bit more specific? Do you mean the installed dependencies of Nanite? Do you get a specific error message at some point while trying it out?
If so, you need either EventMachine 0.12.8 and the amqp gem < 0.6.5 or EventMachine 0.12.10 and the amqp gem = 0.6.5.
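The two supported combinations stated in this reply can be encoded as a quick check. The compatible? helper is a hypothetical name for this sketch; it uses only Gem::Version from RubyGems and reflects nothing beyond the two pairs listed above.

```ruby
require 'rubygems'

# Returns true when the EventMachine/amqp pair matches one of the two
# combinations listed in the reply above.
def compatible?(em_version, amqp_version)
  em   = Gem::Version.new(em_version)
  amqp = Gem::Version.new(amqp_version)

  # EventMachine 0.12.8 with an amqp gem older than 0.6.5
  old_pair = em == Gem::Version.new('0.12.8') && amqp < Gem::Version.new('0.6.5')
  # EventMachine 0.12.10 with exactly amqp 0.6.5
  new_pair = em == Gem::Version.new('0.12.10') && amqp == Gem::Version.new('0.6.5')

  old_pair || new_pair
end

p compatible?('0.12.8', '0.6.0')
p compatible?('0.12.10', '0.6.0')
p compatible?('0.12.10', '0.6.5')
```

Gem::Version compares version segments numerically, so 0.12.10 sorts after 0.12.8, which a plain string comparison would get wrong.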
-
I have been struggling for several days with a problem in my agents. Randomly, they will stall and use 100% of the CPU. strace reveals the agents are just context switching and doing nothing:
--- SIGVTALRM (Virtual timer expired) @ 0 (0) --- rt_sigreturn(0) = 40001616
--- SIGVTALRM (Virtual timer expired) @ 0 (0) --- rt_sigreturn(0) = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) --- rt_sigreturn(0) = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) --- rt_sigreturn(0) = 1
--- SIGVTALRM (Virtual timer expired) @ 0 (0) --- rt_sigreturn(0) = 1
--- SIGVTALRM (Virtual timer expired) @ 0 (0) --- rt_sigreturn(0) = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) --- rt_sigreturn(0) = 101
--- SIGVTALRM (Virtual timer expired) @ 0 (0) --- rt_sigreturn(0) = 0
I have tried everything: modified agents to use epoll rather than select, tried ruby enterprise edition and ruby1.9 (they remove the syscalls in strace, but agents still lock). I cannot discern a pattern or reason the agents lock specifically, meaning the job they lock on isn't consistent ASIDE from happening during a job that utilizes net/http to pull down some images and stitch them together.
I thought it might be an issue with calling sleep() inside the agents, but that didn't solve anything. I really have no idea where to go from here.
Pastie to my agent code: http://pastie.org/702881
Pastie to image fetch/stitch code: http://pastie.org/702895
On the plus side, I'll be able to give you a quick modification to nanite that causes it to use epoll, which dropped my CPU utilization a hair while performing a large amount of jobs! Any ideas on where to start even looking from here would be appreciated, otherwise I am going to just start commenting out code until something changes (the worst way to debug!).
Comments
I wish I had any idea where to start. What's the number of messages you're seeing? Please also run the agents with debug log mode so you can at least see what the last of their activities is. I'd like to get Nanite a lot more bullet-proof in that regard.
Also, what EventMachine and AMQP version are you using?
Whoops, forgot to mention the particulars:
AMQP 0.6.5
EM 0.12.10 (same happens with 0.12.8)
Happens when I push through a group of 200 or so jobs, with prefetch set at 1, so only one job is on the agent at a time. It doesn't seem to stall on any particular piece of code that I can discern. Additionally, strace shows absolutely 0 activity outside of the SIGVTALRM syscalls, even though the CPU is pegged, which is beyond me. From my understanding, this means all threads have finished. This is why I think there is something odd going on with Nanite/AMQP/EM, because it's inconsistent and they think work is done before it is done.
I have been reading up on ltrace and more detailed strace use, so I'll have a bit more something to go off here soon I think.
I'll try to set up something to fire similar jobs. Maybe I can reproduce it.
No rush, super busy for next couple weeks and it's working great in production. Been running couple thousand jobs a day through it with no hiccups, so it's just an issue of pushing too much of the same job (possibly, I do batches of same jobs when reprocessing failed jobs). I am thinking it may be an issue with net/http at this point (no solid reason why) so will try em/http since EM is already loaded. I'll let you know the results and go from there.
On a possibly unrelated note: when stracing ruby processes, I noticed a TON of ENOENT exceptions for required library files as it checks through the namespaces for the rb file. Meaning for example, it will look for parser.rb in the nanite gems directory, then doesn't find it and checks in the AMQP gems directory and on down until it finds optparser in ruby core lib directories. These libraries are checked for during execution of every job and probably add a great deal to the overhead of running a pile of jobs. No idea if this is a ruby or a nanite issue at this point (ruby is my guess), but it's also on my list of things to investigate.
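The lookup behaviour described above can be approximated in plain Ruby. This is only a rough sketch: `probe_load_path` is a hypothetical helper, the extension list is an assumption, and the real search inside `require` depends on the Ruby version and on RubyGems. It lists the paths a lookup could try across `$LOAD_PATH`, which is where the ENOENT results in strace would come from. A feature that is already recorded in `$LOADED_FEATURES` is normally not searched for again, so repeated probing on every job may point at `load`, or at requires using differing names, rather than at a plain repeated `require`.

```ruby
# Return [path, exists?] pairs for every place a lookup of `feature`
# could probe, in $LOAD_PATH order. The extensions are an assumption;
# the real set depends on the platform and Ruby version.
def probe_load_path(feature, extensions = %w[.rb .so])
  $LOAD_PATH.flat_map do |dir|
    extensions.map do |ext|
      path = File.join(dir.to_s, feature + ext)
      [path, File.exist?(path)]
    end
  end
end

probes = probe_load_path('optparse')
# Every entry before the first existing file corresponds to one failed probe.
misses = probes.take_while { |_path, exists| !exists }
puts "#{misses.length} misses before the first hit (or the end of the search)"

# Check whether the feature has already been loaded in this process.
loaded = $LOADED_FEATURES.any? { |f| f.end_with?('/optparse.rb') }
puts "already loaded? #{loaded}"
```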
-
I'm working on a test for our Nanite agent, but the requests initiated by Nanite.request are all async, which means I get nothing back from my agent before the test finishes.
Can I initiate a sync request? Thank you.
Comments
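One generic way to make a test wait for an asynchronous callback is to hand the result over through a thread-safe queue and block on it with a timeout. The sketch below does not use Nanite at all: `async_request` is a hypothetical stand-in that calls its block from another thread, and `sync_request` is an illustrative wrapper, not a Nanite method. It also assumes the callback fires on a different thread from the one that is waiting; blocking inside a single-threaded event loop would stall the loop instead.

```ruby
require 'thread'
require 'timeout'

# Hypothetical stand-in for an asynchronous request API: it returns
# immediately and invokes the block later from another thread.
def async_request(payload, &callback)
  Thread.new do
    sleep 0.05                         # simulate agent/network latency
    callback.call("echo: #{payload}")
  end
  nil
end

# Block the caller until the callback fires or the timeout elapses.
# Raises Timeout::Error if no result arrives in time.
def sync_request(payload, timeout_seconds = 2)
  results = Queue.new                  # thread-safe; pop blocks until a push
  async_request(payload) { |res| results << res }
  Timeout.timeout(timeout_seconds) { results.pop }
end

puts sync_request('ping')
```

In a spec, the value returned by the wrapper can then be asserted on directly, and a Timeout::Error turns a missing reply into a test failure rather than a silent pass.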
-
Two questions
When a nanite agent times out, is there a way to detect it in code? Also, sometimes the agents come back alive after a long-running task. We want to strike a balance between the number of agents and timeouts. Any suggestions/ideas?
Also, is there a built-in way to gracefully stop agents? Something like nanite-agent --token "test" stop --force-after 60 seconds
I did read an email from mattmatt a while ago. Thoughts?
Comments
There are callbacks available for timeout, register and unregister in the mapper; the RDoc for Mapper#start has all the info.
Planned feature for me, since we need something similar. I'd like to get some Unix-y stuff like Unicorn has into Nanite so that you can gracefully shutdown and bring up new agents in the meantime.
Regarding the first issue, is this functionality available outside of the mapper? That would be ideal. How does nanite-admin get hold of it?
Just curious: how do you deal with creating new agents to scale? Timeouts impact that decision for us big time.
That functionality is only available in the mapper. If you need to react to it by some other means, I'd say the best way is to fire off a new request to a management agent.
I am assuming management agent is a different agent that you have only for triggering off processes/agents. Could you pls elaborate? Right now we are using only the mapper & the general purpose agents to take care of tasks.
A management agent is an agent that you implement, its only difference from your other agents could be that it fires up new agents on your systems. Imagine a setup where each machine runs a management agent and a couple of working agents. You can push a task to the management agent whenever you want to fire up new working agents or to kill some of them. It's nothing Nanite offers at the moment, but I thought about it and it sure would be a nice feature to have. Until then, that'd be my idea of automating the process.
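As a rough illustration of what such a management agent could delegate to, here is a plain-Ruby sketch that starts worker processes and stops them gracefully, escalating to a hard kill after a grace period (similar in spirit to the "--force-after 60" idea earlier in this thread). `WorkerManager` and its methods are hypothetical names, nothing here is part of Nanite, and the worker command is a placeholder; in a real setup the methods would be called from an actor that receives the start/stop requests.

```ruby
require 'rbconfig'
require 'timeout'

# Hypothetical helper for a "management agent". It only tracks child
# processes that it started itself on the local machine.
class WorkerManager
  attr_reader :pids

  def initialize(command)
    @command = command   # argv array used to launch one worker
    @pids = []
  end

  # Launch one more worker process and remember its pid.
  def start_worker
    pid = Process.spawn(*@command)
    @pids << pid
    pid
  end

  # Ask a worker to exit with TERM; send KILL if it has not exited
  # within force_after seconds.
  def stop_worker(pid, force_after = 60)
    Process.kill('TERM', pid)
    begin
      Timeout.timeout(force_after) { Process.wait(pid) }
    rescue Timeout::Error
      Process.kill('KILL', pid)
      Process.wait(pid)
    end
    @pids.delete(pid)
    true
  end
end

# Placeholder worker: a Ruby process that just sleeps.
manager = WorkerManager.new([RbConfig.ruby, '-e', 'sleep 30'])
pid = manager.start_worker
manager.stop_worker(pid, 5)
puts "workers still tracked: #{manager.pids.length}"
```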
We have been thinking exactly along these lines! It would be a great nanite feature. I guess you have to get the mapper to listen to the hook and fire off a request to this agent (& hope & pray that this agent is running ;)). Right now the array issue is the one that's killing us. We will have to get that fixed before we get to this one. I will keep you posted.
-
nanite does not start up in ruby1.9
5 comments Created about 1 month ago by pmamediagroup
nanite won't start up in ruby1.9 without some code changes. nanite uses FileUtils, which does not seem to be available by default in 1.9. I have sent a pull request with a fix to the issue but never got a response (probably with good reason). In any case, with the change nanite runs flawlessly on 1.9. We have it running on several servers without any issues once that change is made.
Comments
I'll check it out. I thought I had caught most of the Ruby 1.9 issues, but I'll look into your fork.
mattmatt: I fixed this on mine; it's 1.9 getting confused about where FileUtils is. It thinks it's a Nanite class. Putting a "require 'fileutils'" inside nanite.rb solved the issue for me. There is probably a better place for the require, but I didn't feel like digging through and finding all occurrences of FileUtils inside Nanite and confining the require to those.
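A minimal sketch of that fix, using a hypothetical `Example` module in place of Nanite's own code: when `fileutils` has not been required, a reference to `FileUtils` from inside a module typically fails with a NameError that names the enclosing namespace, which is likely why it looks as if Ruby "thinks it's a Nanite class". Requiring the library explicitly resolves this; the leading `::` is optional and only makes the top-level lookup explicit.

```ruby
require 'fileutils'   # load the standard library explicitly
require 'tmpdir'

# Hypothetical namespace standing in for a library module.
module Example
  # Create a log directory under base and return its path.
  def self.prepare_dir(base)
    path = File.join(base, 'log')
    ::FileUtils.mkdir_p(path)   # leading :: forces a top-level constant lookup
    path
  end
end

Dir.mktmpdir do |tmp|
  puts File.directory?(Example.prepare_dir(tmp))
end
```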
pmamediagroup
Wed Nov 18 12:59:32 -0800 2009
I think it's global to 1.9? irb doesn't recognize fileutils either until it's required, at least on our system. We put ours in nanite.rb too.
pmamediagroup
Tue Nov 24 07:15:13 -0800 2009
sent pull request
pmamediagroup
Tue Nov 24 07:17:24 -0800 2009
btw: this only happens when clustering...





Ran into the same thing here.
Summary of similarities and differences:
sudo rabbitmqctl add_vhost /nanite
sudo rabbitmqctl add_user mapper testing
sudo rabbitmqctl add_user nanite testing
Version 1.5.x
sudo rabbitmqctl map_user_vhost mapper /nanite
sudo rabbitmqctl map_user_vhost nanite /nanite
Version 1.6.x
sudo rabbitmqctl set_permissions -p /nanite mapper '.*' '.*' '.*'
sudo rabbitmqctl set_permissions -p /nanite nanite '.*' '.*' '.*'
Yeah, it is unfortunate that rabbit has changed between versions. The scripts right now run both sets of commands, so some of them will appear to fail, but in the end you are left with a working rabbitmq server setup, because the proper command for whichever version of rabbit you have does get run.
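An alternative to running both sets of commands would be to pick the permission command from the RabbitMQ version. The sketch below only builds the command strings quoted in this thread and does not execute anything. `rabbit_setup_commands` is a hypothetical helper, the 1.6 cutoff is an assumption taken from the "1.5.x" / "1.6.x" labels above, and how the version string is obtained is left open; prefix the commands with sudo as needed.

```ruby
# Build the rabbitmqctl commands for a given RabbitMQ version string.
# Versions below 1.6 are assumed to use map_user_vhost; later ones
# are assumed to use set_permissions.
def rabbit_setup_commands(version, vhost = '/nanite', users = %w[mapper nanite])
  major, minor = version.split('.').map(&:to_i)
  minor ||= 0
  legacy = ([major, minor] <=> [1, 6]) < 0

  commands = ["rabbitmqctl add_vhost #{vhost}"]
  users.each { |user| commands << "rabbitmqctl add_user #{user} testing" }
  users.each do |user|
    permission = if legacy
                   "rabbitmqctl map_user_vhost #{user} #{vhost}"
                 else
                   "rabbitmqctl set_permissions -p #{vhost} #{user} '.*' '.*' '.*'"
                 end
    commands << permission
  end
  commands
end

puts rabbit_setup_commands('1.5.4')
puts rabbit_setup_commands('1.6.0')
```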
I'm going to close this ticket with a caveat that you can safely ignore the errors and still end up with a working system.