Riak 1.1.2 RC3 race error messages in 2I queries [JIRA: RIAK-2400] #49

Closed
slfritchie opened this issue Jul 28, 2012 · 5 comments


@slfritchie
Contributor

These two error messages pop up very occasionally in Riak 1.1.2 RC3, x86_64 CentOS 6. They appear to be harmless, but they're logged at error severity, which is by definition bad enough.

  20:35:33.789 [error] Supervisor riak_pipe_fitting_sup had child undefined started with riak_pipe_fitting:start_link() at <0.23055.9> exit with reason noproc in context shutdown_error
  20:35:39.956 [error] Unrecognized message {'DOWN',#Ref<0.0.13.243873>,process,<0.1549.10>,normal}
@slfritchie
Contributor Author

basho_bench config to drive things:

  {operations, [
      {get_pb,           1},
      {{query_pb, 100},  1},
      {{query_pb, 1000}, 1},
      {{put_pb, 100},    1},
      {{put_pb, 1000},   1}
  ]}.
  {driver, basho_bench_driver_2i}.

  {measurements, [
      {memory, 1000},
      {cpu, 1000},
      {processes, 1000},
      {filehandles, 1000}
  ]}.
  {measurement_driver,
    basho_bench_measurement_erlangvm}.

  {mode, {rate, 15}}.

  {concurrent, 15}.

  {duration, 15}.

  {code_paths, [
      "deps/riakc",
      "deps/protobuffs",
      "deps/mochiweb"
  ]}.

  {key_generator,
      {uniform_int, 10000000}}.

  {value_generator,
      {fixed_bin, 1000}}.

  {pb_ips, [
      {127,0,0,1}
  ]}.

  {pb_port, 8087}.

  {http_hosts, [
      "127.0.0.1"
  ]}.

  {http_port, 8098}.

  {nodes, [
      'riak@127.0.0.1'
  ]}.

  {cookie, riak}.

  {rng_seed, {31, 22, 15}}.

@beerriot
Contributor

beerriot commented Aug 1, 2012

For the first error, about riak_pipe_fitting exiting with reason noproc, please retest with the patch attached to issue #48. I think that noproc is not coming from the fitting process, but is instead coming from the death of the input-sending process that is trying to link to the already-finished index FSM.
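The failure mode described here, a process dying with `noproc` because it tries to link to a process that has already exited, can be reproduced in isolation. The Erlang sketch below is a hypothetical module unrelated to the actual riak_pipe or riak_kv code; the spawned fun merely stands in for an index FSM that has finished. It traps exits so the reason arrives as an `'EXIT'` message; a caller that does not trap exits (like the input sender) would instead crash with reason `noproc`.

```erlang
-module(link_noproc_demo).
-export([run/0]).

%% Hypothetical demo: link to a process that has already finished and
%% return the exit reason that comes back.
run() ->
    %% Trap exits so the noproc signal arrives as a message instead of
    %% killing this process; the previous setting is restored below.
    OldTrap = process_flag(trap_exit, true),
    %% Stand-in for an FSM that completes its work and exits normally.
    {Pid, Ref} = spawn_monitor(fun() -> ok end),
    receive {'DOWN', Ref, process, Pid, _} -> ok end,
    %% Linking to the dead process produces an exit signal with reason noproc.
    true = link(Pid),
    Reason = receive
                 {'EXIT', Pid, R} -> R
             after 1000 ->
                 timeout
             end,
    process_flag(trap_exit, OldTrap),
    Reason.
```

Calling `link_noproc_demo:run()` should return `noproc`.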

The second error, about the unrecognized DOWN message, is due to changes in the Protocol Buffers server. It used to simply ignore unknown messages, but since the switch to riak_api it complains about them at error severity. Since this particular message is harmless, maybe lowering the severity to info in riak_api would be a good enough temporary solution (although, yes, messages should not be able to leak so easily)? See also basho/riak_kv#366
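A minimal sketch of how a stray `'DOWN'` message like the one above can end up in a mailbox, and one way to keep it out. This is a hypothetical module, not riak_api or riak_kv code, and it assumes the leak comes from a monitor whose owner stopped caring about the monitored process: `erlang:demonitor(Ref)` alone leaves an already-delivered `'DOWN'` message in the queue, while `erlang:demonitor(Ref, [flush])` also removes it.

```erlang
-module(down_leak_demo).
-export([run/0]).

%% Hypothetical demo: returns {LeftoverWithoutFlush, LeftoverWithFlush},
%% counting 'DOWN' messages still queued after demonitoring.
run() ->
    {count_leftover_downs(false), count_leftover_downs(true)}.

count_leftover_downs(Flush) ->
    %% A short-lived worker that exits with reason normal.
    {_Pid, Ref} = spawn_monitor(fun() -> ok end),
    %% Give the runtime time to deliver the 'DOWN' message.
    timer:sleep(100),
    case Flush of
        true  -> erlang:demonitor(Ref, [flush]);
        false -> erlang:demonitor(Ref)
    end,
    drain_downs(Ref, 0).

%% Count (and consume) any queued 'DOWN' messages for this monitor.
drain_downs(Ref, N) ->
    receive
        {'DOWN', Ref, process, _, _} -> drain_downs(Ref, N + 1)
    after 0 ->
        N
    end.
```

Calling `down_leak_demo:run()` should return `{1, 0}`: one leftover message without `flush`, none with it. The sleep only exists so that the exit has been delivered before the unflushed `demonitor/1` call.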

@engelsanchez

I saw this too while running a 2i basho_bench on 1.3.1 once the concurrency setting gets on the high side (32+ at least). It is really noisy under enough load, making it hard to pay attention to what I'm trying to test. Any pointers on where to look, either to lower the severity of this message or to better understand what's causing it?

@beerriot
Contributor

Many thanks to @engelsanchez for providing an easily repeatable test that demonstrates this issue quite quickly.

  1. Grab the three files from this gist: https://gist.github.com/beerriot/e17024fe20aff6088feb
  2. Set up a Riak node with the app.config from the gist. The important part is using the memory backend, which provides 2i with better performance than leveldb; lowering the ring size also doesn't hurt.
  3. Build basho_bench with version 1.2.1 of the riak-erlang-client, though any version that uses MapReduce to emulate 2i queries will do.
  4. Run basho_bench with the populate.config file from the gist to preload some test data.
  5. Run basho_bench with the small_queries_only.config file from the gist; the "exit with reason noproc in context shutdown_error" messages should start appearing after a few seconds.

I believe I've tracked this down to a point that is "harmless". It causes log spam, which is a bother in its own right, I know, but it is not indicative of any larger lurking problem.

The problem I've found is a race in the supervisor module (which riak_pipe_fitting_sup runs) when it updates its tracking of a child (a riak_pipe_fitting process) that is already shutting down. It's possible that we're losing an error message from riak_pipe_fitting, but the fact that another process (the riak_pipe_builder) is calling its supervisor's terminate_child function means that something else has already gone wrong and we've decided to shut down the pipe, so a possibly missing additional error isn't a huge deal.

I've sent mail to the maintainers of the supervisor module, including a test that demonstrates the behavior. We will find either a fix or a workaround, but for now it is safe to ignore the message.

@bowrocker bowrocker added this to the 2.1 milestone Mar 24, 2014
@bashopatricia

Closing, appears to be fixed.

@Basho-JIRA Basho-JIRA changed the title Riak 1.1.2 RC3 race error messages in 2I queries Riak 1.1.2 RC3 race error messages in 2I queries [JIRA: RIAK-2400] Feb 19, 2016