Document the environment in the logs; advise on bad settings. #498

Merged
merged 1 commit into from Mar 28, 2013

Projects

None yet

3 participants

@evanmcc
Contributor
evanmcc commented Mar 4, 2013

Warning users in the shell is distracting and potentially inaccurate. This commit captures a variety of system and erlang variables and puts them into the logs, along with warnings when they're too low or not right.

This is meant to address riak issue 278: basho/riak#278

An example of the output on an untuned linux VM:

2013-03-04 11:10:55.342 [info] <0.290.0>@riak_kv_app:doc_env:285 Environment and OS variables:
2013-03-04 11:10:55.350 [info] <0.272.0>@riak_core:wait_for_application:444 Wait complete for application riak_pipe (0 seconds)
2013-03-04 11:10:55.440 [info] <0.290.0> Open file limit: 4096
2013-03-04 11:10:55.442 [info] <0.290.0> Core limit: 4096
2013-03-04 11:10:55.446 [info] <0.290.0> Open file limit: 32768
2013-03-04 11:10:55.446 [info] <0.290.0> Erlang ports limit: 4096
2013-03-04 11:10:55.447 [info] <0.290.0> ETS table count limit: 8192
2013-03-04 11:10:55.448 [info] <0.290.0> Thread pool size: 64
2013-03-04 11:10:55.448 [info] <0.290.0> Generations before full sweep: 0
2013-03-04 11:10:55.451 [info] <0.290.0> Schedulers: 1 for 1 cores
2013-03-04 11:10:55.451 [warning] <0.290.0> sysctl vm.swappiness is 60, should be no more than 0)
2013-03-04 11:10:55.452 [warning] <0.290.0> sysctl net.core.wmem_default is 229376, should be at least 8388608)
2013-03-04 11:10:55.452 [warning] <0.290.0> sysctl net.core.rmem_default is 229376, should be at least 8388608)
2013-03-04 11:10:55.455 [warning] <0.290.0> sysctl net.core.wmem_max is 131071, should be at least 8388608)
2013-03-04 11:10:55.455 [warning] <0.290.0> sysctl net.core.rmem_max is 131071, should be at least 8388608)
2013-03-04 11:10:55.456 [warning] <0.290.0> sysctl net.core.netdev_max_backlog is 1000, should be at least 10000)
2013-03-04 11:10:55.459 [warning] <0.290.0> sysctl net.core.somaxconn is 128, should be at least 4000)
2013-03-04 11:10:55.459 [warning] <0.290.0> sysctl net.ipv4.tcp_max_syn_backlog is 512, should be at least 40000)
2013-03-04 11:10:55.460 [warning] <0.290.0> sysctl net.ipv4.tcp_fin_timeout is 60, should be no more than 15)
2013-03-04 11:10:55.463 [warning] <0.290.0> sysctl net.ipv4.tcp_tw_reuse is 0, should be 1)
@jrwest
Contributor
jrwest commented Mar 18, 2013

fwiw, had to rebase on master to begin testing w/o riak crashing

@jrwest
Contributor
jrwest commented Mar 18, 2013

minor nitpick w/ log output:

2013-03-18 11:13:29.105 [warning] <0.291.0> Open file limit of 2560 is low, at least 4096 is recommended
2013-03-18 11:13:29.105 [info] <0.291.0> Core limit: 2560
2013-03-18 11:13:29.106 [info] <0.291.0> Open file limit: 32768

may be a bit confusing to readers who see open file limit 2560, core limit 2560 but then open file limit 32768. perhaps we can rename things a bit?

@jrwest
Contributor
jrwest commented Mar 18, 2013

Some example output from testing on a smartos zone (Image UUID: fdea06b0-3f24-11e2-ac50-0b645575ce9d):

2013-03-18 23:31:38.190 [info] <0.303.0>@riak_kv_app:doc_env:289 Environment and OS variables:
2013-03-18 23:31:38.827 [info] <0.303.0> Open file limit: 65536
2013-03-18 23:31:38.827 [info] <0.303.0> Core limit: 65536
2013-03-18 23:31:38.827 [info] <0.303.0> Open file limit: 32768
2013-03-18 23:31:38.828 [info] <0.303.0> Erlang ports limit: 4096
2013-03-18 23:31:38.828 [info] <0.303.0> ETS table count limit: 8192
2013-03-18 23:31:38.828 [info] <0.303.0> Thread pool size: 64
2013-03-18 23:31:38.828 [info] <0.303.0> Generations before full sweep: 0
2013-03-18 23:31:38.829 [warning] <0.303.0> Running 8 schedulers for unknown cores, these should match

EDIT: also just to point out, in the case where open file limit is not below the threshold the output can also be a bit confusing.

@jrwest jrwest commented on an outdated diff Mar 18, 2013
src/riak_kv_app.erl
+
+ %% fullsweep_after
+ {fullsweep_after, GCGens} = erlang:system_info(fullsweep_after),
+ GCMsg = {info, "Generations before full sweep: ~p", [GCGens]},
+
+ %% async_threads
+ TPSMsg = case erlang:system_info(thread_pool_size) of
+ TPS1 when TPS1 < 64 ->
+ {warn,"Thread pool size of ~p is low, at least 64 "
+ "suggested", [TPS1]};
+ TPS2 ->
+ {info, "Thread pool size: ~p", [TPS2]}
+ end,
+ %% schedulers
+ Schedulers = erlang:system_info(schedulers),
+ Cores = erlang:system_info(logical_processors_available),
@jrwest
jrwest Mar 18, 2013 Contributor

would be nice to specifically handle the case where logical_processors_available returns unknown (my smartos zone seemed to return unknown for available but the correct value for logical_processors and logical_processors_online). Perhaps point to where users can find this info based on platform (e.g. psrinfo -vp)? Or maybe its worth trying to use os:cmd as a fallback?

@russelldb
Contributor

Sorry to horn in your review, one thing I wanted to say about this change when I first saw it (but I didn't want to pull the card) is that I would prefer all this code in it's own module. I see that it is used by riak_kv_app, but I don't see why it needs to be in there. It looks self contained enough to me.

@jrwest
Contributor
jrwest commented Mar 19, 2013

Was going to make a similar comment actually. +1.

Sent from my iPhone

On Mar 19, 2013, at 12:48 AM, Russell Brown notifications@github.com wrote:

Sorry to horn in your review, one thing I wanted to say about this change when I first saw it (but I didn't want to pull the card) is that I would prefer all this code in it's own module. I see that it is used by riak_kv_app, but I don't see why it needs to be in there. It looks self contained enough to me.


Reply to this email directly or view it on GitHub.

@evanmcc
Contributor
evanmcc commented Mar 19, 2013

will do

@evanmcc
Contributor
evanmcc commented Mar 19, 2013

updated.

@evanmcc evanmcc commented on the diff Mar 28, 2013
src/riak_kv_env.erl
+ {unix, freebsd} ->
+ [];
+ {unix, sunos} ->
+ [];
+ _ ->
+ [{warn, "Unknown OS type, no platform specific info", []}]
+ end,
+ lists:map(fun({F, Fmt, Args}) ->
+ lager:debug("Term: ~p", [{F, Fmt, Args}]),
+ %% fake out lager a bit here
+ F1 = case F of
+ info -> info_msg;
+ warn -> warning_msg;
+ error -> error_msg
+ end,
+ error_logger:F1("riak_kv_env: "++Fmt, Args)
@evanmcc
evanmcc Mar 28, 2013 Contributor

added some searchable text here since this trickery is confusing the lager parse transform.

@evanmcc
Contributor
evanmcc commented Mar 28, 2013

updated and squashed.

@jrwest
Contributor
jrwest commented Mar 28, 2013

👍

w/ recent changes on my laptop:

2013-03-28 10:34:22.529 [info] <0.291.0>@riak_kv_env:doc_env:42 Environment and OS variables:
2013-03-28 10:34:22.544 [info] <0.291.0> riak_kv_env: Open file limit: 4864
2013-03-28 10:34:22.545 [warning] <0.291.0> riak_kv_env: Cores are disabled, this may hinder debugging
2013-03-28 10:34:22.545 [info] <0.291.0> riak_kv_env: Erlang process limit: 32768
2013-03-28 10:34:22.546 [info] <0.291.0> riak_kv_env: Erlang ports limit: 4096
2013-03-28 10:34:22.546 [info] <0.291.0> riak_kv_env: ETS table count limit: 8192
2013-03-28 10:34:22.550 [info] <0.291.0> riak_kv_env: Thread pool size: 64
2013-03-28 10:34:22.550 [info] <0.291.0> riak_kv_env: Generations before full sweep: 0
2013-03-28 10:34:22.551 [warning] <0.291.0> riak_kv_env: Running 4 schedulers for unknown cores, these should match
2013-03-28 10:34:22.551 [warning] <0.291.0> riak_kv_env: Unknown OS type, no platform specific info
@evanmcc evanmcc merged commit 6ceb60b into master Mar 28, 2013

1 check failed

default The Travis build failed
Details
@seancribbs seancribbs deleted the pevm-doc-env branch Apr 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment