Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compaction: Allow snooze_period to be a float #1880

Merged
merged 2 commits into from Jan 31, 2019

Conversation

Projects
None yet
4 participants
@adrienverge
Copy link
Contributor

adrienverge commented Jan 28, 2019

We are experiencing an issue similar to #1579, partly because we have
dozens of thousands of databases.

On our servers:

  • snooze_period = 0 leads to a significant CPU load,
  • snooze_period = 1 is too long for compaction to finish within 20
    hours (after 20 hours, compaction is stopped to allow other
    CPU-intensive processes to run, and when compaction restarts, it does
    not pick up where it left 4 hours earlier -- by the way proposal at
    #1775 would be really great to fix that!)

This commits allows to write snooze_period = 0.3 in configuration.

@adrienverge

This comment has been minimized.

Copy link
Contributor Author

adrienverge commented Jan 28, 2019

I don't know why tests fail, it compiles OK on my computer 🤔

@wohali

This comment has been minimized.

Copy link
Member

wohali commented Jan 28, 2019

The logs say that the dev/run script is failing to start up CouchDB successfully.

What happens when you run make check locally?

Did you perhaps break startup because the snooze_period value must be a float now? Does it work with the default value, or any value without a decimal point, at all?

@adrienverge adrienverge force-pushed the adrienverge:feat/allow-snooze-period-to-be-a-float branch 3 times, most recently from 538b375 to d593543 Jan 28, 2019

@adrienverge

This comment has been minimized.

Copy link
Contributor Author

adrienverge commented Jan 28, 2019

Hi @wohali, thanks for your guidance on that.

Unfortunately make check doesn't work for me (even on master), it says HTTP API was disabled at compile time (done with ./configure --disable-docs && make clean && make check).

But compiling and running it myself helped me finding the causes:

  • config:get_float() expects a float for the default value (but accepts an integer or a float as primary value),
  • timer:sleep() expects an integer.

I've modified this commit to fix that, and tested compiled CouchDB with different values (floats and integers): it works (I've also checked that these values are correctly read, using curl -u root:password localhost:15984/_node/_local/_config | grep -o snooze_period.:....)

@wohali

This comment has been minimized.

Copy link
Member

wohali commented Jan 28, 2019

@adrienverge Try:

make distclean
./configure -c
make check
@eiri

This comment has been minimized.

Copy link
Member

eiri commented Jan 28, 2019

Can I suggest not to change the existing config attribute, but instead add a new one - "snooze_period_ms" representing time in milliseconds (as integer) and then abstract snooze period's getter with something like this?

get_snooze_period() ->
    Default = config:get_integer("compaction_daemon", "snooze_period", 3),
    case config:get_integer("compaction_daemon", "snooze_period_ms", -1) of
        -1 -> Default * 1000;
        SnoozePeriod -> SnoozePeriod
    end.

This will allow not to wrestle with the floats, be consistent with the rest of the config where time attributes defined as integers and will not break every CouchDB deployment that happened to have that attribute customized, in case people miss the requirement to change their config during an upgrade to be a float.

As for PR itself the test couchdb_compaction_daemon_tests.erl need to be updated and ideally extended to include assertions related to this change.

@adrienverge

This comment has been minimized.

Copy link
Contributor Author

adrienverge commented Jan 28, 2019

@eiri thanks for your feedback.

I sure can try do add a new snooze_period_ms option, but to be honest it seems less intuitive to have two options for the same feature, it might confuse simple CouchDB users like me. But maybe it's more "Erlang-y" to do like this? (Please note I'm very unfamiliar with Erlang.)

In my opinion, accepting both integers and floats for snooze_period could do the job elegantly, without needing users to upgrade their configuration.
@eiri @wohali what do you think?
If you prefer the snooze_period + snooze_period_ms option, I'll try to implement it.

@eiri

This comment has been minimized.

Copy link
Member

eiri commented Jan 28, 2019

fwiw this code right now is not supporting both integers and floats, but ignores snooze_period in integer in preference for the float's default, because this is how config:get_float/3 works:

%% Can read customized integer
(node1@127.0.0.1)3> config:get_integer("compaction_daemon", "snooze_period", 5).
1
%% but ignoring it when trying to read it as a float
(node1@127.0.0.1)2> config:get_float("compaction_daemon", "snooze_period", 5.0).
5.0

Hence my note that it'd be good to have tests for this change.

@adrienverge

This comment has been minimized.

Copy link
Contributor Author

adrienverge commented Jan 28, 2019

@eiri thanks for your feedback; my goal was to support both integers and floats.

I'm trying to do my best to implement a simple change and try to make CouchDB better, but please take into account that I don't know anything about Erlang. I spent a few hours and I'm still struggling. That's why constructive help (like your code snippets) is welcome!

Do you prefer the snooze_period + snooze_period_ms option, or a unique snooze_period that would accept both integers and floats? (if feasible with the config lib?)
Any advice on the best way to implement it?

@wohali

This comment has been minimized.

Copy link
Member

wohali commented Jan 28, 2019

@adrienverge I agree that the best approach would be moving to snooze_period_ms here, and removing snooze_period from our default.ini file at the same time. If snooze_period is present in a user's local.ini file, that can be supported as a fallback configuration parameter.

This would be consistent with a lot of our other config params which are specified in ms, rather than as a float.

There's some precedent for "switch to a new configuration way, support old config approach if still exists" - see what Jan did here: #1602 and specifically https://github.com/apache/couchdb/pull/1602/files#diff-083efc44719b2a6e73526a2de5969175R394 or https://github.com/apache/couchdb/pull/1602/files#diff-b8549b33b7019d19458ba8ee619f1373R60

@adrienverge adrienverge force-pushed the adrienverge:feat/allow-snooze-period-to-be-a-float branch 2 times, most recently from d20cd65 to 9bc171d Jan 29, 2019

@adrienverge

This comment has been minimized.

Copy link
Contributor Author

adrienverge commented Jan 29, 2019

@wohali thanks a lot for your explanations: a "smooth migration" from snooze_period to snooze_period_ms makes sense.

→ I introduced a get_snooze_period() function as suggested, and changed the commit message and documentation to reflect this "migration" to the new config option snooze_period_ms.

About tests, I tried to write some, but:

  • Current tests in couch_compaction_daemon.erl seem to test runtime behavior of the compaction daemon -- it looks hard to test that a process waits 0, 3 or 10 seconds between each database compaction.
  • I'd like to write a unit (non-runtime) test, that would check the output of get_snooze_period() depending on various config inputs. But this function is not exported in couch/include/couch_db.hrl, so I can't seem to be able to call it from a test.

Should I keep trying to write a test? If yes, is there a preferred approach?

@eiri

This comment has been minimized.

Copy link
Member

eiri commented Jan 29, 2019

But maybe it's more "Erlang-y" to do like this?

@adrienverge: It's more of "implicit vs explicit config" type of thing, neither approach necessary better or worse and you have a fair point about two attributes controlling the same parameter been confusing. Historically times in Couch's config kept as ints and @wohali idea about deprecating old attribute should take care of the config's ambiguity, so I appreciate you going along with my suggestion.

Now about the tests. Erlang way of dealing with testing of not exported functions is to add unit tests in the same module within -ifdef(TEST). ... -endif. block, there is one at the bottom of couch_compaction_daemon.erl.

However the unit testing in Erlang is notorious for its complexity and writing tests around config is even more tricky, because config module need to be mocked. I wrote a basic suite for get_snooze_period/0 feel free to use and modify as necessary.

get_snooze_period_test_() ->
    {
        foreach,
        fun() ->
            meck:new(config, [passthrough])
        end,
        fun(_) ->
            meck:unload()
        end,
        [
            {"should return default value without config attributes",
            fun should_default_without_config/0},
            {"should respect old config attribute",
            fun should_respect_old_config/0},
            {"should respect old config set to zero",
            fun should_respect_old_config_zero/0},
            {"should respect new config attribute",
            fun should_respect_new_config/0},
            {"should respect new config set to zero",
            fun should_respect_new_config_zero/0}
        ]
    }.

should_default_without_config() ->
    ?assertEqual(3000, get_snooze_period()).

should_respect_old_config() ->
    meck:expect(config, get_integer, fun
        ("compaction_daemon", "snooze_period", _) -> 1;
        (_, _, Default) -> Default
    end),
    ?assertEqual(1000, get_snooze_period()).

should_respect_old_config_zero() ->
    meck:expect(config, get_integer, fun
        ("compaction_daemon", "snooze_period", _) -> 0;
        (_, _, Default) -> Default
    end),
    ?assertEqual(0, get_snooze_period()).

should_respect_new_config() ->
    meck:expect(config, get_integer, fun
        ("compaction_daemon", "snooze_period", _) -> 1;
        ("compaction_daemon", "snooze_period_ms", _) -> 300;
        (_, _, Default) -> Default
    end),
    ?assertEqual(300, get_snooze_period()).

should_respect_new_config_zero() ->
    meck:expect(config, get_integer, fun
        ("compaction_daemon", "snooze_period", _) -> 1;
        ("compaction_daemon", "snooze_period_ms", _) -> 0;
        (_, _, Default) -> Default
    end),
    ?assertEqual(0, get_snooze_period()).

As for couchdb_compaction_daemon_tests.erl I think we can keep it as it is, just changing to use "snooze_period_ms" for consistency's sake.

@adrienverge adrienverge force-pushed the adrienverge:feat/allow-snooze-period-to-be-a-float branch from 9bc171d to c3eb760 Jan 29, 2019

@adrienverge

This comment has been minimized.

Copy link
Contributor Author

adrienverge commented Jan 29, 2019

Amazing (this way of writing tests is similar to Rust actually).

I took your tests as is, because they do exactly what needs to be done (no config / only snooze_period / snooze_period_ms precedence over snooze_period). Thanks!

Pull request updated (and rebased on latest master).

Compaction: Add snooze_period_ms for finer tuning
We are experiencing an issue similar to #1579, partly because we have
dozens of thousands of databases.

On our servers:
- `snooze_period = 0` leads to a significant CPU load,
- `snooze_period = 1` is too long for compaction to finish within 20
  hours (after 20 hours, compaction is stopped to allow other
  CPU-intensive processes to run, and when compaction restarts, it does
  not pick up where it left 4 hours earlier -- by the way proposal at
  #1775 would be really great to fix that!)

This commit introduces a new option `snooze_period_ms` (measured in
milliseconds), and deprecates `snooze_period` while still supporting it
for obvious legacy reasons.

@adrienverge adrienverge force-pushed the adrienverge:feat/allow-snooze-period-to-be-a-float branch from c3eb760 to 0447a04 Jan 30, 2019

adrienverge added a commit to adrienverge/copr-couchdb that referenced this pull request Jan 30, 2019

@eiri

This comment has been minimized.

Copy link
Member

eiri commented Jan 30, 2019

lgtm.

@adrienverge can you please rebase?

@wohali are you all right with merging this?

@janl

janl approved these changes Jan 30, 2019

@janl

This comment has been minimized.

Copy link
Member

janl commented Jan 30, 2019

Nicely done! Great first contribution @adrienverge!

@wohali

wohali approved these changes Jan 30, 2019

Copy link
Member

wohali left a comment

Very nicely done @adrienverge !

@wohali wohali merged commit 5471694 into apache:master Jan 31, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@adrienverge

This comment has been minimized.

Copy link
Contributor Author

adrienverge commented Jan 31, 2019

Thanks for your help on this!

For info, I tried to update docs accordingly: apache/couchdb-documentation#386

janl added a commit that referenced this pull request Feb 7, 2019

Compaction: Add snooze_period_ms for finer tuning (#1880)
This commit introduces a new option `snooze_period_ms` (measured in
milliseconds), and deprecates `snooze_period` while still supporting it
for obvious legacy reasons.

@wohali wohali referenced this pull request Feb 7, 2019

Merged

2.3.1 Release Proposal #1908

janl added a commit that referenced this pull request Feb 17, 2019

Compaction: Add snooze_period_ms for finer tuning (#1880)
This commit introduces a new option `snooze_period_ms` (measured in
milliseconds), and deprecates `snooze_period` while still supporting it
for obvious legacy reasons.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.