Database stopped working after upgrading to CouchDB 3.2.0 and "unknown_error : badarith" was logged #3789

Closed
arnesten opened this issue Oct 15, 2021 · 5 comments
@arnesten

Description

I tried to upgrade from CouchDB 3.1.1 to CouchDB 3.2.0, but then the database stopped working completely. The following error was logged hundreds of times per minute:

[error] 2021-10-13T07:00:31.019539Z couchdb@127.0.0.1 <0.407.0> ac54fd9550 req_err(191493904) unknown_error : badarith
    [<<"fabric_util:get_db_timeout/4 L156">>,<<"fabric_util:get_db/2 L114">>,<<"fabric:get_security/2 L183">>,<<"chttpd_auth_request:db_authorization_check/1 L110">>,<<"chttpd_auth_request:authorize_request/1 L19">>,<<"chttpd:handle_req_after_auth/2 L325">>,<<"chttpd:process_request/1 L310">>,<<"chttpd:handle_request_int/1 L249">>]

When I downgraded to CouchDB 3.1.2, the database started working again and no errors were logged. I haven't had time to dig deeper yet as downgrading back to an earlier version worked. But I thought it was a good idea to post about it here in case others experience the same issue.

Your Environment

  • CouchDB version used: 3.2.0
  • Operating system and version: Ubuntu 20.04
@arnesten
Author

If I remove the following section from local.ini it seems to be working in CouchDB 3.2.0:

[fabric]
request_timeout = infinity

Is "infinity" no longer a supported value for this setting?

@nickva
Contributor

nickva commented Oct 15, 2021

Thanks for the report, @arnesten

It is indeed a bug. The request_timeout value from fabric in 3.2.0 is used in an arithmetic expression, while previously in 3.1.1 it was just passed to Erlang VM's receive ... after Timeout -> ... expression. The receive statement can handle the infinity atom while the arithmetic expression cannot and crashes.

It seems the infinity value was undocumented, but then again so was the whole [fabric] config section. While we fix this, setting a large numeric value or accepting the default (60000) can be a workaround. Thanks again for debugging the issue and providing a stacktrace that pointed to the cause right away.
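
The failure mode described above can be sketched with a loose Python analogy. This is not CouchDB's Erlang code: `INFINITY`, `wait_for_reply`, and `per_shard_timeout` are hypothetical names, and Python's `None` merely stands in for Erlang's `infinity` atom. A wait primitive accepts the "no timeout" sentinel, while arithmetic on the same value raises an error, much like `badarith` in the log.

```python
import threading

# Assumption: None plays the role of Erlang's `infinity` atom in this analogy.
INFINITY = None


def wait_for_reply(event, timeout):
    # A wait primitive understands the "no timeout" sentinel,
    # similar to `receive ... after infinity` in Erlang.
    return event.wait(timeout)


def per_shard_timeout(timeout, shards):
    # Arithmetic on the sentinel is not defined and raises an error.
    return timeout / shards


ready = threading.Event()
ready.set()  # the "reply" is already available, so wait() returns at once
waited = wait_for_reply(ready, INFINITY)

try:
    per_shard_timeout(INFINITY, 8)
    arithmetic_failed = False
except TypeError:
    arithmetic_failed = True
```

In this sketch the wait succeeds and the division fails, which mirrors why the same configuration value worked in 3.1.1 and crashed in 3.2.0.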

nickva added a commit that referenced this issue Oct 15, 2021
`infinity`, it turns out, is a valid configuration value for fabric
request_timeout. We can pass it to an Erlang `receive` statement, but any
arithmetic with it would fail.

To guard against the crash use the max small int value (60 bits). With enough
shards, due to the exponential nature of the algorithm, we still get a nice
progression from the minimum 100 msec all the way up to the large int value.
This case is illustrated in the test.

Issue: #3789
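
A minimal Python sketch of the guard described in the commit message, under stated assumptions: the actual fix is in Erlang and its exact formula is not shown in this thread. The helper names, the doubling factor, and the geometric-series formula are all hypothetical, and the "max small int (60 bits)" is assumed here to be 2^59 - 1.

```python
# Assumption: largest 60-bit signed integer, standing in for Erlang's max small int.
MAX_SMALL_INT = (1 << 59) - 1


def normalize_timeout(value):
    # Map the `infinity` setting to a large integer so arithmetic stays valid.
    if value == "infinity":
        return MAX_SMALL_INT
    return int(value)


def shard_timeouts(num_shards, max_timeout, min_timeout=100, factor=2):
    # Hypothetical progression: pick the first timeout so that a geometric
    # series over all shards fits within the limit, but never below the minimum.
    limit = normalize_timeout(max_timeout)
    series_sum = (factor ** num_shards - 1) // (factor - 1)
    first = max(min_timeout, limit // series_sum)
    # Grow each step by `factor`, capped at the limit.
    return [min(first * factor ** i, limit) for i in range(num_shards)]


timeouts = shard_timeouts(64, "infinity")
```

With 64 shards and `infinity`, this sketch starts at the 100 msec minimum and climbs to the large integer cap, which is the kind of progression the commit message describes.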
@nickva
Contributor

nickva commented Oct 15, 2021

The fix for this was merged in 3.x #3790

@arnesten
Author

Thank you! That was fixed quickly 👍

@nickva
Contributor

nickva commented Oct 20, 2021

Closing the issue for now. The fix should be released in the next bugfix release, 3.2.1.
