Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

registration_server entered FATAL state, too many start retries too quickly #95

Closed
jcherrabi opened this issue Mar 9, 2021 · 3 comments

Comments

@jcherrabi
Copy link

Hello guys,
i just cloned the repository and ran the build command, configured the config files and started the docker image using the supplied command and i get the following error:

`ar 09 21:28:11 Reading random entropy from '/dev/urandom'

Mar 09 21:28:11 Loading '/usr/lib/x86_64-linux-gnu/pdns/libremotebackend.so'

Mar 09 21:28:11 This is a standalone pdns

Mar 09 21:28:11 Listening on controlsocket in '/run/pdns.controlsocket'

Mar 09 21:28:11 UDP server bound to 0.0.0.0:53

Mar 09 21:28:11 UDPv6 server bound to [::]:53

Mar 09 21:28:11 TCP server bound to 0.0.0.0:53

Mar 09 21:28:11 TCPv6 server bound to [::]:53

Mar 09 21:28:11 PowerDNS Authoritative Server 4.1.6 (C) 2001-2018 PowerDNS.COM BV

Mar 09 21:28:11 Using 64-bits mode. Built using gcc 8.3.0.

Mar 09 21:28:11 PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2.

Mar 09 21:28:11 Could not retrieve security status update for '4.1.6-3+deb10u1.Debian' on 'auth-4.1.6-3_deb10u1.Debian.security-status.secpoll.powerdns.com.', RCODE = Non-Existent domain

Mar 09 21:28:11 Creating backend connection for TCP

Mar 09 21:28:11 About to create 3 backend threads for UDP

tunnel_socket_blocks=False; optfile_/home/user/config/pagekite.conf=ok; started=/usr/local/bin/pagekite.py; ll=0; optfile_.SELF/defaults.cfg=ok; ts=6047e86b; argv=--optfile=/home/user/config/pagekite.conf; python=2.7.16 (default, Oct 10 2019, 22:02:15) [GCC 8.3.0]; platform=linux2; version=1.5.2.201011; t=2021-03-09T21:28:11; ca_certs=/etc/ssl/certs/ca-certificates.crt; send_always_buffers=False

id=s1; ll=1; listen=:4443; ts=6047e86b; t=2021-03-09T21:28:11

Mar 09 21:28:11 Done launching threads, ready to distribute questions

2021-03-09 21:28:12,197 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

2021-03-09 21:28:12,197 INFO success: pagekite entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

2021-03-09 21:28:12,199 INFO spawned: 'registration_server' with pid 30

2021-03-09 21:28:12,199 INFO success: pdns entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

thread 'main' panicked at 'Invalid config file: Error { inner: ErrorInner { kind: UnquotedString, line: Some(47), col: 7, at: Some(1083), message: "", key: [] } }', src/args.rs:20:33

note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Panic in Arbiter thread.

2021-03-09 21:28:12,205 INFO exited: registration_server (exit status 101; not expected)

2021-03-09 21:28:14,209 INFO spawned: 'registration_server' with pid 31

thread 'main' panicked at 'Invalid config file: Error { inner: ErrorInner { kind: UnquotedString, line: Some(47), col: 7, at: Some(1083), message: "", key: [] } }', src/args.rs:20:33

note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Panic in Arbiter thread.

2021-03-09 21:28:14,215 INFO exited: registration_server (exit status 101; not expected)

2021-03-09 21:28:17,221 INFO spawned: 'registration_server' with pid 32

thread 'main' panicked at 'Invalid config file: Error { inner: ErrorInner { kind: UnquotedString, line: Some(47), col: 7, at: Some(1083), message: "", key: [] } }', src/args.rs:20:33

note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Panic in Arbiter thread.

2021-03-09 21:28:17,227 INFO exited: registration_server (exit status 101; not expected)

2021-03-09 21:28:18,229 INFO gave up: registration_server entered FATAL state, too many start retries too quickly`

i've tried many things but the error persists..
can anyone help or at least confirm that this build is working..

Running this on ubuntu 18.04
cheers,

@mrstegeman mrstegeman added this to To do in WebThings Cloud Mar 9, 2021
@jcherrabi
Copy link
Author

jcherrabi commented Mar 9, 2021

I pulled the available docker mysql image to test and used the included config.toml file as i noticed they were changes from the previous version (my Bad), and now i'm getting a different error as shown below: so basically it creates a new DB and starts putting out too many connections open messages until it goes into panic

PS: i'm using Azure mysql as a service.

`Creating database: mydbName

Running migration 2017-11-30-192535_create_accounts

Running migration 2017-11-30-192538_create_domains

Running migration 2018-02-26-144749_add_continent_to_domains

Running migration 2018-10-25-180818_add_opt_out_to_accounts

Running migration 2019-08-09-174849_add_mode_to_domains

/usr/lib/python2.7/dist-packages/supervisor/options.py:461: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.

'Supervisord is running as root and it is searching '

2021-03-09 22:14:09,728 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.

2021-03-09 22:14:09,728 INFO Included extra file "/etc/supervisor/conf.d/supervisord.conf" during parsing

2021-03-09 22:14:09,733 INFO RPC interface 'supervisor' initialized

2021-03-09 22:14:09,733 CRIT Server 'unix_http_server' running without any HTTP authentication checking

2021-03-09 22:14:09,733 INFO supervisord started with pid 1

2021-03-09 22:14:10,735 INFO spawned: 'cron' with pid 14

2021-03-09 22:14:10,735 INFO spawned: 'pagekite' with pid 15

2021-03-09 22:14:10,736 INFO spawned: 'registration_server' with pid 16

2021-03-09 22:14:10,737 INFO spawned: 'pdns' with pid 17

[2021-03-09T22:14:10Z INFO main] Managing the domain slackers.io

[2021-03-09T22:14:10Z INFO main] Starting HTTP server on 127.0.0.1:81

[2021-03-09T22:14:10Z INFO actix_server::builder] Starting 16 workers

[2021-03-09T22:14:10Z INFO actix_server::builder] Starting "actix-web-service-127.0.0.1:81" service on 127.0.0.1:81

Mar 09 22:14:10 Reading random entropy from '/dev/urandom'

Mar 09 22:14:10 Loading '/usr/lib/x86_64-linux-gnu/pdns/libremotebackend.so'

Mar 09 22:14:10 This is a standalone pdns

Mar 09 22:14:10 Listening on controlsocket in '/run/pdns.controlsocket'

Mar 09 22:14:10 UDP server bound to 0.0.0.0:53

Mar 09 22:14:10 UDPv6 server bound to [::]:53

Mar 09 22:14:10 TCP server bound to 0.0.0.0:53

Mar 09 22:14:10 TCPv6 server bound to [::]:53

Mar 09 22:14:10 PowerDNS Authoritative Server 4.1.6 (C) 2001-2018 PowerDNS.COM BV

Mar 09 22:14:10 Using 64-bits mode. Built using gcc 8.3.0.

Mar 09 22:14:10 PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2.

Mar 09 22:14:10 Could not retrieve security status update for '4.1.6-3.Debian' on 'auth-4.1.6-3.Debian.security-status.secpoll.powerdns.com.', RCODE = Non-Existent domain

Mar 09 22:14:10 Creating backend connection for TCP

Mar 09 22:14:10 About to create 3 backend threads for UDP

tunnel_socket_blocks=False; optfile_/home/user/config/pagekite.conf=ok; started=/usr/local/bin/pagekite.py; ll=0; optfile_.SELF/defaults.cfg=ok; ts=6047f332; argv=--optfile=/home/user/config/pagekite.conf; python=2.7.16 (default, Oct 10 2019, 22:02:15) [GCC 8.3.0]; platform=linux2; version=1.5.2.201011; t=2021-03-09T22:14:10; ca_certs=/etc/ssl/certs/ca-certificates.crt; send_always_buffers=False

id=s1; ll=1; listen=:4443; ts=6047f332; t=2021-03-09T22:14:10

Mar 09 22:14:10 Done launching threads, ready to distribute questions

2021-03-09 22:14:11,905 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

2021-03-09 22:14:11,905 INFO success: pagekite entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

2021-03-09 22:14:11,905 INFO success: registration_server entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

2021-03-09 22:14:11,905 INFO success: pdns entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

[2021-03-09T22:14:12Z ERROR r2d2] Too many connections

[2021-03-09T22:14:12Z ERROR r2d2] Too many connections

[2021-03-09T22:14:13Z ERROR r2d2] Too many connections

[2021-03-09T22:14:15Z ERROR r2d2] Too many connections

[2021-03-09T22:14:18Z ERROR r2d2] Too many connections

[2021-03-09T22:14:25Z ERROR r2d2] Too many connectionsthread 'actix-rt:worker:4' panicked at 'Unable to open database at mysql://xxxxxxx@xxxxxxx.mysql.database.azure.com:3306/xxxxxxx: Error(Some("Too many connections"))', src/database.rs:50:38

note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

thread 'actix-rt:worker:7' panicked at 'called Result::unwrap() on an Err value: "PoisonError { inner: .. }"', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-web-3.2.0/src/server.rs:282:36

thread 'actix-rt:worker:6' panicked at 'called Result::unwrap() on an Err value: "PoisonError { inner: .. }"', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-web-3.2.0/src/server.rs:282:36

thread 'actix-rt:worker:9' panicked at 'called Result::unwrap() on an Err value: "PoisonError { inner: .. }"', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-web-3.2.0/src/server.rs:282:36

thread 'actix-rt:worker:8' panicked at 'called Result::unwrap() on an Err value: "PoisonError { inner: .. }"', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-web-3.2.0/src/server.rs:282:36

thread 'actix-rt:worker:10' panicked at 'called Result::unwrap() on an Err value: "PoisonError { inner: .. }"', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-web-3.2.0/src/server.rs:282:36

thread 'actix-rt:worker:11' panicked at 'called Result::unwrap() on an Err value: "PoisonError { inner: .. }"', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-web-3.2.0/src/server.rs:282:36

thread 'actix-rt:worker:12' panicked at 'called Result::unwrap() on an Err value: "PoisonError { inner: .. }"', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-web-3.2.0/src/server.rs:282:36

thread 'actix-rt:worker:13' panicked at 'called Result::unwrap() on an Err value: "PoisonError { inner: .. }"', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-web-3.2.0/src/server.rs:282:36

thread 'actix-rt:worker:14' panicked at 'called Result::unwrap() on an Err value: "PoisonError { inner: .. }"', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-web-3.2.0/src/server.rs:282:36

thread 'actix-rt:worker:15' panicked at 'called Result::unwrap() on an Err value: "PoisonError { inner: .. }"', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/actix-web-3.2.0/src/server.rs:282:36`

@jcherrabi
Copy link
Author

jcherrabi commented Mar 13, 2021

After some digging the error turns out to be from the connection limitation set by Azure MySQL Basic Plan which is set to max_connections = 50 which you can't increase unless you move to higher plan... and the newly implemented actix_web:

the change came after the webthingsos team replaced iron with actix_web which according to teh article below it creates n workers on the httpserver where n is the number of logical cpus. For each worker, r2d2will create 10 connections. so two things happened that triggered the error and panic.

During my testing i increased the registration server capacity to 8 vCPUs to test the scaling which caused r2rd to create 800 connections and hence the error from Azure MySQL db. so long story short if you plan on increasing the registration server make sure you plan on increasing MySQL DB max_connection as well. i hope this can be helpful to others.

the article:
https://kevingalang.com/rust-connection-flooding/

Jay

@benfrancis
Copy link
Member

@jcherrabi Thank you very much for reporting this, and for sharing your solution. I'm going to close this but please feel free to re-open if you think there's anything more we can do.

WebThings Cloud automation moved this from To do to Done Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

No branches or pull requests

2 participants