New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnauthorizedLogin error during buildbot start #3462

Open
aj062 opened this Issue Jul 22, 2017 · 5 comments

Comments

Projects
None yet
4 participants
@aj062

aj062 commented Jul 22, 2017

During buildbot start twistd.log usually have errors about invalid login from workers. It happens more frequently when there are lot of workers, and buildbot takes long time to start. However, even when buildbot start quickly, such errors can be noticed.

2017-07-21 16:22:30-0700 [-] Loading buildbot.tac...
...
2017-07-21 16:22:32-0700 [-] PBServerFactory starting on 9989
2017-07-21 16:22:32-0700 [-] Starting factory <twisted.spread.pb.PBServerFactory instance at 0x4acf830>
2017-07-21 16:22:32-0700 [-] adding 22 new builders, removing 0
2017-07-21 16:22:32-0700 [-] adding 30 new schedulers, removing 0
2017-07-21 16:22:32-0700 [Broker,0,17.219.248.51] invalid login from unknown user 'bot111'
2017-07-21 16:22:32-0700 [Broker,0,17.219.248.51] Peer will receive following PB traceback:
2017-07-21 16:22:32-0700 [Broker,0,17.219.248.51] Unhandled Error
        Traceback (most recent call last):
        Failure: twisted.cred.error.UnauthorizedLogin:
...
2017-07-21 16:22:32-0700 [-] adding 24 new workers, removing 0
2017-07-21 16:22:32-0700 [-] BuildMaster is running
@seankelly

This comment has been minimized.

Show comment
Hide comment
@seankelly

seankelly Jul 22, 2017

Member

I've noticed this too. The worker will fail to log in a few times (two or three) and then connect.

Member

seankelly commented Jul 22, 2017

I've noticed this too. The worker will fail to log in a few times (two or three) and then connect.

@tardyp

This comment has been minimized.

Show comment
Hide comment
@tardyp

tardyp Jul 22, 2017

Member

So look like a race condition between the port opening, and the final configuration of workers.

This sounds quite painful to fix while the problem is just cosmetic (spurious errors in the admin facing log),

feel free to fix, but watch out not to introduce other problems.

Member

tardyp commented Jul 22, 2017

So look like a race condition between the port opening, and the final configuration of workers.

This sounds quite painful to fix while the problem is just cosmetic (spurious errors in the admin facing log),

feel free to fix, but watch out not to introduce other problems.

@rjarry

This comment has been minimized.

Show comment
Hide comment
@rjarry

rjarry Nov 7, 2017

Contributor

I encounter the exact same problem with my setup. Note that I have a lot of workers which seems to favor this kind of race condition. And this sometimes is more than just cosmetic in the logs. The twisted.cred.error.UnauthorizedLogin error is slowing down the startup:

$ buildbot start
Following twistd.log until startup finished..
The buildmaster took more than 10 seconds to start, so we were unable to confirm
that it started correctly. Please 'tail twistd.log' and look for a line that
says 'BuildMaster is running' to verify correct startup.
$ echo $?
1

Here is an excerpt from twistd.log:

2017-11-07 17:16:58+0100 [-] Loading buildbot.tac...
...
2017-11-07 17:16:58+0100 [-] INFO: loaded 40 workers from file './hosts.yml'
...
2017-11-07 17:17:02+0100 [Broker,0,10.16.0.144] invalid login from unknown user 'mars'
2017-11-07 17:17:02+0100 [Broker,1,10.16.0.208] worker 'ascain' attaching from IPv4Address(TCP, '10.16.0.208', 48210)
2017-11-07 17:17:03+0100 [Broker,2,10.16.0.138] invalid login from unknown user 'ping'
2017-11-07 17:17:04+0100 [Broker,3,10.16.0.7] invalid login from unknown user 'winstub'
2017-11-07 17:17:04+0100 [Broker,4,10.16.0.184] invalid login from unknown user 'paques'
2017-11-07 17:17:04+0100 [Broker,5,10.16.0.189] worker 'burgos' attaching from IPv4Address(TCP, '10.16.0.189', 52686)
2017-11-07 17:17:05+0100 [Broker,6,10.16.0.45] invalid login from unknown user 'rangoon'
2017-11-07 17:17:05+0100 [Broker,7,10.16.0.144] invalid login from unknown user 'mars'
2017-11-07 17:17:05+0100 [Broker,8,10.16.0.138] invalid login from unknown user 'ping'
2017-11-07 17:17:06+0100 [Broker,9,10.16.0.7] worker 'winstub' attaching from IPv4Address(TCP, '10.16.0.7', 35702)
2017-11-07 17:17:07+0100 [Broker,10,10.16.0.108] worker 'pong' attaching from IPv4Address(TCP, '10.16.0.108', 53178)
2017-11-07 17:17:07+0100 [Broker,11,10.16.0.184] invalid login from unknown user 'paques'
2017-11-07 17:17:07+0100 [Broker,12,10.16.0.144] worker 'mars' attaching from IPv4Address(TCP, '10.16.0.144', 57258)
2017-11-07 17:17:08+0100 [Broker,13,10.16.0.138] worker 'ping' attaching from IPv4Address(TCP, '10.16.0.138', 57168)
2017-11-07 17:17:08+0100 [Broker,14,10.16.0.79] worker 'adour' attaching from IPv4Address(TCP, '10.16.0.79', 41734)
2017-11-07 17:17:09+0100 [Broker,15,10.16.0.45] invalid login from unknown user 'rangoon'
2017-11-07 17:17:09+0100 [Broker,16,10.16.0.184] worker 'paques' attaching from IPv4Address(TCP, '10.16.0.184', 50584)
2017-11-07 17:17:10+0100 [-] BuildMaster is running
...

Maybe we should look for a fix. My understanding is that all subservices are started in parallel. Could we add a "dependency" from the pb socket listener, to the workermanager ?

Or add an option to increase the startup timeout :)

Contributor

rjarry commented Nov 7, 2017

I encounter the exact same problem with my setup. Note that I have a lot of workers which seems to favor this kind of race condition. And this sometimes is more than just cosmetic in the logs. The twisted.cred.error.UnauthorizedLogin error is slowing down the startup:

$ buildbot start
Following twistd.log until startup finished..
The buildmaster took more than 10 seconds to start, so we were unable to confirm
that it started correctly. Please 'tail twistd.log' and look for a line that
says 'BuildMaster is running' to verify correct startup.
$ echo $?
1

Here is an excerpt from twistd.log:

2017-11-07 17:16:58+0100 [-] Loading buildbot.tac...
...
2017-11-07 17:16:58+0100 [-] INFO: loaded 40 workers from file './hosts.yml'
...
2017-11-07 17:17:02+0100 [Broker,0,10.16.0.144] invalid login from unknown user 'mars'
2017-11-07 17:17:02+0100 [Broker,1,10.16.0.208] worker 'ascain' attaching from IPv4Address(TCP, '10.16.0.208', 48210)
2017-11-07 17:17:03+0100 [Broker,2,10.16.0.138] invalid login from unknown user 'ping'
2017-11-07 17:17:04+0100 [Broker,3,10.16.0.7] invalid login from unknown user 'winstub'
2017-11-07 17:17:04+0100 [Broker,4,10.16.0.184] invalid login from unknown user 'paques'
2017-11-07 17:17:04+0100 [Broker,5,10.16.0.189] worker 'burgos' attaching from IPv4Address(TCP, '10.16.0.189', 52686)
2017-11-07 17:17:05+0100 [Broker,6,10.16.0.45] invalid login from unknown user 'rangoon'
2017-11-07 17:17:05+0100 [Broker,7,10.16.0.144] invalid login from unknown user 'mars'
2017-11-07 17:17:05+0100 [Broker,8,10.16.0.138] invalid login from unknown user 'ping'
2017-11-07 17:17:06+0100 [Broker,9,10.16.0.7] worker 'winstub' attaching from IPv4Address(TCP, '10.16.0.7', 35702)
2017-11-07 17:17:07+0100 [Broker,10,10.16.0.108] worker 'pong' attaching from IPv4Address(TCP, '10.16.0.108', 53178)
2017-11-07 17:17:07+0100 [Broker,11,10.16.0.184] invalid login from unknown user 'paques'
2017-11-07 17:17:07+0100 [Broker,12,10.16.0.144] worker 'mars' attaching from IPv4Address(TCP, '10.16.0.144', 57258)
2017-11-07 17:17:08+0100 [Broker,13,10.16.0.138] worker 'ping' attaching from IPv4Address(TCP, '10.16.0.138', 57168)
2017-11-07 17:17:08+0100 [Broker,14,10.16.0.79] worker 'adour' attaching from IPv4Address(TCP, '10.16.0.79', 41734)
2017-11-07 17:17:09+0100 [Broker,15,10.16.0.45] invalid login from unknown user 'rangoon'
2017-11-07 17:17:09+0100 [Broker,16,10.16.0.184] worker 'paques' attaching from IPv4Address(TCP, '10.16.0.184', 50584)
2017-11-07 17:17:10+0100 [-] BuildMaster is running
...

Maybe we should look for a fix. My understanding is that all subservices are started in parallel. Could we add a "dependency" from the pb socket listener, to the workermanager ?

Or add an option to increase the startup timeout :)

@tardyp

This comment has been minimized.

Show comment
Hide comment
@tardyp

tardyp Nov 7, 2017

Member

I don't think it is slowing down your startup. the slow startup is triggering the bug..

I think you will need to use buildbot_profiler to see what is doing the slowdown. (try to do measurment when doing a reconfig)

Member

tardyp commented Nov 7, 2017

I don't think it is slowing down your startup. the slow startup is triggering the bug..

I think you will need to use buildbot_profiler to see what is doing the slowdown. (try to do measurment when doing a reconfig)

@rjarry

This comment has been minimized.

Show comment
Hide comment
@rjarry

rjarry Nov 7, 2017

Contributor

I'll have a look tomorrow for profiling details.

Contributor

rjarry commented Nov 7, 2017

I'll have a look tomorrow for profiling details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment