cmd-sign should handle lost connections gracefully #1072

@jlebon

FCOS pipeline hit this while signing images:

+ cosa sign robosignatory --s3 fcos-builds/prod/streams/testing-devel/builds --extra-fedmsg-keys stream=testing-devel --images --gpgkeypath /etc/pki/rpm-gpg --fedmsg-conf /etc/fedora-messaging-cfg/fedmsg.toml
Successfully started consumer thread
Sending artifacts-sign request for build 31.20200121.20.1
Waiting for response from RoboSignatory
The connection to the broker was lost (ConnectionLost('Connection lost')), consumer halted; the connection should restart and consuming will resume.
Traceback (most recent call last):
  File "/usr/lib/coreos-assembler/cmd-sign", line 380, in <module>
    sys.exit(main())
  File "/usr/lib/coreos-assembler/cmd-sign", line 64, in main
    args.func(args)
  File "/usr/lib/coreos-assembler/cmd-sign", line 124, in cmd_robosignatory
    robosign_images(args, s3, cond)
  File "/usr/lib/coreos-assembler/cmd-sign", line 232, in robosign_images
    validate_response(cond)
  File "/usr/lib/coreos-assembler/cmd-sign", line 312, in validate_response
    raise Exception("Timed out waiting for RoboSignatory")
Exception: Timed out waiting for RoboSignatory

I think what happened here is that the consumer didn't resume watching for the finished request after the ConnectionLost, so we timed out. We need to investigate whether we're supposed to handle this in our own code or whether fedora-messaging itself is supposed to do it, as the error message implies.
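If it turns out cmd-sign has to handle this itself, one possible shape is sketched below: a supervisor loop that restarts the consumer after a disconnect, plus a bounded wait on a condition variable for the response. This is a minimal, hypothetical sketch. `ConnectionLost` here is a local stand-in class, not the fedora-messaging exception, and `run_consumer_with_restart`, `wait_for_response`, `max_restarts` and `backoff` are illustrative names that do not exist in cmd-sign or fedora-messaging. No real fedora-messaging API is used.

```python
import threading
import time


class ConnectionLost(Exception):
    """Hypothetical stand-in for a broker disconnect (not the fedora-messaging class)."""


def run_consumer_with_restart(consume, stop_event, max_restarts=5, backoff=0.1):
    """Call consume() and restart it after ConnectionLost, until it returns,
    stop_event is set, or max_restarts is exceeded (then re-raise)."""
    restarts = 0
    while not stop_event.is_set():
        try:
            consume()
            return  # consumer finished normally
        except ConnectionLost:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up; the waiter will hit its timeout
            time.sleep(backoff)  # brief pause before reconnecting


def wait_for_response(cond, state, timeout):
    """Block until state["response"] is set by the consumer, or time out."""
    with cond:
        # wait_for checks the predicate first, so an early response is not missed
        ok = cond.wait_for(lambda: state["response"] is not None, timeout=timeout)
    if not ok:
        raise TimeoutError("Timed out waiting for RoboSignatory")
    return state["response"]
```

In use, the consumer would run `run_consumer_with_restart` in a background thread and, when the signing response arrives, set `state["response"]` and call `cond.notify_all()` while holding `cond`. The main thread then calls `wait_for_response(cond, state, timeout)`, so a dropped connection costs a reconnect rather than the whole timeout.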
