Upgrade pipelines with worker job #574
|
I think this would work right away, at least to ease the build process. Is the eventual idea to modify this endpoint, or to put a tool in place that specifies the updates externally, instead of relying on the two statically defined internal shell scripts? |
|
You make a good point. There's no real reason to offload this to a bash script. Might as well allow for more granular updates. |
|
I haven't really wrapped my head around exactly what I wanted, but this may overlap a bit with #557. |
|
I've come up with a possible solution. It's been evading me all day. I have a couple of goals here:
In addition, we've been doing some dumb things with pipelines from the start
I would like to instead have a
For now, the schema of that config can be the same as it is now, but it could easily change to the schema you described in #557. One issue is that we need to keep downloaded addons in a separate directory from the VIAME install tree. It's a bad idea to mix a container image directory tree (ephemeral) with a persistent tree, so there will have to be two locations to look for pipes. |
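A rough sketch of what looking in two locations could mean, for discussion only. The function name, the `*.pipe` glob, and the "later directory wins" rule are assumptions for illustration, not existing code in this repo.

```python
from pathlib import Path
from typing import Dict, Iterable


def discover_pipelines(search_dirs: Iterable[Path]) -> Dict[str, Path]:
    """Map pipeline file names to paths across several directories.

    Hypothetical helper: directories are scanned in order, and a file
    found in a later directory replaces one with the same name from an
    earlier directory (e.g. a persistent addon overriding the image copy).
    """
    found: Dict[str, Path] = {}
    for directory in search_dirs:
        # A missing directory is skipped rather than treated as an error,
        # since the persistent addon tree may not exist before the first upgrade.
        if not directory.is_dir():
            continue
        for pipe in sorted(directory.glob("*.pipe")):
            found[pipe.name] = pipe
    return found
```

Calling it with the image tree first and the persistent addon tree second would let downloaded addons shadow the built-in pipes. It does nothing about pipes that reference other files by relative path.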
|
I like the idea of the worker and server being separate and being able to change pipelines externally without having to relaunch. There is one possible hiccup with separating pipelines into different folders and that is some have what I think are relative cascading dependencies on the
|
|
Thanks for pointing that out. For the short term, we can continue using the default pipeline location. It means we can never rm -rf to get rid of old pipes while running (would require a container restart) but it's something. Hopefully Matt can help us resolve this in the future. |
@BryonLewis what do you think about something like this?
Naturally the lack of worker management means other jobs couldn't be allowed to run while this was running, or you'd get pipe missing errors. I don't think stalling the queue using celery controller is reliable.
I'm thinking of just adding a dive.pipelines_stalled bool setting while the job runs, such that trying to start a pipeline while it's true throws a 409 Conflict, and then requiring an empty queue and idle runners as a pre-condition to starting the upgrade. It's sort of a coarse lock, but I think it's easier to understand than trying to drain the queue and stall jobs using celery controller.
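To make the coarse lock concrete, here is a minimal in-memory sketch. The class and method names are hypothetical, and the queue and runner counts stand in for whatever the server would actually query. A real version would persist the flag as a setting and map the exception to an HTTP 409 response.

```python
from dataclasses import dataclass, field
from typing import List


class PipelineConflict(Exception):
    """Stand-in for an HTTP 409 Conflict response (hypothetical)."""

    status_code = 409


@dataclass
class PipelineGate:
    # Mirrors the proposed dive.pipelines_stalled setting.
    pipelines_stalled: bool = False
    # Placeholders for real queue and worker state.
    queued_jobs: List[str] = field(default_factory=list)
    busy_runners: int = 0

    def start_pipeline(self, name: str) -> None:
        # Refuse new pipeline jobs while an upgrade is running.
        if self.pipelines_stalled:
            raise PipelineConflict("pipeline upgrade in progress")
        self.queued_jobs.append(name)

    def begin_upgrade(self) -> None:
        # Pre-condition: no upgrade running, empty queue, idle runners.
        if self.pipelines_stalled:
            raise PipelineConflict("upgrade already running")
        if self.queued_jobs or self.busy_runners:
            raise PipelineConflict("queue must be empty and runners idle")
        self.pipelines_stalled = True

    def finish_upgrade(self) -> None:
        # Clear the flag whether the upgrade succeeded or failed.
        self.pipelines_stalled = False
```

Note that the check and the flag update in `begin_upgrade` are not atomic across processes in this sketch, so a job could in principle slip in between them. That is part of what makes the lock coarse.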