
Private runner support#736

Merged
subdavis merged 20 commits into main from gc-rabbit-updates on May 13, 2021

Conversation

@subdavis
Contributor

subdavis commented May 6, 2021

Private job runner.

  • Enable the runner on the jobs page.
  • New endpoints for autoconfiguration of the worker under new girder plugin.
  • New dependency for interacting with the RabbitMQ Management Plugin to configure user and permissions.
  • Accept user/password as worker args so end-users don't have to mess with celery credentials. (Rough sketch after this list.)
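
As context for the last two bullets, a rough, hedged sketch of what the autoconfiguration could look like, assuming the RabbitMQ Management HTTP API with illustrative credentials, vhost, and naming conventions (this is not the plugin's actual code):

# Hypothetical sketch of per-user broker provisioning via the RabbitMQ
# Management HTTP API; the URL, admin credentials, vhost, and naming
# pattern below are illustrative, not the plugin's actual values.
import requests

MGMT_URL = "http://localhost:15672/api"   # management plugin default port
ADMIN_AUTH = ("guest", "guest")           # example admin credentials only

def provision_worker_user(username: str, password: str, vhost: str = "default") -> str:
    """Create a RabbitMQ user limited to its own queues and return a broker URL."""
    # Create (or update) the user.
    requests.put(
        f"{MGMT_URL}/users/{username}",
        json={"password": password, "tags": ""},
        auth=ADMIN_AUTH,
    ).raise_for_status()

    # Grant permissions only on resources whose names start with the username,
    # so a private runner cannot read from the shared server queue.
    pattern = f"^{username}.*"
    requests.put(
        f"{MGMT_URL}/permissions/{vhost}/{username}",
        json={"configure": pattern, "write": pattern, "read": pattern},
        auth=ADMIN_AUTH,
    ).raise_for_status()

    # The worker can be started with this URL instead of the shared celery creds.
    return f"amqp://{username}:{password}@localhost:5672/{vhost}"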

Jobs page (screenshot, 2021-05-11)

Runner (screenshot, 2021-05-11)

subdavis force-pushed the gc-rabbit-updates branch from 505b672 to a2f9158 on May 8, 2021
subdavis changed the title from "Gc rabbit updates" to "Private runner support" on May 11, 2021
subdavis requested a review from BryonLewis on May 11, 2021
subdavis marked this pull request as ready for review on May 11, 2021
@BryonLewis
Collaborator

build fix worked, I'm looking at it now

@subdavis
Contributor Author

Pushed a lint fix. Those url validators aren't very easy to use.

@BryonLewis
Collaborator

BryonLewis left a comment

Works really well; I was easily able to get it up and running on my local machine and swap between the user/server queues for different tasks.

The only thing I think needs to be fixed is the minor Vue variable issue I have a code reference to.

These may be future changes and not part of this PR:

  • Indication of job type (server vs user) in the job list.
  • A way to unqueue user jobs and swap them back into the main queue? This can be resolved with job cancelling, but there is that extra step of going to the /girder interface. This is a problem if I start a user-queued job without my worker up, then decide I want to turn off the user queue and use the server workers instead. If I don't cancel the job, I get an error.

Questions:

  • During setup, the user should understand that using a remote worker with data stored on viame.kitware.com will require downloading all of that data from viame.kitware.com. Meaning that if someone is trying to run pipelines on cloned datasets in the public collection, they will be ingesting a lot and our server will be transferring a lot?
  • Cleaner Error Handling?
    • Wrong or incomplete /opt/noaa/viame folder? This will be greatly helped if we can tell the difference between user and server jobs.
Running command: . /opt/noaa/viame/setup_viame.sh && kwiver runner -s input:video_reader:type=vidl_ffmpeg -p /tmp/addons/extracted/configs/pipelines/detector_motion.pipe -s input:video_filename=/tmp/tmp51jo3wez/tmp62_fgrc0.mp4 -s downsampler:target_frame_rate=10 -s detector_writer:file_name=/tmp/tmpk1tb2mrk/detector_output.csv -s track_writer:file_name=/tmp/tmpk1tb2mrk/track_output.csv
RuntimeError: Pipeline exited with nonzero status code 255: INFO: Could not load default logger factory. Using built-in logger.
2021-05-12 20:03:45.819 INFO vital.modules.module_loader(66): Loading python modules
Caught unhandled kwiver::vital::vital_exception: The file does not exist: /tmp/addons/extracted/configs/pipelines/detector_motion.pipe, thrown from /viame/packages/kwiver/sprokit/src/sprokit/pipeline_util/pipeline_builder.cxx:104

  File "/home/worker/venv/lib/python3.7/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/worker/venv/lib/python3.7/site-packages/girder_worker/task.py", line 153, in __call__
    results = super(Task, self).__call__(*_t_args, **_t_kwargs)
  File "/home/worker/venv/lib/python3.7/site-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/viame_girder/dive_tasks/tasks.py", line 274, in run_pipeline
    stream_subprocess(process, self, manager, process_err_file, cleanup=cleanup)
  File "/home/viame_girder/dive_tasks/utils.

The above would have me scratching my head until I figured out it was a user job running against a local install of VIAME that doesn't include the motion pipeline. I just know that my head wouldn't immediately jump to that, and I would be thinking something is wrong with the server or /addons.

async function setPrivateQueueEnabled(value: boolean) {
  loading.value = true;
  const resp = await setUsePrivateQueue(restClient.user._id, value);
  privateQueueEnabled.value = resp.user_private_queue_enabled;
@BryonLewis
Collaborator

I think you need to either use a different initialization for privateQueueEnabled, make restClient refresh the user data after setting, or manually set restClient.user.user_private_queue_enabled to the returned result (though I'm guessing that should be a read-only value). I.e., if you toggle this setting, go back to the data page, and return to this page, the old value will still be stuck in restClient.user.user_private_queue_enabled instead of the new value.

@subdavis
Contributor Author

These are great comments.

A way to unqueue user jobs and swap them back into the main queue? This can be resolved with job cancelling, but there is that extra step of going to the /girder interface. This is a problem if I start a user-queued job without my worker up, then decide I want to turn off the user queue and use the server workers instead.

Agree that this is an issue. It doesn't have a good solution because the only way to clear a message from a queue is to read it.
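
For illustration only (not code from this PR), a minimal sketch of that constraint using pika directly, with made-up queue names: "unqueueing" a user job means consuming its message and republishing it to the main queue.

# Hypothetical sketch; the broker host and the queue names "user-queue"
# and "celery" are illustrative, not the actual DIVE queue layout.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
ch = conn.channel()

while True:
    # basic_get reads one message; there is no API to delete a specific
    # message in place, so clearing it from the queue means consuming it.
    method, properties, body = ch.basic_get(queue="user-queue", auto_ack=False)
    if method is None:
        break  # user queue is drained
    # Republish to the main queue, then ack to remove it from the user queue.
    ch.basic_publish(exchange="", routing_key="celery", body=body, properties=properties)
    ch.basic_ack(delivery_tag=method.delivery_tag)

conn.close()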

Indication of job type (server vs user) in the job list.

Agree.

During setup, the user should understand that using a remote worker with data stored on viame.kitware.com will require downloading all of that data from viame.kitware.com. Meaning that if someone is trying to run pipelines on cloned datasets in the public collection, they will be ingesting a lot and our server will be transferring a lot?

I'll address this in documentation.

Cleaner Error Handling?

Yes. In a perfect world, I'd actually want to hit https://viame.kitware.com/girder/api/v1/viame/pipelines and then check the contents of the pipeline folder to make sure it matches. Could even add checksums to the pipeline item objects.

For now, some catch-all about updating your VIAME installation and making sure to install addons seems appropriate.
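
A hedged sketch of that pre-flight check (the response shape of the /viame/pipelines endpoint and the local pipeline directory below are assumptions, not confirmed by this PR):

# Hypothetical pre-flight check for a private runner; the JSON shape returned
# by /viame/pipelines and the local pipeline directory are assumptions.
from pathlib import Path
import requests

API = "https://viame.kitware.com/girder/api/v1"
LOCAL_PIPES = Path("/opt/noaa/viame/configs/pipelines")

def missing_pipelines() -> list:
    """Return server-known pipeline files that are absent from the local install."""
    resp = requests.get(f"{API}/viame/pipelines")
    resp.raise_for_status()
    # Assumed shape: {category: {"pipes": [{"pipe": "detector_motion.pipe", ...}, ...]}}
    categories = resp.json()
    missing = []
    for category in categories.values():
        for pipe in category.get("pipes", []):
            name = pipe.get("pipe")
            if name and not (LOCAL_PIPES / name).exists():
                missing.append(name)
    return missing

if __name__ == "__main__":
    absent = missing_pipelines()
    if absent:
        print("Update your VIAME installation and install addons; missing:", absent)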

subdavis requested a review from BryonLewis on May 13, 2021
@BryonLewis
Collaborator

BryonLewis left a comment

Did a second run of tests and everything seems to be working. It's nice having the tagging and the better error messages for missing pipelines.

subdavis merged commit 1f04a84 into main on May 13, 2021
subdavis deleted the gc-rabbit-updates branch on May 13, 2021