DL models as serverless functions #1767

Merged: 133 commits, merged Jul 29, 2020


@nmanovic (Contributor) commented Jun 19, 2020

Fix #796, fix #792, fix #743, fix #297, fix #296, fix #197, fix #196, fix #896, fix #910, fix #1028, fix #1832, fix #1846, fix #1551


Motivation and context

Before this PR, CVAT had all "automatic annotation" features inside one cvat container: CUDA, OpenVINO, extra Python packages. Also, each "DL model" (such as Mask-RCNN, DEXTR, Faster RCNN) was implemented as a Django application with its own REST API, which made it very difficult to support new models.

This PR solves most of these issues. All "automatic annotation" features are now serverless functions. The current implementation uses the nuclio framework (https://github.com/nuclio/nuclio) to deploy and invoke them. Each serverless function is a separate Docker container that can be accessed over HTTP. lambda_manager is a Django app that provides a convenient REST API to work with serverless functions:

[screenshot]
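Each function in this design is an HTTP-invoked nuclio handler running in its own container. The sketch below shows the general shape of such a function, assuming nuclio's Python runtime entry points `init_context(context)` (called once per worker) and `handler(context, event)` (called per request). The JSON key `image` and the `image_size` response field are hypothetical and chosen only for illustration; no model is loaded and no inference is run.

```python
import base64
import json


def init_context(context):
    # Called once when the worker starts; a real function would load its
    # DL model here. This sketch has nothing to load.
    pass


def decode_image(body):
    """Return raw image bytes from a request body.

    Hypothetical payload shape: {"image": "<base64 string>"}.
    The body may arrive already parsed (dict) or as raw bytes/str,
    so both forms are accepted.
    """
    if isinstance(body, (bytes, str)):
        body = json.loads(body)
    return base64.b64decode(body["image"])


def handler(context, event):
    # Invoked for every HTTP request routed to this function.
    image_bytes = decode_image(event.body)
    # A real function would run inference here; this sketch only reports
    # how many bytes were decoded (hypothetical response format).
    return json.dumps({"image_size": len(image_bytes)})
```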

A function can be called directly with POST /api/v1/lambda/functions/<name>, or a request can be sent with POST /api/v1/lambda/requests.
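The lambda_manager endpoints can be exercised from any HTTP client. A minimal standard-library sketch that only builds the requests (it sends nothing) might look like this. The host `http://localhost:8080` and the helper names are hypothetical, authentication is omitted, and the request body format is not described in this PR, so the payload is passed through as opaque JSON.

```python
import json
from urllib.request import Request

# Hypothetical CVAT address; adjust for a real deployment.
LAMBDA_API = "http://localhost:8080/api/v1/lambda"


def json_post(url, payload):
    # Build (but do not send) a POST request with a JSON body.
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_functions():
    # GET /api/v1/lambda/functions
    return Request(f"{LAMBDA_API}/functions", method="GET")


def call_function(name, payload):
    # POST /api/v1/lambda/functions/<name>: call one function directly.
    return json_post(f"{LAMBDA_API}/functions/{name}", payload)


def submit_request(payload):
    # POST /api/v1/lambda/requests: send a request instead of a direct call.
    return json_post(f"{LAMBDA_API}/requests", payload)
```

Passing one of these objects to `urllib.request.urlopen` would send it; a real CVAT server will also expect credentials.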

How has this been tested?

It was tested manually.

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT
  • Permissions (only users with task.change permissions can run DL models)
  • Mask RCNN via TensorFlow
  • ReID serverless function
  • Code duplication (python3, model_loader.py) for OpenVINO functions
  • Semantic segmentation for ADAS function
  • Text detection function

Next PR:

  • Images by URL in serverless functions
  • Tracker serverless function
  • Optimize serverless function invocation (measure the overhead of the nuclio serverless platform and of submitting images as JSON strings; indirect call using the dashboard)
  • Fix swagger documentation for lambda_manager REST API

Nikita Manovich added 30 commits April 21, 2020 16:16
GET /api/v1/lambda/functions
GET /api/v1/lambda/functions/public.dextr
- image decoding
- restart policy always for the function
@bsekachev (Member)

Apart from the comments above, the PR looks good to me.

@azhavoro (Contributor) previously approved these changes Jul 29, 2020

LGTM

@nmanovic (Contributor, Author)

I had two tasks based on the same video:

The first consists of 1 job
The second consists of 3 jobs (segment size 50) with ZOrder enabled
Stop frame in both cases is the same: 100
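With a stop frame of 100 the task covers frames 0 to 100, i.e. 101 frames, so a segment size of 50 yields three jobs, the last one holding a single frame. A sketch of that arithmetic, assuming start frame 0 and no overlap between jobs (the function name is hypothetical and this is not CVAT's own splitting code):

```python
def split_into_segments(start_frame, stop_frame, segment_size):
    """Split the inclusive range [start_frame, stop_frame] into consecutive,
    non-overlapping segments of at most segment_size frames each.
    Any overlap between jobs is deliberately ignored in this sketch."""
    segments = []
    first = start_frame
    while first <= stop_frame:
        last = min(first + segment_size - 1, stop_frame)
        segments.append((first, last))
        first = last + 1
    return segments


print(split_into_segments(0, 100, 50))
```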

When I run openvino.omz.semantic-segmentation-adas-0001 on the first task, it works correctly
When I run the same model on the second task, progress goes to 100% and then the process fails with: ZeroDivisionError: float division by zero

[Screenshot from 2020-07-29 13-27-41]

I can share the video with you.

Update: I checked Faster RCNN on the multi-job task, and it works well.

Found a typo in dataset_manager (probably an old one) and fixed it. Another problem was a polygon with many points, all on the same line, so its area was 0.
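A polygon whose points all lie on one line has zero area, so any later division by that area raises ZeroDivisionError. The sketch below detects such degenerate polygons with the shoelace formula; the helper names and the `eps` tolerance are hypothetical, and this is not the actual dataset_manager fix.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon given as [(x, y), ...]."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def is_degenerate(points, eps=1e-9):
    """True for polygons that cannot enclose any area: fewer than three
    points, or all points (numerically) on one line."""
    return len(points) < 3 or polygon_area(points) <= eps


collinear = [(0, 0), (1, 1), (2, 2), (3, 3)]
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(polygon_area(collinear), polygon_area(square))
```

Filtering with `is_degenerate` before any division by area is one way to avoid the error.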

@bsekachev (Member)

> Found a typo in dataset_manager (probably an old one) and fixed it. Another problem was a polygon with many points, all on the same line, so its area was 0.

Confirmed. It has been fixed.

@rushtehrani (Contributor)

I ran into an issue where I had to pass --platform local to nuctl, for example:

./nuctl deploy --project-name cvat \
  --path serverless/openvino/dextr/nuclio \
  --volume `pwd`/serverless/openvino/common:/opt/nuclio/common \
  --platform local

Otherwise, I would get this error:

Error - the server could not find the requested resource (post nuclioprojects.nuclio.io)
    /nuclio/pkg/platform/kube/platform.go:393

This may only be an issue with the latest nuctl release, but I can create a PR and update the docs accordingly if it makes sense.

@rushtehrani (Contributor)

Also, I'm not seeing a deployed model in CVAT, even though the model shows up in the API response:

[screenshot]

It's also showing up in Nuclio's dashboard:

[screenshot]

@nmanovic (Contributor, Author) commented Aug 4, 2020

@rushtehrani, dextr isn't shown in the CVAT models list. I don't think that is right, and we will probably need to fix it in the future. For now it is filtered out explicitly.

@nmanovic (Contributor, Author) commented Aug 4, 2020

@rushtehrani, I got the same advice about --platform local from the nuclio maintainers. I will definitely fix that ASAP. Thanks!

@rushtehrani (Contributor)

> dextr isn't shown in the CVAT models list. I don't think that is right, and we will probably need to fix it in the future. For now it is filtered out explicitly.

Got it. In that case, would it make sense to use an example command here for a model that does show up in CVAT?
