
Need a Model CI #8903

Closed
Superjomn opened this issue Mar 9, 2018 · 12 comments

@Superjomn
Contributor

Superjomn commented Mar 9, 2018

Model CI

Users occasionally find non-negligible performance or precision differences between Paddle versions. Though we have unit tests for each class and Travis-CI to ensure the precision of each operator, there is no logic to ensure that a model (a composition of several operators) works as reliably as its operators.

There are several conditions under which an existing model can regress in performance or precision:

  1. incomplete test-case coverage.
  2. updates that degrade performance; currently we have no performance tracker for each operator.
  3. API changes; developers are likely to forget to update usages in other repos such as paddle/models.

The model-CI module is proposed to address the weaknesses above and to track overall performance and precision at model granularity.

Module function

Inputs:

  • a compiled Paddle whl of some version
  • some test models
  • historical execution information for the test models

Outputs:

  • execution information of the test models
  • an alarm if there is an obvious difference from the historical results

Indicators tracked

  1. whether a model execution succeeds
  2. execution time of a model
  3. train cost trend
  4. infer precision trend
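
To make the tracking concrete, here is a minimal sketch of a per-run record and the alarm check, assuming history is kept as a list of past records; the field names, the helper should_alarm, and the 5% threshold are illustrative assumptions, not part of the proposal.

from dataclasses import dataclass
from typing import List

@dataclass
class ModelRunRecord:
    model: str              # e.g. "resnet50"
    commit_id: str          # the Paddle commit the whl was built from
    success: bool           # indicator 1: did the run finish
    wall_time_sec: float    # indicator 2: execution time
    train_cost: float       # indicator 3: final training cost
    infer_precision: float  # indicator 4: inference precision

ALARM_THRESHOLD = 0.05  # assumed: alarm on a >5% relative change

def should_alarm(current: ModelRunRecord, history: List[ModelRunRecord]) -> bool:
    # A failed run always alarms; otherwise compare inference
    # precision against the mean of past successful runs.
    if not current.success:
        return True
    past = [r for r in history if r.model == current.model and r.success]
    if not past:
        return False
    baseline = sum(r.infer_precision for r in past) / len(past)
    return abs(current.infer_precision - baseline) > ALARM_THRESHOLD * baseline

The same comparison applies to wall_time_sec and train_cost; only the field being compared changes.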
@panyx0718
Contributor

panyx0718 commented Mar 9, 2018

LGTM.

Just to add more details:

  1. Model selection
    Needs good coverage. Recommend picking the 1~2 most
    widely used models for each domain (NLP, vision, speech, etc.),
    e.g. ResNet, Seq2SeqAttention, GAN,
    covering single-device, multi-device, and multi-machine setups.

  2. The history should be recorded and analyzable,
    e.g. which commit each result came from.

  3. Bonus feature:
    automatically find which commit caused the problem.

@typhoonzero
Contributor

How about we build a Docker image on every CI run? The image would be named like paddle-regression:b825c79, with the git commit id as its tag. The image would also contain the datasets and the model Python programs needed to run the full regression, so that we can reproduce any run whenever and wherever we need.
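
A minimal sketch of this tagging scheme, assuming a Dockerfile at the repo root that installs the freshly built whl and copies in the datasets and model scripts; build_regression_image is a hypothetical helper, not an existing Paddle tool.

import subprocess

def build_regression_image(repo_path: str, commit_id: str) -> str:
    # Tag the image with the short commit id so any historical
    # run can be reproduced from its matching image.
    tag = f"paddle-regression:{commit_id[:7]}"
    subprocess.run(["docker", "build", "-t", tag, repo_path], check=True)
    return tag

Putting the datasets in the earliest Dockerfile layers lets Docker share them across builds, which keeps the per-build delta small (see the follow-up comments below).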

@panyx0718
Contributor

Sounds good. Not sure how large an image is. Maybe we can keep the last N good ones and the most recent bad one.

@typhoonzero
Contributor

The base layers of the image, including the dataset contents, can stay unchanged across builds. The changed files are only the whl package and the Python files, about 100M per build, so we can keep many historical builds.
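
A sketch of the retention policy suggested above, assuming the CI tracks (tag, is_good) pairs ordered oldest to newest; prune_images and keep_n are illustrative names.

def prune_images(tags_with_status, keep_n=10):
    # Keep the last N good builds plus the most recent bad one;
    # everything else is a candidate for deletion.
    good = [tag for tag, ok in tags_with_status if ok]
    bad = [tag for tag, ok in tags_with_status if not ok]
    keep = set(good[-keep_n:])
    if bad:
        keep.add(bad[-1])
    return [tag for tag, _ in tags_with_status if tag not in keep]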

@Superjomn
Contributor Author

How about a dev Docker image that can check out any commit id and run the test? That way we can test any commit id, with no need to compile or store images per commit.

The only variable is the commit id.

@helinwang
Contributor

Thanks for this important effort!

I think it's better to test with the prod image (or whl), since the prod image is what our users actually use. It is possible that some required Python dependencies are installed on the dev image but not on the prod image.

@Superjomn
Contributor Author

Superjomn commented Mar 10, 2018

Agreed. We have daily compiled whls; those need to be tested. @helinwang

But according to

Bonus feature
Automatically find which commit caused the problem.

the ModelCI needs to test commit ids using a binary search, so there are two input sources to the ModelCI:

  1. any released whls
  2. any commit ID from git

Let's clarify the logic of ModelCI:

def test_whl(some_whl): ...
def compile_whl_from_source(path): ...
def clone_commit_from_repo(gitpath): ...

def binary_search_bad_commit(commits):
    # ...
    source_path = clone_commit_from_repo(commit)
    whl_path = compile_whl_from_source(source_path)
    status = test_whl(whl_path)
    # ...

def main():
    today_released_whl = __download(...)
    ok = test_whl(today_released_whl)
    if not ok:
        # raise an alarm
        today_commits = __get_commits(repo, today)
        bad_commit = binary_search_bad_commit(today_commits)
        # raise an alarm in some way
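
For the bisection itself, here is a minimal runnable sketch, assuming commits are ordered oldest to newest, the last commit is known bad, and the failure is monotonic (every commit after the first bad one also fails); is_good is a hypothetical callback that would wrap the clone/compile/test pipeline above.

def binary_search_bad_commit(commits, is_good):
    # Invariant: everything before index lo is good; the first
    # bad commit is at an index in [lo, hi].
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid + 1  # first bad commit lies after mid
        else:
            hi = mid      # mid is bad, so the first bad commit is mid or earlier
    return commits[lo]

This is the same search git bisect performs; roughly log2(N) builds are needed for N candidate commits.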

@panyx0718
Contributor

I think we can raise an alert immediately when !ok is detected, and then search for bad_commit.

@helinwang
Contributor

helinwang commented Mar 12, 2018

I think binary searching for the bad commit is a great idea, but I'm not sure it's worth the time to develop and maintain the feature:

  1. In theory every commit should build, but in practice CI only checks the last commit of the sequence of commits in a PR. It's very possible that some intermediate commit does not build.
  2. Assuming we run the test script daily, there probably are not too many commits to check manually; if checking is hard, we can increase the test frequency. And the model CI hopefully won't fail too often.

@panyx0718
Contributor

panyx0718 commented Mar 13, 2018

  1. I think the search should only check the last commit of each PR (the one merged to the main branch), which should always build and pass. Maybe a better way to describe the feature is "search for the bad PR" (see the sketch after this list).
  2. I think as the team grows and more people commit to Paddle, an automatic solution will save a lot of manual effort. As I said, this is a bonus feature and we can develop it later.
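
A minimal sketch of restricting the search to merged PRs, assuming Paddle's main branch is develop and that each first-parent merge commit corresponds to one merged PR; merged_pr_commits is a hypothetical helper.

import subprocess

def merged_pr_commits(repo_path: str, since: str = "1 day ago") -> list:
    # First-parent merge commits on the main branch correspond
    # one-to-one with merged PRs, each of which passed CI.
    out = subprocess.run(
        ["git", "log", "--merges", "--first-parent",
         f"--since={since}", "--pretty=%H", "develop"],
        cwd=repo_path, capture_output=True, text=True, check=True)
    # git log returns newest first; reverse so the bisection
    # sees commits ordered oldest to newest.
    return list(reversed(out.stdout.split()))

binary_search_bad_commit can then bisect over this list of merge commits instead of over individual commits.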

@Superjomn
Contributor Author

Superjomn commented Mar 13, 2018

Got it. The ModelCI will keep pulling the latest code and testing it.

@shanyi15
Collaborator

Hello, this issue has not been updated in the past month, so we will close it today. If you still need to follow up after it is closed, feel free to reopen it and we will reply within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
