
Need a Model CI #8903

Closed
Superjomn opened this issue Mar 9, 2018 · 12 comments

@Superjomn
Contributor

Superjomn commented Mar 9, 2018

Model CI

Users occasionally find non-negligible performance or precision differences between Paddle versions. Though we have unit tests for each class and Travis-CI to ensure the precision of each operator, there is no logic to ensure that a model (a composition of several operators) works as reliably as its operators.

There are several conditions under which an existing model can regress in performance or precision:

  1. incomplete test-case coverage.
  2. updates that degrade performance; currently we have no performance tracker for each operator.
  3. API changes; developers are likely to forget to update usages in other repos such as paddle/models.

The model-CI module is proposed to address the weaknesses above and to track overall performance and precision at model granularity.

Module function

Inputs:

  • a compiled Paddle whl of some version
  • some test models
  • historical execution information for the test models

Outputs:

  • execution information of the test models
  • an alarm if there is an obvious difference from the historical results

Indicators tracked

  1. whether a model execution succeeds
  2. execution time of a model
  3. train cost trend
  4. infer precision trend
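
To make the tracking concrete, here is a minimal sketch of a per-run record and the alarm check, assuming history is kept as a list of past records; the field names, the helper should_alarm, and the 5% threshold are illustrative assumptions, not part of the proposal.

from dataclasses import dataclass
from typing import List

@dataclass
class ModelRunRecord:
    model: str              # e.g. "resnet50"
    commit_id: str          # the Paddle commit the whl was built from
    success: bool           # indicator 1: did the run finish
    wall_time_sec: float    # indicator 2: execution time
    train_cost: float       # indicator 3: final training cost
    infer_precision: float  # indicator 4: inference precision

ALARM_THRESHOLD = 0.05  # assumed: alarm on a >5% relative change

def should_alarm(current: ModelRunRecord, history: List[ModelRunRecord]) -> bool:
    # A failed run always alarms; otherwise compare inference
    # precision against the mean of past successful runs.
    if not current.success:
        return True
    past = [r for r in history if r.model == current.model and r.success]
    if not past:
        return False
    baseline = sum(r.infer_precision for r in past) / len(past)
    return abs(current.infer_precision - baseline) > ALARM_THRESHOLD * baseline

The same comparison applies to wall_time_sec and train_cost; only the field being compared changes.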
@panyx0718
Contributor

panyx0718 commented Mar 9, 2018

LGTM.

Just to add more details:

  1. Model selection
    Needs good coverage. Recommend picking the 1~2 most
    widely used models for each domain (NLP, vision, speech, etc.),
    e.g. ResNet, Seq2SeqAttention, GAN,
    covering single-device, multi-device, and multi-machine setups.

  2. The history should be recorded and analyzable,
    e.g. which commit each result came from.

  3. Bonus feature:
    automatically find which commit caused the problem.

@typhoonzero
Contributor

How about we build a Docker image on every CI run? The image would be named like paddle-regression:b825c79, with the git commit id as its tag. The image would also contain the datasets and the model Python programs needed to run the full regression, so that we can reproduce any run whenever and wherever we need.
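
A minimal sketch of this tagging scheme, assuming a Dockerfile at the repo root that installs the freshly built whl and copies in the datasets and model scripts; build_regression_image is a hypothetical helper, not an existing Paddle tool.

import subprocess

def build_regression_image(repo_path: str, commit_id: str) -> str:
    # Tag the image with the short commit id so any historical
    # run can be reproduced from its matching image.
    tag = f"paddle-regression:{commit_id[:7]}"
    subprocess.run(["docker", "build", "-t", tag, repo_path], check=True)
    return tag

Putting the datasets in the earliest Dockerfile layers lets Docker share them across builds, which keeps the per-build delta small (see the follow-up comments below).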

@panyx0718
Contributor

Sounds good. Not sure how large an image is. Maybe we can keep the last N good ones and the most recent bad one.

@typhoonzero
Contributor

The base layers of the image, including the dataset contents, can stay unchanged across builds. The changed files are only the whl package and the Python files, about 100M per build, so we can keep many historical builds.
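
A sketch of the retention policy suggested above, assuming the CI tracks (tag, is_good) pairs ordered oldest to newest; prune_images and keep_n are illustrative names.

def prune_images(tags_with_status, keep_n=10):
    # Keep the last N good builds plus the most recent bad one;
    # everything else is a candidate for deletion.
    good = [tag for tag, ok in tags_with_status if ok]
    bad = [tag for tag, ok in tags_with_status if not ok]
    keep = set(good[-keep_n:])
    if bad:
        keep.add(bad[-1])
    return [tag for tag, _ in tags_with_status if tag not in keep]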

@Superjomn
Contributor Author

How about a dev Docker image that can check out any commit id and run the test? That way we can test any commit id, with no need to compile or store images per commit.

The only variable is the commit id.

@helinwang
Contributor

Thanks for this important effort!

I think it's better to test with the prod image (or whl), since the prod image is what our users actually use. It is possible that some required Python dependencies are installed on the dev image but not on the prod image.

@Superjomn
Contributor Author

Superjomn commented Mar 10, 2018

Agreed. We have daily compiled whls; those need to be tested. @helinwang

But according to

Bonus feature
Automatically find which commit caused the problem.

the ModelCI needs to test commit ids using a binary search, so there are two input sources to the ModelCI:

  1. any released whls
  2. any commit ID from git

Let's clarify the logic of ModelCI:

def test_whl(some_whl): ...
def compile_whl_from_source(path): ...
def clone_commit_from_repo(gitpath): ...

def binary_search_bad_commit(commits):
    # ...
    source_path = clone_commit_from_repo(commit)
    whl_path = compile_whl_from_source(source_path)
    status = test_whl(whl_path)
    # ...

def main():
    today_released_whl = __download(...)
    ok = test_whl(today_released_whl)
    if not ok:
        # raise an alarm
        today_commits = __get_commits(repo, today)
        bad_commit = binary_search_bad_commit(today_commits)
        # raise an alarm in some way
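
For the bisection itself, here is a minimal runnable sketch, assuming commits are ordered oldest to newest, the last commit is known bad, and the failure is monotonic (every commit after the first bad one also fails); is_good is a hypothetical callback that would wrap the clone/compile/test pipeline above.

def binary_search_bad_commit(commits, is_good):
    # Invariant: everything before index lo is good; the first
    # bad commit is at an index in [lo, hi].
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid + 1  # first bad commit lies after mid
        else:
            hi = mid      # mid is bad, so the first bad commit is mid or earlier
    return commits[lo]

This is the same search git bisect performs; roughly log2(N) builds are needed for N candidate commits.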

@panyx0718
Contributor

I think we can raise an alert immediately when !ok is detected, and then search for bad_commit.

@helinwang
Contributor

helinwang commented Mar 12, 2018

I think binary searching for the bad commit is a great idea, but I'm not sure it's worth the time to develop and maintain the feature:

  1. In theory every commit should build, but in practice CI only checks the last commit of the sequence of commits in a PR. It's very possible that some intermediate commit does not build.
  2. Assuming we run the test script daily, there probably are not too many commits to check manually; if checking is hard, we can increase the test frequency. And the model CI hopefully won't fail too often.

@panyx0718
Contributor

panyx0718 commented Mar 13, 2018

  1. I think the search should only check the last commit of each PR (the one merged to the main branch), which should always build and pass. Maybe a better way to describe the feature is "search for the bad PR" (see the sketch after this list).
  2. I think as the team grows and more people commit to Paddle, an automatic solution will save a lot of manual effort. As I said, this is a bonus feature and we can develop it later.
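
A minimal sketch of restricting the search to merged PRs, assuming Paddle's main branch is develop and that each first-parent merge commit corresponds to one merged PR; merged_pr_commits is a hypothetical helper.

import subprocess

def merged_pr_commits(repo_path: str, since: str = "1 day ago") -> list:
    # First-parent merge commits on the main branch correspond
    # one-to-one with merged PRs, each of which passed CI.
    out = subprocess.run(
        ["git", "log", "--merges", "--first-parent",
         f"--since={since}", "--pretty=%H", "develop"],
        cwd=repo_path, capture_output=True, text=True, check=True)
    # git log returns newest first; reverse so the bisection
    # sees commits ordered oldest to newest.
    return list(reversed(out.stdout.split()))

binary_search_bad_commit can then bisect over this list of merge commits instead of over individual commits.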

@Superjomn
Contributor Author

Superjomn commented Mar 13, 2018

Got it. The ModelCI will keep pulling the latest code and testing it.

@shanyi15
Collaborator

Hello, this issue has not been updated in the past month, so we will close it today. If you still need to follow up after it is closed, feel free to reopen it and we will reply within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
