
design/model ci #9018

Closed · Superjomn wants to merge 7 commits

Conversation

Superjomn (Contributor)

No description provided.


## Make Factor Tracking Extensible

There are some general factors, including `train cost`, `validate cost`, and `duration`, but more factors such as `memory usage`, `gpu_memory_usage`, and so on should be added to the test.
Contributor

+1 about tracking the memory usage.
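As a sketch of what extensible factor tracking could look like, here is a hypothetical `Factor` base class with a memory-usage subclass; the hook names (`start`/`stop`/`record`) and the `out_file` convention are assumptions for illustration, not part of the design:

```python
import resource  # POSIX-only; used to sample peak resident memory


class Factor(object):
    """Hypothetical base class: each factor tracks one KPI of a run."""

    def __init__(self, out_file):
        self.out_file = out_file

    def start(self):
        pass

    def stop(self):
        pass

    def record(self):
        """Append the measured value to self.out_file, one value per line."""
        raise NotImplementedError


class TrainMemoryFactor(Factor):
    """Tracks peak host memory usage of the training process."""

    def stop(self):
        # ru_maxrss is in KB on Linux (bytes on macOS)
        self.peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    def record(self):
        with open(self.out_file, 'a') as f:
            f.write('%d\n' % self.peak_rss)
```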


1. the incomplete coverage test cases.
2. poor performance update, currently, we have no performance tracker for each operator.
3. API changes, developers are likely to forget to update usages in some other repositories such as paddle/models.
Contributor

There is another benefit: we can point users to our benchmark models as "currently working examples".

Contributor Author

Agreed, some classical models will be tested every time.

More popular models might be tested every few days, and ModelCI will generate a report about the precision and performance of the PaddlePaddle platform.

```python
valid_duration_factor = ValidDurationFactor()
train_memory_factor = TrainMemoryFactor()

tracking_factors = [ train_duration, valid_duration_factor, train_meta_factor ]
```
Contributor

train_duration -> train_duration_factor
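With that rename applied (and assuming the `train_meta_factor` in the list refers to the `train_memory_factor` constructed above — an inference, since only that factor is otherwise unused), the registration would read:

```python
tracking_factors = [
    train_duration_factor, valid_duration_factor, train_memory_factor
]
```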

```python
# configuration of some model
import paddle
import meta
```

Contributor

Maybe a name better than 'meta'? Something like `continuous_evaluation`.

Contributor Author

Good idea.

- sending an email to `paddle-dev@baidu.com` including
  - the error type
  - the details of the tracked abnormal factors
- updating the error information and pushing it to git
Contributor

It could include more information, such as "current failed commit", "last success commit", and some history to show whether it's flaky.
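A rough sketch of assembling such an alert with those extra commit fields folded in (the SMTP host and the field layout are assumptions of this sketch, not part of the design):

```python
import smtplib
from email.mime.text import MIMEText


def send_alert(error_type, factor_details, failed_commit, last_success_commit):
    body = ('error type: %s\n'
            'current failed commit: %s\n'
            'last success commit: %s\n'
            'tracked abnormal factors:\n%s\n' %
            (error_type, failed_commit, last_success_commit, factor_details))
    msg = MIMEText(body)
    msg['Subject'] = '[ModelCI] evaluation failed at %s' % failed_commit
    msg['To'] = 'paddle-dev@baidu.com'
    # assumes a local mail relay; any SMTP endpoint would do
    smtplib.SMTP('localhost').sendmail(
        'modelci@localhost', ['paddle-dev@baidu.com'], msg.as_string())
```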

## Persistence of log

The log of each execution should be stored somewhere; the simplest way is to use Git to maintain a versionable history.
Contributor

Is it going to append or overwrite? I would prefer append, but I'm a bit worried about the git repo size.

```
{success_or_not} {short summary of the cause}

execution duration: {overall_duration}
paddle code version: {commitid}
```
Contributor

How about using a proto text format? In the future, we might want a dashboard to read the log and visualize it.
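For illustration, one execution record in proto text format might look like the following (the message schema here is invented for the sketch):

```
success: false
cause: "train cost diverged"
overall_duration_sec: 1830
paddle_commit: "a1b2c3d"
kpi {
  name: "train.cost"
  values: 1.2
  values: 0.99
  values: 0.4
}
```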

Superjomn (Contributor Author) commented Mar 13, 2018

Agreed, proto can store a more complex data structure, but a plain text file with a specific file name seems enough to store the values of most KPIs.

For example, to store the data for train cost, a file called ./train.cost.txt will be created with data as follows:

```
1.2
0.99
0.4
```

Each line is a float number, which is not hard to load and visualize.

I wonder whether any KPI has a data structure so complex that plain text cannot store it.

Another question: I am not sure how to persist the history data. A GitHub repo is good for versioning and plain text is human-friendly for analysis, but it cannot scale up.

A database scales well, but it works like a black box, and ModelCI would then need to support analysis by accessing data from that black box.
@panyx0718
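Under that per-KPI plain-text layout, loading and visualizing stays a few lines of Python; a minimal sketch (file path as described above, matplotlib as an arbitrary choice of plotting library):

```python
import matplotlib.pyplot as plt


def load_kpi(path):
    """Read one float per line, e.g. from ./train.cost.txt."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


costs = load_kpi('./train.cost.txt')
plt.plot(costs, marker='o')
plt.xlabel('record')
plt.ylabel('train cost')
plt.savefig('train_cost.png')
```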

panyx0718 (Contributor) commented Mar 14, 2018

I think we can first save it to local disk. In the future, we might apply for some storage resource in the cloud.

To make the testing logic stable, the testable model should assure that

- fix the random seed to make result reproducible
- just run 10 or 20 batches, and each batch takes no more than 30 seconds to limit the overall execution time
Contributor

I suspect 30 seconds is not enough; the loss could be spiky at the beginning. I would try 10-30 minutes and get the loss.

Contributor Author

I think we need eight models to cover the four application fields (image, audio, NLP, and CTR). Each model has at least two computation modes (CPU and GPU), so a full test is 16 runs, and its total time ranges from about 3 h (at 10 min/test) to 9 h on one free machine.

I wonder what the ideal duration for a full test is.
@panyx0718
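On the reproducibility point quoted above, pinning the seeds and capping the batch count might look like this (the fluid `random_seed` attributes reflect the Paddle API of that era; treat the exact fields, and the placeholder `train_reader`, as assumptions of the sketch):

```python
import random

import numpy as np
import paddle.fluid as fluid

SEED = 90  # any fixed value works; it just has to stay fixed
MAX_BATCHES = 20  # cap batches to bound overall execution time

random.seed(SEED)
np.random.seed(SEED)
fluid.default_startup_program().random_seed = SEED
fluid.default_main_program().random_seed = SEED


def train_reader():
    """Placeholder for the model's batch reader."""
    return iter([])


for batch_id, data in enumerate(train_reader()):
    if batch_id >= MAX_BATCHES:
        break
    # run one training step on `data` here
```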

luotao1 (Contributor) commented Mar 14, 2018

For model CI, should we enforce "Squash and merge" for PRs instead of "Merge pull request"? That would reduce the number of CI commits.

abhinavarora (Contributor) left a comment

The design looks great overall. There are some language changes that I have suggested. Please correct them before merging.

@@ -0,0 +1,149 @@
# Model CI

The users occasionally found a negligible performance or precision issue between different Paddle versions. Though we have unit tests for each class and Travis-CI to ensures the precision of each operator, there is no any logic to ensure the model (a composition of several operators) works as reliable as the operators.
Contributor

found -> find

Contributor Author

Thanks for pointing out the grammar issues.

A new repo has been opened for this project; I've fixed the grammar issues in PaddlePaddle/continuous_evaluation#1

@@ -0,0 +1,149 @@
# Model CI

The users occasionally found a negligible performance or precision issue between different Paddle versions. Though we have unit tests for each class and Travis-CI to ensures the precision of each operator, there is no any logic to ensure the model (a composition of several operators) works as reliable as the operators.
Contributor

Drop the "any" in "there is no any logic".


There are several conditions where an existing model will fail either in performance or precision:

1. the incomplete coverage test cases, such as lacking the test of precision.
Contributor

This should be "Incomplete coverage of test cases. For example, tests lacking precision check."


1. the incomplete coverage test cases, such as lacking the test of precision.
2. poor performance update, currently, we have no performance tracker for each operator.
3. API changes, developers are likely to forget to update usages in some other repositories such as paddle/models.
Contributor

This sentence should be "Developers generally forget to update API use in other repositories such as paddle/models."

```python
# write to file self.out_file
```

More factors can be integrated into the test framework, for example, a factor tracker which test the training duration can be added in the following way
Contributor

tracker which test the -> tracker which tests the
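As a concrete illustration of that sentence, a training-duration tracker in the same style might look like this (mirroring the `self.out_file` convention in the hunk above; the base-class hooks are the ones assumed in the earlier `Factor` sketch):

```python
import time


class TrainDurationFactor(Factor):  # Factor base as sketched earlier
    """Tracks the wall-clock duration of a training run, in seconds."""

    def start(self):
        self._begin = time.time()

    def stop(self):
        self._duration = time.time() - self._begin

    def record(self):
        # write to file self.out_file, one duration per run
        with open(self.out_file, 'a') as f:
            f.write('%f\n' % self._duration)
```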


## Keep updating the baseline
The ModelCI will keep comparing the KPIs of the latest code with the last successful evaluated version,
if the current version has the KPIs better than baseline, update the baseline, otherwise ring an alarm.
Contributor

has the KPIs better than -> has better KPIs than baseline
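A minimal sketch of that compare-and-promote loop for a smaller-is-better KPI, reusing the `load_kpi` helper sketched earlier (the file layout is an assumption):

```python
import shutil


def check_and_update_baseline(kpi_file, baseline_file):
    """Promote the new result on improvement; signal regression otherwise."""
    cur = load_kpi(kpi_file)[-1]        # latest value of this run
    base = load_kpi(baseline_file)[-1]  # last successfully evaluated version
    if cur <= base:
        shutil.copy(kpi_file, baseline_file)  # better: new baseline
        return True
    return False  # worse than baseline: ring the alarm
```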


The models should be placed in `./models` directory, each has a sub-directory, and a `train.xsh` script to define how to run this model. After triggering the `train.xsh`, all the data of `tracking_factors` should be created.

For example, a normal model might have following logic
Contributor

have the following logic
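The directory layout that paragraph implies might look like this (the model names are placeholders):

```
models/
├── resnet50/
│   ├── train.xsh          # defines how to run this model
│   ├── train.cost.txt     # written by the tracking factors after a run
│   └── train.duration.txt
└── seq2seq/
    └── train.xsh
```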

```
run_train_gpu
```

To make the testing logic stable, the testable model should assure that
Contributor

assure -> ensure
