Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the time cost to go into training #410

Merged
merged 26 commits into from Aug 10, 2020
Merged

Conversation

rickycao-qy
Copy link
Collaborator

@rickycao-qy rickycao-qy commented Aug 5, 2020

This PR changes as following:

  1. Data process and model training will do simultaneously
  2. implement next and nextBatch function in dataloader which will return data once the data fetched has been processed by data-process

Now the plugin of data-process, tfjs-simplecnn, tjfs-train has been changed using this new way to train. The test pipeline is test/pieplines/mnist-image-classification.json

Baseline:
the baseline test pipeline is in this pr branch's test/pipelines/mnist-image-classification
The CPU used is 2.2 GHz Quad-Core Intel Core i7

version time to go into training whole process
current master branch 1 min 30 s 7-8 hour
older version(commit a81e6a8) 1min 15 s 15 min 25s
this PR 11 s 13 min

packages/core/src/types/data/common.ts Outdated Show resolved Hide resolved
packages/core/src/types/data/common.ts Outdated Show resolved Hide resolved
packages/core/src/types/data/common.ts Outdated Show resolved Hide resolved
packages/core/src/types/data/common.ts Outdated Show resolved Hide resolved
packages/core/src/types/data/common.ts Outdated Show resolved Hide resolved
packages/core/src/types/data/common.ts Outdated Show resolved Hide resolved
packages/core/src/types/data/common.ts Outdated Show resolved Hide resolved
@@ -40,6 +42,11 @@ class DataLoader implements ImageDataLoader {
label: this.dataPairs[id].label
};
}

async setItem(id: number, sample: ImageSample) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async setItem(id: number, sample: ImageSample) {
setItem(id: number, sample: ImageSample): void {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is extended from super-class async function. Acutally return value is Promise. I give it as async function since it's likely that in some cases we need to do some async logic inside setItem

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the return value is void, we need to use : Promise<void>.

The second part is the return type. We make it clear which is the return type by using an arrow (=>) between the parameters and the return type. As mentioned before, this is a required part of the function type, so if the function doesn’t return a value, you would use void instead of leaving it off.

https://www.typescriptlang.org/docs/handbook/functions.html#writing-the-function-type

@@ -32,6 +33,10 @@ class DataLoader implements CsvDataLoader {
async getItem(id: number) {
return this.records[id];
}

async setItem(id: number, sample: CsvSample) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async setItem(id: number, sample: CsvSample) {
setItem(id: number, sample: CsvSample): void {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

run_pipeline.sh Outdated Show resolved Hide resolved
getItem: (id: number) => Promise<Sample>;
next?: () => Promise<Sample>;
nextBatch?: (batchSize: number) => Promise<Array<Sample>>;
export abstract class DataLoader{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export abstract class DataLoader{
export abstract class DataLoader {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a unit tests for this class is required :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lint problem has been fixed. I will add a unit test ASAP

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test has been added. Please have a check pls :)

return;
}

if (pkg.pipcook.category === 'datasetProcess') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we move the datasetProcess to default handler?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, have remove this

@yorkie
Copy link
Member

yorkie commented Aug 7, 2020

@rickycao-qy @WenheLI Awesome works and reviewed some small nits, BTW there might be a little bit conflicts with #405, @FeelyChau how can we merge them?

yorkie
yorkie previously approved these changes Aug 10, 2020
Copy link
Member

@yorkie yorkie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI is green

FeelyChau
FeelyChau previously approved these changes Aug 10, 2020
Copy link
Collaborator

@FeelyChau FeelyChau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI passed.

@FeelyChau FeelyChau merged commit 1e183f2 into master Aug 10, 2020
@FeelyChau FeelyChau deleted the add/data-process-pipeline branch August 10, 2020 09:06
@WenheLI WenheLI mentioned this pull request Sep 8, 2020
1 task
gindis pushed a commit to gindis/pipcook that referenced this pull request Sep 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants