
[AutoScheduler] Tutorial on auto-scheduling a network for GPU #6882

Merged — 17 commits merged into apache:main on Nov 13, 2020

Conversation

@merrymercy (Member) commented Nov 8, 2020

  • Add a tutorial on auto-scheduling a network for GPU.
  • Fix a bug in kill_child_processes.
  • Improve the interface, output messages, and early stopping of the task scheduler.
  • Improve `call_func_with_timeout`.

Todo

  • Upload pre-tuned logs into CI
  • Add a section on how to register new operators
  • Improve fallback messages and add a section to explain them

@merrymercy merrymercy force-pushed the pr-cuda-tutorial branch 2 times, most recently from 59496c9 to a0577cb Compare November 8, 2020 06:07
@merrymercy merrymercy force-pushed the pr-cuda-tutorial branch 5 times, most recently from 3430105 to 018c4f4 Compare November 8, 2020 17:54
@mbaret (Contributor) left a comment:

I'm interested in running some of these tutorials to get an idea of the capabilities of the auto-scheduler. However, I think this PR contains a lot of changes/clean-up that aren't related to the tutorial. Would it be possible to break this up?

@comaniac (Contributor) left a comment:

Will take another look once the items in the TODO list are addressed.
@mbaret you actually need everything in this PR to make the tutorial work as it describes. Since auto_scheduler is still an experimental feature in upstream and we don't need to backport other changes to v0.7.1, I think it's fine to keep it as it is.

src/auto_scheduler/feature.cc — review thread (resolved)
tutorials/auto_scheduler/tune_network_cuda.py — 8 outdated review threads (resolved)
Comment on lines 266 to 270
# After auto-tuning, we can compile the network with the best schedules we found.
# All measurement records are dumped into the log file during auto-tuning,
# so we can read the log file and load the best schedules.
Contributor commented:

Better to also mention what happens and what messages you will see if there are no valid schedules in the log file.
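For readers following along, the compile step this thread refers to looks roughly like the sketch below. It is a minimal illustration assuming the public auto_scheduler API of this period; log_file, mod, params, and target are assumed to be defined by the earlier tuning steps of the tutorial.

import tvm
from tvm import relay, auto_scheduler

# Read the tuning log and apply the best schedule found for each task while
# building the network. log_file, mod, params, and target come from earlier
# steps in the tutorial; this is only an illustrative sketch.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)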

@mbaret (Contributor) commented Nov 8, 2020

@mbaret you actually need everything in this PR to make the tutorial work as it describes. Since auto_scheduler is still an experimental feature in upstream and we don't need to backport other changes to v0.7.1, I think it's fine to keep as it is.

Personally, the issue I find is that when commits don't describe the changes they make, it can be hard to determine what patch in history led to a certain change in behaviour (I've been burnt by this a few times now). Generally, I assume 'add tutorial' type commits won't change behaviour unless it's explicitly flagged. If all these changes are required, perhaps a middle-ground would be to flag them explicitly in the commit message?

@comaniac (Contributor) commented Nov 9, 2020

Personally, the issue I find is that when commits don't describe the changes they make, it can be hard to determine what patch in history led to a certain change in behaviour (I've been burnt by this a few times now). Generally, I assume 'add tutorial' type commits won't change behaviour unless it's explicitly flagged. If all these changes are required, perhaps a middle-ground would be to flag them explicitly in the commit message?

I agree with you that separating PRs for different functions is important for long-term maintenance, and we should do that for every released feature. However, my point is that since we haven't fully released auto_scheduler and the total number of auto_scheduler PRs is small, it should be easy to identify the PR that changes a certain behavior (in fact, since only a few people are currently using the upstream auto_scheduler, I don't think this will be an issue for now). IMHO, it's fine to follow this principle after auto_scheduler is able to perform end-to-end tuning on all three platforms (x86, ARM, NVIDIA GPU). Meanwhile, my primary concern with splitting the changes into many small PRs is that it would slow down the auto_scheduler upstreaming process due to the high CI traffic.

def _print_table_info(self, next_task_idx):
    # table header
    _ffi_api.PrintTitle("Task Scheduler")
    print("| ID | Latency (ms) | Speed (GFLOPS) | Trials |")
Member commented:

Could we extract more information, like the operator name (Conv2D, softmax, ...) and its shape (1x3x224x224)? With only the ID, we have to look up the task's detailed information again.

merrymercy (Member, Author) replied:

It is not easy to extract this information by parsing the compute DAG.
One way to achieve this is to attach the information via the attrs argument of te.compute when defining ops in TOPI compute functions.
I'll leave this to future PRs.
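To make the suggestion concrete, here is a minimal sketch (not part of this PR) of attaching metadata through the attrs argument of te.compute. The attribute keys "op_name" and "shape" are hypothetical illustrations, not an established TOPI convention.

from tvm import te

# Hypothetical illustration: attach human-readable metadata to an op so that a
# task scheduler could later print it next to the task ID.
n = 1024
A = te.placeholder((n,), name="A")
B = te.compute(
    (n,),
    lambda i: A[i] + 1.0,
    name="B",
    attrs={"op_name": "add_one", "shape": str((n,))},
)
print(B.op.attrs["op_name"])  # the metadata is then available on the compute op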

Member replied:

Ok

@mbaret (Contributor) commented Nov 9, 2020

Meanwhile, my primary concern with splitting the changes into many small PRs is that it would slow down the auto_scheduler upstreaming process due to the high CI traffic.

Would you agree then that a more explicit commit message would be valuable?

@mbaret previously requested changes Nov 9, 2020
tutorials/auto_scheduler/tune_network_cuda.py — 4 outdated review threads (resolved)
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)

# Define the objective as the end-to-end execution time of the network
objective = lambda costs: sum(c * w for c, w in zip(costs, task_weights))
Contributor commented:

As someone not very familiar with the auto-scheduler, it seems a bit strange to me that this is exposed here. Could this not be a default objective?

merrymercy (Member, Author) replied:

Very good point. I changed the interface to hide this from users.
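For context, after this change the tutorial no longer defines the objective explicitly; the task scheduler uses the weighted sum of task latencies by default. A rough sketch of the resulting usage, assuming the auto_scheduler API of this era (mod, params, and target come from earlier steps; the log file name is arbitrary):

from tvm import auto_scheduler

tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
# The weighted-sum objective is now the default, so users only pass the tasks
# and their weights.
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=200,  # a small number just for illustration
    measure_callbacks=[auto_scheduler.RecordToFile("network_tuning.json")],
)
tuner.tune(tune_option)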

tutorials/auto_scheduler/tune_network_cuda.py — 4 outdated review threads (resolved)
python/tvm/auto_scheduler/task_scheduler.py — outdated review thread (resolved)
@comaniac (Contributor) commented Nov 9, 2020

Would you agree then that a more explicit commit message would be valuable?

I'm not sure how helpful it is, but I'll let @merrymercy decide.

@comaniac (Contributor) left a comment:
LGTM. Just some nits.

python/tvm/auto_scheduler/measure_record.py — outdated review thread (resolved)
python/tvm/auto_scheduler/task_scheduler.py — 2 review threads (resolved)
@merrymercy merrymercy force-pushed the pr-cuda-tutorial branch 2 times, most recently from 3f91bd1 to 912e990 Compare November 13, 2020 02:25
@merrymercy merrymercy dismissed mbaret’s stale review November 13, 2020 08:36

comments are addressed

@merrymercy merrymercy merged commit 050a836 into apache:main Nov 13, 2020
@merrymercy merrymercy deleted the pr-cuda-tutorial branch November 13, 2020 08:36
@merrymercy (Member, Author) commented Nov 13, 2020

Let me merge this PR first because it fixes multiple bugs.
I will send follow-up PRs to improve the fallback mechanism in relay.build when there is no valid schedule in the log file.

@merrymercy (Member, Author) commented Nov 13, 2020

@mbaret The tutorial is online now: https://tvm.apache.org/docs/tutorials/auto_scheduler/tune_network_cuda.html
You can try it out; any feedback is welcome!

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020
…#6882)

* add a tutorial on auto-scheduling a network for cuda

* fix typo

* fix training time printing

* fix lint

* fix

* upload logs

* fix

* use weighted sum as the default objective function

* update ci logs

* fix the bug in kill_child_processes

* fix test

* address comments

* add early stopping in task scheduler & fix a stuck issue in measurement

* fix lint

* trigger CI

* fix early stopping
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020 (…#6882, same commit list as above)
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020 (…#6882, same commit list as above)