[RFC] [AutoTVM] Implementing an auto-tuning library/cache #4150
Comments
Thank you for this proposal. This is helpful for managing local log files. One question about 'with config_library: relay.build(...)': what is the relationship between config_library and the autotvm dispatch context? It seems that this design replaces the dispatch context with config_library. How are different dispatch contexts managed in this case?
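For context, a minimal sketch of how the existing dispatch context is used today versus how the proposed sugar might look; mod, params and config_library are assumed to be defined elsewhere, and the 'with config_library' form is only the proposal, not an existing API.

```python
from tvm import autotvm, relay

# Existing usage: optimal configs are applied through an explicit dispatch
# context built from a log file (mod/params assumed to be defined already).
with autotvm.apply_history_best("tuning_records.log"):
    lib = relay.build(mod, target="llvm", params=params)

# Proposed sugar (hypothetical): the config library would construct an
# equivalent dispatch context from the optimal configs it stores.
with config_library:
    lib = relay.build(mod, target="llvm", params=params)
```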
Thanks for the RFC. I like the idea of the config library concept. Some concerns/questions:
In this way, we can implement the resume logic you proposed in the constructor of tuners.
I agree with you that when loading the history, in addition to checking whether a historical config matches the current task in terms of target, op name, shapes and attributes, we also need to check the device as you mentioned. We could retrieve the target device info via a system call and add it to every record when dumping to file/database. In my opinion, we also need to invalidate the history/records when TVM has been updated, because the same config may give different performance on the same device after an update. The simplest way is checking the timestamp. For example, we could let users configure an expiration time when creating a tuner:
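A minimal sketch of the expiration idea, assuming records carry a timestamp field; the `expire_after` argument and the record layout are hypothetical, not existing AutoTVM API.

```python
import time

EXPIRE_AFTER = 30 * 24 * 3600  # e.g. treat records older than 30 days as stale

def fresh_records(records, expire_after=EXPIRE_AFTER):
    """Keep only records whose timestamp falls within the expiration window."""
    now = time.time()
    return [r for r in records if now - r["timestamp"] <= expire_after]

# A tuner could then be seeded only with non-expired history, e.g.:
# tuner = SomeTuner(task, expire_after=EXPIRE_AFTER)   # hypothetical argument
```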
In this example, a user wants the tuner to load only the records that have not expired. Again, thanks for the proposal; any of my suggestions can be adjusted/discussed if they are too overwhelming or unnecessary.
Thanks @kevinthesun and @comaniac for the responses!
I'm not intending to replace the existing dispatch context, only to provide some syntactic sugar. We could just override the
Does it make sense to generalise here? As far as I can tell, TopHub doesn't store tuning history, just optimal configs, so there's no way to 'resume' a TopHub tuning session. In some way we have to determine whether the existing 'tuning effort' used to produce a particular config is sufficient, and the number of trials is the only obvious way I can think of to characterise this. I'd be happy to look at any alternative implementation idea though.
This would be a good start, but I think this also needs to be something a user can fully specify. For instance, we might be interested in driver versions, memory clock speeds or even physical parameters such as board cooling. Which system calls were you considering using to determine the platform? Perhaps have a default method that relies on these calls, with the ability to pass additional arbitrary info to
I agree with this, but maybe it can be included as part of the previous point on board configuration? In a general sense we need an idea of whether a particular config is 'compatible' with our current platform, and I think it's reasonable to include the TVM version as part of this.
Thanks for the responses; I think they are valuable. I have embedded my opinions with yours and will leave the dispatch context question for @kevinthesun. Also cc @tqchen and @icemelon9 for their input.
I agree with you that TopHub serves a different purpose if we consider the trial number in the resume logic, but they can still share the same implementation and history format in the way I suggested. My concern with using the trial number is that it limits the use case of this RFC to resuming interrupted tuning and nothing else, such as transferring the tuning process to others, or reusing the configs of a 2000-trial random search to launch a new grid search, etc. Alternatively, we could decouple the history from a specific tuning process. Specifically, we do not add any tuning-process-specific information to the config library; we just let the tuner determine whether it can reuse a result from the config library when it needs to measure that config. For example, if the tuning process was interrupted at the 50th trial, we have 50 configs in the library. When resuming the tuning, the tuner still starts from scratch, but it could save the time of measuring those 50 configs if it follows the same tuning process. One advantage is that this scenario applies to different tuners or even different models with the same task. One drawback of my alternative compared to yours is that if the tuning process is non-deterministic (e.g., random search) then we might spend time tuning different configs, but this can be worked around by either exposing an optional random seed argument in the tuner (such as
I have the same question actually. This part is relatively vague and probably needs others' input.
Your response reminded me that the current config history already includes version information, although it is always 0.1. Not sure if we can make use of it and save some effort.
Thanks for the helpful discussion. Some of the common themes that I see:
It would be great if we could dissect the discussion, e.g. reach a consensus on the metadata format we prefer, and then talk about possible config library behaviours and the possibility of implementing different variants of libraries.
@comaniac I think I understand where our different approaches are coming from. I was proposing that only the optimal configurations be permanently saved to the config library (as with TopHub), with a temporary log file of tuning configs maintained only during a tuning job. Storing all of the tuning history would rapidly result in huge files, which I think would be fine for a database but seems unwise for text files (in terms of search performance). From my experience using AutoTVM, interrupted tuning sessions most often occur while tuning a large multi-layer network. In this case, I mostly care about skipping the layers that have already been fully tuned; restarting the partially tuned layer from scratch is often not a significant time penalty in comparison. I see that this approach is not nearly as good in a workflow that involves iteratively tuning a network more and more, in which case you would save a significant amount of time by being able to resume using the tuning history. A compromise between the two options might be, as you said, making the tuners deterministic. That way, just by knowing the number of trials we can determine which configs can be skipped without needing to store the entire history. I don't think this can be made to work with the xgb tuner though (maybe just treat that as a special case?).
@mbarrett97 I see your point. If the problem is narrowed down to "skip some tasks in a model when resuming tuning that was accidentally interrupted", then your proposal is a lightweight working solution. Maybe we can file another RFC focusing on more general history reuse support. Coming back to your proposal, the current solution is using
@comaniac Having given this some thought, I think it's reasonable to support both approaches. I didn't want to include full logs because I was hoping to also be able to use the config library to distribute tuned configs; however, it should be fine to just 'export' a config library with only the optimal configs. In that case, I propose the following. Have each auto-tuning session create a new 'job'. This job will have an entry in a JSON file (the 'job index') containing at least the target string, the start/finish time of the job and a path to the history file generated. Optionally, we permit some arbitrary JSON to describe the platform in more detail. By default, we delete the history file when a job completes (but keep the job entry in the index); however, a flag can be passed to retain the history. Now if a task needs to be resumed, first a simple check can be done to see if the existing optimal config has already been tuned with sufficiently many trials (and with the right tuner/platform). If so, skip it; otherwise, search the job index to see if any history files qualify to restart the tuning. In that case, we can use your proposal.
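To make the proposal concrete, here is a hypothetical example of what a single entry in the 'job index' might look like; all field names and values are illustrative, not a fixed format.

```python
import json

# Hypothetical job index entry; field names are only examples.
job_entry = {
    "target": "opencl -device=mali -model=hikey960",
    "start_time": "2019-10-21T09:30:00Z",
    "finish_time": "2019-10-21T14:05:00Z",
    "history_file": "jobs/hikey960/job_0001.log",  # deleted on completion unless retained
    "keep_history": False,
    "platform": {                                  # arbitrary user-supplied description
        "gpu_driver": "mali-r19p0",
        "gpu_clock_mhz": 807,
    },
}
print(json.dumps(job_entry, indent=2))
```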
For local log file management, how about we store the best K schedules for each workload? Users can choose how many schedules they would like to keep.
Got your point, although I think you can always pick the best config before distribution, as in the current AutoTVM use case. Currently AutoTVM logs all configs to a JSON file, and if a user only wants to keep the best one, she uses
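As a point of reference, one existing way to keep only the best record per workload from a full log is AutoTVM's pick_best helper; a rough usage sketch, with placeholder file names:

```python
from tvm import autotvm

# Extract the best record per workload from a full tuning log before
# distributing it (input/output paths are placeholders).
autotvm.record.pick_best("full_tuning.log", "best_configs.log")
```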
If I understand correctly, you are going to add tuning process metadata to the JSON file in addition to the configs, like the example code snippet you proposed at the very beginning of this RFC. Since you propose to use the config library as the "database" to log all configs (the argument of ...). Another suggestion is naming.
Yeah, I think this part is relatively clear.
@mbarrett97 I wonder why not just use the transfer learning support in AutoTVM. After using transfer learning, AutoTVM will skip the tasks that have been tried before. See the example at
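For reference, the transfer-learning pattern from the existing tuning tutorials looks roughly like this; `task` and `tmp_log_file` are assumed to come from the surrounding tuning script.

```python
import os
from tvm import autotvm

# Seed the tuner with previously collected results so that configs covered by
# the existing log are not measured again (pattern from the tuning tutorials).
tuner = autotvm.tuner.XGBTuner(task, loss_type="rank")
if os.path.isfile(tmp_log_file):
    tuner.load_history(autotvm.record.load_from_file(tmp_log_file))
```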
@icemelon9 This suggestion is more about infrastructure, so that we're not required to keep track of individual log files and how they were produced. We need this to decide whether or not we can skip a task based on existing results. @comaniac @kevinthesun I've updated the PR to include more concretely the ideas being discussed. I think an auto-tuning 'job' is distinct from a task, as I am using it to refer to a series of tasks tuned sequentially (eg. tuning a network would be a 'job'). A JSON file containing all of the jobs is produced, which contains information such as the start/finish time of the job, target/platform parameters and, importantly, the optimal configs for each task in the job. In principle this would allow you to 'revert' an auto-tuning job from the config library if you discovered you'd done something invalid during the job (I've done this a few times...). Whether the entire history of a job is kept can be controlled by a flag. I'm hacking one of the tutorial scripts to use the config library mechanism instead,
Some comments after reading the example and the current PR.
@comaniac I've done some refactoring to disentangle 'TuningJob' from the ConfigLibrary. The tuning loop now looks like this:
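A hypothetical reconstruction of that loop, assuming the TuningJob and ConfigLibrary names discussed in this thread; constructor arguments and method names are illustrative and may differ from the PR.

```python
from tvm import autotvm

# ConfigLibrary and TuningJob are assumed to come from the PR's autotvm
# additions; the arguments shown here are a sketch, not the exact interface.
library = ConfigLibrary("configs/")
job = TuningJob(library=library, log_file="job.log", keep_history=False)

with job:                           # enters the global tuning scope
    for task in tasks:              # `tasks` extracted from the model beforehand
        tuner = autotvm.tuner.XGBTuner(task)
        tuner.tune(
            n_trial=1000,
            measure_option=measure_option,  # assumed to be defined elsewhere
        )
```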
Using 'with job' puts the job into the global tuning scope. The job will then automatically register its own callback with the tuners, along with a new tuner method. If you don't specify a ConfigLibrary with a job, it will just log all the results to the specified log file. Config files are indexed within the library by target, so to use configs from the library you can simply look them up by target. I've updated my PR (#4151) accordingly. Note that the PR does not include every feature discussed here, but is intended as initial infrastructure on top of which more advanced features can be developed.
I went through the new proposal and the PR. This looks much better to me from the perspective of functionality. One concern in my mind is long-term maintenance. It seems like we will have more and more new features dealing with a set of tasks. As @tqchen mentioned in another RFC, it might be better to make a task pass manager to manage such processes, but we should be able to integrate this one with the task pass manager later on once we have it ready. @eqy @kevinthesun do you guys have any other concerns?
Auto-tuning currently relies on manually keeping track of various log files. This can quickly become quite unwieldy when tuning for many different devices, trying to do partial tuning or restarting a tuning session.
Proposals
Create an offline library of auto-tune configurations into which you can feed auto-tuning logs and have the optimal configurations saved. The library should store not just the configuration, but also the tuning conditions (eg. tuner + no. of trials). This way, it is possible to check whether or not 'sufficient' tuning has already been done on a particular task, and if so, that task can be skipped. I propose an interface to the library which would make a typical auto-tuning loop look something like the following:
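A hypothetical sketch of that loop under stated assumptions: ConfigLibrary, is_tuned and log_results are illustrative names, not a fixed interface.

```python
from tvm import autotvm

# Hypothetical sketch: the library decides whether a task already has a
# 'sufficiently tuned' config and skips it if so.
library = ConfigLibrary("configs/")

for task in tasks:                              # `tasks` extracted beforehand
    if library.is_tuned(task, n_trial=1000):    # enough trials already recorded?
        continue
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=1000,
        measure_option=measure_option,          # assumed to be defined elsewhere
        callbacks=[library.log_results(task)],  # stream results into the library
    )
```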
You would then use the library with something as simple as:
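For example, something along these lines; the load-by-target method name is hypothetical, and mod, params and target are assumed to be defined elsewhere.

```python
from tvm import relay

# Hypothetical usage at compile time: the library supplies the optimal configs
# for the given target, much like apply_history_best does for a log file.
with config_library.load(target):               # method name is an assumption
    lib = relay.build(mod, target=target, params=params)
```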
Additional Thoughts
In order to reliably interact with existing records in the library, you need to be able to determine the exact platform/device that the tuning was performed on. I currently use the '-model' parameter to store this information (eg. -model=hikey960), but it would be better to be able to store some arbitrary JSON object here so that additional platform configuration options can be specified (eg. clock speeds, driver versions, etc.).
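For illustration, the difference between encoding the device in the target string and attaching a richer, arbitrary description; the field names below are only examples.

```python
# Today: the device is identified only via the target string's -model attribute.
target = "opencl -device=mali -model=hikey960"

# Proposed: an arbitrary JSON-like description stored alongside the records.
platform_info = {
    "model": "hikey960",
    "gpu_driver": "mali-r19p0",
    "gpu_clock_mhz": 807,
    "cooling": "passive",
}
```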
The current logging system is also heavily reliant on writing essentially flat text files. A config library would probably be better suited to a NoSQL/JSON database; however, for now I've stuck to keeping it flat.
My WIP PR is here #4151.
Comments/suggestions are welcome!
Implementation