Convert run_with_beaker.py to use Blueprints. #1199
Conversation
A few breaking changes are required for this change. In addition, instead of running the experiment, an experiment spec is output. This behavior can be changed if it's not desired, but I find it to be an improvement because it gives the user the opportunity to add additional flags that this script may not support.
Can we not change the behavior of this script? I use it frequently, and adding another step to running an experiment is annoying. I don't really care about blueprints vs. images, so that part of the change is fine with me, but not cluttering my file system with unnecessary files, or making me use two commands where I used to only need one. If you really want to output a file, could you make that a separate script, or a non-default option to this one?
@matt-gardner I can mostly leave the script unchanged, although I'm going to remove some commands since they are deprecated and reportedly don't work. One nice thing about the intermediate file is it gives you full control to be able to tweak your experiment. Without using the configuration, we need to keep adding arguments when we want additional functionality. Beaker is deprecating the `run` command, so some things might be challenging to tune via `run`. But if that's undesirable to the people who actually use this script, I can change it back.
Sure, if there are arguments that are broken, by all means, great, let's fix them. Adding arguments for additional functionality is rare and a simple PR, and making that PR is much more desirable than adding a step for every invocation of this script. I submitted PRs just to shave a few seconds off of each invocation, if you remember ;).
To clarify: Beaker is not deprecating the `run` command.
```python
    print(f"Using the specified blueprint: {blueprint}")
else:
    print(f"Building the Docker image ({image})...")
    subprocess.run(f'docker build -t {image} .', shell=True, check=True)
```
This pattern looks ideal. 👍
```python
if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument('param_file', type=str, help='The model configuration file.')
    parser.add_argument('--name', type=str, help='A name for the experiment.')
    parser.add_argument('output_path', type=str, help='The destination to write the experiment spec.')
```
@matt-gardner makes a good point about convenience here. I'd recommend a slightly different pattern here. Drop `output_path` and instead echo the spec to STDOUT, so it can be piped into Beaker directly, like so:

```shell
./run_with_beaker.py <args...> | beaker experiment create --file=-
```

You can also switch on whether the spec is automatically run. One possible mechanism would be to offer a `--dry-run` flag. In this flag's absence, invoke Beaker directly. If `--dry-run` is present, then just echo the spec so the user can redirect to file and/or Beaker on their own.

Another option is to print the command you would invoke, and invoke this script with `exec $(run_with_beaker.py)`. The output might be in the following format:

```shell
cat << EOF | beaker experiment create --file=-
# Spec Goes Here...
EOF
```

Either of these options also implies retaining the `--name` flag.
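A minimal sketch of the suggested STDOUT/`--dry-run` pattern (the `launch` helper and its arguments are illustrative, not the script's actual API):

```python
import json
import subprocess


def launch(spec: dict, dry_run: bool) -> None:
    """Echo the spec on a dry run; otherwise pipe it straight into Beaker."""
    payload = json.dumps(spec, indent=2)
    if dry_run:
        # Let the user redirect to a file or pipe it themselves:
        #   ./run_with_beaker.py ... | beaker experiment create --file=-
        print(payload)
    else:
        subprocess.run(
            ["beaker", "experiment", "create", "--file=-"],
            input=payload.encode(),
            check=True,
        )
```

Either way, the spec stays on one stream and no intermediate file is left behind unless the user asks for one.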
Funny, I just read this after I added a `--dry-run` flag ;-)
I actually personally think it's a good thing to have an extra step, but I'm happy to do what works best for other users (commit coming).
```python
        },
        "env": env
    }
    config_task = {"spec": config_spec}
```
You may consider adding `"name": "<some value>"` here. While optional, it would slightly simplify how this displays in the UI by hiding the task ID.
Done.
This seems reasonable to me, and the Beaker usage and experiment spec look correct.
The script now maintains the original IO, but it has an optional argument for a blueprint (which skips creating a Docker image), an optional argument to output the spec to a specific location (rather than a temporary location), and a `--dry-run` flag. Blueprints don't add much new functionality at this point, but we're ready for when they do, and this cleans up and speeds up our script a bit by cutting ECR out of the loop. I understand that ECR is supported but not required.
```diff
 from allennlp.common.params import Params

-def main(param_file: str, extra_beaker_commands: List[str]):
-    ecr_repository = "896129387501.dkr.ecr.us-west-2.amazonaws.com"
+def main(param_file: str, args):
```
Can you give a type annotation for `args`?
Done.
```python
subprocess.run(command, check=True)
dataset_mounts = []
for source in (args.source if args.source else []):
    datasetId, containerPath = source.split(":") + "\n"
```
I wondered what this line did, so I tried it in an interpreter, and it crashed.
Done.
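For illustration, the crash comes from concatenating a list with a string; a minimal reproduction and the obvious fix (the sample values are made up):

```python
source = "dataset123:/data"

# Buggy version: list + str raises TypeError
# ("can only concatenate list (not \"str\") to list").
try:
    datasetId, containerPath = source.split(":") + "\n"
except TypeError:
    pass  # this is the crash seen in the interpreter

# Fix: drop the stray concatenation and unpack the two fields.
# (If container paths could themselves contain ":", split(":", 1) is safer.)
datasetId, containerPath = source.split(":")
```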
```python
print(' '.join(command))
subprocess.run(command, check=True)
dataset_mounts = []
for source in (args.source if args.source else []):
```
Just add `default=[]` to the `--source` argument below, so you don't need this `if/else` block.
Done.
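A sketch of the `default=[]` suggestion in argparse terms (the flag name mirrors the script, but `action="append"` is an assumption about how repeated `--source` flags are collected):

```python
import argparse

parser = argparse.ArgumentParser()
# action="append" collects repeated --source flags into a list;
# default=[] means downstream loops can iterate without an if/else guard.
parser.add_argument("--source", action="append", default=[])

args = parser.parse_args([])
assert args.source == []  # no flag given: still an iterable list

args = parser.parse_args(["--source", "ds1:/a", "--source", "ds2:/b"])
```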
"datasetId": datasetId, | ||
"containerPath": containerPath | ||
}) | ||
for source in [f"{config_dataset_id}:/config.json"]: |
Why not do `sources = args.source + [f"{config_dataset_id}:/config.json"]` (after getting rid of the `if/else` block above); then you only have one loop.
This works now that I have `default`.
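Combined with `default=[]`, the two loops could collapse into one; a sketch with placeholder values standing in for `args.source` and `config_dataset_id`:

```python
# Placeholders standing in for values the script computes:
config_dataset_id = "config-dataset-id"   # dataset holding the uploaded config
args_source = ["ds1:/data"]               # args.source, assuming default=[]

dataset_mounts = []
# Single pass over the user-supplied sources plus the config dataset.
for source in args_source + [f"{config_dataset_id}:/config.json"]:
    datasetId, containerPath = source.split(":")
    dataset_mounts.append({
        "datasetId": datasetId,
        "containerPath": containerPath,
    })
```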
"containerPath": containerPath | ||
}) | ||
|
||
for var in (args.env if args.env else []): |
Same here - use `default=[]` below instead of this `if/else`.
Done.
```python
}
config_task = {"spec": config_spec}
if args.name:
    config_task["name"] = args.name
```
You're putting the name in two places, where I think it just needs to be in one place. I don't think we want it here, we just want it on the experiment. Unless the beaker UI does something magical if the task name matches the experiment name? @ckarenz?
@ckarenz recommended putting it here for the UI.
Yes, and my guess is that he missed the fact that it was already given below.
@ckarenz from our earlier conversations I understood that `name` needed to be specified on the command-line, but from some experimentation it seems that I can specify it on the command line or in the configuration.
Interestingly, if I specify it on the command-line then it isn't part of the spec in the Beaker UI, but if I specify it in the configuration it is.
What is the best practice?
Ok, I looked at your beaker experiments, and the way you have it seems fine. Beaker has "experiments" and "tasks", where an "experiment" may contain several different "tasks". We only ever use a single "task" for our "experiments", so this distinction is a bit meaningless for us. Giving both the task and the experiment the same name seems fine.
If you look at your two most recent experiments, this one has both the experiment and the task name set to "food", while this one just has the experiment name set to "foo", with an empty task name, so the task shows up as an id. I don't have a strong preference between the two, either way is fine.
I think two different names have been conflated here. First we should clarify that there are two different concepts in play. A task is a minimal unit of work, or single process which is run once. An experiment is a graph of one or more tasks, tying them together with their dependencies. This script creates a single experiment containing a single task.
The name supplied to `experiment create` on the command line is the experiment name. It must be unique and therefore doesn't make sense to put in the spec; you can't use the same name twice without deleting your prior experiment or renaming it.
The name supplied in the file is the name of the task within that experiment. It must be unique within the experiment, but is not globally unique. It's used to link dependencies within the spec. When paired with an experiment name/ID it also serves as a human-readable substitute for the task's unique ID.
For your purposes the task name should probably remain constant, and be short.
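In script terms, the split might look like this sketch (the constant task name `"train"` and the experiment name are illustrative):

```python
# The task name lives in the spec: it only needs to be unique within the
# experiment, so a short constant keeps the Beaker UI readable.
config_task = {"spec": {}, "name": "train"}
experiment_spec = {"tasks": [config_task]}

# The experiment name must be globally unique, so it belongs on the
# command line rather than in the spec:
experiment_command = ["beaker", "experiment", "create", "--name", "my-run-2018-06-01"]
```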
```python
    experiment_command.append(args.name)

if args.dry_run:
    print(f"This is a dry run (--dry-run). Launch your job with the following command")
```
s/command/command:/
Done.
```python
}

output_path = args.spec_output_path if args.spec_output_path else tempfile.mkstemp(".yaml",
                                                                                   "beaker-config")[1]
```
This should probably be `beaker-config-`, to separate the random numbers from "config".
Done.
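A quick check of how `mkstemp` composes the name (the prefix/suffix values mirror the review suggestion):

```python
import os
import tempfile

# mkstemp(suffix, prefix): the random characters land between the prefix
# and the suffix, so a trailing hyphen keeps them visually separate from
# "config" (e.g. beaker-config-abc123.yaml, not beaker-configabc123.yaml).
fd, path = tempfile.mkstemp(".yaml", "beaker-config-")
os.close(fd)
name = os.path.basename(path)
os.remove(path)  # the file was only created to demonstrate the naming
```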
```python
    extra_beaker_commands.append(f'--desc={args.desc}')
if args.name:
    # Remove spaces from the name, because Beaker doesn't allow them.
    extra_beaker_commands.append(f'--name={args.name.replace(" ", "-")}')
```
Please don't change this behaviour; it's convenient.
Apologies--the change was a mistake (I deleted the name argument and then re-added it).