
Convert run_with_beaker.py to use Blueprints. #1199

Merged
merged 10 commits into allenai:master from the blueprints branch on May 31, 2018

Conversation

@schmmd (Member) commented May 11, 2018

A few breaking changes are required for this change. In addition, instead of running the experiment, this script now outputs an experiment spec. This behavior can be changed if it's not desired, but I find it to be an improvement because it gives the user the opportunity to add additional flags that this script may not support.

@schmmd requested a review from ckarenz on May 11, 2018 at 22:17
@matt-gardner (Contributor)

Can we not change the behavior of this script? I use it frequently, and adding another step to running an experiment is annoying. I don't really care about blueprints vs. images, so that part of the change is fine with me, but not cluttering my file system with unnecessary files, or making me use two commands where I used to only need one. If you really want to output a file, could you make that a separate script, or a non-default option to this one?

@schmmd (Member Author) commented May 14, 2018

@matt-gardner I can mostly leave the script unchanged--although I'm going to remove some commands since they are deprecated and reportedly don't work.

One nice thing about the intermediate file is it gives you full control to be able to tweak your experiment. Without using the configuration, we need to keep adding arguments when we want additional functionality. Beaker is deprecating the run command, so some things might be challenging to tune via run.

But if that's undesirable to the people who actually use this script, I can cat the configuration directly to beaker.

@matt-gardner (Contributor)

Sure, if there are arguments that are broken, by all means, great, let's fix them. Adding arguments for additional functionality is rare and a simple PR, and making that PR is much more desirable than adding a step for every invocation of this script. I submitted PRs just to shave a few seconds off of each invocation, if you remember ;).

@ckarenz commented May 14, 2018

To clarify: Beaker is not deprecating the experiment run command. We are transitioning the --file argument over to a new experiment create command. Create is the full-featured command which allows for arbitrarily complex (or simple) spec files. Run offers a simplified subset of Create by replacing the spec file with some short-hand arguments, and is purely for convenience.

print(f"Using the specified blueprint: {blueprint}")
else:
print(f"Building the Docker image ({image})...")
subprocess.run(f'docker build -t {image} .', shell=True, check=True)

This pattern looks ideal. 👍


if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument('param_file', type=str, help='The model configuration file.')
    parser.add_argument('--name', type=str, help='A name for the experiment.')
    parser.add_argument('output_path', type=str, help='The destination to write the experiment spec.')

@matt-gardner makes a good point about convenience here. I'd recommend a slightly different pattern: drop output_path and instead echo the spec to STDOUT, so it can be piped into Beaker directly, like so:

./run_with_beaker.py <args...> | beaker experiment create --file=-

You can also switch on whether the spec is automatically run. One possible mechanism would be to offer a --dry-run flag. In this flag's absence, invoke Beaker directly. If --dry-run is present, then just echo the spec so the user can redirect to file and/or Beaker on their own.

Another option is to print the command you would invoke, and invoke this script with exec $(run_with_beaker.py). The output might be in the following format:

cat << EOF | beaker experiment create --file=-
# Spec Goes Here...
EOF

Either of these options also implies retaining the --name flag.
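
Concretely, something like this would cover both behaviors (a rough sketch; the function and variable names are illustrative, not taken from the PR):

import subprocess
import sys

def submit_or_echo(spec_yaml: str, dry_run: bool) -> None:
    if dry_run:
        # Just echo the spec so the user can redirect it or pipe it themselves,
        # e.g. ./run_with_beaker.py <args...> | beaker experiment create --file=-
        sys.stdout.write(spec_yaml)
    else:
        # Otherwise hand the spec straight to Beaker on stdin.
        subprocess.run(["beaker", "experiment", "create", "--file=-"],
                       input=spec_yaml, encoding="utf-8", check=True)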

Member Author

Funny, I just read this after I added a --dry-run flag ;-)

I actually personally think it's a good thing to have an extra step, but I'm happy to do what works best for other users (commit coming).

    },
    "env": env
}
config_task = {"spec": config_spec}

You may consider adding "name": "<some value>" here. While optional, it would slightly simplify how this displays in the UI by hiding the task ID.

Member Author

Done.

@ckarenz left a comment

This seems reasonable to me, and the Beaker usage and experiment spec look correct.

@schmmd (Member Author) commented May 14, 2018

The script now maintains its original input/output behavior, but it has an optional argument for a blueprint (which skips creating a Docker image), an optional argument to output the spec to a specific location (rather than a temporary location), and a --dry-run flag.

Blueprints don't add much new functionality at this point, but we're ready for when they do, and this cleans up and speeds up our script a bit by cutting ECR out of the loop. I understand that ECR is supported but not required.
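
For reference, the resulting option surface looks roughly like this (a sketch based on the argument names visible in the excerpts below, not the exact final code):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('param_file', type=str, help='The model configuration file.')
parser.add_argument('--name', type=str, help='A name for the experiment.')
parser.add_argument('--blueprint', type=str,
                    help='An existing Beaker blueprint to use, skipping the Docker image build.')
parser.add_argument('--spec_output_path', type=str,
                    help='Write the experiment spec here instead of to a temporary file.')
parser.add_argument('--dry-run', action='store_true',
                    help='Print the Beaker command instead of running it.')
args = parser.parse_args()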

from allennlp.common.params import Params

def main(param_file: str, extra_beaker_commands: List[str]):
    ecr_repository = "896129387501.dkr.ecr.us-west-2.amazonaws.com"
def main(param_file: str, args):
Contributor

Can you give a type annotation for args?

Member Author

Done.

subprocess.run(command, check=True)
dataset_mounts = []
for source in (args.source if args.source else []):
    datasetId, containerPath = source.split(":") + "\n"
Contributor

I wondered what this line did, so I tried it in an interpreter, and it crashed.
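
(For the record, the crash is a TypeError: str.split returns a list, and a list can't be concatenated with a string. A minimal reproduction, with the presumably intended form below it:)

source = "someDatasetId:/some/container/path"

# The reviewed line crashes:
# source.split(":") + "\n"   # TypeError: can only concatenate list (not "str") to list

# Dropping the stray + "\n" gives the intended unpacking:
datasetId, containerPath = source.split(":")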

Member Author

Done.

print(' '.join(command))
subprocess.run(command, check=True)
dataset_mounts = []
for source in (args.source if args.source else []):
Contributor

Just add default=[] to the --source argument below, so you don't need this if/else block.
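
That is, roughly (assuming --source is declared with action='append'; the declaration itself isn't shown in this excerpt):

parser.add_argument('--source', action='append', default=[],
                    help='Bind a dataset into the container as <datasetId>:<containerPath>.')
# With default=[], args.source is always a list, so the loop can iterate it directly.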

Member Author

Done.

"datasetId": datasetId,
"containerPath": containerPath
})
for source in [f"{config_dataset_id}:/config.json"]:
Contributor

Why not do sources = args.source + [f"{config_dataset_id}:/config.json"] (after getting rid of the if/else block above); then you only have one loop.
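
i.e. roughly (continuing the default=[] assumption from above):

sources = args.source + [f"{config_dataset_id}:/config.json"]
dataset_mounts = []
for source in sources:
    datasetId, containerPath = source.split(":")
    dataset_mounts.append({
        "datasetId": datasetId,
        "containerPath": containerPath
    })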

Member Author

This works now that I have the default=[].

"containerPath": containerPath
})

for var in (args.env if args.env else []):
Contributor

Same here - use default=[] below instead of this if/else.

Member Author

Done.

}
config_task = {"spec": config_spec}
if args.name:
    config_task["name"] = args.name
Contributor

You're putting the name in two places, where I think it just needs to be in one place. I don't think we want it here, we just want it on the experiment. Unless the beaker UI does something magical if the task name matches the experiment name? @ckarenz?

Member Author

@ckarenz recommended putting it here for the UI.

Contributor

Yes, and my guess is that he missed the fact that it was already given below.

Member Author

@ckarenz from our earlier conversations I understood that name needed to be specified on the command-line, but from some experimentation it seems that I can specify it on the command line or in the configuration.

Interestingly, if I specify it on the command-line then it isn't part of the spec in the Beaker UI, but if I specify it in the configuration it is.

What is the best practice?

Contributor

Ok, I looked at your beaker experiments, and the way you have it seems fine. Beaker has "experiments" and "tasks", where an "experiment" may contain several different "tasks". We only ever use a single "task" for our "experiments", so this distinction is a bit meaningless for us. Giving both the task and the experiment the same name seems fine.

If you look at your two most recent experiments, this one has both the experiment and the task name set to "food", while this one just has the experiment name set to "foo", with an empty task name, so the task shows up as an id. I don't have a strong preference between the two, either way is fine.


I think two different names have been conflated here. First we should clarify that there are two different concepts in play. A task is a minimal unit of work, or single process which is run once. An experiment is a graph of one or more tasks, tying them together with their dependencies. This script creates a single experiment containing a single task.

The name supplied to experiment create on the command line is the experiment name. It must be unique and therefore doesn't make sense to put in the spec; you can't use the same name twice without deleting your prior experiment or renaming it.

The name supplied in the file is the name of the task within that experiment. It must be unique within the experiment, but is not globally unique. It's used to link dependencies within the spec. When paired with an experiment name/ID it also serves as a human-readable substitute for the task's unique ID.

For your purposes the task name should probably remain constant, and be short.
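
To make the distinction concrete, a rough sketch of how the two names relate in this script (the structure is illustrative, not the exact Beaker schema):

# The task name lives inside the spec file; it only has to be unique within the
# experiment, so a short constant value is fine:
config_spec = {}  # image, arguments, mounts, etc. built earlier in the script
config_task = {"spec": config_spec, "name": "train"}

# The experiment name is supplied on the `beaker experiment create` command line
# and must be globally unique, so it comes from the user (e.g. via --name):
experiment_name = "a-globally-unique-name"
experiment_command = ["beaker", "experiment", "create",
                      f"--name={experiment_name}", "--file=spec.yaml"]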

    experiment_command.append(args.name)

if args.dry_run:
    print(f"This is a dry run (--dry-run). Launch your job with the following command")
Contributor

s/command/command:/

Member Author

Done.

}

output_path = args.spec_output_path if args.spec_output_path else tempfile.mkstemp(
    ".yaml", "beaker-config")[1]
Contributor

This should probably be beaker-config-, to separate the random numbers from "config".
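
(For reference, tempfile.mkstemp(suffix, prefix) appends the random characters directly to the prefix, so the trailing dash keeps them visually separate:)

import tempfile

# Produces e.g. /tmp/beaker-config-8f3k2x1q.yaml rather than /tmp/beaker-config8f3k2x1q.yaml
fd, output_path = tempfile.mkstemp(".yaml", "beaker-config-")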

Member Author

Done.

    extra_beaker_commands.append(f'--desc={args.desc}')
if args.name:
    # Remove spaces from the name, because Beaker doesn't allow them.
    extra_beaker_commands.append(f'--name={args.name.replace(" ", "-")}')
Contributor

Please don't change this behaviour; it is convenient.

Member Author

Apologies--the change was a mistake (I deleted the name argument and then re-added it).

@schmmd merged commit 8eb358b into allenai:master on May 31, 2018
@schmmd deleted the blueprints branch on May 31, 2018 at 22:26
gabrielStanovsky pushed a commit to gabrielStanovsky/allennlp that referenced this pull request Sep 7, 2018