-
Notifications
You must be signed in to change notification settings - Fork 0
Add CLI command for batch creation that integrates data syncing #203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Why these changes are being introduced: * A CLI command is required to support the new batch creation step for DSC. These changes also connect the batch creation step with the syncing process: * Workflow.create_batch can skip data retrieval (e.g. API requests) when '--sync-data' is provided in the command args * The 'create' CLI command can invoke the 'sync' command when '--sync-data' is provided in the command args How this addresses that need: * Add 'create' CLI command * Add 'synced' boolean keyword argument to batch creation methods * Add unit tests Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1456
| def prepare_batch( | ||
| self, | ||
| *, | ||
| synced: bool = False, # noqa: ARG002 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For workflows like Wiley, the synced boolean can be used to skip data retrieval steps if the data has been synced (e.g., in Prod, the batch "folder" and its contents will be pulled from Stage).
Also, note that while SimpleCSV _does not use the synced argument, the name is not leading with an underscore. This is because mypy will raise an error about the mismatched parameter names between the @abstractmethod and the implemented, and it's suspected that this is because the parameter is a keyword argument. 🤔 Ruff error [ARG002](https://docs.astral.sh/ruff/rules/unused-method-argument/) will need to be skipped as a result.
TLDR: Either mypy or Ruff would have an issue. Decided to ignore Ruff's error.
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments but looking good!
dsc/cli.py
Outdated
| @click.option( | ||
| "-s", | ||
| "--source", | ||
| help=( | ||
| "Source directory formatted as a local filesystem path or " | ||
| "an S3 URI in s3://bucket/prefix form" | ||
| ), | ||
| ) | ||
| @click.option( | ||
| "-d", | ||
| "--destination", | ||
| help=( | ||
| "Destination directory formatted as a local filesystem path or " | ||
| "an S3 URI in s3://bucket/prefix form" | ||
| ), | ||
| ) | ||
| @click.option( | ||
| "--dry-run", | ||
| is_flag=True, | ||
| help=( | ||
| "Display the operations that would be performed using the " | ||
| "sync command without actually running them" | ||
| ), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps these should be prepended with sync_ for greater clarity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with @ehanson8 here (or sync- if we're going with dashes in CLI args).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sync-!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the signature of the create CLI command as follows:
@main.command()
@click.pass_context
@click.option("--sync-data/--no-sync-data", default=False)
@click.option(
"--sync-dry-run",
is_flag=True,
help=(
"Display the operations that would be performed using the "
"sync command without actually running them"
),
)
@click.option(
"-s",
"--sync-source",
help=(
"Source directory formatted as a local filesystem path or "
"an S3 URI in s3://bucket/prefix form"
),
)
@click.option(
"-d",
"--sync-destination",
help=(
"Destination directory formatted as a local filesystem path or "
"an S3 URI in s3://bucket/prefix form"
),
)
@click.option(
"-e",
"--email-recipients",
help="The recipients of the batch creation results email as a comma-delimited string",
default=None,
)
def create(
ctx: click.Context,
*,
sync_data: bool = False,
sync_dry_run: bool = False,
sync_source: str | None = None,
sync_destination: str | None = None,
email_recipients: str | None = None,
) -> None:Note: By moving the asterisk (*) to after ctx: click.Context, the args can be arranged in a more logical order (but it does mean if create is ever called outside of the pipenv command or not as a CLI, user must provide all params as keyword arguments (doesn't seem like a bad call given the number of params anyway).
ghukill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A great start. I left a couple of higher level questions, and can do a secondary pass after discussions there.
dsc/cli.py
Outdated
| if sync_data: | ||
| ctx.invoke(sync, source, destination, dry_run=dry_run) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we tried this locally? I'm a little concerned that sys.exit(return_code) line in the sync CLI command may terminate the parent python process and therefore not make it past this step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally and related, what happens if the sync subcommand has an error? how do we know and how is it handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Tagging @ehanson8 for awareness)
This was a great question and led to the following changes:
- Add a
try-exceptblock increateCLI command - Switching use of
sys.exittoctx.exit- As of
click >= 8.2,ctx.exitraises a custom exception,click.exceptions.Exit, whereas before it would raise aSystemExiterror. This was an important learning as it took me a second to figure out the right exception to handle with the addedtry-exceptblock. - Not 100% sure of the differences but it is the recommended way to cleanly exit out of Click commands.
- As of
- Added unit test demonstrating that the
createCLI command will cleanly exit when invoking thesynccommand raises an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loving it! Thanks for the deep dive on this and the explanation.
Pull Request Test Coverage Report for Build 18502412282Details
💛 - Coveralls |
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me, thanks for the changes!
|
|
||
| if sync_data: | ||
| ctx.invoke(sync, source, destination, dry_run=dry_run) | ||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
ghukill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Purpose and background context
A CLI command is required to support the new batch creation step for DSC. These changes also connect the batch creation step with the syncing process:
Workflow.create_batchcan skip data retrieval (e.g. API requests) when--sync-datais provided in the command argscreateCLI command can invoke the 'sync' command when--sync-datais provided in the command argsA couple thoughts:
False.DevandStage, syncing is not expected.*For
Prod, syncing the batch data from the S3 bucket in Stage is expected.syncedboolean to the batch creation methods allows DSC to skip unnecessary reads to S3 altogether.How can a reviewer manually see the effects of these changes?
Review of added unit tests is sufficient for this PR
Includes new or updated dependencies?
NO
Changes expectations for external applications?
YES - This will require changes to the DSC step function. By enabling the
createCLI command to invoke thesyncCLI command, I expect this will allow us to simplify the logic of our step function and consolidate logs for both commands in a single CloudWatch logstream. For context, the latest version of the step function we were proposing is outlined in jc-test-dsc-with-sync-dev. Ticket IN-1457 will address updates to the step function.What are the relevant tickets?