Update command output
Why these changes are being introduced:
Recent changes in oai-pmh-harvester and transmogrifier require some
updated command output for those steps.

How this addresses that need:
* For the extract step, updates the command output to use the "get" harvest method for dspace (in addition to aspace).
* For the transform step, removes the --verbose flag from the command output so we log at info level instead of debug.
* Updates tests to reflect changes.
* Updates README to reflect changes.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-189
* https://mitlibraries.atlassian.net/browse/TIMX-193
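
The extract-step change described above amounts to a membership check on the source name. A minimal sketch, assuming a hypothetical helper `harvest_method_args` (in the repository this logic sits inline in `generate_extract_command`, whose full signature is not shown in this diff):

```python
# Hypothetical helper, not part of lambdas/commands.py. It isolates the
# method-selection branch that this commit changes.
GET_METHOD_SOURCES = {"aspace", "dspace"}  # the two sources named in the commit


def harvest_method_args(source: str) -> list[str]:
    """Return the extra harvest arguments needed for the given source."""
    if source in GET_METHOD_SOURCES:
        return ["--method=get"]
    # Every other source relies on the harvester's default "list" method,
    # so no method option is added.
    return []


print(harvest_method_args("dspace"))      # ['--method=get']
print(harvest_method_args("testsource"))  # []
```

Keeping the source names in one collection means a further source that needs the "get" method would be a one-word change.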
hakbailey committed Mar 2, 2023
1 parent 1844dfb commit f3ce819
Showing 5 changed files with 5 additions and 12 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -22,7 +22,7 @@ Takes input JSON (usually from EventBridge although it can be passed to a manual
- `full`: Perform a full harvest of all records from the provided `oai-pmh-host`. During load, create a new OpenSearch index, load all records into it, and then promote the new index.
- `daily`: Harvest only records added to or updated in the provided `oai-pmh-host` since the previous calendar day. Previous day is relative to the provided `run-date` field date, *not* the date this process is run, although those will be equivalent in most cases. During load, index/delete records into the current production OpenSearch index for the source.
- `source`: Short name for the source repository, must match one of the source names configured for use in transform and load apps. The provided source is passed to the transform and load app CLI commands, and is also used in the input/output file naming scheme for all steps of the pipeline.
- - *Note*: if provided source is "aspace", a method option is passed to the harvest command (if starting at the extract step) to ensure that we use the "get" harvest method instead of the default "list" method used for all other sources. This is required because ArchivesSpace inexplicably provides incomplete oai-pmh responses using the "list" method.
+ - *Note*: if provided source is "aspace" or "dspace", a method option is passed to the harvest command (if starting at the extract step) to ensure that we use the "get" harvest method instead of the default "list" method used for all other sources. This is required because ArchivesSpace inexplicably provides incomplete oai-pmh responses using the "list" method and DSpace@MIT needs to skip some records by ID, which can only be done using the "get" method.

#### Required OAI-PMH harvest fields

5 changes: 1 addition & 4 deletions lambdas/commands.py
@@ -30,7 +30,7 @@ def generate_extract_command(
extract_command.append("harvest")
extract_command.append(f"--metadata-format={input_data['oai-metadata-format']}")

- if source == "aspace":
+ if source in ["aspace", "dspace"]:
extract_command.append("--method=get")

if set_spec := input_data.get("oai-set-spec"):
@@ -53,7 +53,6 @@ def generate_transform_commands(
input_data: dict,
run_date: str,
timdex_bucket: str,
- verbose: bool,
) -> dict:
"""Generate task run command for TIMDEX transform."""
files_to_transform: list[dict] = []
@@ -76,8 +75,6 @@ def generate_transform_commands(
f"--output-file=s3://{timdex_bucket}/{transform_output_file}",
f"--source={source}",
]
- if verbose:
- transform_command.append("--verbose")

files_to_transform.append({"transform-command": transform_command})

2 changes: 1 addition & 1 deletion lambdas/format_input.py
@@ -63,7 +63,7 @@ def lambda_handler(event: dict, context: dict) -> dict: # noqa
)
result["next-step"] = "load"
result["transform"] = commands.generate_transform_commands(
- extract_output_files, event, run_date, timdex_bucket, verbose
+ extract_output_files, event, run_date, timdex_bucket
)

elif next_step == "load":
7 changes: 2 additions & 5 deletions tests/test_commands.py
@@ -65,7 +65,7 @@ def test_generate_transform_commands_required_input_fields():
"testsource/testsource-2022-01-02-full-extracted-records-to-index.xml"
]
assert commands.generate_transform_commands(
- extract_output_files, input_data, "2022-01-02", "test-timdex-bucket", False
+ extract_output_files, input_data, "2022-01-02", "test-timdex-bucket"
) == {
"files-to-transform": [
{
@@ -94,7 +94,7 @@ def test_generate_transform_commands_all_input_fields():
"testsource/testsource-2022-01-02-daily-extracted-records-to-delete.xml",
]
assert commands.generate_transform_commands(
- extract_output_files, input_data, "2022-01-02", "test-timdex-bucket", True
+ extract_output_files, input_data, "2022-01-02", "test-timdex-bucket"
) == {
"files-to-transform": [
{
@@ -104,7 +104,6 @@ def test_generate_transform_commands_all_input_fields():
"--output-file=s3://test-timdex-bucket/testsource/"
"testsource-2022-01-02-daily-transformed-records-to-index_01.json",
"--source=testsource",
- "--verbose",
]
},
{
@@ -114,7 +113,6 @@ def test_generate_transform_commands_all_input_fields():
"--output-file=s3://test-timdex-bucket/testsource/"
"testsource-2022-01-02-daily-transformed-records-to-index_02.json",
"--source=testsource",
- "--verbose",
]
},
{
@@ -124,7 +122,6 @@ def test_generate_transform_commands_all_input_fields():
"--output-file=s3://test-timdex-bucket/testsource/"
"testsource-2022-01-02-daily-transformed-records-to-delete.txt",
"--source=testsource",
- "--verbose",
]
},
]
1 change: 0 additions & 1 deletion tests/test_format_input.py
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,6 @@ def test_lambda_handler_with_next_step_transform_files_present(s3_client):
"--output-file=s3://test-timdex-bucket/testsource/"
"testsource-2022-01-02-daily-transformed-records-to-index.json",
"--source=testsource",
"--verbose",
]
}
]
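
With the `verbose` parameter removed, the transform command no longer depends on a verbosity setting and never carries `--verbose`. A minimal sketch, assuming a hypothetical helper `build_transform_args`; it reproduces only the two arguments visible in the hunks above, and the real command may include others that this diff does not show:

```python
# Hypothetical helper, not part of lambdas/commands.py. It mirrors only the
# "--output-file" and "--source" arguments that appear in this diff.
def build_transform_args(
    timdex_bucket: str, transform_output_file: str, source: str
) -> list[str]:
    """Build transform arguments without any verbosity flag."""
    return [
        f"--output-file=s3://{timdex_bucket}/{transform_output_file}",
        f"--source={source}",
    ]


args = build_transform_args(
    "test-timdex-bucket",
    "testsource/testsource-2022-01-02-daily-transformed-records-to-index.json",
    "testsource",
)
# Mirrors the updated tests: the flag is absent regardless of input.
assert "--verbose" not in args
print(args)
```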
