-
Notifications
You must be signed in to change notification settings - Fork 0
In 1315 create item db model #179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In 1315 create item db model #179
Conversation
| "PLR0913", | ||
| "PLR0915", | ||
| "PTH", | ||
| "S320", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ruff removed this rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they removed a bunch so I had to remove several #noqas in s3-bagit-validator, something for all of us to keep an eye out for as we update dependencies in various repos!
| monkeypatch.setenv("SENTRY_DSN", "None") | ||
| monkeypatch.setenv("WORKSPACE", "test") | ||
| monkeypatch.setenv("AWS_REGION_NAME", "us-east-1") | ||
| monkeypatch.setenv("AWS_DEFAULT_REGION", "us-east-1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some AWS services use AWS_REGION_NAME and others use AWS_DEFAULT_REGION to set default region values. When running DSC in AWS, it will use the appropriate env var, which is why I didn't feel it was necessary to add AWS_DEFAULT_REGION to dsc.config.Config. 🤔
Why these changes are being introduced: * To establish idempotency for DSC workflows, it was decided that a DynamoDB table would be used to track the state of an "item" during workflow executions. The dsc.db.item module defines a PynamoDB model, which serves as a Pythonic interface to interact with the proposed DynamoDB table. How this addresses that need: * Define 'ItemDB' model using pynamodb.models.Model Side effects of this change: * DSC CLI commands will need to be updated to include required read/write operations to DynamoDB table. Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1315
83ab7ff to
bdb924a
Compare
Pull Request Test Coverage Report for Build 15880254053Details
💛 - Coveralls |
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! A few comments
| "PLR0913", | ||
| "PLR0915", | ||
| "PTH", | ||
| "S320", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they removed a bunch so I had to remove several #noqas in s3-bagit-validator, something for all of us to keep an eye out for as we update dependencies in various repos!
dsc/db/item.py
Outdated
| submit_attempts = NumberAttribute(default_for_new=0) | ||
| ingest_attempts = NumberAttribute(default_for_new=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere we should be specific about the CLI command connection between finalize and ingest_attempts given the differing vocabulary. Maybe just README.md? And submit_attempts may not really end up ever mattering but I guess it doesn't hurt to include it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted! I think this can also be explained visa updated docstrings for the base Workflow's submit and process_ingest_result methods as well.
dsc/db/item.py
Outdated
| cls.Meta.table_name = table_name | ||
|
|
||
| @classmethod | ||
| def create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe create_submission or create_submission_row to be really specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, if we end up renaming the pynamodb model to include the phrase 'submission', I would opt to keep the method as create for now. 🤔 The naming convention also stems from the standard set of CRUD operations.
dsc/db/item.py
Outdated
| condition=cls.item_identifier.does_not_exist() | ||
| & cls.batch_id.does_not_exist() | ||
| ) | ||
| except PutError as exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only raised in the case of duplicates or could a PutError be raised for different reasons? This seems really specific to just duplicates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a great question, I'm curious too.
If it's not specific to only duplicates, maybe there is something in the PutError that can be used to confirm the exception is related to duplicates and thus re-raising a custom ValueError makes sense?
Might also be worth defining a custom exception like ItemExistsError (or SubmissionExistsError if going that naming route) which could be raised. That would be unambiguous downstream what's happening if encountered, where ValueError could be lots of things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that when DSC first writes an item submission to the DynamoDB table, the calling method (the reconcile command) would call ItemDB.create(). However, for updates to rows in the table, the built-in pynamodb.models.Model.update method would be used instead (by submit and finalize commands).
@ehanson8 Yes, this will be raised in the case of duplicates, but a PutError can be raised for any failing conditions.
I like the idea proposed by @ghukill to define a custom exception (db.exceptions) and can make the changes once we decide on the name of the "item" pynamodb model! 🤓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been addressed in the latest commit.
ghukill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple of comments and questions, but overall looking good to me.
tests/test_db.py
Outdated
| assert fetched_item.workflow_name == "workflow" | ||
|
|
||
|
|
||
| def test_db_item_create_if_hash_key_and_range_key_exist_raise_error(mocked_item_db): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might propose a test rename here, something like:
test_db_item_create_duplicate_item_identifier_and_batch_id_raise_error()I think the item_identifier being the primary key and batch_id being the range key feel like implemenation details, but this test is kind of getting at identifer + batch is what makes something unique in the database.
Normally I'm pretty meh on test names, but this jumped out at me. Perhaps in a test docstring you could point out that item_identifier --> primary key and batch_id --> range key? and that the combination of the two is what gaurantees uniqueness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, since we've renamed the DynamoDB model, the test can now be named:
test_db_itemsubmission_create_if_exists_raise_error()What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it!
dsc/db/item.py
Outdated
| condition=cls.item_identifier.does_not_exist() | ||
| & cls.batch_id.does_not_exist() | ||
| ) | ||
| except PutError as exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a great question, I'm curious too.
If it's not specific to only duplicates, maybe there is something in the PutError that can be used to confirm the exception is related to duplicates and thus re-raising a custom ValueError makes sense?
Might also be worth defining a custom exception like ItemExistsError (or SubmissionExistsError if going that naming route) which could be raised. That would be unambiguous downstream what's happening if encountered, where ValueError could be lots of things.
* Add detailed docstrings to ItemSubmissionDB * Rename DynamoDB model to 'ItemSubmissionDB' * Rename unit test * Raise custom exception when PutError is caused by 'ConditionalCheckFailedException'
| workflow_name: str, | ||
| **attributes: Unpack[OptionalItemAttributes], | ||
| ) -> None: | ||
| """Create a new item (row) in the 'dsc-item-submissions' table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"item" as in "DynamoDB item"
| item_identifier = UnicodeAttribute(range_key=True) | ||
| workflow_name = UnicodeAttribute() | ||
| dspace_handle = UnicodeAttribute(null=True) | ||
| status = UnicodeAttribute(null=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting for posterity what you shared @jonavellecuerdo, that pynamodb does not support an enum or list of accepted values. I think that's okay.
| class ItemSubmissionStatus(StrEnum): | ||
| RECONCILE_SUCCESS = "reconcile_success" | ||
| RECONCILE_FAILED = "reconcile_failed" | ||
| SUBMIT_SUCCESS = "submit_success" | ||
| SUBMIT_FAILED = "submit_failed" | ||
| SUBMIT_MAX_RETRIES_REACHED = "submit_max_retries_reached" | ||
| INGEST_SUCCESS = "ingest_success" | ||
| INGEST_FAILED = "ingest_failed" | ||
| INGEST_UNKNOWN = "ingest_unknown" | ||
| INGEST_MAX_RETRIES_REACHED = "ingest_max_retries_reached" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to defining the allowed statuses here. Do you have plans to use this elsewhere in code? Perhaps in the business logic when a particular status will get set?
Even if this commit / PR doesn't do that, I think this enum sets us up nicely for that.
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small docstring fix requested but otherwise this is great!
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
Purpose and background context
DSC requires a way to interact with a DynamoDB table for tracking the state of an "item" during workflow executions. The
dsc.db.itemmodule defines a PynamoDB model, which serves as a Pythonic interface to interact with a DynamoDB table.How can a reviewer manually see the effects of these changes?
Review the added unit tests in
test/test_db.py.Includes new or updated dependencies?
YES - Adds
pynamodbto packages.Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)