@jonavellecuerdo jonavellecuerdo commented Jun 23, 2025

Purpose and background context

DSC requires a way to interact with a DynamoDB table for tracking the state of an "item" during workflow executions. The dsc.db.item module defines a PynamoDB model, which serves as a Pythonic interface to interact with a DynamoDB table.

How can a reviewer manually see the effects of these changes?

Review the added unit tests in tests/test_db.py.

Includes new or updated dependencies?

YES - Adds pynamodb to packages.

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

"PLR0913",
"PLR0915",
"PTH",
"S320",
Contributor Author

Ruff removed this rule.

Contributor

Yes, they removed a bunch so I had to remove several #noqas in s3-bagit-validator, something for all of us to keep an eye out for as we update dependencies in various repos!

monkeypatch.setenv("SENTRY_DSN", "None")
monkeypatch.setenv("WORKSPACE", "test")
monkeypatch.setenv("AWS_REGION_NAME", "us-east-1")
monkeypatch.setenv("AWS_DEFAULT_REGION", "us-east-1")
Contributor Author

I think some AWS services use AWS_REGION_NAME and others use AWS_DEFAULT_REGION to set default region values. When running DSC in AWS, it will use the appropriate env var, which is why I didn't feel it was necessary to add AWS_DEFAULT_REGION to dsc.config.Config. 🤔

Why these changes are being introduced:
* To establish idempotency for DSC workflows, it was
decided that a DynamoDB table would be used to track the
state of an "item" during workflow executions.
The dsc.db.item module defines a PynamoDB model,
which serves as a Pythonic interface to interact with
the proposed DynamoDB table.

How this addresses that need:
* Define 'ItemDB' model using pynamodb.models.Model

Side effects of this change:
* DSC CLI commands will need to be updated to include
required read/write operations to DynamoDB table.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1315
@jonavellecuerdo jonavellecuerdo force-pushed the IN-1315-create-item-db-model branch from 83ab7ff to bdb924a June 23, 2025 15:10
coveralls commented Jun 23, 2025

Pull Request Test Coverage Report for Build 15880254053

Details

  • 55 of 61 (90.16%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.4%) to 95.313%

Changes Missing Coverage    Covered Lines    Changed/Added Lines    %
dsc/db/models.py            51               53                     96.23%
dsc/config.py               2                6                      33.33%

Totals
  • Change from base Build 15834496448: -0.4%
  • Covered Lines: 793
  • Relevant Lines: 832

💛 - Coveralls

@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review June 23, 2025 15:14
@jonavellecuerdo jonavellecuerdo requested a review from a team as a code owner June 23, 2025 15:14
@ehanson8 ehanson8 self-assigned this Jun 23, 2025
Contributor

@ehanson8 ehanson8 left a comment

Looks great! A few comments

dsc/db/item.py Outdated
Comment on lines 41 to 42
submit_attempts = NumberAttribute(default_for_new=0)
ingest_attempts = NumberAttribute(default_for_new=0)
Contributor

Somewhere we should be explicit about how the finalize CLI command relates to ingest_attempts, given the differing vocabulary. Maybe just in README.md? And submit_attempts may never end up mattering, but I guess it doesn't hurt to include it.
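One way to pin down that vocabulary is a small lookup from CLI command to counter attribute. A hypothetical sketch: ATTEMPT_COUNTER_BY_COMMAND and record_attempt are illustrative names, a plain dict stands in for a table row, and the assumption that finalize is the command that records ingest outcomes comes from this thread rather than from the code.

```python
from typing import Any

# Hypothetical mapping from DSC CLI command to the counter attribute it bumps.
# Assumed from the review thread: finalize is the command that records ingest outcomes.
ATTEMPT_COUNTER_BY_COMMAND: dict[str, str] = {
    "submit": "submit_attempts",
    "finalize": "ingest_attempts",
}


def record_attempt(row: dict[str, Any], command: str) -> int:
    """Increment the attempt counter for `command` on `row` and return the new value."""
    try:
        attribute = ATTEMPT_COUNTER_BY_COMMAND[command]
    except KeyError:
        raise ValueError(f"No attempt counter is tracked for {command!r}") from None
    # Counters start at 0, mirroring NumberAttribute(default_for_new=0).
    row[attribute] = row.get(attribute, 0) + 1
    return row[attribute]


row = {"item_identifier": "item-001", "batch_id": "batch-aaa"}
record_attempt(row, "finalize")
print(row["ingest_attempts"])  # 1
```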

Contributor Author

@jonavellecuerdo jonavellecuerdo Jun 24, 2025

Noted! I think this can also be explained via updated docstrings for the base Workflow's submit and process_ingest_result methods.

dsc/db/item.py Outdated
cls.Meta.table_name = table_name

@classmethod
def create(
Contributor

Maybe create_submission or create_submission_row to be really specific?

Contributor Author

@jonavellecuerdo jonavellecuerdo Jun 24, 2025

Hmm, if we end up renaming the pynamodb model to include the phrase 'submission', I would opt to keep the method as create for now. 🤔 The naming convention also stems from the standard set of CRUD operations.

dsc/db/item.py Outdated
condition=cls.item_identifier.does_not_exist()
& cls.batch_id.does_not_exist()
)
except PutError as exception:
Contributor

Is this only raised in the case of duplicates, or could a PutError be raised for other reasons? This handling seems really specific to duplicates.

I think that's a great question, I'm curious too.

If it's not specific to only duplicates, maybe there is something in the PutError that can be used to confirm the exception is related to duplicates and thus re-raising a custom ValueError makes sense?

Might also be worth defining a custom exception like ItemExistsError (or SubmissionExistsError if going that naming route) which could be raised. That would make it unambiguous downstream what happened if encountered, whereas a ValueError could mean lots of things.

Contributor Author

The idea is that when DSC first writes an item submission to the DynamoDB table, the calling method (the reconcile command) would call ItemDB.create(). However, for updates to rows in the table, the built-in pynamodb.models.Model.update method would be used instead (by submit and finalize commands).

@ehanson8 Yes, this will be raised in the case of duplicates, but a PutError can be raised whenever the put fails for any reason, not only when the condition check fails.

I like the idea proposed by @ghukill to define a custom exception (db.exceptions) and can make the changes once we decide on the name of the "item" pynamodb model! 🤓
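The create-versus-update split described in this comment can be sketched with an in-memory stand-in for the table. This is not the PynamoDB API: InMemorySubmissionTable, ItemExistsError, and the (batch_id, item_identifier) key tuple are illustrative assumptions. create plays the role of the conditional put (it refuses a second row with the same batch_id and item_identifier), while update only touches rows that reconcile already created.

```python
from typing import Any

Key = tuple[str, str]  # (batch_id, item_identifier): the composite key


class ItemExistsError(Exception):
    """Hypothetical error raised when a row with the same composite key exists."""


class InMemorySubmissionTable:
    """Toy stand-in for the DynamoDB table; not the PynamoDB API."""

    def __init__(self) -> None:
        self._rows: dict[Key, dict[str, Any]] = {}

    def create(self, batch_id: str, item_identifier: str, workflow_name: str) -> None:
        """Reconcile step: insert a new row only if the composite key is unused."""
        key = (batch_id, item_identifier)
        if key in self._rows:  # where DynamoDB's conditional check would fail
            raise ItemExistsError(f"Row already exists for {key}")
        self._rows[key] = {
            "batch_id": batch_id,
            "item_identifier": item_identifier,
            "workflow_name": workflow_name,
            "submit_attempts": 0,
            "ingest_attempts": 0,
        }

    def update(self, batch_id: str, item_identifier: str, **changes: Any) -> dict[str, Any]:
        """Submit/finalize steps: modify an existing row (KeyError if never created)."""
        row = self._rows[(batch_id, item_identifier)]
        row.update(changes)
        return row


table = InMemorySubmissionTable()
table.create("batch-aaa", "item-001", "workflow")
table.update("batch-aaa", "item-001", status="submit_success", submit_attempts=1)
table.create("batch-bbb", "item-001", "workflow")  # same item, new batch: allowed
```

One design note: a real DynamoDB UpdateItem call creates the item if it does not exist unless a condition is supplied, so the toy update raising KeyError on a missing row is a stricter choice made here to keep the intended flow visible.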

Contributor Author

This has been addressed in the latest commit.

@ghukill ghukill left a comment

Left a couple of comments and questions, but overall looking good to me.

tests/test_db.py Outdated
assert fetched_item.workflow_name == "workflow"


def test_db_item_create_if_hash_key_and_range_key_exist_raise_error(mocked_item_db):

I might propose a test rename here, something like:

test_db_item_create_duplicate_item_identifier_and_batch_id_raise_error()

I think item_identifier being the primary key and batch_id being the range key feel like implementation details; what this test is really getting at is that identifier + batch is what makes something unique in the database.

Normally I'm pretty meh on test names, but this jumped out at me. Perhaps in a test docstring you could point out that item_identifier --> primary key and batch_id --> range key, and that the combination of the two is what guarantees uniqueness?

Contributor Author

Hmm, since we've renamed the DynamoDB model, the test can now be named:

test_db_itemsubmission_create_if_exists_raise_error()

What do you think?

Love it!

@jonavellecuerdo jonavellecuerdo changed the base branch from main to support-for-dynamodb June 23, 2025 20:33
* Add detailed docstrings to ItemSubmissionDB
* Rename DynamoDB model to 'ItemSubmissionDB'
* Rename unit test
* Raise custom exception when PutError is caused by 'ConditionalCheckFailedException'
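A minimal sketch of the last change in that list: translate only a ConditionalCheckFailedException into a custom exception and let every other PutError propagate. It relies on PynamoDB exceptions exposing the DynamoDB error code via cause_response_code; ItemSubmissionExistsError and is_conditional_check_failure are hypothetical names, and FakePutError stands in for pynamodb.exceptions.PutError so the sketch runs without AWS.

```python
from collections.abc import Callable

CONDITIONAL_CHECK_FAILED = "ConditionalCheckFailedException"


class ItemSubmissionExistsError(Exception):
    """Hypothetical error for a duplicate (batch_id, item_identifier) row."""


def is_conditional_check_failure(exception: Exception) -> bool:
    """Return True if a PynamoDB error was caused by a failed condition expression."""
    # PynamoDBException.cause_response_code reads Error.Code from the botocore response.
    return getattr(exception, "cause_response_code", None) == CONDITIONAL_CHECK_FAILED


class FakePutError(Exception):
    """Offline stand-in for pynamodb.exceptions.PutError."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.cause_response_code = code


def save_with_duplicate_check(
    save: Callable[[], None], batch_id: str, item_identifier: str
) -> None:
    """Run save() and translate only duplicate-key failures into the custom error."""
    try:
        save()
    except FakePutError as exception:  # in DSC this would be `except PutError`
        if is_conditional_check_failure(exception):
            raise ItemSubmissionExistsError(
                f"Item submission exists: {batch_id=}, {item_identifier=}"
            ) from exception
        raise  # throttling, validation errors, etc. propagate unchanged
```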
workflow_name: str,
**attributes: Unpack[OptionalItemAttributes],
) -> None:
"""Create a new item (row) in the 'dsc-item-submissions' table.
Contributor Author

"item" as in "DynamoDB item"

item_identifier = UnicodeAttribute(range_key=True)
workflow_name = UnicodeAttribute()
dspace_handle = UnicodeAttribute(null=True)
status = UnicodeAttribute(null=True)

Noting for posterity what you shared @jonavellecuerdo, that pynamodb does not support an enum or list of accepted values. I think that's okay.

Comment on lines +16 to +25
class ItemSubmissionStatus(StrEnum):
RECONCILE_SUCCESS = "reconcile_success"
RECONCILE_FAILED = "reconcile_failed"
SUBMIT_SUCCESS = "submit_success"
SUBMIT_FAILED = "submit_failed"
SUBMIT_MAX_RETRIES_REACHED = "submit_max_retries_reached"
INGEST_SUCCESS = "ingest_success"
INGEST_FAILED = "ingest_failed"
INGEST_UNKNOWN = "ingest_unknown"
INGEST_MAX_RETRIES_REACHED = "ingest_max_retries_reached"

+1 to defining the allowed statuses here. Do you have plans to use this elsewhere in code? Perhaps in the business logic when a particular status will get set?

Even if this commit / PR doesn't do that, I think this enum sets us up nicely for that.

Contributor

@ehanson8 ehanson8 left a comment

Small docstring fix requested but otherwise this is great!

@jonavellecuerdo jonavellecuerdo requested a review from ehanson8 June 25, 2025 15:14
Contributor

@ehanson8 ehanson8 left a comment

🎉

@jonavellecuerdo jonavellecuerdo merged commit 9d1a1ab into support-for-dynamodb Jun 25, 2025
2 checks passed
@jonavellecuerdo jonavellecuerdo deleted the IN-1315-create-item-db-model branch June 25, 2025 15:19

5 participants