Skip to content

Conversation

@ehanson8
Copy link
Contributor

@ehanson8 ehanson8 commented Jul 2, 2025

Purpose and background context

Updates the submit CLI command to utilize DynamoDB. This required a method for writing submit results to DynamoDB and an allow_submission to skip sending a message when certain statuses are found in DynamoDB.

There were several items in the Jira ticket that did not happen as expected:

  1. last_submission_message was an attribute in the DynamoDB schema, however, we only ever create the message_attributes and message_body for sending to the input queue. I would argue it is not worth assembling an "input message" just to record it in the table given that the result_message is far more important to record. Let me know if you disagree!
  2. Given that errors are not associated with an item_identifier in WorkflowEvents, it's not possible to support the status_details attribute at this time. I'll make a note on @ghukill's Jira ticket to consider this while working on this ticket.
  3. While we discussed not using a submit retry threshold on Slack, I'm still recording submit_attempts for informational purposes.

How can a reviewer manually see the effects of these changes?

Not possible yet

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:
* The submit CLI command and the Workflow methods it calls need to be updated to use DynamoDB

How this addresses that need:
* Remove SUBMIT_MAX_RETRIES_REACHED from ItemSubmissionStatus and rename INGEST_MAX_RETRIES_REACHED > MAX_RETRIES_REACHED
* Update Workflow.submit_items to accept a run_date param
* Add retry_threshold property to Workflow class
* Add allow_submission and write_submit_results_to_dynamodb to Workflow class and corresponding unit tests

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1317
Comment on lines 258 to 260
# validate whether a message should be sent for this item submission
if not self.allow_submission(item_identifier):
continue
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is getting huge so I added comments to break it up. I considered splitting out more methods but it's already calling so many, it was starting to feel cluttered

@ehanson8 ehanson8 marked this pull request as ready for review July 2, 2025 15:22
@ehanson8 ehanson8 requested a review from a team as a code owner July 2, 2025 15:22
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehanson8 This was a good first pass, but changes are needed to resolve some issues regarding when updates are written to the DynamoDB table. I really liked the use of the allow_submission method to organize the logic for when to skip the submit step for a given item!

@ghukill ghukill self-assigned this Jul 2, 2025
Copy link

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like changes are coming. Submitting some initial comments at this time.

* Refactor run_date as a Workflow instance attribute and remove run_date args from methods
* Add RETRY_THRESHOLD as Config property sourced from env var and update README.md
* Refactor submit_items method to directly update DynamoDB
* Add skipped count to submission_summary
* Refactor allow_submission method to accept already created ItemSubmissionDB instance and to use match/case pattern
* Remove write_submit_results_to_dynamodb method and corresponding unit test
* Add new unit tests and update existing unit tests to account for updated functionality
@coveralls
Copy link

coveralls commented Jul 2, 2025

Pull Request Test Coverage Report for Build 16055561215

Details

  • 42 of 49 (85.71%) changed or added relevant lines in 4 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.6%) to 95.42%

Changes Missing Coverage Covered Lines Changed/Added Lines %
dsc/config.py 2 3 66.67%
dsc/workflows/base/init.py 38 44 86.36%
Files with Coverage Reduction New Missed Lines %
dsc/workflows/base/init.py 2 93.58%
Totals Coverage Status
Change from base Build 15932667975: -0.6%
Covered Lines: 875
Relevant Lines: 917

💛 - Coveralls

@ehanson8 ehanson8 requested a review from jonavellecuerdo July 2, 2025 20:05
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Thank you for addressing the previous review. Here is a smaller batch of minor change requests. Thanks!

@ghukill ghukill self-requested a review July 3, 2025 12:59
Copy link

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, with some accompanying thoughts.

While looking at the Workflow.submit_items() method, something that immediately jumps out is the following codeblock repeating:

item_submission_record.update(
    actions=[
        ItemSubmissionDB.status.set(ItemSubmissionStatus.SUBMIT_FAILED),
        ItemSubmissionDB.status_details.set(str(exception)),
        ItemSubmissionDB.last_run_date.set(self.run_date),
        ItemSubmissionDB.submit_attempts.add(1),
    ]
)

A naive (as in simplistic, not uninformed) suggestion would be to refactor to a method and have this method call that. But I think this excerbates another issue of the Workflow class growing in size and complexity. Reconcilation, now submit, and finalize will, are all introducing additional similar but not identical methods as they perform some kind of "loop through things, do things, save things to DB, and report things".

My fear is that focusing on any given subset -- e.g. submit_items() here -- would miss an opportunity to think more globally about this. For example, what if Workflow was composed of multiple other more focused classes like ReconcileItems, SubmitItems, and FinalizeItems a la:

class Workflow():
    def __init__(self):
        self.reconcile = Reconcile()
        self.submit = Submit()
        self.finalize = Finalize()

where then each of those classes might have something like a .run() method that would perform their work. Perhaps while doing that refactor we notice they all keep doing similar DyanmoDB updating of items. It would stand to reason either a) a standalone function they could call to update items or b) moving that to more methods on the ItemSubmissionDB class.

To be clear: I think this code as-is will function! And I think at this time that's ultimately more valuable to continue exposing contours of this project we may not yet understand. But similar to the Dynamo + reconcile PR, I think this PR is hinting that something more structural at the Workflow and ItemSubmissionDB level isn't quite right and it's rearing up in these methods, making them impossible to really nail.

I say all this as the most removed person on this project, and without a clear vision or proposal of what that reworking looks like. But my gut feeling is that we should continue plowing forward with the functionality like this PR adds, e.g. moving onto finalize, and then pause for a moment and look for some fairly deep structural refactorings that might make all the pieces snap together a bit cleaner.

What I do like about this PR, and something I had written and deleted multiple comments about, is the very procedural nature of the code. Because we have not yet introduced too many methods which increase misdirection, I think we're setup well for some refactoring.

TL/DR: Approved with the belief we keep moving forward towards end-to-end functionality, then pause and take stock for a potentially fairly substantial refactor of working functionality.

@ehanson8
Copy link
Contributor Author

ehanson8 commented Jul 3, 2025

TL/DR: Approved with the belief we keep moving forward towards end-to-end functionality, then pause and take stock for a potentially fairly substantial refactor of working functionality.

💯 agree, for me, this PR has reinforced the a lot of the awkwardness that has developed in the code (which seems kinda inevitable given the complexity of what needs to happen during a submission). And this seems like a great starting point for thinking about a refactor:

class Workflow():
    def __init__(self):
        self.reconcile = Reconcile()
        self.submit = Submit()
        self.finalize = Finalize()

* Create ITEM_SUBMISSION_LOG_STR and replace method logging templates
* Update description of RETRY_THRESHOLD in README.md
* Update unit test names for consistency and streamline allow_submission unit tests
@ehanson8 ehanson8 requested a review from jonavellecuerdo July 3, 2025 16:18
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved! Thank you for the additional cleanup. ✨

@ehanson8 ehanson8 merged commit ba4c95f into support-for-dynamodb Jul 3, 2025
2 checks passed
@ehanson8 ehanson8 deleted the IN-1317-submit-dynamodb branch July 3, 2025 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants