-
Notifications
You must be signed in to change notification settings - Fork 0
IN-1317 Update submit CLI command to use DynamoDB #181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Why these changes are being introduced: * The submit CLI command and the Workflow methods it calls need to be updated to use DynamoDB How this addresses that need: * Remove SUBMIT_MAX_RETRIES_REACHED from ItemSubmissionStatus and rename INGEST_MAX_RETRIES_REACHED > MAX_RETRIES_REACHED * Update Workflow.submit_items to accept a run_date param * Add retry_threshold property to Workflow class * Add allow_submission and write_submit_results_to_dynamodb to Workflow class and corresponding unit tests Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1317
| # validate whether a message should be sent for this item submission | ||
| if not self.allow_submission(item_identifier): | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is getting huge so I added comments to break it up. I considered splitting out more methods but it's already calling so many, it was starting to feel cluttered
jonavellecuerdo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ehanson8 This was a good first pass, but changes are needed to resolve some issues regarding when updates are written to the DynamoDB table. I really liked the use of the allow_submission method to organize the logic for when to skip the submit step for a given item!
ghukill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like changes are coming. Submitting some initial comments at this time.
* Refactor run_date as a Workflow instance attribute and remove run_date args from methods * Add RETRY_THRESHOLD as Config property sourced from env var and update README.md * Refactor submit_items method to directly update DynamoDB * Add skipped count to submission_summary * Refactor allow_submission method to accept already created ItemSubmissionDB instance and to use match/case pattern * Remove write_submit_results_to_dynamodb method and corresponding unit test * Add new unit tests and update existing unit tests to account for updated functionality
Pull Request Test Coverage Report for Build 16055561215Details
💛 - Coveralls |
jonavellecuerdo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Thank you for addressing the previous review. Here is a smaller batch of minor change requests. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, with some accompanying thoughts.
While looking at the Workflow.submit_items() method, something that immediately jumps out is the following codeblock repeating:
item_submission_record.update(
actions=[
ItemSubmissionDB.status.set(ItemSubmissionStatus.SUBMIT_FAILED),
ItemSubmissionDB.status_details.set(str(exception)),
ItemSubmissionDB.last_run_date.set(self.run_date),
ItemSubmissionDB.submit_attempts.add(1),
]
)A naive (as in simplistic, not uninformed) suggestion would be to refactor to a method and have this method call that. But I think this excerbates another issue of the Workflow class growing in size and complexity. Reconcilation, now submit, and finalize will, are all introducing additional similar but not identical methods as they perform some kind of "loop through things, do things, save things to DB, and report things".
My fear is that focusing on any given subset -- e.g. submit_items() here -- would miss an opportunity to think more globally about this. For example, what if Workflow was composed of multiple other more focused classes like ReconcileItems, SubmitItems, and FinalizeItems a la:
class Workflow():
def __init__(self):
self.reconcile = Reconcile()
self.submit = Submit()
self.finalize = Finalize()where then each of those classes might have something like a .run() method that would perform their work. Perhaps while doing that refactor we notice they all keep doing similar DyanmoDB updating of items. It would stand to reason either a) a standalone function they could call to update items or b) moving that to more methods on the ItemSubmissionDB class.
To be clear: I think this code as-is will function! And I think at this time that's ultimately more valuable to continue exposing contours of this project we may not yet understand. But similar to the Dynamo + reconcile PR, I think this PR is hinting that something more structural at the Workflow and ItemSubmissionDB level isn't quite right and it's rearing up in these methods, making them impossible to really nail.
I say all this as the most removed person on this project, and without a clear vision or proposal of what that reworking looks like. But my gut feeling is that we should continue plowing forward with the functionality like this PR adds, e.g. moving onto finalize, and then pause for a moment and look for some fairly deep structural refactorings that might make all the pieces snap together a bit cleaner.
What I do like about this PR, and something I had written and deleted multiple comments about, is the very procedural nature of the code. Because we have not yet introduced too many methods which increase misdirection, I think we're setup well for some refactoring.
TL/DR: Approved with the belief we keep moving forward towards end-to-end functionality, then pause and take stock for a potentially fairly substantial refactor of working functionality.
💯 agree, for me, this PR has reinforced the a lot of the awkwardness that has developed in the code (which seems kinda inevitable given the complexity of what needs to happen during a submission). And this seems like a great starting point for thinking about a refactor:
|
* Create ITEM_SUBMISSION_LOG_STR and replace method logging templates * Update description of RETRY_THRESHOLD in README.md * Update unit test names for consistency and streamline allow_submission unit tests
jonavellecuerdo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved! Thank you for the additional cleanup. ✨
Purpose and background context
Updates the
submitCLI command to utilize DynamoDB. This required a method for writingsubmitresults to DynamoDB and anallow_submissionto skip sending a message when certain statuses are found in DynamoDB.There were several items in the Jira ticket that did not happen as expected:
last_submission_messagewas an attribute in the DynamoDB schema, however, we only ever create themessage_attributesandmessage_bodyfor sending to the input queue. I would argue it is not worth assembling an "input message" just to record it in the table given that theresult_messageis far more important to record. Let me know if you disagree!errorsare not associated with anitem_identifierinWorkflowEvents, it's not possible to support thestatus_detailsattribute at this time. I'll make a note on @ghukill's Jira ticket to consider this while working on this ticket.submitretry threshold on Slack, I'm still recordingsubmit_attemptsfor informational purposes.How can a reviewer manually see the effects of these changes?
Not possible yet
Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)