-
Notifications
You must be signed in to change notification settings - Fork 0
In 1355 refactor reconcile for itemsubmission #190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In 1355 refactor reconcile for itemsubmission #190
Conversation
Pull Request Test Coverage Report for Build 16680805114Details
💛 - Coveralls |
| """Domain class that stores both persistence and business logic for item submissions. | ||
| This class maintains two types of attributes: | ||
| 1. Persisted attributes: Mapped directly to ItemSubmissionDB columns | ||
| and persisted in the database | ||
| 2. Processing attributes: Temporary data used during submission processing | ||
| but not persisted | ||
| This class provides high-level methods for managing item submissions while abstracting | ||
| the underlying database operations. It wraps the dsc.db.models.ItemSubmissionDB model, | ||
| providing intuitive CRUD-like methods (get, create, upsert) for data persistence. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Received help from GitHub Copilot to write up this docstring 🤓
|
|
||
| items = [] | ||
| for item_submission_record in ItemSubmissionDB.query(self.batch_id): | ||
| for item_submission in ItemSubmission.get_batch(self.batch_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allows us to remove references to ItemSubmissionDB in the Workflow class 🎉
ghukill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looking great! I have a truncated amount of time to review today, but wanted to provide some immediate feedback/requests. Mostly formatting and logging and what feels like some refactor churn.
Thought I may request these changes now, and then can return for a final review.
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, a few comments and I second most of @ghukill's suggestions!
* Clean up comment in Workflow.reconcile_items * Replace get_item_identifier method calls with indexing of item_metadata * Access item_submission.status when needed * Remove unneeded logging in Workflow.reconcile_items
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question!
dsc/workflows/base/__init__.py
Outdated
| ) | ||
| item_submission.status = ItemSubmissionStatus.RECONCILE_FAILED | ||
|
|
||
| logger.debug(f"Reconcile status: {item_submission.status}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need item_submission.item_identifier in this log message as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As much as I'm leaning towards making logging more lean lately, I'd agree it would be helpful.
@jonavellecuerdo - I will have a comment forthcoming about this block which may have tiny bearing on this (or not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at the latest commit! :)
ghukill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one comment, but it's 100% optional. I think the reconcile_items() is looking pretty lean and mean as-is.
Nice work on this PR!
dsc/workflows/base/__init__.py
Outdated
| if item_submission.status in [ | ||
| None, | ||
| ItemSubmissionStatus.RECONCILE_FAILED, | ||
| ]: | ||
| item_submission.last_run_date = self.run_date | ||
| if ( | ||
| item_submission.item_identifier | ||
| in self.workflow_events.reconciled_items | ||
| ): | ||
| item_submission.status = ItemSubmissionStatus.RECONCILE_SUCCESS | ||
| else: | ||
| item_submission.status = ItemSubmissionStatus.RECONCILE_FAILED | ||
|
|
||
| logger.debug(f"Reconcile status: {item_submission.status}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the spirit of early exits, we could also continue early which allows de-denting the actual code we're more interested in. This also provides a space for logging something like "skipping reconciliation for item XYZ" if we wanted.
if item_submission.status not in [
None,
ItemSubmissionStatus.RECONCILE_FAILED,
]:
continue
# update reconciliation status
item_submission.last_run_date = self.run_date
if item_submission.item_identifier in self.workflow_events.reconciled_items:
item_submission.status = ItemSubmissionStatus.RECONCILE_SUCCESS
else:
item_submission.status = ItemSubmissionStatus.RECONCILE_FAILED
item_submission.upsert_db()
logger.debug(
f"Reconcile status: {item_submission.status}, "
f"for: {item_metadata["item_identifier"]}"
)
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect!
Why these changes are being introduced: * This is to ensure that the item-identifying column is explicitly called 'item_identifier' when SimpleCSV.item_metadata_iter yields rows from the metadata CSV file. This also ensures that all column names are made lowercase. How this addresses that need: * Add 'item_identifier_column_names' property to SimpleCSV * Set SCCS.item_identifier_column_names to include possible names * Include additional processing in item_metadata_iter Side effects of this change: * This allows us to replace all calls to 'get_item_identifier' with 'item_metadata["item_identifier"]'. It will also ensure that calls to ItemSubmission.from_metadata will work now that all rows yielded by item_metadata_iter will ensure the 'item_identifier' column name is used. Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1355
Why these changes are being introduced: * Since all 'item_metadata_iter' methods are set to return metadata with the item identifying column explicitly named 'item_identifier', workflows no longer require a custom method for retrieving the item identifier from the metadata. How this addresses that need: * Remove 'get_item_identifier' methods from SimpleCSV, SCCS, and OpenCourseWare workflows * Create final method on base Workflow class Side effects of this change: * Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1355
Why these changes are being introduced:
* Initial work to update the 'reconcile' and 'submit' commands to use the refactored
ItemSubmission class underscored the need for additional updates to improve the
ergonomics of the class and emphasize the encapsulation of data persistance
as part of the domain model. While most of the code was already in place,
these updates focus on some renaming and slight reorganization of methods related
to database operations. The renames were intended to reflect that pulling data
from the DynamoDB table is an inherent part of creating an ItemSubmission.
How this addresses that need:
* Move logic for loading data into ItemSubmission into a private method ('_from_db')
* Rename 'update_db' -> 'upsert_db' to clarify insertion of new records
* Rework existing methods to get, get_batch, and create methods
* Update Workflow.reconcile_items to use get and create methods
* Update Workflow.submit_items to use get_batch to iterate over ItemSubmission
instances for a given batch
Side effects of this change:
* None
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1355
1955ece to
0f256cc
Compare
Purpose and background context
While this PR was originally to focus on updating
reconcileto use the refactoredItemSubmissiondomain class, there were a number of prerequisite updates to support this work. The work is organized into logical commits and it is recommended for reviewers to review changes by commit. Though there are a number of files changed, the total number of lines added/removed isn't significant. 😉The changes are described at high-level below:
item_metadata_itermethods return a dict where the item identifying column is explicitly calleditem_identifier.get_item_identifiermethod.How can a reviewer manually see the effects of these changes?
For now, review of unit tests is sufficient.
Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)