Skip to content

Conversation

@jonavellecuerdo
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo commented Jul 31, 2025

Purpose and background context

While this PR was originally to focus on updating reconcile to use the refactored ItemSubmission domain class, there were a number of prerequisite updates to support this work. The work is organized into logical commits and it is recommended for reviewers to review changes by commit. Though there are a number of files changed, the total number of lines added/removed isn't significant. 😉

The changes are described at high-level below:

  1. Include additional processing steps to SimpleCSV 'item_metadata_iter': @jonavellecuerdo noted the following in comment IN-1354-itemsubmission-class-update #187 (comment); this ensures that all item_metadata_iter methods return a dict where the item identifying column is explicitly called item_identifier.
  2. Rework 'get_item_identifier' as final method on base Workflow: The change in 1 allows further simplification of the get_item_identifier method.
  3. Rework ItemSubmission class to clearly demonstrate encapsulated data …: Our discussion the other day led to the changes in this commit.

How can a reviewer manually see the effects of these changes?

For now, review of unit tests is sufficient.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

@coveralls
Copy link

coveralls commented Jul 31, 2025

Pull Request Test Coverage Report for Build 16680805114

Details

  • 42 of 44 (95.45%) changed or added relevant lines in 4 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.4%) to 96.447%

Changes Missing Coverage Covered Lines Changed/Added Lines %
dsc/workflows/base/simple_csv.py 6 7 85.71%
dsc/workflows/sccs.py 2 3 66.67%
Files with Coverage Reduction New Missed Lines %
dsc/cli.py 2 91.49%
dsc/db/models.py 2 97.06%
Totals Coverage Status
Change from base Build 16680249799: -0.4%
Covered Lines: 950
Relevant Lines: 985

💛 - Coveralls

@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review July 31, 2025 17:52
@jonavellecuerdo jonavellecuerdo requested a review from a team as a code owner July 31, 2025 17:52
Comment on lines +34 to +46
"""Domain class that stores both persistence and business logic for item submissions.
This class maintains two types of attributes:
1. Persisted attributes: Mapped directly to ItemSubmissionDB columns
and persisted in the database
2. Processing attributes: Temporary data used during submission processing
but not persisted
This class provides high-level methods for managing item submissions while abstracting
the underlying database operations. It wraps the dsc.db.models.ItemSubmissionDB model,
providing intuitive CRUD-like methods (get, create, upsert) for data persistence.
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Received help from GitHub Copilot to write up this docstring 🤓


items = []
for item_submission_record in ItemSubmissionDB.query(self.batch_id):
for item_submission in ItemSubmission.get_batch(self.batch_id):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows us to remove references to ItemSubmissionDB in the Workflow class 🎉

Copy link

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looking great! I have a truncated amount of time to review today, but wanted to provide some immediate feedback/requests. Mostly formatting and logging and what feels like some refactor churn.

Thought I may request these changes now, and then can return for a final review.

Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, a few comments and I second most of @ghukill's suggestions!

jonavellecuerdo added a commit that referenced this pull request Aug 1, 2025
* Clean up comment in Workflow.reconcile_items
* Replace get_item_identifier method calls with indexing of item_metadata
* Access item_submission.status when needed
* Remove unneeded logging in Workflow.reconcile_items
Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question!

)
item_submission.status = ItemSubmissionStatus.RECONCILE_FAILED

logger.debug(f"Reconcile status: {item_submission.status}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need item_submission.item_identifier in this log message as well?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As much as I'm leaning towards making logging more lean lately, I'd agree it would be helpful.

@jonavellecuerdo - I will have a comment forthcoming about this block which may have tiny bearing on this (or not).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at the latest commit! :)

Copy link

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one comment, but it's 100% optional. I think the reconcile_items() is looking pretty lean and mean as-is.

Nice work on this PR!

Comment on lines 168 to 181
if item_submission.status in [
None,
ItemSubmissionStatus.RECONCILE_FAILED,
]:
item_submission.last_run_date = self.run_date
if (
item_submission.item_identifier
in self.workflow_events.reconciled_items
):
item_submission.status = ItemSubmissionStatus.RECONCILE_SUCCESS
else:
item_submission.status = ItemSubmissionStatus.RECONCILE_FAILED

logger.debug(f"Reconcile status: {item_submission.status}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spirit of early exits, we could also continue early which allows de-denting the actual code we're more interested in. This also provides a space for logging something like "skipping reconciliation for item XYZ" if we wanted.

if item_submission.status not in [
    None,
    ItemSubmissionStatus.RECONCILE_FAILED,
]:
    continue

# update reconciliation status
item_submission.last_run_date = self.run_date
if item_submission.item_identifier in self.workflow_events.reconciled_items:
    item_submission.status = ItemSubmissionStatus.RECONCILE_SUCCESS
else:
    item_submission.status = ItemSubmissionStatus.RECONCILE_FAILED
item_submission.upsert_db()

logger.debug(
    f"Reconcile status: {item_submission.status}, "
    f"for: {item_metadata["item_identifier"]}"
)

Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect!

Why these changes are being introduced:
* This is to ensure that the item-identifying column is explicitly called
'item_identifier' when SimpleCSV.item_metadata_iter yields rows from the
metadata CSV file. This also ensures that all column names are made lowercase.

How this addresses that need:
* Add 'item_identifier_column_names' property to SimpleCSV
* Set SCCS.item_identifier_column_names to include possible names
* Include additional processing in item_metadata_iter

Side effects of this change:
* This allows us to replace all calls to 'get_item_identifier' with
'item_metadata["item_identifier"]'. It will also ensure that calls to
ItemSubmission.from_metadata will work now that all rows yielded by
item_metadata_iter will ensure the 'item_identifier' column name is
used.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1355
Why these changes are being introduced:
* Since all 'item_metadata_iter' methods are set to return metadata
with the item identifying column explicitly named 'item_identifier',
workflows no longer require a custom method for retrieving the
item identifier from the metadata.

How this addresses that need:
* Remove 'get_item_identifier' methods from SimpleCSV, SCCS, and
OpenCourseWare workflows
* Create final method on base Workflow class

Side effects of this change:
*

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1355
Why these changes are being introduced:
* Initial work to update the 'reconcile' and 'submit' commands to use the refactored
ItemSubmission class underscored the need for additional updates to improve the
ergonomics of the class and emphasize the encapsulation of data persistance
as part of the domain model. While most of the code was already in place,
these updates focus on some renaming and slight reorganization of methods related
to database operations. The renames were intended to reflect that pulling data
from the DynamoDB table is an inherent part of creating an ItemSubmission.

How this addresses that need:
* Move logic for loading data into ItemSubmission into a private method ('_from_db')
* Rename 'update_db' -> 'upsert_db' to clarify insertion of new records
* Rework existing methods to get, get_batch, and create methods
* Update Workflow.reconcile_items to use get and create methods
* Update Workflow.submit_items to use get_batch to iterate over ItemSubmission
instances for a given batch

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1355
@jonavellecuerdo jonavellecuerdo force-pushed the IN-1355-refactor-reconcile-for-itemsubmission branch from 1955ece to 0f256cc Compare August 1, 2025 17:02
@jonavellecuerdo jonavellecuerdo merged commit f8ff28b into main Aug 1, 2025
3 checks passed
@jonavellecuerdo jonavellecuerdo deleted the IN-1355-refactor-reconcile-for-itemsubmission branch August 1, 2025 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants