Skip to content

Conversation

@ghukill
Copy link
Contributor

@ghukill ghukill commented Aug 28, 2025

Purpose and background context

The high level goal of this PR is to avoid "stale" rows in the Employee Salary History table. These were discovered during some unrelated spot checks. Note: no data was corrupted or missing, we just have some extra rows in this table.

Ticket HRQB-36 is worth a skim for some background context when we first encountered this.

"Stale" rows were accumulating in the table when one of two things happened:

  1. one of an employee's appointments ended
  2. an employee had a salary change

In both situations, an "end" date changed that we were using to construct the Quickbase row's unique, merge field value with. When that happens, any updates to that salary history item did not update the pre-existing row, they created a new one.

The fix is quite straight-forward to remove the appointment end date and the salary adjustment end date from the construction of the unique key. Those are volatile values and therefore cannot be used for that purpose.

The actual fix can be found in the last commit, 58726e4, which is literally 4-5 lines. The commits leading up to that are dependency updates and some added integrity checks that will bubble this up should it happen again in the future.

How can a reviewer manually see the effects of these changes?

The story remains the same with HRQB, manual testing is really not possible.

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

What are the relevant tickets?

Why these changes are being introduced:

Tasks can define integrity checks with the decorator @HRQBTask.integrity_check.  These
run during the tasks main run sequence.  If the checks fail, the task is considered a failure,
potentially prohibiting downstream tasks from firing.

However, as we lean into using these more, it can be beneficial to skip them for situations
like testing or potentially even to force a run through if its known the integrity checks
aren't needed.  Note that it will be quite rare outside of unit testing, if ever, that this
is used.

How this addresses that need:
* Add new truthy env var SKIP_TASK_INTEGRITY_CHECKS to skip integrity checks

Side effects of this change:
* Able to skip task integrity checks during unit testing

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1439
Why these changes are being introduced:

For some tasks, e.g. transform tasks for 'Employee Appointments' and
'Employee Salary History', we know in advance that the number of rows we are transforming
and preparing for upsert should always exceed the number of rows currently in Quickbase
for that table.  This is because upserts for these tables are all-time data.  If there are
more rows in Quickbase than the upsert, this indicates a problem, usually with how we
are constructing the unique merge field.

For example, we had a situation where 'Employee Appointment' had two rows in Quickbase
that should not have been there, due to a bug where an employee leaves the library but
goes to another MIT position, but we still pickup the new appointment as a library a
appointment.  This integrity check would flag would have flagged the issue for analysis
versus stumbling on it in the data.

How this addresses that need:
* Adds new integrity checks

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1439
Why these changes are being introduced:

Two end dates were used in the construction of a unique, enduring
merge field value for the Employee Salary History transform step:
the associated appointment end date and the end date of the salary
adjustment.

However, it's quite normal for these values to change over time, e.g.
when apointments end or salary adjustments end.  We similarly removed
an appointment's end date from the Employee Appointments table's merge field
construction table for this same reason.

The result is "stale" rows in the Employee Salary History table.

How this addresses that need:

By removing the appointment and salary adjustment end dates, we construct
a hash value that serves as a unique and enduring value for that salary
history row.  In the event an appointment changes, or a new salary history
row is added (e.g. yearly review increase), these values will remain the same
for pre-existing rows and update them versus adding new rows.

Again, this was a fix take in the Employee Appointments table at one point,
but it was missed we needed to do this in Employee Salary History.

Side effects of this change:
* No more "stale" rows in the Employee Salary History table

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1439
@ghukill ghukill requested a review from a team August 28, 2025 20:30
@ghukill ghukill marked this pull request as ready for review August 28, 2025 20:30
@ehanson8 ehanson8 self-assigned this Aug 29, 2025
Copy link

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Comment on lines +45 to +51
@property
def skip_task_integrity_checks(self) -> bool:
value = os.getenv("SKIP_TASK_INTEGRITY_CHECKS")
if value is None:
return False
return value.strip().lower() in {"1", "true", "t", "yes", "y"}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smart option for testing!

@ghukill ghukill merged commit 9755a14 into main Aug 29, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants