@ghukill ghukill commented May 30, 2024

Purpose and background context

This PR introduces a new pipeline, UpdateLibHRData. Unlike FullUpdate, this pipeline is expected to run very infrequently, but it is still needed. It loads static data provided by HR into a table called LibHR Employee Appointments. It will be used for an initial load of data, and potentially for bulk updates requested by HR, but the table will primarily be managed directly in Quickbase.

How can a reviewer manually see the effects of these changes?

We are still in discussion with HR about credentials and access, so it may be some time before everyone has a readily accessible way to test run these pipelines.

That said, here is an example CLI command that invokes it, passing a parameter for a local CSV file:

pipenv run hrqb pipeline -p UpdateLibHRData \
-pp csv_filepath=output/headcount/QB-GH-May24.csv \
run

And the output:

├── COMPLETE: UpdateLibHRData(csv_filepath=output/headcount/QB-GH-May24.csv)
   ├── COMPLETE: LoadLibHREmployeeAppointments(pipeline=UpdateLibHRData, table_name=LibHR Employee Appointments, stage=Load, csv_filepath=output/headcount/QB-GH-May24.csv)
      ├── COMPLETE: TransformLibHREmployeeAppointments(pipeline=UpdateLibHRData, table_name=, stage=Transform, csv_filepath=output/headcount/QB-GH-May24.csv)
         ├── COMPLETE: ExtractLibHREmployeeAppointments(table_name=, pipeline=UpdateLibHRData, stage=Extract, csv_filepath=output/headcount/QB-GH-May24.csv)
         ├── COMPLETE: ExtractQBDepartments(table_name=, pipeline=UpdateLibHRData, stage=Extract)

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

@ghukill ghukill force-pushed the HRQB-15-libhr-table branch from 1ce8711 to fb08645 Compare May 30, 2024 16:21
ghukill added 3 commits May 30, 2024 12:25
Why these changes are being introduced:

HRQBClient needs to support a unique pipeline that will load static
data from HR into a Quickbase table.  This pipeline will be run
quite rarely (an initial load and potentially bulk updates from HR),
so it is suitable to run manually from a local CLI.  After that,
this table will be managed directly in QB.  The data in this table
is utilized by other tasks and pipelines.

How this addresses that need:
* Creates new UpdateLibHRData pipeline and associated tasks

Side effects of this change:
* Ability to load and update LibHREmployeeAppointments table via
HRQBClient

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/HRQB-15
Why these changes are being introduced:

Ideally, whenever data is pulled for use in a transform task,
this will occur via an extract task for separation of concerns.

How this addresses that need:
* Moves the QBClient call for Departments data into an extract task

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/HRQB-15
@ghukill ghukill force-pushed the HRQB-15-libhr-table branch from fb08645 to 9a69f4f Compare May 30, 2024 16:25
@ghukill ghukill changed the base branch from HRQB-21-employees-table to main May 30, 2024 16:25
@ghukill ghukill marked this pull request as ready for review May 30, 2024 16:26
Comment on lines +55 to +56
libhr_df = self.named_inputs["ExtractLibHREmployeeAppointments"].read()
departments_df = self.named_inputs["ExtractQBDepartments"].read()
Contributor Author

This is the first time this has been used. This task requires two parent tasks, so we cannot use self.single_input_dataframe. Instead, it uses self.named_inputs to get the target data from the parent tasks by their task names.
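
For readers unfamiliar with the pattern, here is a rough sketch of why a single-input accessor stops working once a task has two parents, and how a lookup by parent task name resolves that. Everything in it is an assumption made for illustration: StubTarget, StubTransformTask, single_input, get_rows, and the field names are hypothetical stand-ins, not the project's classes or the real table columns, and the lookup step is invented only to give the two inputs something to do.

```python
# Illustrative stand-ins only: these classes are hypothetical and are NOT
# the project's actual task or target classes.


class StubTarget:
    """Hypothetical target that hands back stored rows when read."""

    def __init__(self, rows):
        self._rows = rows

    def read(self):
        return self._rows


class StubTransformTask:
    """Hypothetical task whose parent targets are keyed by parent task name."""

    def __init__(self, named_inputs):
        self.named_inputs = named_inputs

    @property
    def single_input(self):
        # A single-input accessor is only unambiguous with exactly one parent.
        if len(self.named_inputs) != 1:
            raise ValueError("expected exactly one parent task")
        return next(iter(self.named_inputs.values())).read()

    def get_rows(self):
        # With two parents, select each parent's data explicitly by task name.
        libhr_rows = self.named_inputs["ExtractLibHREmployeeAppointments"].read()
        departments = self.named_inputs["ExtractQBDepartments"].read()
        # "department", "name", and "record_id" are made-up field names.
        record_ids = {d["name"]: d["record_id"] for d in departments}
        return [
            {**row, "department_record_id": record_ids.get(row["department"])}
            for row in libhr_rows
        ]


task = StubTransformTask(
    {
        "ExtractLibHREmployeeAppointments": StubTarget(
            [{"employee": "A", "department": "Archives"}]
        ),
        "ExtractQBDepartments": StubTarget([{"name": "Archives", "record_id": 7}]),
    }
)
rows = task.get_rows()
```

In this sketch, accessing task.single_input raises a ValueError because two parents are registered, which is the ambiguity that looking inputs up by name avoids.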

@ehanson8 ehanson8 left a comment

One optional suggestion

"""Pipeline to load Library HR employee appointment data from static CSV file.
This pipeline loads the table 'LibHR Employee Appointments', which contains
information known only by Library HR, that we cannot get from the data warehouse,

You might capitalize data warehouse for consistency

@ghukill ghukill merged commit e0a4dbe into main May 31, 2024