Skip to content

Conversation

ghukill
Copy link
Contributor

@ghukill ghukill commented May 6, 2025

Purpose and background context

This PR removes feature flags that were used during development work moving to a parquet dataset for ETL.

The removal of feature flag code branching was fairly minimal, but some followup work was required for the testing suite.

Many, if not most, of the unit tests are exercising transformation logic which is the core of Transmogrifier. There was a pattern throughout where a Transformer was instantiated and then a single transformed record was asserted, psuedo-code roughly:

output_records = Alma(source_records)
assert next(output_records) == timdex.TIMDEXRecord(...)

Now, the next() of a Transformer instance returns a TDA DatasetRecord not a TIMDEXRecord. The tests needed to be updated to follow this pattern:

output_records = Alma(source_records)
timdex_record = output_records.transform(next(output_records.source_records))
assert timdex_record == timdex.TIMDEXRecord(...)

The changes to the testing suite were primarily of this nature. Ignoring the changes in the /tests folder will help review the actual feature flag removal changes.

How can a reviewer manually see the effects of these changes?

The normal make test + make lint.

And, setting AWS Dev TimdexManagers credentials and running the following:

pipenv run transform \
-i s3://timdex-extract-dev-222053980223/libguides/libguides-2025-02-26-full-extracted-records-to-index.xml \
-o s3://timdex-extract-dev-222053980223/dataset/ \
-s libguides

Includes new or updated dependencies?

NO

Changes expectations for external applications?

YES: Transmogrifier no longer supports the "v1" JSON file output.

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

ghukill added 2 commits May 6, 2025 10:11
Why these changes are being introduced:

As of pipenv 2025.0.1 the use of `pipenv check` would throw
an error, indicating that the library `safety` was not installed.
It worked to run `pipenv check --auto-install` which would
temporarily install `safety`, but this was not ideal for multiple
reasons.

First, we anticipate potentially moving away from `pipenv`.

Second, it appears that `safety` is moving to a pay / subscription
model.

Third, it remains a little obfuscated what `pipenv check` is actually
doing.

As this new situation affects all builds in Github Actions CI,
we need a way to scan for vulnerabilities that ideally is not
a massive overhaul of our vulnerability scanning approach.

How this addresses that need:

`pip-audit` is a nice standalone, open-source library that
performs very similar work to `safety`.

This commit replaces `pipenv check` (which was `safety` under
the hood) with `pip-audit`.

Side effects of this change:
* Builds will be successful in Github Actions

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1240
Why these changes are being introduced:

At the onset of the parquet dataset work, feature flags were used
to support both pre-parquet and parquet code pathways.  With the
parquet dataset working and fully utilized now, we can remove
the feature flags.

This project required some additional tweaks when removing the flags,
namely updating tests to be fully parquet compliant.

How this addresses that need:
* Remove all feature flag code branching
* Update Transformer class to support missing 'source_file', if
testing environment detected, to remain backwards compatible with
many tests

Side effects of this change:
* Original "v1" support is ended.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-489
@ghukill ghukill force-pushed the TIMX-489-remove-parquet-feature-flags branch from 91766d6 to aa1e78b Compare May 6, 2025 20:30
@ghukill ghukill requested a review from a team May 6, 2025 20:32
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghukill Great job! I just have one small change request.

@ghukill ghukill requested a review from jonavellecuerdo May 7, 2025 15:06
@ghukill ghukill merged commit c94d01f into main May 7, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants