@mawelborn

This PR repays tech debt in the etloutput module by removing the code paths for ETL Output files produced by v1 workflows on IPA 6.X versions of the platform. Dropping support for these legacy versions simplifies the functional code and eliminates the need to explicitly enable loading of tables OCR.

v3 workflows on IPA 7.X organize OCR text, tokens, and tables into separate files linked in etl_output.json. This makes all ETL Output information discoverable to the point where it can be loaded automatically, and all three are now loaded by default when present.

The boolean flags etloutput.load(..., text=True, tokens=True, tables=True) are now entirely optional. They've been left in because they add little complexity, and the ability to disable loading of information you don't need might be useful to improve performance.
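As a rough illustration of the "load by default when present, opt out with a flag" behavior described above, here is a minimal standalone sketch. It is not the etloutput implementation: the load_page helper, the write_sample_page helper, the "text"/"tokens"/"tables" keys, and the file names are all hypothetical stand-ins for whatever etl_output.json actually links to.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def load_page(
    page: dict[str, str],
    root: Path,
    *,
    text: bool = True,
    tokens: bool = True,
    tables: bool = True,
) -> dict[str, Any]:
    """Load each linked file that is both present and not disabled by a flag.

    The key names are assumptions for illustration, not the real schema.
    """
    loaded: dict[str, Any] = {}
    if text and "text" in page:
        loaded["text"] = (root / page["text"]).read_text()
    if tokens and "tokens" in page:
        loaded["tokens"] = json.loads((root / page["tokens"]).read_text())
    if tables and "tables" in page:
        loaded["tables"] = json.loads((root / page["tables"]).read_text())
    return loaded


def write_sample_page(root: Path) -> dict[str, str]:
    """Write a hypothetical page with text and tokens but no tables file."""
    (root / "page_0.txt").write_text("Invoice 42")
    (root / "page_0_tokens.json").write_text(
        json.dumps([{"text": "Invoice"}, {"text": "42"}])
    )
    return {"text": "page_0.txt", "tokens": "page_0_tokens.json"}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    page = write_sample_page(root)
    everything = load_page(page, root)  # all flags default to True
    text_only = load_page(page, root, tokens=False, tables=False)

print(sorted(everything))  # ['text', 'tokens'] (no tables file was linked)
print(sorted(text_only))   # ['text']
```

Because every flag defaults to True, callers who want everything pass nothing, and a missing tables link is skipped rather than treated as an error.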

Changes:

  • Remove code paths for v1 workflows.
  • Rewrite code paths to use the simpler layout provided by IPA 7.X.
  • Enable loading tables information by default when present.
  • Update and expand unit tests to reach near 100% code coverage.

@mawelborn mawelborn merged commit 1feb492 into dev-7-2 Jun 16, 2025
2 checks passed
@mawelborn mawelborn deleted the mawelborn/etloutput-7-2 branch June 16, 2025 21:06