ETL Output Tech Debt #217

mawelborn · 2025-06-09T21:48:45Z

This PR repays tech debt in the etloutput module by removing code paths for ETL Output files produced by v1 workflows on IPA 6.X versions of the platform. Dropping support for legacy versions simplifies the functional code and eliminates the need to explicitly enable loading of tables OCR.

v3 workflows on IPA 7.X organize OCR text, tokens, and tables into separate files linked in etl_output.json. This makes all ETL Output information discoverable to the point where it can be loaded automatically, and all three are now loaded by default when present.

The boolean flags etloutput.load(..., text=True, tokens=True, tables=True) are now entirely optional. They've been left in because they're not very complex and the ability to disable loading of information you don't need might be useful to improve performance.
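As a rough illustration of the "load by default when present, opt out with a flag" behavior described above, here is a minimal sketch. It is not the etloutput implementation: `load_page`, the `reader` callable, and the `"text"`/`"tokens"`/`"tables"` keys are hypothetical stand-ins, and the real etl_output.json schema may differ.

```python
from typing import Any, Callable, Dict, Optional


def load_page(
    page: Dict[str, Optional[str]],
    reader: Callable[[str], Any],
    *,
    text: bool = True,
    tokens: bool = True,
    tables: bool = True,
) -> Dict[str, Any]:
    """Load each kind of OCR data for one page when it is linked and enabled.

    `page` maps a kind ("text", "tokens", "tables") to a file path, or omits
    the kind when no such file was produced. These keys are assumptions made
    for this sketch only.
    """
    enabled = {"text": text, "tokens": tokens, "tables": tables}
    loaded: Dict[str, Any] = {}

    for kind, wanted in enabled.items():
        path = page.get(kind)  # None when the page links no file of this kind
        if wanted and path is not None:
            loaded[kind] = reader(path)

    return loaded


# Usage with an in-memory "storage" dict standing in for real file retrieval.
storage = {"p0/text.txt": "Invoice 42", "p0/tables.json": "[]"}
page = {"text": "p0/text.txt", "tables": "p0/tables.json"}  # no tokens file

everything = load_page(page, storage.__getitem__)
text_only = load_page(page, storage.__getitem__, tables=False)
```

With all flags left at their defaults, whatever the page links is loaded; passing `tables=False` skips the tables file without the caller needing to know whether one exists.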

Changes:

Remove code paths for v1 workflows.
Rewrite code paths to use the simpler layout provided by IPA 7.X.
Enable loading tables information by default when present.
Update and expand unit tests to reach near 100% code coverage.

mawelborn added 4 commits June 6, 2025 10:02

Remove v1 code paths and load tables automatically when available

052c37f

Update unit tests with 7.2 ETL Output JSON files

3e9403b

Update docstrings

f5f449d

Add more unit tests

bb863bf

mawelborn requested review from Scott771, andrew8bit, annaliu-indico and nickesparza June 9, 2025 21:48

mawelborn self-assigned this Jun 9, 2025

mawelborn added 2 commits June 11, 2025 15:05

Improve variable names

d2af0a1

Move return out of try..except

bc3182a

mawelborn merged commit 1feb492 into dev-7-2 Jun 16, 2025
2 checks passed

mawelborn deleted the mawelborn/etloutput-7-2 branch June 16, 2025 21:06
