Skip to content

Ar migrate storage builds drafts#2175

Merged
alexrichey merged 13 commits intomainfrom
ar-migrate-storage-builds-drafts-2
Feb 6, 2026
Merged

Ar migrate storage builds drafts#2175
alexrichey merged 13 commits intomainfrom
ar-migrate-storage-builds-drafts-2

Conversation

@alexrichey
Copy link
Contributor

@alexrichey alexrichey commented Jan 14, 2026

Re-implements the Connectors for builds, drafts, published and gis, but doesn't swap them out just yet. You can read commit-by-commit for localized/grouped changes (e.g. just changes to builds functionality) as well as explanations for the smaller changes in the commit messages. (There's some 🌶️ content in the commit bodies)

Anyhow, the logic in published.py is basically distributed into files in a few different locations:

  • connectors/edm/{conn_type}.py: The pure connector interfaces for the types above
  • dcpy/lifecycle/builds/artifacts/{conn_type}.py: the business logic, including which connector to use, which files to upload, etc.
  • dcpy/lifecycle/builds/_cli.py: all the CLI commands

There will be some continuing commits where we implement this:

  • Swapping out the commands (PR here)
  • TODO: delete publishing.

@fvankrieken @damonmcc Here's where I could use some input:
Consider a command like publishing a draft. This command needs to

  1. pull the draft (after determining some details about the draft itself)
  2. push the draft to published

Ideally this code would live in a module a little bit "above" artifacts/drafts and artifacts/published, almost similar to how lifecycle/scripts lives "above" the various lifecycle stages.

So I'd love it if you could take a look at the files under dcpy/lifecycle/builds/artifacts/*, and consider these "glue" functions like promote and publish. In the CLI they fit really nicely under namespaced commands like
dcpy lc builds artifacts drafts publish -p nypl_libraries -v 24v1 -dn 2 -ip
which make sense, because we publishing is an action that we take on a draft. However, where this glue code should live... I could use some input.

@alexrichey alexrichey force-pushed the ar-migrate-storage-builds-drafts-2 branch 18 times, most recently from 3ccfd61 to 232ee0d Compare January 18, 2026 02:48
@codecov
Copy link

codecov bot commented Jan 18, 2026

Codecov Report

❌ Patch coverage is 83.88215% with 93 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.21%. Comparing base (6e5d0a3) to head (70ed00f).
⚠️ Report is 45 commits behind head on main.

Files with missing lines Patch % Lines
dcpy/lifecycle/builds/artifacts/published.py 82.07% 13 Missing and 6 partials ⚠️
dcpy/lifecycle/builds/artifacts/_cli.py 0.00% 17 Missing ⚠️
dcpy/lifecycle/builds/artifacts/drafts.py 75.40% 9 Missing and 6 partials ⚠️
dcpy/connectors/edm/gis.py 82.66% 7 Missing and 6 partials ⚠️
dcpy/connectors/edm/builds.py 85.52% 7 Missing and 4 partials ⚠️
dcpy/lifecycle/builds/artifacts/builds.py 89.55% 3 Missing and 4 partials ⚠️
dcpy/connectors/edm/drafts.py 94.31% 4 Missing and 1 partial ⚠️
dcpy/connectors/edm/published.py 95.83% 2 Missing and 1 partial ⚠️
dcpy/lifecycle/builds/_cli.py 0.00% 3 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
dcpy/configuration.py 100.00% <100.00%> (ø)
dcpy/connectors/hybrid_pathed_storage.py 80.12% <ø> (-3.93%) ⬇️
dcpy/lifecycle/builds/connector.py 100.00% <100.00%> (ø)
dcpy/lifecycle/config.py 67.74% <ø> (ø)
dcpy/lifecycle/connector_registry.py 100.00% <100.00%> (ø)
dcpy/connectors/edm/published.py 95.83% <95.83%> (ø)
dcpy/lifecycle/builds/_cli.py 0.00% <0.00%> (ø)
dcpy/connectors/edm/drafts.py 94.31% <94.31%> (ø)
dcpy/lifecycle/builds/artifacts/builds.py 89.55% <89.55%> (ø)
dcpy/connectors/edm/builds.py 85.52% <85.52%> (ø)
... and 4 more

... and 18 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@alexrichey alexrichey force-pushed the ar-migrate-storage-builds-drafts-2 branch 7 times, most recently from e6d700f to 5d2328a Compare January 20, 2026 22:35
@alexrichey alexrichey force-pushed the ar-migrate-storage-builds-drafts-2 branch from 34f6ef9 to 6180d94 Compare January 21, 2026 21:12
This was taking a long time. I think there's absolutely utility
in running a scraper against BYTES every week or so, but this
is just too much
Having to re-populate my .env after an *unfortunate* incident
(see two commits ahead) prompted this. It's a little out of date.
This was... maybe the worst two lines of code I have every written.
If you used this connector to attempt to download a file to, say, your
data-engineering/ directory, it would wipe the directory
@alexrichey alexrichey force-pushed the ar-migrate-storage-builds-drafts-2 branch from 6180d94 to 8258bb3 Compare January 21, 2026 21:18
@alexrichey alexrichey marked this pull request as ready for review January 21, 2026 21:18
Comment on lines -286 to -287
if destination_path.exists():
shutil.rmtree(destination_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omg. I'm sure I'm guilty of this too somewhere

Comment on lines +198 to +199
def data_local_sub_path(self, key: str, *, version: str, **kwargs) -> Path:
return Path("edm") / "builds" / "datasets" / key / version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a nit, but this would presumably be so someone else (i.e. the lifecycle code pulling this) knows where to put it right? Can we not do "datasets" then? builds/datasets makes me think of what's going into a build. Even if it's "data_products".

Or would it just make sense to have this match what's in DO? as in, key / builds / version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to review how this even works, honestly... The local_sub_path stuff seems really misplaced. Would honestly like to refactor it out if possible. For now, I'll just leave it be



@pytest.mark.incremental
class TestLifecycleE2E:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

Copy link
Contributor

@fvankrieken fvankrieken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do worry a little about how much boiler plate there currently is - the three connectors have a LOT of shared material, and I wonder it it'd be simpler to keep them a little closer to the storage - really just be configurable by foler (builds) and a string key (25v1/2-fixed-lots) and leave the rest of the logic - dataclasses, etc - to lifecycle code.

That said, this is a big step in getting us to any desired state. So more just food for thought, and curious how you've thought about that as this PR evolved.

Re the cli, I think I still have my opinion from before - I like to think of the ingest modules and submodules as stages, and "review" or something like that is more of the stage we're in when we're moving around these artifacts than "artifacts". That being said, especially with what I said above about how maybe more of the "draftkey" etc logic should live in lifecycle and not connectors, it doesn't hurt to have these dedicated files more coupled explicitly to the artifacts themselves.

But maybe there's room for both? Leave the code where it is, but have a cli that's more verb/stage focused?

@alexrichey alexrichey merged commit 3314ca3 into main Feb 6, 2026
25 checks passed
@alexrichey alexrichey deleted the ar-migrate-storage-builds-drafts-2 branch February 6, 2026 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants