Support partial updates in STACAPIJobDatabase.persist #794

soxofaan · 2025-08-08T23:06:23Z

For "STACAPIJobDatabase persist: support partial updates" #793.
Related to unblocking "Hv issue719 job manager threaded job start" #736.

related to PR #736

soxofaan · 2025-08-08T23:10:04Z

This is an initial attempt to add support for partial updates in STACAPIJobDatabase.persist, but some tests are still failing.

soxofaan · 2025-08-08T23:10:25Z

cc @HansVRP

HansVRP · 2025-08-11T13:46:03Z

I'll take a look this week

HansVRP · 2025-08-12T08:01:46Z

The main issue seems to be related to the item_id moving from a column to the dataframe index, causing a mismatch in size.

Also, the item_id is no longer popped out.

HansVRP · 2025-08-13T11:24:55Z

There were some inconsistencies in the tests regarding the mocks and the string-based nature of the IDs. However, I am not certain that the current version would not cause regressions. I'd like to test it on our cropsar STAC-based job manager and compare the input and output items.

WEED is also using the STAC-based job manager, so every change needs to be validated thoroughly.

soxofaan · 2025-08-18T11:56:34Z

> main issue seems to be related to the item_id moving from a column to the dataframe index;

Indeed that was intentional in my initial commit to allow partial updates, where you need a meaningful index (instead of an auto-increment one). So I changed the "item_id" column to be the index.

But if I understand you correctly, there are users or use cases that expect an "item_id" column in the data frame?
I wonder however why, as the pandas dataframe is (or at least should be) internal business. If you use STACAPIJobDatabase, you want to persist your data to a STAC API, and don't care about dataframes, or am I misunderstanding? Do you have more info on why STACAPIJobDatabase users interact with the pandas internals?
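To make the column-versus-index difference concrete, here is a minimal pandas sketch. It is not taken from the PR: the "status" column and the item ids are made up for illustration. It shows why code that expects an "item_id" column sees a different shape once the ids live in the index.

```python
import pandas as pd

# Illustrative data only; "status" and the ids are assumptions.
df = pd.DataFrame({"item_id": ["a", "b"], "status": ["created", "running"]})

# Moving "item_id" into the index gives each row a meaningful label,
# which is what index-aligned partial updates rely on.
indexed = df.set_index("item_id")

# The frame now has one column fewer, and a row converted to a dict
# no longer contains an "item_id" key.
row_dict = indexed.loc["a"].to_dict()

# reset_index() brings the column back for code that still expects it.
restored = indexed.reset_index()
```

Code that pops "item_id" from a row dict, or that counts columns, would behave differently on `indexed` than on `df`.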

HansVRP · 2025-08-18T11:59:14Z

Maybe good to discuss tomorrow. It's rather that two workflows have already been built on top of the current STAC-based job manager, and I want to avoid their STAC collections becoming inconsistent.

soxofaan · 2025-08-18T12:01:42Z

openeo/extra/job_management/stac_job_db.py

+
+        # Handle datetime
+        dt = series_dict.get("datetime")
+        if not dt:


I understand the need to bring back the series_dict.pop("item_id"), but are the other changes relevant here?

I'd like to prevent this PR review from also spiraling out of scope.

> I understand the need to bring back the series_dict.pop("item_id")

On further consideration, I'd like to revisit this:
"item_id" as a column name no longer has any special meaning, so it should not get special treatment (meaning it should not be popped).

soxofaan · 2025-08-18T12:04:08Z

openeo/extra/job_management/stac_job_db.py

-        if self.has_geometry:
-            item_dict["geometry"] = series[self.geometry_column]
-        else:
-            item_dict["geometry"] = None


Is the removal of these lines relevant here?

As noted, I'd like to keep this PR focused, so that it doesn't get stranded in eternal review.

soxofaan · 2025-08-19T07:24:37Z

openeo/extra/job_management/stac_job_db.py

+        else:
+            # Merge data on item_id (in the index)
+            df_to_persist = existing_df
+            df_to_persist.update(df, overwrite=True)


While working on test coverage, it turned out that this pandas update might cause data loss:
it only updates the intersection of both dataframes, so if there is a mismatch between the items in existing_df and df, there will be fewer updates than expected.
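The behavior described above can be reproduced with a small standalone sketch. The data is invented for illustration, and `combine_first` is shown only as one possible alternative that keeps the union of rows; it is not necessarily what the PR ended up using.

```python
import pandas as pd

# Illustrative data only: "item-3" exists in the updates but not in existing_df.
existing_df = pd.DataFrame(
    {"status": ["created", "running"]}, index=["item-1", "item-2"]
)
df = pd.DataFrame(
    {"status": ["finished", "queued"]}, index=["item-2", "item-3"]
)

# DataFrame.update works in place and aligns on the index of the
# frame being updated: rows that only exist in `df` are ignored.
df_to_persist = existing_df.copy()
df_to_persist.update(df, overwrite=True)

# One possible alternative: combine_first keeps the union of both indices,
# preferring values from `df` and filling gaps from `existing_df`.
combined = df.combine_first(existing_df)
```

Here `df_to_persist` still has only "item-1" and "item-2", so the "item-3" row is silently lost, while `combined` contains all three items.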

soxofaan · 2025-08-20T10:20:40Z

As mentioned in #793 (comment): let's move the task of merging existing data with updates to the job manager, instead of requiring each job database implementation to do this correctly.
(This requires the introduction of a new API, JobDatabaseInterface.get_by_indices, but that should not be too hard to implement.)

This closes this PR (without merge).
(Note that some test-related tweaks were ported to master anyway.)
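A rough sketch of that division of labor, under stated assumptions: only the name `get_by_indices` comes from the comment above. Its signature, the `InMemoryJobDatabase` class, and the `persist_partial` helper are hypothetical stand-ins, not the openeo implementation.

```python
import pandas as pd


class InMemoryJobDatabase:
    """Toy job database (hypothetical); it only stores and returns complete rows."""

    def __init__(self):
        self._rows = {}  # item id -> dict of column values

    def get_by_indices(self, indices):
        # Assumed signature: return the stored rows whose id is in `indices`.
        found = {i: self._rows[i] for i in indices if i in self._rows}
        return pd.DataFrame.from_dict(found, orient="index")

    def persist(self, df):
        # Plain upsert of complete rows; no merging logic in the database.
        for item_id, row in df.iterrows():
            self._rows[item_id] = row.to_dict()


def persist_partial(job_db, updates):
    """Manager-side merge: fetch the affected rows, merge, persist whole rows."""
    existing = job_db.get_by_indices(updates.index)
    # Values from `updates` win; missing columns are filled from `existing`.
    merged = updates.combine_first(existing)
    job_db.persist(merged)


db = InMemoryJobDatabase()
db.persist(
    pd.DataFrame(
        {"status": ["running", "running"], "cost": [1.5, 2.5]},
        index=["item-1", "item-2"],
    )
)
# Partial update: only "status", for one existing and one new item.
persist_partial(
    db, pd.DataFrame({"status": ["finished", "created"]}, index=["item-1", "item-3"])
)
```

With this split, each database backend only needs a lookup by index and a whole-row upsert. One caveat of this particular merge: `combine_first` treats missing values in `updates` as "no change", so it cannot be used to reset a field to null.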

Issue 793: Support partial updates in STACAPIJobDatabase.persist

be42cd7

related to PR #736

soxofaan mentioned this pull request Aug 8, 2025

Hv issue719 job manager threaded job start #736

Closed

HansVRP added 6 commits August 12, 2025 11:43

instill string based indiches in dummys

1cec65e

legacy addition of item_id column

55fa6bf

remove item_id from STAC items

d17430d

ensure consistent indexing upon appending by additional normalization

3242824

fix test

f48c148

changed unit test to account for expected mock output

b87c86f

consider a few additional tests on error handling and thread safety

2d975dd

soxofaan commented Aug 18, 2025

soxofaan mentioned this pull request Aug 18, 2025

STACAPIJobDatabase.item_from always uses "now" as datetime? #797

Closed

soxofaan added a commit that referenced this pull request Aug 18, 2025

Issue #793/#794 preserve "item_id" column for now

3130384

Might still be in use in certain use cases further eliminate fixture anti-patterns in tests, allowing more parameterization

soxofaan added a commit that referenced this pull request Aug 18, 2025

fixup! Issue #793/#794 preserve "item_id" column for now

b8ac36e

soxofaan commented Aug 19, 2025

soxofaan mentioned this pull request Aug 19, 2025

Support partial updates in STACAPIJobDatabase.persist (take 2) #798

Closed

soxofaan added a commit that referenced this pull request Aug 19, 2025

DummyStacApi._get_search tweak

09e8fe4

cherry-picked from #794/#798

soxofaan added a commit that referenced this pull request Aug 19, 2025

Issue #793/#794 preserve "item_id" column for now

f01a2a1

Might still be in use in certain use cases further eliminate fixture anti-patterns in tests, allowing more parameterization

soxofaan mentioned this pull request Aug 19, 2025

STACAPIJobDatabase persist: support partial updates #793

Closed

soxofaan added a commit that referenced this pull request Aug 20, 2025

Finetune STACAPIJobDatabase tests

36b6155

Eliminate some fixture anti-patterns (too much abstraction and decoupling) Based on working on #794 and #798

soxofaan added a commit that referenced this pull request Aug 20, 2025

Finetune STACAPIJobDatabase tests

481ab0c

Eliminate some fixture anti-patterns (too much abstraction and decoupling) Based on working on #794 and #798

soxofaan added a commit that referenced this pull request Aug 20, 2025

Make TestSTACAPIJobDatabase.test_get_by_status_result less fake

94a97c8

further elimination of unnecessary fixtures Based on working on #794 and #798

soxofaan closed this Aug 20, 2025

soxofaan deleted the issue793-stac-api-job-db-persist-partial-update branch August 20, 2025 10:21

soxofaan added a commit that referenced this pull request Sep 9, 2025

DummyStacApi._get_search tweak

dc7c626

cherry-picked from #794/#798

soxofaan added a commit that referenced this pull request Sep 9, 2025

Finetune STACAPIJobDatabase tests

84d8ecc

Eliminate some fixture anti-patterns (too much abstraction and decoupling) Based on working on #794 and #798

soxofaan added a commit that referenced this pull request Sep 9, 2025

Make TestSTACAPIJobDatabase.test_get_by_status_result less fake

b3f665a

further elimination of unnecessary fixtures Based on working on #794 and #798

soxofaan added a commit that referenced this pull request Sep 9, 2025

DummyStacApi._get_search tweak

efbbbd1

cherry-picked from #794/#798

soxofaan added a commit that referenced this pull request Sep 9, 2025

Finetune STACAPIJobDatabase tests

e14cc9d

Eliminate some fixture anti-patterns (too much abstraction and decoupling) Based on working on #794 and #798

soxofaan added a commit that referenced this pull request Sep 9, 2025

Make TestSTACAPIJobDatabase.test_get_by_status_result less fake

36b3a22

further elimination of unnecessary fixtures Based on working on #794 and #798

3 participants
