[Fixes #13966] Harvesters: Integrity error when a title is updated in the remote WMS service by Gpetrak · Pull Request #13967 · GeoNode/geonode

Gpetrak · 2026-02-17T07:28:21Z

This PR is created according to this issue #13966

Checklist

Reviewing is a process done by project maintainers, mostly on a volunteer basis. We try to keep the overhead as small as possible and appreciate if you help us to do so by completing the following items. Feel free to ask in a comment if you have troubles with any of them.

For all pull requests:

Confirm you have read the contribution guidelines
You have sent a Contribution Licence Agreement (CLA) as necessary (not required for small changes, e.g., fixing typos in the documentation)
Make sure the first PR targets the master branch, eventual backports will be managed later. This can be ignored if the PR is fixing an issue that only happens in a specific branch, but not in newer ones.

The following are required only for core and extension modules (they are welcomed, but not required, for contrib modules):

There is a ticket in https://github.com/GeoNode/geonode/issues describing the issue/improvement/feature (a notable exemption is, changes not visible to end-users)
The issue connected to the PR must have Labels and Milestone assigned
PR for bug fixes and small new features are presented as a single commit
PR title must be in the form "[Fixes #<issue_number>] Title of the PR"
New unit tests have been added covering the changes, unless there is an explanation on why the tests are not necessary/implemented

Submitting the PR does not require you to check all items, but by the time it gets merged, they should be either satisfied or inapplicable.

gemini-code-assist · 2026-02-17T07:28:35Z

Summary of Changes

Hello @Gpetrak, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an IntegrityError that could occur during the harvesting process when a remote WMS service updates the title of an existing resource. The changes ensure that the system gracefully handles these title changes by updating the local HarvestableResource entry instead of failing, thus improving the robustness of the harvesting mechanism.

Highlights

Integrity Error Handling: Implemented robust error handling for IntegrityError during get_or_create operations in the harvesting batch update task, specifically addressing race conditions when a resource might be created by another worker.
Resource Title and Type Updates: Ensured that if a HarvestableResource already exists but its remote title or resource type has changed, the local record is updated accordingly, preventing data inconsistencies.
New Test Coverage: Added a new unit test case to validate that the system correctly handles remote title mismatches during batch updates without generating integrity errors or creating duplicate entries.

Changelog

geonode/harvesting/tasks.py
- Imported IntegrityError from django.db to enable specific error handling.
- Wrapped the HarvestableResource.objects.get_or_create call within an atomic transaction and a try-except IntegrityError block to catch potential race conditions.
- Added logic to retrieve an existing HarvestableResource if an IntegrityError occurs, assuming another process created it concurrently.
- Implemented an update mechanism for the title and remote_resource_type of an existing HarvestableResource if it was not newly created, reflecting changes from the remote service.
geonode/harvesting/tests/test_tasks.py
- Introduced test_update_batch_corrects_title_mismatch to simulate a remote resource title change.
- Verified that the HarvestableResource is updated correctly with the new title and resource type.
- Asserted that no new records are created when a title mismatch occurs, confirming the fix for duplicate entries.
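The pattern in the changelog above can be sketched without Django. This is a minimal sketch, assuming a table with a unique constraint on (harvester_id, unique_identifier). The table layout, the column names other than title and remote_resource_type, and the helpers make_db, get_or_create and sync_resource are illustrative stand-ins built on the standard-library sqlite3 module. They are not GeoNode's model or Django's ORM.

```python
import sqlite3


def make_db():
    """Create an in-memory table with a hypothetical two-column unique constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE harvestable_resource (
            id INTEGER PRIMARY KEY,
            harvester_id INTEGER NOT NULL,
            unique_identifier TEXT NOT NULL,
            title TEXT NOT NULL,
            remote_resource_type TEXT NOT NULL,
            UNIQUE (harvester_id, unique_identifier)
        )
        """
    )
    return conn


def get_or_create(conn, lookup, defaults=None):
    """Toy get_or_create: SELECT on `lookup`, INSERT lookup + defaults on a miss."""
    where = " AND ".join(f"{column} = ?" for column in lookup)
    row = conn.execute(
        f"SELECT id FROM harvestable_resource WHERE {where}",
        tuple(lookup.values()),
    ).fetchone()
    if row is not None:
        return row[0], False
    params = {**lookup, **(defaults or {})}
    columns = ", ".join(params)
    marks = ", ".join("?" for _ in params)
    cursor = conn.execute(
        f"INSERT INTO harvestable_resource ({columns}) VALUES ({marks})",
        tuple(params.values()),
    )
    return cursor.lastrowid, True


def sync_resource(conn, harvester_id, unique_identifier, title, resource_type):
    """Look up by the unique columns only; treat title and type as mutable data."""
    resource_id, created = get_or_create(
        conn,
        {"harvester_id": harvester_id, "unique_identifier": unique_identifier},
        defaults={"title": title, "remote_resource_type": resource_type},
    )
    if not created:
        # The row already existed, so refresh the fields the remote may have changed.
        conn.execute(
            "UPDATE harvestable_resource"
            " SET title = ?, remote_resource_type = ? WHERE id = ?",
            (title, resource_type, resource_id),
        )
    return resource_id, created
```

If title is part of the lookup, a renamed remote resource misses the SELECT and the INSERT then collides with the unique constraint. Keeping title out of the lookup lets the existing row be found and updated in place.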

Activity

The pull request was created to address issue #13966 (Harvesters: Discovery sessions fail because of a "duplicate key" database error), which describes an integrity error when a title is updated in a remote WMS service.
The author, Gpetrak, has provided a standard checklist for contributions.

gemini-code-assist

Code Review

The pull request effectively resolves the IntegrityError encountered when remote resource titles are updated by moving the title field to the defaults parameter in get_or_create. It also introduces robust handling for race conditions using transaction.atomic() and IntegrityError catching. The added test case correctly validates the fix. I have suggested a minor optimization to avoid redundant database writes for newly created resources and to ensure timestamp consistency within the batch processing.

codecov · 2026-02-17T08:30:46Z

Codecov Report

❌ Patch coverage is 88.46154% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.25%. Comparing base (a3959be) to head (adb879c).
⚠️ Report is 28 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #13967      +/-   ##
==========================================
+ Coverage   74.24%   74.25%   +0.01%     
==========================================
  Files         947      950       +3     
  Lines       56620    56797     +177     
  Branches     7675     7699      +24     
==========================================
+ Hits        42038    42176     +138     
- Misses      12892    12925      +33     
- Partials     1690     1696       +6

giohappy · 2026-02-17T10:16:09Z

geonode/harvesting/tasks.py

-                    },
-                )
+                try:
+                    with transaction.atomic():


Why do we need transaction.atomic here, @Gpetrak?

@giohappy I added the atomic block for concurrency safety across our 10 parallel harvesting workers, as an extra layer of protection. The primary bug was caused by a metadata mismatch, but this addition also prevents the discovery session from crashing when multiple workers collide.

In this case the atomic block might be redundant, as specified in the Django docs:
https://docs.djangoproject.com/en/5.2/ref/models/querysets/#get-or-create
Here the unique constraint on the two fields is present, so the call should already be handled as an atomic transaction.

@mattiagiupponi you are right, and thank you for mentioning it, but I don't think it's 100% redundant. The problem is not Django itself but the whole database transaction inside the task execution (Celery) workflow. On a UniqueViolation error, the database aborts the current transaction. While Django's get_or_create handles the logic, it doesn't clear the 'Aborted' state from the database connection. By wrapping it in a manual atomic block, we create a savepoint. This allows us to roll back only the collision, clean the connection, safely call .save(), and continue the rest of the loop without crashing the whole Celery task. I think the problem here is not Django but Celery / PostgreSQL.
Anyway, if we want to prioritize code simplicity, I'm fine with removing it for now. We can monitor the logs for transaction-aborted errors and re-add the savepoint if high concurrency proves it necessary.

I see you already removed the transaction.atomic() context in your last commit, but for clarity I want to highlight that the transaction is already handled inside the get_or_create method. Wrapping the method inside another atomic transaction would be useless.

    def get_or_create(self, defaults=None, **kwargs):
        """
        Look up an object with the given kwargs, creating one if necessary.
        Return a tuple of (object, created), where created is a boolean
        specifying whether an object was created.
        """
        # The get() needs to be targeted at the write database in order
        # to avoid potential transaction consistency problems.
        self._for_write = True
        try:
            return self.get(**kwargs), False
        except self.model.DoesNotExist:
            params = self._extract_model_params(defaults, **kwargs)
            # Try to create an object using passed params.
            try:
                with transaction.atomic(using=self.db):
                    params = dict(resolve_callables(params))
                    return self.create(**params), True
            except IntegrityError:
                try:
                    return self.get(**kwargs), False
                except self.model.DoesNotExist:
                    pass
                raise
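The savepoint scope discussed in this thread can be illustrated with a small standalone sketch. It assumes a hypothetical resource table and a companion resource_event table, and uses the standard-library sqlite3 module in place of Django and PostgreSQL. The function names make_db and harvest_batch are invented for the example. Unlike PostgreSQL, SQLite does not put the whole transaction into an aborted state after a failed statement, so this only shows what a savepoint rolls back, not the aborted-connection behaviour described above.

```python
import sqlite3


def make_db():
    """Open an in-memory database with manual transaction control."""
    # isolation_level=None disables sqlite3's implicit BEGIN handling.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE resource ("
        " harvester_id INTEGER NOT NULL,"
        " identifier TEXT NOT NULL,"
        " title TEXT NOT NULL,"
        " UNIQUE (harvester_id, identifier))"
    )
    conn.execute("CREATE TABLE resource_event (identifier TEXT NOT NULL)")
    return conn


def harvest_batch(conn, harvester_id, resources):
    """Insert (identifier, title) pairs in one transaction, skipping collisions."""
    skipped = []
    conn.execute("BEGIN")
    for identifier, title in resources:
        # One savepoint per resource: a collision undoes only this iteration.
        conn.execute("SAVEPOINT resource_sp")
        try:
            conn.execute(
                "INSERT INTO resource_event (identifier) VALUES (?)",
                (identifier,),
            )
            conn.execute(
                "INSERT INTO resource (harvester_id, identifier, title)"
                " VALUES (?, ?, ?)",
                (harvester_id, identifier, title),
            )
        except sqlite3.IntegrityError:
            # Undo the event row written just before the failed insert.
            conn.execute("ROLLBACK TO SAVEPOINT resource_sp")
            skipped.append(identifier)
        finally:
            conn.execute("RELEASE SAVEPOINT resource_sp")
    conn.execute("COMMIT")
    return skipped
```

Rows written before and after the collision survive the commit, while the work done inside the failed iteration is discarded. This is the same scoping that the transaction.atomic block inside get_or_create provides around its create call.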

… the remote WMS service (#13967) * handling the title mismatch with GeoNode and the remote WMS service (cherry picked from commit b94cb9a)

… the remote WMS service (#13967) (#13972) * handling the title mismatch with GeoNode and the remote WMS service (cherry picked from commit b94cb9a) Co-authored-by: George Petrakis <gkpetrak@gmail.com>

Gpetrak added 2 commits February 16, 2026 17:56

handling the title mismatch with GeoNode and the remote WMS service

c06e024

adding a test for the title conflict

ac8b88e

Gpetrak added this to the 5.1.0 milestone Feb 17, 2026

Gpetrak requested a review from mattiagiupponi February 17, 2026 07:28

Gpetrak assigned Gpetrak and mattiagiupponi Feb 17, 2026

Gpetrak added bug backport 5.0.x labels Feb 17, 2026

cla-bot bot added the cla-signed CLA Bot: community license agreement signed label Feb 17, 2026

gemini-code-assist bot reviewed Feb 17, 2026

Gpetrak marked this pull request as ready for review February 17, 2026 08:18

Gpetrak removed their assignment Feb 17, 2026

Gpetrak linked an issue Feb 17, 2026 that may be closed by this pull request

Harvesters: Discovery sessions fail because of a "duplicate key" database error #13966

Open

Gpetrak mentioned this pull request Feb 17, 2026

Harvesters: Discovery sessions fail because of a "duplicate key" database error #13966

Open

giohappy reviewed Feb 17, 2026

removing atomic block

adb879c

mattiagiupponi approved these changes Feb 18, 2026

mattiagiupponi merged commit b94cb9a into master Feb 18, 2026
17 checks passed

mattiagiupponi deleted the ISSUE_13966 branch February 18, 2026 11:21

github-actions bot mentioned this pull request Feb 18, 2026

[Backport 5.0.x] [Fixes #13966] Harvesters: Integrity error when a title is updated in the remote WMS service #13972

Merged

mattiagiupponi merged 3 commits into master from ISSUE_13966
