
[Fixes #13966] Harvesters: Integrity error when a title is updated in the remote WMS service #13967

Merged

mattiagiupponi merged 3 commits into master from ISSUE_13966 on Feb 18, 2026

@Gpetrak (Member) commented Feb 17, 2026

This PR addresses issue #13966.

Checklist

Reviewing is a process done by project maintainers, mostly on a volunteer basis. We try to keep the overhead as small as possible and appreciate if you help us to do so by completing the following items. Feel free to ask in a comment if you have troubles with any of them.

For all pull requests:

  • Confirm you have read the contribution guidelines
  • You have sent a Contribution Licence Agreement (CLA) as necessary (not required for small changes, e.g., fixing typos in the documentation)
  • Make sure the first PR targets the master branch; any backports will be managed later. This can be ignored if the PR fixes an issue that only happens in a specific branch but not in newer ones.

The following are required only for core and extension modules (they are welcomed, but not required, for contrib modules):

  • There is a ticket in https://github.com/GeoNode/geonode/issues describing the issue/improvement/feature (a notable exception is changes not visible to end-users)
  • The issue connected to the PR must have Labels and Milestone assigned
  • PR for bug fixes and small new features are presented as a single commit
  • PR title must be in the form "[Fixes #<issue_number>] Title of the PR"
  • New unit tests have been added covering the changes, unless there is an explanation on why the tests are not necessary/implemented

Submitting the PR does not require you to check all items, but by the time it gets merged, they should be either satisfied or inapplicable.

@Gpetrak Gpetrak added this to the 5.1.0 milestone Feb 17, 2026
@cla-bot cla-bot bot added the cla-signed CLA Bot: community license agreement signed label Feb 17, 2026
@gemini-code-assist (Contributor)

Summary of Changes

Hello @Gpetrak, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an IntegrityError that could occur during the harvesting process when a remote WMS service updates the title of an existing resource. The changes ensure that the system gracefully handles these title changes by updating the local HarvestableResource entry instead of failing, thus improving the robustness of the harvesting mechanism.

Highlights

  • Integrity Error Handling: Implemented robust error handling for IntegrityError during get_or_create operations in the harvesting batch update task, specifically addressing race conditions when a resource might be created by another worker.
  • Resource Title and Type Updates: Ensured that if a HarvestableResource already exists but its remote title or resource type has changed, the local record is updated accordingly, preventing data inconsistencies.
  • New Test Coverage: Added a new unit test case to validate that the system correctly handles remote title mismatches during batch updates without generating integrity errors or creating duplicate entries.


Changelog
  • geonode/harvesting/tasks.py
    • Imported IntegrityError from django.db to enable specific error handling.
    • Wrapped the HarvestableResource.objects.get_or_create call within an atomic transaction and a try-except IntegrityError block to catch potential race conditions.
    • Added logic to retrieve an existing HarvestableResource if an IntegrityError occurs, assuming another process created it concurrently.
    • Implemented an update mechanism for the title and remote_resource_type of an existing HarvestableResource if it was not newly created, reflecting changes from the remote service.
  • geonode/harvesting/tests/test_tasks.py
    • Introduced test_update_batch_corrects_title_mismatch to simulate a remote resource title change.
    • Verified that the HarvestableResource is updated correctly with the new title and resource type.
    • Asserted that no new records are created when a title mismatch occurs, confirming the fix for duplicate entries.
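The root cause and the fix (moving title out of the get_or_create lookup and into defaults, as the code review in this thread describes) can be sketched with Python's sqlite3 module. This is a hedged illustration, not GeoNode code: the harvestable_resource table, its columns, and the assumed unique key (harvester_id, unique_identifier) stand in for the real HarvestableResource model, and get_or_create here is a simplified, hypothetical analogue of Django's. With title in the lookup, a renamed remote layer misses the SELECT and the INSERT hits the unique key; with title in defaults, the existing row is found and updated.

```python
import sqlite3

# Hypothetical stand-in for the HarvestableResource table; the unique key
# (harvester_id, unique_identifier) is an assumption.
SCHEMA = """
CREATE TABLE harvestable_resource (
    harvester_id INTEGER NOT NULL,
    unique_identifier TEXT NOT NULL,
    title TEXT NOT NULL,
    remote_resource_type TEXT,
    UNIQUE (harvester_id, unique_identifier)
)
"""


def get_or_create(conn, defaults, **lookup):
    """Simplified analogue of Django's get_or_create: SELECT on lookup, else INSERT.

    Django also retries the SELECT after an IntegrityError; with title in the
    lookup that retry would miss too, so the error would still surface.
    """
    # Column names come from trusted keyword arguments, never from user input.
    where = " AND ".join(f"{col} = ?" for col in lookup)
    row = conn.execute(
        f"SELECT rowid FROM harvestable_resource WHERE {where}",
        tuple(lookup.values()),
    ).fetchone()
    if row is not None:
        return row[0], False
    params = {**lookup, **defaults}  # defaults are only used when creating
    cols = ", ".join(params)
    marks = ", ".join("?" for _ in params)
    cur = conn.execute(
        f"INSERT INTO harvestable_resource ({cols}) VALUES ({marks})",
        tuple(params.values()),
    )
    return cur.lastrowid, True


def sync_buggy(conn, harvester_id, identifier, title, rtype):
    # Title is part of the lookup, so a renamed remote layer is not found
    # and the INSERT collides with the existing (harvester, identifier) row.
    return get_or_create(
        conn, {"remote_resource_type": rtype},
        harvester_id=harvester_id, unique_identifier=identifier, title=title,
    )


def sync_fixed(conn, harvester_id, identifier, title, rtype):
    # Lookup uses only the unique key; title travels in defaults.
    pk, created = get_or_create(
        conn, {"title": title, "remote_resource_type": rtype},
        harvester_id=harvester_id, unique_identifier=identifier,
    )
    if not created:
        # Existing row: refresh the fields the remote service may have changed.
        conn.execute(
            "UPDATE harvestable_resource SET title = ?, remote_resource_type = ? "
            "WHERE rowid = ?",
            (title, rtype, pk),
        )
    return pk, created


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
conn.execute(SCHEMA)
sync_fixed(conn, 1, "layer-42", "Old title", "layer")

try:
    sync_buggy(conn, 1, "layer-42", "New title", "layer")
except sqlite3.IntegrityError as exc:
    print("buggy lookup fails:", exc)

pk, created = sync_fixed(conn, 1, "layer-42", "New title", "layer")
print(created, conn.execute("SELECT title FROM harvestable_resource").fetchall())
# False [('New title',)]
```

The fixed version never creates a second row for the same remote resource; it only rewrites the changed metadata.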

@gemini-code-assist (Contributor) bot left a comment

Code Review

The pull request effectively resolves the IntegrityError encountered when remote resource titles are updated by moving the title field to the defaults parameter in get_or_create. It also introduces robust handling for race conditions using transaction.atomic() and IntegrityError catching. The added test case correctly validates the fix. I have suggested a minor optimization to avoid redundant database writes for newly created resources and to ensure timestamp consistency within the batch processing.
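The suggested optimization (skip the extra write for rows that get_or_create just created, and use one timestamp for the whole batch) can be sketched in plain Python. This is a hypothetical illustration, not the reviewer's actual patch: the Resource dataclass, its last_updated field, and the changed_fields/process_batch helpers are assumptions. In Django, the collected field names would be passed to Model.save(update_fields=...).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class Resource:
    # Hypothetical stand-in for a HarvestableResource row; field names are assumptions.
    unique_identifier: str
    title: str
    remote_resource_type: str
    last_updated: Optional[datetime] = None


def changed_fields(resource: Resource, created: bool, title: str, rtype: str) -> List[str]:
    """Apply remote values and return the fields that actually need an UPDATE."""
    if created:
        # get_or_create() already wrote these values through `defaults`.
        return []
    changes = []
    if resource.title != title:
        resource.title = title
        changes.append("title")
    if resource.remote_resource_type != rtype:
        resource.remote_resource_type = rtype
        changes.append("remote_resource_type")
    return changes


def process_batch(items) -> Tuple[datetime, List[Tuple[str, List[str]]]]:
    # One timestamp per batch, so every row touched in it gets the same value.
    batch_now = datetime.now(timezone.utc)
    writes = []
    for resource, created, title, rtype in items:
        fields = changed_fields(resource, created, title, rtype)
        if fields:
            resource.last_updated = batch_now
            # With Django this would be roughly:
            # resource.save(update_fields=fields + ["last_updated"])
            writes.append((resource.unique_identifier, fields))
    return batch_now, writes


existing = Resource("layer-1", "Old title", "layer")
fresh = Resource("layer-2", "Brand new", "layer")
now, writes = process_batch([
    (existing, False, "New title", "layer"),
    (fresh, True, "Brand new", "layer"),
])
print(writes)  # [('layer-1', ['title'])]
```

Only the renamed, pre-existing resource produces a write; the freshly created one is left untouched.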

@codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 88.46154% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.25%. Comparing base (a3959be) to head (adb879c).
⚠️ Report is 28 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #13967      +/-   ##
==========================================
+ Coverage   74.24%   74.25%   +0.01%     
==========================================
  Files         947      950       +3     
  Lines       56620    56797     +177     
  Branches     7675     7699      +24     
==========================================
+ Hits        42038    42176     +138     
- Misses      12892    12925      +33     
- Partials     1690     1696       +6     

    },
)
try:
    with transaction.atomic():
@giohappy (Contributor)
Why do we need transaction.atomic here, @Gpetrak?

@Gpetrak (Member, Author)

@giohappy I added the atomic block as an extra layer of concurrency safety for our 10 parallel harvesting workers. The primary bug was caused by a metadata mismatch, but this addition prevents the discovery session from crashing when multiple workers collide on the same resource.
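The race the extra block was meant to guard against can be reproduced deterministically. This is a sketch under assumptions, not GeoNode code: two sqlite3 connections stand in for two of the parallel workers, SQLite replaces PostgreSQL, the resource table is hypothetical, and the interleaving is scripted rather than truly concurrent. It shows the recovery that get_or_create() applies when it loses the race: catch IntegrityError and re-read the winner's row.

```python
import os
import sqlite3
import tempfile

# Two connections to one SQLite file stand in for two harvesting workers.
# SQLite replaces PostgreSQL here; the table is hypothetical.
tmpdir = tempfile.TemporaryDirectory()
path = os.path.join(tmpdir.name, "harvest.db")
worker_a = sqlite3.connect(path, isolation_level=None)  # autocommit mode
worker_b = sqlite3.connect(path, isolation_level=None)
worker_a.execute(
    "CREATE TABLE resource (unique_identifier TEXT UNIQUE NOT NULL, title TEXT)"
)


def lookup(conn, ident):
    return conn.execute(
        "SELECT rowid, title FROM resource WHERE unique_identifier = ?", (ident,)
    ).fetchone()


# Both workers check first and find nothing (the race window)...
assert lookup(worker_a, "layer-7") is None
assert lookup(worker_b, "layer-7") is None

# ...then worker B inserts first.
worker_b.execute("INSERT INTO resource VALUES (?, ?)", ("layer-7", "From B"))

# Worker A's INSERT collides. Like get_or_create(), it recovers by re-reading
# the row the other worker created instead of failing the whole task.
try:
    worker_a.execute("INSERT INTO resource VALUES (?, ?)", ("layer-7", "From A"))
    created = True
except sqlite3.IntegrityError:
    created = False
row = lookup(worker_a, "layer-7")
print(created, row)  # False (1, 'From B')

worker_a.close()
worker_b.close()
tmpdir.cleanup()
```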

@mattiagiupponi (Contributor)

In this case the atomic block might be redundant, as stated in the Django docs:

[Screenshot] https://docs.djangoproject.com/en/5.2/ref/models/querysets/#get-or-create

Here a unique constraint on the two fields is present, so get_or_create() should already behave atomically.
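The Django documentation linked above notes that get_or_create() is only atomic when the database enforces uniqueness on the lookup fields; without such a constraint, concurrent calls can insert duplicate rows. A minimal sqlite3 sketch of that caveat, using a hypothetical table and scripting both workers' steps on one connection:

```python
import sqlite3


def racing_inserts(unique: bool) -> int:
    """Two scripted 'workers' pass the existence check before either inserts."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    constraint = "UNIQUE" if unique else ""
    conn.execute(f"CREATE TABLE resource (unique_identifier TEXT {constraint})")
    query = "SELECT 1 FROM resource WHERE unique_identifier = ?"
    checks = [conn.execute(query, ("layer-7",)).fetchone() for _ in range(2)]
    assert checks == [None, None]  # both workers conclude the row is missing
    for _ in checks:
        try:
            conn.execute("INSERT INTO resource VALUES (?)", ("layer-7",))
        except sqlite3.IntegrityError:
            pass  # only possible when the database enforces uniqueness
    count = conn.execute("SELECT COUNT(*) FROM resource").fetchone()[0]
    conn.close()
    return count


print(racing_inserts(unique=True))   # 1
print(racing_inserts(unique=False))  # 2
```

The constraint, not the Python-side check, is what prevents the duplicate.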

@Gpetrak (Member, Author)

@mattiagiupponi you're right, and thanks for mentioning it, but I don't think it's 100% redundant. The problem is not Django itself but the database transaction around the Celery task execution. On a UniqueViolation error, the database aborts the current transaction. While Django's get_or_create handles the logic, it doesn't clear the 'Aborted' state from the database connection. By wrapping the call in an explicit atomic block, we create a savepoint. This lets us roll back only the collision, leave the connection clean, safely call .save(), and continue the rest of the loop without crashing the whole Celery task. I think the problem here lies with Celery / PostgreSQL rather than Django.
Anyway, if we want to prioritize code simplicity, I'm fine with removing it for now. We can monitor the logs for transaction-aborted errors and re-add the savepoint if high concurrency proves it necessary.

Contributor

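I see you already removed the transaction.atomic() context in your last commit, but for clarity I want to highlight that the transaction is already handled inside get_or_create(): the create step runs in its own transaction.atomic() block, and an atomic block nested inside an open transaction is implemented as a savepoint. Wrapping the call in another atomic block would therefore be redundant.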

def get_or_create(self, defaults=None, **kwargs):
    """
    Look up an object with the given kwargs, creating one if necessary.
    Return a tuple of (object, created), where created is a boolean
    specifying whether an object was created.
    """
    # The get() needs to be targeted at the write database in order
    # to avoid potential transaction consistency problems.
    self._for_write = True
    try:
        return self.get(**kwargs), False
    except self.model.DoesNotExist:
        params = self._extract_model_params(defaults, **kwargs)
        # Try to create an object using passed params.
        try:
            with transaction.atomic(using=self.db):
                params = dict(resolve_callables(params))
                return self.create(**params), True
        except IntegrityError:
            try:
                return self.get(**kwargs), False
            except self.model.DoesNotExist:
                pass
            raise
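Both positions in this thread hinge on savepoints. In Django, an atomic() block entered while a transaction is already open is implemented as a savepoint, so the transaction.atomic(using=self.db) inside get_or_create() already confines a failed INSERT to that savepoint. The SQL-level behaviour can be sketched with sqlite3; the table is hypothetical and the savepoint name is illustrative (Django generates its own names). Note that SQLite, unlike PostgreSQL, does not abort the outer transaction on a constraint error even without a savepoint, so this shows the mechanism rather than reproducing the PostgreSQL "aborted transaction" state.

```python
import sqlite3

# Autocommit mode so BEGIN / SAVEPOINT / COMMIT are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE resource (unique_identifier TEXT UNIQUE)")
conn.execute("INSERT INTO resource VALUES ('layer-7')")

conn.execute("BEGIN")  # outer transaction, e.g. a task's unit of work
conn.execute("INSERT INTO resource VALUES ('layer-8')")  # work done before the collision

# Roughly what a nested atomic() block does around the create step.
conn.execute("SAVEPOINT create_resource")
try:
    conn.execute("INSERT INTO resource VALUES ('layer-7')")  # duplicate key
    conn.execute("RELEASE SAVEPOINT create_resource")
except sqlite3.IntegrityError:
    # Undo only the failed statement, then discard the savepoint;
    # the outer transaction remains usable.
    conn.execute("ROLLBACK TO SAVEPOINT create_resource")
    conn.execute("RELEASE SAVEPOINT create_resource")

conn.execute("INSERT INTO resource VALUES ('layer-9')")  # the loop carries on
conn.execute("COMMIT")

ids = [r[0] for r in conn.execute("SELECT unique_identifier FROM resource ORDER BY 1")]
print(ids)  # ['layer-7', 'layer-8', 'layer-9']
```

Work done before and after the collision is committed; only the duplicate INSERT is discarded.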

@mattiagiupponi mattiagiupponi merged commit b94cb9a into master Feb 18, 2026
17 checks passed
@mattiagiupponi mattiagiupponi deleted the ISSUE_13966 branch February 18, 2026 11:21
github-actions bot pushed a commit that referenced this pull request Feb 18, 2026
… the remote WMS service (#13967)

* handling the title mismatch with GeoNode and the remote WMS service

(cherry picked from commit b94cb9a)
mattiagiupponi pushed a commit that referenced this pull request Feb 18, 2026
… the remote WMS service (#13967) (#13972)

* handling the title mismatch with GeoNode and the remote WMS service

(cherry picked from commit b94cb9a)

Co-authored-by: George Petrakis <gkpetrak@gmail.com>

Labels

backport 5.0.x, bug, cla-signed (CLA Bot: community license agreement signed)


Development

Successfully merging this pull request may close these issues.

Harvesters: Discovery sessions fail because of a "duplicate key" database error

3 participants