Skip to content

fix(transport): prevent duplicate AuthPolicy creation on retry after timeout#826

Merged
ankitpatnaik-atlan merged 8 commits intomainfrom
GOV-667
Mar 9, 2026
Merged

fix(transport): prevent duplicate AuthPolicy creation on retry after timeout#826
ankitpatnaik-atlan merged 8 commits intomainfrom
GOV-667

Conversation

@ankitpatnaik-atlan
Copy link
Copy Markdown
Contributor

@ankitpatnaik-atlan ankitpatnaik-atlan commented Mar 6, 2026

✨ Description

Fixes a race condition where a POST /api/meta/entity/bulk request to create an AuthPolicy succeeds on the server but times out on the client side. On retry, the SDK was blindly re-sending the request, resulting in duplicate policies being created silently.

This PR adds duplicate detection logic inside the sync (and async) transport retry path: before retrying a bulk entity creation, the transport searches for an existing AuthPolicy with the same name and persona GUID. If found, it returns a synthetic success response with the existing policy — preventing the duplicate.

Jira link: GOV-667


🧩 Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

🔍 Root Cause & Fixes

Three bugs were found and fixed in pyatlan/client/transport.py:

# Method Bug Fix
1 _check_for_duplicate_policy Read attributes.persona to extract persona GUID — this key doesn't exist in the serialized request body Changed to read attributes.accessControl.guid, which is the actual key used by the SDK
2 _find_existing_policy Called Term.with_name("name.keyword", policy_name)with_name only accepts one argument (value), not (field, value) Changed to Term(field="name.keyword", value=policy_name)
3 _find_existing_policy Called Term.with_type_name("AuthPolicy") — wrong helper for a raw type filter Changed to Term(field="__typeName.keyword", value="AuthPolicy")

✅ How has this been tested?

A mock-based integration test was run against a live tenant (datamesh2.atlan.com) to verify the full retry + duplicate prevention flow end-to-end.

Test flow:

  1. Find persona New on the tenant (real API call)
  2. Patch the inner transport to intercept bulk POST requests
  3. On attempt Create initial prototype SDK #1: simulate the policy being created server-side, then raise httpx.ReadTimeout
  4. On retry: intercept the _check_for_duplicate_policy indexsearch and return the fake existing policy
  5. Assert that the bulk POST was only called once and the correct policy GUID was returned

Test output:

Step 1: Finding persona 'New'...
  Found: New (guid: 7610d810-f9e0-4d0c-bdfa-15feb957353e)

Step 2: Patching transport mock...

Step 3: Saving policy 'Test_Retry_1772798846'...
  → Bulk POST attempt #1
  → Policy 'created' (guid: fake-policy-guid-abc123), simulating timeout...
  → Duplicate check indexsearch intercepted, returning fake existing policy...

✅ Got response after 1 bulk attempt(s)
✅ Policy guid in response: fake-policy-guid-abc123
✅ DUPLICATE PREVENTION WORKED: bulk POST only called once, retry short-circuited via indexsearch

What this confirms:

  • Retry triggers correctly after ReadTimeout
  • _check_for_duplicate_policy now correctly extracts the persona GUID from accessControl
  • _find_existing_policy now builds a valid indexsearch query using Term(field=..., value=...)
  • On finding the existing policy, the retry is short-circuited and the original policy GUID is returned — no duplicate created

📋 Checklist

  • My code follows the project's style guidelines
  • I've performed a self-review of my code
  • I've added comments in tricky or complex areas
  • All new and existing tests pass locally

Screenshots

  • Unit tests
Screenshot 2026-03-09 at 14 18 43
  • Integration tests
Screenshot 2026-03-09 at 16 42 33

Note

Medium Risk
Touches the core HTTP retry transport and adds extra INDEX_SEARCH calls during retry, which can alter behavior and failure modes for policy-creation requests under transient network errors.

Overview
Prevents duplicate AuthPolicy creation when a POST /api/meta/entity/bulk request times out client-side but likely succeeded server-side, by adding a retry-time duplicate check that searches for an existing policy (same name + persona) and returns a synthetic success response instead of re-sending the bulk create.

Wires the AtlanClient/AsyncAtlanClient instance into PyatlanSyncTransport and PyatlanAsyncTransport (including max_retries context managers) and introduces shared helpers in pyatlan/client/common/transport.py to parse bulk requests, run INDEX_SEARCH, and build the mock response. Adds a new ErrorCode.UNABLE_TO_SEARCH_EXISTING_POLICY plus unit and live-tenant integration tests covering both the “retry proceeds” and “retry short-circuits” paths for sync and async transports.

Written by Cursor Bugbot for commit a8aeffc. This will update automatically on new commits. Configure here.

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

Comment thread pyatlan/client/transport.py Outdated
Comment thread pyatlan/client/transport.py Outdated
@ankitpatnaik-atlan ankitpatnaik-atlan changed the title GOV-667: Check if policy was created during retry fix(transport): prevent duplicate AuthPolicy creation on retry after timeout Mar 6, 2026
Comment thread pyatlan/client/transport.py
Comment thread pyatlan/client/transport.py Outdated
Copy link
Copy Markdown
Member

@Aryamanz29 Aryamanz29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - a few requested changes below 🙏

Could you please also add unit and integration tests for this change? Since this is a direct change to the SDK transport layer, we need those tests in place to make sure things are working as expected and to prevent any future regressions.

Comment thread pyatlan/client/transport.py Outdated
Comment thread pyatlan/client/transport.py Outdated
Comment thread pyatlan/client/transport.py Outdated
Comment thread pyatlan/client/transport.py Outdated
Comment thread pyatlan/client/transport.py Outdated
Comment thread pyatlan/client/transport.py Outdated
Comment thread pyatlan/client/transport.py Outdated
Comment thread pyatlan/client/transport.py Outdated
@Aryamanz29 Aryamanz29 added feature New feature or request change Pyatlan change pull request labels Mar 9, 2026
Comment thread pyatlan/client/transport.py Outdated
Comment thread pyatlan/client/common/transport.py
Comment thread pyatlan/client/common/transport.py Outdated
"RETRY PREVENTED: Policy already exists (likely from previous "
"request that timed out but succeeded). Returning existing policy."
)
return duplicate_response
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate check runs before backoff sleep, likely missing index

High Severity

The duplicate policy check runs before retry.increment() and retry.sleep(response), meaning the IndexSearch fires immediately after the timeout with zero delay. Since the whole point of this feature is to detect a policy that was just created by a timed-out request, the Elasticsearch index almost certainly hasn't reflected the new entity yet. The check will return None, the retry proceeds, and a duplicate policy is created anyway — defeating the entire purpose of this PR. The check needs to happen after the sleep/backoff to give the search index time to catch up.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link
Copy Markdown
Member

@Aryamanz29 Aryamanz29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good - few changes below:

  • could you please also add integration tests for async implementation as well? 🙏 (under tests/integration/aio/test_transport.py)

Comment thread tests/integration/test_transport.py Outdated
Comment on lines +26 to +28
PERSONA_NAME = "New"
CONNECTION_QN = "default/redshift/1769838984"
MODULE_NAME = TestId.make_unique("TransportRetry")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an integration test; so instead of hardcoding this for specific tenant constant, I would create a new fixture that would create a new persona and connection for me for this test; example:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration test would fail without this change @ankitpatnaik-atlan ^^

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread tests/integration/transport_test.py Outdated
Comment thread tests/integration/transport_test.py Outdated
Comment thread pyatlan/client/transport.py Outdated
Comment thread tests/integration/test_transport.py
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

"RETRY PREVENTED: Policy already exists (likely from previous "
"request that timed out but succeeded). Returning existing policy."
)
return duplicate_response
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Search failure crashes retry instead of proceeding normally

High Severity

check_for_duplicate_policy can raise UNABLE_TO_SEARCH_EXISTING_POLICY if the index search fails, and this exception is uncaught inside _retry_operation. A transient search-service outage would cause the entire operation to abort with a confusing search error, instead of letting the retry proceed normally. The duplicate check is meant to be a best-effort optimization, but its failure blocks the retry mechanism entirely — making things worse than if the feature didn't exist.

Additional Locations (2)

Fix in Cursor Fix in Web

"RETRY PREVENTED: Policy already exists (likely from previous "
"request that timed out but succeeded). Returning existing policy."
)
return duplicate_response
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate check triggers on non-timeout retryable errors too

Medium Severity

The duplicate-prevention check runs on every retry attempt regardless of the original error type — not just after timeouts. If a bulk POST returns a retryable status code (e.g., 503) without creating the entity, and a policy with the same name and persona GUID already exists from a prior unrelated operation, the check incorrectly short-circuits the retry with a mock response containing the pre-existing policy instead of retrying the actual request.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link
Copy Markdown
Member

@Aryamanz29 Aryamanz29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to update integration tests:
#826 (comment)

Comment thread tests/integration/aio/test_transport.py Outdated
Comment thread tests/integration/test_transport.py Outdated
Comment on lines +26 to +28
PERSONA_NAME = "New"
CONNECTION_QN = "default/redshift/1769838984"
MODULE_NAME = TestId.make_unique("TransportRetry")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration test would fail without this change @ankitpatnaik-atlan ^^

Comment thread tests/integration/test_transport.py Outdated
Comment on lines +26 to +28
PERSONA_NAME = "New"
CONNECTION_QN = "default/redshift/1769838984"
MODULE_NAME = TestId.make_unique("TransportRetry")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

@Aryamanz29 Aryamanz29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@ankitpatnaik-atlan ankitpatnaik-atlan merged commit 2f35fa5 into main Mar 9, 2026
28 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change Pyatlan change pull request feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants