fix(transport): prevent duplicate AuthPolicy creation on retry after timeout #826
ankitpatnaik-atlan merged 8 commits into main from
Aryamanz29 left a comment
Looks good - a few requested changes below 🙏
Could you please also add unit and integration tests for this change? Since this is a direct change to the SDK transport layer, we need those tests in place to make sure things are working as expected and to prevent any future regressions.
```python
    "RETRY PREVENTED: Policy already exists (likely from previous "
    "request that timed out but succeeded). Returning existing policy."
)
return duplicate_response
```
Duplicate check runs before backoff sleep, likely missing index
High Severity
The duplicate policy check runs before retry.increment() and retry.sleep(response), meaning the IndexSearch fires immediately after the timeout with zero delay. Since the whole point of this feature is to detect a policy that was just created by a timed-out request, the Elasticsearch index almost certainly hasn't reflected the new entity yet. The check will return None, the retry proceeds, and a duplicate policy is created anyway — defeating the entire purpose of this PR. The check needs to happen after the sleep/backoff to give the search index time to catch up.
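A minimal sketch of the ordering this comment asks for, assuming a simplified synchronous retry loop: back off first, then look for a policy that the timed-out attempt may have created. `send_with_retry` is a hypothetical helper; its `send`, `find_existing_policy`, and `sleep` callables stand in for the transport's request, its `INDEX_SEARCH` lookup, and the `retry.sleep()` backoff, and the stdlib `TimeoutError` stands in for `httpx.ReadTimeout`.

```python
import time
from typing import Callable, Optional


def send_with_retry(
    send: Callable[[], dict],
    find_existing_policy: Callable[[], Optional[dict]],
    max_retries: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry `send` after timeouts, checking for an existing policy only after backing off."""
    attempt = 0
    while True:
        try:
            return send()
        except TimeoutError:
            attempt += 1
            if attempt > max_retries:
                raise
            # Back off BEFORE searching, so the index has time to reflect an
            # entity that the timed-out request may have created server-side.
            # Exponential backoff is an assumption; the real Retry policy may differ.
            sleep(backoff_seconds * 2 ** (attempt - 1))
            existing = find_existing_policy()
            if existing is not None:
                # The earlier attempt most likely succeeded: reuse its result
                # instead of sending the create again.
                return existing
```

Because the lookup runs after the backoff, each retry gives the search index one more backoff interval to catch up; a longer first delay trades latency for a better chance of seeing the new policy.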
```python
PERSONA_NAME = "New"
CONNECTION_QN = "default/redshift/1769838984"
MODULE_NAME = TestId.make_unique("TransportRetry")
```
Since this is an integration test, instead of hardcoding tenant-specific constants, I would create a new fixture that creates a new persona and connection for this test, for example:
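A minimal sketch of such a fixture, assuming a client object that can create and delete assets: it generates unique names, creates a throwaway persona and connection, yields them to the test, and deletes them afterwards. `AdminClient`, `create_persona`, `create_connection`, `delete_by_guid`, and `RetryTargets` are hypothetical names, not pyatlan's API; in a pytest suite the generator would be decorated with `@pytest.fixture`.

```python
import uuid
from dataclasses import dataclass
from typing import Iterator, Protocol, Tuple


@dataclass
class RetryTargets:
    """Throwaway objects the retry integration test runs against."""
    persona_name: str
    persona_guid: str
    connection_qn: str


class AdminClient(Protocol):
    """Hypothetical client surface used by the fixture (not pyatlan's real API)."""

    def create_persona(self, name: str) -> str:
        """Create a persona and return its GUID."""
        ...

    def create_connection(self, name: str) -> Tuple[str, str]:
        """Create a connection and return (GUID, qualified name)."""
        ...

    def delete_by_guid(self, guid: str) -> None:
        """Delete an asset by GUID."""
        ...


def persona_and_connection(client: AdminClient) -> Iterator[RetryTargets]:
    """Create a unique persona and connection, yield them, then clean up.

    In a pytest suite this generator would be decorated with @pytest.fixture
    and receive `client` from another fixture.
    """
    suffix = uuid.uuid4().hex[:8]  # unique per run, so runs never collide
    persona_name = f"transport-retry-{suffix}"
    persona_guid = client.create_persona(persona_name)
    try:
        connection_guid, connection_qn = client.create_connection(
            f"transport-retry-conn-{suffix}"
        )
        try:
            yield RetryTargets(persona_name, persona_guid, connection_qn)
        finally:
            client.delete_by_guid(connection_guid)
    finally:
        # Runs even if creating the connection failed, so the persona never leaks.
        client.delete_by_guid(persona_guid)
```

Using `uuid4` keeps names unique across runs and tenants, and the nested `try`/`finally` blocks tear down in reverse order of creation.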
The integration test would fail without this change @ankitpatnaik-atlan ^^
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Search failure crashes retry instead of proceeding normally
High Severity
check_for_duplicate_policy can raise UNABLE_TO_SEARCH_EXISTING_POLICY if the index search fails, and this exception is uncaught inside _retry_operation. A transient search-service outage would cause the entire operation to abort with a confusing search error, instead of letting the retry proceed normally. The duplicate check is meant to be a best-effort optimization, but its failure blocks the retry mechanism entirely — making things worse than if the feature didn't exist.
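A sketch of treating the lookup as best-effort, assuming the duplicate check can be passed in as a callable. Any exception from the search, such as the SDK's `UNABLE_TO_SEARCH_EXISTING_POLICY` error, is logged and turned into "no duplicate found", so the retry proceeds exactly as it would without the feature. `find_existing_policy_safely` is a hypothetical name, and catching the broad `Exception` is a deliberate choice for this sketch.

```python
import logging
from typing import Callable, Optional

LOGGER = logging.getLogger(__name__)


def find_existing_policy_safely(
    find_existing_policy: Callable[[], Optional[dict]],
) -> Optional[dict]:
    """Run the duplicate lookup, but never let its failure abort the retry."""
    try:
        return find_existing_policy()
    except Exception:  # best-effort: a failed search means "unknown", not "stop"
        LOGGER.warning(
            "Duplicate-policy check failed; proceeding with normal retry.",
            exc_info=True,
        )
        return None
```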
Duplicate check triggers on non-timeout retryable errors too
Medium Severity
The duplicate-prevention check runs on every retry attempt regardless of the original error type — not just after timeouts. If a bulk POST returns a retryable status code (e.g., 503) without creating the entity, and a policy with the same name and persona GUID already exists from a prior unrelated operation, the check incorrectly short-circuits the retry with a mock response containing the pre-existing policy instead of retrying the actual request.
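One way to scope the check, sketched under the assumption that the retry loop records why the previous attempt failed: run the duplicate lookup only when no response came back at all (a client-side timeout), and skip it when the server answered with a retryable status such as 503, which normally indicates the create was not applied. `AttemptOutcome` and `should_check_for_duplicate` are hypothetical names, and the stdlib `TimeoutError` stands in for `httpx.ReadTimeout`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttemptOutcome:
    """What happened on the previous attempt (hypothetical bookkeeping)."""
    status_code: Optional[int] = None       # set when the server responded
    error: Optional[BaseException] = None   # set when no response arrived


def should_check_for_duplicate(previous: AttemptOutcome) -> bool:
    """Return True only when the previous attempt timed out client-side.

    A timeout leaves the outcome unknown (the server may have created the
    policy), so looking for an existing one is justified. A retryable status
    code is an explicit answer from the server, so the request is re-sent.
    """
    if previous.status_code is not None:
        return False
    return isinstance(previous.error, TimeoutError)
```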


✨ Description
Fixes a race condition where a `POST /api/meta/entity/bulk` request to create an `AuthPolicy` succeeds on the server but times out on the client side. On retry, the SDK was blindly re-sending the request, resulting in duplicate policies being created silently.

This PR adds duplicate detection logic inside the sync (and async) transport retry path: before retrying a bulk entity creation, the transport searches for an existing `AuthPolicy` with the same name and persona GUID. If found, it returns a synthetic success response with the existing policy, preventing the duplicate.

Jira link: GOV-667
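A minimal sketch of the lookup inputs, assuming the standard Atlas bulk request shape (`{"entities": [{"typeName": ..., "attributes": {...}}]}`) and, per the root-cause notes below, that the persona GUID sits under `attributes.accessControl.guid`. The query is written as raw Elasticsearch `term` filters that correspond to the `Term(field="__typeName.keyword", ...)` and `Term(field="name.keyword", ...)` clauses; matching the persona on the returned hits is an assumption about how results could be narrowed. `extract_policy_key`, `build_policy_query`, and `pick_matching_policy` are hypothetical names.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def extract_policy_key(body: bytes) -> Optional[Tuple[str, str]]:
    """Return (policy name, persona GUID) for the first AuthPolicy in a bulk request.

    Returns None when the body is not a bulk AuthPolicy create, so callers can
    skip the duplicate check entirely.
    """
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return None
    if not isinstance(payload, dict):
        return None
    for entity in payload.get("entities", []):
        if not isinstance(entity, dict) or entity.get("typeName") != "AuthPolicy":
            continue
        attributes = entity.get("attributes") or {}
        name = attributes.get("name")
        # The persona is referenced via accessControl, not a "persona" attribute.
        persona_guid = (attributes.get("accessControl") or {}).get("guid")
        if name and persona_guid:
            return name, persona_guid
    return None


def build_policy_query(policy_name: str) -> Dict[str, Any]:
    """Raw Elasticsearch filter matching the two Term clauses (type + exact name)."""
    return {
        "bool": {
            "filter": [
                {"term": {"__typeName.keyword": "AuthPolicy"}},
                {"term": {"name.keyword": policy_name}},
            ]
        }
    }


def pick_matching_policy(
    hits: List[Dict[str, Any]], persona_guid: str
) -> Optional[Dict[str, Any]]:
    """Keep only a hit attached to the same persona (assumes hits carry accessControl)."""
    for hit in hits:
        access_control = (hit.get("attributes") or {}).get("accessControl") or {}
        if access_control.get("guid") == persona_guid:
            return hit
    return None
```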
🧩 Type of change
🔍 Root Cause & Fixes
Three bugs were found and fixed in `pyatlan/client/transport.py`:

| Method | Bug | Fix |
|---|---|---|
| `_check_for_duplicate_policy` | Used `attributes.persona` to extract the persona GUID, but this key doesn't exist in the serialized request body | `attributes.accessControl.guid`, which is the actual key used by the SDK |
| `_find_existing_policy` | `Term.with_name("name.keyword", policy_name)`: `with_name` only accepts one argument (`value`), not `(field, value)` | `Term(field="name.keyword", value=policy_name)` |
| `_find_existing_policy` | `Term.with_type_name("AuthPolicy")`: wrong helper for a raw type filter | `Term(field="__typeName.keyword", value="AuthPolicy")` |

✅ How has this been tested?
A mock-based integration test was run against a live tenant (`datamesh2.atlan.com`) to verify the full retry + duplicate prevention flow end-to-end.

Test flow:

- `New` on the tenant (real API call)
- `httpx.ReadTimeout`
- `_check_for_duplicate_policy` indexsearch and return the fake existing policy

Test output:
What this confirms:
- `ReadTimeout`
- `_check_for_duplicate_policy` now correctly extracts the persona GUID from `accessControl`
- `_find_existing_policy` now builds a valid indexsearch query using `Term(field=..., value=...)`

📋 Checklist
Screenshots
Note
Medium Risk
Touches the core HTTP retry transport and adds extra `INDEX_SEARCH` calls during retry, which can alter behavior and failure modes for policy-creation requests under transient network errors.

Overview
Prevents duplicate `AuthPolicy` creation when a `POST /api/meta/entity/bulk` request times out client-side but likely succeeded server-side, by adding a retry-time duplicate check that searches for an existing policy (same name + persona) and returns a synthetic success response instead of re-sending the bulk create.

Wires the `AtlanClient`/`AsyncAtlanClient` instance into `PyatlanSyncTransport` and `PyatlanAsyncTransport` (including `max_retries` context managers) and introduces shared helpers in `pyatlan/client/common/transport.py` to parse bulk requests, run `INDEX_SEARCH`, and build the mock response. Adds a new `ErrorCode.UNABLE_TO_SEARCH_EXISTING_POLICY` plus unit and live-tenant integration tests covering both the “retry proceeds” and “retry short-circuits” paths for sync and async transports.

Written by Cursor Bugbot for commit a8aeffc.