fix(transport): prevent duplicate AuthPolicy creation on retry after timeout by ankitpatnaik-atlan · Pull Request #826 · atlanhq/atlan-python

ankitpatnaik-atlan · 2026-03-06T12:22:38Z

✨ Description

Fixes a race condition where a POST /api/meta/entity/bulk request to create an AuthPolicy succeeds on the server but times out on the client side. On retry, the SDK was blindly re-sending the request, resulting in duplicate policies being created silently.

This PR adds duplicate detection logic inside the sync (and async) transport retry path: before retrying a bulk entity creation, the transport searches for an existing AuthPolicy with the same name and persona GUID. If found, it returns a synthetic success response with the existing policy — preventing the duplicate.
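
A minimal sketch of the retry-path idea described above, not the PR's actual code. `ClientTimeout`, `send_with_duplicate_check`, the two callables, and the `mutatedEntities`/`synthetic` response shape are all hypothetical names chosen for illustration.

```python
from typing import Callable, Optional


class ClientTimeout(Exception):
    """Hypothetical stand-in for a client-side read timeout."""


def send_with_duplicate_check(
    send: Callable[[], dict],
    find_existing: Callable[[], Optional[dict]],
    max_retries: int = 3,
) -> dict:
    """Send a create request; before each retry, look for an entity that already exists."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        if attempt > 0:
            existing = find_existing()
            if existing is not None:
                # Synthetic success: report the existing entity instead of re-sending.
                # The response shape here is an assumption for the sketch.
                return {"mutatedEntities": {"CREATE": [existing]}, "synthetic": True}
        try:
            return send()
        except ClientTimeout as exc:
            last_error = exc
    assert last_error is not None
    raise last_error
```

If `send` times out after the server has already stored the entity, the second iteration finds it through `find_existing` and returns without calling `send` again.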

Jira link: GOV-667

🧩 Type of change

🐛 Bug fix (non-breaking change that fixes an issue)

🔍 Root Cause & Fixes

Three bugs were found and fixed in pyatlan/client/transport.py:

| # | Method | Bug | Fix |
|---|--------|-----|-----|
| 1 | `_check_for_duplicate_policy` | Read `attributes.persona` to extract the persona GUID; this key doesn't exist in the serialized request body | Changed to read `attributes.accessControl.guid`, which is the actual key used by the SDK |
| 2 | `_find_existing_policy` | Called `Term.with_name("name.keyword", policy_name)`; `with_name` only accepts one argument (value), not (field, value) | Changed to `Term(field="name.keyword", value=policy_name)` |
| 3 | `_find_existing_policy` | Called `Term.with_type_name("AuthPolicy")`, the wrong helper for a raw type filter | Changed to `Term(field="__typeName.keyword", value="AuthPolicy")` |
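
To make fixes 1 to 3 concrete, here is a dependency-free sketch that works on plain JSON rather than pyatlan models. The function names and the `{"entities": [...]}` body layout are assumptions for illustration; the field names `attributes.accessControl.guid`, `name.keyword`, and `__typeName.keyword` come from the table above, and the filters use the standard Elasticsearch `term` query shape.

```python
import json
from typing import Optional, Tuple


def extract_policy_key(body: bytes) -> Optional[Tuple[str, str]]:
    """Return (policy name, persona GUID) for the first AuthPolicy in a bulk body, else None."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return None
    if not isinstance(payload, dict):
        return None
    for entity in payload.get("entities", []):
        if entity.get("typeName") != "AuthPolicy":
            continue
        attrs = entity.get("attributes") or {}
        name = attrs.get("name")
        # The persona GUID sits under accessControl, not under a "persona" key.
        guid = (attrs.get("accessControl") or {}).get("guid")
        if name and guid:
            return name, guid
    return None


def policy_term_filters(policy_name: str) -> list:
    """Exact-match term filters on type and name, written as raw query DSL."""
    return [
        {"term": {"__typeName.keyword": {"value": "AuthPolicy"}}},
        {"term": {"name.keyword": {"value": policy_name}}},
    ]
```

A body that only carries the old `attributes.persona` key yields `None`, which is the symptom bug 1 describes: the check silently never finds a persona GUID.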

✅ How has this been tested?

A mock-based integration test was run against a live tenant (datamesh2.atlan.com) to verify the full retry + duplicate prevention flow end-to-end.

Test flow:

Find persona New on the tenant (real API call)
Patch the inner transport to intercept bulk POST requests
On attempt #1: simulate the policy being created server-side, then raise httpx.ReadTimeout
On retry: intercept the _check_for_duplicate_policy indexsearch and return the fake existing policy
Assert that the bulk POST was only called once and the correct policy GUID was returned
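
The test flow above can be approximated with `unittest.mock` alone. This is a sketch under stated assumptions: `FakeReadTimeout` stands in for `httpx.ReadTimeout`, and `save_policy` is a hypothetical retry wrapper, not the SDK's transport.

```python
from unittest.mock import Mock


class FakeReadTimeout(Exception):
    """Stand-in for httpx.ReadTimeout so the sketch needs no third-party imports."""


def save_policy(bulk_post, index_search, retries=1):
    """Hypothetical retry wrapper: consult index_search before re-sending."""
    for attempt in range(retries + 1):
        if attempt > 0:
            hits = index_search()
            if hits:
                return hits[0]  # short-circuit: the policy already exists
        try:
            return bulk_post()
        except FakeReadTimeout:
            if attempt == retries:
                raise


def run_retry_scenario():
    existing = {"guid": "fake-policy-guid-abc123"}
    # Every bulk POST "times out", as if the server created the policy
    # but the client never received the response.
    bulk_post = Mock(side_effect=FakeReadTimeout("simulated timeout"))
    # The duplicate-check search returns the policy that was "created".
    index_search = Mock(return_value=[existing])
    result = save_policy(bulk_post, index_search)
    return result, bulk_post.call_count, index_search.call_count
```

The key assertion mirrors the last step of the flow: the bulk POST mock is called once, and the returned GUID is the one supplied by the search mock.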

Test output:

Step 1: Finding persona 'New'...
  Found: New (guid: 7610d810-f9e0-4d0c-bdfa-15feb957353e)

Step 2: Patching transport mock...

Step 3: Saving policy 'Test_Retry_1772798846'...
  → Bulk POST attempt #1
  → Policy 'created' (guid: fake-policy-guid-abc123), simulating timeout...
  → Duplicate check indexsearch intercepted, returning fake existing policy...

✅ Got response after 1 bulk attempt(s)
✅ Policy guid in response: fake-policy-guid-abc123
✅ DUPLICATE PREVENTION WORKED: bulk POST only called once, retry short-circuited via indexsearch

What this confirms:

Retry triggers correctly after ReadTimeout
_check_for_duplicate_policy now correctly extracts the persona GUID from accessControl
_find_existing_policy now builds a valid indexsearch query using Term(field=..., value=...)
On finding the existing policy, the retry is short-circuited and the original policy GUID is returned — no duplicate created

📋 Checklist

My code follows the project's style guidelines
I've performed a self-review of my code
I've added comments in tricky or complex areas
All new and existing tests pass locally

Screenshots

Unit tests

Integration tests

Note

Medium Risk
Touches the core HTTP retry transport and adds extra INDEX_SEARCH calls during retry, which can alter behavior and failure modes for policy-creation requests under transient network errors.

Overview
Prevents duplicate AuthPolicy creation when a POST /api/meta/entity/bulk request times out client-side but likely succeeded server-side, by adding a retry-time duplicate check that searches for an existing policy (same name + persona) and returns a synthetic success response instead of re-sending the bulk create.

Wires the AtlanClient/AsyncAtlanClient instance into PyatlanSyncTransport and PyatlanAsyncTransport (including max_retries context managers) and introduces shared helpers in pyatlan/client/common/transport.py to parse bulk requests, run INDEX_SEARCH, and build the mock response. Adds a new ErrorCode.UNABLE_TO_SEARCH_EXISTING_POLICY plus unit and live-tenant integration tests covering both the “retry proceeds” and “retry short-circuits” paths for sync and async transports.

Written by Cursor Bugbot for commit a8aeffc.

Aryamanz29

Looks good - a few requested changes below 🙏

Could you please also add unit and integration tests for this change? Since this is a direct change to the SDK transport layer, we need those tests in place to make sure things are working as expected and to prevent any future regressions.

cursor · 2026-03-09T10:16:54Z

+                            "RETRY PREVENTED: Policy already exists (likely from previous "
+                            "request that timed out but succeeded). Returning existing policy."
+                        )
+                        return duplicate_response


Duplicate check runs before backoff sleep, likely missing index

High Severity

The duplicate policy check runs before retry.increment() and retry.sleep(response), meaning the IndexSearch fires immediately after the timeout with zero delay. Since the whole point of this feature is to detect a policy that was just created by a timed-out request, the Elasticsearch index almost certainly hasn't reflected the new entity yet. The check will return None, the retry proceeds, and a duplicate policy is created anyway — defeating the entire purpose of this PR. The check needs to happen after the sleep/backoff to give the search index time to catch up.

Additional Locations (1)

pyatlan/client/transport.py#L229-L240
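
A sketch of the ordering this comment asks for: back off first, then run the duplicate check, then re-send. `retry_step` and its parameters are hypothetical; the `sleep` argument is injectable only so the order can be observed without waiting.

```python
import time
from typing import Callable, Optional


def retry_step(
    backoff_seconds: float,
    check_duplicate: Callable[[], Optional[dict]],
    resend: Callable[[], dict],
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """One retry step: sleep, then check for a duplicate, then re-send if none is found."""
    sleep(backoff_seconds)  # give the search index time to reflect the new entity
    existing = check_duplicate()
    if existing is not None:
        return existing
    return resend()
```

Whether the backoff interval is long enough for the index to catch up depends on the deployment, so the check can still miss a freshly created policy; the sleep only narrows that window.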

Aryamanz29

Looking good - few changes below:

Could you please also add integration tests for the async implementation? 🙏 (under tests/integration/aio/test_transport.py)

Aryamanz29 · 2026-03-09T10:30:47Z

+PERSONA_NAME = "New"
+CONNECTION_QN = "default/redshift/1769838984"
+MODULE_NAME = TestId.make_unique("TransportRetry")


Since this is an integration test, instead of hardcoding tenant-specific constants I would add a fixture that creates a new persona and connection for this test. Example:

atlan-python/tests/integration/persona_test.py

Line 56 in bd44034

def persona(

The integration test would fail without this change @ankitpatnaik-atlan ^^

https://github.com/atlanhq/atlan-python/actions/runs/22850777667/job/66279044998?pr=826

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-03-09T11:19:28Z

+                            "RETRY PREVENTED: Policy already exists (likely from previous "
+                            "request that timed out but succeeded). Returning existing policy."
+                        )
+                        return duplicate_response


Search failure crashes retry instead of proceeding normally

High Severity

check_for_duplicate_policy can raise UNABLE_TO_SEARCH_EXISTING_POLICY if the index search fails, and this exception is uncaught inside _retry_operation. A transient search-service outage would cause the entire operation to abort with a confusing search error, instead of letting the retry proceed normally. The duplicate check is meant to be a best-effort optimization, but its failure blocks the retry mechanism entirely — making things worse than if the feature didn't exist.

Additional Locations (2)

pyatlan/client/transport.py#L229-L239

pyatlan/client/common/transport.py#L102-L106
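
One way to make the check best-effort, as this comment suggests, is to swallow and log any failure so the normal retry continues. The wrapper name is hypothetical, and catching `Exception` broadly is a deliberate choice for this sketch rather than a claim about the final code.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def best_effort_duplicate_check(
    check: Callable[[], Optional[dict]],
) -> Optional[dict]:
    """Run the duplicate check; on any failure, log it and report 'no duplicate found'."""
    try:
        return check()
    except Exception as exc:  # a search outage must not abort the retry
        logger.warning("Duplicate check failed, proceeding with retry: %s", exc)
        return None
```

Returning `None` on failure means a search outage degrades to the pre-PR behavior (a plain retry) instead of surfacing a search error to the caller.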

cursor · 2026-03-09T11:19:28Z

+                            "RETRY PREVENTED: Policy already exists (likely from previous "
+                            "request that timed out but succeeded). Returning existing policy."
+                        )
+                        return duplicate_response


Duplicate check triggers on non-timeout retryable errors too

Medium Severity

The duplicate-prevention check runs on every retry attempt regardless of the original error type — not just after timeouts. If a bulk POST returns a retryable status code (e.g., 503) without creating the entity, and a policy with the same name and persona GUID already exists from a prior unrelated operation, the check incorrectly short-circuits the retry with a mock response containing the pre-existing policy instead of retrying the actual request.

Additional Locations (1)

pyatlan/client/transport.py#L220-L239
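
A sketch of gating the check on the failure type, so it runs only when the outcome of the POST is unknown. `ReadTimeoutError` is a stand-in exception class and `should_check_for_duplicate` is a hypothetical helper; which real exception types count as "outcome unknown" is an assumption left to the transport.

```python
from typing import Optional


class ReadTimeoutError(Exception):
    """Stand-in for a client-side read timeout (request sent, no response received)."""


def should_check_for_duplicate(
    error: Optional[BaseException],
    status_code: Optional[int],
) -> bool:
    """True only when the request may have succeeded server-side without the client knowing."""
    # A retryable status such as 503 is a definite server answer, so no check is needed.
    return isinstance(error, ReadTimeoutError) and status_code is None
```

With this gate, a 503 response goes straight to a normal retry, and a pre-existing policy with the same name and persona GUID cannot short-circuit it.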

Aryamanz29

We need to update integration tests:
#826 (comment)

Aryamanz29 · 2026-03-09T11:19:52Z

+PERSONA_NAME = "New"
+CONNECTION_QN = "default/redshift/1769838984"
+MODULE_NAME = TestId.make_unique("TransportRetry")


The integration test would fail without this change @ankitpatnaik-atlan ^^

Aryamanz29 · 2026-03-09T11:20:36Z

+PERSONA_NAME = "New"
+CONNECTION_QN = "default/redshift/1769838984"
+MODULE_NAME = TestId.make_unique("TransportRetry")


https://github.com/atlanhq/atlan-python/actions/runs/22850777667/job/66279044998?pr=826

Aryamanz29

LGTM - thanks!

GOV-667: Check if policy was created during retry

6f0535f

greptile-apps Bot reviewed Mar 6, 2026

cursor Bot reviewed Mar 6, 2026

ankitpatnaik-atlan changed the title ~~GOV-667: Check if policy was created during retry~~ fix(transport): prevent duplicate AuthPolicy creation on retry after timeout Mar 6, 2026

GOV-667: Fix review comments

b96e8f8

cursor Bot reviewed Mar 6, 2026

GOV-667: fix async transport path

49968b6