403: Forbidden Response when AllowBulkExecution = true + Resource Token #1783

Open
InquisitorJax opened this issue Aug 17, 2020 · 18 comments · May be fixed by #3987

@InquisitorJax

Describe the bug
When a client is created with AllowBulkExecution = true and authenticates with a resource token, Azure Cosmos DB returns a 403: Forbidden error on an API call.
The same options work fine against the local Cosmos DB emulator.
The API calls also work fine when AllowBulkExecution is false.
I've checked my Azure Cosmos DB firewall settings: they are set to allow all networks.

To Reproduce
Can be reproduced using this project: https://github.com/InquisitorJax/Xamarin-Cosmos-DB
Check the readme file on setting up the local.settings.json file to retrieve the resource token from an azure instance.

Expected behavior
Api calls should be successful.

Actual behavior
A 403: Forbidden error is returned (the full exception is under Additional context).

Environment summary
SDK Version: 3.12.0
OS Version: Windows

Additional context
Stack Trace:
{Microsoft.Azure.Cosmos.CosmosException : Response status code does not indicate success: Forbidden (403); Substatus: 0; ActivityId: ; Reason: ();
at Microsoft.Azure.Cosmos.ResponseMessage.EnsureSuccessStatusCode () [0x0000e] in <250931dbc7a64174b3f1ed93d3081ffb>:0
at Microsoft.Azure.Cosmos.CosmosResponseFactoryCore.ProcessMessage[T] (Microsoft.Azure.Cosmos.ResponseMessage responseMessage, System.Func2[T,TResult] createResponse) [0x00002] in <250931dbc7a64174b3f1ed93d3081ffb>:0
at Microsoft.Azure.Cosmos.CosmosResponseFactoryCore.CreateItemResponse[T] (Microsoft.Azure.Cosmos.ResponseMessage responseMessage) [0x00000] in <250931dbc7a64174b3f1ed93d3081ffb>:0
at Microsoft.Azure.Cosmos.ContainerCore.CreateItemAsync[T] (Microsoft.Azure.Cosmos.CosmosDiagnosticsContext diagnosticsContext, T item, System.Nullable1[T] partitionKey, Microsoft.Azure.Cosmos.ItemRequestOptions requestOptions, System.Threading.CancellationToken cancellationToken) [0x000b0] in <250931dbc7a64174b3f1ed93d3081ffb>:0
at Microsoft.Azure.Cosmos.ClientContextCore.RunWithDiagnosticsHelperAsync[TResult] (Microsoft.Azure.Cosmos.CosmosDiagnosticsContext diagnosticsContext, System.Func2[T,TResult] task) [0x0009e] in <250931dbc7a64174b3f1ed93d3081ffb>:0 at XamarinCosmosDB.CosmosRepository.SaveModelAsync[T] (T model) [0x0009e] in D:\DEV\Git\Xamarin-Cosmos-DB\XamarinCosmosDB\XamarinCosmosDB\CosmosRepository.cs:137 --- Cosmos Diagnostics ---{"DiagnosticVersion":"2","Summary":{"StartUtc":"2020-08-17T21:32:30.8715930Z","TotalElapsedTimeInMs":12290.9743,"UserAgent":"cosmos-netstandard-sdk/3.12.0|3.11.4|02|X86|Unix 4.14.112.0|Mono 6.12.0 2020-02 83105ba2246 |F 00000001|","TotalRequestCount":2,"FailedRequestCount":2},"Context":[{"Id":"SynchronizationContext","ElapsedTimeInMs":11.1346},{"Id":"ItemSerialize","ElapsedTimeInMs":134.708},{"Id":"UsingWaitAsync","ElapsedTimeInMs":0.8897},{"Id":"Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler","HandlerElapsedTimeInMs":3558.1352},{"Id":"Microsoft.Azure.Cosmos.Handlers.RetryHandler","HandlerElapsedTimeInMs":3504.7875000000004},{"Id":"Microsoft.Azure.Cosmos.Handlers.RouterHandler","HandlerElapsedTimeInMs":3464.9487000000004},{"Id":"Microsoft.Azure.Cosmos.Handlers.TransportHandler","HandlerElapsedTimeInMs":127.7205},{"Id":"AggregatedClientSideRequestStatistics","ContactedReplicas":[{"Count":1,"Uri":"rntbd://cdb-ms-prod-eastus1-fd42.documents.azure.com:14104/apps/814e7582-0a8b-498b-8d4f-92263e029030/services/f2d1101a-4d64-4d76-bfa7-e60d2dcfe6c4/partitions/22211ae2-1228-4497-bdbd-d913f933812e/replicas/132368769092028206s/"},{"Count":1,"Uri":"rntbd://cdb-ms-prod-eastus1-fd42.documents.azure.com:14432/apps/814e7582-0a8b-498b-8d4f-92263e029030/services/f2d1101a-4d64-4d76-bfa7-e60d2dcfe6c4/partitions/22211ae2-1228-4497-bdbd-d913f933812e/replicas/132389420512205363s/"},{"Count":1,"Uri":"rntbd://cdb-ms-prod-eastus1-fd42.documents.azure.com:14148/apps/814e7582-0a8b-498b-8d4f-92263e029030/services/f2d1101a-4d64-4d76-bfa7-e60d2dcfe6c4/partitions/22211ae2-1228-4497-
bdbd-d913f933812e/replicas/132415566475158421s/"}],"RegionsContacted":["https://remotime-eastus.documents.azure.com:443/"],"FailedReplicas":[]},{"Id":"Microsoft.Azure.Documents.ServerStoreModel","ElapsedTimeInMs":3401.2183},{"Id":"AddressResolutionStatistics","StartTimeUtc":"2020-08-17T21:32:39.6743060Z","EndTimeUtc":"2020-08-17T21:32:39.9263650Z","ElapsedTimeInMs":252.05900000000003,"TargetEndpoint":"https://remotime-eastus.documents.azure.com//addresses/?$resolveFor=dbs%2fmhwpAA%3d%3d%2fcolls%2fmhwpAJ28n9Q%3d%2fdocs&$filter=protocol eq rntbd&$partitionKeyRangeIds=0"},{"Id":"StoreResponseStatistics","StartTimeUtc":"2020-08-17T21:32:39.6010870Z","ResponseTimeUtc":"2020-08-17T21:32:42.8359020Z","ElapsedTimeInMs":3234.815,"ResourceType":"Document","OperationType":"Batch","LocationEndpoint":"https://remotime-eastus.documents.azure.com:443/","ActivityId":"db273de7-94b4-48a2-9ebc-1a519c5f13fe","StoreResult":"StorePhysicalAddress: rntbd://cdb-ms-prod-eastus1-fd42.documents.azure.com:14063/apps/814e7582-0a8b-498b-8d4f-92263e029030/services/f2d1101a-4d64-4d76-bfa7-e60d2dcfe6c4/partitions/22211ae2-1228-4497-bdbd-d913f933812e/replicas/132415566475158422p/, LSN: 57, GlobalCommittedLsn: 57, PartitionKeyRangeId: , IsValid: True, StatusCode: 403, SubStatusCode: 0, RequestCharge: 0, ItemLSN: -1, SessionToken: -1#57, UsingLocalLSN: False, TransportException: null"},{"Id":"PointOperationStatistics","ActivityId":"db273de7-94b4-48a2-9ebc-1a519c5f13fe","ResponseTimeUtc":"2020-08-17T21:32:43.0081550Z","StatusCode":403,"SubStatusCode":0,"RequestCharge":0.0,"RequestUri":"dbs/remotimedb/colls/UserData","ErrorMessage":"Microsoft.Azure.Documents.ForbiddenException: Message: {\"Errors\":[\"Request is blocked. 
Please check your authorization token and Cosmos DB account firewall settings.\"]}\nActivityId: db273de7-94b4-48a2-9ebc-1a519c5f13fe, Request URI: /apps/814e7582-0a8b-498b-8d4f-92263e029030/services/f2d1101a-4d64-4d76-bfa7-e60d2dcfe6c4/partitions/22211ae2-1228-4497-bdbd-d913f933812e/replicas/132415566475158422p/, RequestStats: Please see CosmosDiagnostics, SDK: Linux/Unknown cosmos-netstandard-sdk/3.11.4\n at Microsoft.Azure.Documents.TransportClient.ThrowServerException (System.String resourceAddress, Microsoft.Azure.Documents.StoreResponse storeResponse, System.Uri physicalAddress, System.Guid activityId, Microsoft.Azure.Documents.DocumentServiceRequest request) [0x004d2] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.Rntbd.TransportClient.InvokeStoreAsync (System.Uri physicalAddress, Microsoft.Azure.Documents.ResourceOperation resourceOperation, Microsoft.Azure.Documents.DocumentServiceRequest request) [0x003c8] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.StoreResult.ToResponse (Microsoft.Azure.Documents.RequestChargeTracker requestChargeTracker) [0x0004f] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.ConsistencyWriter.WritePrivateAsync (Microsoft.Azure.Documents.DocumentServiceRequest request, Microsoft.Azure.Documents.TimeoutHelper timeout, System.Boolean forceRefresh) [0x00573] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.BackoffRetryUtility1[T].ExecuteRetryAsync (System.Func1[TResult] callbackMethod, System.Func3[T1,T2,TResult] callShouldRetry, System.Func1[TResult] inBackoffAlternateCallbackMethod, System.TimeSpan minBackoffForInBackoffCallback, System.Threading.CancellationToken cancellationToken, System.Action1[T] preRetryCallback) [0x00096] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying (System.Runtime.ExceptionServices.ExceptionDispatchInfo capturedException) [0x00011] in 
<18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.BackoffRetryUtility1[T].ExecuteRetryAsync (System.Func1[TResult] callbackMethod, System.Func3[T1,T2,TResult] callShouldRetry, System.Func1[TResult] inBackoffAlternateCallbackMethod, System.TimeSpan minBackoffForInBackoffCallback, System.Threading.CancellationToken cancellationToken, System.Action1[T] preRetryCallback) [0x001bf] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.ConsistencyWriter.WriteAsync (Microsoft.Azure.Documents.DocumentServiceRequest entity, Microsoft.Azure.Documents.TimeoutHelper timeout, System.Boolean forceRefresh, System.Threading.CancellationToken cancellationToken) [0x00102] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.ReplicatedResourceClient+<>c__DisplayClass26_0.<InvokeAsync>b__0 (Microsoft.Azure.Documents.GoneAndRetryRequestRetryPolicyContext contextArguments) [0x00187] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.RequestRetryUtility.ProcessRequestAsync[TRequest,IRetriableResponse] (System.Func1[TResult] executeAsync, System.Func1[TResult] prepareRequest, Microsoft.Azure.Documents.IRequestRetryPolicy2[TRequest,TResponse] policy, System.Threading.CancellationToken cancellationToken, System.Func1[TResult] inBackoffAlternateCallbackMethod, System.Nullable1[T] minBackoffForInBackoffCallback) [0x000df] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying (System.Runtime.ExceptionServices.ExceptionDispatchInfo capturedException) [0x00011] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.RequestRetryUtility.ProcessRequestAsync[TRequest,IRetriableResponse] (System.Func1[TResult] executeAsync, System.Func1[TResult] prepareRequest, Microsoft.Azure.Documents.IRequestRetryPolicy2[TRequest,TResponse] policy, System.Threading.CancellationToken cancellationToken, System.Func1[TResult] inBackoffAlternateCallbackMethod, 
System.Nullable1[T] minBackoffForInBackoffCallback) [0x0028b] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.StoreClient.ProcessMessageAsync (Microsoft.Azure.Documents.DocumentServiceRequest request, System.Threading.CancellationToken cancellationToken, Microsoft.Azure.Documents.IRetryPolicy retryPolicy, System.Func2[T,TResult] prepareRequestAsyncDelegate) [0x002fd] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Documents.ServerStoreModel.ProcessMessageAsync (Microsoft.Azure.Documents.DocumentServiceRequest request, System.Threading.CancellationToken cancellationToken) [0x00165] in <18285060bc1144e3a326a50b67232825>:0 \n at Microsoft.Azure.Cosmos.Handlers.TransportHandler.ProcessMessageAsync (Microsoft.Azure.Cosmos.RequestMessage request, System.Threading.CancellationToken cancellationToken) [0x0019d] in <250931dbc7a64174b3f1ed93d3081ffb>:0 \n at Microsoft.Azure.Cosmos.Handlers.TransportHandler.SendAsync (Microsoft.Azure.Cosmos.RequestMessage request, System.Threading.CancellationToken cancellationToken) [0x00074] in <250931dbc7a64174b3f1ed93d3081ffb>:0 ","RequestSessionToken":null,"ResponseSessionToken":"0:-1#57"},{"Id":"BatchAsyncContainerExecutor.ToResponse","ElapsedTimeInMs":13.0839}]}}

@j82w
Contributor

j82w commented Aug 18, 2020

@ealsur do you have any suggestions?

j82w added the bug, Bulk, customer-reported, and needs-investigation labels Aug 18, 2020
@ealsur
Member

ealsur commented Aug 18, 2020

@abhijitpai Does Bulk/Batch require a particular permission on Resource Tokens to execute?

@ealsur
Member

ealsur commented Aug 18, 2020

@InquisitorJax When your code creates a Permission, it scopes it to a particular PartitionKey:

var partitionKey = new PartitionKey(permissionId);

var permissionProperties = new PermissionProperties(
    permissionId,
    PermissionMode.All, // default to all. This should be derived from scope in future
    _container,
    partitionKey);

When you issue Bulk operations, are they all within the same PartitionKey?

@InquisitorJax
Author

@ealsur yup - same partition key.
to be clear, the 403 is returned from a single UpsertAsync() call - I hadn't got round to making actual bulk operations yet.
Are bulk operations supposed to run with different container instances?

@ealsur
Member

ealsur commented Aug 18, 2020

I was able to repro this in a small snippet:

CosmosClient client1 = new CosmosClient("https://localhost:8081", "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==");

DatabaseResponse response = await client1.CreateDatabaseIfNotExistsAsync(Guid.NewGuid().ToString());
Database database = response.Database;

ContainerResponse containerResponse = await database.CreateContainerAsync(Guid.NewGuid().ToString(), "/pk");
Container container = containerResponse;

User user = await database.CreateUserAsync("test");

PartitionKey partitionKey = new PartitionKey("test");

PermissionResponse permission = await user.CreatePermissionAsync(new PermissionProperties(
    "testP",
    PermissionMode.All,
    container,
    partitionKey
));

CosmosClientOptions clientOptions = new CosmosClientOptions();
clientOptions.AllowBulkExecution = true;
CosmosClient client = new CosmosClient("https://localhost:8081", permission.Resource.Token, clientOptions);

Container containerWithPermissions = client.GetContainer(database.Id, container.Id);

List<Task<ItemResponse<dynamic>>> tasks = new List<Task<ItemResponse<dynamic>>>();
for (int i = 0; i < 100; i++)
{
    tasks.Add(containerWithPermissions.CreateItemAsync<dynamic>(new { id = i.ToString(), pk = "test" }, partitionKey));
}

await Task.WhenAll(tasks);

Basically the issue is that when Bulk is enabled, the BATCH backend API request seems to be blocked.

When Bulk is off, the SDK does normal point operations (POST) for each operation. When Bulk is on, we group operations and issue them under a different verb (BATCH), and this might be the source of the issue. @abhijitpai is this a supported scenario?

@j82w
Contributor

j82w commented Aug 18, 2020

@ealsur does batch work?

@ealsur
Member

ealsur commented Aug 18, 2020

TransactionalBatch works.

I think the root cause of the issue is that Permission is scoped to a PartitionKey. But the Bulk request is not scoped to a PartitionKey (does not send the PK header) but to a physical partition.

TransactionalBatch is scoped to a PartitionKey.
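
For reference, a minimal sketch of a TransactionalBatch bound to a single partition key, which is the request shape that carries the PartitionKey. The helper name WriteAsBatchAsync, the item shape, and the "/pk" partition key path are assumptions for illustration; it also assumes the container comes from a client authenticated with a resource token scoped to that same partition key value. Note that a transactional batch is all-or-nothing, which is a different guarantee from what Bulk provides.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class TransactionalBatchSketch
{
    // Hypothetical helper. 'container' is assumed to come from a CosmosClient
    // created with a resource token scoped to the partition key 'pkValue',
    // and the container is assumed to use "/pk" as its partition key path.
    public static async Task<bool> WriteAsBatchAsync(Container container, string pkValue)
    {
        // The batch is bound to one partition key value up front.
        TransactionalBatch batch = container
            .CreateTransactionalBatch(new PartitionKey(pkValue))
            .CreateItem(new { id = "1", pk = pkValue })
            .CreateItem(new { id = "2", pk = pkValue });

        // All operations in the batch succeed or fail together.
        using (TransactionalBatchResponse response = await batch.ExecuteAsync())
        {
            return response.IsSuccessStatusCode;
        }
    }
}
```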

@ealsur
Member

ealsur commented Aug 18, 2020

If the Permission is not scoped to a PartitionKey, for example:

PermissionResponse permission = await user.CreatePermissionAsync(new PermissionProperties(
    "testP",
    PermissionMode.All,
    container
));

Then it works.

@InquisitorJax
Author

InquisitorJax commented Aug 18, 2020

@ealsur is that by design? My expectation is that batch operation permissions would also be limited to a partition.
So in my example, partitions are separated by userID - so I wouldn't want user A to receive a token that has bulk operation permissions on user B's data.
... unless I'm misunderstanding something?

@ealsur
Member

ealsur commented Aug 18, 2020

Your point is correct. What I'm saying is that the protocol used in this case does not send the PartitionKey header, because technically a Bulk request could include documents from different Partition Keys that are stored in the same physical partition. Since we don't send the PartitionKey header, the backend validation rejects the request with a 403 even though all documents are for the same PartitionKey: the Permission is set for a particular PartitionKey and the request does not carry the header.
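
The behavior described above can be illustrated with a small toy model. This is only a sketch of the described outcome, not the service's actual validation logic: the class BulkAuthSketch and the method IsAuthorized are hypothetical, and a null value stands for "token not scoped to a PartitionKey" or "request sent without the PartitionKey header".

```csharp
using System;

public static class BulkAuthSketch
{
    // Hypothetical check: tokenPartitionKey is null for a container-wide token,
    // requestPartitionKeyHeader is null when the request carries no PK header.
    public static bool IsAuthorized(string tokenPartitionKey, string requestPartitionKeyHeader)
    {
        // A token that is not scoped to a partition key covers any request
        // on the container.
        if (tokenPartitionKey == null)
        {
            return true;
        }

        // A scoped token requires the header to be present and to match.
        return requestPartitionKeyHeader == tokenPartitionKey;
    }

    public static void Main()
    {
        // Point operation: sends the PK header, so a scoped token passes.
        Console.WriteLine(IsAuthorized("test", "test"));

        // Bulk request: addressed to a physical partition without a PK header,
        // so the same scoped token is rejected (the 403 in this issue).
        Console.WriteLine(IsAuthorized("test", null));

        // Container-wide token: the missing header does not matter.
        Console.WriteLine(IsAuthorized(null, null));
    }
}
```

In this model the rejection does not depend on which documents the Bulk request contains, which matches the observation that a single upsert for the correct PartitionKey still fails.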

kirankumarkolli removed the bug label Jan 8, 2021
@ghost

ghost commented Dec 15, 2021

Closing due to inactivity; please feel free to reopen.

ghost closed this as completed Dec 15, 2021
@InquisitorJax
Author

@ealsur should this be closed? It doesn't look like it's been resolved.
I've just not been using the bulk execution because of this issue.

@ealsur
Member

ealsur commented Jun 20, 2023

@InquisitorJax Resurfacing this. By any chance, would TransactionalBatch fit the scenario?

@InquisitorJax
Author

@ealsur I'm not so sure. The intent (at least for me) isn't to ensure that a batch of operations all pass or fail as one (i.e. requiring them to be placed in a single transaction).
My hope was that turning on the AllowBulkExecution flag would increase the performance of consecutive, independent operations to a partition - unless I'm misunderstanding the intent of the flag?

@ealsur
Member

ealsur commented Jun 21, 2023

@InquisitorJax thanks for clarifying.

Bulk acts as a client network optimization that groups operations and it's designed to make use of the available RU. If you want to perform big data ingestions and use all your available RU, Bulk reduces the network bottleneck by grouping operations per physical partition.

Your scenario however is a bit on the edge, because if you are sending operations just for one PK, you are not really targeting using all the container's available RU (RU is distributed across physical partitions and you are just targeting one), which falls a bit out of what the mode optimizes for.

The problem comes from the fact that your Resource Token covers operations on a single PK, while Bulk groups operations and sends them to the physical partition. That is where the authorization issue arises: the token is not for the full physical partition but for a single PK value (logical partition != physical partition). The protocol we currently have might simply not work in this scenario; I'll dig a bit more.

@ealsur
Member

ealsur commented Jul 13, 2023

@InquisitorJax we discussed this scenario from several angles, and Bulk might not be the best option in this case. Limiting Bulk to a single PK value (not even a full physical partition) takes away the biggest improvement Bulk brings (grouping operations by physical partition affinity to reduce network requests). It also does not serve the goal of exhausting the available RU on the container, because it is simply impossible to use all the available RUs by issuing operations for only one PK value.

@InquisitorJax
Author

InquisitorJax commented Jul 13, 2023

Thanks for the feedback @ealsur - would it then make sense to have an error response that is a little more informative than "forbidden"? I can imagine people making PK scoped permissions, and thinking turning on bulk operations flag on a request will improve performance... would be nice if response error / documentation was clear that this is not a supported scenario by design.

@ealsur
Member

ealsur commented Jul 14, 2023

I agree. While we cannot drive backend changes from this repo, we can at least make sure to update linked documentation.

We'll use this issue to track an improvement on the CosmosClientOptions.AllowBulkExecution API to add a <remarks> section:

/// <summary>
/// Allows optimistic batching of requests to service. Setting this option might impact the latency of the operations. Hence this option is recommended for non-latency sensitive scenarios only.
/// </summary>
public bool AllowBulkExecution { get; set; }

@iainx Could you please add such a section on the above area that says something like: "The use of Resource Tokens as an authentication mechanism when Bulk is enabled is not recommended because it reduces the potential throughput benefits by reducing the scope of operations."
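
As a rough sketch of the requested change, the doc comment could look like the following, using the wording proposed above. The wrapper class CosmosClientOptionsSketch is a hypothetical stand-in so the fragment compiles on its own, and the text that was actually merged may differ.

```csharp
// Hypothetical stand-in for CosmosClientOptions, shown only to place the doc comment.
public class CosmosClientOptionsSketch
{
    /// <summary>
    /// Allows optimistic batching of requests to service. Setting this option might impact the latency of the operations. Hence this option is recommended for non-latency sensitive scenarios only.
    /// </summary>
    /// <remarks>
    /// The use of Resource Tokens as an authentication mechanism when Bulk is enabled is not recommended
    /// because it reduces the potential throughput benefits by reducing the scope of operations.
    /// </remarks>
    public bool AllowBulkExecution { get; set; }
}
```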

ealsur added the documentation label and removed the needs-investigation and customer-reported labels Jul 14, 2023
iainx pushed a commit that referenced this issue Jul 17, 2023
Add a remarks section to AllowBulkExecution explaining that it is not
recommended to be used with Resource Token authentication

Fixes #1783