
Finding individual errors in batch operations #623

Closed
betabandido opened this issue Apr 20, 2017 · 5 comments
Labels
guidance Question that needs advice or information.

Comments

@betabandido

I am doing some performance measurements with DynamoDB using a C# .NET core console application. Depending on the capacity specified for the table, sometimes the SDK throws a ProvisionedThroughputExceededException when the write throughput exceeds the provisioned throughput.

Based on my experiments, it seems that the SDK handles some of these exceptions internally, but eventually the exception escapes the SDK and reaches my application. Once that happens, there appears to be no way to know which individual put or delete operations failed within the batch request.

I am using the DataModel API, as it allows me to directly save objects. For instance, I do the following to save a group of objects:

var context = new DynamoDBContext(dbClient);
var request = context.CreateBatchWrite<ItemDto>();

// Generate 1,000 items of roughly 100 bytes each.
IEnumerable<ItemDto> items = Enumerable.Range(0, 1000)
    .Select(x => new ItemDto
    {
        ItemKey = $"ID-{x}",
        Data = new string('a', 100) + x.ToString()
    });

request.AddPutItems(items);
request.ExecuteAsync().Wait(); // blocks until the whole batch completes

where ItemDto is just a very simple class with two properties (ItemKey and Data).

When using the low-level API, as in the following line:

dbClient.BatchWriteItemAsync(request).Result.UnprocessedItems

it is possible to access a dictionary with the unprocessed items. But I cannot find a similar way to do so when using the higher-level API in the DynamoDBContext.

Is there any way to do so?

Or is it just the case that the higher-level API guarantees the atomicity of operations (i.e., either all the individual operations fail or all of them succeed)?
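For reference, the low-level loop that drains unprocessed items looks roughly like this (a hedged sketch: the method name, parameter names, and the absence of backoff between attempts are illustrative choices, not anything the SDK prescribes):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

// Sketch: resubmit whatever DynamoDB reports as unprocessed until the
// batch drains. A real implementation should also back off between
// attempts and cap the number of retries.
static async Task WriteAllAsync(IAmazonDynamoDB dbClient,
                                Dictionary<string, List<WriteRequest>> writes)
{
    var request = new BatchWriteItemRequest { RequestItems = writes };
    while (request.RequestItems.Count > 0)
    {
        var response = await dbClient.BatchWriteItemAsync(request);
        // Items the service could not process come back here; resend them.
        request = new BatchWriteItemRequest
        {
            RequestItems = response.UnprocessedItems
        };
    }
}
```

It is exactly this `UnprocessedItems` dictionary that the DataModel API does not surface.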

@PavelSafronov

DynamoDBContext takes the 1,000 items you pass into it, splits it up into 25-item batches, and sends those to DynamoDB. When there are unprocessed items, these are re-sent in the next request. If you have provisioned enough throughput, eventually all items should get written to the table. That's the happy path.
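The splitting described above can be sketched as follows (illustrative only; 25 is DynamoDB's per-request limit for BatchWriteItem, but the helper below is not the SDK's actual code):

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative: split a sequence into batches of at most 25 items,
// the BatchWriteItem per-request limit.
static List<List<T>> Chunk<T>(IEnumerable<T> items, int size = 25) =>
    items.Select((item, index) => (item, index))
         .GroupBy(x => x.index / size)
         .Select(g => g.Select(x => x.item).ToList())
         .ToList();
```

So the 1,000 items in the original example become 40 requests of 25 items each.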

If the table doesn't have enough provisioned throughput, the low-level client will receive the ProvisionedThroughputExceededException, and will retry up to 10 times, with exponential backoff. If a given request cannot succeed after 10 attempts (which will take about a minute), the final ProvisionedThroughputExceededException is thrown. So for the exception to bubble up, it would have had to fail about 10 consecutive times, which likely suggests the table's provisioned throughput is way too low. This is slightly complicated by retry throttling, a feature we recently added. With retry throttling we may not be retrying 10 times if a large number of previous calls have failed and we don't expect the current retry to succeed.
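As a rough illustration of the exponential backoff shape only (the SDK's actual base delay, cap, and jitter differ; the numbers below are assumptions):

```csharp
using System;

// Illustrative exponential backoff: the delay doubles with each
// attempt, capped at a maximum. Not the SDK's exact formula.
static TimeSpan BackoffDelay(int attempt, int baseMs = 100, int capMs = 20000)
    => TimeSpan.FromMilliseconds(Math.Min(capMs, baseMs * (1L << attempt)));
```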

That's mostly the explanation of why you're seeing this exception. You can read more about retry throttling and high DynamoDB throughput in this forum message.

In the unhappy path case, some of the items you gave us will be written, but we don't currently provide you with a list of which items have not been written. We can mark this issue as a Feature Request for this addition, but no promises on when this work would get done. Of course if this is an important case for you, you can add this capability to the SDK and submit it to us as a pull request.

If you don't mind waiting a long time for your writes to complete, you can disable retry throttling and increase the max retries to a large enough value, as shown below. This approach will take longer to write your data, but it's much more likely to succeed.

var config = new AmazonDynamoDBConfig
{
    MaxErrorRetry = 20,     // retry up to 20 times instead of the default 10
    ThrottleRetries = false // disable retry throttling
};
var client = new AmazonDynamoDBClient(config);
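A client configured this way can also back the DataModel API, so the high-level batch write from the original question picks up the more patient retry settings (a sketch; ItemDto is the class described in the question):

```csharp
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DataModel;

var config = new AmazonDynamoDBConfig { MaxErrorRetry = 20, ThrottleRetries = false };
var client = new AmazonDynamoDBClient(config);
var context = new DynamoDBContext(client); // high-level writes now use these retries

// ItemDto as described in the question: two simple properties.
public class ItemDto
{
    public string ItemKey { get; set; }
    public string Data { get; set; }
}
```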

Hope this provides an insight into the workings and helps with your scenario.

@betabandido
Author

Thanks for the explanation. It matches my observations from the experiments perfectly.

For the project I am working on now, these errors are not critical -- I believe it should not be difficult to estimate an adequate capacity value for the tables. But it would certainly be a nice feature to have; being able to implement robust applications is always good.

@PavelSafronov

You're right, having a way to return the unprocessed items would be helpful. Adding this to the feature request log and resolving.

@dcsena

dcsena commented Sep 1, 2022

Hey, it looks like five years later the high-level SDK still doesn't return unprocessed items in the response? Is this the right place to +1 this feature request?

@johan-lindqvist

This would be a great feature for our use case as well, any updates on this?
