Skip to content

Add random sharding for the DynamoDb outbox#2813

Merged
iancooper merged 4 commits into
BrighterCommand:masterfrom
jtsalva:ShardDynamoDbOutbox
Sep 25, 2023
Merged

Add random sharding for the DynamoDb outbox#2813
iancooper merged 4 commits into
BrighterCommand:masterfrom
jtsalva:ShardDynamoDbOutbox

Conversation

@jtsalva
Copy link
Copy Markdown
Contributor

@jtsalva jtsalva commented Sep 7, 2023

Addresses #2810

  • Add configuration for number of shards (default 3, max 20)
  • Add optional configuration for TTL (default null - so forever)
  • Applies random sharding when writing items into the dynamo outbox for the 'Outstanding' index PK
  • Query all shards when getting all outstanding messages

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Sep 7, 2023

CLA assistant check
All committers have signed the CLA.

@jtsalva
Copy link
Copy Markdown
Contributor Author

jtsalva commented Sep 7, 2023

@iancooper I'm not sure how to trigger CI


private async Task<IEnumerable<MessageItem>> QueryAllOutstandingShardsAsync(string topic, DateTime minimumAge, CancellationToken cancellationToken = default)
{
using var semaphore = new SemaphoreSlim(20);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reasoning behind this being 20, maybe should be configurable

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for the semaphore?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now realise maybe doing a Parallel.ForEachAsync is more suitable? I was thinking we need to set MaxDegreeOfParallelism as max number of shards is unbounded

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could sensibly limit the number of shards. At some point we only want so many threads, so many open sockets etc. If someone exceeds that number in order for their outbox to work, I wonder if an outbox is the right solution to transactional messaging for them.

It is a good question though

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should set an upper limit when you configure the number of partitions. Let's say 20 for now and throw an error if you use more than that.

I wonder if you need more than 20 partitions whether you should first think about using DAX over increasing the number of partitions further.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, will simplify to remove semaphores and just set max shards to 20. Can always be another PR if we find 20 isn't enough.

Speaking of DAX, I'm not sure how that would solve the hot partitioning issue, I think it would alleviate maybe some reads and reduce latency but underlying issue would still be there? (from someone who's only just read about DAX on the surface)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we have reached agreement to limit at config, which is more explicit.

In principle a write through-cache will limit a hotspot as our partition would tend to be held in memory not on disk so access would not hit an RCU or WCU limit. For "hot spots" DAX is often a good solution, particularly in this case where you write and then read shortly after what you just read, and can evict older elements (which would be dispatched messages for us). I would guess that a write through cache would resolve this issue. So if our partitioning strategy gets too many partitions and thus requires a lot of threads on your Sweeper process it may be more efficient to move to an in-memory cache.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining Ian

_dynamoOverwriteTableConfig = new DynamoDBOperationConfig
{
OverrideTableName = _configuration.TableName,
ConsistentRead = true
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting ConsistentRead = true as default for non-GSI lookups to mitigate against NullReferenceExceptions we've been seeing when calling PostAsync

@iancooper
Copy link
Copy Markdown
Member

@iancooper I'm not sure how to trigger CI

GitHub feature, one of us has to approve you when you have not triggered one before. It's to stop you uploading something that will start bitcoin mining or the like

@iancooper iancooper merged commit bb9623b into BrighterCommand:master Sep 25, 2023
@jtsalva jtsalva deleted the ShardDynamoDbOutbox branch September 26, 2023 08:41
DevJonny pushed a commit to DevJonny/Brighter that referenced this pull request Feb 28, 2026
* Start to add random sharding for the DynamoDb outbox

* Limit to 20 shards and remove semaphores

---------

Co-authored-by: Ian Cooper <ian_hammond_cooper@yahoo.co.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants