Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DynamoDB: 'id' key is S-typed but coerced as bytes #5455

Open
ikonst opened this issue Apr 11, 2019 · 9 comments
Open

DynamoDB: 'id' key is S-typed but coerced as bytes #5455

ikonst opened this issue Apr 11, 2019 · 9 comments

Comments

@ikonst
Copy link

ikonst commented Apr 11, 2019

The resulting stored data looks like:

{
    "Count": 1,
    "Items": [
        {
            "timestamp": {
                "N": "1554962898.2549338"
            },
            "result": {
                "B": "..."
            },
            "id": {
                "S": "b'celery-task-meta-3b0be007-8cf5-490b-a75f-c6d05d3c81ad'"
            }
        }
    ],
    "ScannedCount": 1,
    "ConsumedCapacity": null
}

The DynamoDB 'S' type expects a Unicode string. Note the b'' repr, presumably an effect of BaseKeyValueStoreBackend.key_t that defaults to ensure_bytes.

@auvipy
Copy link
Member

auvipy commented Apr 11, 2019

got any working solution to fix it?

@ikonst
Copy link
Author

ikonst commented Apr 11, 2019

from celery.backends.dynamodb import DynamoDBBackend
DynamoDBBackend.key_t = str

fixes it, but this breaks compatibility with the existing IDs

@georgepsarakis
Copy link
Contributor

georgepsarakis commented Apr 13, 2019

@ikonst thanks for the bug report. Would it make sense then to handle both formats simultaneously? Just an observation that the code is currently performing additional transformations on the key (for Python 2/3 compatibility reasons), both on put and get operations.

I think the best way to make the transition would be to use BatchGetItem API and attempt to retrieve both Task ID formats.

@ikonst
Copy link
Author

ikonst commented Apr 15, 2019

I think the best way to make the transition would be to use BatchGetItem API and attempt to retrieve both Task ID formats.

You probably wouldn't want this behavior by default, since it 2x cost even for new instances.

@georgepsarakis
Copy link
Contributor

@ikonst I see your point, you are right, we need to examine the impact on cost, since it could require more read capacity units. However, with what I am proposing the read request will always return a single item (depending on the Celery version that created the result record), so I think read capacity unit consumption will be the same. How does that sound?

@ikonst
Copy link
Author

ikonst commented Apr 18, 2019

If you perform a read operation on an item that does not exist, DynamoDB still consumes provisioned read throughput

DynamoDB processes each item in the batch as an individual GetItem request

So a BatchGetItem for two keys will consume two units, or am I reading it wrong?

@georgepsarakis
Copy link
Contributor

@ikonst indeed, this seems to be the case, so my assumption in the previous comment is invalid.

@georgepsarakis
Copy link
Contributor

We can add an option, so that users can enable the key format change, along with a documentation note and a deprecation warning in the code, that will prompt for upgrade. Eventually the setting will be removed and the new key format will become the default. If the additional unit consumption is not a problem, users can enable the backwards compatible query for a short migration period, and then use the new format.

How does that sound?

@ikonst
Copy link
Author

ikonst commented May 24, 2019

So there's effectively 3 modes:

  • old id
  • new id
  • both ids for migration period

Sounds good to me.

@auvipy auvipy modified the milestones: 4.4.0, 4.6 Jun 14, 2019
@auvipy auvipy modified the milestones: 4.6, 4.4.x Dec 16, 2019
@auvipy auvipy modified the milestones: 4.4.x, Future Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants