Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return plaintext values in UnprocessedItems in batch_write_item response #92

Closed
mattsb42-aws opened this issue Sep 12, 2018 · 8 comments

Comments

@mattsb42-aws
Copy link
Member

Related to #90 and #91, in order for our encrypted helpers to function as drop-in replacements, we need to decrypt the contents of UnprocessedItems when we receive them in response to a batch_write_item operation.

In order to avoid hitting the CMP twice for every item, we should retain the plaintext key materials for use in processing the response, in the form of an ephemeral CMP that will be discarded once that request is handled.

@mattsb42-aws
Copy link
Member Author

As laid out in #90, thinking through this, we don't need to decrypt anything or do anything with an ephemeral CMP. We just need to hold onto the original plaintext items and replace the encrypted unprocessed items in the response with the original plaintext items.

@mattsb42-aws mattsb42-aws changed the title Decrypt UnprocessedItems in batch_write_item response Return plaintext values in UnprocessedItems in batch_write_item response Dec 6, 2018
@woodyza
Copy link

woodyza commented Dec 21, 2018

Hiya, as the reporter of #91 that's blocked by this I'm pretty interested in the status of the issue...

We're looking to take some code dependent on the batch writer into production early next year, so I'm pretty motivated to get this sorted so we don't need to work around it. If I can help out with a PR just let me know!

@mattsb42-aws
Copy link
Member Author

Hi @woodyza, we always welcome PRs, but knowing you have a timeline when you need this also helps with prioritization. It's not a big change, we just need to take the time to do it and make sure we're testing it thoroughly. #90 is the primary tracking point for this, and is where I laid out the changes that we want to make.

If you would like to take a crack at this, go ahead and post in that thread that you're working on it and use that thread to bring up any issues you run into. Otherwise, we'll look at prioritizing this work over the next few weeks. We've been trying to clean up some of the smaller issues like this one over December.

@woodyza
Copy link

woodyza commented Jan 15, 2019

Hey @mattsb42-aws, I did some digging around in this yesterday and I'm not sure I could provide a respectable PR for the full story in #90 - I did however have a crack at just handling UnprocessedItems in encrypt_batch_write_item by essentially setting them back to the original state of the items before returning the results [1].

Downside is clearly that the current implementation will waste cycles by re-encrypting them, which we could possibly handle with a customised BatchWriter rather than the built in boto one to explicitly pass already encrypted items through without encryption? Unsure if it's worth it though.

This does also has the benefit of making the EncryptedResource and EncryptedClient behave more as drop-ins as expected, with UnprocessedItems returned as they were submitted by the caller. It also means not altering the item encryption in any way to try and verify an already encrypted item is actually still valid and properly encrypted by us.

Any thoughts before I pursue PR-ing something to that effect?

[1] https://github.com/aws/aws-dynamodb-encryption-python/blob/master/src/dynamodb_encryption_sdk/internal/utils.py#L298

@mattsb42-aws
Copy link
Member Author

Re-updating myself on this, my statement above that we're tracking this in #90 was incorrect. The logic for handling batch_write_item is sufficiently different that it is an independent thing (which presumably is why I made a separate issue in the first place now that I think of it).

It would definitely be possible to avoid the re-encryption by adding an item caching mechanism and changing encrypt_batch_write_item to understand it, but because of the level of complexity this would add, for the high-level clients I'm inclined to accept the possibility of re-encryption until we see performance issues. If someone wants to avoid the potential for re-encryption, they can always use the dynamodb_encryption_sdk.encrypted.item functions directly.

How I see this working is a processing layer that handles the response, probably best manifesting in another function that takes in the plaintext request (we'll want to copy this at the beginning of encrypt_batch_write_item), the response, and a CryptoConfig instance. The EncryptionContext in the CryptoConfig could then be used to determine the primary key attributes, then those could be used to match the items and replace the encrypted items in the response with the original values in the plaintext request.

One sticky point that we'll need to handle is what to do if the EncryptedClient/EncryptedResource was created with auto_refresh_table_indexes=False. This would mean that the EncryptionContext in the CryptoConfig does not know the primary key attributes, so we cannot match the encrypted item to the plaintext item.

@woodyza
Copy link

woodyza commented Jan 16, 2019

Ok great, I'll see what I can come up with along those lines - thanks!

My initial thought for the case when the primary key attributes are unknown is just to match on all of the unencrypted fields based on the CryptoConfig AttributeActions, is that too naive? It could get expensive if there are a ton of them I suppose, or if they're large enough.

@mattsb42-aws
Copy link
Member Author

You could do that, yes. I was initially thinking that you would not be able to tell what attributes were encrypted, but you still have the AttributeActions instance in the CryptoConfig, which will tell you that.

@mattsb42-aws
Copy link
Member Author

This is released in 1.1.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants