Deserialization of DynamoDB LastEvaluatedKey to a Map<String, AttributeValue> for paginated scan results is not trivially possible #3224

Open
2 tasks
kiritsuku opened this issue May 31, 2022 · 7 comments
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue

Comments

@kiritsuku

Describe the feature

To page through scan results in DynamoDB, one has to rely on the LastEvaluatedKey field of the scan response. However, the Java API does not expose the raw JSON response; the key is only accessible through its Java type Map<String, AttributeValue>. This makes it quite cumbersome to serialize and deserialize the LastEvaluatedKey to a string that can be forwarded to a client, which in turn can ask for more elements if wanted.

Serializing the Java type was already discussed here: #2295
It can be done with:

    private String encodeKey(Map<String, AttributeValue> key) {
        // We can only serialize the bean-based builder. See: https://github.com/aws/aws-sdk-java-v2/issues/2295#issuecomment-811382256
        var map = new HashMap<String, AttributeValue.Builder>();
        key.forEach((k, v) -> map.put(k, v.toBuilder()));
        var json = JsonUtils.writeValueAsString(map);
        return Base64.getUrlEncoder().encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }

The deserialization of this key is more complex:

    private Map<String, AttributeValue> decodeKey(String base64) {
        // We can't simply decode the JSON with Jackson as in
        //
        //     JsonUtils.readValue(json, new TypeReference<Map<String, AttributeValue.Builder>>() {});
        //
        // because `AttributeValue.Builder` is an interface and its implementation `AttributeValue.BuilderImpl` is package
        // private and therefore can't be statically referenced. Therefore, we have to manually create the `Map`.
        var json = new String(Base64.getUrlDecoder().decode(base64), StandardCharsets.UTF_8);
        var map = new HashMap<String, AttributeValue>();
        var tree = JsonUtils.readTree(json);
        tree.getFields().forEachRemaining(field -> {
            try {
                var builderObj = (AttributeValue.Builder) JsonUtils.readValue(
                        field.getValue().toString(),
                        Class.forName("software.amazon.awssdk.services.dynamodb.model.AttributeValue$BuilderImpl"));
                map.put(field.getKey(), builderObj.build());
            } catch (ClassNotFoundException e) {
                // Pass the original exception as the cause so the stack trace is not lost.
                throw new IllegalStateException("Deserialization for `AttributeValue.Builder` not possible: " + e.getMessage(), e);
            }
        });
        return map;
    }

The issue is that the type AttributeValue.BuilderImpl is not publicly visible. Making it public would make deserialization a lot shorter and also more future-proof: BuilderImpl is not part of the public API right now, so any change to it would break this code. Creating the AttributeValue manually without relying on bean-based deserialization would also be quite a hassle.

Use Case

The use case is deserialization of a LastEvaluatedKey formatted as JSON to the Java type Map<String, AttributeValue>.

Proposed Solution

Make AttributeValue.BuilderImpl public.

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

AWS Java SDK version used

2.17.201

JDK version used

11

Operating System and version

Ubuntu 20.04.4 LTS

@kiritsuku kiritsuku added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels May 31, 2022
@debora-ito
Member

Thank you for reaching out @kiritsuku. We see how we can improve the interface to make it more user-friendly. This was added to our backlog, but its priority is not high.

For anyone who'd also like to see us support this feature, please add a 👍 to the original description.

@debora-ito debora-ito removed the needs-triage This issue or PR still needs to be triaged. label Jun 7, 2022
@yasminetalby yasminetalby added the p2 This is a standard priority issue label Nov 14, 2022
@chandrasekharpatra

@kiritsuku One approach I found is to extract the primary key (PK) and sort key (SK) from the lastEvaluatedKey and create a custom object with these values, which can be serialized and sent back to the web client.

Once we get the serialized page pointer back, we can deserialize it, extract the PK and SK, and reconstruct the lastEvaluatedKey with these two values as PK and SK.

This approach works because at the end of the day DynamoDB only cares about the PK and SK, so other fields can be ignored.

@kiritsuku
Author

@chandrasekharpatra You are right, just serializing PK and SK is enough. In fact in some cases it is even enough to just serialize the SK since the PK may already be known to the application. That simplifies the serialization and deserialization of the marker for the next page quite a bit.
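
The PK/SK approach discussed above can be sketched with the JDK alone. This is a minimal illustration, not code from the thread: the class name PageToken and the token layout are hypothetical, and it assumes both key attributes are strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: packs a string partition key and sort key into one opaque page token. */
final class PageToken {
    final String pk;
    final String sk;

    PageToken(String pk, String sk) {
        this.pk = pk;
        this.sk = sk;
    }

    /** Base64url-encodes each part and joins them with '.', which is not in the Base64url alphabet. */
    String encode() {
        return b64(pk) + "." + b64(sk);
    }

    /** Reverses encode(); rejects tokens that do not consist of exactly two parts. */
    static PageToken decode(String token) {
        String[] parts = token.split("\\.", -1); // -1 keeps a trailing empty part
        if (parts.length != 2) {
            throw new IllegalArgumentException("Malformed page token");
        }
        return new PageToken(unb64(parts[0]), unb64(parts[1]));
    }

    private static String b64(String s) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String unb64(String s) {
        return new String(Base64.getUrlDecoder().decode(s), StandardCharsets.UTF_8);
    }
}
```

On the way back in, the decoded values would be wrapped again, for example with AttributeValue.fromS(token.pk) and AttributeValue.fromS(token.sk), under whatever attribute names the table actually uses. Tokens that arrive from a client should be treated as untrusted input.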

@AdeboyeML

@kiritsuku Thanks for the Code Example of Serializing and Deserializing LastEvaluatedKey

@timwhunt

timwhunt commented Feb 3, 2023

I just hit this same issue trying to create a pagination token String that I can pass back and forth between an app and a client. The plan was to use Gson to convert the Map<String, AttributeValue> key from DynamoDB into JSON, and then Base64 encode it. But after the encode/decode round trip, DynamoDB rejected the LastEvaluatedKey when it was passed as an exclusiveStartKey:

software.amazon.awssdk.services.dynamodb.model.DynamoDbException: The provided starting key is invalid: One or more parameter values were invalid: An string set may not be empty

I think the root of the problem is that AttributeValue is built to check the class of lists/maps such as ns, bs, ss to tell if they hold values. The methods like hasNs() have logic like this (from decompiler).

    public final boolean hasNs() {
        return this.ns != null && !(this.ns instanceof SdkAutoConstructList);
    }

When Gson constructs an AttributeValue from JSON it uses ArrayList instead of SdkAutoConstructList.
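
To make the mechanism concrete, here is a small JDK-only sketch. AutoConstructList is a hypothetical stand-in for the SDK's SdkAutoConstructList, and hasValues only mirrors the shape of the decompiled check quoted above; neither is SDK code. It assumes, as described in this comment, that the JSON library fills the unset field with a plain empty ArrayList.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

final class MarkerListDemo {
    /** Hypothetical stand-in for SdkAutoConstructList: an empty list whose class means "field not set". */
    static final class AutoConstructList<T> extends AbstractList<T> {
        @Override
        public T get(int index) {
            throw new IndexOutOfBoundsException();
        }

        @Override
        public int size() {
            return 0;
        }
    }

    /** Same shape as the quoted hasNs(): "set" means non-null and not the marker class. */
    static boolean hasValues(List<String> list) {
        return list != null && !(list instanceof AutoConstructList);
    }

    public static void main(String[] args) {
        // The marker list is reported as "not set".
        System.out.println(hasValues(new AutoConstructList<>()));
        // A plain empty ArrayList, as a generic deserializer would create, is reported as "set".
        System.out.println(hasValues(new ArrayList<>()));
    }
}
```

Under that assumption, an attribute that was never a string set would still look like an empty one after the round trip, which is consistent with the error message above.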

Is it possible to change the logic in hasNs() and similar methods so they are not dependent on the class of the List? That would make encoding/decoding JSON much easier. As a workaround, I've written a custom AttributeValue JSON deserializer that checks the type value and then creates the AttributeValue using builtin methods like fromS().

BTW, just serializing the PK and SK is not a universal solution. In my case I'm using a GSI and the lastEvaluatedKey from DynamoDB included both the PK and SK of the GSI AND the PK from the table's index.
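
One way to cover keys with more than two attributes, such as the GSI case above, is to keep a type tag next to every value. The sketch below is JDK-only and hypothetical throughout (class names, token layout); it relies on the fact that DynamoDB key attributes can only be of type S, N, or B, and it leaves the conversion to and from AttributeValue to the caller.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical codec: name/type/value triples, joined with '.', names and values Base64url-encoded. */
final class TypedKeyToken {
    /** One key attribute: DynamoDB type tag ("S", "N" or "B") plus its value as text. */
    static final class TypedValue {
        final String type;
        final String value; // for "B", the caller is expected to pass the bytes as text, e.g. Base64

        TypedValue(String type, String value) {
            if (!type.equals("S") && !type.equals("N") && !type.equals("B")) {
                throw new IllegalArgumentException("Key attributes must be S, N or B, got: " + type);
            }
            this.type = type;
            this.value = value;
        }
    }

    static String encode(Map<String, TypedValue> key) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, TypedValue> e : key.entrySet()) {
            if (sb.length() > 0) {
                sb.append('.');
            }
            sb.append(b64(e.getKey())).append('.')
              .append(e.getValue().type).append('.')
              .append(b64(e.getValue().value));
        }
        return sb.toString();
    }

    static Map<String, TypedValue> decode(String token) {
        Map<String, TypedValue> key = new LinkedHashMap<>();
        if (token.isEmpty()) {
            return key;
        }
        String[] parts = token.split("\\.", -1); // -1 keeps trailing empty values
        if (parts.length % 3 != 0) {
            throw new IllegalArgumentException("Malformed key token");
        }
        for (int i = 0; i < parts.length; i += 3) {
            // The TypedValue constructor rejects any type tag other than S, N or B.
            key.put(unb64(parts[i]), new TypedValue(parts[i + 1], unb64(parts[i + 2])));
        }
        return key;
    }

    private static String b64(String s) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String unb64(String s) {
        return new String(Base64.getUrlDecoder().decode(s), StandardCharsets.UTF_8);
    }
}
```

When rebuilding the exclusive start key, each decoded entry would be mapped by its tag to the matching factory method, such as AttributeValue.fromS() or AttributeValue.fromN(), so the resulting objects are constructed by the SDK itself rather than by reflection.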

@msayson

msayson commented May 18, 2024

Since May 2023, AWS's Java 2.x SDK includes an Enhanced Document API that simplifies converting pagination tokens between the AWS SDK's objects and JSON strings that can be passed over HTTP.

AWS blog post demonstrating use cases: https://aws.amazon.com/blogs/devops/introducing-the-enhanced-document-api-for-dynamodb-in-the-aws-sdk-for-java-2-x/

Sample code for converting between Map<String, AttributeValue> pagination tokens and JSON strings:

    import software.amazon.awssdk.enhanced.dynamodb.document.EnhancedDocument;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;

    import java.io.UncheckedIOException;
    import java.util.Map;

    /**
     * Convert a JSON string representation of a DynamoDB pagination token to the format required by DynamoDB API calls.
     * @param paginationTokenJson JSON string representing the last paginated API call's last evaluated record key
     * @return Map<String, AttributeValue> exclusive start key for the next paginated DynamoDB scan/query API call
     * @throws UncheckedIOException exception thrown if fail to parse pagination token
     */
    public Map<String, AttributeValue> convertToExclusiveStartKey(final String paginationTokenJson) throws UncheckedIOException {
        return EnhancedDocument.fromJson(paginationTokenJson).toMap();
    }

    /**
     * Convert a DynamoDB attribute value map to a JSON string.
     * @param attributeValueMap DynamoDB item key represented as a map from attribute names to attribute values
     * @return String JSON string representation of the DynamoDB item key
     */
    public String convertToJson(final Map<String, AttributeValue> attributeValueMap) {
        return EnhancedDocument.fromAttributeValueMap(attributeValueMap).toJson();
    }

Expanded discussion at https://www.marksayson.com/blog/serializing-deserializing-dynamodb-pagination-tokens/.

This issue can likely be resolved, since the new classes remove the need for custom serialization/deserialization code for DynamoDB pagination tokens.

@rchache

rchache commented May 31, 2024

Wow that code snippet above is a lifesaver! Thanks
