Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CQLPagingIterator for Amazon Keyspace #3390

Open
li-boxuan opened this issue Dec 15, 2022 · 5 comments
Open

Fix CQLPagingIterator for Amazon Keyspace #3390

li-boxuan opened this issue Dec 15, 2022 · 5 comments

Comments

@li-boxuan
Copy link
Member

li-boxuan commented Dec 15, 2022

Unfortunately, Amazon Keyspace diverges from Apache Cassandra when it comes to paging.

Due to the way Amazon Keyspace works, even if a page iterator returns empty results, there still can be a valid next page with data. This happens when ALLOW FILTERING is used. In JanusGraph, this means when a full scan is triggered. We should fix CQLPagingIterator in CQLKeyColumnValueStore.java.

At the moment, I am not 100% sure whether this would also require a fix from the DataStax java driver which implements paging internally as well.

C.C. @porunov since you did quite a bit of the CQLPagingIterator implementation.

References:
https://stackoverflow.com/questions/69940435/amazon-keyspaces-allow-filtering-without-all-keys-in-node-js
https://docs.aws.amazon.com/keyspaces/latest/devguide/working-with-queries.html#paginating-results
#3378 (reply in thread)

@porunov
Copy link
Member

porunov commented Feb 8, 2023

In this case I don't know how to determine that we finished reading data from Amazon Keyspace (at least I'm not sure how to do that with Datastax Java Driver).
I would imagine that we can improve CQLPagingIterator with providing a function which builds a new statement with the last page state (i.e. currentResultSet.getExecutionInfo().getPagingState()). That said, if all those statements return nothing then it's not clear that we finished or not. Thus, it will lead to infinite loop.
I guess as a workaround we could store the last used page state and the current page state. If they are equal then it means we reached the end (probably).
That means we will always have a single redundant call at the end to verify that we really reached the end.
Not sure how good this solution is. Also, I guess it could really be implemented on DataStax driver side.

@hexa00
Copy link

hexa00 commented Mar 30, 2023

@li-boxuan will that be a problem also for a SparkComputer iterating over all edges for example ?
I'm not sure if it's doing iteration differently?

@li-boxuan
Copy link
Member Author

@hexa00

That won't affect SparkGraphComputer. Spark traversal uses CqlRecordReader to load the data.

@xentripetal
Copy link

Some context on the datastax side, the issue is here

https://github.com/datastax/java-driver/blob/39bf5e363e79d0184b5897d65197202391967bae/core/src/main/java/com/datastax/oss/driver/internal/core/cql/MultiPageResultSet.java#L92

    @Override
    protected Row computeNext() {
      maybeMoveToNextPage();
      return currentRows.hasNext() ? currentRows.next() : endOfData();
    }

where it advances the page then ends the iterator if the next page has no data, regardless of if theres another page.

I'm surprised no one at AWS ran into this issue as this is their officially recommended java driver.

@porunov
Copy link
Member

porunov commented May 22, 2023

The issue could potentially be fixed in #3760 because the pagination approach is changed in that PR and we don’t use sync pagination anymore. I assume after that PR is merged the issue could be potentially fixed, but I personally didn’t use Amazon Keyspaces, so can’t tell for sure. Anyone who is interested in Amazon Keyspaces is free to try it afterwards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants