[DataStore] CursorWindowAllocationException crash for large datasets #944

Closed
upachecog opened this issue Oct 31, 2020 · 7 comments
Labels: bug (Something isn't working), datastore (DataStore category/plugins)


@upachecog

Hey guys, I have some issues and usage questions:

What I am doing is just calling the first query method on the DataStore of a single model, and at that moment Amplify starts to sync/download all the data from all the models.

Issue description:

The problem I noticed is that when the amount of data in DynamoDB started to increase, the first synchronization started to fail.

We have two main models, Location and Activities. Each location has N activities and they will be assigned to each user on a daily basis.

We made some calculations and found that each user will have approximately 1350 activities assigned per month across 150 different locations. We want to pre-populate the database with locations and activities for a full month for each user, so each month adds about 1500 rows per user (1350 activities plus 150 locations). With this amount of data, I found the following issues just in the first sync:

First sync: this happens when the user logs in for the first time and I perform the first query. Once it completes, everything seems to work fine.

First query:

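import android.util.Log
import com.amplifyframework.core.Amplify
import com.amplifyframework.core.model.query.Where
import com.amplifyframework.datastore.DataStoreException
import com.amplifyframework.datastore.generated.model.Ubicacion

fun getTaskList(
    onResponse: (value: List<Ubicacion>) -> Unit,
    onFailure: (value: DataStoreException) -> Unit
) {
    Amplify.DataStore.query(
        Ubicacion::class.java,
        Where.matches(Ubicacion.FECHA.eq(Utils.getCurrentDateString())),
        { matches ->
            // Drain the iterator once so the log shows the items rather than the iterator object.
            val tasks = matches.asSequence().toList()
            Log.i(
                "TaskDataStore",
                "Obtained today's ${Utils.getCurrentDateString()} task list: $tasks"
            )
            onResponse(tasks)
        },
        { onFailure(it) }
    )
}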

Issues:

  1. On low-end smartphones (Samsung Galaxy A8, A10 & LGE LM-K410), I get this error:
CursorWindowAllocationException
Cursor window allocation of 2097152 bytes failed. # Open Cursors=442 (# cursors opened by this proc=442)
  2. On mid-range smartphones, I sometimes get a TimeOutException like the one in this issue: AWSDataStorePlugin fail to initialize and app crashes #563
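
For a sense of scale, here is a rough sketch of the arithmetic behind the CursorWindowAllocationException message above. It assumes that every open cursor holds one full-size window, which the log does not confirm; the log only states the size of the failed allocation and the number of open cursors. The function name `cursorWindowBytes` is hypothetical.

```kotlin
// Figures copied from the error message: the size of the allocation that
// failed and the number of cursors open in the process.
const val CURSOR_WINDOW_BYTES = 2_097_152L
const val OPEN_CURSORS = 442L

// Assumption: each open cursor holds exactly one window of the given size.
fun cursorWindowBytes(windowBytes: Long, openCursors: Long): Long =
    windowBytes * openCursors

fun main() {
    val totalBytes = cursorWindowBytes(CURSOR_WINDOW_BYTES, OPEN_CURSORS)
    val totalMiB = totalBytes / (1024 * 1024)
    println("Cursor windows: $totalBytes bytes (~$totalMiB MiB)")
    // prints "Cursor windows: 926941184 bytes (~884 MiB)"
}
```

Under that assumption, the open cursors alone would account for several hundred MiB, which may explain why the failure shows up first on devices with less memory.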

Expected result:

Amplify only syncs/downloads the locations and activities of this single day, for this user, so the app doesn't have memory issues. OR Amplify syncs/downloads the data in small chunks.

Current result

Amplify syncs/downloads all the previously registered information, and on low- and mid-range smartphones the application crashes, even though I only queried today's locations.

Usage question:

Is there a way to implement this with the current version of the API, or would this be a new feature?

@jamesonwilliams
Contributor

Hi @upachecog!

calling the first query method on the DataStore of a single model, and at that moment Amplify starts to sync/download all the data from all the models.

As of the last few versions of the DataStore, this is the expected behavior. Whenever you invoke a DataStore API, if the sync engine is not already running, it will be started.

Expected result:
Amplify only syncs/downloads the locations and activities of this single day, for this user.

@richardmcclellan is implementing a feature called "selective sync" right now, which should address this problem. With it, you would be able to define filters to select the data returned by the DataStore's initial synchronization calls. In your case, you could leverage a timestamp of some kind to only get recent data. This work is already complete for the JavaScript library, and you can see it here: aws-amplify/amplify-js#7001.

@jamesonwilliams added the bug (Something isn't working) and datastore (DataStore category/plugins) labels on Nov 2, 2020
@richardmcclellan changed the title from "Amplify Crashes on first synchronization when find too much data" to "[DataStore] CursorWindowAllocationException crash for large datasets" on Nov 4, 2020
@cmllamosas

cmllamosas commented Nov 10, 2020

@jamesonwilliams We found another use case that we think is related to this issue. Maybe you can help us clarify whether it is.

https://sentry.io/share/issue/e5b30629b5654f96806593bde3b23b3b/

We are testing on the following device.

Architecture armeabi-v7a
Architectures [armeabi-v7a, armeabi]
Battery Level 96%
Boot Time 2020-10-30T23:27:52.668Z
Brand lge
Charging False
Connection Type wifi
Family LM-K410
Free Memory 1.3 GB
Free Storage 10.0 GB
Id fd392f59c8e37b6f
Language es_MX
Low Memory False
Manufacturer LGE
Memory Size 2.8 GB
Model LM-K410 (PKQ1.190522.001)
Model Id PKQ1.190522.001
Name LGE LM-K410
Online True
Orientation portrait
Screen Density 1.75
Screen DPI 280
Screen Height Pixels 1459
Screen Resolution 1459x720
Screen Width Pixels 720
Simulator False
Storage Size 16.4 GB
Timezone America/Mexico_City

When we enter data through AppSync into DynamoDB, the app begins to crash. It only stops crashing after we stop entering information and reopen the application several times. I don't know if this is related to this issue.

Use case, expected result:
1.- We assign tasks to a specific user
2.- If the operation had errors, we have to update, delete or create new tasks so that the user who uses the app has the updated information
3.- The user will see the new related data when the sync finishes

Use case, current result:
1.- We assign tasks to a specific user
2.- If the operation had errors, we have to update, delete or create new tasks so that the user who uses the app has the updated information
3.- The user sees the application crash multiple times until the sync finishes

Questions:

1.- Is there a way to work around this problem?
2.- Is there an approximate release date for this functionality?
3.- Is there anything else we can do to help?

We would like to test our application in a controlled production environment, but these types of errors leave us uncertain.

file_log.txt

@richardmcclellan
Contributor

richardmcclellan commented Nov 10, 2020

Hi @upachecog, here's a couple things you should check:

  • Make sure you're using the latest version of Amplify (1.5.0). Prior to 1.3.0, DataStore would only fetch the first 1000 items for each model. Now, all items are fetched, via AppSync's support for pagination.

  • Regarding the CursorWindowAllocationException: this sounds like a bug on our side, related to running out of memory on the device, so we will look into that. As a workaround, you could try lowering syncMaxRecords to limit the number of records that are synced to your device (the default is 10000), like this:

Amplify.addPlugin(AWSDataStorePlugin(DataStoreConfiguration.builder()
        .syncMaxRecords(5000)
        .build()))
Amplify.configure(applicationContext)
  • DataStore.query only returns whatever is in the SQLite table when you call it - it doesn't wait for the sync with the backend to complete. Listen for the SYNC_QUERIES_READY event to know when the sync is complete, and then call DataStore.query:
// Imports needed by the methods below. A TAG constant and a generated Todo model are assumed.
import android.util.Log;
import com.amplifyframework.core.Amplify;
import com.amplifyframework.datastore.DataStoreChannelEventName;
import com.amplifyframework.hub.HubChannel;

// If you're only interested in the remote data:
private void queryForRemoteData() {
    start();
    queryAfterReady();
}

// If you'd like to get the local cached data first, followed by the remote data once it's available:
private void queryForLocalDataThenRemoteData() {
    query();
    queryAfterReady();
}

// Runs query() once the SYNC_QUERIES_READY event arrives on the DataStore Hub channel.
private void queryAfterReady() {
    Amplify.Hub.subscribe(HubChannel.DATASTORE,
            hubEvent -> DataStoreChannelEventName.SYNC_QUERIES_READY.toString().equals(hubEvent.getName()),
            hubEvent -> query()
    );
}

private void start() {
    Amplify.DataStore.start(
        () -> Log.d(TAG, "DataStore has been started.  Subscribe to `SYNC_QUERIES_READY` to know when sync is complete"),
        error -> Log.e(TAG, "Failure starting DataStore", error)
    );
}

private void query() {
    Amplify.DataStore.query(Todo.class,
        result -> { /* Display results on your UI */ },
        error -> Log.e(TAG, "Query failed", error)
    );
}
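
To illustrate the first two points (paginated fetching and the syncMaxRecords cap), here is a minimal, self-contained Kotlin sketch. It is not DataStore's actual implementation: `Page`, `fetchPage`, and `syncAll` are hypothetical names, the "backend" is just a range of integers, and the page size of 1000 is an assumption chosen to echo the old 1000-item limit.

```kotlin
// Hypothetical stand-in for one page of a paginated sync query.
data class Page(val items: List<Int>, val nextToken: Int?)

// Hypothetical backend that serves `total` records in pages of `pageSize`.
// The token is simply the index of the next unread record.
fun fetchPage(total: Int, pageSize: Int, token: Int?): Page {
    val start = token ?: 0
    val end = minOf(start + pageSize, total)
    return Page((start until end).toList(), if (end < total) end else null)
}

// Follows nextToken until the data runs out or maxRecords is reached.
fun syncAll(total: Int, pageSize: Int, maxRecords: Int): List<Int> {
    val synced = mutableListOf<Int>()
    var token: Int? = null
    do {
        val page = fetchPage(total, pageSize, token)
        // Never keep more than the remaining budget from a page.
        synced += page.items.take(maxRecords - synced.size)
        token = page.nextToken
    } while (token != null && synced.size < maxRecords)
    return synced
}

fun main() {
    // More data than the cap: syncing stops at maxRecords.
    println(syncAll(total = 12_000, pageSize = 1_000, maxRecords = 5_000).size) // 5000
    // Less data than the cap: everything is fetched, one page at a time.
    println(syncAll(total = 1_500, pageSize = 1_000, maxRecords = 5_000).size)  // 1500
}
```

In this model, lowering the cap reduces how much data ends up on the device, but it does not choose which records are kept.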

@richardmcclellan
Contributor

@cmllamosas your report sounds like a different issue. It looks like some sort of network error occurred during the sync process and shut it down. To recover from this, calling any of the DataStore methods (query, observe, save, delete) should trigger a retry. Longer term, we plan to add automatic retry logic as well, to allow for automatic recovery.
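
Until automatic retry is available, an app could wrap its own calls in a retry loop. The following is a minimal sketch of retry with exponential backoff in plain Kotlin, not something Amplify provides: `retryWithBackoff` and its parameters are hypothetical, and the wrapped operation is a synchronous stand-in for what would be a callback-based DataStore call.

```kotlin
import java.io.IOException

// Hypothetical helper: runs `operation` up to `maxAttempts` times and
// doubles the wait after each failed attempt.
fun <T> retryWithBackoff(
    maxAttempts: Int = 4,
    initialDelayMs: Long = 500,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    operation: (attempt: Int) -> T
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMs = initialDelayMs
    var lastError: Exception? = null
    for (attempt in 1..maxAttempts) {
        try {
            return operation(attempt)
        } catch (e: Exception) {
            lastError = e
            if (attempt < maxAttempts) {
                sleep(delayMs)
                delayMs *= 2
            }
        }
    }
    // Every attempt failed, so surface the most recent error.
    throw lastError!!
}

fun main() {
    // Simulated call that fails twice with a network error, then succeeds.
    // The sleep is replaced with a no-op so the example runs instantly.
    val result = retryWithBackoff(sleep = {}) { attempt ->
        if (attempt < 3) throw IOException("simulated network error")
        "synced on attempt $attempt"
    }
    println(result) // synced on attempt 3
}
```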

@upachecog
Author

upachecog commented Nov 22, 2020

@richardmcclellan Thanks for your answer and help.

  • Now I am using version 1.6.2 and I've implemented selective sync.

I have a couple of questions, though:

  1. I want to sync only the Locations I have to visit today, so I added this syncExpression:
.syncExpression(Location::class.java) {
     Location.DATE.eq(Utils.getCurrentDateString())
}

The question is: How can I filter my Activities based on the first query if I don't have a Date field in the Activity model but I have a @connection to the Location Model?

Based on the following query, which I found here:

Amplify.DataStore.query(Comment::class.java, Where.matches(Post.STATUS.eq(PostStatus.ACTIVE)),
    {
        while (it.hasNext()) {
            val comment: Comment = it.next()
            val content: String = comment.content
            Log.i("MyAmplifyApp", "Content: $content")
        }
    },
    { Log.e("MyAmplifyApp", "Query failed.", it) }
)

I tried to do something like the following, but of course it doesn't work:

.syncExpression(Activity::class.java) { 
    Location.DATE.eq(Utils.getCurrentDateString())
}
  2. There are some other models which I don't need to sync; I only use mutations there.
    The question is: Using selective sync, can I completely disable synchronization to these models?

@richardmcclellan
Contributor

richardmcclellan commented Nov 22, 2020

How can I filter my Activities based on the first query if I don't have a Date field in the Activity model but I have a @connection to the Location Model?

This is not currently supported, and likely won't be. DataStore relies on the AppSync service, which is backed by DynamoDB. DynamoDB does not support joins across tables, by design. Instead, I'd consider how you can modify your schema so that you can filter Activities based on fields on the Activity object. Adding a Date field to the Activity model might be one way to do that.
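
As a rough illustration of that schema change, here is a self-contained Kotlin sketch with simplified, hypothetical models. The real `Location` and `Activity` classes would be generated from your schema, and the field names used here are assumptions. The idea is to copy the parent's date onto each Activity when it is created, so that the filter only touches the Activity's own fields.

```kotlin
// Hypothetical simplified models; the real ones are generated by Amplify codegen.
data class Location(val id: String, val date: String)
data class Activity(val id: String, val locationId: String, val date: String)

// Copies the parent's date onto the activity at creation time (denormalization).
fun activityFor(location: Location, id: String) =
    Activity(id = id, locationId = location.id, date = location.date)

// Filters on the activity's own field, with no lookup of the parent Location.
fun activitiesOn(date: String, activities: List<Activity>) =
    activities.filter { it.date == date }

fun main() {
    val today = Location("loc-1", "2020-11-22")
    val tomorrow = Location("loc-2", "2020-11-23")
    val all = listOf(
        activityFor(today, "act-1"),
        activityFor(today, "act-2"),
        activityFor(tomorrow, "act-3")
    )
    println(activitiesOn("2020-11-22", all).map { it.id }) // [act-1, act-2]
}
```

With such a field in place, a sync expression for Activity could filter on it in the same way as the Location expression in your comment. The trade-off is that the copied date has to be kept in step if a Location's date ever changes.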

The question is: Using selective sync, can I completely disable synchronization to these models?

This is not currently supported, though I like the idea! We could probably allow you to do something like:

.syncExpression(Location::class.java) { QueryPredicates.NONE }

Could you open a separate issue with this feature request if this would be useful for you?

As a workaround for now, you could apply a predicate that you know will not match any records, like so:

.syncExpression(Location::class.java) { Location.ID.eq("") }

This would still have the overhead of making a syncQuery API call, as well as subscribing to each operation for that model, but it would prevent any data from being synced into the local store.

@upachecog
Author

upachecog commented Nov 25, 2020

@richardmcclellan I implemented the workaround as you suggested and it seems to work fine. I also opened issue #987 for the feature request.

Thank you for your help!
