Google cloud datastore slow (>2 sec) with simple query from App engine Flexible environment #154

Closed
rajeshshetty opened this issue Dec 1, 2016 · 28 comments

Comments

@rajeshshetty

This is the (node.js) code I'm using to query (from a simple datastore with just a single entity):

var Datastore = require('@google-cloud/datastore');
var projectId = '';

// Create a client for the project.
var datastoreClient = Datastore({
  projectId: projectId
});

// Fetch at most one entity of kind 'Test'.
var query = datastoreClient.createQuery('Test').limit(1);

// Time the round trip of a single query.
console.time('query');
query.run(function (err, test) {
  if (err) {
    console.log(err);
    return;
  }
  console.timeEnd('query');
});

The first request takes more than 2 seconds; subsequent requests are faster.
If you wait 4 minutes or more between requests, query times go back to more than 2 seconds.

@rajeshshetty rajeshshetty changed the title from "Google cloud datastore slow (>2 sec) with simple query from compute engine" to "Google cloud datastore slow (>2 sec) with simple query from App engine Flexible environment" on Dec 1, 2016
@eddavisson
Contributor

Do you see this performance consistently (i.e. each time you wait 4 minutes)?

@rajeshshetty
Author

Yes, whenever the datastore is idle for more than 4 minutes, query times go back to > 2 secs.

@eddavisson eddavisson reopened this Dec 5, 2016
@eddavisson
Contributor

Sorry, didn't mean to close the issue.

2 secs is slower than we'd expect for most requests, even if the server-side caches are cold. I'm wondering if it may be a client connection setup cost. @stephenplusplus, do you have any insight?

@rajeshshetty, what region is your project in and where is your client running (e.g. GCE instance, etc.)?

@stephenplusplus

The only delay we introduce is the authentication process of fetching a token. Once we have that, we won't waste many cycles dealing with auth again, until the token is expired. Is it possible after ~4 minutes, a token is invalidated?

If not, this might go deeper than the google-cloud-node layer and into the gRPC layer, which handles the opening and closing of channels. @murgatroid99 -- will a channel close itself after a certain amount of idling?

@eddavisson
Contributor

I believe the OAuth tokens are good for 60 minutes by default.
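
To make the caching behaviour under discussion concrete, here is a minimal sketch. It is not the actual google-cloud-node code: `createTokenCache`, `fetchToken`, and the injectable `now` clock are hypothetical names, and the 60-minute lifetime is simply the default mentioned here, taken as an assumption.

```javascript
// Hypothetical helper: caches one token promise until an assumed TTL elapses.
function createTokenCache(fetchToken, ttlMs, now = Date.now) {
  let cached = null; // { promise, expiresAt }

  return function getToken() {
    if (cached === null || now() >= cached.expiresAt) {
      const entry = { promise: fetchToken(), expiresAt: now() + ttlMs };
      // Do not keep a failed fetch around; the next call retries.
      entry.promise.catch(() => {
        if (cached === entry) cached = null;
      });
      cached = entry;
    }
    return cached.promise;
  };
}

// Usage with a stand-in fetcher (a real one would call the auth server).
let fetchCount = 0;
const fakeFetch = () => Promise.resolve(`token-${++fetchCount}`);
let fakeNow = 0;
const getToken = createTokenCache(fakeFetch, 60 * 60 * 1000, () => fakeNow);

getToken(); // cold: triggers a fetch
getToken(); // warm: reuses the cached promise
fakeNow = 61 * 60 * 1000;
getToken(); // past the assumed lifetime: triggers a second fetch
console.log(fetchCount);
```

With a cache like this, only the first call after startup or expiry pays the remote round trip, so a token lifetime of about an hour would not by itself explain a slowdown after 4 idle minutes.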

@murgatroid99

It's possible that this is a channel reconnection delay. @ctiller Would we expect the C core to disconnect and reconnect after that kind of idle period?

@rajeshshetty
Author

The project is in us-central region and the client is running from India. We also tried creating a new project in asia-northeast region but it failed when deploying.

@GregorioDiStefano

GregorioDiStefano commented Jan 4, 2017

I too am having very slow responses, regardless of the time between requests.
I have ~6000 items in my Entity, and simply filtering with ">=" takes ~7-10 seconds.

What additional information can I provide in order to figure this out?

@eddavisson
Contributor

@GregorioDiStefano The expected performance potentially depends on a number of factors:

  • How many results you're fetching (and in how many batches, i.e. RunQuery requests).
  • The exact query you're running.
  • The size of the entities you're fetching.
  • The location of your project and where the client is running.

@GregorioDiStefano

Please disregard my comment :)

@ro-savage

@eddavisson Is there any way to view or test query performance? For example, being able to run a query through a console or command line and see the results, or a way to view query logs that include performance?

@kkotak

kkotak commented May 20, 2017

Hi,

We've just started using datastore and are seeing abysmal response times of 2-10 seconds for an empty datastore in the central region, calling from a Firebase cloud function running in the same region. A simple get by ID with only one record in the datastore takes at least 2 seconds. We hope we're missing something obvious here. Is Cloud Datastore not designed for real-time use?

@ghost

ghost commented Jun 9, 2017

I'm also seeing significant latency (~1.5 seconds per query) in the central region on a small dataset when using cloud functions and when running the code locally.

@kkotak

kkotak commented Jun 9, 2017

Looking at the documentation for Google Datastore and the UI for the Admin console (sparse and dated), it feels like Google is not taking it seriously or will be replacing it in the near future.

@wsh
Contributor

wsh commented Jul 14, 2017

Hey there--I'm a member of the Cloud Datastore team. I promise we take this very seriously (I know, I work on it!).

@ro-savage one thing you might consider doing is firing up the Python console and using the driver directly. We don't have something specifically targeted for performance measurement that I'm aware of but I'll look into it and let you know if I find something.

@kkotak @richardowright could you provide more details on your queries, please? (Code, ideally.)

@ghost

ghost commented Jul 14, 2017

example_code.zip

Hope this helps. It should be pretty easy to test by adjusting the deploy function and project name in db.js. An easy call to demonstrate it would be https://URL.cloudfunctions.net/person?address_zip=12345&last_name=Wright.

Also, as an FYI, I was able to improve performance significantly by authorizing with a key file (i.e. using keyFilename in the store) and increasing the size of the function.

@wizeird

wizeird commented Jul 21, 2017

@wsh

I am also having an issue with speed with queries to the datastore from HTTP trigger functions.

It isn't limited to running queries though. Making http requests using the "request" module takes a long time too.

I don't think it's a matter of "cold starts", caching, authentication, or logging/other activities done during requests that could slow it down. I've searched around and saw many people having similar issues but none coming to solutions. There are many suggestions such as the aforementioned, but no solutions to the core issue.

I could understand 500ms or even a second, but it takes many seconds, sometimes 10+ seconds, for datastore queries (very small dataset) and HTTP requests to external APIs. I've searched, and I think the following people had issues that may not be exactly the same but probably share a root cause:
googleapis/google-cloud-node#2284
https://stackoverflow.com/questions/39878311/google-cloud-datastore-runquery-is-extremely-slow-for-app-engine-nodejs-app
https://stackoverflow.com/questions/42726870/firebase-cloud-functions-is-very-slow
https://stackoverflow.com/questions/42934796/cloud-functions-for-firebase-performance
googleapis/google-cloud-node#2374
https://stackoverflow.com/questions/40454958/google-cloud-datastore-slow-800ms-with-simple-query-from-compute-engine

I don't think it's the node.js datastore package. Instead, I think it's related to the infrastructure and/or code base that Node.js Google Cloud HTTP trigger functions, Node.js Firebase HTTP trigger functions, and Node.js Google Compute Engine applications have in common and use to make HTTP requests in general. Something they share is causing HTTP requests to be slow, I think.

@ro-savage

ro-savage commented Jul 31, 2017

@wsh

I ran some tests on a new datastore. It contains only 1 kind, with 1 entity, with 1 value.

I then ran a lookup (by id) and a query (property = value) with both the JS library @google-cloud/datastore and straight HTTP requests (already authenticated, using a valid token).

These were all run separately: node datastore-test.js was run three times (not 3 queries in the one file).
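
For readers who want to reproduce the "straight HTTP" side, here is a minimal sketch of a lookup against the Datastore v1 REST API. The original test script is not shown in this thread, so this is an assumption about its general shape: `buildLookupRequest` and `sendLookup` are hypothetical names, the access token must be obtained separately, and the global `fetch` assumes a recent Node.js release.

```javascript
// Hypothetical helper: build the URL and fetch options for a lookup by numeric ID.
function buildLookupRequest(projectId, accessToken, kind, id) {
  return {
    url: `https://datastore.googleapis.com/v1/projects/${encodeURIComponent(projectId)}:lookup`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json',
      },
      // Numeric IDs are int64 values, which the JSON API carries as strings.
      body: JSON.stringify({ keys: [{ path: [{ kind, id: String(id) }] }] }),
    },
  };
}

// Hypothetical helper: send the lookup and return the HTTP status and parsed body.
// Assumes a global fetch is available.
async function sendLookup(projectId, accessToken, kind, id) {
  const { url, options } = buildLookupRequest(projectId, accessToken, kind, id);
  const response = await fetch(url, options);
  return { status: response.status, body: await response.json() };
}
```

Because the token is passed in explicitly, this path never pays for credential discovery or token fetching, which matters when comparing it with the library.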

Look Up Results

/* Run 1 */
HTTP Call: 423.94127ms
JS Lib: 1939.758869ms

/* Run 2 */
HTTP Call: 139.06255ms
JS Lib: 1363.45864ms

/* Run 3 */
HTTP Call: 235.541575ms
JS Lib: 1403.158227ms

Query Results

/* Run 1 */
HTTP Call: 447.550193ms
JS Lib: 1794.338112ms

/* Run 2 */
HTTP Call: 262.554809ms
JS Lib: 1069.181354ms

/* Run 3 */
HTTP Call: 574.71431ms
JS Lib: 1418.149885ms

For reference, a straight POST to the datastore that receives a 404 response takes around 20ms-70ms.

So, ignoring ping time, this appears to show that with a datastore containing a single entity:

  • An HTTP call takes approximately 150ms for a lookup and 400ms for a query.
  • The JS lib takes approximately 1500ms for a lookup and 1500ms for a query.
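
As a cross-check on these rough figures, the raw runs above can be averaged directly. This is only a small sketch: the `runs` object and `mean` helper are illustrative names, and the numbers are copied from the three runs reported in this comment.

```javascript
// Timings in milliseconds, copied from the three runs reported above.
const runs = {
  lookup: {
    http: [423.94127, 139.06255, 235.541575],
    jsLib: [1939.758869, 1363.45864, 1403.158227],
  },
  query: {
    http: [447.550193, 262.554809, 574.71431],
    jsLib: [1794.338112, 1069.181354, 1418.149885],
  },
};

// Arithmetic mean of a non-empty list of numbers.
const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;

for (const [operation, byClient] of Object.entries(runs)) {
  for (const [client, timings] of Object.entries(byClient)) {
    console.log(`${operation} via ${client}: mean ${mean(timings).toFixed(0)}ms`);
  }
}
```

With only three samples per case, the means are a rough indication rather than a benchmark.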

This is really too slow for use cases where you don't have a server running continuously (e.g. Google Cloud Functions, which should be the perfect partner for Datastore).

We used datastore because we liked a lot of the Google Cloud Platform and needed a noSQL database, and really liked the per query costs. But this is much too slow.

@ro-savage

ro-savage commented Jul 31, 2017

To add to this: I've made the same lookups and queries, but this time one after another in a single node process rather than in a separate process each time (i.e. more similar to running on a server).

Lookup Results

// JS Library
JS Lib: 1891.605042ms
JS Lib: 28.008136ms
JS Lib: 159.737324ms
JS Lib: 159.813774ms
JS Lib: 27.84496ms

// HTTP calls
HTTP Call: 475.195449ms
HTTP Call: 187.042799ms
HTTP Call: 185.007734ms
HTTP Call: 190.83722ms
HTTP Call: 189.986801ms

Query Results

// JS Library
JS Lib: 1859.259317ms
JS Lib: 48.649343ms
JS Lib: 187.246999ms
JS Lib: 171.868148ms
JS Lib: 35.336654ms

// Http calls
HTTP Call: 406.35773ms
HTTP Call: 177.868567ms
HTTP Call: 181.318859ms
HTTP Call: 169.486534ms
HTTP Call: 212.036648ms

As we can see, it's much faster with the JS lib once one request has already been made (authenticated?).

Is there a way to speed up this first lookup? If you are using something like Google Cloud Functions, each request will be a different node process.
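
One way to avoid paying the first-call cost more than once per process is to create the client lazily and share it between requests. The sketch below uses a stand-in factory instead of the real Datastore constructor, and `once`, `getClient`, and `handleRequest` are hypothetical names. It only helps when the platform reuses a process for several requests, which is an assumption to verify for your environment.

```javascript
// Hypothetical helper: run `factory` at most once per process and reuse its result.
function once(factory) {
  let created = false;
  let instance;
  return function get() {
    if (!created) {
      instance = factory();
      created = true;
    }
    return instance;
  };
}

// Stand-in for an expensive client constructor; in real code the factory
// might be something like `() => Datastore({ projectId: projectId })`.
let constructions = 0;
const getClient = once(() => {
  constructions += 1;
  return { id: constructions };
});

// Hypothetical request handler: every call in the same process shares one client.
function handleRequest() {
  return getClient();
}

handleRequest();
handleRequest();
console.log(constructions);
```

If the factory throws, nothing is cached, so the next request tries again.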

@fengli79

fengli79 commented Aug 1, 2017

@ro-savage, if your HTTP test sets an OAuth token explicitly while the JS test uses the auth lib to get the default credentials, I think the latter should be slower, as it will use the application default credentials and need to make a remote call to get the OAuth token if it's not cached already.

An apples-to-apples comparison would be to change the JS test to use the same OAuth token too.
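
Short of that, one way to see how much of the first call is authentication is to time token acquisition and the query as separate steps. This is a minimal sketch with stand-in functions: `timeStep` and `timeAuthAndQuery` are hypothetical helpers, and the fake `getToken` and `runQuery` steps would need to be replaced with real calls.

```javascript
// Hypothetical helper: time one async step and return its result with the elapsed ms.
async function timeStep(label, step) {
  const start = process.hrtime.bigint();
  const value = await step();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { label, ms, value };
}

// Time token acquisition and the query as two separate steps.
async function timeAuthAndQuery(getToken, runQuery) {
  const auth = await timeStep('auth', getToken);
  const query = await timeStep('query', () => runQuery(auth.value));
  return { auth, query };
}

// Usage with stand-ins; real code would fetch a token and call Datastore.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const demo = timeAuthAndQuery(
  async () => { await sleep(20); return 'fake-token'; },
  async (token) => { await sleep(5); return { token, rows: 1 }; }
);
demo.then(({ auth, query }) => {
  console.log(`auth: ${auth.ms.toFixed(1)}ms, query: ${query.ms.toFixed(1)}ms`);
});
```

If the auth step dominates on the first call and is near zero on later calls, the gap between the two tests is mostly token fetching rather than query execution.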

@wsh
Contributor

wsh commented Aug 2, 2017

What @fengli79 said. As a general note, @wizeird, @ro-savage, and @richardowright: this repo is only for the low-level Java and Python Datastore clients. It seems like you all are using our Node libraries.

I'm closing this issue because this repo is the wrong place for this discussion, not because it isn't worth having. In particular, I (and the other folks on the Datastore team) don't know nearly as much as our colleagues who focus on Node and (in the case of Functions) our compute environments. Please re-file this issue in https://github.com/GoogleCloudPlatform/google-cloud-node.

@wsh wsh closed this as completed Aug 2, 2017
@ro-savage

@wsh, this isn't an issue with Node so much as with authentication, which I assume is much lower level.

Who would be responsible for datastore auth when using cloud functions?

(See googleapis/google-cloud-node#2374)

@kohago

kohago commented Sep 14, 2017

Is it because of auto-scaling that it becomes faster after a few minutes?
Or is it magic?

I ran into the same problem: one simple select took 2~3 seconds, too slow to use!
Then I took a look around the web,
...
After a few minutes, I ran the test one more time: too slow to use.
After a few minutes, I ran the test one more time: too slow to use.

Suddenly, it became much faster! More than 10 selects per second can be done!
I changed nothing.

The test source(java):

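import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;
import java.util.stream.IntStream;

Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
String sql = "select __key__ from Some where __key__ = Key(....)";
Query<?> query = Query.newGqlQueryBuilder(sql).setAllowLiteral(true).build();

// Run the same keys-only query 1000 times in a row.
IntStream.rangeClosed(1, 1000).forEach(i -> {
    QueryResults<?> results = datastore.run(query);
    if (!results.hasNext()) {
        System.out.println("no value in datastore");
    } else {
        System.out.println(" value in datastore");
    }
});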

@kohago

kohago commented Sep 14, 2017

Cold start! Heat up!
#5

@charly37

Same here. Response times are very slow and vary widely during the day (sometimes answers are very fast and sometimes they take several seconds). We did not test the service before starting our migration, hoping that it would work fine (which is usually the case for Google services), but now we will have to put our migration on hold.

@zappan

zappan commented Jan 2, 2018

Experiencing the same issues. For reference, cross-linking an issue in the node.js datastore client library where they're tackling this: googleapis/nodejs-datastore#9

@lyx-x

lyx-x commented Jan 2, 2019

It seems that @kohago is right. Fetching a long-standing entity is much faster than getting a recently updated item, but I have no idea why (consistency?).

@Sylver11

@wsh you are suggesting that this problem does not occur with the Python client. This is not correct. I experience exactly the same latency problem using the python client (google-cloud-firestore 2.11.1). Has this received any further attention within your team?
