High latency #5

Closed
sudhirj opened this Issue Jun 5, 2013 · 55 comments

sudhirj (Contributor) commented Jun 5, 2013

b/8334662

I'm consistently seeing high latency when using GCD - about 1.5 seconds on average for queries (10 items or fewer) and single-item writes (< 1 KB).

My ping time to the API endpoint is about 50 ms, so discounting a 100 ms round trip, that still leaves more than a second of GCD latency. This simply won't work for a server environment, certainly not one that scales. It's very surprising, because I was expecting latencies much closer to GAE's: https://code.google.com/status/appengine/detail/hr-datastore/2013/06/05#ae-trust-detail-hr-datastore-query-latency

Can we get a dashboard like the GAE HR Datastore status board? Possibly measuring latencies from Google Compute Engine instances and a few different AWS Regions?

proppy (Member) commented Jun 5, 2013

We are actively working on reducing the network latency between Compute Engine and Cloud Datastore, in particular by better colocating the two services.

But the numbers you are getting are far higher than the ones I'm seeing on average from Compute Engine (< 140 ms median for a write of one 1 KB random entity, and < 100 ms median for a runQuery of 10 entities).

Can you share more details about how you measure the latency and from which location you are making the API call?

Thanks in advance.

PS: feel free to open a separate issue for the dashboard feature request.
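A minimal sketch of the kind of measurement being asked about here, in Node (the names are illustrative, and `fn` is a placeholder for whatever Datastore RPC you issue): time each call and report the median, which is less distorted by occasional multi-second outliers than the mean.

```javascript
// Median of a list of latency samples, in milliseconds.
function median(samplesMs) {
    const sorted = [...samplesMs].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Time `n` sequential calls of an async request function and
// return the per-call wall-clock latencies.
async function timeCalls(fn, n) {
    const samplesMs = [];
    for (let i = 0; i < n; i++) {
        const start = Date.now();
        await fn(); // e.g. one lookup or runQuery round trip
        samplesMs.push(Date.now() - start);
    }
    return samplesMs;
}
```

Running the calls sequentially (awaiting each one) keeps the samples independent, so the median reflects single-request round-trip time rather than concurrency effects.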

dgay42 commented Jun 5, 2013

It's worth noting: if there's not much GCD traffic, there's a significant "cold start" penalty on latency. To get realistic latency numbers, I would recommend having a background traffic process running at a few requests per second (doesn't matter much what requests, we've used a beginTransaction+commit pair for our own testing).
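The background-traffic suggestion above can be sketched as a small keep-warm loop; `fireRequest` here is a placeholder for whatever cheap call you choose (e.g. a beginTransaction+commit pair), not a specific client-library API.

```javascript
// Keep-warm sketch: fire roughly `rps` background requests per second so
// latency measurements don't include the cold-start penalty.
// Returns a function that stops the loop.
function keepWarm(fireRequest, rps) {
    const timer = setInterval(() => {
        // Fire-and-forget; log errors so a failed request never kills the loop.
        Promise.resolve()
            .then(fireRequest)
            .catch(err => console.error('warm-up request failed:', err));
    }, 1000 / rps);
    return () => clearInterval(timer);
}
```

Run this in the background for a minute or two before collecting numbers, then stop it once the benchmark finishes.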

sudhirj (Contributor) commented Jun 14, 2013

I think I've accounted for the cold start, but I made only about 100 requests in sequence. I might have hit it on a particularly 'cold' day, though; I'll try again. Also, Google's definition of an active store probably has a lot more zeros on the requests/second count.

I measured by turning on the HTTP logger that's included in the Ruby example and looking at the output of sequential runs - definitely not a well-thought-out benchmark, but enough to show what's going on.

vierjp commented Jun 20, 2013

I tried the Cloud Datastore API from Compute Engine.

instance type: f1-micro, zone: us-central1-a

I tried with this code:
https://github.com/vierjp/vier-gcd-test-client/blob/master/vier-gcd-test-client/src/main/java/ClientTest6.java

This code puts a small entity 1000 times and queries the latest 10 entities.

In my experiment, it takes about 150-200 ms on average to put an entity, and about 500-700 ms to query 10 entities.

By comparison, it takes about 40-50 ms on average to put an entity from App Engine with the low-level API.

I hope the Cloud Datastore API's speed from Compute Engine will be improved.

sudhirj (Contributor) commented Jun 21, 2013

I'm still hoping for better speed outside of Compute Engine - namely from AWS. The biggest draw for me is that I could deploy servers all around the world with a multitude of stacks and frameworks and have them share a common high-availability, high-speed distributed datastore. Seems like a pipe dream, but definitely possible if Google can improve the latencies on GCD.

proppy (Member) commented Jul 1, 2013

Can you try running your benchmarks again?

vierjp commented Jul 3, 2013

Great.
I ran the same benchmark on Compute Engine (instance type: f1-micro, zone: us-central1-a).
It takes about 98-120 ms on average to put an entity.
(Last time, it took about 150-200 ms.)

7 03, 2013 4:01:16 AM put entities 117610 milliseconds.
7 03, 2013 4:03:51 AM put entities 108515 milliseconds.
7 03, 2013 4:08:33 AM put entities 97788 milliseconds.
(This code puts a small entity 1000 times)
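For reference, dividing the run totals above by the 1000 sequential puts per run gives the per-put averages behind the 98-120 ms figure:

```javascript
// Per-put averages from the run totals reported above (1000 puts per run).
const totalsMs = [117610, 108515, 97788];
const perPutMs = totalsMs.map(total => total / 1000);
console.log(perPutMs); // [117.61, 108.515, 97.788] — i.e. roughly 98-120 ms per put
```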

briandorsey (Member) commented Jul 3, 2013

Also, it may be worth running your benchmark again on an n1-standard-4 or n1-highcpu-4 instance. Overall network throughput is higher on higher CPU instances. I wouldn't expect latency to be largely affected, but it's worth verifying with your workload.

vierjp commented Jul 3, 2013

I ran the same benchmark again on a GCE n1-highcpu-4 instance.

GCE n1-highcpu-4, us-central1-a

-put
7 03, 2013 6:29:19 PM put entities 128545 milliseconds.
7 03, 2013 6:32:08 PM put entities 97222 milliseconds.
7 03, 2013 6:34:33 PM put entities 100826 milliseconds.
(This code puts a small entity 1000 times.)

-query
7 03, 2013 6:32:08 PM query entities 125 milliseconds.
7 03, 2013 6:34:33 PM query entities 740 milliseconds.

I noticed that network throughput was higher while I was downloading files with wget and yum.

I think the 'put' benchmark results didn't change much.
However, query speed may have improved.
I tried 3 times, and at that point my GAE app's free quota was gone. :-(

I'll retry a new 'query' benchmark tomorrow.

vierjp commented Jul 4, 2013

I tried again on a GCE n1-highcpu-4 instance.

I queried the latest 10 entities after putting entities (14 runs).
To defeat caching, I changed the kind name every time.

Queries took 326 ms on average.
Execution times ranged from 283 to 416 ms.

I think query speed improved by about 20-30%.

obeleh commented Sep 12, 2013

I've inserted 100 blank entities in Python. It took 23 seconds. :(
PS: europe-west-a, small instance

Alfus commented Sep 15, 2013

Did you insert them in a single batch request, or serially in individual requests?

obeleh commented Sep 16, 2013

Serially, on purpose, to see how long a single insert takes on average. 200-250 ms sounds very long to me. I understand that replication is probably why it takes so long, but I expected it to be faster. Most results come in at 5-minute intervals from multiple data sources, so I guess I'll have to queue them up.
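Since each serial insert pays a full round trip, the queuing mentioned here usually takes the form of batching. A sketch, where `commitBatch` is a placeholder for whatever batch-mutation call your client library exposes (not a specific API):

```javascript
// Split a list into fixed-size chunks.
function chunk(items, size) {
    const out = [];
    for (let i = 0; i < items.length; i += size) {
        out.push(items.slice(i, i + size));
    }
    return out;
}

// Insert entities in batches: one round trip per `batchSize` entities
// instead of one round trip per entity.
async function insertBatched(entities, commitBatch, batchSize = 100) {
    for (const group of chunk(entities, batchSize)) {
        await commitBatch(group);
    }
}
```

With 100 entities per commit, the 100-entity workload above becomes a single request, so per-request latency is amortized across the whole batch.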

obeleh commented Jun 6, 2014

Are there any updates here?

Alfus commented Jun 9, 2014

Not yet. This is still a top priority.

sorin7486 commented Nov 7, 2014

Any news on this?

ehrencrona commented Nov 23, 2014

I'm having the same problem. I've created a small script in Node to test get performance, and no matter whether I run it on my local machine or on Google Compute Engine (on the cheapest instance), I see 400 ms response times on average for a single get request against a tiny datastore with just a handful of entries.

The response times vary from 200 ms up to several seconds (!). I've tried letting a get run every ten seconds or so for a longer period; the times do not improve.

Is this really normal? Would latency improve by running on App Engine (though that seems extremely complicated with Node.js)?

Even response times of 100 ms, mentioned earlier in this thread, would seem to make it impossible to use Datastore for anything remotely time-critical. But there are people actually using Datastore, right? How are others using it? Or am I doing something wrong?

For reference, my tiny timing script:

var gcloud = require('gcloud');

var dataset = gcloud.datastore.dataset({
    projectId: 'myProject',
    keyFilename: 'key.json'
});

var calls = 0;

setInterval(function() {
    for (var i = 0; i < 10; i++) {
        var call = 'get' + calls++;
        console.time(call);

        dataset.get(dataset.key(['Language', 'EN']),
            (function(call) {
                return function(err, entities, nextQuery) {
                    if (err) {
                        console.log(err);
                    }

                    console.timeEnd(call);
                }
            })(call)
        )
    }    
}, 2000);

This yields output like:

get232: 286ms
get235: 342ms
get237: 362ms
get238: 419ms
get239: 425ms
get236: 3734ms
get241: 203ms

Thankful for any help. With these response times I will need to rethink my entire architecture.

obeleh commented Nov 24, 2014

This is the same feeling I've had for a long time now: "Why am I the only one with this problem? Aren't there hundreds or even thousands of others building on the Google platform? How did they solve it? Why don't I read anything about these questions on the rest of the internet?"

I feel as if the solutions on the Google Cloud Platform are built for stateless agents/machines that run slowly (200 ms - 10 sec) but handle large numbers of operations.

I had expected BigQuery to eventually get fast enough that we could store chart data in it, but it responds in 2 to 10 seconds. With GCD I expected the service to behave more or less at the speed of other databases; I would have been quite OK with 100 ms.

You will probably have to adjust your design. I hope I'm wrong and I too have missed something. But so far no enlightenment has come.

Alfus commented Nov 24, 2014

We are working hard to solve this problem. We are implementing a new serving stack that we expect will get the latency very close to what you see from GAE. We launched the API into Beta without these improvements because there are a lot of use cases that are not sensitive to these latency issues (e.g. offline data processing).

As a stopgap, there are things you can do to mitigate the latency. Specifically, try tweaking the settings documented here:
https://cloud.google.com/appengine/docs/adminconsole/performancesettings
For example, increasing the frontend instance class and reducing the pending latency options might reduce the variability you are seeing.
You can also look at https://appengine.google.com/instances?&app_id=<your_app_id>&version_id=ah-builtin-datastoreservice for more information about what is happening.

Sorry for the inconvenience,

Alfred


obeleh commented Nov 24, 2014

AE Frontend instance performance?

Alfus commented Nov 24, 2014

Yes - the ah-builtin-datastoreservice version is what serves the HTTP requests.

peterrham commented Feb 3, 2015

I'm measuring the latency between Google Compute Engine and Google Cloud Datastore.

I'm performing a simple lookup() using the Python client library.

The Google performance dashboard says that my requests consume around 18 milliseconds. I assume this is a server-side metric and not a round-trip metric.

Can someone point me to the service level agreements covering the minimum round-trip response times I should currently expect between Google Compute Engine and Google Cloud Datastore?

For a trivial lookup, I'm experiencing around 60 milliseconds; I would expect around 20.

Here's the code; note that these files are not strictly correlated - each is intended to be indicative on its own. If someone can show me some code with better latencies, that would be great.

https://github.com/peterrham/projects/blob/master/google_cloud/read.py

Here's an example output file:

https://github.com/peterrham/projects/blob/master/google_cloud/read.out

In this example, I'm getting 77 milliseconds. I'm not trying to be statistically significant here; I'm just looking for some indicative guidance.

Datastore latency here is under 15 milliseconds:
http://code.google.com/status/appengine/detail/datastore/2015/02/02#ae-trust-detail-datastore-get-latency

I also have a sample tcpdump ASCII text output file:

https://github.com/peterrham/projects/blob/master/google_cloud/40ms.txt

generated with this command line (the timestamps are the deltas between packet events):

/usr/sbin/tcpdump -r tcpdump.out -nnq -ttt > 40ms.txt

For example, the initial TCP SYN is ACKed in 1 millisecond, so network latency does not seem to be the problem.

However, the ACK for the lookup() request comes over 40 milliseconds after the request.

Server-side latency from the App Engine logs is 9 milliseconds:

2015-02-02 16:15:55.753 /datastore/v1beta2/Lookup 200 9ms 0kb module=default version=ah-builtin-datastoreservice
10.64.21.5 - - [02/Feb/2015:16:15:55 -0800] "POST /datastore/v1beta2/Lookup HTTP/1.1" 200 136 - - "ah-builtin-datastoreservice-dot-glowing-thunder-842.appspot.com" ms=10 cpu_ms=18 cpm_usd=0.000015 app_engine_release=1.9.17 instance=00c61b117c76349d57bd7ae2e3c635edd5c994da

Any ideas? Are there any buffering configurations or settings to get the minimum latency?

peterrham commented Feb 3, 2015

Looks like the 40 ms is related to Nagle's algorithm, although I don't think it accounts for the delayed response itself, which is over 40 ms - but I'm not sure.

http://neophob.com/2013/09/rpc-calls-and-mysterious-40ms-delay/

gcjc commented Mar 27, 2015

Hi - we are doing some app and backend testing (prior to launch) and are seeing this exact problem: looking at ah-builtin-datastoreservice, we see times from 10-20 ms up to 2000 ms. Do you have any timescale for a fix (or for when any such fix will be applied to the current Beta channel)? Otherwise we'll just migrate to DynamoDB. Thanks.

cerdmann commented Mar 29, 2015

I second the motion to move to Dynamo.

andrewferk commented Apr 12, 2015

I am also disappointed with the performance I'm seeing from GCD. A couple of patterns I've noticed: 1) the first request from a new datastore connection is always slow, and 2) batch mutations (even a batch of 100 entities) are painfully slow.

obeleh commented Apr 13, 2015

In my experience it is better than it was when this issue was opened. I used to see a lot more response times between 500 ms and 1000 ms; most of my requests are now between 200 ms and 500 ms. Perhaps I've segmented my data better?

hbizira commented Apr 13, 2015

@cerdmann @gcjc I would advise against moving to DynamoDB if you're concerned about keeping costs as low as possible. I did an evaluation after hitting this issue, and in my opinion DynamoDB has one major drawback: you have to provision your read/write capacity ahead of time. This makes it difficult to adjust to a sudden spike in traffic while keeping costs down. There are some auto-scaling libraries that attempt to help with this, but their adjustment time is not very fast, and DynamoDB currently limits you to scaling back down only a few times a day.

I'm really hoping this issue gets fixed soon, as there's a huge Node.js community that could really take advantage of GCD.

gcjc commented Apr 13, 2015

@obeleh it is hard to predict when there will be problems (and it seems to be any operation).

We will likely knock up some longer running tests internally to see if we can spot any patterns. However, we just ran a very short test and it seems to have improved, most requests between 150-220ms. Anyone else seen any improvement, just wondering if any changes in the app engine front end have been made?

@hbizira we really do want to use GCD, moving to dynamo will be the last resort.

cerdmann commented Apr 13, 2015

@hbizira, thanks for the advice. I'm in the same boat as @gcjc in that we really want to use GCD, but looking into other options as the latency is killing us.

jonface commented Jul 8, 2015

Just started using gcloud via nodejs and it's a bit disappointing. Getting from 200ms - 500ms approx latency. This is running on RH OpenShift (really AWS) in the US and EU. Tried my home connection too, all roughly the same.

I hope I'm doing something wrong :/

peterrham commented Jul 9, 2015

I got better results than that, but still not great. What do you mean by "AWS"? My timings were from a Google Compute host to Google Cloud Datastore.

By the way, I'm trying out Google "Bigtable", which is in Beta. It promises a great SLA, sub-10 ms at the 99th percentile I think. I've tried it out but haven't measured the latency myself; still, I believe it!


jonface commented Jul 9, 2015

I was referring to Amazon Web Services EC2 which is the underlying infrastructure of OpenShift. Isn't BigTable more expensive and overkill?

I'll check my timings with wireshark. Also my datastore is empty, so it's not like it's got millions of items in it.

jonface commented Jul 9, 2015

OK, it's hard to tell exactly how long it's taking via Wireshark because of the TLS, but you can make a rough estimate.

TLS connection SYN to FIN - total 460ms
- TLS Application data at 126ms
- TLS Application data at 201ms
- TLS Application data at 399ms
- TLS Application data at 400ms
- TLS Application data at 455ms

What is interesting is that for every query, a new connection is set up and torn down. Is this correct? Am I doing something wrong? Would it not be better to keep a connection pool and reuse connections?

Thanks
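A keep-alive connection pool is indeed the usual fix for paying TCP/TLS setup on every request. As a minimal sketch, using only the Python standard library against a local test server (this is not the gcloud client; every name here is illustrative), the following shows that several requests sent over one persistent HTTP/1.1 connection reach the server on a single TCP connection:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

connections_seen = set()

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive responses

    def do_GET(self):
        # One client source port == one TCP connection.
        connections_seen.add(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep output quiet

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Reuse one persistent connection for several requests: only the first
# request pays the TCP (and, over HTTPS, TLS) handshake cost.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
for _ in range(3):
    conn.request("GET", "/")
    resp = conn.getresponse()
    resp.read()  # drain the body before reusing the connection

print(len(connections_seen))  # all three requests shared one TCP connection
conn.close()
server.shutdown()
```

Over HTTPS the saving is larger still, since the TLS handshake adds one or two extra round trips on top of the TCP one. Whether a given client library pools connections is up to that library, so the fresh connection per query observed above would be a library-level behavior.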

jonface commented Jul 10, 2015

I realise I'm being impatient but what's the plan for this? Is this just a nodejs lib problem?

Member

dhermes commented Aug 17, 2015

@pcostell Will this be addressed with v1beta3?

Member

pcostell commented Aug 17, 2015

That is the goal, but we are still working on benchmarking v1beta3.

InfiniteRandomVariable commented Jan 1, 2016

@pcostell I am seriously considering this service. Please post an update about this issue since it has been a while. Thanks for the great work.

Member

pcostell commented Jan 8, 2016

We are seeing much better latency numbers with v1beta3. v1beta3 has been a complete rewrite in our infrastructure and as such we are being very cautious about rolling it out. We hope to have it ready early this year.

peterrham commented Jan 8, 2016

Great!


alexfernandez commented Feb 8, 2016

@pcostell What numbers are you seeing? Anything below 20 ms? That is the performance goal that we have, and it is easily achievable in DynamoDB. This issue is a dealbreaker for us for moving to Google Cloud right now.

leonardaustin commented Feb 15, 2016

I thought I would share some instrumentation, as I found this issue and have been following it with interest. Below are the 95th percentile and mean (in milliseconds) for PUT requests after moving from v1beta2 to v1beta3. Both show a ~10x improvement (now ~25ms & ~10ms) - good work guys! It would also be nice to get it out of the beta endpoints so we can use it in production. One thing worth mentioning is that v1beta3 seems to be missing the transaction endpoint.

95th
screen shot 2016-02-15 at 18 22 57

Mean
screen shot 2016-02-15 at 18 25 17
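For anyone wanting to reproduce this kind of instrumentation, the mean and 95th percentile are straightforward to compute from raw per-request timings. A small sketch with made-up latency samples (the numbers are illustrative, not measurements from either API version):

```python
import statistics

# Hypothetical per-request latencies in milliseconds (not real measurements).
latencies_ms = [9, 10, 10, 11, 12, 12, 13, 14, 18, 25,
                9, 11, 10, 12, 13, 11, 10, 15, 22, 24]

mean_ms = statistics.fmean(latencies_ms)

# quantiles(n=100) returns the 1st..99th percentiles, so index 94 is the
# 95th percentile (default "exclusive" method, i.e. interpolated).
p95_ms = statistics.quantiles(latencies_ms, n=100)[94]

print(f"mean={mean_ms:.2f}ms p95={p95_ms:.2f}ms")
```

The gap between the two numbers is the interesting part for an issue like this one: a healthy mean with a high p95 points at intermittent slow requests rather than uniformly slow ones.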

Contributor

eddavisson commented Feb 16, 2016

Hi @leonardaustin, can you share details about what service(s) you're using? We haven't actually launched v1beta3 yet, so I'm wondering if we're talking about two different things.

leonardaustin commented Feb 16, 2016

@eddavisson Sure, I forked https://github.com/GoogleCloudPlatform/gcloud-golang and changed the url from v1beta2 to v1beta3.

Member

dhermes commented Feb 16, 2016

gcloud-python has "already" made the switch (in a branch)

All features from v1beta3 are present, though the URIs are different.

obeleh commented Feb 17, 2016

Is there a list of changes I should review, or can I just change the URL and I'm set?

Contributor

eddavisson commented Feb 17, 2016

The v1beta3 API is not yet available to customers. We will be sure to post announcement here when it is.

obeleh commented Feb 18, 2016

If possible I would like to sign up for the public beta of the beta?

faizalkassamalisc commented Mar 14, 2016

@pcostell, is v1beta3 still on track to "be released this quarter" as per #34? :)

image

Contributor

dmcgrath commented Mar 15, 2016

@faizalkassamalisc, we're busy reticulating splines and hopefully will have an update for you soon. Thanks for your patience!

@obeleh, the beta will be public and just requires you using the appropriate API clients. There will be no other sign-up beyond the normal project creation you do currently.

@pcostell pcostell closed this Apr 4, 2016

alexfernandez commented Apr 4, 2016

Is there a more technical page with latencies expressed as milliseconds? Thanks!

Contributor

dmcgrath commented Apr 4, 2016

We don't have a page with latency numbers, since they are a moving target: we continually improve the platform, and latency also depends on both the location your Cloud Datastore was set up in and where you are accessing it from.

Keep in mind you cannot compare DynamoDB with Cloud Datastore directly, as we're a functionally different service, and in most cases we serve customers from a Multi-Regional instance rather than a merely Regional one.

rajeshshetty commented Nov 10, 2016

Why is the first request to the datastore always slow?
Is there a way to fix this?
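The first request typically pays one-time costs: constructing the client, reading credentials and fetching an auth token, and the TCP/TLS handshake. A common mitigation is to build the client once per process and reuse it, optionally issuing a throwaway warm-up call at startup. A minimal sketch with a stand-in client factory (nothing here is the real Datastore API; the sleep simulates the one-time setup cost):

```python
import functools
import time

@functools.lru_cache(maxsize=None)
def get_client():
    """Build the (hypothetical) datastore client exactly once per process.

    Real clients pay one-time costs here: reading credentials, fetching an
    OAuth token, opening the TCP/TLS connection. Caching the client means
    only the first caller pays them.
    """
    time.sleep(0.05)  # stand-in for the expensive one-time setup
    return {"created_at": time.monotonic()}

def timed(fn):
    start = time.monotonic()
    result = fn()
    return result, time.monotonic() - start

client1, first_call = timed(get_client)
client2, second_call = timed(get_client)

assert client1 is client2  # the same client object is reused
print(f"first={first_call * 1000:.0f}ms subsequent={second_call * 1000:.2f}ms")
```

With this pattern, only the very first request in a process is slow; a warm-up call during application startup moves even that cost out of the user-facing path.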

StdInOut commented Oct 2, 2017

Hello guys, this is probably the wrong place to ask this question, but I couldn't find a better one.

Could Datastore, using local disks, play the same role as Redis for a session cache? Everything would be much easier and cheaper for our company.
