Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad performance of PHP(+xdebug) after sleep and waking up because of clock changing from TSC to HPET #3455

Open
anton-siardziuk opened this issue Jan 12, 2019 · 24 comments

Comments

@anton-siardziuk
Copy link

commented Jan 12, 2019

  • I have tried with the latest version of my channel (Stable or Edge)
  • I have uploaded Diagnostics
  • Diagnostics ID: 327AD075-DF8F-491F-AF4D-200FBC3FB830/20190112205307

Expected behavior

Test scenario should always take 30s to complete

Actual behavior

Test scenario takes 30s to complete after Docker for Mac restart, and takes 1m30s-2m to complete after sleep and wakeup.

Information

  • macOS Version: 10.14.1 (18B75)

I have some PHP test scenario for my application I can run to check how good or bad performance is. I also have a linux laptop to compare with. This test in docker on linux completes in 30s. When I restart Docker for Mac, it runs 30s on Docker for Mac as well, but after I put MacBook to sleep and wake it up, the very same test takes 1m30s-2m. And it would take this long until I restart Docker for Mac. After restart, this test takes 30s again. I can reproduce it very reliably every time I try. The very same issue was reported a year ago here: EugenMayer/docker-sync#524

Diagnostic logs

Docker for Mac: Version 2.0.0.0-mac81 (29211)

Steps to reproduce the behavior

  1. Run something with fixed run time, see time X
  2. Put Mac to sleep, wake it up
  3. Run it again, see time 3X
  4. Restart Docker for Mac
  5. Run it again, see time X
@anton-siardziuk

This comment has been minimized.

Copy link
Author

commented Jan 13, 2019

I started working on a way to easily reproduce this issue outside of my project. My first call was to try to run some C compilation so I created docker-compose.yml file like this:

version: '3.3'
services:
  benchmark:
    image: php:7.2.14-cli-stretch
    command: docker-php-ext-install json pdo pdo_mysql

And run it like that

time docker-compose up

Where docker-php-ext-install basically compiles those extensions.
With no luck: compile time is the same before and after sleep (~25s).
So it is not CPU who works slower. Will try to investigate further. My next idea is to run some strace and compare results from fast and slow tests, but it might take some time to dig through these results. Any ideas how to pin it down are welcome.

@anton-siardziuk

This comment has been minimized.

Copy link
Author

commented Jan 13, 2019

More info on this problem. Ran a test php script through strace and the difference between good and bad performance straces is millions (literally) calls of gettimeofday when performance is bad. I also found this clock related logs:

2019-01-13 13:54:39.119934+0300  localhost com.docker.driver.amd64-linux[35411]: proxy << POST /v1.39/exec/a177f7c2bbbd2fc5816387e002da09680c89d4b828e59dc7627513a9e4e4b737/resize?
h=41&w=179 (181.771536ms)
2019-01-13 13:56:40.584949+0300  localhost com.docker.vpnkit[35410]: TCP 192.168.65.3:53910 > 212.211.132.250:80 closing flow due to idle port disconnection
2019-01-13 13:56:40.585113+0300  localhost com.docker.vpnkit[35410]: TCP 192.168.65.3:33558 > 212.211.132.250:80 closing flow due to idle port disconnection
2019-01-13 14:01:47.719443+0300  localhost kernel[0]: PM notification timeout (pid 35411, com.docker.drive)
2019-01-13 14:01:47.719449+0300  localhost kernel[0]: PM notification timeout (pid 35411, com.docker.drive)
2019-01-13 14:06:09.957178+0300  localhost com.docker.hyperkit[35416]: [ 1616.560466] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
2019-01-13 14:06:09.964637+0300  localhost com.docker.hyperkit[35416]: [ 1616.579018] clocksource:                       'hpet' wd_now: 3bd42798 wd_last: 3ad23599 mask: ffffffff
2019-01-13 14:06:09.967614+0300  localhost com.docker.hyperkit[35416]: [ 1616.581360] clocksource:                       'tsc' cs_now: bf689879560 cs_last: b761a0f445c mask: ffffffffffffffff
2019-01-13 14:06:09.972656+0300  localhost com.docker.hyperkit[35416]: [ 1616.587694] Hangcheck: hangcheck value past margin!
2019-01-13 14:06:09.990049+0300  localhost com.docker.hyperkit[35416]: [ 1616.602045] clocksource: Switched to clocksource hpet
2019-01-13 14:06:09.997985+0300  localhost com.docker.driver.amd64-linux[35411]: Requesting time sync
2019-01-13 14:06:10.420056+0300  localhost com.docker.driver.amd64-linux[35411]: cannot read sysevent: EOF
@anton-siardziuk

This comment has been minimized.

Copy link
Author

commented Jan 13, 2019

So after a lot of gdb and strace and google, I think I understand what is going on. Linux on wakeup thinks that super fast but not that reliable TSC clock no longer up to date and switches to super reliable but not that fast HPET clock. First one works without any syscalls - in userspace, but HPET works though syscall, hence millions of gettimeofday sys calls when performance is bad. And PHP applications in development mode generate millions calls of current time (because of how debugger in PHP works). That's why this clock switch is very noticeable - everything is 3 times slower after wakeup.

@anton-siardziuk anton-siardziuk changed the title Bad performance after waking up Bad performance of PHP(+xdebug) after sleep and waking up because of clock changing from TSC to HPET Jan 14, 2019
@barat

This comment has been minimized.

Copy link

commented Jan 23, 2019

Nice catch - I have similar issues with big Symfony (+ES) project. Until sleep - everything seems to be OK, after wakeup - 10x degraded performance. I Use docker-sync to eliminate bind mount issues (thinking about mutagen tho), but this thing will happen even without any mounts. Hope, that someone from dev team will take a closer look at it. Now, Docker Desktop is really problematic software to work it - requires workarounds, constant restarts, has timedrifts ...

@ThHareau

This comment has been minimized.

Copy link

commented Jan 28, 2019

I'm having this issue also, on fedora since a while actually (almost 6 months).

@Blackskyliner Blackskyliner referenced this issue Jan 28, 2019
3 of 3 tasks complete
@remcoverton

This comment has been minimized.

Copy link

commented Feb 19, 2019

We are having exactly the same issue and it's blocking us from migrating local development to a docker setup. We are on the latest stable version: 2.0.0.3 (31259), Mac 10.14.3.

For testing purposes: I have created a small container that runs the following https://blog.packagecloud.io/eng/2017/03/08/system-calls-are-much-slower-on-ec2/ example c script:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int
main(int argc, char *argv[])
{
        struct timeval tv;
        int i = 0;
        for (; i<100; i++) {
                gettimeofday(&tv,NULL);
        }

        return 0;
}
#!/bin/bash

while true; do
    strace -ce gettimeofday /gettimeofdaytest/test
    sleep 10
done

After closing the Mac and waiting a minute or 2 the script starts outputting trace information indicating the calls are now system calls. The source of the issue seems to also be that unstable TSC Clocksource.

linuxkit-025000000001:~# echo 'tsc' > /sys/devices/system/clocksource/clocksourc e0/current_clocksource [ 4483.670063] clocksource: Override clocksource tsc is unstable and not HRT compatible - cannot switch while in HRT/NOHZ mode

When I restart docker desktop the problem is gone again.

@tonivdv

This comment has been minimized.

Copy link

commented Feb 27, 2019

Was going to create a new support ticket until I saw this one. I exactly have the same issue, and also with a php project with a lot of mounted files. Only a docker for mac restart fixes the issue! Knowing that most mac user put their mac in sleep mode this will be very annoying.

Hope you can find a solution to this.

@traed

This comment has been minimized.

Copy link

commented Mar 11, 2019

Same here. Hoping for a quick fix!

@dchassin

This comment has been minimized.

Copy link

commented Mar 20, 2019

@m00t: kudos to you. We are experiencing the same problem with gridlabd. A complete validation run normally takes about 400 seconds. After sleep it typically takes around 2000 seconds. Your conclusion is entirely consistent with our findings. After a lot of playing around with linux builds on different VM platforms we decided it had to be in the docker system itself. For example, after the same sleep interval (overnight) centos:7 docker is slow, but a VMware Centos 7 is ok.

The only solution we have now is to restart the docker daemon.

@thejoshsmith

This comment has been minimized.

Copy link

commented Apr 27, 2019

I've noticed this exact behavior, although I've only just had time to investigate. Typically we run a stack of PHP, MySQL and Apache proxied through Nginx and in my test (which is based off a profiler read-out), I'm using CraftCMS with PHP 7.2.

This is a screenshot of the profiler with Docker daemon restarted AND the container stack freshly recreated:

Screen Shot 2019-04-27 at 11 31 45 PM

~1.5s to load the page (I am using Mutagen for keeping files in sync).

After putting my computer to sleep and waking it up again, the same page reload yielded the following results:

Screen Shot 2019-04-27 at 10 12 40 PM

~6 seconds!?!

Finally, after restarting Docker I was getting the following output:

Screen Shot 2019-04-27 at 10 16 05 PM

~3 seconds... Still not at the original 1.5 seconds.

It wasn't until I restarted the docker daemon and re-created my containers that the page load time came back down to 1.5 seconds (Mutagen was left running the entire time).

@remcoverton

This comment has been minimized.

Copy link

commented May 2, 2019

I have updated to the latest docker ( edge ) version and the problem still persists in exactly the same manner as described above.

2.0.4.0

  • Engine: 19.03.0-beta3
  • Compose: 1.24.0
  • Kubernetes: v1.14.1
  • Notary: 0.6.1
  • Credential Helper: 0.6.0
  • Machine: 0.16.1
@remcoverton

This comment has been minimized.

Copy link

commented Jun 13, 2019

Tested on the latest docker Edge version, it still falls back to hpet.

2.0.5.0

docker-desktop:# [ 450.623931] clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 450.627869] clocksource: 'hpet' wd_now: cf3bfb7 wd_last: c111b9b mask: ffffffff
[ 450.630547] clocksource: 'tsc' cs_now: 21d88179225 cs_last: 21c9504986d mask: ffffffffffffffff
[ 450.634050] clocksource: Switched to clocksource hpet
docker-desktop:
# cat /sys/devices/system/clocksource/clocksource0/current_clock
source
hpet

@tomasinchoo

This comment has been minimized.

Copy link

commented Jun 30, 2019

Until this issue is resolved in stable version, it is impossible to use docker on Mac.

@dchassin

This comment has been minimized.

Copy link

commented Jun 30, 2019

We’ve been able to use docker on macs as long as we don’t let the Mac sleep. If it sleeps we have to restart docker.

@tomasinchoo

This comment has been minimized.

Copy link

commented Jun 30, 2019

We’ve been able to use docker on macs as long as we don’t let the Mac sleep. If it sleeps we have to restart docker.

Yea, I know, but that is not much of a solution, especially on Mac which sleeps a lot and has limited charges. I don't even use Mac, but we have a team that does and I really can't even understand why. It is so limited and weak.
Unfortunately I'm the one handling docker images for them. :(

@dchassin

This comment has been minimized.

Copy link

commented Jun 30, 2019

Indeed. It’s been a real source of frustration for most of us because we use docker alot on laptops. Sleep is very common even when running long jobs. It has to be fixed soon.

@smaudet

This comment has been minimized.

Copy link

commented Jul 11, 2019

Slightly different problem but I think it might be a problem with OSX itself...

Running VBox for a linux VM (so similar situation to Docker) - on OSX sleep and then resume, the whole computer is unusable, requiring a restart...

This is some shoddy OSX resume/restart logic I think. Anyway to downgrade OSX to a version which worked?

@tonivdv

This comment has been minimized.

Copy link

commented Jul 12, 2019

@smaudet Never had that issue with vbox. I used for a very long time docker toolbox with nfs enabled and I never had the issue described here. So I doubt it's related.

@dchassin

This comment has been minimized.

Copy link

commented Jul 12, 2019

Slightly different problem but I think it might be a problem with OSX itself...

If I had to guess, it's an interaction between the three systems, and not something that is exclusively the fault of any one system. It works fine on VMware Linux on OSX, but it doesn't work on Docker Linux on OSX, where Linux and OSX are the same versions for both cases. Perhaps when OSX wakes from a sleep, Linux isn't being told what happened and it doesn't reset the clock stability check properly on Docker, whereas on VMware it is. Since VMware can do it but Docker can't, that suggests OSX isn't to blame alone for this problem.

@smaudet

This comment has been minimized.

Copy link

commented Jul 12, 2019

@dchassin the curious thing is there isn't an issue on the windows or linux host side of things... perhaps it is partly related to the software? I.e. VMWare catches the sleep event while Docker and VBox don't...

Could be hardware too, I've not ruled that out yet (would try to BootCamp windows or linux host and see if the problem persists).

@carlos-granados

This comment has been minimized.

Copy link

commented Sep 15, 2019

I have created an issue in the Xdebug repo asking for a way to disable this call to getTimeOfDay as this is just used to timestamp stack entries and I think that 99% of the applications that use Xdebug don't need this info

@joniklas

This comment has been minimized.

Copy link

commented Sep 16, 2019

Thanks @carlos-granados for your initiative. This could decrease the symptoms of the slow clocksource problem.

Since getTimeOfDay is used in many other cases than debugging: It would be great, if this issue on Macs could be solved at the core - so inside docker. (Also there are other low level calls wich are slow in debugging. E.g. getPid.)

@derickr

This comment has been minimized.

Copy link

commented Sep 16, 2019

Is there a different clock source that can be used that does not have this problem?

@Dolondro

This comment has been minimized.

Copy link

commented Sep 16, 2019

Is there a different clock source that can be used that does not have this problem?

There are three clock sources available most of the time. tsc, hpet and acpi_pm. If TSC desyncs (causing this issue) then we're left with only hpet and acpi_pm.

I'll try to get around to running some benchmarks with acpi instead, but from my reading about different clock sources I suspect anything non-TSC will end up with the same issue.

As a side note, here's a couple of helpful bash scripts that we've been using internally lately to at least increase visibility of the issue, which may or may not be useful to some others experiencing this issue.

Verifying if this is currently effecting your containers:

YELLOW="\033[0;33m"
RED="\033[31m"
RESET="\033[0m"
DOCKER_CLOCK_SOURCE=$(docker exec -ti {my_docker_container} /bin/bash -c 'cat /sys/devices/system/clocksource/clocksource0/current_clocksource | tr -d "[:space:]"')
if [[ ${DOCKER_CLOCK_SOURCE} != "tsc" ]]; then
    echo -e "${RED}WARNING!${RESET} - ${YELLOW}Clock drift detected. Please restart Docker Desktop to avoid severe performance issues${RESET}"
fi

Automatically killing and restarting Docker for Desktop via CLI:

echo "Killing Docker"
osascript -e 'quit app "Docker"';
echo "Restarting Docker"
open -a Docker
echo "Waiting for Docker to start"
while ! docker system info > /dev/null 2>&1; do printf "."; sleep 1; done; printf "\n";
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.