
Extremely slow image pulls in Singapore region #2390

Open
3 tasks done
jenademoodley opened this issue Jun 14, 2024 · 35 comments

Comments

@jenademoodley

Problem description

  • When pulling an image from a VM in Southeast Asia (specifically Singapore), the image pull takes an extremely long time.
  • It is worth noting that not all images are affected (the nginx image downloads as expected), and not all layers either. For example, the grafana/oncall:v1.3.115 image has issues with layers 7, 10, and 12 (a way to check individual layers is sketched after this list).
  • This issue is observed on AWS EC2 instances as well as Google Cloud VMs. As such, the issue is not isolated to any cloud provider and appears to be on Docker Hub's side.
  • The issue is only observed in Singapore. Testing in Dublin on AWS (eu-west-1 region), the image pull works as expected.
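
For reference, a rough way to check whether a single layer is the slow part, outside of the Docker daemon (just a sketch; the <LAYER_DIGEST> below is a placeholder, and if the tag points at a multi-arch manifest list you first need to request it with the manifest-list Accept header and pick a platform-specific manifest digest):

# Get an anonymous pull token for the public repository.
TOKEN=$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:grafana/oncall:pull" | jq -r .token)

# List the layer digests referenced by the tag.
curl -fsSL -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  "https://registry-1.docker.io/v2/grafana/oncall/manifests/v1.3.115" | jq -r '.layers[].digest'

# Time the raw download of one suspect layer blob (curl follows the redirect to the CDN).
curl -fSL -o /dev/null \
  -w 'downloaded %{size_download} bytes at %{speed_download} B/s\n' \
  -H "Authorization: Bearer $TOKEN" \
  "https://registry-1.docker.io/v2/grafana/oncall/blobs/<LAYER_DIGEST>"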

Task List

  • This is NOT a security issue
  • I do NOT have a Docker subscription
  • I have looked through other issues and they do NOT apply to me
@sjainaajtak

We are also observing the same issue on our EC2 server in the Singapore region. The issue is with our private Docker Hub repositories, while public repositories are working fine. We have checked from Mumbai (ap-south-1) and the image pull works there.

@guillemsf

guillemsf commented Jun 14, 2024

It also fails for me in the Singapore region (ap-southeast-1); I wrote to Docker Hub support a few hours ago. It works fine for me in us-east-1 and eu-west-1.

@binman-docker
Member

Hi folks, we're looking into this. If you open a support ticket, we can (securely) collect some information that will help us troubleshoot the issue with our CDN provider. Thanks!

@guillemsf

Hi, I opened it 9 hours ago :/

This is the thread: thread::UL33nXfrEZOcIDe5SD0qHAQ::

@abirdatta

We are also facing the exact same issue with the AWS Fargate service, in both the Singapore and Jakarta regions. The issue is also intermittent and affects specific layers of an image, in our case for private org/repo images.
We also raised a support ticket yesterday: thread::mfp2QWSzGzeaAA0rpbUiLQQ::

@jenademoodley
Author

jenademoodley commented Jun 15, 2024

Created a support case as well, should any further information be required.

Testing again today, I can see there has been some improvement. Previously the grafana/oncall:v1.3.115 image had an issue with layer 7, but that layer could now be pulled from Singapore without any issues. Other layers are still facing issues, though (layers 12 and 13).

Edit: Adding the thread ID for the support case
thread::erfRmjZBH2OxgLSL5QAqxgQ::

@kweefc

kweefc commented Jun 15, 2024

I have the same issue in East Malaysia: pulls are crawling at around 30-50 KB/s even though I'm on a 300 Mbps line.
[Screenshot 2024-06-15 163619.png]

Pulling 26 MB took 30 minutes.

@NavindrenBaskaran

NavindrenBaskaran commented Jun 15, 2024

I'm experiencing slow Docker image pulls from our AWS ap-southeast-1 region as well.

@angyts

angyts commented Jun 15, 2024

Oh my, yes, same for us. It's driving us crazy.

@claytonchew

👍 Thanks @jenademoodley for raising this.

The slowdown is becoming a major issue across our clusters located in ap-southeast-1; we had to pull back our scheduled major deployments in this area.

@gabrielsim

Also experiencing the same issue when pulling from a residential ISP in Singapore (M1). Some layers are quick to download whilst the others are barely moving at all.

@nsheaps

nsheaps commented Jun 17, 2024

We did some traceroute-ing from an EKS cluster in ap-southeast-1 after experiencing failed image pulls for over an hour on some images. We've seen a variety of symptoms, including:

  • EOF errors during pulls
  • Authentication errors against our private repositories before pulls
  • Both public and private repositories affected
  • Slow download speeds of about 1 MB/s
  • As mentioned before, some image pulls were fine
  • All of our AZs in the ap-southeast-1 AWS region (Singapore) affected
  • Packet loss

During the traceroute, we noted that ap-southeast-1 calls to docker.io (not necessarily the registry URL) were getting routed to AWS us-east-1 (based on the public IP). Is there a closer location that it should have been routed to?
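
For what it's worth, docker.io is mostly the website/API endpoint; actual pulls hit registry-1.docker.io, and layer blobs are redirected to a CDN hostname, so those are worth probing separately. A rough sketch (the CDN hostname here is an assumption on my part, not something Docker has confirmed):

# Compare where the different Docker Hub hostnames resolve to from the affected region.
dig +short docker.io
dig +short registry-1.docker.io
dig +short production.cloudflare.docker.com   # assumed CDN host for layer blobs

# TCP traceroute on port 443 against the registry API endpoint.
mtr --report --report-cycles 100 -Tz --port 443 registry-1.docker.io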

After discussing with AWS, their suggestion was to use something like ECR or to host the image registry ourselves. Until we have this set up, we've effectively cut off the ap-southeast-1 region from our deploys and customer interactions, since our deployment mechanism waits for k8s deployments to become ready before proceeding (and times out after 1 hr, even if it's just rolling 3 pods). We are also going to start testing Docker's registry image as a pull-through cache, hosted in us-east-1. So far image pulls through it have been successful, even though in theory it's similar network pathing. ap-southeast-3 was our fallback plan due to its close proximity to ap-southeast-1, but it sounds like that might not be feasible based on @abirdatta's testing in Jakarta.
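
In case it helps anyone, here is a minimal sketch of the pull-through cache setup we're testing, assuming the open-source registry:2 image and a placeholder mirror hostname (hub-mirror.example.internal):

# config.yml for a Docker Hub pull-through cache running in a healthy region (e.g. us-east-1).
cat > config.yml <<'EOF'
version: 0.1
proxy:
  remoteurl: https://registry-1.docker.io
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
EOF

docker run -d --name hub-mirror -p 5000:5000 \
  -v "$PWD/config.yml:/etc/docker/registry/config.yml" \
  registry:2

# On each ap-southeast-1 node, point the daemon at the mirror in /etc/docker/daemon.json
# (the mirror must be reachable over HTTPS, or listed under insecure-registries for testing):
#   { "registry-mirrors": ["https://hub-mirror.example.internal:5000"] }
# then restart dockerd; pulls fall back to Docker Hub if the mirror is unavailable.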

MTR report from eks node in ap-southeast-1
[root@ip-xxxxxxxxxxxx /]# mtr --report --report-cycles 100 -Tz docker.io
Start: 2024-06-11T17:18:43+0000
HOST: ip-xxxxxxxxxxx.ap-southeas Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???    ???                 100.0   100    0.0   0.0   0.0   0.0   0.0
  2. AS???    ???                 100.0   100    0.0   0.0   0.0   0.0   0.0
  3. AS???    ???                 100.0   100    0.0   0.0   0.0   0.0   0.0
  4. AS???    ???                 100.0   100    0.0   0.0   0.0   0.0   0.0
  5. AS???    241.0.13.133         0.0%   100    0.3   0.3   0.3   0.4   0.0
     AS???    241.0.13.128
     AS???    241.0.13.129
     AS???    241.0.13.138
     AS???    241.0.13.142
     AS???    241.0.13.132
     AS???    241.0.13.141
     AS???    241.0.13.136
  6. AS???    240.0.184.15         1.0%   100  2859. 1862. 208.4 9190. 2066.4
     AS???    240.0.184.3
     AS???    240.0.236.33
     AS???    240.0.184.1
     AS???    240.0.184.2
     AS???    240.0.184.35
     AS???    240.0.236.5
     AS???    240.0.184.32
  7. AS???    242.2.213.33         0.0%   100  229.8 227.7 208.9 248.8  10.0
     AS???    242.2.213.161
     AS???    242.3.84.33
     AS???    242.2.212.33
     AS???    242.3.85.33
     AS???    242.3.84.161
     AS???    242.3.85.161
  8. AS???    240.0.40.3           0.0%   100  230.1 226.5 209.1 240.2   7.8
     AS???    240.3.12.67
     AS???    240.4.112.66
     AS???    240.3.84.65
     AS???    240.3.12.98
     AS???    240.3.84.67
     AS???    240.0.56.97
     AS???    240.3.12.65
  9. AS???    241.0.4.195         76.0%   100  206.4 223.1 206.4 238.1   7.9
     AS???    241.0.4.215
     AS???    241.0.4.196
     AS???    241.0.4.209
     AS???    240.0.36.30
     AS???    241.0.4.198
     AS???    241.0.4.79
     AS???    241.0.4.95
 10. AS14618  ec2-44-193-181-103. 18.0%   100  211.9 230.5 211.9 241.1   7.2
     AS???    240.0.36.57
     AS???    240.0.36.50
     AS???    240.0.36.52
MTR report from my local laptop in the Boston area
sudo mtr --report --report-cycles 100 -Tz docker.io
Password:
Start: 2024-06-11T13:42:56-0400
HOST: xxxxxxxxx.local   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???    192.168.0.1          1.0%   100    4.0   3.4   2.5   6.5   0.6
  2. AS???    xxxxxxxxx          2.0%   100   14.0  14.8   7.4  29.3   3.1
        xxxxxxxxx
     AS???    xxxxxxxxxx
  3. AS7922   po-306-1210-rur902.  2.0%   100   14.0  14.5   7.8  26.3   2.7
        po-306-1209-rur901.westroxbury.ma.boston.comcast.net
     AS7922   po-306-1209-rur901.westroxbury.ma.boston.comcast.net
  4. AS7922   po-2-rur902.westrox  2.0%   100   12.8  15.0   9.3  36.3   3.2
        po-200-xar02.westroxbury.ma.boston.comcast.net
     AS7922   po-200-xar02.westroxbury.ma.boston.comcast.net
  5. AS7922   be-334-ar01.needham  1.0%   100   15.6  16.7   7.1  34.6   4.1
        po-200-xar02.westroxbury.ma.boston.comcast.net
     AS7922   po-200-xar02.westroxbury.ma.boston.comcast.net
  6. AS7922   be-334-ar01.needham  4.0%   100   17.3 351.1  11.7 7020. 1246.4
        be-1003-pe02.onesummer.ma.ibone.comcast.net
     AS7922   be-1003-pe02.onesummer.ma.ibone.comcast.net
        be-1005-pe11.onesummer.ma.ibone.comcast.net
     AS7922   be-1005-pe11.onesummer.ma.ibone.comcast.net
  7. AS7922   be-1003-pe02.onesum 62.0%   100   21.8 649.4   9.9 5022. 1497.3
        be-1005-pe11.onesummer.ma.ibone.comcast.net
     AS7922   be-1005-pe11.onesummer.ma.ibone.comcast.net
  8. AS???    ???                 100.0   100    0.0   0.0   0.0   0.0   0.0
  9. AS???    ???                 100.0   100    0.0   0.0   0.0   0.0   0.0
 10. AS???    ???                 100.0   100    0.0   0.0   0.0   0.0   0.0
 11. AS???    ???                 100.0   100    0.0   0.0   0.0   0.0   0.0
 12. AS14618  ec2-3-224-227-198.c 66.0%   100   30.6  29.8  22.8  39.2   3.1
MTR report from eks node in us-east-1
[root@ip-xxxxxxxxxx /]# mtr --report --report-cycles 100 -Tz docker.io
Start: 2024-06-11T17:19:07+0000
HOST: ip-xxxxxxxxxx.ec2.interna Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???    ???                 100.0   100    0.0   0.0   0.0   0.0   0.0
  2. AS14618  ec2-44-219-3-189.co  0.0%   100    1.1   1.4   0.6   3.6   0.7

EDIT: Submitted a support case with Docker that links to this thread. Case ID 00106860

EDIT 2: It's also worth noting that this may be more than just Docker image pulls; I frequently also get disconnected from the k8s API when trying to access the cluster API itself in ap-southeast-1 from the us-east-1 area (locally, not from within AWS).

EDIT 3: I'm starting to see k8s API EOFs and internal errors in ap-south-1 as well (no reports of image pull issues there), and it's worth calling out this report of undersea cable cuts in Vietnam: https://www.reuters.com/world/asia-pacific/three-vietnams-five-undersea-internet-cables-are-down-2024-06-17/

@gabrielsim

gabrielsim commented Jun 18, 2024

Also experiencing the same issue when pulling from a residential ISP in Singapore (M1). Some layers are quick to download whilst the others are barely moving at all.

The issue has recovered for me; pulls are fast again.

@guillemsf

guillemsf commented Jun 18, 2024

Not in ap-southeast-1. This is really painful, and it's even worse to not have any answer from the Docker support team after 4 days.

@abirdatta

abirdatta commented Jun 18, 2024

Pulls are still extremely slow for us in the Singapore (ap-southeast-1) and Jakarta (ap-southeast-3) regions. Downloads of some specific layers are taking a long time. Things were better yesterday and over the weekend.

@RaveSplash

Are there any workarounds for this, for example moving the EC2 instance to another region?

@rolandjitsu

Same issue here. It happens locally in our office and on all GCP/AWS machines located in asia-southeast (Singapore). Any image we pull from docker.io takes hours now.

We got around it by using GCP's Artifact Registry for some of the images that we customised.
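
For anyone wanting to do the same, a rough sketch of that workaround, assuming a Docker-format Artifact Registry repository in asia-southeast1 (the project and repo names below are placeholders):

# One-time setup: create a regional repo and let docker authenticate against it.
gcloud artifacts repositories create docker-mirror \
  --repository-format=docker --location=asia-southeast1
gcloud auth configure-docker asia-southeast1-docker.pkg.dev

# Mirror an image: pull it once from somewhere Docker Hub still works,
# then retag and push it into the regional repo that the Singapore machines pull from.
docker pull nginx:latest
docker tag nginx:latest asia-southeast1-docker.pkg.dev/MY_PROJECT/docker-mirror/nginx:latest
docker push asia-southeast1-docker.pkg.dev/MY_PROJECT/docker-mirror/nginx:latest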

@brianchen003

Same issue here. It happens locally in our office and on all GCP/AWS machines located in asia-southeast-b (Singapore). Any image we pull from docker.io takes hours now.

We got around it by using GCP's artifact registry for some of the images that we customised.

@noginahalchetan

Facing the same issue on all my EKS clusters running in the Singapore region since Friday. The problem is specific to the Docker registry; quay.io works fine.

@narthanaj

facing same issue :/

@ChypherAtWork

Yes, I am facing the same issue. If anyone has an answer, please share it; I'm stuck in the meantime.

@rolandjitsu

We should probably avoid spamming and just +1 the issue to show how many of us are facing the issue.

@Yaga07

Yaga07 commented Jun 18, 2024

+1

@Keval-kanpariya

Same issue here +1

@ManavKakani

+1

@Pradipkhuman

same issue +1

@ahmadfadlydziljalal

Same issue here +1

It takes forever just to pull my public repo from Docker Hub.

dzil@potts:~/app$ ping hub.docker.com
PING prodextdefgreen-k0uuibjyui-4ec6503f7037d339.elb.us-east-1.amazonaws.com (44.193.181.103) 56(84) bytes of data.

@pacharapold

+1

1 similar comment
@liuyehcf

+1

@monelgordillo

I am seeing the same issue.

@1902sysad

same issue +1

@kaikiat

kaikiat commented Jun 19, 2024

+1

2 similar comments
@tanyudii

+1

@zoechou

zoechou commented Jun 20, 2024

+1

@RaveSplash

It's getting better, guys!!!
