
Fail on multiple replicas for 1.09 #89

Closed
ssdowd opened this issue May 3, 2017 · 5 comments

@ssdowd
ssdowd commented May 3, 2017

Using the example from Kubernetes.md, I've created a deployment with 2 replicas of the cloudsql-proxy. With versions 1.05 and 1.08 it works. With version 1.09 it fails whenever replicas is greater than 1, with this message:

Error from server (BadRequest): container "cloudsqlproxy" in pod "cloudsqlproxy-3735439449-p58dv" is waiting to start: trying and failing to pull image

The Kubernetes dashboard shows this error:

Failed to pull image "b.gcr.io/cloudsql-docker/gce-proxy:1.09": failed to register layer: rename /var/lib/docker/image/overlay/layerdb/tmp/layer-289391082 /var/lib/docker/image/overlay/layerdb/sha256/305e4867f6737b619a7ab334876d503b12fa391a3d28478752575b49d6857e69: directory not empty
Error syncing pod, skipping: failed to "StartContainer" for "cloudsqlproxy" with ErrImagePull: "failed to register layer: rename /var/lib/docker/image/overlay/layerdb/tmp/layer-289391082 /var/lib/docker/image/overlay/layerdb/sha256/305e4867f6737b619a7ab334876d503b12fa391a3d28478752575b49d6857e69: directory not empty"

Either setting replicas to 1 or using version 1.08 makes it work.

Example config:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cloudsqlproxy
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: cloudsqlproxy
    spec:
      containers:
      - image: b.gcr.io/cloudsql-docker/gce-proxy:1.09
        name: cloudsqlproxy
        command:
        - /cloud_sql_proxy
        - -dir=/cloudsql
        - -instances=project:us-central1:db-test=tcp:3306
        - -credential_file=/credentials/credentials.json
        ports:
        - name: sqlpxy-prt-wp
          containerPort: 3306
        volumeMounts:
        - mountPath: /cloudsql
          name: cloudsql
        - mountPath: /credentials
          name: service-account-token
          readOnly: true
        - mountPath: /etc/ssl/certs
          name: ssl-certs
          readOnly: true
      volumes:
      - name: cloudsql
        emptyDir:
      - name: service-account-token
        secret:
          secretName: cloudsql-instance-credentials
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs
@Carrotman42
Contributor

Carrotman42 commented May 3, 2017 via email

@ssdowd
Author

ssdowd commented May 3, 2017

Tried it both ways. Why would it work with replicas: 1 and not with replicas: 2 if the URL was incorrect?

@Carrotman42
Contributor

Carrotman42 commented May 5, 2017

I didn't have any good ideas about why it isn't working at all, so it was mostly a guess: I know that the b.gcr.io-prefixed URIs will be going away at some point, so I wanted to make sure they had nothing to do with this problem.

Your effort to narrow it down to a difference between 1.08 and 1.09 is much appreciated, especially because that leaves only a very small number of commits it could possibly be.

My guess now is that it's somehow related to setting the base container to Alpine 3.5 (added in aaacabb). @apelisse do you have any thoughts on how this could affect @ssdowd's setup? I don't have a lot of experience with Kubernetes but the pasted config looks sane to me.

Looking at the error message, I'm not sure what Docker image the sha "305e4867f6737b619a7ab334876d503b12fa391a3d28478752575b49d6857e69" is associated with, though. I pulled all of the proxy images from 1.07 through 1.09, from both b.gcr.io and gcr.io, and none of the images on my machine have a layer with that hash (although I could be missing something; someone with more Docker/Kubernetes experience may want to double-check my work).
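
For reference, that check can be reproduced roughly as follows (the image tags are the ones discussed in this thread; note that, as far as I understand Docker's layer store, the directory names under layerdb/sha256 are chain IDs rather than the diff IDs reported by docker inspect, so a missing match there doesn't prove the layer belongs to a different image):

# Pull the images mentioned in this thread
docker pull gcr.io/cloudsql-docker/gce-proxy:1.09
docker pull b.gcr.io/cloudsql-docker/gce-proxy:1.09

# List each image's layer diff IDs and look for the hash from the error message
docker inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' \
  gcr.io/cloudsql-docker/gce-proxy:1.09 | grep 305e4867 || echo "no matching diff ID"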

@apelisse
Contributor

apelisse commented May 5, 2017

Seems related to moby/moby#23184.

Here's my guess at what's going on:

  • One of the Kubernetes nodes has an incomplete/corrupted left-over of the 1.09 image
  • Increasing the number of replicas means that the new pod is scheduled on that node

Suggestion: Make sure it's always failing on the same node(s), connect to the failing node(s), manually try pulling the image (confirm it fails), delete the directory that shouldn't be there, and try again.
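
Concretely, that workaround looks something like the sketch below (the image URL, label, and layer hash are taken from this thread; deleting layerdb entries by hand is a blunt fix, so double-check the path before removing anything):

# Find which node(s) the failing replica lands on
kubectl get pods -l app=cloudsqlproxy -o wide

# On the failing node, reproduce the failed pull directly
docker pull b.gcr.io/cloudsql-docker/gce-proxy:1.09

# Remove the leftover layer metadata from the interrupted pull, then retry
sudo rm -rf /var/lib/docker/image/overlay/layerdb/sha256/305e4867f6737b619a7ab334876d503b12fa391a3d28478752575b49d6857e69
docker pull b.gcr.io/cloudsql-docker/gce-proxy:1.09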

@ssdowd
Copy link
Author

ssdowd commented May 5, 2017

@apelisse That was it - had to remove 2 directories from /var/lib/docker/image/overlay/layerdb/sha256/ to get a pull to work on 1 node. 1.09 is now working for me. Thanks.
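
A quick way to confirm the cleanup took effect (generic kubectl usage, not specific to this setup):

# Both replicas should reach Running; the NODE column shows where each one landed
kubectl rollout status deployment/cloudsqlproxy
kubectl get pods -l app=cloudsqlproxy -o wide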

@ssdowd ssdowd closed this as completed May 5, 2017
yosatak pushed a commit to yosatak/cloud-sql-proxy that referenced this issue Feb 26, 2023
Manual auth of cloud_sql_proxy to work around issues getting metadata by IP address