Pods test failures after master upgrade 0.19.3 → 0.21.2 #11355

Closed
mbforbes opened this issue Jul 16, 2015 · 31 comments
Labels
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/node: Categorizes an issue or PR as relevant to SIG Node.

Comments

@mbforbes
Contributor

Conditions

  • e2e test version: 0.19.3
  • nodes version: 0.19.3
  • master version: 0.21.2

Tests

  • [FLAKE] Pods should be restarted with a docker exec "cat /tmp/health" liveness probe
  • [FLAKE] Pods should not be restarted with a docker exec "cat /tmp/health" liveness probe
  • [FLAKE] Pods should be restarted with a /healthz http liveness probe
  • [FAIL] Pods should be updated

Flake history

[screenshot: flake history for the tests above]

Output

Pods should be restarted with a docker exec "cat /tmp/health" liveness probe

Pod "liveness-exec" is forbidden: no API token found for service account
e2e-test-5d05436c-2a5a-11e5-916e-42010af01555/default, retry after the token
is automatically created and added to the service account

/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/
go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/pods.go:451
creating pod liveness-exec
Expected error:
    <*errors.StatusError | 0xc208bcac80>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {SelfLink: "", ResourceVersion: ""},
            Status: "Failure",
            Message: "Pod \"liveness-exec\" is forbidden: no API token found for service account
e2e-test-5d05436c-2a5a-11e5-916e-42010af01555/default, retry after the token
is automatically created and added to the service account",
            Reason: "Forbidden",
            Details: {
                Name: "liveness-exec",
                Kind: "Pod",
                Causes: nil,
                RetryAfterSeconds: 0,
            },
            Code: 403,
        },
    }
    Pod "liveness-exec" is forbidden: no API token found for service account
e2e-test-5d05436c-2a5a-11e5-916e-42010af01555/default, retry after the token
is automatically created and added to the service account
not to have occurred

Pods should not be restarted with a docker exec "cat /tmp/health" liveness probe

Pod "liveness-exec" is forbidden: no API token found for service
account e2e-test-10b92925-2b3f-11e5-8ace-42010af01555/default, retry after the token
is automatically created and added to the service account

/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/
go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/pods.go:477
creating pod liveness-exec
Expected error:
    <*errors.StatusError | 0xc20965a280>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {SelfLink: "", ResourceVersion: ""},
            Status: "Failure",
            Message: "Pod \"liveness-exec\" is forbidden: no API token found for service
account e2e-test-10b92925-2b3f-11e5-8ace-42010af01555/default, retry after the token
is automatically created and added to the service account",
            Reason: "Forbidden",
            Details: {
                Name: "liveness-exec",
                Kind: "Pod",
                Causes: nil,
                RetryAfterSeconds: 0,
            },
            Code: 403,
        },
    }
    Pod "liveness-exec" is forbidden: no API token found for service
account e2e-test-10b92925-2b3f-11e5-8ace-42010af01555/default, retry after the token
is automatically created and added to the service account
not to have occurred

Pods should be restarted with a /healthz http liveness probe

Pod "liveness-http" is forbidden: no API token found for service account
e2e-test-50513bd8-2a76-11e5-948c-42010af01555/default, retry after the token
is automatically created and added to the service account

/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/pods.go:504
creating pod liveness-http
Expected error:
    <*errors.StatusError | 0xc2081ffe00>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {SelfLink: "", ResourceVersion: ""},
            Status: "Failure",
            Message: "Pod \"liveness-http\" is forbidden: no API token found for service account e2e-test-1969a2b3-2a04-11e5-9a2a-42010af01555/default, retry after the token is automatically created and added to the service account",
            Reason: "Forbidden",
            Details: {
                Name: "liveness-http",
                Kind: "Pod",
                Causes: nil,
                RetryAfterSeconds: 0,
            },
            Code: 403,
        },
    }
    Pod "liveness-http" is forbidden: no API token found for service account e2e-test-1969a2b3-2a04-11e5-9a2a-42010af01555/default, retry after the token is automatically created and added to the service account

Pods should be updated

may not update fields other than container.image

/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/pods.go:338
failed to update pod: Pod "pod-update-08769420-2b44-11e5-b3e2-42010af01555" is invalid: spec: invalid value '{Volumes:[{Name:default-token-7sezo VolumeSource:{HostPath: EmptyDir: GCEPersistentDisk: AWSElasticBlockStore: GitRepo: Secret:<*>(0xc209b18b30){SecretName:default-token-7sezo} NFS: ISCSI: Glusterfs: PersistentVolumeClaim: RBD:}}] Containers:[{Name:nginx Image:gcr.io/google_containers/nginx:1.7.9 Command: Args: WorkingDir: Ports:[{Name: HostPort:0 ContainerPort:80 Protocol:TCP HostIP:}] Env: Resources:{Limits:map[cpu:100m] Requests:map[]} VolumeMounts:[{Name:default-token-7sezo ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount}] LivenessProbe:<*>(0xc209d2da70){Handler:{Exec: HTTPGet:<*>(0xc20898c190){Path:/index.html Port:8080 Host: Scheme:HTTP} TCPSocket:} InitialDelaySeconds:30 TimeoutSeconds:1} ReadinessProbe: Lifecycle: TerminationMessagePath:/dev/termination-log ImagePullPolicy:IfNotPresent SecurityContext:}] RestartPolicy:Always TerminationGracePeriodSeconds: ActiveDeadlineSeconds: DNSPolicy:ClusterFirst NodeSelector: ServiceAccountName: NodeName:gke-gke-upgrade-13e30713-node-xg2o HostNetwork:false ImagePullSecrets:}': may not update fields other than container.image

(#11343 improves this error message)

mbforbes added the priority/important-soon label Jul 16, 2015
@liggitt
Member

liggitt commented Jul 16, 2015

Is the failing e2e code at the 19.3 level? If so, it's probably missing the e2e fixes in #10523, which should help the liveness-exec check.

For the last one, I think the rejected spec mutation might be the scheme in the liveness probe defaulting to HTTP when it was previously empty (added in #9965)

@liggitt
Member

liggitt commented Jul 16, 2015

the rejected spec mutation might be the scheme in the liveness probe defaulting to HTTP

Never mind, the defaulting would have happened in the old podspec as well... so there wouldn't be a diff. I can now see that the submitted podspec's serviceAccountName is empty, because the 19.3 e2e client code didn't know about it. Here's the sequence (with a rough illustration after the list):

  1. pod is created by 19.3 e2e client code, probably with no service account
  2. master service account admission sets pod.spec.serviceAccountName to default
  3. 19.3 e2e client code gets the pod, tries to update it, and in the process drops the serviceAccountName field (because its v1 serialization has serviceAccount instead)
  4. master ValidatePodUpdate code fails, because it sees an update trying to change serviceAccountName from "default" to ""
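
To make step 3 concrete, here is a rough, hypothetical illustration using jq and a present-day kubectl (the real e2e client drops the field silently through its old Go types; this just simulates the same effect):

    # Hypothetical illustration only: simulate the 0.19.3 client's view of the
    # pod, which has no "serviceAccountName" field, then send it back as an update.
    kubectl get pod liveness-exec -o json > /tmp/pod.json

    # The old client's v1 types simply don't carry spec.serviceAccountName.
    jq 'del(.spec.serviceAccountName)' /tmp/pod.json > /tmp/pod-as-old-client.json

    # ValidatePodUpdate sees serviceAccountName changing from "default" to ""
    # and rejects the update with the error quoted above.
    kubectl replace -f /tmp/pod-as-old-client.json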

@zmerlynn
Member

Yes, I believe the failing e2e code is at the 0.19.3 level as well. I think @mbforbes is re-attempting pivoting the e2e versions after the master upgrades as well, but we might run into other issues if the nodes are version-skewed. (My hope is that there will be fewer.)

davidopp added the team/master and sig/node labels Jul 16, 2015
@bgrant0607
Member

Filed #11380 for the serviceAccount issue.

@bgrant0607
Member

Since a couple issues have popped up already, I decided to look through API changes between 0.19 and 0.20:
https://github.com/GoogleCloudPlatform/kubernetes/commits/master/pkg/api/v1/types.go

  • pod.status.reason: Additive. kubectl will just print less information if it isn't populated.
  • ObservedGeneration: I believe the client code should work when this field defaults to 0.
  • PersistentVolumeClaim: a Go change, not a JSON change.

So hopefully there aren't more incompatibilities lurking.

@roberthbailey
Contributor

Thanks for checking, Brian. With luck we won't need more patch releases into the 0.20 branch.

@mbforbes
Contributor Author

Important: running a "pure" 0.19.3 (master, nodes, and e2e code all at 0.19.3) has zero flakes for the three flaky pod tests above.

[screenshot: flake history showing zero flakes on pure 0.19.3]

This seems to imply that #10523 didn't prevent these, and the 0.21.2 master is causing the flakes. Could the flakes also be related to the serviceAccount → serviceAccountName issue?

@liggitt
Member

liggitt commented Jul 16, 2015

The field name change caused the pod update failure.

The liveness exec failures were caused by the e2e code in that test not using the standard method for constructing a test namespace, which meant the e2e test wasn't waiting for the namespace to be ready before creating pods. In 19.3, the admission plugin would let pods get created before their service account and API token were ready, but that was fixed after 19.3.

@mbforbes
Contributor Author

@liggitt but by that logic, wouldn't all 0.19.3 e2es see the same flakiness in those tests? In #11355 (comment) I describe a Jenkins job (pictured) that's running pure 0.19.3 and never sees those tests flake. (Sorry if I'm confused; just attempting to understand!)

@liggitt
Member

liggitt commented Jul 17, 2015

@mbforbes admission control changed since 19.3 to require the service account token to exist before admitting a pod. Prior to that, pods that didn't make use of the token (like the liveness-exec pod) were getting admitted when they shouldn't have been, but the tests didn't notice because the pods never attempted to use the API token they were supposed to have available.

In a pure 19.3 env, liveness-exec pods get admitted (incorrectly) before the namespace is ready, but don't fail their test because they aren't depending on anything service-account-token related.

In a 19.3 e2e against a 20.0+ master, liveness-exec pods get rejected (correctly) because the namespace's service accounts and their tokens aren't finished initializing yet. The liveness-exec e2e test in particular needed to use the common method to correctly get a test namespace. The e2e test fixes in https://github.com/GoogleCloudPlatform/kubernetes/pull/10523/files#diff-92d176a1025dcbee0981bb7f16cda942 are applicable.
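
Roughly, the pattern those fixes enforce looks like this (a sketch with made-up names, using plain kubectl rather than the e2e framework): don't create pods until the namespace's default service account has its token.

    NS="e2e-test-example"    # hypothetical test namespace
    # Wait until the namespace's default service account has a token secret.
    until kubectl get serviceaccount default --namespace="$NS" \
          -o jsonpath='{.secrets[*].name}' 2>/dev/null | grep -q .; do
      sleep 1
    done
    # Only now create pods; the ServiceAccount admission controller will
    # find an API token and admit them.
    kubectl create -f liveness-exec-pod.yaml --namespace="$NS"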

@mbforbes
Contributor Author

@liggitt ahhhhh I understand now. Thank you so much for being patient with me.

@n1603

n1603 commented Aug 1, 2015

Can you please tell me what the solution to this is? I am seeing the same error:

kubectl create -f nginx.yml
Error from server: error when creating "nginx.yml": Pod "nginx" is forbidden: no API token found for service account default/default, retry after the token is automatically created and added to the service account

@liggitt
Member

liggitt commented Aug 1, 2015

@n1603 make sure you're starting the apiserver and controller manager with the service account arguments needed to auto-generate service account tokens. See local-up-cluster.sh for an example.

@n1603

n1603 commented Aug 1, 2015

Thanks. I am using Fedora 22 Atomic; the apiserver and controller manager are running. I must be missing something, but I'm not sure what.

@n1603

n1603 commented Aug 2, 2015

Any further suggestions on fixing this?

@liggitt
Member

liggitt commented Aug 2, 2015

Can you show the command-line options you are using to start the apiserver and controller manager?

@n1603

n1603 commented Aug 3, 2015

I used the command below to start the services:

for SERVICES in docker kube-proxy.service kubelet.service etcd.service kube-apiserver.service kube-controller-manager.service kube-scheduler.service; do systemctl restart $SERVICES; systemctl enable $SERVICES; systemctl status $SERVICES ; done

Here is what "kube-apiserver.service" looks like in my install:
#cat "/usr/lib/systemd/system/kube-apiserver.service"

[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/apiserver
User=kube
ExecStart=/usr/bin/kube-apiserver \
    $KUBE_LOGTOSTDERR \
    $KUBE_LOG_LEVEL \
    $KUBE_ETCD_SERVERS \
    $KUBE_API_ADDRESS \
    $KUBE_API_PORT \
    $KUBELET_PORT \
    $KUBE_ALLOW_PRIV \
    $KUBE_SERVICE_ADDRESSES \
    $KUBE_ADMISSION_CONTROL \
    $KUBE_API_ARGS
Restart=on-failure
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

$cat "/usr/lib/systemd/system/kube-controller-manager.service"

[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/controller-manager
User=kube
ExecStart=/usr/bin/kube-controller-manager \
    $KUBE_LOGTOSTDERR \
    $KUBE_LOG_LEVEL \
    $KUBE_MASTER \
    $KUBE_CONTROLLER_MANAGER_ARGS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

@liggitt
Member

liggitt commented Aug 3, 2015

@n1603 looks like the systemd unit files include the ServiceAccount admission controller without specifying the needed signing key. Not sure what to do about that, since those files don't really have a setup script that can create that key...

To get your setup working, you can do the same thing local-up-cluster.sh is doing (a quick sanity check follows the steps):

  1. Generate a signing key:

    openssl genrsa -out /tmp/serviceaccount.key 2048
    
  2. Update /etc/kubernetes/apiserver:

    KUBE_API_ARGS="--service_account_key_file=/tmp/serviceaccount.key"
    
  3. Update /etc/kubernetes/controller-manager:

    KUBE_CONTROLLER_MANAGER_ARGS="--service_account_private_key_file=/tmp/serviceaccount.key"
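
As a rough sanity check afterwards (assuming kubectl is pointed at this cluster), restart both services and confirm the token controller created a token for the default service account:

    systemctl restart kube-apiserver kube-controller-manager
    # The SECRETS column for "default" should become non-zero, and a
    # default-token-* secret of type kubernetes.io/service-account-token
    # should appear.
    kubectl get serviceaccounts
    kubectl get secrets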
    

@n1603

n1603 commented Aug 4, 2015

This works, thanks a lot.

@hw-qiaolei
Contributor

@liggitt thanks for your answer. I encountered the same issue and solved it by following your steps.

@irfanjs

irfanjs commented Sep 28, 2016

Hi,
I followed these steps:
1: executed the command: openssl genrsa -out /tmp/serviceaccount.ket 2048
2: modified the /etc/kubernetes/apiserver file to add the following:

KUBE_API_ARGS="--service_account_key_file=/tmp/serviceaccount.key"

3: modified /etc/kubernetes/controller-manager to add the following:

KUBE_CONTROLLER_MANAGER_ARGS="--service_account_private_key_file=/tmp/serviceaccount.key"

4: restarted the kube-apiserver and kube-controller-manager services:

service kube-apiserver restart
service kube-controller-manager restart

When I run the command kubectl get events, I get the following error:

No API token found for service account "default" retry after the token is automatically created and added to the service account

How the environment is set up:
I followed Kubernetes the Hard Way: https://github.com/kelseyhightower/kubernetes-the-hard-way
one kube master
one kube node
Both servers are on CentOS 7.2.
kubectl version: server version 1.3 and client version 1.3

kubectl get pods does not return anything.
kubectl get nodes returns the hostname of the kube node server and its status is Ready.

Please suggest what additional checks I need to do to get this resolved.
Regards

@liggitt
Member

liggitt commented Sep 28, 2016

there's a typo in your openssl command above; I assume the file was actually created without the typo.

What service accounts and secrets show up in your namespace?

kubectl get serviceaccounts
kubectl get secrets

@irfanjs

irfanjs commented Sep 28, 2016

thanks Liggitt.
I did not understand what that typo is; I created the key file exactly as per the command. Please let me know.
The command kubectl get secrets does not return anything.
The command kubectl get serviceaccounts returns `default` as an account, and its SECRETS count is 0.

regards

@liggitt
Member

liggitt commented Sep 28, 2016

the typo was "serviceaccount.ket" vs "serviceaccount.key"

do you have the logs from the controller manager?

@irfanjs

irfanjs commented Sep 28, 2016

Sorry to say, but I don't have a /var/log/kube-controller-manager.log file.

I ran the command journalctl -u -f kube-controller-manager and it gives the same error: "No API ..."

@irfanjs

irfanjs commented Sep 29, 2016

please suggest.
regards

@irfanjs

irfanjs commented Sep 30, 2016

This is resolved. I created the secrets manually and attached them to the "default" service account. After that, I restarted the controller manager and the pods started running.
Thanks again, all.
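
Roughly, what that can look like on clusters of this era (a sketch with made-up names; the token controller fills in the token data for secrets of type kubernetes.io/service-account-token carrying the annotation below):

    # sa-token-secret.yaml (hypothetical file name)
    apiVersion: v1
    kind: Secret
    metadata:
      name: default-token-manual
      annotations:
        kubernetes.io/service-account.name: default
    type: kubernetes.io/service-account-token

    # Create it, then reference it from the service account so the
    # ServiceAccount admission controller can find an API token.
    kubectl create -f sa-token-secret.yaml
    kubectl patch serviceaccount default -p '{"secrets":[{"name":"default-token-manual"}]}'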

@kewaike

kewaike commented Feb 3, 2017

@irfanjs, how do you create the secrets and attach them to the "default" service account?

@xaidanwang

I also want to know how to create the secrets and attach them to the "default" service account.

@kundanatre

kundanatre commented Jun 6, 2018

The service account shows 0 secrets attached to it.

NAME      SECRETS   AGE
default   0         6d

Is this problem resolved in kubernetes 1.10?

It is still complaining about the API token. How does the API token get generated?
Do we need to generate it manually, as mentioned by @liggitt?
Even though I have followed the same steps, providing the correct params "--service-account-private-key-file" and "--service-account-key-file" in the respective files, it does not make any difference.

openssl genrsa -out /etc/kubernetes/serviceaccount.key 2048

In apiserver config file

KUBE_API_ARGS="--service-account-key-file=/etc/kubernetes/secret/serviceaccount.key"
KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,LimitRanger,ResourceQuota"

Or this approach

KUBE_API_ARGS="--service-account-key-file=/etc/kubernetes/secret/serviceaccount.key"
KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota

In controller-manager config file
KUBE_CONTROLLER_MANAGER_ARGS="--service-account-private-key-file=/etc/kubernetes/secret/serviceaccount.key --allocate-node-cidrs=true --attach-detach-reconcile-sync-period=1m0s --cluster-cidr=100.64.0.0/16 --cluster-name=k8s.virtual.local --leader-elect=true --kubeconfig=/etc/kubernetes/kubeConfig --service-cluster-ip-range=100.65.0.0/24"

systemctl restart kube-controller-manager

systemctl restart kube-apiserver

vi ReplicationController.yaml

apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

kubectl create -f ReplicationController.yaml

kubectl describe rc nginx

Name:         nginx
Namespace:    kube-system
Selector:     app=nginx
Labels:       app=nginx
Annotations:  <none>
Replicas:     0 current / 2 desired
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=nginx
  Containers:
   nginx:
    Image:        nginx
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type             Status  Reason
  ----             ------  ------
  ReplicaFailure   True    FailedCreate
Events:
  Type     Reason        Age                      From                    Message
  ----     ------        ----                     ----                    -------
  Warning  FailedCreate  <invalid> (x18 over 8m)  replication-controller  Error creating: No API token found for service account "default", retry after the token is automatically created and added to the service account

@j-ibarra

@liggitt thanks, I encountered the same issue and solved it by following your steps. But now:

kubectl logs busybox-9b9f89599-w5ln4

error: You must be logged in to the server (the server has asked for the client to provide credentials ( pods/log busybox-9b9f89599-w5ln4))
