
chectl: Failed to connect to Kubernetes API. Unauthorized #15331

Open
Tracked by #20404 ...
rjbaucells opened this issue Nov 26, 2019 · 22 comments
Labels
area/chectl: Issues related to chectl, the CLI of Che
kind/bug: Outline of a bug - must adhere to the bug report template.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
severity/P1: Has a major impact to usage or development of the system.

Comments

@rjbaucells

chectl fails to start the server on a k8s (CoreOS Tectonic) cluster with authentication enabled.

chectl --version

chectl/0.0.20191121-next.89a1444 darwin-x64 node-v10.17.0

Steps to reproduce

chectl server:start

    → Failed to connect to Kubernetes API. Unauthorized
    👀  Looking for an already existing Che instance
 ›   Error: Failed to connect to Kubernetes API. Unauthorized

Runtime

  • kubernetes:
kubectl version

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-14T04:24:34Z", GoVersion:"go1.12.13", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6+coreos.2", GitCommit:"0c227501efd8f0c62e5f75049ad7abb5a1d801ac", GitTreeState:"clean", BuildDate:"2019-02-02T03:18:42Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

kubectl configuration file in default location: ~/.kube/config

kubectl get nodes

NAME                       AGE
k-master01.domain.local   594d
k-node01.domain.local     594d
k-node02.domain.local     594d
k-node03.domain.local     594d
k-node04.domain.local     594d
k-node05.domain.local     594d
k-node06.domain.local     594d
@rjbaucells rjbaucells added the kind/bug Outline of a bug - must adhere to the bug report template. label Nov 26, 2019
@che-bot che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Nov 26, 2019
@tolusha
Contributor

tolusha commented Nov 27, 2019

@rjbaucells
Try os login before starting che server.

@benoitf benoitf added area/chectl Issues related to chectl, the CLI of Che status/info-needed More information is needed before the issue can move into the “analyzing” state for engineering. and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Nov 27, 2019
@rjbaucells
Author

What is os login?

@tolusha
Contributor

tolusha commented Nov 28, 2019

I mean, you might first need to log in to your k8s cluster.

#14665

@nils-mosbach

nils-mosbach commented Dec 11, 2019

Same here. kubectl works fine, chectl fails (unauthorized).
Kubernetes Version: v1.16.3

Tried with the following versions:
chectl/7.4.0
chectl/0.0.20191127-next.97b31fb

$ chectl server:start --platform=k8s --installer=helm --multiuser
[00:04:55] Verify Kubernetes API [started]
[00:04:56] Verify Kubernetes API [failed]
[00:04:56] → Failed to connect to Kubernetes API. E_K8S_API_UNAUTHORIZED - Message: must authenticate
 »   Error: Failed to connect to Kubernetes API. E_K8S_API_UNAUTHORIZED -
 »   Message: must authenticate

While digging a little deeper, this might be caused by the checkKubeApi function.

I was using a Rancher-created k8s cluster.

apiVersion: v1
kind: Config
clusters:
- name: "dev"
  cluster:
    server: "https://rancher.k8s.local/k8s/clusters/c-9fvlj"
    certificate-authority-data: "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM3akNDQ\..."
- name: "dev-production-1"
  cluster:
    server: "https://10.49.70.175:6443"
    certificate-authority-data: "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN3akNDQ\ ..."

users:
- name: "dev"
  user:
    token: "kubeconfig-user-rxjjf.c-9fvlj:2ccg6s2jgjs...mw"

contexts:
- name: "dev"
  context:
    user: "dev"
    cluster: "dev"
- name: "dev-production-1"
  context:
    user: "dev"
    cluster: "dev-production-1"

current-context: "dev"

The endpoint seems to be fine, but getDefaultAccountServiceToken() is returning the token of the default Kubernetes service account. I guess in my case that might be caused by Rancher's authentication handling.

https://rancher.k8s.local/k8s/clusters/c-9fvlj/healthz
 »   eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2...

>> base64 decode

{"alg":"RS256","kid":""}
{"iss":"kubernetes/serviceaccount","kubernetes.io/serviceaccount/namespace":"default","kubernetes.io/serviceaccount/secret.name":"default-token-q7wz4","kubernetes.io/serviceaccount/service-account.name":"default","kubernetes.io/serviceaccount/service-account.uid":"95e84a44-1af1-43b2-85c8-eb4fd6c0bf93","sub":"system:serviceaccount:default:default"}
(followed by the binary JWT signature, which is not readable as text)
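For reference, that decode step can be reproduced with a few lines of Node/TypeScript; a minimal sketch (the token is read from an environment variable rather than pasted in):

// Decode the header and payload segments of a service-account JWT.
// The third segment is a binary signature, not JSON, which is why the raw
// decode above ends in unreadable bytes.
const token = process.env.SA_TOKEN ?? '' // set SA_TOKEN to the service-account token

const [header, payload] = token
  .split('.')
  .slice(0, 2)
  // Node's 'base64' decoder also accepts the URL-safe alphabet used by JWTs.
  .map(part => JSON.parse(Buffer.from(part, 'base64').toString('utf8')))

console.log(header)  // e.g. { alg: 'RS256', kid: '' }
console.log(payload) // e.g. { iss: 'kubernetes/serviceaccount', ..., sub: 'system:serviceaccount:default:default' }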

In order to get it running, I changed the token to the one in my kubeconfig file and everything deploys fine: token = "kubeconfig-user-rxjjf.c-9fvlj:2ccg6s2jgjs...mw" instead of the result of getDefaultAccountServiceToken().

Would it be an option to fall back to the token given by the kubeconfig file if service-account authentication fails (401)?
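A minimal sketch of what such a fallback could look like, assuming axios for the HTTP probe and the @kubernetes/client-node KubeConfig; checkWithToken and checkKubeApiWithFallback are illustrative names, not existing chectl functions:

import axios from 'axios'
import * as https from 'https'
import { KubeConfig } from '@kubernetes/client-node'

// Sketch of the proposed fallback: probe /healthz with the default
// service-account token first; on 401, retry with the token stored for the
// current user in the kubeconfig (e.g. Rancher's "kubeconfig-user-..." token).
async function checkWithToken(endpoint: string, token: string): Promise<number> {
  const response = await axios.get(endpoint, {
    headers: { Authorization: `Bearer ${token}` },
    httpsAgent: new https.Agent({ rejectUnauthorized: false }),
    validateStatus: () => true, // report the status code instead of throwing on 401
  })
  return response.status
}

async function checkKubeApiWithFallback(kc: KubeConfig, serviceAccountToken: string): Promise<void> {
  const cluster = kc.getCurrentCluster()
  if (!cluster) {
    throw new Error('Failed to get current Kubernetes cluster: returned null')
  }
  const endpoint = `${cluster.server}/healthz`

  let status = await checkWithToken(endpoint, serviceAccountToken)
  if (status === 401) {
    // Fall back to the user token from ~/.kube/config, if there is one.
    const userToken = kc.getCurrentUser()?.token
    if (userToken) {
      status = await checkWithToken(endpoint, userToken)
    }
  }
  if (status !== 200) {
    throw new Error(`Failed to connect to Kubernetes API. Status: ${status}`)
  }
}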

@tmpkn

tmpkn commented Feb 11, 2020

@nils-mosbach how did you manage to change it to your own token? I've searched the code for getDefaultAccountServiceToken, but couldn't find anything.

@nils-mosbach

nils-mosbach commented Feb 11, 2020

In the current chectl master branch it's in /src/api/kube.ts on line 1071.

For testing purposes I hard-coded my token instead of the function call.

async checkKubeApi() {
    const currentCluster = this.kc.getCurrentCluster()
    if (!currentCluster) {
      throw new Error('Failed to get current Kubernetes cluster: returned null')
    }

    /**
     * I changed the following line to something like:
     * const token = "MY_SERVICE_ACCOUNT_TOKEN";
     */
    const token = await this.getDefaultServiceAccountToken()

    const agent = new https.Agent({
      rejectUnauthorized: false
    })
    let endpoint = ''
    try {
      endpoint = `${currentCluster.server}/healthz`
      // ...
    }
  }

Replicating the steps of getDefaultServiceAccountToken() ...

> kubectl get serviceaccounts
NAME      SECRETS   AGE
default   1         59d

> kubectl describe serviceaccounts default        
Name:                default
Namespace:           default
Labels:              <none>
Annotations:         <none>
Image pull secrets:  <none>
Mountable secrets:   default-token-q7wz4
Tokens:              default-token-q7wz4
Events:              <none>

> kubectl describe secret default-token-q7wz4
Name:         default-token-q7wz4
Namespace:    default
Labels:       <none>
Annotations:  field.cattle.io/projectId: c-9fvlj:p-d648t
              kubernetes.io/service-account.name: default
              kubernetes.io/service-account.uid: 95e84a44-1af1-43b2-85c8-eb4fd6c0bf93

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1017 bytes
namespace:  7 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9....
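The same lookup can be reproduced programmatically; a sketch using @kubernetes/client-node (the pre-1.0 client, where responses carry a .body field), illustrating what those kubectl steps resolve to rather than the actual chectl implementation:

import { KubeConfig, CoreV1Api } from '@kubernetes/client-node'

// Read the token of the "default" service account in the "default" namespace,
// i.e. the value the health check ends up using.
async function readDefaultServiceAccountToken(): Promise<string> {
  const kc = new KubeConfig()
  kc.loadFromDefault()
  const api = kc.makeApiClient(CoreV1Api)

  // Find the secret bound to the service account (e.g. default-token-q7wz4)...
  const sa = await api.readNamespacedServiceAccount('default', 'default')
  const secretName = sa.body.secrets![0].name!

  // ...and base64-decode its token field.
  const secret = await api.readNamespacedSecret(secretName, 'default')
  return Buffer.from(secret.body.data!.token, 'base64').toString('utf8')
}

readDefaultServiceAccountToken().then(token => console.log(token))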

While I was writing this, the getDefaultServiceAccountToken function got refactored on the current master. So I pulled the latest release, 0.0.20200211-next.1744186, but I still get the same error.

> chectl server:start --platform k8s --multiuser --self-signed-cert --domain k8s.local --chenamespace dev
 × Verify Kubernetes API
    → Failed to connect to Kubernetes API. E_K8S_API_UNAUTHORIZED - Message: must authenticate
    👀  Looking for an already existing Eclipse Che instance
 »   Error: Failed to connect to Kubernetes API. E_K8S_API_UNAUTHORIZED - Message: must authenticate

@r9labs

r9labs commented Mar 18, 2020

Is there a way to have this utilize the RBAC construct, or at least inherit the token from the kubeconfig file? I'm encountering this issue too.

@mgiammarco

Hi,
I am using Rancher too and I have the same problem.
I have tried to modify the file kube.js (which I assume is transpiled from kube.ts), but I still get the "must authenticate" error.
Can you help me?

@tolusha tolusha added severity/P1 Has a major impact to usage or development of the system. and removed status/info-needed More information is needed before the issue can move into the “analyzing” state for engineering. team/deploy labels Apr 19, 2020
@tolusha tolusha added this to the Backlog - Deploy milestone Apr 19, 2020
@mgiammarco

Do I need to encode the secret with base64?

@mgiammarco

Can someone kindly help me?
It has been a month since I built a k8s/Rancher cluster just to install Eclipse Che, and I still have not reached that goal.

@moster33

moster33 commented May 1, 2020

Can someone kindly help me?
It has been a month since I built a k8s/Rancher cluster just to install Eclipse Che, and I still have not reached that goal.

me too

@Nephelo

Nephelo commented May 3, 2020

I'm running into the same issue, installing Che on Rancher.

As far as I understand, the main issue is how Rancher proxies the kube-apiserver. @nils-mosbach pointed to the chectl code that queries the token of the default service account in the default namespace. This token is used by chectl to authenticate to the kube-apiserver. Unfortunately, this token is only valid for "internal" requests to the API server.

Rancher adds additional authentication and authorization mechanisms in front of the Kube API server, so tokens are validated by Rancher (not by the Kube API server of the cluster). Since the service-account token is not known to Rancher, chectl's request is not forwarded to the internal Kube API server.

If possible in your environment, try to access the internal Kube API server directly (e.g. by adding an additional NodePort Service for the Kubernetes API), at least for installing Che.
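For illustration, a small diagnostic sketch (assuming axios, and reusing the two server URLs from the kubeconfig quoted earlier in the thread) that sends the same default service-account token to the Rancher proxy endpoint and to the apiserver directly; in this setup one would expect 401 through the proxy and 200 from the direct endpoint:

import axios from 'axios'
import * as https from 'https'

// Same token, two endpoints: the Rancher proxy rejects it because Rancher
// does not know the cluster-issued service-account token, while the
// apiserver itself accepts it.
const saToken = process.env.SA_TOKEN ?? '' // the default service-account token
const endpoints = [
  'https://rancher.k8s.local/k8s/clusters/c-9fvlj/healthz', // through the Rancher proxy
  'https://10.49.70.175:6443/healthz',                      // apiserver directly
]

const agent = new https.Agent({ rejectUnauthorized: false })

for (const url of endpoints) {
  axios
    .get(url, {
      headers: { Authorization: `Bearer ${saToken}` },
      httpsAgent: agent,
      validateStatus: () => true, // report the status instead of throwing on 401
    })
    .then(res => console.log(`${url} -> ${res.status}`))
}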

@tolusha
Contributor

tolusha commented May 5, 2020

We've added a --skip-kubernetes-health-check flag to skip that kind of pre-flight check.
So please update to the latest version:
chectl update next
and try again.

@tolusha tolusha removed this from the Backlog - Deploy milestone May 6, 2020
@Nephelo

Nephelo commented May 13, 2020

@tolusha Thanks! That solved my issue.

@asthomasdk

I am also running into this issue. However, in my case, I am connecting to the k8s cluster using client certificates with a named user instead of the default accounts.

I am able to access and work with the cluster using kubectl from the command line without any problems. But with chectl, any command that checks namespaces, checks cluster health, or does other things using the default access fails.

Is there a way I can ensure that chectl uses the correct context to get around this?
Right now I am unable to install Che on the customer cluster I am working on.

The customer cluster is using Kubernetes 1.15; based on my testing on my own 1.15-based cluster, the version itself is not the issue.

@asthomasdk

I have tried skipping the health check, but it seems other cluster-access commands fail as well (like checking whether the che namespace exists, which it does, since I need it available with a certificate created before I start the installation).

@asavin-cl

I also have the same issue; I use PKS to connect to the Kubernetes cluster.

@tolusha
Contributor

tolusha commented Jul 28, 2020

@asavin-cl
Which version of chectl do you use?
Have you tried the --skip-kubernetes-health-check flag?

@moster33

Can someone kindly help me?
It has been a month since I built a k8s/Rancher cluster just to install Eclipse Che, and I still have not reached that goal.

Hi, I installed it on Rancher with the following steps:
1. Enable Rancher's "Authorized Cluster Endpoint".
2. Connect to Rancher via kubectl using the k8s master config context.
3. Run kubectl proxy; the service starts on 127.0.0.1:8001 (see the sketch after these steps).
4. Add a context to the kubectl config file like this:

- cluster:
    certificate-authority-data: DATA+OMITTED
    server: http://localhost:8001

5. Open a new cmd window.
6. Switch the kubectl context to the one added in step 4.
7. Execute the chectl command to install Che.
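Why the kubectl proxy step helps: the proxy authenticates to the apiserver with the credentials from your kubeconfig and exposes the API on 127.0.0.1:8001, so a plain local request reaches the cluster without Rancher's token check getting in the way. A minimal probe sketch (assuming axios):

import axios from 'axios'

// With `kubectl proxy` running, /healthz is reachable locally without a
// bearer token; the proxy handles authentication to the apiserver.
axios
  .get('http://localhost:8001/healthz', { validateStatus: () => true })
  .then(res => console.log(res.status, res.data)) // expected: 200 'ok'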

@che-bot
Contributor

che-bot commented Feb 24, 2021

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 24, 2021
@tolusha tolusha added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 25, 2021
@tolusha tolusha mentioned this issue Jul 7, 2021
24 tasks
@tolusha tolusha mentioned this issue Jul 26, 2021
31 tasks
@tolusha tolusha added sprint/next team/deploy and removed lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. labels Aug 17, 2021
@tolusha tolusha mentioned this issue Sep 3, 2021
29 tasks
@tolusha tolusha added this to the 7.37 milestone Sep 8, 2021
@tolusha tolusha mentioned this issue Sep 27, 2021
27 tasks
@tolusha tolusha modified the milestones: 7.37, 7.38 Sep 28, 2021
@tolusha tolusha mentioned this issue Oct 18, 2021
25 tasks
@tolusha tolusha removed this from the 7.38 milestone Oct 20, 2021
@che-bot
Contributor

che-bot commented Apr 19, 2022

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2022
@tolusha tolusha added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 19, 2022