Route DNS hostnames not routable in airgap scenario so Che fails to start #15187
Alternatively, you may be able to override these by setting …
The same problem happens much sooner when installing with TLS and a self-signed certificate: while extracting the certificate, the operator creates a temporary route, but that route is not accessible.
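For anyone debugging this step, a quick way to check whether such a temporary route is reachable and what certificate it serves is something like the following (a sketch; the route host is a placeholder, not the operator's actual route name):

```shell
# Sketch: verify the temporary route is reachable and dump the certificate it serves.
# ROUTE_HOST is a placeholder; substitute the route created by the operator.
ROUTE_HOST=che-cert-route.apps.example.com
openssl s_client -showcerts -connect "${ROUTE_HOST}:443" </dev/null 2>/dev/null \
  | openssl x509 -outform PEM
```

If this times out from inside the cluster but works from outside (or vice versa), that points at the same routing problem described in this issue.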
@rhopp This may be a fix for the issue you mentioned: eclipse-che/che-operator@db15bdb, but I'm not sure.
I've been able to successfully start che-server using the k8s internal DNS name of the keycloak service (in my case …). But then (as expected) the dashboard wasn't able to load (with the typical message …).
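As a rough sketch of that workaround (the `che` deployment name, the `che` namespace, and the service host below are assumptions for illustration), one could point the server at the in-cluster Keycloak service via Che's auth-server-URL property:

```shell
# Sketch only: point che-server at the in-cluster Keycloak service instead of the route.
# Deployment name, namespace, and service host are placeholders.
oc set env deployment/che -n che \
  CHE_KEYCLOAK_AUTH__SERVER__URL=http://keycloak.che.svc:8080/auth
```

Note that on an operator-managed install the operator may reconcile this change away, and (as the comment says) the dashboard still breaks because the browser cannot reach the internal service hostname.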
@rhopp @tomgeorge Would it be possible to check with the OpenShift teams whether it is expected that typical airgapped OpenShift 4.2 installations would not allow pods to access external routes?
By default, on AWS, GCP, and Azure, if cluster DNS zone configuration was provided to the OpenShift installer, OpenShift will manage wildcard DNS records for ingress in the configured zones (assuming ingress is being exposed by a LoadBalancer Service, which is the default on those platforms). On other platforms, or if cluster DNS zone configuration is omitted, wildcard DNS records for ingress are not managed and it's up to the cluster owner to configure DNS to expose ingress (if desired). I hope that helps clarify some of the DNS management behavior. I can provide more specific details if someone can help me understand how the problematic clusters are being created (e.g. through the OpenShift installer IPI flow, UPI, etc.).
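A quick way to see which side is failing is to check, from inside the cluster, whether a route hostname in the apps domain resolves and is reachable. This is only a sketch; the hostname and image are placeholders:

```shell
# Sketch: from inside the cluster, check whether a route hostname resolves
# and whether its HTTPS endpoint answers. The hostname is a placeholder for
# a real route in the cluster's configured apps domain.
oc run dnscheck -it --rm --restart=Never \
  --image=registry.access.redhat.com/ubi8/ubi -- \
  bash -c 'getent hosts keycloak-che.apps.example.com && \
           curl -m 10 -skI https://keycloak-che.apps.example.com'
```

If the name resolves but the connection hangs, that matches the LB/return-path behavior discussed below rather than a DNS management problem.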
Thanks to @ironcladlou for looking into this with me. The issue appears to be that the traffic is rejected by the LB, or on the way back to the node. We should look at the way this cluster was configured in QE and see if it matches the installation procedure in https://docs.openshift.com/container-platform/4.2/installing/installing_restricted_networks/installing-restricted-networks-aws.html
Response from Jianlin Liu, who has knowledge of how the cluster is configured: …
If I understand the setup correctly and you really want to use Routes on an internal subnet (i.e. routes that can be accessed only within the private subnet), then with OpenShift 4.2 you can try replacing the default ingresscontroller with an internally-scoped variant that provisions the LB on the cluster's private subnet, e.g. as sketched below.
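The inline example was lost in this thread; the following is a reconstruction based on the `operator.openshift.io/v1` IngressController API, not the commenter's exact manifest, so treat it as an approximation:

```shell
# Sketch: replace the default ingresscontroller with an internally-scoped one.
# Reconstructed from the IngressController API; verify against the OpenShift
# docs for your exact version before applying.
oc replace --force --wait -f - <<'EOF'
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: Internal
EOF
```

With `scope: Internal`, the ingress operator provisions the LoadBalancer Service with the cloud provider's internal-LB annotation, so route traffic stays on the private subnet instead of hairpinning through a public ELB.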
(See these Kubernetes docs for more detail on how this works.)
@rhopp @jianlinliu Could you please clarify what the expected & recommended OCP installation setup/config in airgap mode is regarding DNS / LB? If I understand correctly, we face this issue because DNS resolution of routes on the QA cluster happens on the public internet, and the only way to communicate is using the service name + port combo. What I do not understand is how come OCP in the …
This document is the best thing that we have for airgap/restricted-network installations: https://docs.openshift.com/container-platform/4.2/installing/installing_restricted_networks/installing-restricted-networks-aws.html The issue is not actually DNS resolution but rather that there is no route for traffic to exit the cluster and return through the ELB. After looking at the templates in http://git.app.eng.bos.redhat.com/git/openshift-misc.git/plain/v3-launch-templates/functionality-testing/aos-4_2/hosts/upi_on_aws-cloudformation-templates/ it looks like it is using Route53 for DNS resolution. I went through the CloudFormation templates that are used in this installation and compared them with the ones in the docs, and found that the only differences were in the VPC/networking configuration. The documented CloudFormation stack had:
The template used in cluster provisioning did not have these resources. Additionally, the template used in installation had:
Could the lack of aws …
So it seems like a difference in configuration from the documented way, combined with the behavior of AWS ELBs, where the traffic must leave the AWS network and come back in. I wonder how hard it would be to refactor Che to use service hostnames wherever possible, and keep the public-facing routes for client-side things?
Latest info from my side: …
The PR with the docs update has been merged: eclipse-che/che-docs#944. Closing.
Describe the bug
Depending on the network topology or DNS servers, a fully disconnected installation will in some instances not be able to resolve route URLs inside the cluster. This manifests as the Che server pod failing to retrieve the OpenID configuration at `$PUBLIC_KEYCLOAK_URL/auth/realms/che/.well-known/openid-configuration`. I don't know exactly how OpenShift does DNS in different environments. I would think that in-cluster traffic would be able to resolve a route properly, but that does not appear to be the case in all scenarios. `curl $KEYCLOAK_ROUTE_URL/auth/realms/che/.well-known/openid-configuration` times out, but `curl keycloak.namespace.svc:8080/auth/realms/che/.well-known/openid-configuration` succeeds.
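To reproduce the failing check from the server's point of view, one can run the same probe from inside the Che server pod. This is a sketch: the namespace, deployment name, and route host are placeholders, and it assumes curl is available in the image:

```shell
# Sketch: run the failing probe from inside the Che server pod.
# Namespace, deployment name, and route host are placeholders.
oc exec -n che deploy/che -- \
  curl -m 10 -sk https://keycloak-che.apps.example.com/auth/realms/che/.well-known/openid-configuration
```

Repeating the same command with the `keycloak.<namespace>.svc:8080` address shows the contrast described above: the service address answers while the route address times out.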
Che version
Steps to reproduce
Start a Che installation in a disconnected environment.
Expected behavior
Runtime
- kubernetes (include output of `kubectl version`)
- Openshift (include output of `oc version`)
- minikube (include output of `minikube version` and `kubectl version`)
- minishift (include output of `minishift version` and `oc version`)
- docker-desktop + K8S (include output of `docker version` and `kubectl version`)
Screenshots
Installation method
Environment
Additional context