Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test route url fix #66

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 13 additions & 3 deletions Gopkg.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions Gopkg.toml
Original file line number Diff line number Diff line change
Expand Up @@ -153,9 +153,9 @@ required = [
name = "github.com/prometheus/procfs"
revision = "6ed1f7e1041181781dd2826d3001075d011a80cc"

[[override]]
[[constraint]]
name = "github.com/sirupsen/logrus"
version = "0.11.5"
version = "v1.4.2"

[[override]]
name = "github.com/spf13/pflag"
Expand Down
33 changes: 17 additions & 16 deletions pkg/controller/che/k8s_helpers.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,21 +16,20 @@ import (
"context"
"crypto/tls"
"encoding/pem"
"io"
"net/http"
"time"

"errors"
orgv1 "github.com/eclipse/che-operator/pkg/apis/org/v1"
"github.com/eclipse/che-operator/pkg/deploy"
"github.com/eclipse/che-operator/pkg/util"
routev1 "github.com/openshift/api/route/v1"
"github.com/sirupsen/logrus"
"io"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/fields"
"k8s.io/apimachinery/pkg/watch"
"k8s.io/client-go/kubernetes"
"net/http"
"sigs.k8s.io/controller-runtime/pkg/client/config"
)

Expand Down Expand Up @@ -287,23 +286,24 @@ func (cl *k8s) GetDeploymentPod(name string, ns string) (podName string, err err
// There's an easier way which is to read tls secret in default (3.11) or openshift-ingress (4.0) namespace
// which however requires extra privileges for operator service account
func (r *ReconcileChe) GetEndpointTlsCrt(instance *orgv1.CheCluster, url string) (certificate []byte, err error) {
testRoute := &routev1.Route{}
var requestURL string
var testRoute *routev1.Route
if len(url) < 1 {
testRoute = deploy.NewTlsRoute(instance, "test", "test", 8080)
logrus.Infof("Creating a test route %s to extract router crt", testRoute.Name)
if err := r.CreateNewRoute(instance, testRoute); err != nil {
logrus.Errorf("Failed to create test route %s: %s", testRoute.Name, err)
return nil, err
testRoute = r.GetEffectiveRoute(instance, "test")
if testRoute == nil {
testRoute = deploy.NewTlsRoute(instance, "test", "test", 8080)
logrus.Infof("Creating a test route %s to extract router crt", testRoute.Name)
if err := r.CreateNewRoute(instance, testRoute); err != nil {
logrus.Errorf("Failed to create test route %s: %s", testRoute.Name, err)
return nil, err
}
}
// sometimes timing conditions apply, and host isn't available right away

if len(testRoute.Spec.Host) < 1 {
sleshchenko marked this conversation as resolved.
Show resolved Hide resolved
time.Sleep(time.Duration(1) * time.Second)
testRoute := r.GetEffectiveRoute(instance, "test")
requestURL = "https://" + testRoute.Spec.Host
return nil, errors.New("Unable to get test route host for fetching certificate")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to be sure that the operator won't restart processing the custom resource immediately, you could also return the following:

return reconcile.Result{Requeue: true, RequeueAfter: time.Second * 1}, err

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm not able to return reconcile.Result without modifying the signature. It seems to be a helper class and return here seems not good, at least there is no such any method in this class.

Do you think it is OK to send reconcile.Result back though k8s_helper.go#GetEndpointTlsCrt -> create.go#CreateTLSSecret -> che_controller.go#Reconcile ?

Or the following fragment in che_controller would be good enough?

if err := r.CreateTLSSecret(instance, "", "self-signed-certificate"); err != nil {
  return reconcile.Result{Requeue: true, RequeueAfter: time.Second * 1}, err
}

But it would mean that we wait one second on any error, like failed to get namespace, failed to create route, failed to fetch host...
Then why we do not wait 1 second in other places like: when we failed to create service account https://github.com/eclipse/che-operator/blob/9682f3448fe240c216aa22d9d3f56cc056b01494/pkg/controller/che/che_controller.go#L315-L317

BTW Seems that even with return reconcile.Result{}, err there is ~ 1 second between k8s_helper.go#GetEndpointTlsCrt invocations
Screenshot_20190905_104935
the first one - is the first invocation
2nd and 3rd are the second invocation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm not able to return reconcile.Result without modifying the signature.

Oh, yes, I didn't pay attention to the fact it was inside a utility function.
You might simply add a retryLater boolean return value, and use it in the che_controller reconcile function.

Then why we do not wait 1 second in other places like: when we failed to create service account

Well the existing logic in the controller (that was already there when I started on it) seems to be the following:

  • return only an error when it is an unexpected error,
  • return a Result (with retry) + error when it's an error that we expect in some scenarios and on which we explicitly want to restart the reconciliation after a while.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that we should retry if quite unexpected exception occurred like failed to create route?
It's unexpected by maybe k8s/openshift API was temporary unavailable...
WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if the CreateTLSSecret function returns continueLater with true, then the calling reconcile method should return reconcile.Result{Requeue: true, RequeueAfter: time.Second * 1}, err else it should return nil, err if an unexpected error occured.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I understand it, let me rephrase my question:
CreateTLSSecret may returns error in different situations:

  1. Failed to get an existing self-signed secret.
  2. Failed to get existing route (like when API is not available)
  3. Failed to create new route (like when API is not available)
  4. The fetched route does not have host (it's clear that we should retry in such case)
  5. Failed to request route host to retrieve TLS certificate
  6. Failed to create a secret with fetched TLS self-signed certificate.

Which of this errors you consider as unexpected and which expected (like 4 about route) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me you would only use continueLater=true in situation 4. We know that in some cases the host can take some time to be added to the route object. It's expected.

Possibly 5 could also be expected (and return continueLater=true, since sometimes the route may take some time to be setup.

But main point is the 4 IMO

Does it seem consistent to you ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this PR brings some fix and does not do code base worse.
Your proposed solution totally makes sense but it's more about refactoring the current architecture but not about fixing the issue I initially tried to use.
Sorry, but I'm not able to invest now in operator architecture.

@davidfestal If you agree that this PR is OK to merge - please approve.
If you think that it brings more issues than solves - feel free to close it.

}
requestURL = "https://" + testRoute.Spec.Host

requestURL = "https://" + testRoute.Spec.Host
} else {
requestURL = url
}
Expand All @@ -329,11 +329,12 @@ func (r *ReconcileChe) GetEndpointTlsCrt(instance *orgv1.CheCluster, url string)
certificate = append(certificate, crt...)
}

if len(url) < 1 {
if testRoute != nil {
logrus.Infof("Deleting a test route %s to extract routes crt", testRoute.Name)
if err := r.client.Delete(context.TODO(), testRoute); err != nil {
logrus.Errorf("Failed to delete test route %s: %s", testRoute.Name, err)
}
}
logrus.Info("Self-signed certificate is automatically fetched from test route")
return certificate, nil
}

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

20 changes: 16 additions & 4 deletions vendor/github.com/sirupsen/logrus/alt_exit.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions vendor/github.com/sirupsen/logrus/doc.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading