
Restrict the installation of Eclipse Che only to the 'eclipse-che' namespace #17187

Closed
2 of 5 tasks
ibuziuk opened this issue Jun 18, 2020 · 36 comments
Labels
  • area/install: Issues related to installation, including offline/air gap and initial setup
  • kind/epic: A long-lived, PM-driven feature request. Must include a checklist of items that must be completed.
  • kind/task: Internal things, technical debt, and to-do tasks to be performed.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • severity/P1: Has a major impact to usage or development of the system.
  • status/analyzing: An issue has been proposed and it is currently being analyzed for effort and implementation approach

Comments

@ibuziuk
Member

ibuziuk commented Jun 18, 2020

Is your task related to a problem? Please describe.

The only namespace for an Eclipse Che installation should be 'eclipse-che'. Installing Eclipse Che into any other namespace should be prohibited.

Describe the solution you'd like

When Eclipse Che is installed via OperatorHub, it should be possible to install it only in the 'eclipse-che' namespace.

List of subtasks:

Describe alternatives you've considered

Continue allowing namespace selection during the installation

Additional context

Pros:

Cons:

  • only one Eclipse Che instance per cluster will be allowed; the following flow will not be possible: Restrict the installation of Eclipse Che only to the 'eclipse-che' namespace #17187 (comment)
  • we need to tackle upgrades:
    • if Eclipse Che was initially installed in a namespace other than 'eclipse-che', after the upgrade it should end up in 'eclipse-che' (the previous namespace should be deleted)
    • if multiple Eclipse Che instances are installed on the cluster, upgrades should fail with a clear error message.
@ibuziuk ibuziuk added kind/task Internal things, technical debt, and to-do tasks to be performed. area/install Issues related to installation, including offline/air gap and initial setup severity/P1 Has a major impact to usage or development of the system. labels Jun 18, 2020
@ibuziuk
Member Author

ibuziuk commented Jun 23, 2020

@sleshchenko could you please clarify how you do the trick of installing the web terminal operator into a concrete namespace?

@nickboldt
Contributor

I trust there will be a way to override (set a specific namespace) and to disable the restriction, so that we can:

  • set a different default for CRW vs. Che
  • disable telemetry for our own tests (we're not the customer, we're just testing internally)
  • allow installing multiple instances (again for testing since we don't have an unlimited # of clusters into which we can install ONLY one CRW instance)

WDYT?

@davidfestal
Contributor

@ibuziuk the web terminal operator is using the AllNamespace install mode. This makes it a global operator that will be installed by OperatorHub in the openshift-operators namespace by default.

cf this doc: https://docs.openshift.com/container-platform/4.4/operators/olm-adding-operators-to-cluster.html#olm-installing-from-operatorhub-using-web-console_olm-adding-operators-to-a-cluster

Mainly the following quote:

All namespaces on the cluster (default) installs the Operator in the default openshift-operators namespace to watch and be made available to all namespaces in the cluster. This option is not always available.
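For reference, the install modes an operator supports are declared in its ClusterServiceVersion. A minimal sketch of what the AllNamespaces mode described above looks like in a CSV — the name and version below are illustrative, not taken from the actual web terminal operator manifest:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: web-terminal.v1.0.0   # illustrative name/version
spec:
  installModes:
    - type: OwnNamespace
      supported: false
    - type: SingleNamespace
      supported: false
    - type: MultiNamespace
      supported: false
    - type: AllNamespaces
      supported: true   # makes it a global operator, installed in openshift-operators by default
```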

@ibuziuk ibuziuk added the status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach label Jun 24, 2020
@l0rd
Contributor

l0rd commented Jun 24, 2020

@ibuziuk something I was discussing this morning with @davidfestal: we may use AllNamespaces for Che as well and then deploy wsmaster in eclipse-che namespace.

@nickboldt has raised some interesting points. I don't think we should make the namespace configurable though:

  • set a different default for CRW vs. Che

This can be a build option but not a runtime option (i.e. users/customers should not change that).

  • disable telemetry for our own tests (we're not the customer, we're just testing internally)

I think the recommendation here is to receive telemetry even for our own tests: it can be useful, we need to verify that sending metrics works fine, and they are easy to filter out. cc @spaparaju

  • allow installing multiple instances (again for testing since we don't have an unlimited # of clusters into which we can install ONLY one CRW instance)

I would discourage this: why do we want to test scenarios that nobody will ever run? To find, analyse and fix bugs that nobody will ever find? We should not support/test multiple instances of Che on the same Kubernetes cluster. cc @rhopp

@rhopp
Contributor

rhopp commented Jun 24, 2020

I would discourage this: why do we want to test scenarios that nobody will ever run? To find, analyse and fix bugs that nobody will ever find? We should not support/test multiple instances of Che on the same Kubernetes cluster. cc @rhopp

I would love to avoid running multiple instances on the same cluster, but as Nick said, we don't have enough resources to do so. If this becomes a hard restriction (now it's a soft restriction, in the sense that it is possible to install multiple instances on a single cluster, but it's discouraged or officially not supported), it will be quite a hurdle for QE to overcome.

@l0rd
Contributor

l0rd commented Jun 24, 2020

I would love to avoid running multiple instances on the same cluster, but as Nick said, we don't have enough resources to do so. If this becomes a hard restriction (now it's a soft restriction, in the sense that it is possible to install multiple instances on a single cluster, but it's discouraged or officially not supported), it will be quite a hurdle for QE to overcome.

If we want to restrict users to one namespace and one instance of Che per cluster, we should enforce this restriction when testing as well. @rhopp is this an upstream or downstream test constraint? Is there an issue that describes the problem (i.e. in what circumstances we need to run multiple instances of Che and what options have been considered)?

@tolusha
Contributor

tolusha commented Jul 7, 2020

Sorry, the Pros aren't convincing enough.

@ibuziuk ibuziuk self-assigned this Jul 7, 2020
@ibuziuk
Member Author

ibuziuk commented Jul 7, 2020

@tolusha could you describe a concrete use-case (with configuration) for having 2+ manageable instances of Eclipse Che on the same cluster (instances for testing are not taken into account)?

@tolusha
Contributor

tolusha commented Jul 8, 2020

@ibuziuk
I don't have any use-cases except for testing purposes.
From this point of view it makes sense to me.

@skabashnyuk
Contributor

@tolusha could you describe a concrete use-case (with configuration) for having 2+ manageable instances of Eclipse Che on the same cluster (instances for testing are not taken into account)?

Che-server development.

@ibuziuk
Member Author

ibuziuk commented Jul 8, 2020

@skabashnyuk Che-server development e.g. usage of the che-dev OSD v4 Cluster case?

@skabashnyuk
Contributor

@skabashnyuk Che-server development e.g. usage of the che-dev OSD v4 Cluster case?

I'm sorry. I didn't understand your question.

@ibuziuk
Member Author

ibuziuk commented Jul 8, 2020

@skabashnyuk could you clarify your point about Che-server development in regards to the fixed namespace enforcement?

@skabashnyuk
Contributor

@ibuziuk you are probably right. This topic is about che + operator. che-server development does not require it.

@l0rd
Contributor

l0rd commented Jul 8, 2020

@ibuziuk you are probably right. This topic is about che + operator. che-server development does not require it.

Right: we want to prevent users from selecting an arbitrary namespace when deploying Che using the operator or chectl. But the che-server should be able to run in any namespace. It should NOT take for granted that it will run in the eclipse-che namespace.

There are at least 3 reasons why we want deployments to happen in one fixed namespace:

  • To avoid multiple instances of Che in the same cluster (to avoid conflicts and waste of resources)
  • To avoid the common deployment failure when Che gets installed in the default namespace
  • To retrieve operand metrics when installed on OpenShift

@ibuziuk this is a change that needs to be communicated on different channels (che-dev, Red Hat internal mailing lists etc...) before it becomes effective.

@ibuziuk
Member Author

ibuziuk commented Jul 9, 2020

@ibuziuk this is a change that needs to be communicated on different channels (che-dev, Red Hat internal mailing lists etc...) before it becomes effective.

@l0rd got it. As the initial step we are going to define 'eclipse-che' as the suggested namespace on the operator / OLM end:

[screenshot: the OperatorHub install form with 'eclipse-che' suggested as the namespace]

This should address the second bullet (avoid the common deployment failure when Che gets installed in the default namespace) and make the overall UX cleaner and more predictable.
PR eclipse-che/che-operator#328
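The "suggested namespace" mechanism referred to above is an annotation on the operator's ClusterServiceVersion; a fragment sketch (field name from OLM's annotation conventions, surrounding fields elided):

```yaml
# ClusterServiceVersion metadata (fragment): OperatorHub pre-fills the
# namespace field of the install form from this annotation.
metadata:
  annotations:
    operatorframework.io/suggested-namespace: eclipse-che
```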

@l0rd
Contributor

l0rd commented Jul 9, 2020

@ibuziuk nice!

@nickboldt
Contributor

More reasons to reconsider this as a new standard / use cases for more than one CRW install on the same cluster:

  • prototyping (trying different settings/configs in parallel, e.g. with and without OAuth, or with/without external SSO/Postgres)
  • cautious migration (e.g. running 2.1 and 2.2 in parallel to verify nothing explodes or regresses in the new release)
  • new customer experiments
  • QE testing (we only have a small set of available OCP instances – one per version)
  • OCP and CRW load testing (how many CRW instances can we run on the same cluster before it consumes all the resources and crashes?) – this is handy for validating new architectures too, such as Z and Power

https://issues.redhat.com/browse/CRW-467?focusedCommentId=14209983&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14209983

Also,

is this an upstream or downstream test constraint?

It's both -- we use the 3 QE OCP instances for testing Che and CRW alike.

@nickboldt nickboldt modified the milestones: 7.16, 7.19 Jul 9, 2020
@ibuziuk
Member Author

ibuziuk commented Jul 9, 2020

@nickboldt if there are no blockers/showstoppers during the implementation we are going to enable this for 7.18.0

@ibuziuk ibuziuk added the kind/epic A long-lived, PM-driven feature request. Must include a checklist of items that must be completed. label Jul 10, 2020
@ibuziuk ibuziuk modified the milestones: 7.19, Backlog - Epics Jul 10, 2020
@nickboldt
Contributor

Another use case suggested by Dom on the Community Call:

  • one cluster contains BOTH a stable deployment (e.g. 7.14.x), and
  • a test deployment (7.16.0-SNAPSHOT)

@SDAdham

SDAdham commented Jul 13, 2020

thanks @nickboldt 👍, can we please also list the cons of this, if possible?

@ibuziuk
Member Author

ibuziuk commented Jul 15, 2020

@SDAdham added to the description. Could you clarify the use-case with a stable and a snapshot (nightly?) deployment a bit more:
how is the installation managed and updated? How is the test deployment used? What config is used for this setup?

@l0rd
Contributor

l0rd commented Jul 15, 2020

@nickboldt @SDAdham @ibuziuk we should discourage scenarios with 2 versions of CRW (stable and test) installed via the operator on the same cluster because both will share the same CheCluster CRD and that may have some disruptive side effects on the stable instance.

@l0rd
Contributor

l0rd commented Jul 15, 2020

We should continue supporting multiple instances of Che on the same cluster using helm though.

@ibuziuk
Member Author

ibuziuk commented Jul 15, 2020

@l0rd just to clarify by helm support of multiple instances, do you mean chectl + helm installer?

@ibuziuk ibuziuk changed the title [operator] Restrict the installation of Eclipse Che only to the 'eclipse-che' namespace Restrict the installation of Eclipse Che only to the 'eclipse-che' namespace Jul 15, 2020
@SDAdham

SDAdham commented Jul 16, 2020

@SDAdham added to the description. Could you clarify the use-case with a stable and a snapshot (nightly?) deployment a bit more:
how is the installation managed and updated? How is the test deployment used? What config is used for this setup?

I don't see any reason to limit the installations to one instance per cluster. I'll speak for K8s deployments, as that's what I'm running Che on, but I'm sure the concept remains the same for OpenShift, etc.

Use case: Running prod and dev/test environment of Che on the same cluster.

Regarding the how-tos, I'm not familiar enough with Che's architecture to know the best approach, but I can imagine:

  • Che is deployed per namespace.
  • The master node of Che, which lives in, for example, the che namespace, should be responsible for following up and taking care of its own workspaces by managing their namespaces.
  • A managed deployment like chectl should keep a record of the master node (the Che instance), as it already does afaik, and the master node should take care of the workspaces created by its users.

If, on K8s, a namespace can't make changes to another namespace (i.e. the Che master node running in the che namespace cannot make changes to workspaces running in xyz-che namespaces), then chectl should request the namespaces to upgrade via the master node.

Any custom installation other than a managed installation like chectl should not be supported and should not be taken into consideration for the auto-update procedures.

@SDAdham

SDAdham commented Jul 16, 2020

@nickboldt @SDAdham @ibuziuk we should discourage scenarios with 2 versions of CRW (stable and test) installed via the operator on the same cluster because both will share the same CheCluster CRD and that may have some disruptive side effects on the stable instance.

What "disruptive side effects" can happen as a result of having multiple instances?

@l0rd
Contributor

l0rd commented Jul 16, 2020

@SDAdham fields of the CheCluster CRD can be added/removed/updated during an update, and that may make one of your 2 CheCluster CRs incompatible with the new CRD. In other words, you can isolate your Che instances in 2 different namespaces, but the operator's Custom Resource Definition is defined at cluster level: if you update it, both Che instances will be affected.

@ibuziuk yes, helm charts installed via chectl: no CRD, no cluster privileges required, a configurable namespace.
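The conflict described above follows from how CRDs are scoped: the CheCluster resources live in namespaces, but their definition is cluster-wide, so an update to it affects every instance. A sketch of the relevant fields, assuming the CRD name used by che-operator; details illustrative:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # CRDs have no namespace: one definition per cluster,
  # shared by every CheCluster CR in every namespace.
  name: checlusters.org.eclipse.che
spec:
  group: org.eclipse.che
  scope: Namespaced   # the CRs are namespaced; the CRD itself is not
  names:
    kind: CheCluster
    plural: checlusters
```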

@RickJWagner
Contributor

+1 for @nickboldt's ideas about the need for more than one instance per cluster.
Users with a running CRW instance will often want to 'look at' the upgrade version before adopting it (testing the new version, a User Acceptance period, etc.). Users would not be happy having to blindly trust us with the new version; they are going to want a way to install a second CRW/Che version beside the first.
It should be noted that OCP clusters are expensive; some users have only one at their disposal. (Also, some users are restricted to just one cluster because of authentication mechanisms, etc.)

@rhopp
Contributor

rhopp commented Aug 25, 2020

@l0rd @ibuziuk Is there any chance there will be some "backdoor" option to select a different namespace, as per Nick's and Rick's suggestions? (It would greatly help QE as well.)

@l0rd
Contributor

l0rd commented Oct 5, 2020

As discussed a few weeks back with @RickJWagner a good compromise for users that are cautious about updates is to implement operand rollback in case of unsuccessful upgrade. Hence this issue is blocked by #18043.

@ibuziuk ibuziuk removed their assignment Oct 12, 2020
@nickboldt
Contributor

I don't see any explicit mention of rollback in #18043

Will that be implemented as a new chectl command too? Seems like it might be useful & logical to have UI features implemented as CLI features too. cc: @tolusha

@che-bot
Contributor

che-bot commented May 4, 2021

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 4, 2021
@che-bot che-bot closed this as completed May 18, 2021
@benoitf benoitf added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 18, 2021
@benoitf benoitf reopened this May 18, 2021
@ibuziuk
Member Author

ibuziuk commented Nov 16, 2021

I think this issue should be closed due to the all-namespace installation mode switch - #20602 / #20614

@ibuziuk
Member Author

ibuziuk commented Nov 19, 2021

Closing as outdated

@ibuziuk ibuziuk closed this as completed Nov 19, 2021