Request to present the Operator Framework to the TOC as an incubating project #303

Merged
merged 2 commits into from Jul 9, 2020

@erinaboyd (Contributor) commented Oct 2, 2019

Please let us know if there is a preferred SIG to present to first.
Thanks in advance!

@jbeda (Contributor) commented Oct 2, 2019

Hi Erin!

Some quick questions on stuff that is missing from the proposal.

  • What is the governance model for the project? How are new maintainers decided? How is the roadmap/direction/scope of the project decided?
  • What is the level of contribution outside of Red Hat?
  • Usually for projects at the incubating level we will want to look at usages of it in production. But with this I think it is more appropriate to look at projects that take a dependency on it and make sure it is working for them. Do you have a list of users? Can they give us data on their experience?
  • Does this also include the operatorhub.io web site? Would we be looking to move the hosting and management of that under open governance/management?

Thanks!

@amye amye added this to To do (no presentation yet) in Initial Project Triage & Sandbox Projects Backlog Oct 3, 2019
@caniszczyk caniszczyk added this to To do (no presentation yet) in TOC Project Backlog 2019 Q3 Oct 4, 2019
@quinton-hoole (Contributor) commented Oct 4, 2019

I believe that this falls within the scope of SIG App Delivery and should be channelled through there.

@chris-short commented Oct 7, 2019

Hi @jbeda,

Can you clarify the ask here a little?

Do you have a list of users? Can they give us data on their experience?

Are you looking for a list of users you can verify experiences with? I'm a little confused on this one.

@chris-short commented Oct 7, 2019

Also, for the wider group, the intent is 100% to move operatorhub.io into CNCF. But how does the infrastructure handoff work? Build pipelines, storage, etc. all have to be maintained somewhere. Is there a model to follow here? We understand this will take some time to hand off, but we can start working on that piece now if it helps.

@lucperkins (Contributor) commented Oct 7, 2019

@chris-short I would recommend getting in touch with the Helm folks for more info on that. Helm Hub strikes me as a directly analogous model to follow. There is definitely precedent for it, and it hasn’t presented any significant institutional hurdles (to my knowledge).

@caniszczyk (Contributor) commented Oct 7, 2019

@chris-short each project does it its own way, depending on its size. If it's more complex, something like Helm Hub or https://github.com/kubernetes/community/tree/master/wg-k8s-infra is a good model to follow.

@jbeda (Contributor) commented Oct 7, 2019

Wrt users -- I'm thinking of looking at operators that use the SDK and are production ready. Seeing wide usage across projects/companies would be an indication, IMO, of "production ready". Not sure how other folks on the TOC would see this.

@garethr commented Oct 9, 2019

A quick way of tracking some (i.e. public on GitHub) usage:

https://github.com/search?utf8=%E2%9C%93&q=github.com%2Foperator-framework%2Foperator-sdk+extension%3Amod+filename%3Ago&type=Code&ref=advsearch&l=&l=

Currently 354 instances of any version of github.com/operator-framework/operator-sdk in go.mod files.

Note tracking this over time might be more instructive than the absolute number.
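
As a rough illustration of what the search above matches, here is a minimal sketch that checks a single go.mod file for an operator-sdk requirement and extracts its version. The function name `sdk_version` and the sample go.mod contents are assumptions made for this example; they do not come from the thread or from any Operator Framework tooling.

```python
import re
from typing import Optional

SDK_MODULE = "github.com/operator-framework/operator-sdk"

# Matches `require <module> <version>` on one line, or a bare
# `<module> <version>` entry inside a `require ( ... )` block.
_SDK_REQUIRE = re.compile(
    r"^\s*(?:require\s+)?" + re.escape(SDK_MODULE) + r"\s+(\S+)",
    re.MULTILINE,
)


def sdk_version(go_mod_text: str) -> Optional[str]:
    """Return the operator-sdk version a go.mod requires, or None if absent."""
    match = _SDK_REQUIRE.search(go_mod_text)
    return match.group(1) if match else None


# Hypothetical go.mod contents, used only for illustration.
EXAMPLE_GO_MOD = """module example.com/my-operator

go 1.13

require (
	github.com/operator-framework/operator-sdk v0.10.0
	k8s.io/client-go v0.0.0
)
"""

print(sdk_version(EXAMPLE_GO_MOD))
```

Running a check like this over the same set of repositories at regular intervals would give the trend over time rather than a single absolute count.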

@robszumski commented Oct 10, 2019

The link I have used in the past is below. I know that Go module handling has changed recently, which throws this off.

https://github.com/search?l=Go&q=github.com%2Foperator-framework%2Foperator-sdk%2Fpkg%2Fsdk&type=Code

"Showing 2,788 available code results" but that does double or triple count some projects.

@dmueller2001 commented Oct 10, 2019

GitHub search doesn't account for enterprises building with operator-sdk in private repos for internal-only use.

@dmesser commented Oct 11, 2019

This signature captures Go-based Operator-SDK projects: https://github.com/search?q=github.com%2Foperator-framework%2Foperator-sdk+filename%3Ago.mod+filename%3AGopkg.toml+filename%3Aglide.yaml&type=Code - it does not depend on Go modules, which the SDK only moved to in one of the recent 0.9.x releases

To find additional Helm-based Operator-SDK projects: https://github.com/search?q=filename%3ADockerfile+helm-operator&type=Code

And additional Ansible-based Operator-SDK projects: https://github.com/search?q=filename%3ADockerfile+ansible-operator&type=Code
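
The three search links above share one shape: a list of search terms joined by spaces, URL-encoded into the `q` parameter. A minimal sketch of building them programmatically follows; the helper name `code_search_url` is an assumption made for this example, and only the Python standard library is used.

```python
from urllib.parse import urlencode


def code_search_url(*terms: str) -> str:
    """Build a GitHub code-search URL from search terms and qualifiers."""
    query = " ".join(terms)
    # urlencode escapes "/" and ":" and turns spaces into "+",
    # which is the form used by the links in this comment.
    return "https://github.com/search?" + urlencode({"q": query, "type": "Code"})


# Go-based projects, whichever dependency manifest they use.
go_url = code_search_url(
    "github.com/operator-framework/operator-sdk",
    "filename:go.mod",
    "filename:Gopkg.toml",
    "filename:glide.yaml",
)

# Helm-based and Ansible-based projects, found via their Dockerfile base image.
helm_url = code_search_url("filename:Dockerfile", "helm-operator")
ansible_url = code_search_url("filename:Dockerfile", "ansible-operator")

print(go_url)
```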

@amye amye moved this from To do (no presentation yet) to In progress (due diligence/presentation) in Initial Project Triage & Sandbox Projects Backlog Oct 15, 2019
@flickerfly commented Oct 22, 2019

We're building a number of internal operators on Ansible that are not yet out in the open. Certainly appreciate the operator-sdk tool for that. It has allowed us to go from general concept to working prototype rapidly and this has been a great way to build excitement as people actually see what can be done.

We're also deploying an internal version of OperatorHub.io with just our operators. That was surprisingly simple to do, allows us to be sure our Operators are moving toward a place of public consumption and eases internal documentation desires. We're planning to look into what rebranding can be done on that soon so it isn't too confusing, but still provides a familiar feel.

@caniszczyk (Contributor) commented Nov 5, 2019

Do you have a rough idea of the costs of hosting OperatorHub currently? Maybe RPS and other useful data for estimating the infrastructure needed for CNCF.

@erinaboyd (Contributor, Author) commented Nov 5, 2019

@dmesser commented Nov 7, 2019

  1. Will the Hub host non-operator framework operators?

Right now it requires the packaging format for correct display and easy installation. Once we have a discussion and consensus in upstream (e.g. sig-app-delivery) around a common metadata format for Operators we should relax that - maybe at the expense of the quick install method.

  2. How does the OLM work with non-framework operators?

OLM currently requires the CSV and externally shipped CRDs for management. How the Operator is created, e.g. which programming language, is not important. We have ideas around adding support for other packaging formats (CNAB, helm, plain manifests) and are putting code in place that allows us to be flexible in the near future.

  3. What is the current community accepted definition of an operator?

A custom controller with its own CRDs.
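
To make "custom controller" concrete, here is a toy sketch of the reconcile step such a controller repeats: compare the desired state declared in custom resources with the observed state, and work out the actions that close the gap. Everything here is an assumption made for illustration (the `reconcile` name, and plain dicts of resource name to replica count standing in for custom resources and cluster state); it does not use the Kubernetes API or the Operator SDK.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], observed: Dict[str, int]) -> List[str]:
    """Return the actions needed to move the observed state to the desired state."""
    actions: List[str] = []
    for name, replicas in sorted(desired.items()):
        current = observed.get(name)
        if current is None:
            # Declared in a custom resource but not present yet.
            actions.append(f"create {name} with {replicas} replicas")
        elif current != replicas:
            actions.append(f"scale {name} from {current} to {replicas}")
    # Anything running that is no longer declared gets removed.
    for name in sorted(set(observed) - set(desired)):
        actions.append(f"delete {name}")
    return actions


# Hypothetical state, for illustration only.
for action in reconcile({"db": 3, "cache": 1}, {"db": 2, "old": 1}):
    print(action)
```

A real controller runs this comparison whenever a watched resource changes and also periodically, so that a missed event does not leave the system out of sync.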

@devdattakulkarni commented Nov 8, 2019

We believe that out of the four things that constitute Operator framework - Operator SDK, operatorhub.io, OLM, Operator Metering - SDK and hub have definite value to the community. We had done analysis of open source Operators a while back in which Operator SDK was seen to be used quite a lot (https://medium.com/@cloudark/analysis-of-open-source-kubernetes-operators-f6be898f2340). Similarly, operatorhub provides a good central place with searching capabilities for discovering 3rd-party Operators.

We are not so sure about OLM, since Operator packaging and installation are already covered by tools like Helm. The CSV CRD defined by OLM mandates a specific format for Operator installation definition. It is not clear why that is the best way of defining Operator packaging. It is also questionable whether there is really a need for new format for Operator installation definition, given Operator is just a pattern leveraging Kubernetes extensibility. Kubernetes YAMLs along with tools like Helm already cover the requirements for Operator packaging and installation. This also aligns with our observations in the field - Operator writers are not defining CSVs for their Operators to the extent that they naturally do Helm charts.

Given this observation, a thought/suggestion is: maybe include only Operator SDK as an incubating project and not the entire Operator Framework. Note that Operator Hub is currently tied to OLM.

@garethr commented Nov 8, 2019

@dmesser commented Nov 9, 2019

We are not so sure about OLM, since Operator packaging and installation are already covered by tools like Helm. The CSV CRD defined by OLM mandates a specific format for Operator installation definition. It is not clear why that is the best way of defining Operator packaging. It is also questionable whether there is really a need for new format for Operator installation definition, given Operator is just a pattern leveraging Kubernetes extensibility. Kubernetes YAMLs along with tools like Helm already cover the requirements for Operator packaging and installation. This also aligns with our observations in the field - Operator writers are not defining CSVs for their Operators to the extent that they naturally do Helm charts.

Thanks for this analysis and consideration. OLM does come with a packaging method today, which is currently being revamped and opened up to other formats and tools, like Helm. Packaging is, however, just one of the responsibilities OLM fulfills. A main use case is to manage the lifecycle of potentially many Operators on shared clusters, in order to allow tenants to use them safely. Since Operators are cluster-wide extensions, there is a need for safeguards that fence these extensions off from each other and make them discoverable to end users. At the same time, cluster admins want to rely on over-the-air updates to Operators, which are long-running workloads, and this is where collision prevention is needed on ownership of APIs and Webhooks - all of which are core parts of every Operator. OLM was born out of the need for a component that does this on-cluster and scales to dozens of Operators installed, updated and used there.
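
The ownership collision mentioned here can be pictured with a small sketch: given which APIs each installed Operator claims to own, report any API claimed by more than one. This is only a conceptual illustration and not how OLM is implemented; the function name `find_api_collisions`, the operator names and the API names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_api_collisions(owned_apis: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each API claimed by more than one operator to its claimants."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for operator, apis in owned_apis.items():
        for api in apis:
            owners[api].append(operator)
    return {api: sorted(ops) for api, ops in owners.items() if len(ops) > 1}


# Hypothetical installed operators and the CRDs they claim to own.
installed = {
    "operator-a": {"widgets.example.com"},
    "operator-b": {"widgets.example.com", "gadgets.example.com"},
    "operator-c": {"gizmos.example.com"},
}

print(find_api_collisions(installed))
```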

@devdattakulkarni commented Nov 12, 2019

We are not so sure about OLM, since Operator packaging and installation are already covered by tools like Helm. The CSV CRD defined by OLM mandates a specific format for Operator installation definition. It is not clear why that is the best way of defining Operator packaging. It is also questionable whether there is really a need for new format for Operator installation definition, given Operator is just a pattern leveraging Kubernetes extensibility. Kubernetes YAMLs along with tools like Helm already cover the requirements for Operator packaging and installation. This also aligns with our observations in the field - Operator writers are not defining CSVs for their Operators to the extent that they naturally do Helm charts.

Thanks for this analysis and consideration. OLM does come with a packaging method today which is currently revamped and made open to other formats and tools, like Helm. Packaging is however just one of the responsibilities OLM fulfills. A main use case is to manage the lifecycle of potentially many Operators on shared clusters, in order to allow tenants to use them safely. Since Operators are cluster-wide extensions there is a need for safeguards that fences these extensions of each other and make them discoverable to end users. At the same time cluster admins want to rely on over-the-air updates to Operators being long-running workloads, which is where collision prevention is needed on ownership of APIs and Webhooks - all of which are core parts of every Operator. OLM has been born out of the need for a component that does this on-cluster and scales to dozens of Operators installed, updated and used there.

Thanks for clarifying the additional responsibilities that OLM performs. While the points about orchestrating the set of Operators in a cluster are valid (since CRDs/Operators are cluster-wide control points), the key question is: is it necessary to apply the reconciliation-loop-based approach for managing the lifecycle of Operators themselves? While it is appealing - an Operator to manage other Operators - CRDs are not the only objects in Kubernetes that are cluster-scoped and whose creation and management need to be carefully handled by cluster admins. Today, cluster admins are predominantly using Helm for installing all types of objects (cluster-scoped or per-namespace). Has there been any thought about working with the Helm community to evolve a generic approach towards managing cluster-scoped objects? My concern is that OLM will end up reinventing a lot of the things that Helm is already doing. For end users (cluster admins) it is better if there is a standard approach for managing any cluster-scoped object.

@jzelinskie commented Nov 14, 2019

I've worked on both Helm and OLM.

The CSV CRD defined by OLM mandates a specific format for Operator installation definition. It is not clear why that is the best way of defining Operator packaging. It is also questionable whether there is really a need for new format for Operator installation definition, given Operator is just a pattern leveraging Kubernetes extensibility.

A large problem that OLM is attempting to solve is dependency resolution. Because CRDs (and Operator deployments) can be cluster-scoped, it is possible to have incompatible CRDs installed (CRD versions used to only be cosmetic). Managing OLM catalogs is closer in practice to repository mirrors for Linux distributions where every version presented in a catalog is compatible with each other. This is in contrast to Helm, which is moving towards a global "namespace" where you can obtain charts from any registry similar to container images. OLM's CSV format actually inspired the upstream Application Definition and is extremely similar barring the data used for dependency resolution. A goal of the OLM team has been to eventually converge with the community on this.

Catalogs can largely be considered orthogonal to the core controller that manages lifecycle events of other controllers. It is optional behavior and used to be encapsulated in a completely separate binary.
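
As a toy picture of dependency resolution against a catalog, the sketch below treats each package as providing and requiring named APIs and computes an install order with dependencies first. It is a conceptual illustration only, not OLM's resolver; the `Package` type, the `resolve` function and the catalog contents are assumptions made for this example, and the catalog is assumed to hold exactly one provider per API.

```python
from typing import Dict, List, NamedTuple, Set


class Package(NamedTuple):
    provides: Set[str]  # APIs (e.g. CRDs) this package owns
    requires: Set[str]  # APIs it expects some other package to own


def resolve(catalog: Dict[str, Package], target: str) -> List[str]:
    """Return the packages to install for target, dependencies first."""
    # Assumes a single provider per API within the catalog.
    providers = {api: name for name, pkg in catalog.items() for api in pkg.provides}
    order: List[str] = []
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in order:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        visiting.add(name)
        for api in sorted(catalog[name].requires):
            if api not in providers:
                raise LookupError(f"no package in the catalog provides {api}")
            visit(providers[api])
        visiting.discard(name)
        order.append(name)

    visit(target)
    return order


# Hypothetical catalog, for illustration only.
catalog = {
    "app-operator": Package({"apps.example.com"}, {"dbs.example.com"}),
    "db-operator": Package({"dbs.example.com"}, set()),
}

print(resolve(catalog, "app-operator"))
```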

Has there been any thought about working with Helm community to evolve a generic approach towards managing cluster scoped objects? My concern is OLM will end up reinventing lot of the things that Helm is already doing.

If you refer to the design discussion around Helm v3 and CRD management, you'll find that the lifecycle related work for CRDs/Operators is explicitly out of scope until the community has experimented enough that Helm feels confident in moving forwards with something. From the Helm perspective, OLM is a testbed for CRD lifecycle management. From the OLM perspective, Helm is a future for packaging content.

is it necessary to apply the reconciliation loop based approach for managing the lifecycle of Operators themselves?

It is clearly an idea worth pursuing from the perspective of Red Hat and the portion of Operator authors that have packaged their software with OLM. Even the aforementioned App Definition, which was designed not to use a controller, now does. I think it could be a notable goal that something like OLM wouldn't need to be a controller in the future because Kubernetes will eventually be able to provide the level of resource validation. For now, I'd consider this a compromise for practicality's sake.

@gerred (Contributor) commented Nov 20, 2019

Thanks @garethr! I had an initial response, but want to read and understand this whole thread and discuss with the KUDO team - I’ll link the issue once we open it.

@robszumski commented Dec 3, 2019

Capturing some of the questions for longer answers from the TOC call on Dec 3:

  • Is it an operator if it has no reconcile loop?
  • Plans for other SDKs? Lots of potential for other SDKs, Python and Java for example. Totally extensible SDK model.
  • Would love to see a roadmap for the next 6 months.
  • Reacting to monitoring alerts is not really defined today.

@erinaboyd (Contributor, Author) commented Mar 24, 2020

@brendandburns @kgamanji polite nudge for status, please.

@brendandburns commented Mar 24, 2020

@brendandburns commented Apr 2, 2020

@erinaboyd and others

First, many apologies for the length of time that it has taken to get this far. This has ended up twistier and thornier than I think we imagined it would be, and there have also been other distractions, but I know this has been frustrating for the folks involved.

The ToC has decided, given the different discussions and our general concern about how we think about the various hubs, that we would prefer to have Operator Framework join CNCF as a Sandbox project rather than an Incubating project.

We have likewise recommended that whatever work is done on CNCF Hub also be done as an open sandbox project.

Placing these projects in Sandbox provides the opportunities for them to be involved with the CNCF while the ToC figures out how it wants to approach the technical, community and design requirements that are needed for *Hub type projects.

Please let me know if there are any questions about this.

@dmesser commented Apr 2, 2020

@brendandburns Thanks for getting back. Given the concerns are mostly around OperatorHub, what do you think about the following scenario: OperatorHub becomes a sandbox project and, as you propose, the community will take it from there with the ToC's efforts around coordinating the various hub-type projects. Meanwhile, OLM and SDK as projects are well isolated from OperatorHub, so could they be admitted as incubating projects as proposed?

@erinaboyd (Contributor, Author) commented Apr 2, 2020

@brendandburns thank you for your feedback! I am deferring to the operator community (@dmesser ) above as a request on how to move forward. I think it's a good compromise. Please let us know if that is possible.
Thanks!

@brendandburns commented Apr 3, 2020

@dmesser if you are interested in splitting OperatorSDK and Operator Lifecycle Manager out into separate projects we would be willing to consider them for incubation. They would need to go through the standard due-diligence process with the SIGs (I think SIG-App-Delivery would probably be the right SIG to review and due-diligence)

@dmesser commented Apr 6, 2020

@brendandburns Understood. SDK and OLM have already been reviewed by sig-app-delivery and have undergone the due diligence as part of the submission of OF as a whole. sig-app-delivery also sponsored and voted for the inclusion of OF. Would we need to repeat this process, even though none of the proposal aspects have changed, other than OperatorHub aiming for sandbox whereas the rest looks at incubating?

@erinaboyd (Contributor, Author) commented Apr 7, 2020

@brendandburns bubbling this back up, tia!

@brendandburns commented Apr 7, 2020

OK, given that, we can probably take this forward to the ToC. Is there a ToC sponsor? (Sorry, I don't remember the details here.)

@erinaboyd (Contributor, Author) commented Apr 7, 2020

@brendandburns I think @kgamanji volunteered to be the sponsor.
Can you please confirm, @kgamanji? Thanks all!

@erinaboyd (Contributor, Author) commented Apr 15, 2020

@brendandburns @kgamanji did you guys have any more questions? Can you please provide the team an update on this when you have a chance? Thanks :)

@erinaboyd (Contributor, Author) commented Apr 29, 2020

@brendandburns @kgamanji @lizrice can someone please provide an update on this?

@kgamanji commented May 1, 2020

Apologies for the delay in response. I have stepped forward to start the DD for Operator Framework sub-projects.

Just to confirm we have the same birds-eye view of the current state:

  • OperatorHub will be proposed as a sandbox project
  • SDK and OLM moving forward for incubation

At this stage, it is unclear to me whether SDK and OLM are to be considered separate projects or one project with 2 sub-components.

Also, is it possible to update the DD document to reflect the current submission? E.g. some of the links to the user groups using OF in production are 404ing.

@dmesser commented May 4, 2020

@kgamanji @brendandburns @lizrice
Yes, I believe the stated approach is what we came to consensus on with Brendan, though it was also on the understanding that all "hub" projects would be on hold, and if I am not mistaken Helm just went through graduation. So maybe that changes things for the Operator Hub. I would like to understand more about the TOC's plan here.

Beyond that, the DD document is updated to reflect the proposals for OLM/SDK and OperatorHub.io. All 3 components really belong to the same top-level project, Operator Framework. The 404 links were due to a migration of a content platform and should be fixed now; thanks for pointing that out.

@lizrice (Contributor) commented May 5, 2020

Here's the comment on Helm Hub where I've tried to explain the TOC position

@kgamanji commented May 6, 2020

Thank you for sharing that @lizrice

At this stage, I am proceeding with the DD for OLM and SDK as one submission for incubation.

Also, I have reached out to end users for feedback on using OF. Will update once I have more details.

@kgamanji commented May 15, 2020

I have completed the review of the DD doc and Operator Framework sub-projects. Thank you for all your patience throughout!

Here are my remarks:

SDK and OLM for incubation:

  • both sub-projects meet the incubation criteria, with a healthy contributor base and usage by the end-users in production environments
  • healthy number of committers
  • a roadmap / collection of issues to help the projects grow organically to meet the community requirements

Would like to have more outline on the following (if possible/applicable):
Note: I have left comments to the below in the DD doc as well

  • SDK - is there a proposal or reference on how OCI will be adopted as a shipping format for artifacts?
  • OLM - is there a reference to the collaboration to define cluster-addon definition?

Once the above have been outlined, I am happy to move the SDK and OLM to a TOC vote.


OperatorHub for sandbox:

  • is it possible to add the code of conduct to the repository? This is one of the criteria for sandbox projects
  • once the above is fulfilled, happy to be one of the sponsors for the project (unless we adopt the new sandbox evaluation process beforehand)

@ecordell commented May 15, 2020

OLM - is there a reference to the collaboration to define cluster-addon definition?

There are two projects related to cluster-addons:

These are still under discussion/review as the goal is to solve shared problems, not operator-framework problems.

SDK - is there a proposal or reference on how OCI will be adopted as a shipping format for artifacts?

The bundle KEP linked above addresses the packaging relationship with OCI artifacts. OCI artifacts seem like a good choice, but there are roadblocks to using them today. They are on the radar and an area of active interest (and would've been the choice if not for the issues outlined in the KEP).

@dmesser commented May 18, 2020

@kgamanji Thanks for getting back. OperatorHub will adopt the CNCF Code of Conduct as part of the rest of Operator Framework. This https://github.com/operator-framework/community/pull/11/files should be merged soon.

@kgamanji commented May 20, 2020

Thank you for the updates @ecordell @dmesser!

I have now opened the public comment period: https://lists.cncf.io/g/cncf-toc/message/4643

@amye amye moved this from In Public Comment Period to Needs TOC Vote in Incubation Projects Backlog Jun 4, 2020
@caniszczyk (Contributor) commented Jul 9, 2020

Operator Framework SDK and OLM sub-projects have applied to join as incubating projects

+1 Binding (note: quorum is 10, as Jeff Brewer has been away):
Katie Gamanji: https://lists.cncf.io/g/cncf-toc/message/4799
Liz Rice: https://lists.cncf.io/g/cncf-toc/message/4833
Justin Cormack: https://lists.cncf.io/g/cncf-toc/message/4834
Xiang Li: https://lists.cncf.io/g/cncf-toc/message/4835
Alena Prokharchyk: https://lists.cncf.io/g/cncf-toc/message/4849
Brendan Burns: https://lists.cncf.io/g/cncf-toc/message/4839
Michelle Noorali: https://lists.cncf.io/g/cncf-toc/message/4939

@caniszczyk caniszczyk merged commit 3acd7b5 into cncf:master Jul 9, 2020
@amye amye moved this from Needs TOC Vote to Done in Incubation Projects Backlog Jul 9, 2020
@erinaboyd (Contributor, Author) commented Jul 28, 2020

@amye @kgamanji After much consideration, we will not be moving forward with contributing OperatorHub.io as a sandbox project. Let's close out this PR in favor of what was already voted. Thanks. cc: @dmesser @robszumski @dmueller2001

@amye amye removed this from Done in Incubation Projects Backlog Jul 28, 2020