Feature request: kubernetes "cloud" #3

Closed
mkmik opened this issue Nov 30, 2020 · 14 comments

Comments

@mkmik

mkmik commented Nov 30, 2020

Would it make sense to treat a kubernetes cluster as a "cloud" and expose information about workloads in a similar way as with aws/gcp?

@ghost

ghost commented Nov 30, 2020

I think it makes a lot of sense and we should put it on the near-term roadmap. Can you elaborate a bit on the use case? Also, do you think it would make sense to connect directly to the k8s API, or to go through the GKE or AWS ECS APIs?

@mkmik
Author

mkmik commented Nov 30, 2020

Disclaimer: I didn't think this through, it's just the first thing I noticed after a colleague of mine shared the link to this project.

I have the feeling that many teams (like mine) have to deal with a heterogeneous collection of workloads scattered around various clouds. Kubernetes got added to this picture but didn't replace the "legacy". There are many companies that try to offer a solution to this, a way to present a unified dashboard (I wouldn't be surprised if the company I work for does it as well, but I'm not working on that and here I speak for myself as an engineer).

Perhaps it would be useful to have a tool that allows teams to gather an up-to-date "inventory" of what's out there and perhaps build their own dashboards/abstractions on top of it.

@ghost

ghost commented Nov 30, 2020

Thanks for the input. This is definitely one of the use cases I had in mind: letting teams build custom inventories or dashboards on top of SQL tables using this tool.

What types of workloads would you like to see for k8s? Clusters? Pods? Deployments? Something else?

Also, what other providers/integrations would you like to see? I'm trying to get a sense of the biggest pain points and prioritise the features.

@jbianquetti-nami

I would love to see stuff like

  • select label from k8s_pods where label like...
  • select * from k8s_deployments where namespace like ...
  • select * from k8s_secrets where type like...

So I guess all standard k8s objects will be needed. CRDs could be useful too.
I guess that by using the API call underlying kubectl api-resources you could access all k8s features.
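To make the first of these queries concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (k8s_pods, k8s_pod_labels, key, value) are assumptions made up for illustration and are not the actual CloudQuery schema; labels are modelled as a child table because a pod can carry any number of them.

```python
import sqlite3

# In-memory database with a hypothetical two-table layout (not the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE k8s_pods (id INTEGER PRIMARY KEY, namespace TEXT, name TEXT);
CREATE TABLE k8s_pod_labels (
    pod_id INTEGER REFERENCES k8s_pods(id),
    key TEXT,
    value TEXT
);
INSERT INTO k8s_pods VALUES
    (1, 'prod', 'web-1'), (2, 'prod', 'db-1'), (3, 'staging', 'web-2');
INSERT INTO k8s_pod_labels VALUES
    (1, 'app', 'web'), (2, 'app', 'db'), (3, 'app', 'web');
""")


def pods_with_label(conn, key, value_pattern):
    """Return (namespace, name) for pods whose label `key` matches a LIKE pattern."""
    rows = conn.execute(
        """
        SELECT p.namespace, p.name
        FROM k8s_pods AS p
        JOIN k8s_pod_labels AS l ON l.pod_id = p.id
        WHERE l.key = ? AND l.value LIKE ?
        ORDER BY p.namespace, p.name
        """,
        (key, value_pattern),
    )
    return rows.fetchall()


web_pods = pods_with_label(conn, "app", "web%")
print(web_pods)
```

Even this simple label filter already needs a join, which hints at how quickly the table count grows once more of the object is exposed.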

@ghost

ghost commented Dec 2, 2020

@jbianquetti-nami Thanks for describing those. The only issue I'm having with Kubernetes so far is that the underlying data is not relational; it is closer to nested JSON. Mapping it to SQL will produce a lot of tables, and then a lot of joins when working with the data.

One possible solution is to limit the data to higher-level fields. This would save us from having many nested tables, but we would miss some data (perhaps not very important or not frequently used).

What do you think?
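As a rough illustration of the "lots of tables" problem, the sketch below splits a single nested pod object into rows for three tables. The function name flatten_pod and the row layouts are hypothetical; only the field names read from the pod (metadata.namespace, metadata.name, spec.nodeName, spec.containers[].ports[].containerPort) follow the Kubernetes Pod object.

```python
def flatten_pod(pod):
    """Split one nested pod object into rows for three hypothetical tables."""
    meta = pod["metadata"]
    pod_key = f'{meta["namespace"]}/{meta["name"]}'

    # Parent row: only high-level, scalar fields.
    pod_row = {
        "pod": pod_key,
        "namespace": meta["namespace"],
        "name": meta["name"],
        "node_name": pod["spec"].get("nodeName"),
    }

    # Each nested list becomes its own child table keyed back to the parent.
    container_rows = []
    port_rows = []
    for container in pod["spec"].get("containers", []):
        container_rows.append(
            {"pod": pod_key, "container": container["name"], "image": container["image"]}
        )
        for port in container.get("ports", []):
            port_rows.append(
                {
                    "pod": pod_key,
                    "container": container["name"],
                    "container_port": port["containerPort"],
                }
            )
    return pod_row, container_rows, port_rows


# Example pod with made-up values.
pod = {
    "metadata": {"namespace": "prod", "name": "web-1"},
    "spec": {
        "nodeName": "node-a",
        "containers": [
            {
                "name": "nginx",
                "image": "nginx:1.19",
                "ports": [{"containerPort": 80}, {"containerPort": 443}],
            },
            {"name": "sidecar", "image": "envoy:v1.16"},
        ],
    },
}

pod_row, container_rows, port_rows = flatten_pod(pod)
print(pod_row)
print(container_rows)
print(port_rows)
```

Keeping only pod_row corresponds to the "high-level data" option: one table and no joins, at the cost of dropping the container and port details.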

@obowersa

Happy to try and provide some information around this if I can. I know the Kubernetes API pretty well (although I've got a couple of other projects I'm wrapped up in at the moment, so I might not be able to contribute code for a month or two). Apologies if I'm retreading old ground for folk! Just getting my thoughts out.

A great starting point is the kubernetes API schema: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/

A few things to take into account:

  • There are differences between API versions
  • You can add custom API extensions.

The custom API extensions are an area where I could see support being super valuable in the future, especially if you are using an operator based pattern to create deployments etc.

I think a high-level approach might work as a starting point? The difficulty with Kubernetes being so extensible is that it would potentially be a constantly moving target.

A minimal view could be the resource name, resource type (pod, deployment, service, etc.) and any labels associated with it. It wouldn't allow drilling into further details (such as the schedule of a cron job), and might become tricky when you have resources which are high-level abstractions of other resources, like deployments/cronjobs/etc. containing a pod spec as a nested data structure.
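A minimal sketch of that view, assuming each resource is available as a parsed JSON object (a Python dict). The helper name minimal_view and the output row layout are hypothetical; the example CronJob values are made up.

```python
def minimal_view(resource):
    """Reduce any Kubernetes object to kind, namespace, name and labels."""
    meta = resource.get("metadata", {})
    return {
        "kind": resource["kind"],
        "namespace": meta.get("namespace"),  # None for cluster-scoped resources
        "name": meta["name"],
        "labels": dict(meta.get("labels") or {}),
    }


# A CronJob wraps a job template, which in turn wraps a pod spec.
cronjob = {
    "apiVersion": "batch/v1beta1",
    "kind": "CronJob",
    "metadata": {
        "namespace": "prod",
        "name": "nightly-report",
        "labels": {"team": "data"},
    },
    "spec": {
        "schedule": "0 2 * * *",
        "jobTemplate": {
            "spec": {
                "template": {
                    "spec": {"containers": [{"name": "report", "image": "report:1.0"}]}
                }
            }
        },
    },
}

row = minimal_view(cronjob)
print(row)
```

The same function works for any kind, including custom resources, because it only touches metadata. The schedule and the nested pod spec are exactly the details this view drops.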

kube-query does some interesting stuff, acting as a bridge between the Kubernetes API and osquery (https://github.com/aquasecurity/kube-query), which might be useful as inspiration.

@ghost

ghost commented Dec 16, 2020

@obowersa Thanks. I agree that this could work as an initial approach. My main concern is whether SQL is the right store for highly nested k8s data; maybe we should use NoSQL for that case? We could release experimental k8s support with the SQL backend and see whether it is helpful to anyone.

@obowersa

Valid question! For me, a big thing would be being able to keep the same query syntax. As an example, in the future I'd love to be able to do the equivalent of querying Azure for load balancers with public IPs, and then matching those up to the services/pods which are behind those load balancers. That's a longer-term dream, but it gives an idea of where having both parts of the equation would be super useful.
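A sketch of what that cross-provider query could look like, again with Python's built-in sqlite3. Both tables and all their columns (azure_load_balancers.public_ip, k8s_services.load_balancer_ip) are assumptions for illustration, not existing tables, and matching on the IP address is just one plausible way to link the two sides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables; the IPs come from the documentation range 203.0.113.0/24.
CREATE TABLE azure_load_balancers (name TEXT, public_ip TEXT);
CREATE TABLE k8s_services (
    namespace TEXT, name TEXT, type TEXT, load_balancer_ip TEXT
);
INSERT INTO azure_load_balancers VALUES
    ('lb-public', '203.0.113.10'),
    ('lb-internal', NULL);
INSERT INTO k8s_services VALUES
    ('prod', 'web', 'LoadBalancer', '203.0.113.10'),
    ('prod', 'db', 'ClusterIP', NULL);
""")

# Load balancers with a public IP, joined to the services exposed through them.
exposed = conn.execute("""
    SELECT lb.name, s.namespace, s.name
    FROM azure_load_balancers AS lb
    JOIN k8s_services AS s ON s.load_balancer_ip = lb.public_ip
    WHERE lb.public_ip IS NOT NULL
    ORDER BY lb.name, s.namespace, s.name
""").fetchall()
print(exposed)
```

The point of the example is that one SQL dialect covers both the cloud side and the cluster side, which would be harder to keep if k8s data lived in a separate store.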

@ghost

ghost commented Dec 17, 2020

@obowersa This is an excellent example, and it makes a lot of sense to me now. We will try to schedule an initial version for k8s in a few weeks, as we roll out Azure next week.

@yevgenypats
Member

@obowersa @jbianquetti-nami @mkmik We've added basic support for k8s. Currently only pods and services are supported, but I'd love to hear early feedback before we add more resources. Even with only these two resources, it already creates about 32 tables: https://schema.cloudquery.io/tables/k8s_pods.html, https://schema.cloudquery.io/tables/k8s_services.html

@dancompton

This comment was marked as spam.

@yevgenypats
Member

Hi @dan-compton, thanks for the feedback! I believe that in the future the provider implementations will reside in separate repositories to allow a more pluggable architecture (kinda like in Terraform).

@dancompton

This comment was marked as spam.

@yevgenypats
Member

Closing - k8s provider moved to https://github.com/cloudquery/cq-provider-k8s


5 participants