Deploy support chart for the openscapes hub #827

Closed

Conversation

GeorgianaElena
Member

@GeorgianaElena commented Nov 11, 2021

After this PR is merged and the support chart is deployed, we should manually trigger the grafana deployer action in order to have the grafana dashboards available for this hub.

Ref: #810

Note:

(from jupyterhub/grafana-dashboards)

NOTE: ANY CHANGES YOU MAKE VIA THE GRAFANA UI WILL BE OVERWRITTEN NEXT TIME YOU RUN deploy.bash. TO MAKE CHANGES, EDIT THE JSONNET FILE AND DEPLOY AGAIN

So just a heads up that if we deploy the dashboards through the action, any changes made to the other hubs' Grafanas will be overwritten.
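For reference, a GitHub Actions workflow can only be triggered manually if it declares a `workflow_dispatch` trigger. A minimal sketch of what that looks like for a hypothetical dashboards deployer workflow (the workflow name, checkout step, and deploy command are placeholders, not the repo's actual workflow):

```yaml
# Hypothetical workflow sketch; only the workflow_dispatch trigger syntax is
# standard GitHub Actions, everything else here is a placeholder.
name: Deploy grafana dashboards
on:
  workflow_dispatch: {}

jobs:
  deploy-dashboards:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Placeholder for the real deploy step, e.g. running deploy.bash from
      # jupyterhub/grafana-dashboards against the hub's Grafana instance.
      - name: Deploy dashboards
        run: ./deploy.bash  # assumes the script and its credentials are available
```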

Contributor

@damianavila left a comment


LGTM, although I think there are two follow-ups:

  1. DNS entry for the grafana DNS you picked: Setup prometheus + grafana for carbonplan #533 (comment)
  2. The grafana dashboards you mentioned

@GeorgianaElena
Member Author

GeorgianaElena commented Nov 11, 2021

> LGTM, although I think there are two follow-ups:

Thanks @damianavila! :D

> DNS entry for the grafana DNS you picked: #533 (comment)

Waa, didn't know we had steps for this written somewhere! This is great <3! TY!
(We have them in the docs too https://infrastructure.2i2c.org/en/latest/howto/operate/grafana.html?highlight=support#deploy-the-support-chart btw 🎉 )

Update: I'm manually deploying the support chart now!

@GeorgianaElena
Member Author

@damianavila, I deployed the support chart manually, but it finished with a timeout error.
The support pods are in a pending state and I see this warning during scheduling:

0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had taint {hub.jupyter.org_dedicated: user}, that the pod didn't tolerate.

Any ideas on how to spin up a node for the support pods?
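(For context, the two taints in that message are the standard kops master-node taint and the z2jh user-node taint. In a node's spec they look roughly like this; a sketch of the usual taint definitions, not output copied from this cluster:

```yaml
# Typical taints blocking the support pods; the effects shown are the common defaults.
spec:
  taints:
    # Applied by kops to the master node.
    - key: node-role.kubernetes.io/master
      effect: NoSchedule
    # Applied to user nodes so only user pods, which carry a matching
    # toleration, get scheduled there.
    - key: hub.jupyter.org_dedicated
      value: user
      effect: NoSchedule
```

A pod only schedules onto one of these nodes if it declares matching tolerations.)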

@damianavila
Contributor

Mmm... I was expecting most of the support-related pods to be scheduled on the master node that also serves as the core node in our kops-based deployments.
Wondering if we might need some workarounds to get some of the support-related pods running on the master node, as we do with CoreDNS 🤔

@yuvipanda might have more details/thoughts/ideas...

Also, wondering if we would face this one as well: #594
Notice that, in the old carbonplan PR (#533), a few more things were needed to set up support (check the LoadBalancer and traefik removals). And I think that PR was on top of some previous updates where the master node was actually split into master + core independent nodes: #532. So, adding support to kops-based deployments could be more complex than we originally thought... 😢

@damianavila
Contributor

> So, adding support to kops-based deployments could be more complex than we originally thought... 😢

Btw, if things get really complicated here, my inclination would be to default to the current stability and keep the status quo even when that means no support (nor grafana) stuff in the openscapes cluster. We are super close to the event start date and changes should be as minimal and simple as possible if we want to avoid surprises during the event...

@yuvipanda
Member

Yeah, you definitely need the same extra tolerations the CoreDNS pod gets on all the pods we schedule on the master nodes.

Might help to timebox the support node deployment setup. You don't have to use the nginx ingress for the hubs, so the load balancer can just stay for the hubs - we don't have to touch that. However, we must make sure that the extra pods being present on the master node - particularly Prometheus - don't put undue extra stress on the master node's components. So let's make sure they have CPU and memory limits set, and maybe are on a node by themselves?
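A minimal sketch of what that could look like in the support chart's values, assuming the chart exposes the upstream prometheus chart's `server` section (the value paths below are an assumption, not the chart's confirmed schema):

```yaml
# Hypothetical support-chart values; the prometheus.server path is an assumption.
prometheus:
  server:
    # Same toleration the CoreDNS pods carry, so Prometheus can schedule onto
    # the kops master node despite the node-role.kubernetes.io/master taint.
    tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
    # Explicit requests/limits so Prometheus can't starve the control-plane
    # components running on the same node.
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 2Gi
```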

@yuvipanda
Member

But let's think of this as 'adding Prometheus and Grafana' rather than 'migrating a kops-based deployment to use the support chart'. Let's migrate off kops soon instead!

@choldgraf
Member

From these conversations, how about this proposal:

Proposal

  • Try deploying the support resources for, say, 1 hour. If that doesn't work, then go with @damianavila's suggestion to focus more on doing what we think will make the infrastructure stable ahead of Monday's event. If somebody thinks "this will definitely take more than an hour", then we should just skip this bit and know that we'll need to pay extra attention via kubectl during the event.
  • Regarding EKS, I made this proposal here to remove kops from our infrastructure and would love to hear thoughts once we have the bandwidth to revisit it.

@damianavila
Contributor

damianavila commented Nov 12, 2021

This will definitely take more than 1 hour, IMHO (in fact, I have already spent more than 1 hour looking at some of these details).

And, more importantly, it implies modifications big enough that I am not comfortable pushing them forward on a Friday evening before the event starts on Monday.

I would suggest closing this PR now and focusing on the transition to EKS after the event.

@choldgraf
Member

> I would suggest closing this PR now and focusing on the transition to EKS after the event.

This sounds like a good plan to me. I agree this feels like we are doing too many things "for the first time and with uncertainty about whether it'll work" just before an event...

@damianavila
Contributor

Closing this one now to avoid confusion about what things are really active for the upcoming event.

Thanks for opening the PR, @GeorgianaElena!!
