
[Request deployment] New Hub: Climatematch Academy #2524


Closed
4 of 7 tasks
colliand opened this issue May 3, 2023 · 37 comments
Assignees
Labels
new hub For issues that request a new hub deployment

Comments

@colliand
Contributor

colliand commented May 3, 2023

The GitHub handle of the community representative

@abodner

Hub important dates

Target start date: 2023-06-01
Target end date: 2023-08-31

Heavy usage will take place during the course. The course will run July 17-28 2023.

Hub Authentication Type

GitHub (e.g., @MyGitHubHandle)

First Hub Administrators

  • @abodner
  • @WesleyTheGeolien
  • Who else? Abigail, please let us know in a comment below whether other colleagues should be added.

[GitHub Auth only] How would you like to manage your users?

Allowing members of specific GitHub team(s)

[GitHub Teams Auth only] Profile restriction based on team membership

pending

Abigail, can you please point to the GitHub team that Climatematch will use to manage user access to the hub?

Hub logo image URL

https://lh6.googleusercontent.com/pK1Zrf_NmWJ5KqhFB___4p8HPTf4D6u2om5UQkJbVQcwGjDSwlELPibkFfqW809chxybGrQwgiln8v0fRC00fYGzrsb6vIfFtsbh6PetpJKrk_UPoUb-4-RAH6ibtpXyxQ=w1280

Hub logo website URL

https://academy.climatematch.io/

Hub user image GitHub repository

pending, likely best to use latest pangeo image

Hub user image tag and name

pending; likely latest pangeo image

Extra features you would like to enable

  • Dedicated Kubernetes cluster
  • Scalable Dask Cluster

(Optional) Preferred cloud provider

AWS

(Optional) Billing and Cloud account

None

Other relevant information to the features above

Climatematch Academy will train a cohort of ~1000 students in computational methods for climate science. The academy is partly inspired by Pangeo and builds on a similar virtual school in Neuroscience created and operated by Neuromatch.

  • shared cluster; cloud costs passed through by 2i2c/CS&S to be paid by Climatematch/Neuromatch
  • for Pangeo style work? 2i2c can suggest best cloud vendor and data center; perhaps AWS us-west-2; or GCP?
  • likely with a single hardware profile suitable for Pangeo-style student work; @abodner reports happiness using the small machine type deployed for M2Lines hub.
  • suggested hub url: climatematch.2i2c.cloud with staging.climatematch.2i2c.cloud for companion staging hub. @abodner and team may redirect using DNS to point at hub.climatematch.io.

Tasks to deploy the hub

  • 1. Deploy information filled in above
  • 2. Engineer who will deploy the hub is assigned
  • 3. If using GitHub Orgs/Teams Auth, Engineer is given Owner rights to the org to set this up.
  • 4. Initial Hub deployment PR
  • 5. Administrators able to log on -> Hub now in steady-state
@damianavila
Contributor

I have re-assigned this deployment to @yuvipanda. Yuvi, if you have any doubts about the details of this deployment, please ping @colliand for further details.

@damianavila damianavila moved this to Todo 👍 in Sprint Board May 17, 2023
@WesleyTheGeolien

WesleyTheGeolien commented May 18, 2023 via email

@yuvipanda
Member

Glad to work with you, @WesleyTheGeolien! Yes, we prefer you use quay.io rather than dockerhub! Let us know once the repo + image are setup :)

@WesleyTheGeolien

Great,

Will do, @yuvipanda. Quick question: will it always pull the latest? E.g., if I post an image, then realise I need an extra dependency and build and push a new image (potentially with a tag, but the same tag I gave you), will that automatically update the hub (allowing some time to propagate)?

In previous projects I have used Watchtower. Does your setup use something similar?

@yuvipanda
Member

@WesleyTheGeolien if your hub uses only one image, you will be able to self-configure it as an admin to pull whatever tag you want. We prefer to not use the 'latest' tag, but have the admins change tags when necessary via UI.
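
To illustrate the preference for pinned tags over 'latest', here is a minimal Python sketch that splits an image reference into name and tag and flags unpinned references. The helper names (`split_image_ref`, `is_pinned`), the organization name, and the example tag are hypothetical and not part of the hub's tooling; digest references are not treated specially.

```python
def split_image_ref(ref: str) -> tuple[str, str]:
    """Split 'registry/org/name:tag' into (name, tag); tag defaults to 'latest'."""
    # A colon only marks a tag if it comes after the last '/', so that a
    # registry port such as 'localhost:5000/img' is not mistaken for a tag.
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > last_slash:
        return ref[:colon], ref[colon + 1:]
    return ref, "latest"


def is_pinned(ref: str) -> bool:
    """True when the reference names an explicit tag other than 'latest'."""
    _, tag = split_image_ref(ref)
    return tag != "latest"


# Hypothetical references, for illustration only.
print(is_pinned("quay.io/example-org/climatematch-notebook:2023.06.01"))  # True
print(is_pinned("quay.io/example-org/climatematch-notebook"))             # False
```

With a pinned tag, pushing a new build under a new tag changes nothing on the hub until an admin switches the configured tag.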

@yuvipanda
Member

And re: teams, let's just start with allowing access to the students team and see if that is enough?

@WesleyTheGeolien

@yuvipanda ok sounds good:

  • go with only students
  • I will provide Docker image and tag when built (next few days) then can sort out the tags and updates myself 👍

@WesleyTheGeolien

Hi @yuvipanda

So I have set up our CI to build the Docker image and currently push it to my personal Docker Hub: https://hub.docker.com/r/wesleyban/climatematch-notebook

We are looking at moving this to quay.io and associating it with Climatematch, so it is subject to change in the coming days or weeks. Sorry for the hassle.

If needed, the Dockerfile can be found here: https://github.com/ClimateMatchAcademy/course-content/blob/docker/Dockerfile (currently on the docker branch, but it will be merged into main).

@yuvipanda
Member

@WesleyTheGeolien thanks! I realize the GCP vs AWS question hasn't been resolved. What kinda data would you be using this with? My inclination is to put this on GCP as that is where our existing shared cluster lives. Any objections?

@WesleyTheGeolien

@yuvipanda I don't know if you are authorized to say, but it would be the "same" or similar datasets to Pangeo; I am not sure where they are hosted.

I guess the main issue is data access to climate datasets in the cloud without having to pay network egress fees.

Otherwise, I have uploaded "small" datasets to OSF -> Climatematch; I am not sure how that would integrate.

There is also the question of whether AWS / GCP allow connections from all countries. We have a substantial number of students in Iran and China, for example; would this cause a problem on either of the platforms? If so, I guess we choose the other platform!

I have canvassed my team members and will get back with the list of cloud hosted resources we are using.

@yuvipanda
Member

similar datasets to Pangeo

Unfortunately this is too broad :( All the current pangeo related hubs (including m2lines) are hosted on GCP, so maybe if that works, this is fine?

I guess the main issue is data access to climate datasets in the cloud without having to pay network egress fees.

Note that network egress fees aren't paid by you, but by the agency hosting the data.

I have canvassed my team members and will get back with the list of cloud hosted resources we are using.

This would very much help!

@WesleyTheGeolien

WesleyTheGeolien commented May 22, 2023

In that case if all Pangeo is hosted on GCP I think that is fine, please confirm @abodner.

Ahh I thought the egress charges were paid by the hub, that is somewhat a win then!

Here is a list of current datasets being used:

  • CMIP data from pangeo

  • SST data loaded from NOAA in the notebook

  • Precipitation data loaded from NOAA in the notebook

  • Air temperature anomaly data loaded from NOAA in the notebook

  • CHIRPS (but looks like we have some locally saved files)

  • MODIS (but looks like we have some locally saved files)

  • ECCO-2

  • MERRA2

  • ERA5 (was s3 bucket)

@yuvipanda
Member

@WesleyTheGeolien picking this back up,

We have a substantial number of students in Iran and China, for example; would this cause a problem on either of the platforms? If so, I guess we choose the other platform!

Unfortunately this is totally out of our control, and afaik both cloud platforms are the same here (blocked in Iran, accessible in China).

@yuvipanda
Member

@WesleyTheGeolien and just to confirm (because you mention use with m2lines), you are not planning on using dask-gateway with this hub?

@abodner

abodner commented Jun 1, 2023

Correct @yuvipanda, we are not planning to use dask!

@yuvipanda
Member

@WesleyTheGeolien @abodner check out https://climatematch.2i2c.cloud!

  • Setup on GCP, us-central1. Same as m2lines.
  • Setup to auth on GitHub, with access to ClimateMatchAcademy:2023students. The first time you log in, you must specifically grant access to the ClimateMatchAcademy organization (there should be a "Grant" button next to the list of orgs you are a part of). If this is confusing or does not work, and you are willing to temporarily grant me admin rights on the ClimateMatchAcademy organization, I can set this up too.
  • Latest pangeo image is setup.
  • I've granted this a 2G memory limit and 1G memory guarantee. Test this out and we can see if we want to increase this? I think it's always better to start small and increase when necessary with testing.
  • I've set you up on a separate nodepool, given we are expecting ~1000 users. This allows us to scale separately as needed when work starts. It's a fairly small node now (n2-highmem-2) but we can make that bigger too closer to the time of startup.
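
As a rough illustration of how the memory guarantee relates to node sizing, the sketch below estimates how many user pods fit on one node and how many nodes a cohort would need. It assumes pods are packed by memory guarantee alone, that an n2-highmem-2 node has 16 GB of memory, and that 2 GB per node is reserved for system overhead. The reserved figure and the number of simultaneous users are illustrative assumptions, not measurements from this hub.

```python
import math


def users_per_node(node_mem_gb: float, guarantee_gb: float, reserved_gb: float = 2.0) -> int:
    """Pods that fit on one node when packing by memory guarantee only."""
    usable = node_mem_gb - reserved_gb
    return max(0, math.floor(usable / guarantee_gb))


def nodes_needed(concurrent_users: int, per_node: int) -> int:
    """Nodes required to host the given number of simultaneous users."""
    if per_node <= 0:
        raise ValueError("a single user does not fit on this node size")
    return math.ceil(concurrent_users / per_node)


# Assumed figures: 16 GB node, 1 GB guarantee, 1000 simultaneous users.
per_node = users_per_node(node_mem_gb=16, guarantee_gb=1)
print(per_node)                      # 14
print(nodes_needed(1000, per_node))  # 72
```

Under these assumptions a larger node type lowers the node count, which is one reason to revisit the node size closer to the event.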

Test it out and lmk how it goes?

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Jun 1, 2023
- Add ability to setup resource labels for nodepools, to
  allow us to try tracking cloud costs on a per nodepool basis
  in the
  future (https://cloud.google.com/resource-manager/docs/creating-managing-labels).
  Changing these requires recreating the nodepool, so setting it
  now.
- Allow adding taints to nodepools, so we can properly dedicate a
  particular nodepool to a particular hub. Given this hub is expected
  to get a few thousand users, this is useful.
- Put climatematch on its own nodepool, but keep it small as the event
  is not expected to start until July 17.

Ref 2i2c-org#2524
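
The idea of dedicating a nodepool with taints can be sketched as follows: a pod may be scheduled on a tainted node only if it tolerates every taint on that node. This is a simplified Python model of that rule, not the Kubernetes scheduler (real tolerations also support operators such as `Exists`). The taint key `hub-dedicated` and its value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str = "NoSchedule"


def can_schedule(node_taints: set[Taint], pod_tolerations: set[Taint]) -> bool:
    """A pod fits only if each taint on the node is matched by a toleration."""
    return node_taints <= pod_tolerations


# Hypothetical taint marking a nodepool as dedicated to one hub.
dedicated = {Taint("hub-dedicated", "climatematch")}

print(can_schedule(dedicated, {Taint("hub-dedicated", "climatematch")}))  # True
print(can_schedule(dedicated, set()))                                     # False
print(can_schedule(set(), set()))                                         # True
```

Note that a taint only keeps other pods off the nodepool; steering the hub's own pods onto it additionally requires a node selector or affinity rule.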
@colliand
Contributor Author

colliand commented Jun 1, 2023

Thanks @yuvipanda. FYI @abodner, the ClimateMatch Academy hub is available for testing here: https://climatematch.2i2c.cloud/

@yuvipanda
Member

@abodner @WesleyTheGeolien if you'd like this to be at hub.climatematch.io, please add a CNAME record pointing hub.climatematch.io to climatematch.2i2c.cloud. I'd like us to keep the staging domain under 2i2c.cloud if that's ok though.

@abodner

abodner commented Jun 1, 2023

All sounds good. This is very exciting! Thanks all for being so quick!

@abodner

abodner commented Jun 1, 2023

@yuvipanda the logo is not ours. I have shared ours in the past but can provide another file.

It would be great if students did not have to have the additional github grant access step. I am happy to give you admin rights if that can be spared from students.

@yuvipanda
Member

@abodner ah yes please do provide a URL to a logo I can use! The logo link in this GitHub issue doesn't work :(

And yes, the 'grant' step only needs to happen the very first time. Please grant me admin access, I'll do it and then we can remove my access.

@abodner

abodner commented Jun 1, 2023

Thanks @yuvipanda you should have admin access now. Let me know when you are finished please, I'd like to limit the number of admins on our side.

@abodner

abodner commented Jun 1, 2023

@yuvipanda
Member

@abodner you can remove my access now, all good now. You should try to get someone with just student team access to login to make sure it works, but it should.

I don't think we can link directly to the google drive link :( Is it already on your website or somewhere we can directly include as an <img> tag maybe?

@abodner

abodner commented Jun 1, 2023

Thanks @yuvipanda. Can I send you the png for now? We use google sites and I am not sure the logo is stored in a very clever way.
[image: CMA_logo_text_transparent]

@yuvipanda
Member

@abodner hmm I'll poke around with it tomorrow if that's ok!

Do test out the memory available to see if that works or we need to increase it!

@abodner

abodner commented Jun 1, 2023

Sounds great, thanks @yuvipanda !
Are all datasets @WesleyTheGeolien provided available already?

@yuvipanda
Member

Ah, I haven't done anything related to those. I thought those were all externally provided (by NOAA or GCP or similar) and don't need anything done on our end. Can you verify that, @WesleyTheGeolien?

@yuvipanda
Member

@abodner I've fixed the logo, check it out.

I'll wait to hear from @WesleyTheGeolien about datasets.

@damianavila damianavila moved this from Needs Shaping / Refinement to In progress in DEPRECATED Engineering and Product Backlog Jun 8, 2023
@damianavila damianavila moved this from Todo 👍 to In Progress ⚡ in Sprint Board Jun 8, 2023
@damianavila damianavila moved this from In progress to Waiting in DEPRECATED Engineering and Product Backlog Jun 8, 2023
@damianavila damianavila moved this from In Progress ⚡ to Waiting 🕛 in Sprint Board Jun 8, 2023
@WesleyTheGeolien

Ahh sorry everyone somehow missed these notifications.

Hey @yuvipanda, we had some questions around data on the hub. We use publicly hosted cloud datasets; from my understanding these are fine to interact with (without egress charges), with the potential caveat of needing to be in the same region as they are hosted. However, we also have some other datasets: roughly 20 GB hosted on OSF, as well as a 50ish GB dataset we are still unsure what to do with.

I think pulling this data separately for every student on the hub seems a bit redundant? Is there a way to cache data / add data to the hub? I saw some S3 connectivity in the JupyterLab interface. Just wondering what the best practices are for getting data up there? (I assume baking it into the Docker image is a bad idea -> we don't really want 100 GB images ...)

@yuvipanda
Member

@WesleyTheGeolien there is a 'shared-readwrite' directory available that admins can put datasets in, and it is available in a readonly fashion under the 'shared/' directory for everyone else. Think that can work out?
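
A notebook could take advantage of this layout by looking for a dataset in the read-only shared directory before falling back to a per-user copy. The sketch below assumes the shared directory sits at `shared/` under the user's home directory; the function name, the `data/` fallback location, and the file name are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_dataset(name: str, home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing copy of a dataset, preferring the shared one."""
    home = home or Path.home()
    candidates = [
        home / "shared" / name,  # admin-provided copy, read-only for students
        home / "data" / name,    # hypothetical per-user fallback location
    ]
    for path in candidates:
        if path.exists():
            return path
    return None


# Hypothetical file name, for illustration only.
path = find_dataset("example_dataset.nc")
print(path if path else "not found; download it or ask an admin to add it")
```

Keeping one shared copy means the roughly 20 GB and 50 GB datasets mentioned above would be stored once rather than once per student.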

@WesleyTheGeolien

Thanks @yuvipanda that should work out.

Another quick question: I have someone testing the hub. From my understanding each user has a provision of ~12 GB of RAM, but at the bottom (near the left) of the screen it says 2 GB, and they are complaining that loading an 800 MB file into memory is crashing the hub. Is this expected?

cheers


@colliand
Contributor Author

The climatematch logo is not rendering as the splash image on the login page: https://climatematch.2i2c.cloud/hub/login. FYI @yuvipanda.

@yuvipanda
Member

@WesleyTheGeolien as I mentioned in #2524 (comment), I actually have provided only 2G of RAM right now. The m2lines 'small' profile is about 7 GB - want me to bump that up?
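
One way to reason about the crash: data can occupy considerably more memory once loaded than it does on disk (for example when a compressed file is decompressed), and intermediate copies add to the peak. The sketch below is a rough pre-flight check under that assumption. The expansion factor of 3 is an illustrative guess, not a measured value, and the function name is hypothetical.

```python
def fits_in_memory(file_bytes: int, limit_bytes: int, expansion: float = 3.0) -> bool:
    """Rough check: estimated peak usage (size on disk x expansion) within the limit."""
    return file_bytes * expansion <= limit_bytes


MB = 1000 ** 2
GB = 1000 ** 3

print(fits_in_memory(800 * MB, 2 * GB))  # False: ~2.4 GB estimated peak
print(fits_in_memory(800 * MB, 7 * GB))  # True
```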

@WesleyTheGeolien

Ahh thanks @yuvipanda, I didn't see that. Yep, we are getting crashes when running our tutorials, so bumping to 7 GB would be great. Out of interest, are these arbitrary values or set steps?

@yuvipanda
Member

@WesleyTheGeolien alright, bumped now in #2665!

@yuvipanda
Member

@WesleyTheGeolien @abodner I'm going to close this issue now, as the hub is up and running. Please email support@2i2c.org if you have any more issues! And definitely let us know at least 2 weeks before any major events with information on how many people you expect, so we can size up your nodes accordingly.

@github-project-automation github-project-automation bot moved this from Waiting 🕛 to Done 🎉 in Sprint Board Jun 20, 2023

5 participants