
[EVENT] OpenScapes NASA Cloud Hackathon #810

Closed
7 of 10 tasks
choldgraf opened this issue Nov 5, 2021 · 50 comments

@choldgraf
Member

choldgraf commented Nov 5, 2021

Summary

OpenScapes is running its first cloud hackathon for the NASA community from November 15th through 19th. During the event, ~40 people will be accessing the OpenScapes hub, and participants will then keep access for the next 3 months.

Supersedes #767 (this is the same issue, but now using our event template so we can try it out).

Info

  • Event begin: November 15th, 2021
  • Event end: November 19th, 2021
  • Community Representative: @jules32
  • Hub URL: openscapes.2i2c.cloud
  • Hub decommissioned after event?: no

Task list

Before the event

  • Dates confirmed with the community representative
  • One week before the event: Hub is running.
  • Confirm with Community Representative that their workflows function as expected.
    • 👉Template message to send to community representative
      Hey {{ COMMUNITY REPRESENTATIVE }}, the date of your event is getting close!
      
      Could you please confirm that your hub environment is ready-to-go, and matches your hub's infrastructure setup, by ensuring the following things:
      - [ ] Log-in and authentication works as-expected
      - [ ] `nbgitpuller` links you intend to use resolve properly
      - [ ] Your notebooks run as-expected
      
      
  • Ensure that everybody on the team can access this cluster and add documentation about it
  • Check and validate the quota for this cluster to make sure we have enough space
  • Check the size of the Kubernetes master nodes to make sure they're big enough

During and after event

  • Confirm event is finished.
  • Nodegroup created for the hub is decommissioned.
  • Hub decommissioned (if needed).
  • Debrief with community representative.
    • 👉Template debrief to send to community representative
      Hey {{ COMMUNITY REPRESENTATIVE }}, your event appears to be over 🎉
      
      We hope that your hub worked out well for you! We are trying to understand where we can improve our hub infrastructure and setup around events, and would love any feedback that you're willing to give. Would you mind answering the following questions? If not, just let us know and that is no problem!
      
      - Did the infrastructure behave as expected?
      - Anything that was confusing or could be improved?
      - Any extra functionality you wish you would have had?
      - Are you willing to share a story about how you used the hub?
      - Any other feedback that you'd like to share?
      
      
@choldgraf
Member Author

Hey @jules32, the date of your event is getting close! Could you please confirm that your hub environment is ready-to-go, and matches your hub's infrastructure setup, by ensuring the following things:

  • Log-in and authentication works as-expected
  • nbgitpuller links you intend to use resolve properly
  • Your notebooks run as-expected

@jules32
Contributor

jules32 commented Nov 5, 2021

Hi @choldgraf thanks for starting this issue!

At the moment I think we can check each of those boxes, but I'm looping in @betolink to confirm!

@betolink
Contributor

betolink commented Nov 6, 2021

I think we are OK! Maybe just worth mentioning that we'll have a one-day clinic this Tuesday that may have ~30 participants.

@choldgraf
Member Author

choldgraf commented Nov 9, 2021

@betolink - wait, will there be 30 people logging on to the hub at once on Tuesday (tomorrow)? I thought this event began on the 15th. FYI, the hub might be slow to scale up, as we have not pre-created nodes for the event (we were intending to do this before the 15th, not the 9th).

@betolink
Contributor

betolink commented Nov 9, 2021

The event starts on the 15th, @choldgraf. Tomorrow morning we are doing a preliminary 2-hour clinic to get the participants familiar with the hub. I don't think we'll have more than 25 people. It's OK if the spawning is not as fast.

@jules32
Contributor

jules32 commented Nov 9, 2021

Hi @choldgraf, tomorrow we're holding a pre-hackathon clinic to help folks get familiar with Python, Bash, and GitHub, and also log into 2i2c. They won't be running any code with cloud data there, but it's an opportunity to test more logins at once and for them to get familiar ahead of time. We'll likely have fewer than 30 people attend, so we expect it will be fine with your timeline for the 15th, which is the real "go time".

Also, over the past weeks we've been having practice dry runs with our team so we've had 10+ people all log in and run code with cloud data and it's been going quite smoothly.

@choldgraf
Member Author

Gotcha - thanks for the heads up. In general, it's helpful for us to know when there are likely to be "spikes" in users, since if those spikes are associated with a specific hub we can make some adjustments so that things happen more quickly.

A quick explanation for context, and so that I can re-use this in our documentation :-)

Something we are trying to do for events is "pre-cooking" the nodes on which the user sessions will run. Each node has a finite number of user sessions that can be active on it at once. When a node is ready and has space, a user's session starts pretty quickly. However, if a node is full and a new user starts a session, the hub has to "scale up" and request a new node first. This often takes a bit of time.

Usually it's not a big deal if the hub has off-and-on usage over time - it just means that every now and then a user has to wait a bit longer for their session to start. However, in the context of an event, this can be unnecessarily disruptive, because you often have moments like "ok, now everybody open up a session at once". This creates the awkward situation where everybody starts a session at the same time, that triggers a "node scale up" event on the hub, and then sessions are delayed by a few minutes while the hub scales up.

So, if we know there's an event happening, we try to give that event a dedicated set of nodes and "pre-cook" some of them so that more slots are available in general. That way user sessions tend to start more quickly. However, we don't want to do this too far in advance because those extra nodes cost money, so we want to minimize that time. That's why we try to figure out when an event will start and end, so that we can do this at the appropriate time.

If we can't pre-cook the nodes before the session on Tuesday, another thing you can try is to have people log on to the hub and start a user session before you expect them to actually use it. For example, at the beginning of the session have them log on and open a notebook or something, then do introductions and other passive listening; hopefully by the time you're ready for the "hands on" stuff, their sessions will be ready.

I hope some of that is helpful!

@betolink
Contributor

betolink commented Nov 9, 2021

Thanks for clarifying all this, @choldgraf!! I haven't read the 2i2c documentation extensively and was wondering what kind of nodes you use; for a moment I wondered if you used spot instances. Now, one thing I'm not sure how it is set up, but it would be good to confirm: the instances are one per user, right? Once someone selects, say, m5.large, is there a guarantee that they are not sharing those 8 GB of RAM with other users?

@erinmr

erinmr commented Nov 9, 2021

@choldgraf - We are running the pre-hack clinic today and I'm trying to start a 2i2c session. It's taking much longer than usual, ~15 minutes so far. It looks like 31 people have servers running. Wondering if we have maxed out users?
[Screenshot: running user servers, 2021-11-09 9:59 AM]

@choldgraf
Member Author

@erinmr oops, I just saw this one (FYI, the preferred place to make support requests is support@2i2c.org). I suspect that this is just slowness of the cluster in creating a new node. Sometimes this takes many minutes, other times just a few (part of what I was referring to above). I tried creating a new session myself and it took a few minutes, but it was able to start successfully. Were you able to log on in the end?

@erinmr

erinmr commented Nov 10, 2021 via email

@damianavila
Contributor

@erinmr, @betolink, a few more questions for you:

  1. Can you please confirm the number of attendees for the event? It seems it is close to 40, correct?
  2. Can you please confirm the profile option you are going to use (small, medium, large, huge, or a mix of those)?
  3. Do you have an idea of how "intensive" the event will be? Are you using dask-gateway at all? If so, how many worker nodes do you think you are going to need per user?

Thanks!

@erinmr

erinmr commented Nov 11, 2021 via email

@damianavila
Contributor

@yuvipanda,

  1. I have checked the master node and it is an m5.large instance with ASG min: 1 and max: 3. When I looked into the config file in the repo, I see a t3.medium instance (`machineType: "t3.medium"`). Could it be the case that the master was at some point updated to m5.large and that change was not persisted in the repo?
  2. I have checked the available quotas and it seems we have a total 512 vCPU limit (which might be enough depending on the usage during the event): the "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances" quota for running instances is 512 vCPUs.
  3. I currently see 3 m5.large nodes and 1 m5.xlarge node... When I looked into the autoscaling groups, it seems there is some "Desired capacity" already configured with those values, but I do not see any "Desired capacity" in the config files for the cluster... I only see a maxSize. Could it be the case this was manually configured at some point? If so, I guess we could use the same approach to pre-warm the nodes a few hours before the start of the event, right? (A scripted version of these quota/ASG checks is sketched after this list.)
  4. We should deploy the support chart and all the grafana stuff on this cluster in case we need to diagnose any problems.
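
For reference, checks like these can also be scripted. A minimal sketch using boto3 (the region, credentials, and exact quota code here are assumptions; verify the quota code in the Service Quotas console):

```python
import boto3

REGION = "us-west-2"  # illustrative; use the cluster's actual region

# Check the "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances" vCPU quota.
# L-1216C47A is the usual quota code for it, but verify it in the Service Quotas console.
quotas = boto3.client("service-quotas", region_name=REGION)
quota = quotas.get_service_quota(ServiceCode="ec2", QuotaCode="L-1216C47A")
print("On-Demand Standard vCPU limit:", quota["Quota"]["Value"])

# Inspect the autoscaling groups backing the cluster: min/max/desired capacity
# and how many instances are actually running right now.
asg = boto3.client("autoscaling", region_name=REGION)
for group in asg.describe_auto_scaling_groups()["AutoScalingGroups"]:
    print(
        group["AutoScalingGroupName"],
        "min:", group["MinSize"],
        "max:", group["MaxSize"],
        "desired:", group["DesiredCapacity"],
        "running:", len(group["Instances"]),
    )
```
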

cc @GeorgianaElena

@damianavila
Contributor

damianavila commented Nov 11, 2021

About item 3: I just checked again this morning and now I see just 2 nodes, and when I check the ASG, the "Desired capacity" values are now "1" for m5.large and "1" for m5.xlarge as well... so it seems someone is already playing with this config 😉.

Update: this is our autoscaler messing with us 😉

@damianavila
Contributor

damianavila commented Nov 11, 2021

Additionally, I think pre-pulling the image would be useful here as well since it takes more than 1 min to fetch the configured image (configured with the configurator, so we should promote that one into the config file to pre-pull it).

@damianavila
Contributor

damianavila commented Nov 11, 2021

I have successfully tested warming the cluster to ~80 small nodes using the ASG; we should probably do the update using kops...
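
A minimal sketch of that kind of pre-warm with boto3 (the ASG name, region, and node count below are illustrative, not this cluster's actual values; on a kops-managed cluster the durable change would go through the instance group spec and `kops update cluster`, and note the cluster autoscaler can later undo a manual bump):

```python
import boto3

asg = boto3.client("autoscaling", region_name="us-west-2")  # region is illustrative

GROUP = "nodes-small.openscapeshub.k8s.local"  # hypothetical ASG name for the small-node group

# If the target exceeds the group's current MaxSize, raise MaxSize first.
asg.update_auto_scaling_group(AutoScalingGroupName=GROUP, MaxSize=80)

# Ask the group to bring up ~80 small nodes ahead of the event so user sessions
# land on already-running nodes instead of waiting for a scale-up.
asg.set_desired_capacity(AutoScalingGroupName=GROUP, DesiredCapacity=80, HonorCooldown=False)
```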

@damianavila
Contributor

Since the event starts at 9 am PST, we can warm the nodes early Monday morning (EST) so we are optimizing for cost as well.

@betolink
Contributor

Hi @damianavila, questions for 2i2c.

  1. Since we are trying to be flexible with our base image, if we update it, will the pre-pulling in JupyterHub pick up the change (a different Docker tag)?
  2. Looks like I can't write to shared-readwrite even though I'm an admin... which leads me to another question: can we have another shared mount that is read-write for all the participants? Maybe promoting this shared-readwrite/ mount to everybody in the hub? We are thinking 100 GB should be enough for now.
  3. We're thinking that setting up an S3 bucket would be super handy to show some concepts around blob storage in the cloud and also to share some results from the hack week. Is this possible within the infrastructure 2i2c can provide?
  4. If a team decides to spin up a Dask cluster, will that use the same cluster allocation we already have, or will KubeSpawner request extra nodes? (Actually, is this even possible with the current setup?) We are not focusing on Dask, but just in case we need it.

Should I send these questions to support@2i2c.org as well, or here is enough? Thanks

@betolink
Contributor

I tested changing the image and it looks like it's working as expected; the new tag is pulled.

@damianavila
Contributor

damianavila commented Nov 12, 2021

Should I send these questions to support@2i2c.org as well, or here is enough? Thanks

We would appreciate it if you can tunnel your question through support!!

But since I am already here, let me reply to some of your questions now...

  1. Pre-pulling will not pick up the change if you update the image through the configurator. What you have tested is whether the new tag is pulled once you have configured it, and that will happen regardless of the pre-pulling setting. Pre-pulling takes care of pulling the image onto the node before the pod arrives.
    Since I want to maintain the flexibility you mentioned above, and because the current kops-based cluster supporting your hub was never tested for pre-pulling, I think it is probably a good idea to keep the status quo for now, with pre-pulling deactivated.

  2. We need to check this one; I will investigate and report back on why you cannot write to the shared-readwrite directory.
    I also need to investigate what promoting that shared directory to read-write would look like, since that is not standard practice in our deployments.

  3. You should be able to reach any S3 endpoint from your pod/notebook server, AFAIK. However, we are not doing any automatic auth on kops-based clusters (we do automatic auth on EKS-based clusters, and we are going to migrate your cluster to EKS soon, but not before the event 😉), so you would need to handle the auth yourself.

  4. Your current kops-based cluster is what we call a daskhub, and it will spin up Dask-specific nodes if you start a Dask cluster (provided you have all the Dask-specific pieces in your custom image, such as dask-gateway). We should be OK with a few users spinning up clusters, but I would be worried if all your attendees want to start Dask clusters, because our pressure-testing above (quota testing) was starting small profiles without any Dask clusters involved. (A user-facing sketch of the S3 access and the dask-gateway flow is right below.)
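
Re points 3 and 4, the user-facing side would look roughly like the sketch below (the bucket name and credentials are placeholders, and it assumes `s3fs` and `dask-gateway` are installed in the hub image):

```python
import s3fs
from dask_gateway import Gateway

# Point 3: reach an S3 bucket from the notebook server. On this kops-based cluster
# credentials are not injected automatically, so pass them explicitly (or export
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY yourself before creating the filesystem).
fs = s3fs.S3FileSystem(key="AKIA...", secret="...")  # placeholder credentials
print(fs.ls("my-hackweek-bucket"))  # hypothetical bucket name

# Point 4: spin up a Dask cluster through dask-gateway; the workers are scheduled
# on Dask-specific nodes that the cluster scales up on demand.
gateway = Gateway()
cluster = gateway.new_cluster()
cluster.scale(4)                 # request 4 workers
client = cluster.get_client()    # Dask client connected to the gateway cluster
print(client)
cluster.shutdown()               # release the workers when done
```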

cc @GeorgianaElena who is playing our support steward role during your event.

@erinmr

erinmr commented Nov 12, 2021 via email

@damianavila
Contributor

damianavila commented Nov 15, 2021

@betolink @erinmr, I have promoted the writable shared directory feature to the prod hub and tested it successfully.

Once the event is finished, we would love to hear your experiences with a writable shared directory, btw 😉 .
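
A quick way for participants to confirm this from a notebook; a tiny sketch assuming the share is mounted at `~/shared-readwrite` (adjust the path to wherever it appears in your image):

```python
from pathlib import Path

shared = Path.home() / "shared-readwrite"   # assumed mount point of the writable share
test_file = shared / "write-test.txt"
test_file.write_text("hello from the OpenScapes hack week\n")
print(test_file.read_text())
```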

@damianavila
Contributor

Now, let's agree on some points for the event:

  1. Please, going forward, tunnel any questions/requests to the support email as indicated on this page: https://docs.2i2c.org/en/latest/support.html
  2. We are going to warm up the cluster at 7:45 am PT so it is ready to receive your attendees, and cool it down at 1:00 pm PT. During the "hot" hours, we are going to warm up to 80 concurrent small nodes. If you ever need more nodes, let us know so we can raise that limit. During the "cold" hours, the maximum would be 20 concurrent small nodes. We chose this pattern to optimize for cost according to your event schedule. If you believe the hours or numbers should be adjusted, let us know ASAP.

@betolink
Contributor

Hi @damianavila, we think that 40 small instances would be the minimum during the week, even after 1 PM PT. Is this a hard limit, or will instances past 40 just take more time to spawn? We'll route future support questions to support@2i2c.org.

@damianavila
Contributor

@betolink, do you think there will be up to 80 concurrent users during the "cold" hours at some point?
If your answer is yes, then I would recommend keeping the 80 nodes during the whole week.
If the answer is no, then what would be the max amount of users during the "cold" hours given your estimations?
Currently, the max amount of nodes is an "automated" hard limit but we can change it manually if we need to (and if you signal that to us).

Btw, the cluster is already pre-warmed with 80 nodes as of now...

Note: In the near future, when we migrate your cluster to EKS, the whole experience will be better: we will just define a min and max number of nodes and the autoscaler will do the magic for you according to the user load.

@betolink
Contributor

It's hard to say if we'll have 80 concurrent users in cold hours; my guess is no, but what if... @erinmr, are we OK with 80 during the whole week?

@jules32
Contributor

jules32 commented Nov 15, 2021 via email

@betolink
Contributor

It seems like 60 is the magic number. We're not seeing more than 50 today, and that will probably be the case for the rest of the week. @damianavila

@damianavila
Contributor

damianavila commented Nov 15, 2021

Yep, I was monitoring the whole day and I can confirm the 50 pods/nodes.
Btw, I also detected some pods (3) on nodes larger than the small ones; not sure if that was attendees just choosing the wrong profile or something else (nothing to be worried about, IMHO, just letting you know about it).

I guess everything went smoothly today, am I right @betolink?

Btw, I will adjust the number of available small nodes to be 60 later tonight...

I will also try to monitor the nodes during the night to see if it really makes sense to keep 60 nodes live the whole time (my soul cries knowing that some nodes would be there without actually being used 😉 ).

@betolink
Contributor

Everything went smoothly, @damianavila! I bet you're right, probably 30 during cold hours, but maybe, just maybe, at the end of the week everybody will actually be working on their projects and that might change. I'm looking forward to the EKS autoscaler for the next event!

@damianavila
Contributor

Btw, I will adjust the number of available small nodes to be 60 later tonight...

@betolink, I have drained the 80 small nodes so I can set up the new "magic" value of 30 for the "cold" hours (sorry if your pod was restarted, that should not happen in the future even with new adjustments!).

Btw, right now, I can see 3 small nodes being used (one of them is a dask one 😉 ) and I can also see some bigger nodes (xlarge and 2xlarge) being used for more than 8 hours (be aware of those in case you want to terminate them).

Finally, I will raise the "magic" value to 60 tomorrow morning PT for the "hot" hours, as I did earlier today (pre-warm process).

Have a great evening/night!

@damianavila
Contributor

OK, the cluster is already pre-warmed to 60 small nodes.

Btw, I have been following the node occupancy for the last 12-14 hours, and overnight we had 2-3 concurrent users at most. I think we would be totally OK cooling down the cluster to the standard configuration from 6 pm to 8 am, which will result in a fully functional autoscaling process able to support up to 20 concurrent users while saving resources if only a few people are working during the night.

@betolink, I would recommend the above process going forward, re-adjusting if necessary according to the real demand; let me know WDYT.

@betolink
Contributor

I think you're right again, @damianavila, let's use the defaults for the cold hours. Quick question: what's the idle time before an instance gets shut down?

@damianavila
Contributor

Quick question, what's the idle time before an instance gets shut down?

For the default experience, IIRC, the autoscaler terminates the node about 10 minutes after it is vacated.

@betolink
Contributor

Hi @damianavila, do you know if you can add the `--collaborative` flag in the hub configuration, or do I need to do it in the Docker image? This is to enable https://jupyterlab.readthedocs.io/en/stable/user/rtc.html

@damianavila
Contributor

@betolink, we actually have an open issue to add this feature here: #441
But, regrettably, it seems there is currently a blocker for making it happen: jupyterlab-contrib/jupyterlab-link-share#10.

@consideRatio may have more details... but I see it as unlikely that this feature will be enabled for the already-running event.

@consideRatio
Member

The link share button is a user interface feature that doesn't work in a JupyterHub setup with typical authentication, but the collaborative mode can be enabled in a way that allows JupyterHub admins to collaborate with other admins or non-admin users on the non-admin users' servers. This is because admins can be granted permission to access other users' servers.

@damianavila, you can enable this via https://github.com/2i2c-org/infrastructure/pull/436/files#diff-ca80c8d18c23e271ff0620419ee36c9229c33141accc37f88b6345ecd34bef42R68-R73; the referenced issue is resolved with the latest version of JupyterLab, so I think it may work properly. You would also need to make sure the JupyterHub admins have the ability to access other users' servers, if that isn't already enabled.
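
For reference, on the JupyterHub side this boils down to roughly the following `jupyterhub_config.py` settings. This is a sketch of the approach in that PR, not the exact 2i2c configuration (in a z2jh/Helm deployment the same values are set through the chart's config), and it assumes the single-user server is JupyterLab and understands the `--collaborative` flag:

```python
# jupyterhub_config.py (sketch)

# Start single-user servers with JupyterLab's real-time collaboration enabled.
c.Spawner.args = ["--collaborative"]

# Allow hub admins to access other users' servers, which is what lets admins
# join and collaborate on a non-admin user's session.
c.JupyterHub.admin_access = True
```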

@damianavila
Contributor

Thanks for your feedback, @consideRatio.

@betolink, given that we are approaching the end of the event, I would suggest we try to set up this configuration next week, maybe? We need to test some things on our side first, as @consideRatio suggested.

@betolink
Contributor

Just to thank you all, we had a great hack week and your support was key! @damianavila @choldgraf @consideRatio

Our teams will continue to work on their projects but I guess we won't need 60 concurrent instances. I look forward to the migration to EKS and enabling the real time collaboration feature!

@jules32
Contributor

jules32 commented Nov 19, 2021

Echoing Luis, thank you all so much! Excited about how this worked this week and that it's just the beginning!

@choldgraf
Member Author

ahhh these posts warm my heart :-)

THANK YOU to the 2i2c team for being so great

@erinmr

erinmr commented Nov 19, 2021 via email

@damianavila
Contributor

It is wonderful to hear you had a nice experience, folks!
And, as you said, this is just the beginning and there are a lot of interesting things ahead for all of us.
If we can do it together, I have no doubt it will be a path full of success and fun!

Btw, I have already "post-cooled" the cluster and it is now working with the "standard/default" behaviour (with the autoscaler "serving" nodes as demand grows, up to a max of 20 nodes per profile option). Currently, there are only 10 concurrent users (7 using the small instances, the rest using bigger ones), and I presume the number will go down as we enter the weekend.

Finally, we will be working soon on the transition to EKS and adding RTC capabilities, so stay tuned! 😉
And have a great weekend and some rest after a long week!!

@damianavila
Contributor

@betolink @erinmr and @jules32, we are trying to understand where we can improve our hub infrastructure and setup around events, and would love any feedback that you're willing to give. Would you mind answering the following questions? If not, just let us know and that is no problem!

  • Did the infrastructure behave as expected?
  • Anything that was confusing or could be improved?
  • Any extra functionality you wish you would have had?
  • Are you willing to share a story about how you used the hub?
  • Any other feedback that you'd like to share?

I know you already replied to some of these questions, but if you want to provide more context and/or answer the ones you did not reply to yet, we would greatly appreciate it!! Thanks!

@betolink
Contributor

Did the infrastructure behave as expected?

  • I think it did; we only had a minor hiccup at the clinic (before the hack week) because we were not expecting a lot of participants.

Anything that was confusing or could be improved?

  • I'm probably merging this and the next question... 2i2c hubs seem to follow the "right to reproduce" principles (which is great!), but some use cases like ours require more flexibility for the user, i.e. we cannot dictate what libraries they want to use, and having a restrictive base environment can be a blocker for some. We noticed that some of the participants wanted to have some control over what they installed and such. Our solution was to use a Pangeo-based image that could serve multiple environments. The idea was that if they needed to work with their own environments, they only had to open a PR to that project and the base image would update accordingly. However, our users are not software engineers; they are more comfortable just doing pip install or conda install and expect that what they do will be there the next time they use their instances. Following on the discussion at the beginning of this thread, maybe there is a middle ground: a base image with a base environment, plus the ability to let users persist their changes (with warnings that they are altering the base environment, etc.).
  • Another thing we didn't use, and I think it's available via JupyterHub, is letting the user pick a base image. This could also be useful for hack weeks.
  • RTC! This would be a super feature to have for future events; it seems like this is totally doable, so I'm really excited about it.

Are you willing to share a story about how you used the hub?

I think so. One of the cool things about the hack week on 2i2c is that we started to get into Pangeo territory, and just the fact that we don't have to worry about configuring Dask is worth telling. I bet @erinmr and @jules32 will have more stories from the participants!

Any other feedback that you'd like to share?

I like having all the questions on GitHub for visibility and readability, rather than emailing 2i2c, but I guess you have your own logistics as well. Thank you all for your support!

@damianavila
Contributor

Thanks for the additional feedback, @betolink! It is very useful for us!
Btw, I think the next steps are actually collected or referenced in other issues already linked here, so closing this issue now.

Thanks all for your contributions here!
