Demography Workshop Datahub Request #5643

Open
wrathofquan opened this issue Mar 27, 2024 · 10 comments

@wrathofquan

wrathofquan commented Mar 27, 2024

Summary

We have two workshop events coming up in the Demography/Population sciences department and are looking at using the workshop hub or potentially another dedicated hub.

  • The first event is a workshop/hackathon-type event that will run for the entire summer and involves Berkeley faculty, graduate students, and postdocs. It's a data challenge called the Predicting Fertility data challenge. It begins April 1, and we expect about 5-6 users training ML models in Python and R. Since this is a longer-running event (it continues through the summer), I'm not sure whether the workshop hub is suitable, but I wanted to check whether you have suggestions or alternatives.

  • This year the Demography department will again host a week-long workshop on statistical methods, June 3-7, with researchers visiting from all over the world. We used the workshop hub last year with great success and hope to use it again! Compute needs will be the same as last year: 4GB of RAM per user. We expect 20-40 users.

User Stories

Workshop instructors and faculty can have all of their instructional materials in a Datahub that is consistent across users.
Attendees won't have to worry about managing their own compute environments.

Acceptance criteria

  • For the data challenge event, users with a CalNet ID can navigate to Datahub and use Python/R to train ML models. Users will be working with the same datasets, so a shared-read-write directory is required.

  • For the June workshop event, users can navigate to something like workshop.datahub.berkeley.edu, authenticate without CalNet (and potentially manage their own credentials), and access the workshop files in RStudio or a native R kernel.

Important information

Data challenge

  • The data challenge (April 1 - September) will have about 5-6 users, each needing about 4GB of RAM and a shared-read-write directory to work on shared datasets that exceed GitHub's storage capacity.

Demography workshop

  • This workshop is in June (3-7), and we expect somewhere between 20 and 50 users.
  • We'd like to have a 4GB allocation of RAM per user.
  • We'd want to keep the pods alive for a few weeks after the workshop to give users a chance to save their work.

Tasks to complete

@balajialg
Contributor

balajialg commented Mar 27, 2024

@wrathofquan Thanks. Responding to the time-sensitive request first: yes, students can use the workshop hub to do their work. We can enable shared-readwrite directories so that they can store their datasets. Is it possible to create a dummy bCourses site, add all the participants of the data challenge to that site, and share the bCourses ID?

You might need to assign the Teacher/TA role to the folks who need read/write access and the Student role to the folks who need only read access to the shared drive. We can enable access to the drive using the shared bCourses ID in the workshop hub.

@wrathofquan
Author

Thank you @balajialg. bcourses id: 1534506

@balajialg
Contributor

balajialg commented Mar 28, 2024

@wrathofquan Hi Josh, the changes are merged to the staging hub and should be deployed within the next 60 minutes. Can you ask the students you added to the bCourses site to check whether they can see a) the shared-readwrite directory and b) the RAM increase to 4 GB in the staging hub?

@balajialg
Contributor

balajialg commented Mar 28, 2024

@wrathofquan I just updated the configs in Datahub via this PR. The changes were merged to prod 10 minutes ago. You should be able to test this at https://datahub.berkeley.edu/ in an hour.

@shaneknapp
Contributor

shaneknapp commented Mar 28, 2024 via email

@wrathofquan
Author

wrathofquan commented Apr 3, 2024

@balajialg Would it be possible to increase RAM to 8GB? We received the assignment datasets today and they are larger than expected.

A few users have reported their sessions crashing when reading in data, and I can confirm that reading in a 300MB CSV eats almost half of the available RAM.

Thank you for considering!
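
For anyone hitting the same crashes before a RAM change lands, here is a minimal sketch of one possible workaround: streaming a CSV row by row instead of loading it all at once. It uses only the Python standard library. The function name `streaming_column_mean` and the sample columns are hypothetical, not part of the challenge datasets, and the approach only helps for steps that can process rows incrementally (summary statistics, filtering), not for model training that needs the full table in memory.

```python
import csv
import io


def streaming_column_mean(lines, column):
    """Return the mean of one numeric column, holding one row in memory at a time.

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    """
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):
        value = row[column]
        if value == "":  # skip missing values
            continue
        total += float(value)
        count += 1
    return total / count if count else float("nan")


# Tiny in-memory stand-in for a large file (hypothetical columns).
# With a real file, pass open(path, newline="") instead.
sample = io.StringIO("id,age\n1,30\n2,\n3,40\n")
mean_age = streaming_column_mean(sample, "age")
print(mean_age)
```

Because the reader yields one row at a time, peak memory stays roughly constant regardless of file size, at the cost of a slower pure-Python loop.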

@balajialg
Contributor

@wrathofquan Yes, we should be able to increase the RAM to 8 GB. Here is the PR

Having said that, I also want to highlight that an increase in RAM is correlated with an increase in cloud costs on our end. So any further request to increase RAM is something @shaneknapp and I might need to review in the future.
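
As a rough illustration of why the per-user limit matters, the sketch below multiplies the user counts and per-user RAM figures quoted in this thread. It assumes the worst case where every user is active at once. The helper `total_ram_gb` is hypothetical, and actual cloud cost depends on node sizing and autoscaling, which are not modeled here.

```python
def total_ram_gb(users, ram_per_user_gb):
    """Worst-case memory footprint if every user runs a full-size session at once."""
    return users * ram_per_user_gb


# Figures quoted in this thread (upper end of each user range).
challenge_before = total_ram_gb(6, 4)   # data challenge at 4 GB per user
challenge_after = total_ram_gb(6, 8)    # data challenge at 8 GB per user
workshop_peak = total_ram_gb(40, 4)     # June workshop at 4 GB per user

print(challenge_before, challenge_after, workshop_peak)
```

Doubling the per-user limit doubles the worst-case footprint for the same number of users, which is the scaling that makes further increases worth reviewing.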

@balajialg
Contributor

balajialg commented May 30, 2024

@wrathofquan I have increased the Workshop Hub RAM to 4 GB. Please inform me if you encounter any discrepancies. When do you recommend reverting the RAM increase? I will schedule it to be reduced accordingly.

@wrathofquan
Author

Thank you @balajialg! Everything looks great. Can we drop it back down on Monday June 10?

@balajialg
Contributor

Sounds good, thanks!
