workshop setup for friday including data, ea-environment and a working space #37

Closed
lwasser opened this issue Jul 17, 2018 · 16 comments

Comments

@lwasser

lwasser commented Jul 17, 2018

For our workshop, we will need a setup similar to what a student would need.

  1. the students will need a semi-persistent (for the duration of the workshop) working area where they can export files
  2. the students will need access to 1 or more datasets (on figshare) that will be set up for them in a drive on their computer
  3. the students will need access to the full earth analytics python environment that we have built

Questions

  1. how do students log in to the environment? i'm fine with the way we discussed!!
  2. How do i set up the data so it's uniform for them? i.e. i'll need to know the path to the data so they can access it.

@betatim do we have this in place to use for friday? or what do you need to get things going so we are ready? thank you!!

@lwasser lwasser added help wanted Extra attention is needed question Further information is requested labels Jul 17, 2018
@betatim
Collaborator

betatim commented Jul 17, 2018

For (1) is it enough if users have a home directory as part of their jupyterhub login?

For (2) is this a dataset that is listed in https://github.com/earthlab/earthpy/blob/master/earthpy/io.py? If it isn't part of earthpy we could add it to the docker image directly.

I started using (3) on https://hub.earthdatascience.org/earthhub yesterday, needs some testing to be sure but should be ready.

@lwasser
Author

lwasser commented Jul 17, 2018

@betatim thank you.

  1. I'd like to see how this works, but if the students have a home directory with write access and enough space to write data to, that should be sufficient!
  2. The data for friday are listed in: 'spatial-vector-lidar': ('https://ndownloader.figshare.com/files/12396203', '.', 'zip') <- this is the line in the dictionary. What i don't understand is how we get that data into their home directory for them to use. it may just be me not understanding how this is set up :) will i need to show them how to download the data once they log in? AND if they do download it, what happens if they log out and then log back in OR get disconnected? does their workspace persist?

THANK YOU!!
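
A minimal sketch of how a dictionary entry like the one quoted above might be mapped to a folder inside a user's home directory. Only the entry itself comes from this thread; the `DATA_URLS` name, the `earth-analytics/data` folder, and the `target_dir` helper are assumptions for illustration and are not earthpy's actual API.

```python
from pathlib import Path

# Entry copied from the thread: name -> (url, relative path, archive kind).
DATA_URLS = {
    "spatial-vector-lidar": (
        "https://ndownloader.figshare.com/files/12396203",
        ".",
        "zip",
    ),
}


def target_dir(key, home=None, data_root="earth-analytics/data"):
    """Return the folder a dataset would be unpacked into (hypothetical layout)."""
    _url, rel_path, _kind = DATA_URLS[key]
    home = Path(home) if home is not None else Path.home()
    # pathlib drops "." components, so "." means the data root itself.
    return home / data_root / rel_path / key
```

With a fixed layout like this, every student would see the data under the same path relative to their home directory, which is what makes lesson code portable between users.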

@betatim
Collaborator

betatim commented Jul 17, 2018

The workspace persists.

I'll create a workshophub just for the workshop tomorrow morning and add this file to the image so the data is there ready to use for the students.

What system should we use for authentication? Do they all have a UC Boulder email address? If yes we can use that for logins.

@lwasser
Author

lwasser commented Jul 17, 2018

ok. ideally for future setup it would be easy for me to add a few datasets to a "hub" space! maybe we can chat more tomorrow?

We will have a mix of students potentially. There may be some without CU authentication as we had two drop in yesterday that are not in the system yet. Could a CSV approach work potentially?

@betatim
Collaborator

betatim commented Jul 18, 2018

If there is a mixture I'd go for https://github.com/yuvipanda/jupyterhub-firstuseauthenticator with a white list.
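
As a rough illustration of the CSV idea raised above, the sketch below reads a roster and produces the set of usernames to allow. The column name, the sample roster, and the `load_allowed_users` helper are all hypothetical; how the resulting set is handed to the authenticator depends on the JupyterHub version in use.

```python
import csv
import io


def load_allowed_users(csv_text, column="username"):
    """Return the set of non-empty usernames found in one CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[column].strip().lower()
        for row in reader
        if row.get(column) and row[column].strip()
    }


# Made-up roster; the third row has no username and is skipped.
roster = "username,name\nalice,Alice A.\nBob ,Bob B.\n,missing\n"
allowed = load_allowed_users(roster)
# In a jupyterhub_config.py this set could be assigned to the authenticator's
# allow-list option (named `whitelist` in 2018-era JupyterHub releases).
```

Normalising case and whitespace up front avoids drop-in participants being locked out by a stray space in the spreadsheet.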

Should we also have the material of the course directly available in people's home directory?

@betatim
Collaborator

betatim commented Jul 18, 2018

Workshop starts at 9am Boulder time.

@betatim
Collaborator

betatim commented Jul 19, 2018

Using https://github.com/thedataincubator/jupyterhub-hashauthenticator at the moment which does not seem to support whitelists. Maybe worth fixing at some point.

@betatim
Collaborator

betatim commented Jul 20, 2018

Added a second nodepool with two machines that each can handle around 14 students. The node pool is allowed to scale up to 4 machines so we should have enough resources even for last minute arrivals.

  • turn off workshop-pool again after the workshop to save money
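
The capacity arithmetic behind those numbers, as a small sketch. The figure of 14 students per machine is the estimate given above; the function names are made up for illustration.

```python
import math


def pool_capacity(nodes, students_per_node=14):
    """Students a pool can hold at a given node count."""
    return nodes * students_per_node


def nodes_needed(students, students_per_node=14):
    """Smallest node count that fits the given number of students."""
    return math.ceil(students / students_per_node)
```

Two machines give room for about 28 students, and the 4-machine ceiling about 56, so the pool only needs to scale up once attendance passes 28.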

@betatim
Collaborator

betatim commented Jul 20, 2018

Collecting to do items and learnings from the workshop:

  • kernel died a few times, probably from using too much RAM. The current limit was set by guessing a not-crazy number; what is a better way to determine it?
  • data copying was broken (see Make sure target directory exists before rsync'ing #58). Next time, run through the process with completely fresh users to catch problems that only show up on first use
  • right at the start of the workshop (19 min in) there was a k8s master update which caused 3 minutes of outage. Why did this upgrade happen then?
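
For the data copying item, the general pattern is to create the destination before handing it to rsync, since a brand-new user has no nested folders yet. This is only a sketch of that pattern, not the actual change made in #58; the `prepare_rsync` helper and the paths are hypothetical.

```python
from pathlib import Path


def prepare_rsync(source, target):
    """Create the target directory (and parents), then return an rsync command."""
    target = Path(target)
    # A fresh home directory lacks the nested folders, so create them first.
    target.mkdir(parents=True, exist_ok=True)
    # Trailing slash on the source copies its contents, not the folder itself.
    return ["rsync", "-a", f"{Path(source)}/", f"{target}/"]
```

The returned list can be passed to `subprocess.run`. Calling the helper against an empty temporary directory is a cheap way to mimic a first-time user.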

@lwasser
Author

lwasser commented Jul 20, 2018

NOTE: the kernel is consistently dying on the show_hist function using rasterio! so this is definitely a package issue but the question is what causes it to die? maybe memory hog?

@betatim
Collaborator

betatim commented Jul 20, 2018

[Screenshot, 2018-07-20 18:31: memory and CPU usage]

From mid workshop. This shows the actual amount of memory and CPU used, not how much memory/CPU we promised to each pod. This suggests we can increase the memory limit a bit beyond 2G without having to pay for bigger machines. It seems most people, most of the time, aren't using all the memory we assign them. Brief peaks for show_hist?

We can definitely increase the CPU limit as people mostly idle.

Before giving away more RAM we need to make sure that all the core services specify how much RAM they need so that they get protected by kubernetes.
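
A back-of-the-envelope sketch of that check: once the core services declare their memory needs, what remains on a node bounds how much each user can be given. All numbers in the example call are invented (a 32 GB node, two core services, 14 users); only the 2G limit and the figure of 14 students per machine come from this thread.

```python
def spare_memory_gb(node_gb, core_requests_gb, users, per_user_gb):
    """Memory left on a node after core services and user guarantees."""
    return node_gb - sum(core_requests_gb) - users * per_user_gb


def max_per_user_gb(node_gb, core_requests_gb, users):
    """Largest per-user guarantee that still fits alongside core services."""
    return (node_gb - sum(core_requests_gb)) / users


# Hypothetical node: 32 GB, core services requesting 1 GB and 0.5 GB.
spare = spare_memory_gb(32, [1, 0.5], users=14, per_user_gb=2)
```

A positive result means there is headroom to raise the per-user figure; a negative one means the node is already overcommitted on guarantees.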

@betatim
Collaborator

betatim commented Jul 26, 2018

@lwasser was there any promise to participants about how long their home directories will be available? Otherwise I'd clean (aka delete) them as part of shutting down the workshop cluster.

@lwasser
Author

lwasser commented Jul 26, 2018

No! i told them to download their files as it would all disappear. we can clean it. i'd like to know how to do that. I also have questions post workshop:

  1. data -- do the data update now? so if i add a new folder to a dataset, will it update even if the folder already exists?
  2. if the folder doesn't exist will it create it? (i.e. @joemcglinchy didn't have data, i think because his data didn't update)

@betatim
Collaborator

betatim commented Jul 30, 2018

Hub has been turned off via #67 and following documentation in https://earthlab-hub-ops.readthedocs.io/en/latest/day-to-day.html#removing-a-hub

@betatim
Collaborator

betatim commented Jul 30, 2018

The answer to @lwasser's questions is: yes and yes.

If the data in the docker image change, that will be reflected in the user's home directory (once they stop and start their server to pick up the new image).

Previously there was a bug that it would not update if some of the directory existed. This should be fixed now.
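
To illustrate the behaviour described here (new folders appear, and existing ones are updated rather than skipped), below is a sketch using only the Python standard library. It is not the hub's actual copy mechanism; `sync_into_home` and its arguments are hypothetical, and `dirs_exist_ok` requires Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def sync_into_home(image_data, home_data):
    """Copy image_data into home_data, merging with whatever already exists."""
    # dirs_exist_ok=True lets the copy proceed when the target or some of its
    # subfolders are already present; files with the same name are overwritten.
    shutil.copytree(image_data, home_data, dirs_exist_ok=True)
    return sorted(
        p.relative_to(home_data).as_posix()
        for p in Path(home_data).rglob("*")
        if p.is_file()
    )
```

Note that a merge copy like this never deletes anything: files removed from the image would stay in the home directory.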

@betatim
Collaborator

betatim commented Aug 6, 2018

Closing this as the event is over and most of the action items from lessons learnt have been done.

@betatim betatim closed this as completed Aug 6, 2018

2 participants