New Hub: Justice Innovation Lab #288

Closed
7 tasks done
choldgraf opened this issue Mar 4, 2021 · 27 comments

Comments

@choldgraf
Member

choldgraf commented Mar 4, 2021

Background

The Justice Innovation Lab is a non-profit law+data organization. They have credits on Azure, and

Setup Information

Important Information

Notes

This hub originally began on our Pilot cluster on GCP. However, they have data that exists on Azure (as well as credits) so we need to move their hub over to Azure. This might require a few steps first, so tracking this in the to-do item below.

Deploy To Do

  • Initial Hub deployment
  • Move the hub to Azure
    • Confirm that we have got the correct Administrative access to the hub
    • ...figure out next steps
  • Administrators able to log on
  • Community Champion satisfied with hub environment
  • Hub now in steady-state

cc @yuvipanda who already deployed this hub. However, we need to move it to Azure!

@yuvipanda
Member

There's a pilot hub running that they are using right now, but we should move to Azure pretty soon. Should we wait for a contract, @choldgraf?

@choldgraf
Member Author

Hmmm, I think there are two main questions:

  1. How much pain will it be to switch to Azure in the future, if we continue down the path of not using Azure?
  2. How much would we learn / improve from switching this hub to Azure right now?

What do you think?

@yuvipanda
Member

Moving hub to Azure shouldn't be a problem, since we'll just have to move the hub db sqlite file and home directories.

We currently have experience running Azure clusters (https://github.com/utoronto-2i2c/jupyterhub-deploy/tree/staging/terraform) and we should try porting it. However, I think I'd like to prioritize new development right now, if that's ok with the JIL folks. We can come back to this in a couple weeks (or once we have more hires?)

@choldgraf
Member Author

Note - I've opened up #373 to track adding Azure deployment infrastructure to our pilot-hubs/ repository. I think that this will need to be resolved before we can migrate the JIL hub to Azure.

@yuvipanda
Member

@JILPulvino if I remember correctly, you still have Azure credits. What would be the next steps for us to get access to those so we can start working on getting a hub there?

@JILPulvino

@yuvipanda sorry I went off on this, we can add you as an admin to our Azure account so that you can transfer us to our Azure account. Please just let me know what email address you'd like me to use in adding you.

One reason that we are interested in this as well is that the kernel often dies when we are doing anything that might be considered memory intensive - most recently that was loading a file of a million rows of data. Not sure what on the fly scaling would look like in a jupyterhub, but would love to talk that over with you.

@yuvipanda
Member

@JILPulvino can you add yuvipanda@2i2c.org? We can start figuring it out from there.

@JILPulvino

Added you as a guest user. I need to figure out what roles to assign you, too. Would you happen to know which resources in Azure you'll need access to? For example, I could assign global administrator, though that would be the broadest role.

@yuvipanda
Member

@JILPulvino yeah, can we start with the broadest role and we can go from there?

@JILPulvino

JILPulvino commented Jun 12, 2021 via email

@JILPulvino

@yuvipanda do you have availability next week to discuss the global admin role?

@yuvipanda
Member

@JILPulvino I just responded to your email. Sorry about the delay.

@choldgraf
Member Author

@JILPulvino recently re-worked the permissions on their Azure project to give us administrative privileges. He asked if we can confirm that we have this access. I added an item to our to-do list above.

@yuvipanda
Member

I've updated #373 and opened #512 for more actionable tasks. Completing those should make the hub go live! It'll be deployed from this repo.

@JILPulvino

JILPulvino commented Jul 13, 2021 via email

@yuvipanda
Member

@JILPulvino we have something for you to try! Take a look at https://justiceinnovationlab.2i2c.cloud? It's using the image specified in https://github.com/2i2c-org/justiceinnovationlab-image, and we should be able to figure out a way to let you update it without needing intervention from us. This infra isn't final yet, but try it out and let me know what you think?

@JILPulvino

@yuvipanda Thank you! The log in works, but when trying to start a notebook it doesn't find a kernel and there is not the same shared folder as in the other hub.
And, yes, that would be great for us to be able to make updates to the image at some point.

@yuvipanda
Member

@JILPulvino yeah, shared folders aren't there yet but will be soon. I'll work on the kernel issue as well.

https://github.com/2i2c-org/justiceinnovationlab-image has instructions on how to make changes to the image and test it. Wanna give that a shot in the meantime?

@yuvipanda
Member

yuvipanda commented Jul 28, 2021

@JILPulvino the kernel is working now, but there's no shared dir yet. At some point, we'll just move over the shared directories from the old hub to the new hub, and then decommission the old one?

TODOs left:

  • Get shared directory working
  • Check the image, make sure it has things you need
  • Validate that authentication is ok
  • Configure resources as you wish (you can get more RAM / CPU now if you like!)
  • Set up Grafana / Prometheus for monitoring
  • Move shared dir (and maybe home dirs?) from old cluster to new

Try it out now and let me know what you think?

@JILPulvino

JILPulvino commented Jul 28, 2021

Great, thank you! I don't think the shared directory is a priority as we'll be storing everything elsewhere in Azure.

I launched the hub and it looks good. A couple of questions:

  1. When new packages are needed, what is still the best practice for adding them to the image?
  2. Will it be possible to have an R kernel (it'd be great to have a 3.6 and a 4.0) and an RStudio option?
  3. Is there documentation on how to increase the compute power? Perhaps this is related to the option for adding named servers in the Hub Control Panel?
  4. In the hub control panel, the 'admin' tab no longer appears so I'm not quite sure how to add users?

@yuvipanda
Member

@JILPulvino I've now made you admin again, so you should be able to add users.

If you have a sense of what kinda options for resources you might want, we can make that happen. Users will see something like this:
[image]

And they can pick the size they want. Aligning those to individual VM sizes on Azure is probably a good way to think about this. Each user gets the resource equivalent of an entire VM. https://azure.microsoft.com/en-in/pricing/details/virtual-machines/linux/ has the list of available VM sizes. I'd suggest looking in the A, D, and maybe E series for inspiration.

I'll think about the other questions and get back to you.

@JILPulvino

JILPulvino commented Jul 29, 2021

@yuvipanda Thank you, this is great. With regards to our requirements on this end, I think generally people will need just 16 GB of RAM - I don't think we'd ever want less than that. The number of CPUs matters less to us, but I imagine scaling something like 2 CPU, 16 GB RAM; 4 CPU, 32 GB RAM; 8 CPU, 64 GB RAM; and possibly a very large option of 32 CPU and 256 GB RAM.

I also noticed that the hub doesn't automatically spin down?

@yuvipanda
Member

@JILPulvino I've provided 4 options for CPU / RAM sizes. Take a look and see if that works for you?
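For readers following along, here is a minimal sketch of how four size options like the ones requested earlier in this thread might be expressed as a KubeSpawner `profile_list`. The thread does not show the configuration that was actually deployed, so the helper `build_profile_list`, the display names, and the choice to set only limits (no guarantees) are all assumptions made for illustration.

```python
# Hypothetical sketch: turn (CPU, RAM in GB) pairs into KubeSpawner profile entries.
# The sizes come from the request in this thread; everything else is assumed.

REQUESTED_SIZES = [(2, 16), (4, 32), (8, 64), (32, 256)]


def build_profile_list(sizes):
    """Build one profile dict per (cpus, ram_gb) pair, marking the first as default."""
    profiles = []
    for index, (cpus, ram_gb) in enumerate(sizes):
        profiles.append(
            {
                "display_name": f"{cpus} CPU, {ram_gb} GB RAM",
                "default": index == 0,
                "kubespawner_override": {
                    "cpu_limit": cpus,
                    "mem_limit": f"{ram_gb}G",
                },
            }
        )
    return profiles


profile_list = build_profile_list(REQUESTED_SIZES)

# In a jupyterhub_config.py this would typically be assigned as:
# c.KubeSpawner.profile_list = profile_list
```

Each entry shows up as one choice on the server start page, and `kubespawner_override` applies the matching CPU and memory limits to the user's pod.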

> I also noticed that the hub doesn't automatically spin down?

Are you talking about your user session? Or the whole hub in general?

@JILPulvino

@yuvipanda Those look good and are working great.

As for the spinning down, I'm not quite sure which one I mean, actually, but I noticed that when I closed the browser and left for a day, it was still running when I went back to the site. This doesn't seem to be happening now, though.
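As background on the behavior described above: closing the browser does not by itself stop a user server. JupyterHub deployments commonly rely on an idle culler that stops servers after a period of inactivity. The sketch below shows only the core decision such a culler makes; the one-hour timeout and the function name `should_cull` are hypothetical, and the thread does not say how culling was configured on this hub.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timeout; the real value for this hub is not stated in the thread.
IDLE_TIMEOUT = timedelta(hours=1)


def should_cull(last_activity, now, timeout=IDLE_TIMEOUT):
    """Return True when a server has been inactive for at least `timeout`."""
    return now - last_activity >= timeout


# Example timestamps (made up for illustration).
now = datetime(2021, 7, 30, 12, 0, tzinfo=timezone.utc)
left_for_a_day = now - timedelta(days=1)
active_recently = now - timedelta(minutes=5)
```

With a timeout like this, a server left untouched for a day would be stopped, while one used five minutes ago would keep running.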

@JILPulvino

@yuvipanda a few additional questions.

  1. Can you start to include @lilygrierjil on these messages/add her to the repo? She is our new data engineer and is also working on this project.
  2. Where should we make requests for additional packages to be added to the hub?
  3. Can you point me to where I can see the code for the docker container/terraform/requirements.txt etc. for this hub? We are still working on our own container and we are hoping to use it as an example/work to modify this one on our own.

@yuvipanda
Member

Welcome, @lilygrierjil!

@JILPulvino the image is built off https://github.com/2i2c-org/justiceinnovationlab-image, and there are instructions in the README there on how you can add / remove packages, test them, and make the hub use them without any involvement from us! Can you give it a shot?

@choldgraf
Member Author

Hey all - since this one has been running for a couple months now, I'd like to close this issue and consider the hub "deployed", with the knowledge that we may need to upgrade and improve things but it'll be better to do this via new support requests rather than doing it all in this mega-issue.

I'll close this, but if others object don't hesitate to suggest another approach!
