Add S3 bucket connectivity or integration into SageMaker Studio Lab #178

Open

taureandyernv opened this issue Dec 9, 2022 · 5 comments

Labels: enhancement (New feature or request)

@taureandyernv

Is your feature request related to a problem? Please describe.
I'd like to be able to temporarily connect different S3 buckets to SageMaker Studio Lab (SMSL) for extra library or dataset storage. I'm currently having trouble installing the latest RAPIDS version on SMSL because I run out of space, and I have resorted to deleting datasets that take a long time to download; I even have to delete the zip files. The downloads, conda environment recreation, and failed installs eat into precious GPU time that is already hard to come by.

Describe the solution you'd like
I'd like to be able to connect an S3 bucket to SMSL with the conda environments and data ready to go.

Describe alternatives you've considered

  • Deleting and re-downloading the data as needed (see the sketch below).
  • Removing conda environments that are not currently in use and reinstalling them when they are needed again.
  • Installing packages with pip instead of conda.

Additional context
Some of the additional libraries for GPU deep learning and machine learning take up a significant amount of room when downloaded and expanded with conda.
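
For reference, a minimal sketch of what the re-download workaround looks like each session, using boto3; the bucket and key names are hypothetical, and AWS credentials are assumed to be configured already:

```python
# Sketch of the manual workaround: re-staging a dataset from S3 into the
# Studio Lab filesystem at the start of a session. Bucket and key names
# are hypothetical; AWS credentials must already be configured (e.g. via
# environment variables).
import os
import boto3

BUCKET = "my-datasets"                # hypothetical bucket name
KEY = "rapids/train.parquet"          # hypothetical object key
DEST = os.path.expanduser("~/data/train.parquet")

os.makedirs(os.path.dirname(DEST), exist_ok=True)
s3 = boto3.client("s3")
s3.download_file(BUCKET, KEY, DEST)   # burns disk space and session time
print(f"Downloaded s3://{BUCKET}/{KEY} to {DEST}")
```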

@taureandyernv taureandyernv added the enhancement New feature or request label Dec 9, 2022
@taureandyernv taureandyernv changed the title Add S3 bucket connectivity to SageMaker Studio Lab Add S3 bucket connectivity or integration into SageMaker Studio Lab Dec 9, 2022
@icoxfog417 (Contributor) commented Dec 13, 2022

@taureandyernv, thank you for the feedback! I understand that having to evacuate your existing conda environments and datasets, and then restore them after the operation, is painful.

I feel a "mount S3 to Studio Lab" experience is what you need. Would goofys or s3fs match your needs? (For now, we cannot install them because apt install is required.)
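
In the meantime, a minimal sketch of a no-mount stopgap: the pip-installable s3fs Python package (no apt or FUSE required) can list and stream objects from a bucket directly. The bucket and key names below are hypothetical:

```python
# Sketch: reading S3 objects without a FUSE mount, using the pip-installable
# s3fs package (pip install s3fs). Bucket and key names are hypothetical;
# AWS credentials are picked up from the environment.
import s3fs

fs = s3fs.S3FileSystem()
print(fs.ls("my-datasets"))  # list objects in the (hypothetical) bucket

# Stream an object without writing it to the Studio Lab disk first
with fs.open("my-datasets/rapids/train.csv", "rb") as f:
    print(f.readline())  # e.g. inspect the CSV header
```

This streams data per read rather than presenting the bucket as a local filesystem, so it helps with dataset storage but not with hosting conda environments.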

@MicheleMonclova

@taureandyernv, as your ML experiments outgrow Studio Lab, you may want to consider launching your notebooks in SageMaker Studio. We tried to make this easy to do with a new feature called Notebook jobs (just released on December 22). With Notebook jobs, you can work on your notebook in Studio Lab and then schedule it to run in SageMaker (you will need an AWS account). The job will kick off and shut down when complete. Yes, there will be some cost, but depending on the instance type you select it may be negligible.
Check out this blog post and let me know what you think: https://aws.amazon.com/blogs/machine-learning/run-notebooks-as-batch-jobs-in-amazon-sagemaker-studio-lab/

@fkunn1326

Are there any plans to allow mounting of external drives such as S3 in Studio Lab?
I am a student and cannot use SageMaker.

@icoxfog417 (Contributor) commented Jan 9, 2023

@fkunn1326, thank you for the comment. For now, we do not have a specific plan to support mounting S3. Are you not able to use SageMaker through your school's AWS account?

@icoxfog417 (Contributor)

I think installing Mountpoint for Amazon S3 in Studio Lab would work (it requires sudo, so users cannot install it for now).
https://github.com/awslabs/mountpoint-s3
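
For illustration, assuming Mountpoint could be installed and a hypothetical bucket were mounted (e.g. with `mount-s3 my-datasets ~/s3`), notebooks could then use ordinary file I/O against the mount point:

```python
# Sketch of what a working mount would enable: plain file I/O against the
# mounted bucket. Assumes the hypothetical bucket "my-datasets" is mounted
# at ~/s3, e.g. via: mount-s3 my-datasets ~/s3
from pathlib import Path

mount = Path.home() / "s3"
for path in mount.glob("rapids/*.parquet"):  # hypothetical object layout
    print(path, path.stat().st_size, "bytes")
```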
