Please add support for requirements.txt in ScriptProcessor similar to other "Script Mode" parts of the SageMaker Python SDK #1248

Open
cfregly opened this issue Jan 19, 2020 · 17 comments
Labels
component: processing (Relates to the SageMaker Processing Platform)
type: feature request

Comments

@cfregly

cfregly commented Jan 19, 2020

Please add support for requirements.txt in ScriptProcessor, similar to other "Script Mode" parts of the SageMaker Python SDK where I can specify source_dir.

@ajaykarpur
Contributor

Hi Chris, thanks for your suggestion. I've added it to our backlog.

As a workaround, you can provide a shell script containing pip install commands. (You'll want to call your python script at the end of this shell script.)
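
A rough sketch of what such a wrapper could look like, written in Python so it can sit next to the job-submission code: it only generates the text of a shell script that runs `pip install` and then calls the Python script last. The helper name `build_wrapper_script`, the example packages, and the script path are all hypothetical, and the sketch assumes the Python script reaches the container by some other route (for example as a processing input).

```python
import shlex


def build_wrapper_script(packages, python_script, python_bin="python3"):
    """Return the text of a bash script that installs packages, then runs a script.

    Illustrative helper only; nothing here is part of the SageMaker SDK.
    """
    lines = ["#!/bin/bash", "set -euo pipefail"]
    if packages:
        quoted = " ".join(shlex.quote(p) for p in packages)
        # Install dependencies before the Python script is started.
        lines.append(f"{python_bin} -m pip install {quoted}")
    # Call the Python script last and forward any job arguments to it.
    lines.append(f'exec {python_bin} {shlex.quote(python_script)} "$@"')
    return "\n".join(lines) + "\n"


script_text = build_wrapper_script(
    ["pandas==1.5.3", "pyarrow"],
    "/opt/ml/processing/input/code/preprocess.py",  # hypothetical location
)
print(script_text)
```

The generated text would then be saved to a file and passed as the single script that the processor accepts, with the processor's command set to a shell rather than Python.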

@sam-cohan

@ajaykarpur The problem is that ScriptProcessor only takes a single file as an argument, not a source_dir, so you cannot include a directory alongside your Python source file. The workaround therefore does not really work around the problem.

@sam-cohan

As a workaround, we ended up using the SKLearnProcessor, which actually takes a Python script. That script gets access to a packaged version of our code, which is downloaded using the ProcessingInput mechanism; it installs the package and then runs the entrypoint. It works, but it was too much effort for something that should be built in, IMHO.
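
For readers who want to try the same approach, a bootstrap script along these lines might look roughly like the sketch below. It is not the code described above, only an illustration under assumptions: the function names, the input path, and the module name `my_project.preprocess` are hypothetical, and the package is assumed to be something `pip install` accepts as a local path (a directory with packaging metadata, an sdist, or a wheel).

```python
import subprocess
import sys
from pathlib import Path


def pip_install_command(package_path):
    """Build the pip command that installs the packaged code from a local path."""
    return [sys.executable, "-m", "pip", "install", str(package_path)]


def bootstrap(package_path, entry_module, args=()):
    """Install the downloaded package, then run its entry point as a module."""
    package_path = Path(package_path)
    if not package_path.exists():
        raise FileNotFoundError(f"expected packaged code at {package_path}")
    subprocess.check_call(pip_install_command(package_path))
    # Run in a fresh interpreter so the newly installed package is importable.
    subprocess.check_call([sys.executable, "-m", entry_module, *args])


# Hypothetical usage inside the processing container, where the path is
# wherever the ProcessingInput was mounted:
# bootstrap("/opt/ml/processing/input/package", "my_project.preprocess", sys.argv[1:])
```

Running the entry point in a second interpreter keeps the bootstrap script free of imports from the package it has just installed.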

@josiahdavis

josiahdavis commented Oct 23, 2020

Hi @ajaykarpur, I completely agree with the prior comments about the importance and usefulness of allowing processing to use a requirements file.

Thank you!

@oberserk

I think it is important for SKLearnProcessor to accept multiple Python files.

@verdimrc
Contributor

Hi, I want to share an experimental / stop-gap work called FrameworkProcessor, to simplify submitting a Python processing job with requirements.txt, source_dir, dependencies, and git_config, using SageMaker framework training containers (i.e., tf, pytorch, mxnet, xgboost, and sklearn).

It aims to give you the familiar workflow of (1) instantiating a processor, then immediately (2) calling the run(...) method.

Here's an example of how to use this FrameworkProcessor class (right now as a Python script as opposed to .ipynb). Then, run that Python example using this shell script, but you must first change the S3 prefix and execution role, then optionally choose your preferred container.

It slightly changes the processing API by adding a SageMaker Framework estimator, which was done for two purposes: (1) to auto-detect the container URI, and (2) to re-use the packaging mechanism in the estimator to upload to s3://.../sourcedir.tar.gz.

So far it works for my cases, but more testing and bug reports are welcome.

HTH.

@jonathanglima

any news on this?

@iCHAIT

iCHAIT commented Dec 7, 2021

Is there an update on this?

@MatthewCaseres

Right now I am just using the processors inheriting from FrameworkProcessor (PyTorch, not SKLearn) when I need to use extra files.

I wish I could just use Docker containers from Docker Hub. I don't understand the need for 4 or 5 functions with similar names and features.

@dlaredo

dlaredo commented Sep 20, 2022

No news on this one yet? I have several customers asking me how to do it, and they really don't like the workarounds.

@clausagerskov

@ajaykarpur

@curt-lockhart

Hi Team,

I have customers asking about how to do this without workarounds. Is this doable/has this been released?

@ca9071jp2

Any news regarding this, 3 years later?

@martinRenou added the label "component: processing" (Relates to the SageMaker Processing Platform) on Sep 27, 2023

@j-adamczyk

Any news on this? It's absurd that for data preprocessing, which requires many more third-party libraries than training, we cannot easily install additional ones, whereas the option is available for estimators. It's literally already there in estimators, so why couldn't this be added to processors in well over 3 years?

@david-waterworth

@j-adamczyk have you tried looking at `FrameworkProcessor` instead of `ScriptProcessor`, i.e. https://stackoverflow.com/a/74551264/2981639?
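
A minimal sketch of that route, for anyone landing here: it assumes a recent SageMaker Python SDK in which `sagemaker.processing.FrameworkProcessor` is available, and a `source_dir` that holds the entry script plus a `requirements.txt` at its root (which, as far as I understand, is installed before the entry point runs). The helper names, the role ARN, the framework version, and the instance type are placeholders, not recommendations.

```python
from pathlib import Path


def check_source_dir(source_dir, entry_point):
    """Verify the entry point exists; report whether requirements.txt is present."""
    root = Path(source_dir)
    if not (root / entry_point).is_file():
        raise FileNotFoundError(f"{entry_point} not found in {root}")
    return (root / "requirements.txt").is_file()


def run_processing_job(role_arn, source_dir, entry_point="preprocess.py"):
    """Submit a processing job from a source directory (all values are placeholders)."""
    if not check_source_dir(source_dir, entry_point):
        print("note: no requirements.txt found, nothing extra will be installed")

    # Imported here so the helper above can be used without the SDK installed.
    from sagemaker.processing import FrameworkProcessor
    from sagemaker.sklearn.estimator import SKLearn

    processor = FrameworkProcessor(
        estimator_cls=SKLearn,          # selects the scikit-learn container
        framework_version="1.0-1",      # assumed version; use one available to you
        role=role_arn,
        instance_count=1,
        instance_type="ml.m5.xlarge",   # assumed instance type
    )
    # code is the entry point, given relative to source_dir.
    processor.run(code=entry_point, source_dir=source_dir)
```

Calling `run_processing_job("arn:aws:iam::...:role/...", "src")` would then package the whole `src` directory rather than a single file, which is the behaviour this issue asks for.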

@francisco-camargo

Eager to hear an update on this!

@clausagerskov

well then
