Feature: Support passing of remote executables via aztk spark cluster submit #549
Conversation
aztk/node_scripts/submit.py
Outdated
-spark_submit_cmd.add_argument(
-    os.environ['AZ_BATCH_TASK_WORKING_DIR'] + '/' + app + ' ' +
-    ' '.join(['\'' + str(app_arg) + '\'' for app_arg in (app_args or [])]))
+spark_submit_cmd.add_argument(app + ' ' + ' '.join(['\'' + str(app_arg) + '\'' for app_arg in (app_args or [])]))
Currently I'm just letting $AZ_BATCH_TASK_WORKING_DIR be expanded when the spark-submit command is executed, but we could be more explicit and do it here with os.path.expandvars. Thoughts?
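For reference, a minimal sketch of what resolving the variable eagerly with os.path.expandvars would look like (the paths and app name here are made-up example values, not aztk's actual layout):

```python
import os

# os.path.expandvars substitutes $VAR / ${VAR} using the current process
# environment; unknown variables are left untouched.
os.environ['AZ_BATCH_TASK_WORKING_DIR'] = '/mnt/batch/tasks/workitems/wd'  # example value
app_path = '$AZ_BATCH_TASK_WORKING_DIR/app.jar'  # hypothetical application path
resolved = os.path.expandvars(app_path)
print(resolved)  # -> /mnt/batch/tasks/workitems/wd/app.jar
```

Doing this on the node would also make the fully resolved command available for logging before it is executed.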
Sure, that could be a good idea if we also log that command for debugging purposes.
So I am wondering if this is the right way of doing it. Maybe we should make it the other way around: you can give a path to a remote location or an existing path on the node (the executable needs to be there), but if you want to upload your local file you provide a flag. I feel like maybe we should have something of this type.
Yes, I think I agree, but it's obviously more of a breaking change using "--local". I wonder whether it's the naming that makes it seem hacky, though. Maybe "--remote" would seem more natural? Happy to give "--local" a go or rename to "--remote", just let me know. Do you have any advice regarding testing? Should I try to write an integration test despite them not running on Travis at the moment?
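A sketch of what the flag under discussion might look like with argparse (the option name, help text, and parser setup are illustrative assumptions, not aztk's actual CLI code):

```python
import argparse

parser = argparse.ArgumentParser(prog='aztk spark cluster submit')
parser.add_argument('app', help='path to the application to submit')
parser.add_argument(
    '--remote',
    action='store_true',
    help='treat the app path as remote (or already on the node) and skip uploading a local file')

# With the flag, local upload is skipped and the path is passed through as-is.
args = parser.parse_args(['wasbs://path@remote/location.jar', '--remote'])
print(args.app, args.remote)  # -> wasbs://path@remote/location.jar True
```

With local upload as the default, existing invocations keep working and only remote submissions need the new flag.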
Yeah, I think so. Will talk to @jafreck.
I agree with @lachiemurray here in regards to making local files the default for cluster submission. Thanks for doing this @lachiemurray!
okok, let's rename to --remote
Thanks both! Will make the changes tomorrow.
Renamed to --remote and updated the docs, can you take another look @timotheeguerin @jafreck?
Just tested and this seems to be working great.
fix #536
This change would allow the user to submit an app hosted at a remote location:
aztk spark cluster submit [...] wasbs://path@remote/location.jar --skip-app-upload
Or a location local to the docker container:
aztk spark cluster submit [...] /local/to/docker_container.jar --skip-app-upload
As well as supporting the current behaviour, where the app is uploaded automatically:
aztk spark cluster submit [...] /local/absolute_path.jar
The change involves moving the appending of "$AZ_BATCH_TASK_WORKING_DIR" to the application path into the client, and only doing so if the "--skip-app-upload" flag is not set (note: the environment variable is not resolved at this point, since its value is not known to the client). On the node, the path is now passed through unaltered.
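The client-side logic described above can be sketched roughly as follows (the function name and flag handling are illustrative, not the actual aztk implementation):

```python
def resolve_app_path(app_path, skip_app_upload=False):
    """Prefix uploaded apps with the (unexpanded) batch working dir.

    When the client uploads the app, it lands in the task's working
    directory on the node, so the literal string
    '$AZ_BATCH_TASK_WORKING_DIR' is prepended and left for the node's
    shell to expand when spark-submit runs. Remote or on-node paths
    are passed through untouched.
    """
    if skip_app_upload:
        return app_path
    # Only the file name matters on the node; the local directory does not.
    return '$AZ_BATCH_TASK_WORKING_DIR/' + app_path.split('/')[-1]

print(resolve_app_path('/local/absolute_path.jar'))
# -> $AZ_BATCH_TASK_WORKING_DIR/absolute_path.jar
print(resolve_app_path('wasbs://path@remote/location.jar', skip_app_upload=True))
# -> wasbs://path@remote/location.jar
```

Leaving the variable unexpanded on the client is the key point: only the node knows the working directory's real value at execution time.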
Tested manually by:
aztk spark cluster copy
✔️ Seems like the integration tests are broken at the moment, so I haven't added any automated tests yet; happy to if required.