This repository has been archived by the owner on Feb 3, 2021. It is now read-only.

Feature: Support passing of remote executables via aztk spark cluster submit #549

Merged: 3 commits into Azure:master, May 24, 2018

Conversation

@lachiemurray (Contributor) commented May 10, 2018

fix #536

This change would allow the user to submit an app hosted at a remote location:

aztk spark cluster submit [...] wasbs://path@remote/location.jar --skip-app-upload

Or a path local to the Docker container:

aztk spark cluster submit [...] /local/to/docker_container.jar --skip-app-upload

As well as supporting the current behaviour, where the app is uploaded automatically:

aztk spark cluster submit [...] /local/absolute_path.jar

The change moves the appending of "$AZ_BATCH_TASK_WORKING_DIR" to the application path from the node to the client, and only does so if the "--skip-app-upload" flag is not set (note: the environment variable is not resolved at this point, since its value is not known to the client). On the node, the path is now left unaltered.
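
For illustration, a minimal sketch of the client-side logic described above (the function and parameter names are hypothetical, not the actual aztk source):

```python
# Hypothetical sketch of the client-side path handling; names are
# illustrative and do not match the actual aztk source.
def resolve_app_path(app: str, skip_app_upload: bool) -> str:
    """Return the application path to hand to spark-submit on the node."""
    if skip_app_upload:
        # Remote (e.g. wasbs://...) or container-local path: pass it
        # through untouched; the node no longer alters it.
        return app
    # Uploaded apps land in the Batch task working directory on the node.
    # The variable is deliberately left unexpanded here, because its value
    # is only known on the node, not on the client.
    return "$AZ_BATCH_TASK_WORKING_DIR/" + app
```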

Tested manually by:

  • Submitting a job without the --skip-app-upload flag (current behaviour) ✔️
  • Submitting a job with the --skip-app-upload flag and using a wasb path to a remotely hosted jar ✔️
  • Submitting a job with the --skip-app-upload flag and using a path to a jar that was copied to all nodes using aztk spark cluster copy ✔️

The integration tests seem to be broken at the moment, so I haven't added any automated tests yet; happy to if required.

```diff
-spark_submit_cmd.add_argument(
-    os.environ['AZ_BATCH_TASK_WORKING_DIR'] + '/' + app + ' ' +
-    ' '.join(['\'' + str(app_arg) + '\'' for app_arg in (app_args or [])]))
+spark_submit_cmd.add_argument(app + ' ' + ' '.join(['\'' + str(app_arg) + '\'' for app_arg in (app_args or [])]))
```
@lachiemurray (Contributor, Author) commented May 10, 2018


Currently I'm just letting $AZ_BATCH_TASK_WORKING_DIR be expanded when the spark-submit command is executed, but we could be more explicit and do it here with os.path.expandvars. Thoughts?

Reply (Member):

Sure, that could be a good idea if we also log that command for debugging purposes.
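
For reference, a minimal illustration of the explicit-expansion-plus-logging idea discussed above; this only works where AZ_BATCH_TASK_WORKING_DIR is actually set, i.e. on the node, and the jar path is an assumption:

```python
import os

# AZ_BATCH_TASK_WORKING_DIR is set by Azure Batch on the node, so the
# variable can be expanded there explicitly and the resolved command
# logged for debugging (illustrative path, not real output):
app_path = os.path.expandvars("$AZ_BATCH_TASK_WORKING_DIR/app.jar")
print("resolved spark-submit target:", app_path)
```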

@timotheeguerin timotheeguerin changed the title Issue 536 - Support passing of remote executables via aztk spark clus… Feature: Support passing of remote executables via aztk spark cluster submit May 10, 2018
@timotheeguerin (Member) commented May 10, 2018

So I am wondering if this is the right way of doing it. Maybe we should make it the other way around: you can give the path to a remote location or to an existing path on the node (the executable needs to already be there), but if you want to upload your local file you provide the flag.

I feel like the --skip-app-upload flag is a bit of a hack/workaround.

Maybe have something of this type (a rough sketch follows the list):

  • aztk spark cluster submit wasbs://path@remote/location.jar For a remote location
  • aztk spark cluster submit /full/path/to/file Where you can use AZTK_WORKING_DIR and other things
  • aztk spark cluster submit --local path/to/my-local/exe Here the exe gets uploaded automatically
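
A rough sketch of how the proposed interface might be wired up with argparse (illustrative only; the flag name follows the proposal above, not the actual aztk CLI code):

```python
import argparse

# Illustrative sketch of the proposed interface; not the actual aztk CLI.
parser = argparse.ArgumentParser(prog="aztk spark cluster submit")
parser.add_argument(
    "app",
    help="Remote URL (e.g. wasbs://...) or an existing path on the node",
)
parser.add_argument(
    "--local",
    action="store_true",
    help="Treat the app as a local file and upload it automatically",
)

args = parser.parse_args(["--local", "path/to/my-local/exe"])
print(args.app, args.local)  # path/to/my-local/exe True
```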

@lachiemurray (Contributor, Author) replied:

Yes, I think I agree, but using "--local" is obviously more of a breaking change. I wonder whether it's mostly the naming that makes it seem hacky, though. Maybe "--remote" would seem more natural?

Happy to give "--local" a go or rename to "--remote", just let me know.

Do you have any advice regarding testing? Should I try to write an integration test despite them not running on Travis at the moment?

@timotheeguerin (Member) replied:

Yeah, I think --remote makes more sense.
That might actually be better than using --local, as I don't like the idea that you have to provide extra settings when you're just getting started and uploading one of the samples or any local file.

Will talk to @jafreck

@jafreck (Member) commented May 10, 2018

I agree with @lachiemurray here regarding making local files the default for cluster submission. I also think --remote is better naming than --skip-app-upload.

Thanks for doing this @lachiemurray!

@timotheeguerin (Member) replied:

OK, let's rename to --remote. Could you also document this flag in 20-spark-submit.md?

@lachiemurray (Contributor, Author) replied:

Thanks both! Will make the changes tomorrow.

@lachiemurray (Contributor, Author) replied:

Renamed to --remote and updated the docs; can you take another look, @timotheeguerin @jafreck?

@timotheeguerin (Member) left a comment:

Just tested, and this seems to be working great.

@timotheeguerin merged commit f6735cc into Azure:master on May 24, 2018