pyconcrete for submitting spark job #69

Open
albertusk95 opened this issue Dec 4, 2019 · 3 comments

Comments

@albertusk95

Hi,

I recently used pyconcrete to obfuscate PySpark code. To run a Spark job on a cluster, we need to use the spark-submit command, so it would look like spark-submit job.py.

The problem is that spark-submit seems to accept only files with a .py extension. Since pyconcrete generates .pye files, I couldn't find any way to run the encrypted files via spark-submit.

Is there a way to run encrypted files generated by pyconcrete with spark-submit?

Thank you.

@Falldog
Owner

Falldog commented Dec 5, 2019

pyconcrete needs its binary .so. Does spark-submit package your source code and upload it to the cloud to run? If so, you need to cross-compile pyconcrete.so first. Then you could run pyconcrete as a library: try building your code as an .egg. Spark seems to allow submitting an .egg, so it may work. Give it a shot.
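The "pyconcrete as a library" idea could be sketched as a thin, unencrypted driver that imports pyconcrete first and then loads the encrypted entry module. This is only a sketch under assumptions: it assumes that importing pyconcrete registers an import hook for .pye files (its library mode), and the names `load_entry_point` and `job_main` are hypothetical. Whether .pye modules packed inside an .egg or .zip are picked up by that hook on a Spark cluster is not confirmed in this thread.

```python
import importlib


def load_entry_point(module_name, func_name="main"):
    """Return the callable `func_name` from `module_name`.

    Assumption: importing pyconcrete has the side effect of registering
    an import hook, so that .pye modules become importable afterwards.
    """
    try:
        import pyconcrete  # noqa: F401  (imported only for its side effect)
    except ImportError:
        # Without pyconcrete, only ordinary .py/.pyc modules can be imported.
        pass
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


# Hypothetical usage inside a plain driver.py, where job_main.pye is the
# encrypted module that exposes a main() function:
#     load_entry_point("job_main")()
```

With this layout only the small driver stays readable, while the job logic lives in the encrypted module.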

@albertusk95
Author

albertusk95 commented Dec 6, 2019

I already tried building the code as an .egg along with the driver program, but Spark couldn't find the main class.

It seems that .egg files are only used as dependencies; spark-submit still needs the driver code as a .py file. So it would look like this: spark-submit --py-files path/to/file.egg driver.py.

According to the docs:

For Python applications, simply pass a .py file in the place of <application-jar> instead of a JAR, 
and add Python .zip, .egg or .py files to the search path with --py-files.
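The invocation described by the docs can be sketched as a small helper that assembles the argument list. The helper name `build_spark_submit_command` is hypothetical; `--py-files` is the only spark-submit option used, and the paths are the placeholders from this thread.

```python
import shlex


def build_spark_submit_command(driver, py_files=()):
    """Build a spark-submit argument list for a Python application.

    driver   -- path to the plain .py driver script
    py_files -- .zip, .egg or .py dependencies, passed comma-separated
    """
    cmd = ["spark-submit"]
    if py_files:
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(driver)
    return cmd


cmd = build_spark_submit_command("driver.py", py_files=["path/to/file.egg"])
print(shlex.join(cmd))  # spark-submit --py-files path/to/file.egg driver.py
```

The list form can be handed to `subprocess.run(cmd)` on a machine where spark-submit is on the PATH.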

@Falldog
Owner

Falldog commented Feb 17, 2022

Can you provide more information? Maybe it's a spark-submit issue, not a pyconcrete one.
