Config prefix update to avoid -D option conflicts in Spark. #790
Conversation
@rohit-nlp I know you already fixed this in Enterprise, but do we really have to change everything in the public repo as well? That validation code in Apache Spark has been there for the past 4 years, so does this mean that none of our configs have ever worked? Or is this only about setting them with -D or Spark config?
@maziyarpanahi Our configs always worked because we used -DconfigFile= and the settings were loaded through Typesafe Config, not Spark. But when you pass a configuration parameter not through properties or an application.conf file, but directly as a Java property such as -Dsparknlp.something, the Spark code reading these params throws an exception at this line. Since that code looks for the "-Dspark" keyword, -Dsparknlp also gets caught by it.
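The substring check described above can be sketched as follows. This is a simplified stand-in for the logic in Spark's SparkConf.validateSettings, not the actual Spark code; the class and method names here are illustrative:

```java
public class PrefixCheck {
    // Simplified stand-in for the check in Spark's SparkConf.validateSettings:
    // any extraJavaOptions value containing the substring "-Dspark" is rejected,
    // because Spark options must be set via SparkConf, not as raw Java properties.
    static boolean rejectedBySpark(String javaOpts) {
        return javaOpts.contains("-Dspark");
    }

    public static void main(String[] args) {
        // "-Dsparknlp..." contains "-Dspark", so it is caught as a false positive.
        System.out.println(rejectedBySpark("-Dsparknlp.settings.something=x")); // true
        // A renamed "-Djsl..." prefix no longer matches the substring.
        System.out.println(rejectedBySpark("-Djsl.settings.something=x"));      // false
    }
}
```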
@rohit-nlp So there is no need to rename anything here, as users are supposed to pass those through -DconfigFile or application.conf?
It comes down to whether or not you want to support the -D option directly. Shipping a config file via --files will fail in cluster mode, as JSL's access key validation occurs in the driver. (For local deployments …)
I agree with @mptrepanier. Though there is a workaround with application.conf, we should support the -D option directly for all properties too. --files not working in cluster mode is really surprising to me, though I have not had time to set up a YARN cluster and check the issue. @maziyarpanahi We have already changed the property in Enterprise from sparknlp.settings.* to jsl.settings.*; we can make the same change in public to be consistent.
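The application.conf workaround mentioned above can be sketched as a spark-defaults.conf fragment. The file path is illustrative; the point is that only a single -DconfigFile property appears in extraJavaOptions, so Spark's "-Dspark" substring check is not tripped:

```properties
# Illustrative spark-defaults.conf fragment: settings are loaded from a
# Typesafe Config file via -DconfigFile, so no -Dsparknlp.* flags appear
# in extraJavaOptions and Spark's "-Dspark" substring check is not triggered.
spark.driver.extraJavaOptions    -DconfigFile=/path/to/application.conf
spark.executor.extraJavaOptions  -DconfigFile=/path/to/application.conf
```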
That makes sense. @rohit-nlp do you approve this PR then?

@rohit-nlp I am unable to replicate this.
Description
Prefixes the Spark NLP settings with jsl. instead of sparknlp.

Motivation and Context

The current settings prefix, sparknlp.settings., leads to an exception being thrown by Spark's validateSettings when the Spark NLP configuration is provided as a -D option in either spark.driver.extraJavaOptions or spark.executor.extraJavaOptions, as shown here: https://github.com/apache/spark/blob/d8613571bc1847775dd5c1945757279234cb388c/core/src/main/scala/org/apache/spark/SparkConf.scala#L529

Spark simply looks for any strings which contain -Dspark (which -Dsparknlp does) and errors out.

How Has This Been Tested?
Against both a GCP Dataproc cluster and a Zeppelin notebook running on top of Dataproc. Example settings config shown below:
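A minimal sketch of such a settings config, assuming the new jsl. prefix; the key names under jsl.settings are illustrative examples, not necessarily Spark NLP's actual setting names:

```hocon
// Hypothetical application.conf sketch using the new jsl. prefix.
// Key names under jsl.settings are illustrative only.
jsl.settings {
  pretrained {
    cache_folder = "/tmp/cache_pretrained"
  }
}
```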
Types of changes
Checklist:
Additional Info:

IMPORTANT: The sparknlp.settings.license references are not public. I believe these reside in the spark-nlp-license project and will need to be updated there.