[SPARK-23669] Executors fetch jars and name the jars with md5 prefix #20812
|
Test build #88208 has finished for PR 20812 at commit
|
|
@vanzin @zsxwing @jerryshao |
Do we still need the localName here?
Well, I think we don't need it. The File returned by Utils.fetchFile is the local file, so we don't need localName to initialize the local file here.
How about s"${DigestUtils.md5Hex(url)}-${decodeFileNameInURI(new URI(url))}"?
Should we have this conf in internal/conf?
|
The idea looks good, just a few comments. |
|
@jiangxb1987 |
|
Does it only fix the jars added by |
|
@jerryshao |
|
Test build #88303 has finished for PR 20812 at commit
|
|
Test build #88302 has finished for PR 20812 at commit
|
|
@jinxing64, I think using jars with the same name that contain different classes is not a good practice. Ideally, different UDFs should be packaged in different jars with different names/versions, which is easier for users to manage. Jars with the same name can also easily cause classpath issues. Since there is always a workaround for this outside of Spark, I would suggest not fixing it, as it is a fairly user-specific issue. |
|
@jerryshao @jiangxb1987 @jerryshao Thanks again for your comments. |
|
Should we close this then? @jinxing64 @jerryshao |
What changes were proposed in this pull request?
In our cluster, there are lots of UDF jars, and some of them have the same filename but different paths, for example:
When a user uses udfA and udfB in the same SQL, the executor will fetch both hdfs://A/B/udf.jar and hdfs://C/D/udf.jar to local. There will be a conflict because the two files have the same name. Can we add a config to fetch jars and save them under a filename with an MD5 prefix, so that there will be no conflict?
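The naming idea can be sketched roughly as follows. This is only an illustration, not the code in this patch: the object and method names (JarNaming, md5Hex, fileName, localName) are hypothetical, and it uses java.security.MessageDigest from the JDK rather than the DigestUtils.md5Hex call mentioned in the review, so that the snippet has no external dependencies.

```scala
import java.net.URI
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper, for illustration only; not part of Spark.
object JarNaming {
  // Hex-encoded MD5 of the given string (32 lowercase hex characters).
  def md5Hex(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes(StandardCharsets.UTF_8))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Last path segment of the URL, e.g. "udf.jar" for "hdfs://A/B/udf.jar".
  def fileName(url: String): String = {
    val path = new URI(url).getPath
    path.substring(path.lastIndexOf('/') + 1)
  }

  // Local name of the form "<md5 of full url>-<original file name>".
  def localName(url: String): String =
    s"${md5Hex(url)}-${fileName(url)}"
}
```

With this scheme, JarNaming.localName("hdfs://A/B/udf.jar") and JarNaming.localName("hdfs://C/D/udf.jar") both end in -udf.jar but carry different 32-character prefixes, because the hash is taken over the full URL rather than the file name alone.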
How was this patch tested?
UT