[SPARK-5764] Delete the cache and lock file after executor fetching the jar #4548
|
Can one of the admins verify this patch? |
|
Isn't the point that the files should stick around for future callers? The file is not recopied and lock is not recreated if it exists. (You would need a JIRA for this anyway, but first let's clear up this question.) |
|
val cachedFileName = s"${url.hashCode}${timestamp}_cache" The cache file is named from url.hashCode and the timestamp, so no other jar's cache file will have the same name. The cached file will therefore never be reused by a future caller. |
|
The idea is that this uniquely determines the file and even a version of that file. That by itself is sound. Timestamp is not always "the current time". Look at the invocation in |
|
In SparkContext.scala, the useCache is false, so it won't use the cached file |
|
Ah right of course. So, the executor is keying the cache on (hash of) URL and 'version', where version is the driver's timestamp. That would be the same for executors across the same app, and that's the purpose of this cache. Right? |
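A minimal sketch of the keying described above, reusing the naming expression quoted earlier in the thread. The object name, the URL, and the timestamp values are illustrative assumptions, not taken from the Spark source.

```scala
object CacheNaming {
  // Same expression as quoted in the thread: hash of the URL plus the
  // driver-supplied timestamp, which acts as a "version" of the file.
  def cachedFileName(url: String, timestamp: Long): String =
    s"${url.hashCode}${timestamp}_cache"
}

object CacheNamingDemo {
  def main(args: Array[String]): Unit = {
    val url = "http://driver-host/jars/app.jar" // hypothetical URL
    // Two executors of the same app receive the same (url, timestamp) pair,
    // so they compute the same cache file name.
    println(CacheNaming.cachedFileName(url, 1000L) == CacheNaming.cachedFileName(url, 1000L))
    // A different timestamp (a newer version of the jar) gives a different name.
    println(CacheNaming.cachedFileName(url, 1000L) == CacheNaming.cachedFileName(url, 2000L))
  }
}
```

Under these assumptions the name is only shared when both the URL and the timestamp match, which is the case for executors of the same application on the same node.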
|
Do you mean, the executors on the same node will use the cached file? I think it's right. |
|
That looks like the intent, from the comment. These files should ultimately be deleted when the executor stops. Do you think there is a problem in light of this? |
|
I think the cache file should be deleted when the app finishes, not when the executor stops. |
|
Executors are per-app, so this is roughly the same thing? |
|
I think we should also consider dynamic executor allocation, right? |
|
Yeah, good point. Actually, ignore my comment. The executors stick this file in My point was that they should not be immediately deleted, at least. They do serve a purpose in some cases. |
Every time an executor fetches a jar from the HTTP server, a lock file and a cache file are created on the local disk. After the fetch, these two files are useless.
When the jar is large, the cache file is also large, which wastes disk space.
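The fetch flow discussed in this thread can be sketched roughly as follows. This is not Spark's actual implementation: `fetchWithCache`, the `download` callback, and the `_lock` file name are assumptions made for illustration; only the `_cache` naming expression comes from the conversation.

```scala
import java.io.{File, RandomAccessFile}
import java.nio.file.{Files, StandardCopyOption}

object CachedFetchSketch {
  // Hypothetical helper: `download` stands in for the real HTTP fetch.
  def fetchWithCache(url: String, timestamp: Long, cacheDir: File, target: File)
                    (download: File => Unit): Unit = {
    val cachedFile = new File(cacheDir, s"${url.hashCode}${timestamp}_cache")
    // The lock file name is an assumption that mirrors the cache file name.
    val lockFile = new File(cacheDir, s"${url.hashCode}${timestamp}_lock")
    val raf = new RandomAccessFile(lockFile, "rw")
    try {
      // Blocks until no other process holds the lock on this file.
      val lock = raf.getChannel.lock()
      try {
        // Only the first caller downloads; later callers reuse the cached copy.
        if (!cachedFile.exists()) download(cachedFile)
      } finally lock.release()
    } finally raf.close()
    // Every caller gets its own copy of the cached file.
    Files.copy(cachedFile.toPath, target.toPath, StandardCopyOption.REPLACE_EXISTING)
  }
}
```

In this sketch both the lock file and the cache file stay in `cacheDir` after the call returns. That is the disk usage this patch complains about, and also what lets a second caller with the same URL and timestamp skip the download.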