
Conversation

@XuTingjun
Contributor

Every time an executor fetches a jar from the HTTP server, a lock file and a cache file are created on the local disk. After the fetch, these two files are no longer needed.
And when the jar is big, the cache file is just as big, so it wastes disk space.
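
A rough sketch of the fetch path in question (paraphrased, not the actual Utils.fetchFile source; only the cache file name matches the snippet quoted later in this thread, while the lock file name and the copy helpers are assumptions for illustration):

    import java.io.{File, RandomAccessFile}
    import java.net.URL
    import java.nio.file.{Files, StandardCopyOption}

    object FetchSketch {
      // Paraphrased sketch of the cached-fetch path. Only the cache file
      // naming is confirmed by this thread; everything else is assumed.
      def fetchWithCache(url: String, timestamp: Long, localDir: File, targetFile: File): Unit = {
        val cachedFileName = s"${url.hashCode}${timestamp}_cache"
        val lockFileName = s"${url.hashCode}${timestamp}_lock" // assumed analogous naming
        val lockFile = new File(localDir, lockFileName)
        val cachedFile = new File(localDir, cachedFileName)
        val raf = new RandomAccessFile(lockFile, "rw")
        val lock = raf.getChannel.lock() // serialize concurrent fetches on one node
        try {
          if (!cachedFile.exists()) { // cache miss: download once
            Files.copy(new URL(url).openStream(), cachedFile.toPath)
          }
        } finally {
          lock.release()
          raf.close()
        }
        // Copy out of the cache into the target; the cache and lock files
        // stay behind on disk, which is the waste described above.
        Files.copy(cachedFile.toPath, targetFile.toPath, StandardCopyOption.REPLACE_EXISTING)
      }
    }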

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Feb 12, 2015

Isn't the point that the files should stick around for future callers? The file is not recopied and lock is not recreated if it exists. (You would need a JIRA for this anyway, but first let's clear up this question.)

@XuTingjun XuTingjun changed the title from "[Core][Improvement] Delelte no longer used file" to "[SPARK-5764] Delete the cache and lock file after executor fetching the jar" on Feb 12, 2015
@XuTingjun
Contributor Author

val cachedFileName = s"${url.hashCode}${timestamp}_cache"

The cache file is named with url.hashCode and the timestamp, so no other cache file for a jar will ever share that name. It will never be reused by a future caller.

@srowen
Member

srowen commented Feb 12, 2015

The idea is that this uniquely determines the file and even a version of that file. That by itself is sound. Timestamp is not always "the current time". Look at the invocation in Executor.scala. I'm not as sure about the invocation in SparkContext.scala since it also does a fetch locally, with the current time, and that is always a 'cache miss', but I think that one is by design? But for the executor it looks correct at first glance since it uses timestamp as a sort of version key, where the timestamp is the time this particular file was added by the driver.
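
To make the "version key" idea concrete, here is a small self-contained sketch of that bookkeeping (a hypothetical stand-in for the executor's dependency-update logic, not the actual Spark code):

    import scala.collection.mutable

    object VersionKeySketch {
      // Hypothetical stand-in: a jar is refetched only when the driver-side
      // timestamp (a version key) moves forward, which is what makes
      // (url.hashCode, timestamp) a usable cache key across fetches.
      val currentJars = mutable.HashMap[String, Long]()

      def jarsToFetch(newJars: Map[String, Long]): Seq[String] =
        newJars.toSeq.collect {
          case (name, timestamp) if currentJars.getOrElse(name, -1L) < timestamp =>
            currentJars(name) = timestamp
            name // the real code would call Utils.fetchFile here with the cache enabled
        }

      def main(args: Array[String]): Unit = {
        println(jarsToFetch(Map("app.jar" -> 100L))) // List(app.jar): first sight, fetch
        println(jarsToFetch(Map("app.jar" -> 100L))) // List(): same version, nothing to do
        println(jarsToFetch(Map("app.jar" -> 200L))) // List(app.jar): driver re-added the jar
      }
    }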

@XuTingjun
Contributor Author

In SparkContext.scala, useCache is false, so it won't use the cached file.
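
A tiny hypothetical demo of why a cache would not help on that path anyway, assuming the naming scheme quoted above: the driver-side fetch uses the current time as the timestamp, so the computed cache name differs on every call.

    object DriverMissSketch {
      def main(args: Array[String]): Unit = {
        val url = "http://driver:33333/jars/app.jar" // assumed example URL
        val t1 = System.currentTimeMillis()
        Thread.sleep(2) // ensure a later "current time"
        val t2 = System.currentTimeMillis()
        // Two driver-side fetches of the same jar get different timestamps,
        // hence different cache file names: always a cache miss.
        assert(s"${url.hashCode}${t1}_cache" != s"${url.hashCode}${t2}_cache")
      }
    }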

@srowen
Member

srowen commented Feb 12, 2015

Ah right of course. So, the executor is keying the cache on (hash of) URL and 'version', where version is the driver's timestamp. That would be the same for executors across the same app, and that's the purpose of this cache. Right?
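
Assuming that is the intent, a quick hypothetical demo of the shared key: every executor of one app sees the same (url, timestamp) pair for a given jar, so they all compute the same cache file name.

    object SharedKeySketch {
      def cacheName(url: String, timestamp: Long): String =
        s"${url.hashCode}${timestamp}_cache"

      def main(args: Array[String]): Unit = {
        val url = "http://driver:33333/jars/app.jar" // assumed example URL
        val addedAt = 1423700000000L // the driver's add-time, shared app-wide
        // Two executors of the same app compute the same name, so the second
        // fetch on a node is a cache hit against the first one's download.
        assert(cacheName(url, addedAt) == cacheName(url, addedAt))
      }
    }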

@XuTingjun
Contributor Author

Do you mean that the executors on the same node will use the cached file? I think that's right.

@srowen
Member

srowen commented Feb 12, 2015

That looks like the intent, from the comment. These files should ultimately be deleted when the executor stops. Do you think there is a problem in light of this?

@XuTingjun
Contributor Author

I think the cache file should be deleted when the app finishes, not when the executor stops.

@srowen
Member

srowen commented Feb 12, 2015

Executors are per-app, so this is roughly the same thing?

@XuTingjun
Contributor Author

I think we should consider dynamic executor allocation, right?

@XuTingjun XuTingjun closed this Feb 12, 2015
@srowen
Member

srowen commented Feb 12, 2015

Yeah, good point. Actually, ignore my comment. The executors stick this file in SparkFiles.getRootDirectory and that is not necessarily deleted by the executor. I mean, it's not necessarily even shared.

My point was that they should not be immediately deleted, at least. They do serve a purpose in some cases.

@XuTingjun XuTingjun deleted the patch branch February 17, 2015 01:34