
[SPARK-14957][Yarn] Adopt healthy dir to store executor meta #12735

Closed
suyanNone wants to merge 1 commit into apache:master from suyanNone:fix-0dir

Conversation

@suyanNone
Contributor

What changes were proposed in this pull request?

Adopt a healthy dir to store executor meta.

How was this patch tested?

Unit tests

@suyanNone suyanNone changed the title [SPARK-14957][Spark] Adopt healthy dir to store executor meta [SPARK-14957][Yarn] Adopt healthy dir to store executor meta Apr 27, 2016
@SparkQA

SparkQA commented Apr 27, 2016

Test build #57126 has finished for PR 12735 at commit 2527391.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Apr 27, 2016

I don't understand the problem or your change based on your JIRA. Are you trying to test for writeability? I don't think we want to arbitrarily require certain free space. This patch looks like it deletes directories though, which seems wrong.

@tgravescs
Contributor

I agree with @srowen. Can you please describe what you are trying to do in more detail? Why are you looking for multiple db files and deleting the older ones; when would there be more than one? Any sort of minimum disk space requirement would need to be configurable.

Note that in Hadoop 2.5 they added a getRecoveryPath() routine to the AuxiliaryService that returns a path to put the leveldb in. I don't know if we decided on what Hadoop version we support in 2.x; @srowen, do you remember if that was decided?

@srowen
Member

srowen commented Apr 27, 2016

Hadoop 2.2 is still the base build for 2.x. If you think this is a small additional good reason to up the requirement, let's highlight that.

@tgravescs
Contributor

Thanks, this is pretty minor and wouldn't warrant changing that.

@suyanNone
Contributor Author

suyanNone commented Apr 28, 2016

    registeredExecutorFile =
      findRegisteredExecutorFile(conf.getTrimmedStrings("yarn.nodemanager.local-dirs"));

We get the local dirs from the YARN config, and there are a lot of dirs, but currently we always adopt the first one, right?

First, assume no meta file exists yet: if the first disk has been removed, or has disk errors (read-only filesystem, input/output errors, no space left), ExternalShuffleBlockResolver will try to create a new leveldb file there, but it will fail and throw an IOException, and that IOException is swallowed by YarnShuffleService:

try {
      blockHandler = new ExternalShuffleBlockHandler(transportConf, registeredExecutorFile);
    } catch (Exception e) {
      logger.error("Failed to initialize external shuffle service", e);
    }

So maybe we can choose a healthier disk dir for storing the meta file, to avoid this unnecessary exception.
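In code, the idea would look roughly like this (a sketch only, not the actual diff in this PR; the helper name findHealthyExecutorFile is made up): scan the configured local dirs and place the leveldb on the first dir that exists (or can be created) and is writable, instead of always taking the first entry.

    import java.io.File;

    // Sketch only: return a location for the registered-executor leveldb on the first
    // local dir that exists (or can be created) and is writable; null if none is usable.
    static File findHealthyExecutorFile(String[] localDirs, String dbName) {
      for (String dir : localDirs) {
        File d = new File(dir);
        if ((d.isDirectory() || d.mkdirs()) && d.canWrite()) {
          return new File(d, dbName);
        }
      }
      return null;  // no healthy dir; caller falls back to the current behavior
    }

YarnShuffleService could then pass the returned file to ExternalShuffleBlockHandler as it does today, or keep the current behavior when null comes back.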

@suyanNone
Contributor Author

suyanNone commented Apr 28, 2016

As for why there can be multiple meta files: assume we find one, but its disk is a read-only filesystem. Should we still use it, or choose another healthy dir and create a new one?

If we choose the second option, we can't delete the old one (the filesystem is read-only), so two files will exist...

@suyanNone
Contributor Author

To be honest, I haven't walked through the whole YARN shuffle server process; I just fixed a problem our users reported, where they couldn't connect to the shuffle server because it tried to create the leveldb in a non-existent folder. I will take some time to reproduce the problem and understand this better.

@tgravescs I will look into the getRecoveryPath API...

@tgravescs
Contributor

If the disk is bad or missing there is nothing else you can do but create a new db, since, as you say, deleting wouldn't work.

Note I think all it does is log a message because we didn't want it to kill the entire nodemanager, but I think we should change that. We should throw an exception if registration doesn't work, because the way the nodemanager currently works, it doesn't fail until someone goes to fetch the data. If it failed quickly when the executor registered with it, that would be preferable, but that is a YARN bug.

If you are going to look at the getRecoveryPath API, I think we can do it without reflection by defining our own setRecoveryPath function in YarnShuffleService (leave @Override off so it compiles against older versions of Hadoop). Have it default to some invalid value; on Hadoop 2.5 or greater it will get called and set to a real value. Then in our code we check whether it is set and, if so, use it; if not, fall back to the current implementation. Note that setRecoveryPath is the only one we really need to define, since getRecoveryPath is protected, but to be safe we might also implement that. We can store our own path.
The only other thing here is that we may want to handle upgrading. If you are currently running the shuffle service, the ldb will be in the local dirs, but when you upgrade it will go to the new path and the service wouldn't find the old one. To handle this we could just look for it in the new path first and, if it isn't there, look in the old locations and, if found, move it to the new location.

I think between the throw and using the new API we shouldn't need to check the disks. The recovery path that is supposed to be set by administrators should be something very resilient, such that if it's down nothing on the node would work.
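Roughly, that approach could look like this (an illustrative sketch only, not committed code; the recoveryPath field and the initRegisteredExecutorsFile helper are made-up names, while findRegisteredExecutorFile is the existing lookup over the NM local dirs):

    import java.io.File;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // Sketch only. No @Override: pre-2.5 Hadoop has no setRecoveryPath on AuxiliaryService,
    // so this must compile as a plain method; on 2.5+ the NodeManager calls it for us.
    private Path recoveryPath = null;

    public void setRecoveryPath(Path recoveryPath) {
      this.recoveryPath = recoveryPath;
    }

    private File initRegisteredExecutorsFile(Configuration conf) {
      String dbName = "registeredExecutors.ldb";
      String[] localDirs = conf.getTrimmedStrings("yarn.nodemanager.local-dirs");
      if (recoveryPath == null) {
        // Hadoop < 2.5: setRecoveryPath was never called, keep the current behavior.
        return findRegisteredExecutorFile(localDirs);
      }
      File newFile = new File(recoveryPath.toUri().getPath(), dbName);
      if (!newFile.exists()) {
        // Upgrade case: an older shuffle service kept the db in the NM local dirs,
        // so look there and move it into the recovery path if found.
        File oldFile = findRegisteredExecutorFile(localDirs);
        if (oldFile != null && oldFile.exists()) {
          oldFile.renameTo(newFile);
        }
      }
      return newFile;
    }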

@tgravescs
Contributor

Actually, after looking a bit more, Spark does fail fast if the shuffle service isn't there, because very soon after startup the BlockManager registers with the shuffle service, so if it didn't come up the executors should fail quickly. Is this what you were seeing?
This to me isn't so bad; at least it isn't wasting a bunch of work. Yes, new executors could get scheduled there, but they should fail very quickly without wasting work.

@jerryshao
Contributor

Yes, I agree with @tgravescs: setRecoveryPath and getRecoveryPath are a good way for 2.5+. But as mentioned above, the executor will fail fast if the external shuffle service doesn't exist, so leaving it as it is isn't a big problem.

@srowen
Member

srowen commented May 5, 2016

OK, sounds like we should close this PR.

@suyanNone
Contributor Author

@tgravescs, yes, the executor will fail fast...
I vaguely remember an application failure caused by an unhealthy shuffle server dir; I didn't have time to dig deeper at that time, and now I can't reproduce it.
So I will close this.

@suyanNone suyanNone closed this May 11, 2016
