getRackForHost returns None if host is unknown by driver #17238

Closed
wants to merge 1 commit into from

Conversation

charliechen211

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-19894

How was this patch tested?

It was tested on our production YARN cluster in yarn-cluster mode; applying this patch resolved the spurious rack-local assignments we were seeing.
Problem:
In our production YARN cluster, one node (the "missing-rack-info" node) is missing rack information for some other nodes. A Spark Streaming program (data source: Kafka, mode: yarn-cluster) runs its driver on this missing-rack-info node.
The hosts that are unknown to the driver node, and the Kafka broker hosts that are likewise unknown to YARN, are both resolved to "/default-rack" by the YARN rack resolver, so all tasks end up being assigned to those nodes as RACK_LOCAL.
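For context, Spark's YARN scheduler resolves racks through Hadoop's RackResolver, and the idea behind this PR is to treat a host that falls back to Hadoop's DEFAULT_RACK as having no known rack. Below is a minimal sketch of that idea only; the object name and wrapper are assumptions for illustration, not the actual patch.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.net.NetworkTopology
import org.apache.hadoop.yarn.util.RackResolver

object RackLookupSketch {
  /** Resolve a host's rack, treating Hadoop's "/default-rack" fallback as unknown. */
  def getRackForHost(conf: Configuration, host: String): Option[String] = {
    val rack = RackResolver.resolve(conf, host).getNetworkLocation
    // An unknown host resolves to NetworkTopology.DEFAULT_RACK; reporting None
    // instead prevents all unknown hosts from appearing to share one rack,
    // which is what inflated the RACK_LOCAL assignments described above.
    if (rack == null || rack == NetworkTopology.DEFAULT_RACK) None else Some(rack)
  }
}
```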

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Mar 10, 2017

Please change the title per http://spark.apache.org/contributing.html
@squito or @vanzin do you know enough to evaluate this?

@squito
Contributor

squito commented Mar 10, 2017

The description of the problem makes sense for how it would affect task locality. I will need to look more closely at some YARN bits to make sure it's the right change, but it looks reasonable.

@mridulm
Contributor

mridulm commented Mar 10, 2017

This is something to be configured appropriately at the lower level (namely the topology config), and should be handled there. Special-casing this in Spark is not right.
Two scenarios:

  • A default rack for CPU nodes and a special rack for GPU nodes (specifically for RDMA communication) - this will cause rack locality to be disabled for the CPU nodes (since they are on the default rack).
  • Specializations of NetworkTopology need not honor DEFAULT_RACK - resulting in limited value in those scenarios (IIRC the API does not require an unknown rack to be reported as DEFAULT_RACK) - is that an incorrect assessment, @squito?

@squito
Contributor

squito commented Mar 13, 2017

@mridulm you know the YARN bits involved better than I do. It sounds like you are saying this is the wrong change, and instead it's just a misconfiguration in YARN -- I'll buy that argument. Essentially you are saying that "DEFAULT_RACK" may be used legitimately for one of the racks.

Even with this misconfiguration, the incorrect locality assignments don't seem horrible, except that for very short tasks it's exacerbated by https://issues.apache.org/jira/browse/SPARK-18886

@mridulm
Contributor

mridulm commented Mar 14, 2017

@squito I looked at yarn bits about 3-4 years back, so I am sure you and @tgravescs know better :-)
But yes, you did summarize what I was attempting to get to.

I am not sure what the impact of SPARK-18886 here might be ...
Another way to look at this change: what do MR/Tez do in this scenario w.r.t. locality computation? My understanding is that they do not fudge the value - but if I am wrong, we could do the same.

@tgravescs
Contributor

Sorry if I'm missing something here, but I don't see why this is a problem. If you have YARN misconfigured or not configured, everything is going to default to DEFAULT_RACK. If you want the nodes to be on different racks, then fix the YARN configuration.

The topology mapping in Hadoop is a plugin, so someone could write a plugin that just returns DEFAULT_RACK for anything; if you don't want that behavior, you can write a plugin that does something different.
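For readers unfamiliar with the plugin mechanism mentioned here: Hadoop rack resolution goes through the DNSToSwitchMapping interface. The following is a hypothetical sketch only; the class name and the host-to-rack table are made up for illustration.

```scala
import java.util.{ArrayList => JArrayList, List => JList}
import scala.collection.JavaConverters._
import org.apache.hadoop.net.DNSToSwitchMapping

// Hypothetical topology plugin. A real plugin would read the mapping from a
// file or a central service so it stays current as nodes are racked/unracked.
class StaticTableTopology extends DNSToSwitchMapping {
  private val table = Map(
    "node1.example.com" -> "/rack-a",
    "node2.example.com" -> "/rack-b"
  )

  override def resolve(names: JList[String]): JList[String] = {
    // Unknown hosts fall back to /default-rack -- the exact behavior under
    // discussion in this thread.
    new JArrayList[String](names.asScala.map(h => table.getOrElse(h, "/default-rack")).asJava)
  }

  override def reloadCachedMappings(): Unit = ()
  override def reloadCachedMappings(names: JList[String]): Unit = ()
}
```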

@morenn520 I'm assuming in this case the tasks had some host locality level before looking at the rack level?

I don't believe Tez/MR do anything special with respect to DEFAULT_RACK, but I'll double-check.

@squito
Contributor

squito commented Mar 14, 2017

I don't think you're missing anything @tgravescs, it sounds like this is just a misconfiguration and we shouldn't be doing anything special for it (since it could hurt correct configurations). I wanted to see if this was a particularly common / easy misconfiguration in YARN, but you both have convinced me it's not.

my point about SPARK-18886 was to think through how bad the effect from the misconfiguration is. In the originally described scenario, if a few nodes are accidentally labelled as belonging to their own rack, then sometimes tasks will get preferentially assigned to this false rack at first. If tasks are long, it's not that big a deal -- after a short delay, you'll then assign the tasks to the rest of the cluster. But if tasks are short, because of SPARK-18886, you may just keep assigning to the nodes in your false rack, and leave the rest of the cluster idle.

Nonetheless, I think it's still just a misconfiguration. Sounds like this issue is a "won't fix", unless @morenn520 makes the case otherwise.

@charliechen211
Author

@squito Actually, it is a misconfiguration in our YARN. There are thousands of nodes in our production YARN cluster, and nodes get racked/unracked quite frequently, so it is easy for some rack information to go missing. It would be better if Spark did something to tolerate that kind of misconfiguration.
In addition, I think unknown hosts should be treated as belonging to different racks. But if MR/Tez do nothing special for this problem, it is reasonable not to fix it here.
I hope @tgravescs will double-check whether Tez/MR do anything special with DEFAULT_RACK.

@tgravescs
Contributor

If you aren't adding machines to a rack and configuring YARN properly before adding them to your cluster, that is a process issue you should fix on your end. I assume racking/unracking a node means putting in a new node? If that is the case, you have to install Hadoop and the Hadoop configuration on that node. I would expect you to fix the configuration or have a generic rack-aware script/Java class that can figure it out, but that is going to be pretty configuration specific. I would also assume that if you have that configuration wrong, then your HDFS is also not optimal, since it could get the replication wrong.

You can specify your own class/script to do the rack resolution, so you could change that to handle this case: see https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/RackAwareness.html
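As a concrete illustration of the configuration referred to here: the property names below are the standard Hadoop keys documented at that page, while the script path and class name are placeholders (the class is the hypothetical plugin sketched earlier).

```scala
import org.apache.hadoop.conf.Configuration

object TopologyConfigSketch extends App {
  val conf = new Configuration()
  // Option 1: an external script that maps host names/IPs to rack paths
  // (used by the default ScriptBasedMapping).
  conf.set("net.topology.script.file.name", "/etc/hadoop/conf/topology.sh")
  // Option 2: a custom mapping class implementing DNSToSwitchMapping.
  conf.set("net.topology.node.switch.mapping.impl", "org.example.StaticTableTopology")
  // On the Spark side the same keys can be passed as spark.hadoop.* properties
  // so the driver's RackResolver picks them up.
}
```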

I'll try to check on Tez/MR today and get back to you (was too busy yesterday). I know Tez didn't have any explicit references to DEFAULT_RACK in the code, but I want to look a bit more.

@tgravescs
Contributor

OK, I checked Tez and MR and they don't do this.
Actually, a couple of the input formats add DEFAULT_RACK when there isn't any topology information, so you would end up assuming everything is rack-local in those cases.

@vanzin
Contributor

vanzin commented Mar 15, 2017

If you aren't adding machines to a rack and configuring YARN properly before adding them to your cluster, that is a process issue you should fix on your end

Actually, to play devil's advocate, the problem @morenn520 is describing is a little more involved. You have a driver running, which has its own view of what the cluster topology is, and then the cluster topology changes underneath it. Deploying a new configuration on the new nodes being added does not fix the driver, unless your "topology discovery script" is fully dynamic and always goes to a central location to figure out what's the current topology.

So, basically, even if the new nodes know about the updated topology, the existing driver instance might not, and there's no easy way to fix that I can see.

(That being said, I haven't looked at the changes here.)

@charliechen211
Author

Thanks @tgravescs, we have used a dynamic topology discovery script to avoid this problem. Since Tez and MR don't do this, maybe Spark should not fix this problem either.

@tgravescs
Contributor

Actually, to play devil's advocate, the problem @morenn520 is describing is a little more involved. You have a driver running, which has its own view of what the cluster topology is, and then the cluster topology changes underneath it. Deploying a new configuration on the new nodes being added does not fix the driver, unless your "topology discovery script" is fully dynamic and always goes to a central location to figure out what's the current topology.

Maybe I'm misunderstanding what you are saying, but the only way the AM gets a bad topology is if it's wrong in the first place. Or are you just saying the app starts with the host in one rack, then the host gets moved to another rack and brought back up? I guess that is possible, but I'm not really sure that applies to this case with DEFAULT_RACK anyway. Any existing executors on that host would have gone away when it was moved, so YARN should re-resolve when it gets a new container anyway. If your script isn't dynamic enough to handle that, then it's also a configuration issue to update all the other hosts, and you should do that before bringing the host back up. Again, unless you aren't using HDFS, the rack resolution affects more than just Spark on YARN here. It's going to affect HDFS block placement and other types of apps.

maropu mentioned this pull request Apr 23, 2017
maropu added a commit to maropu/spark that referenced this pull request Apr 23, 2017
asfgit closed this in e9f9715 Apr 24, 2017
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
This PR proposes to close stale PRs. Currently, we have 400+ open PRs, and some of them are stale: their JIRA tickets have already been closed or do not exist (and they do not appear to be minor issues).

// Open PRs whose JIRA tickets have already been closed
Closes apache#11785
Closes apache#13027
Closes apache#13614
Closes apache#13761
Closes apache#15197
Closes apache#14006
Closes apache#12576
Closes apache#15447
Closes apache#13259
Closes apache#15616
Closes apache#14473
Closes apache#16638
Closes apache#16146
Closes apache#17269
Closes apache#17313
Closes apache#17418
Closes apache#17485
Closes apache#17551
Closes apache#17463
Closes apache#17625

// Open PRs whose JIRA tickets do not exist and that are not minor issues
Closes apache#10739
Closes apache#15193
Closes apache#15344
Closes apache#14804
Closes apache#16993
Closes apache#17040
Closes apache#15180
Closes apache#17238

N/A

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes apache#17734 from maropu/resolved_pr.

Change-Id: Id2e590aa7283fe5ac01424d30a40df06da6098b5