getRackForHost returns None if host is unknown by driver #17238

Closed
wants to merge 1 commit into from

Conversation

charliechen211

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-19894

How was this patch tested?

It was tested on our production YARN cluster in yarn-cluster mode; applying this patch resolved the spurious rack-local assignments we were seeing.
Problem:
In our production YARN cluster, one node (the "missing-rack-info" node) is missing rack information for some other nodes. A Spark Streaming program (data source: Kafka, mode: yarn-cluster) runs its driver on this missing-rack-info node.
The hosts that are unknown to the driver node, and the Kafka broker hosts that are likewise unknown to YARN, are both resolved to "/default-rack" by the YARN rack resolver, so all tasks end up being assigned to those nodes as RACK_LOCAL.
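For context, Spark's YARN scheduler resolves racks through Hadoop's RackResolver, and the idea behind this PR is to treat a host that falls back to Hadoop's DEFAULT_RACK as having no known rack. Below is a minimal sketch of that idea only; the object name and wrapper are assumptions for illustration, not the actual patch.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.net.NetworkTopology
import org.apache.hadoop.yarn.util.RackResolver

object RackLookupSketch {
  /** Resolve a host's rack, treating Hadoop's "/default-rack" fallback as unknown. */
  def getRackForHost(conf: Configuration, host: String): Option[String] = {
    val rack = RackResolver.resolve(conf, host).getNetworkLocation
    // An unknown host resolves to NetworkTopology.DEFAULT_RACK; reporting None
    // instead prevents all unknown hosts from appearing to share one rack,
    // which is what inflated the RACK_LOCAL assignments described above.
    if (rack == null || rack == NetworkTopology.DEFAULT_RACK) None else Some(rack)
  }
}
```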

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Mar 10, 2017

Please change the title per http://spark.apache.org/contributing.html
@squito or @vanzin do you know enough to evaluate this?

@squito
Contributor

squito commented Mar 10, 2017

The description of the problem makes sense for how it would affect task locality. I will need to look more closely at some YARN bits to make sure it's the right change, but it looks reasonable.

@mridulm
Contributor

mridulm commented Mar 10, 2017

This is something to be configured appropriately at the lower level (namely the topology config), and should be handled there. Special-casing this in Spark is not right.
Two scenarios:

  • A default rack for CPU nodes and a special rack for GPU nodes (specifically for RDMA communication) - this will cause rack locality to be disabled for the CPU nodes (since they are on the default rack).
  • Specializations of NetworkTopology need not honor DEFAULT_RACK - resulting in limited value in those scenarios (IIRC the API does not require an unknown rack to be reported as DEFAULT_RACK) - is that an incorrect assessment, @squito?

@squito
Contributor

squito commented Mar 13, 2017

@mridulm you know the YARN bits involved better than I do. It sounds like you are saying this is the wrong change, and instead it's just a misconfiguration in YARN -- I'll buy that argument. Essentially you are saying that "DEFAULT_RACK" may be used legitimately for one of the racks.

Even with this misconfiguration, the incorrect locality assignments don't seem horrible, except that for very short tasks it's exacerbated by https://issues.apache.org/jira/browse/SPARK-18886

@mridulm
Contributor

mridulm commented Mar 14, 2017

@squito I looked at yarn bits about 3-4 years back, so I am sure you and @tgravescs know better :-)
But yes, you did summarize what I was attempting to get to.

I am not sure what the impact of SPARK-18886 here might be ...
Another way to look at this change: what do MR/Tez do in this scenario w.r.t. locality computation? My understanding is that they do not fudge the value - but if I am wrong, we could do the same.

@tgravescs
Contributor

Sorry if I'm missing something here, but I don't see why this is a problem. If you have YARN misconfigured or not configured, everything is going to default to DEFAULT_RACK. If you want the nodes to be on different racks, then fix the YARN configuration.

The topology mapping in Hadoop is a plugin, so someone could write a plugin that just returns DEFAULT_RACK for anything; if you don't want that behavior, you can write a plugin that does something different.
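For readers unfamiliar with the plugin mechanism mentioned here: Hadoop rack resolution goes through the DNSToSwitchMapping interface. The following is a hypothetical sketch only; the class name and the host-to-rack table are made up for illustration.

```scala
import java.util.{ArrayList => JArrayList, List => JList}
import scala.collection.JavaConverters._
import org.apache.hadoop.net.DNSToSwitchMapping

// Hypothetical topology plugin. A real plugin would read the mapping from a
// file or a central service so it stays current as nodes are racked/unracked.
class StaticTableTopology extends DNSToSwitchMapping {
  private val table = Map(
    "node1.example.com" -> "/rack-a",
    "node2.example.com" -> "/rack-b"
  )

  override def resolve(names: JList[String]): JList[String] = {
    // Unknown hosts fall back to /default-rack -- the exact behavior under
    // discussion in this thread.
    new JArrayList[String](names.asScala.map(h => table.getOrElse(h, "/default-rack")).asJava)
  }

  override def reloadCachedMappings(): Unit = ()
  override def reloadCachedMappings(names: JList[String]): Unit = ()
}
```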

@morenn520 I'm assuming in this case the tasks had some host locality level before looking at the rack level?

I don't believe Tez/MR do anything special with respect to DEFAULT_RACK, but I'll double-check.

@squito
Contributor

squito commented Mar 14, 2017

I don't think you're missing anything @tgravescs, it sounds like this is just a misconfiguration and we shouldn't be doing anything special for it (since it could hurt correct configurations). I wanted to see if this was a particularly common / easy misconfiguration in YARN, but you both have convinced me it's not.

my point about SPARK-18886 was to think through how bad the effect from the misconfiguration is. In the originally described scenario, if a few nodes are accidentally labelled as belonging to their own rack, then sometimes tasks will get preferentially assigned to this false rack at first. If tasks are long, it's not that big a deal -- after a short delay, you'll then assign the tasks to the rest of the cluster. But if tasks are short, because of SPARK-18886, you may just keep assigning to the nodes in your false rack, and leave the rest of the cluster idle.

Nonetheless, I think it's still just a misconfiguration. Sounds like this issue is a "won't fix", unless @morenn520 makes the case otherwise.

@charliechen211
Author

@squito Actually, it is a misconfiguration in our YARN. There are thousands of nodes in our production YARN cluster, and nodes get racked/unracked quite frequently, so it is easy for some rack information to go missing. It would be better if Spark did something to tolerate that kind of misconfiguration.
In addition, I think unknown hosts should be treated as belonging to different racks. But if MR/Tez do nothing special for this problem, it is reasonable not to fix it here.
I hope @tgravescs will double-check whether Tez/MR do anything special with DEFAULT_RACK.

@tgravescs
Contributor

If you aren't adding machines to a rack and configuring YARN properly before adding them to your cluster, that is a process issue you should fix on your end. I assume racking/unracking a node means putting in a new node? If that is the case, you have to install Hadoop and the Hadoop configuration on that node. I would expect you to fix the configuration or have a generic rack-aware script/Java class that can figure it out, but that is going to be pretty configuration specific. I would also assume that if you have that configuration wrong, then your HDFS is also not optimal, since it could get the replication wrong.

You can specify your own class/script to do the rack resolution, so you could change that to handle this case: see https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/RackAwareness.html
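As a concrete illustration of the configuration referred to here: the property names below are the standard Hadoop keys documented at that page, while the script path and class name are placeholders (the class is the hypothetical plugin sketched earlier).

```scala
import org.apache.hadoop.conf.Configuration

object TopologyConfigSketch extends App {
  val conf = new Configuration()
  // Option 1: an external script that maps host names/IPs to rack paths
  // (used by the default ScriptBasedMapping).
  conf.set("net.topology.script.file.name", "/etc/hadoop/conf/topology.sh")
  // Option 2: a custom mapping class implementing DNSToSwitchMapping.
  conf.set("net.topology.node.switch.mapping.impl", "org.example.StaticTableTopology")
  // On the Spark side the same keys can be passed as spark.hadoop.* properties
  // so the driver's RackResolver picks them up.
}
```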

I'll try to check on Tez/MR today and get back to you (was too busy yesterday). I know Tez didn't have any explicit references to DEFAULT_RACK in the code, but I want to look a bit more.

@tgravescs
Contributor

OK, I checked Tez and MR and they don't do this.
Actually, a couple of the input formats add DEFAULT_RACK when there isn't any topology information, so you would end up assuming everything is rack-local in those cases.

@vanzin
Contributor

vanzin commented Mar 15, 2017

If you aren't adding machines to a rack and configuring YARN properly before adding them to your cluster, that is a process issue you should fix on your end

Actually, to play devil's advocate, the problem @morenn520 is describing is a little more involved. You have a driver running, which has its own view of what the cluster topology is, and then the cluster topology changes underneath it. Deploying a new configuration on the new nodes being added does not fix the driver, unless your "topology discovery script" is fully dynamic and always goes to a central location to figure out what's the current topology.

So, basically, even if the new nodes know about the updated topology, the existing driver instance might not, and there's no easy way to fix that I can see.

(That being said, I haven't looked at the changes here.)

@charliechen211
Author

Thanks @tgravescs, we have used a dynamic topology discovery script to avoid this problem. Since Tez and MR don't do this, maybe Spark should not fix this problem either.

@tgravescs
Contributor

Actually, to play devil's advocate, the problem @morenn520 is describing is a little more involved. You have a driver running, which has its own view of what the cluster topology is, and then the cluster topology changes underneath it. Deploying a new configuration on the new nodes being added does not fix the driver, unless your "topology discovery script" is fully dynamic and always goes to a central location to figure out what's the current topology.

Maybe I'm misunderstanding what you are saying, but the only way the AM gets a bad topology is if it's wrong in the first place. Or are you just saying the app starts with the host in one rack, then the host gets moved to another rack and brought back up? I guess that is possible, but I'm not really sure that applies to this case with DEFAULT_RACK anyway. Any existing executors on that host would have gone away when it was moved, so YARN should re-resolve when it gets a new container anyway. If your script isn't dynamic enough to handle that, then it's also a configuration issue to update all the other hosts, and you should do that before bringing the host back up. Again, unless you aren't using HDFS, the rack resolution affects more than just Spark on YARN here. It's going to affect HDFS block placement and other types of apps.

maropu mentioned this pull request Apr 23, 2017
maropu added a commit to maropu/spark that referenced this pull request Apr 23, 2017
asfgit closed this in e9f9715 Apr 24, 2017
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
This PR proposes to close stale PRs. Currently, we have 400+ open PRs, and some of them are stale: their JIRA tickets have already been closed or do not exist (and they do not appear to be minor issues).

// Open PRs whose JIRA tickets have already been closed
Closes apache#11785
Closes apache#13027
Closes apache#13614
Closes apache#13761
Closes apache#15197
Closes apache#14006
Closes apache#12576
Closes apache#15447
Closes apache#13259
Closes apache#15616
Closes apache#14473
Closes apache#16638
Closes apache#16146
Closes apache#17269
Closes apache#17313
Closes apache#17418
Closes apache#17485
Closes apache#17551
Closes apache#17463
Closes apache#17625

// Open PRs whose JIRA tickets do not exist and that are not minor issues
Closes apache#10739
Closes apache#15193
Closes apache#15344
Closes apache#14804
Closes apache#16993
Closes apache#17040
Closes apache#15180
Closes apache#17238

N/A

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes apache#17734 from maropu/resolved_pr.

Change-Id: Id2e590aa7283fe5ac01424d30a40df06da6098b5