Restore data locality preference for hadoop and spark #814

jimczi · 2016-07-28T19:02:00Z

The data locality preference was dropped in #812. We should restore it, but instead of pinning one node per partition we should let Hadoop/Spark choose among all the nodes that host the index/shard targeted by the partition. Since it is only a preference, advertising several hosts that could serve the query gives the scheduler more room to place the task close to the data. The list of hosts should be shuffled so that multiple partitions targeting the same index/shard spread their execution across those hosts instead of all favoring the same one.
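To make the proposal concrete, here is a minimal, language-neutral sketch in Python of the host-selection step. It is not the connector's code: `preferred_locations`, the `shard_to_hosts` mapping, and the node names are all hypothetical, and it assumes the scheduler treats earlier entries in the returned list as slightly more preferred.

```python
import random
from typing import List, Optional


def preferred_locations(shard_hosts: List[str],
                        rng: Optional[random.Random] = None) -> List[str]:
    """Return every host holding a copy of the shard, in shuffled order.

    Hypothetical helper: the result is meant to be handed to the scheduler
    as a locality *hint*, not as a hard placement constraint.
    """
    if rng is None:
        rng = random.Random()
    # Copy (and de-duplicate) so the caller's list is never mutated.
    hosts = list(dict.fromkeys(shard_hosts))
    # Shuffle so partitions on the same shard do not all list the same host first.
    rng.shuffle(hosts)
    return hosts


# Hypothetical layout: one shard with copies on three nodes.
shard_to_hosts = {"logs[0]": ["node-a", "node-b", "node-c"]}

# Two partitions targeting the same shard each get an independent ordering.
locations = [preferred_locations(shard_to_hosts["logs[0]"]) for _ in range(2)]
```

Each partition still advertises the full set of hosts; only the order differs, so the scheduler keeps every local option while the load is less likely to pile up on a single node.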

jimczi added feature :MR breaking :Spark v5.0.0-beta1 labels Jul 28, 2016

jimczi mentioned this issue Aug 3, 2016

Restores the ability to prefer local nodes when creating a PartitionReader #819

Merged

jbaiera closed this as completed in #819 Aug 8, 2016

jbaiera added v5.0.0-alpha5 and removed v5.0.0-beta1 labels Aug 8, 2016
