I am not sure yet how one is supposed to use Spark to process data in ES, but running my first tests, I notice that it is using three partitions, while I have more nodes in the Spark cluster. It seems the number three comes from the number of shards in the index, or more generally, maybe from the number of shards used in the query? Anyway, is there a way to make it use more nodes? Or is that not really a useful thing to do?
Thanks a lot,
Mohamed.
@lrhazi have you looked into the architecture section in the reference documentation? Your assessment is right: the number of tasks is dictated by the number of shards on the ES side, and increasing the number of tasks beyond this is not possible, since it would simply duplicate the work the other workers are doing.
Each task should work on its own 'slice' of data. Allocating multiple workers to the same slice means the slice itself needs to be further divided, but that's not really efficient.
Cheers,
P.S. This aside, please use the mailing list or IRC for discussions/questions instead of the issue tracker.
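To make the shard-to-task relationship described above concrete, here is a minimal sketch in plain Python. It is only a toy model, not the connector's actual code: the function names `assign_read_tasks` and `idle_workers` are hypothetical, and the round-robin placement is an assumption made for illustration (Spark's real scheduler decides where tasks run). It only shows that with one read task per shard, any workers beyond the shard count have nothing to read.

```python
def assign_read_tasks(num_shards, num_workers):
    """Toy model: one read task per shard, spread over workers round-robin.

    Hypothetical helper for illustration only; not part of any library.
    """
    assignment = {worker: [] for worker in range(num_workers)}
    for shard in range(num_shards):
        # Each shard is read by exactly one task, so it lands on one worker.
        assignment[shard % num_workers].append(shard)
    return assignment


def idle_workers(assignment):
    """Return the workers that received no shard to read."""
    return [worker for worker, shards in assignment.items() if not shards]


# Three shards, five workers: the situation from the question.
plan = assign_read_tasks(num_shards=3, num_workers=5)
print(plan)
print(idle_workers(plan))
```

In this model, giving the idle workers something to do would require either more shards or splitting a shard's slice further, which is the trade-off mentioned in the reply.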