Can one increase number of partitions and hence spark nodes used? #339

lrhazi · 2014-12-11T06:44:33Z

I am not yet sure how one is supposed to use Spark to process data in ES, but in my first tests I see it using three partitions, even though my Spark cluster has more nodes than that. The number three seems to come from the number of shards in the index, or, more generally, perhaps from the number of shards the query touches. Is there a way to make it use more nodes, or is that not really a useful thing to do?

Thanks a lot,
Mohamed.

costin · 2014-12-11T12:41:53Z

@lrhazi have you looked at the architecture section in the reference documentation? Your assessment is right: the number of tasks is dictated by the number of shards on the ES side, and increasing the number of tasks beyond that is not possible, since it would simply duplicate the work the other workers are doing.
Each task should work on its own 'slice' of data. Allocating multiple workers to the same slice means the slice itself needs to be further divided, but that's not really efficient ...
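
To illustrate the point above, here is a minimal sketch in plain Python (no Spark or ES involved) of the one-task-per-shard model. The function names `assign_tasks` and `idle_workers` are hypothetical, and the round-robin placement is an assumption made purely for illustration; it is not how Spark's scheduler actually places tasks. The only thing it is meant to show is that with three shards and five workers, two workers have nothing to read.

```python
from typing import Dict, List


def assign_tasks(num_shards: int, num_workers: int) -> Dict[int, List[int]]:
    """Map each worker id to the shard-backed tasks it would run.

    Assumes exactly one task per shard, handed out round-robin
    (an illustrative assumption, not Spark's real scheduling policy).
    """
    assignment: Dict[int, List[int]] = {w: [] for w in range(num_workers)}
    for shard in range(num_shards):
        assignment[shard % num_workers].append(shard)
    return assignment


def idle_workers(assignment: Dict[int, List[int]]) -> List[int]:
    """Return the ids of workers that received no task."""
    return [w for w, shards in assignment.items() if not shards]


# Three shards, five workers: the situation described in the question.
plan = assign_tasks(num_shards=3, num_workers=5)
print(plan)                                # {0: [0], 1: [1], 2: [2], 3: [], 4: []}
print("idle workers:", idle_workers(plan))  # idle workers: [3, 4]
```

Under this model, the extra workers only get work if there are more slices to hand out, that is, more shards.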

Cheers,

P.S. This aside, please use the mailing list or IRC for discussions/questions instead of the issue tracker.

lrhazi · 2014-12-11T14:52:48Z

Cool. Thanks a lot.

costin added :Spark question v2.1.0.Beta4 v2.0.3 labels Dec 11, 2014

lrhazi closed this as completed Dec 11, 2014
