Provided dedicated support for Spark Streaming #802

costin · 2016-07-12T17:38:23Z

In Spark 1.6, streaming support is implemented as micro-batching, meaning each data window triggers a Spark job. This works, but for very small time windows (under 1 minute) it can lead to resource exhaustion (such as consuming all the HTTP connections).
The Spark documentation mentions the usage of a connection pool to get around this.
Spark 2.0 might revise this pattern and provide a richer/better scenario.
Either way, ES-Hadoop should provide a work-around for both Spark 1.x and 2.x.
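
For illustration only, here is a minimal sketch of the connection-pool idea mentioned above, simulated in plain Python without Spark. Every name in it (`FakeConnection`, `ConnectionPool`, `write_partition`, `run_micro_batches`) is hypothetical and is not part of the Spark or ES-Hadoop API. It only shows how reusing connections across micro-batches keeps the number of open connections bounded, instead of opening a new one for every batch.

```python
import queue
import threading
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for an HTTP connection to the cluster."""

    def __init__(self):
        self.sent = []

    def send(self, record):
        self.sent.append(record)


class ConnectionPool:
    """Opens at most max_size connections and reuses them across batches."""

    def __init__(self, max_size):
        self._idle = queue.LifoQueue()
        self._lock = threading.Lock()
        self._max_size = max_size
        self.opened = 0  # total connections ever created by this pool

    def _acquire(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            pass
        with self._lock:
            if self.opened < self._max_size:
                self.opened += 1
                return FakeConnection()
        return self._idle.get()  # pool exhausted: wait for a return

    @contextmanager
    def connection(self):
        conn = self._acquire()
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back, do not close it


def write_partition(pool, records):
    """Roughly what a per-partition write would do with a pooled connection."""
    with pool.connection() as conn:
        for record in records:
            conn.send(record)


def run_micro_batches(pool, batches):
    """Each batch stands in for one Spark job, each inner list for one partition."""
    for batch in batches:
        for partition in batch:
            write_partition(pool, partition)


pool = ConnectionPool(max_size=2)
run_micro_batches(pool, [[[1, 2], [3]], [[4], [5, 6]], [[7]]])
print(pool.opened)
```

In this sequential simulation the pool never needs more than one connection, however many micro-batches run. In a real job the pool would have to live statically on each executor and be created lazily there, since connections cannot be serialized from the driver.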

jbaiera · 2016-08-23T15:39:55Z

Spark 2.0 has improved the processing capabilities for streams, but not much in the way of how the streams are executed. Micro-batching still exists in Spark 2.0 Structured Streaming, and as such, we'll definitely need this for both Streaming and Structured Streaming.

costin added feature :Spark v5.0.0-alpha5 labels Jul 12, 2016

costin assigned jbaiera Jul 12, 2016

acchen97 added v5.0.0-beta1 and removed v5.0.0-alpha5 labels Jul 21, 2016

jbaiera closed this as completed in 954c3e2 Sep 16, 2016
