Provided dedicated support for Spark Streaming #802

Closed
costin opened this issue Jul 12, 2016 · 1 comment

costin (Member) commented Jul 12, 2016

In Spark 1.6, streaming is implemented as micro-batching, meaning each data window triggers a Spark job. While this works, very small time windows (under 1 minute) can lead to resource exhaustion, such as consuming all available HTTP connections.
The Spark documentation suggests using a connection pool to work around this.
Spark 2.0 may revise this pattern and offer a richer, better approach.
Either way, ES-Hadoop should provide a workaround for both Spark 1.x and 2.x.
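The connection-pool pattern mentioned above can be illustrated with a minimal, Spark-free sketch. `FakeConnection`, `ConnectionPool`, and `run_micro_batches` are hypothetical names, not Spark or ES-Hadoop APIs. The simulation assumes tasks run one at a time and only counts how many connections get opened, comparing "open a connection per partition per batch" against "borrow from a long-lived pool".

```python
import queue


class FakeConnection:
    """Hypothetical stand-in for an HTTP connection to Elasticsearch."""

    def send(self, record):
        pass  # a real connection would issue an HTTP request here


class ConnectionPool:
    """Hypothetical pool; in Spark it would be a lazily created per-executor singleton."""

    def __init__(self):
        self._idle = queue.LifoQueue()
        self.created = 0  # total connections ever opened through this pool

    def get_connection(self):
        try:
            # Reuse an idle connection left behind by an earlier task or batch.
            return self._idle.get_nowait()
        except queue.Empty:
            # Open a new connection only when none is idle.
            self.created += 1
            return FakeConnection()

    def return_connection(self, conn):
        self._idle.put(conn)


def run_micro_batches(num_batches, partitions_per_batch, pool=None):
    """Simulate micro-batches executed one task at a time; return connections opened."""
    opened = 0
    for batch in range(num_batches):
        for partition in range(partitions_per_batch):
            records = [(batch, partition)]
            if pool is None:
                # Naive pattern: every partition of every batch opens its own connection.
                conn = FakeConnection()
                opened += 1
            else:
                conn = pool.get_connection()
            for record in records:
                conn.send(record)
            if pool is not None:
                pool.return_connection(conn)
    return opened if pool is None else pool.created
```

With 1-second windows over one minute and 4 partitions per batch, `run_micro_batches(60, 4)` opens 240 connections, while `run_micro_batches(60, 4, ConnectionPool())` opens 1. With several tasks running concurrently on an executor, the pool would grow to at most the number of concurrent tasks rather than with the number of batches.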

jbaiera (Member) commented Aug 23, 2016

Spark 2.0 improves the processing capabilities for streams but changes little in how streams are executed. Micro-batching still exists in Spark 2.0 Structured Streaming, so we'll definitely need this for both Spark Streaming and Structured Streaming.
