[SPARK-16114] [SQL] structured streaming network word count examples#13816
[SPARK-16114] [SQL] structured streaming network word count examples#13816jjthomas wants to merge 11 commits intoapache:masterfrom
Conversation
| .appName("JavaStructuredNetworkWordCount") | ||
| .getOrCreate(); | ||
|
|
||
| Dataset<String> df = spark.readStream().format("socket").option("host", args[0]) |
There was a problem hiding this comment.
make this
spark
.readStream()
.format("socket")
.option(...)
....
easier to read.
|
ok to test. |
|
|
||
| import spark.implicits._ | ||
|
|
||
| val df = spark.readStream |
|
Test build #60970 has finished for PR 13816 at commit
|
|
Responded to comments |
|
Test build #60981 has finished for PR 13816 at commit
|
| .getOrCreate(); | ||
|
|
||
| // input lines (may be multiple words on each line) | ||
| Dataset<String> lines = spark |
There was a problem hiding this comment.
you dont need to convert to Dataset[String] using as, since you are not using the typed groupByKey. keeping as Dataset[Row] is fine, as you done with the scala and python version.
|
Test build #61051 has finished for PR 13816 at commit
|
|
Test build #61054 has finished for PR 13816 at commit
|
|
test this again |
|
Test build #3127 has finished for PR 13816 at commit
|
|
Test build #3128 has finished for PR 13816 at commit
|
|
Test build #61071 has finished for PR 13816 at commit
|
|
Test build #61310 has finished for PR 13816 at commit
|
| * `$ bin/run-example org.apache.spark.examples.sql.streaming.EventTimeWindowExample | ||
| * localhost 9999 <checkpoint dir>` | ||
| */ | ||
| object NetworkEventTimeWindow { |
There was a problem hiding this comment.
Just rename to EventTimeWindow.
| * To run this on your local machine, you need to first run a Netcat server | ||
| * `$ nc -lk 9999` | ||
| * and then run the example | ||
| * `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount |
There was a problem hiding this comment.
I think you can just do $ bin/run-example sql.streaming.JavaStructuredNetworkWordCount. Verify that, and if it works, please change it.
|
Test build #61389 has finished for PR 13816 at commit
|
|
Test build #61413 has finished for PR 13816 at commit
|
|
|
||
| if __name__ == "__main__": | ||
| if len(sys.argv) != 3: | ||
| print("Usage: network_wordcount.py <hostname> <port>", file=sys.stderr) |
|
LGTM. Merging this to master and 2.0. Thank @jjthomas |
## What changes were proposed in this pull request? Network word count example for structured streaming ## How was this patch tested? Run locally Author: James Thomas <jamesjoethomas@gmail.com> Author: James Thomas <jamesthomas@Jamess-MacBook-Pro.local> Closes #13816 from jjthomas/master. (cherry picked from commit 3554713) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
|
Test build #61421 has finished for PR 13816 at commit
|
What changes were proposed in this pull request?
Network word count example for structured streaming
How was this patch tested?
Run locally