[SPARK-21069][SS][DOCS] Add rate source to programming guide.
## What changes were proposed in this pull request?

SPARK-20979 added a new structured streaming source: Rate source. This patch adds the corresponding documentation to programming guide.

## How was this patch tested?

Tested by running jekyll locally.

Author: Prashant Sharma <prashant@apache.org>
Author: Prashant Sharma <prashsh1@in.ibm.com>

Closes #18562 from ScrapCodes/spark-21069/rate-source-docs.
ScrapCodes authored and zsxwing committed Jul 8, 2017
1 parent 9760c15 commit d0bfc67
Showing 1 changed file with 15 additions and 0 deletions: docs/structured-streaming-programming-guide.md
@@ -499,6 +499,8 @@ There are a few built-in sources.

- **Socket source (for testing)** - Reads UTF8 text data from a socket connection. The listening server socket is at the driver. Note that this should be used only for testing as this does not provide end-to-end fault-tolerance guarantees.

- **Rate source (for testing)** - Generates data at the specified number of rows per second; each output row contains a `timestamp` and a `value`, where `timestamp` is of `Timestamp` type containing the time of message dispatch and `value` is of `Long` type containing the message count, starting from 0 as the first row. This source is intended for testing and benchmarking (see the sketch below).

Some sources are not fault-tolerant because they do not guarantee that data can be replayed using
checkpointed offsets after a failure. See the earlier section on
[fault-tolerance semantics](#fault-tolerance-semantics).
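
Below is a minimal sketch, not part of this patch, of reading from the rate source as described above; it assumes an existing `SparkSession` named `spark` and uses only the `rowsPerSecond` option documented here.

```scala
import org.apache.spark.sql.SparkSession

// Assumes a SparkSession is already available (e.g. in spark-shell);
// built here only to keep the sketch self-contained.
val spark = SparkSession.builder()
  .appName("RateSourceSketch")
  .getOrCreate()

// Each row produced by the rate source has a `timestamp` (Timestamp type,
// the time of message dispatch) and a `value` (Long, counting up from 0).
val rateDF = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "100")  // generate 100 rows per second
  .load()
```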
@@ -546,6 +548,19 @@ Here are the details of all the sources in Spark.
<td>No</td>
<td></td>
</tr>
<tr>
<td><b>Rate Source</b></td>
<td>
<code>rowsPerSecond</code> (e.g. 100, default: 1): How many rows should be generated per second.<br/><br/>
<code>rampUpTime</code> (e.g. 5s, default: 0s): How long to ramp up before the generation speed reaches <code>rowsPerSecond</code>. Granularities finer than seconds are truncated to integer seconds. <br/><br/>
<code>numPartitions</code> (e.g. 10, default: Spark's default parallelism): The number of partitions for the generated rows. <br/><br/>
The source will try its best to reach <code>rowsPerSecond</code>, but the query may be resource-constrained; in that case, <code>numPartitions</code> can be tweaked to help reach the desired speed.
</td>
<td>Yes</td>
<td></td>
</tr>

<tr>
<td><b>Kafka Source</b></td>
<td>
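
As a hedged illustration of the options in the table above (not part of this diff), the settings might be combined as below; the option names come from the table, while the console sink and the 500 rows-per-second target are assumptions chosen only for demonstration.

```scala
// Hypothetical benchmark-style query: ramp up to 500 rows/second over
// 10 seconds and spread generation across 4 partitions.
val benchmarkInput = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "500")
  .option("rampUpTime", "10s")    // ramp-up period before full speed
  .option("numPartitions", "4")   // partitions for the generated rows
  .load()

// Write the generated rows to the console sink for inspection.
val query = benchmarkInput.writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()
```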
