[SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in the wrong table

## What changes were proposed in this pull request?

The descriptions for several options of the File source for Structured Streaming appeared in the File sink description instead.

This pull request has two commits: the first fixes the tables as they appeared in Spark 2.1, and the second handles an additional option added in Spark 2.2.

## How was this patch tested?

Built the documentation with `SKIP_API=1 jekyll build` and visually inspected the Structured Streaming programming guide.

The original documentation was written by tdas and lw-lin.

Author: assafmendelson <assaf.mendelson@gmail.com>

Closes #18342 from assafmendelson/spark-21123.
assafmendelson authored and zsxwing committed Jun 19, 2017
1 parent e92ffe6 commit 66a792c
Showing 1 changed file with 15 additions and 13 deletions.
28 changes: 15 additions & 13 deletions docs/structured-streaming-programming-guide.md
@@ -510,7 +510,20 @@ Here are the details of all the sources in Spark.
     <td><b>File source</b></td>
     <td>
     <code>path</code>: path to the input directory, and common to all file formats.
-    <br/><br/>
+    <br/>
+    <code>maxFilesPerTrigger</code>: maximum number of new files to be considered in every trigger (default: no max)
+    <br/>
+    <code>latestFirst</code>: whether to process the latest new files first, useful when there is a large backlog of files (default: false)
+    <br/>
+    <code>fileNameOnly</code>: whether to check new files based on only the filename instead of on the full path (default: false). With this set to `true`, the following files would be considered as the same file, because their filenames, "dataset.txt", are the same:
+    <br/>
+    · "file:///dataset.txt"<br/>
+    · "s3://a/dataset.txt"<br/>
+    · "s3n://a/b/dataset.txt"<br/>
+    · "s3a://a/b/c/dataset.txt"<br/>
+    <br/>
+
+    <br/>
     For file-format-specific options, see the related methods in <code>DataStreamReader</code>
     (<a href="api/scala/index.html#org.apache.spark.sql.streaming.DataStreamReader">Scala</a>/<a href="api/java/org/apache/spark/sql/streaming/DataStreamReader.html">Java</a>/<a href="api/python/pyspark.sql.html#pyspark.sql.streaming.DataStreamReader">Python</a>/<a
     href="api/R/read.stream.html">R</a>).
@@ -1234,18 +1247,7 @@ Here are the details of all the sinks in Spark.
     <td>Append</td>
     <td>
     <code>path</code>: path to the output directory, must be specified.
-    <br/>
-    <code>maxFilesPerTrigger</code>: maximum number of new files to be considered in every trigger (default: no max)
-    <br/>
-    <code>latestFirst</code>: whether to processs the latest new files first, useful when there is a large backlog of files (default: false)
-    <br/>
-    <code>fileNameOnly</code>: whether to check new files based on only the filename instead of on the full path (default: false). With this set to `true`, the following files would be considered as the same file, because their filenames, "dataset.txt", are the same:
-    <br/>
-    · "file:///dataset.txt"<br/>
-    · "s3://a/dataset.txt"<br/>
-    · "s3n://a/b/dataset.txt"<br/>
-    · "s3a://a/b/c/dataset.txt"<br/>
-    <br/>
+    <br/><br/>
     For file-format-specific options, see the related methods in DataFrameWriter
     (<a href="api/scala/index.html#org.apache.spark.sql.DataFrameWriter">Scala</a>/<a href="api/java/org/apache/spark/sql/DataFrameWriter.html">Java</a>/<a href="api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter">Python</a>/<a
     href="api/R/write.stream.html">R</a>).
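For context, a minimal Scala sketch (not part of this commit) of how the relocated options are passed to the file source through `DataStreamReader`; the `text` format, option values, and input path are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("FileSourceOptionsSketch").getOrCreate()

// The three options this commit moves into the file source row of the sources table
val lines = spark.readStream
  .format("text")
  .option("maxFilesPerTrigger", 10) // cap on new files picked up per trigger
  .option("latestFirst", true)      // process newest files first when there is a backlog
  .option("fileNameOnly", true)     // track seen files by filename only, not the full path
  .load("/tmp/streaming-in")        // hypothetical input directory
```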

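And a matching sketch for the file sink, where after this fix only `path` remains documented as a sink-specific option; the output and checkpoint directories are hypothetical, and `checkpointLocation` is the standard query option rather than part of the table being corrected:

```scala
// File sink: supports Append mode only; `path` is the sink-specific option
val query = lines.writeStream
  .format("parquet")
  .outputMode("append")
  .option("path", "/tmp/streaming-out")             // hypothetical output directory
  .option("checkpointLocation", "/tmp/checkpoints") // required for file sinks
  .start()

query.awaitTermination()
```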