[SPARK-16430][SQL][STREAMING] Add option maxFilesPerTrigger #14094

Closed
tdas wants to merge 3 commits into apache:master from tdas:SPARK-16430

tdas commented Jul 7, 2016

What changes were proposed in this pull request?

An option that limits the number of files the file stream source reads per trigger (for example, 1 file at a time) enables rate limiting. It has the additional convenience that a static set of files can be used like a stream for testing, as this allows those files to be considered one at a time.

This PR adds option maxFilesPerTrigger.
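
To illustrate the intended behaviour, here is a minimal, Spark-free sketch of how a per-trigger file cap turns a static set of files into a sequence of small batches. It is a simplified model and not the code in this patch; `MaxFilesPerTriggerSketch` and `planBatches` are hypothetical names introduced only for this example. In Spark itself the option would presumably be set on the reader, along the lines of `spark.readStream.format("text").option("maxFilesPerTrigger", 1).load(dir)`.

```scala
// Simplified, Spark-free model of a per-trigger file cap.
// All names here are hypothetical and are not taken from the patch.
object MaxFilesPerTriggerSketch {

  // Splits the not-yet-processed files into per-trigger batches.
  // maxFilesPerTrigger = None models "no limit": all files go into one batch.
  def planBatches(
      unseenFiles: Seq[String],
      maxFilesPerTrigger: Option[Int]): Seq[Seq[String]] =
    maxFilesPerTrigger match {
      case Some(n) =>
        // Assumption of this sketch: the cap must be a positive integer.
        require(n > 0, "maxFilesPerTrigger must be positive")
        unseenFiles.grouped(n).toList
      case None =>
        if (unseenFiles.isEmpty) Nil else List(unseenFiles)
    }

  def main(args: Array[String]): Unit = {
    val files = Seq("a.txt", "b.txt", "c.txt")
    // With a cap of 1, each trigger considers exactly one file.
    planBatches(files, Some(1)).zipWithIndex.foreach { case (batch, i) =>
      println(s"trigger $i: ${batch.mkString(", ")}")
    }
  }
}
```

With a cap of 1, the three static files are handed out over three triggers, which is the testing convenience described above; without a cap they would all land in a single batch.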

How was this patch tested?

New unit test

tdas commented Jul 7, 2016

@marmbrus @zsxwing

import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.execution.datasources.{CaseInsensitiveMap, DataSource, ListingFileCatalog, LogicalRelation}
import org.apache.spark.sql.types.StructType
import org.apache.spark.util.Utils

Where is this used?

marmbrus commented Jul 7, 2016

LGTM

SparkQA commented Jul 7, 2016

Test build #61927 has finished for PR 14094 at commit ddd9426.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 7, 2016

Test build #61928 has finished for PR 14094 at commit c591007.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

tdas commented Jul 7, 2016

@marmbrus I have also added the option to the docs of the load methods in DataStreamReader.

SparkQA commented Jul 8, 2016

Test build #61936 has finished for PR 14094 at commit 9663b42.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

tdas commented Jul 8, 2016

Merging this to master and 2.0

asfgit pushed a commit that referenced this pull request Jul 8, 2016
## What changes were proposed in this pull request?

An option that limits the number of files the file stream source reads per trigger (for example, 1 file at a time) enables rate limiting. It has the additional convenience that a static set of files can be used like a stream for testing, as this allows those files to be considered one at a time.

This PR adds option `maxFilesPerTrigger`.

## How was this patch tested?

New unit test

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #14094 from tdas/SPARK-16430.

(cherry picked from commit 5bce458)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>

asfgit closed this in 5bce458 Jul 8, 2016