Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23096][SS] Migrate rate source to V2 #20688

Closed
wants to merge 1 commit into from

Conversation

jerryshao
Copy link
Contributor

What changes were proposed in this pull request?

This PR migrate micro batch rate source to V2 API and rewrite UTs to suite V2 test.

How was this patch tested?

UTs.

@SparkQA
Copy link

SparkQA commented Feb 28, 2018

Test build #87740 has finished for PR 20688 at commit 538223e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 28, 2018

Test build #87751 has finished for PR 20688 at commit 8bfadc3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Feb 28, 2018

Test build #87799 has finished for PR 20688 at commit 8bfadc3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Copy link
Contributor Author

@tdas @jose-torres can you please take a review when you have time, thanks!

@SparkQA
Copy link

SparkQA commented Mar 5, 2018

Test build #87951 has finished for PR 20688 at commit 5a29d62.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

override def shortName(): String = "rate"
}

class RateStreamMicroBatchReader(options: DataSourceOptions, checkpointLocation: String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split this into a separate file.

import org.apache.spark.sql.streaming.StreamTest
import org.apache.spark.util.ManualClock

class RateSourceSuite extends StreamTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you not move this file using "git mv" and then change? Then we would have been able to diff it properly.
This was a pain in the text socket v2 PR as well :(

Copy link
Contributor Author

@jerryshao jerryshao Mar 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tdas , I think I used "git mv", the thing is that when the diff is larger then x%, it will treat as "git rm" and "git add" (https://makandracards.com/makandra/30957-git-how-to-get-a-useful-diff-when-renaming-files).

Sorry about the inconvenience, but I'm not sure if there's other approaches.

import org.apache.spark.sql.execution.streaming.{RateSourceProvider, RateStreamOffset, ValueRunTimeMsPair}
import org.apache.spark.sql.execution.streaming.sources.RateStreamSourceV2
import org.apache.spark.sql.execution.streaming.{RateStreamOffset, ValueRunTimeMsPair}
import org.apache.spark.sql.execution.streaming.sources.RateSourceProvider
import org.apache.spark.sql.sources.v2.DataSourceOptions
import org.apache.spark.sql.sources.v2.reader._
import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousDataReader, ContinuousReader, Offset, PartitionOffset}
Copy link
Contributor

@tdas tdas Mar 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make the class names and file names of the different readers consistent with each other? Similar to Kafka?

RateStreamProvider
RateStreamMicroBatchReader, RateStreamMicroBatchDataReaderFactory ....
RateStreamContinuousReader, ....

@jerryshao
Copy link
Contributor Author

@tdas I tried different ways to keep history of rename, but seems git always treat as "rm and add" for big changes (https://coderwall.com/p/_csouq/renaming-and-changing-files-in-git-without-losing-history). Sorry to bring inconvenience. Do you have any better solutions?

@jerryshao
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88085 has finished for PR 20688 at commit 63ed0b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88086 has finished for PR 20688 at commit 63ed0b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jose-torres
Copy link
Contributor

Sorry it took me so long to get to this. LGTM

@asfgit asfgit closed this in c68ec4e Mar 27, 2018
@jerryshao
Copy link
Contributor Author

Thanks @tdas and @jose-torres .

@gatorsmile
Copy link
Member

Sorry, I need to revert this PR. It breaks our build. Could you resubmit the PR after fixing the tests?

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6/

@gatorsmile
Copy link
Member

Next time, we should re-trigger the tests before merging the code. The most recent tests ran 20 days ago

jose-torres pushed a commit to jose-torres/spark that referenced this pull request Mar 28, 2018
## What changes were proposed in this pull request?

This PR migrate micro batch rate source to V2 API and rewrite UTs to suite V2 test.

## How was this patch tested?

UTs.

Author: jerryshao <sshao@hortonworks.com>

Closes apache#20688 from jerryshao/SPARK-23096.
asfgit pushed a commit that referenced this pull request Mar 30, 2018
## What changes were proposed in this pull request?

Roll forward c68ec4e (#20688).

There are two minor test changes required:

* An error which used to be TreeNodeException[ArithmeticException] is no longer wrapped and is now just ArithmeticException.
* The test framework simply does not set the active Spark session. (Or rather, it doesn't do so early enough - I think it only happens when a query is analyzed.) I've added the required logic to SQLTestUtils.

## How was this patch tested?

existing tests

Author: Jose Torres <torres.joseph.f+github@gmail.com>
Author: jerryshao <sshao@hortonworks.com>

Closes #20922 from jose-torres/ratefix.
mshtelma pushed a commit to mshtelma/spark that referenced this pull request Apr 5, 2018
## What changes were proposed in this pull request?

This PR migrate micro batch rate source to V2 API and rewrite UTs to suite V2 test.

## How was this patch tested?

UTs.

Author: jerryshao <sshao@hortonworks.com>

Closes apache#20688 from jerryshao/SPARK-23096.
mshtelma pushed a commit to mshtelma/spark that referenced this pull request Apr 5, 2018
## What changes were proposed in this pull request?

Roll forward c68ec4e (apache#20688).

There are two minor test changes required:

* An error which used to be TreeNodeException[ArithmeticException] is no longer wrapped and is now just ArithmeticException.
* The test framework simply does not set the active Spark session. (Or rather, it doesn't do so early enough - I think it only happens when a query is analyzed.) I've added the required logic to SQLTestUtils.

## How was this patch tested?

existing tests

Author: Jose Torres <torres.joseph.f+github@gmail.com>
Author: jerryshao <sshao@hortonworks.com>

Closes apache#20922 from jose-torres/ratefix.
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
Roll forward c68ec4e (apache#20688).

There are two minor test changes required:

* An error which used to be TreeNodeException[ArithmeticException] is no longer wrapped and is now just ArithmeticException.
* The test framework simply does not set the active Spark session. (Or rather, it doesn't do so early enough - I think it only happens when a query is analyzed.) I've added the required logic to SQLTestUtils.

existing tests

Author: Jose Torres <torres.joseph.f+github@gmail.com>
Author: jerryshao <sshao@hortonworks.com>

Closes apache#20922 from jose-torres/ratefix.

Ref: LIHADOOP-48531
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants