Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33607][SS][WEBUI] Input Rate timeline/histogram aren't rendered if built with Scala 2.13 #30546

Closed
wants to merge 1 commit into from

Conversation

sarutak
Copy link
Member

@sarutak sarutak commented Nov 30, 2020

What changes were proposed in this pull request?

This PR fixes an issue that the histogram and timeline aren't rendered in the Streaming Query Statistics page if we built Spark with Scala 2.13.

before-fix-the-issue
NaN_Error

The reason is maxRecordRate can be NaN for Scala 2.13.

The NaN is the result of query.recentProgress.map(_.inputRowsPerSecond).max when the first element of query.recentProgress.map(_.inputRowsPerSecond) is NaN.
Actually, the comparison logic for Double type was changed in Scala 2.13.
scala/bug#12107
scala/scala#6410

So this issue happens as of Scala 2.13.

The root cause of the NaN is here.
This NaN seems to be an initial value of inputTimeSec so I think Double.PositiveInfinity is suitable rather than NaN and this change can resolve this issue.

Why are the changes needed?

To make sure we can use the histogram/timeline with Scala 2.13.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

First, I built with the following commands.

$ /dev/change-scala-version.sh 2.13
$ build/sbt -Phive -Phive-thriftserver -Pscala-2.13 package

Then, ran the following query (this is brought from #30427 ).

import org.apache.spark.sql.streaming.Trigger
val query = spark
  .readStream
  .format("rate")
  .option("rowsPerSecond", 1000)
  .option("rampUpTime", "10s")
  .load()
  .selectExpr("*", "CAST(CAST(timestamp AS BIGINT) - CAST((RAND() * 100000) AS BIGINT) AS TIMESTAMP) AS tsMod")
  .selectExpr("tsMod", "mod(value, 100) as mod", "value")
  .withWatermark("tsMod", "10 seconds")
  .groupBy(window($"tsMod", "1 minute", "10 seconds"), $"mod")
  .agg(max("value").as("max_value"), min("value").as("min_value"), avg("value").as("avg_value"))
  .writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("5 seconds"))
  .outputMode("append")
  .start()

Finally, I confirmed that the timeline and histogram are rendered.
after-fix-the-issue

@sarutak sarutak changed the title [SPARK-33607][SS][WEBUI] Input Rate isn't rendered if built with Scala 2.13 [SPARK-33607][SS][WEBUI] Input Rate timeline/histogram aren't rendered if built with Scala 2.13 Nov 30, 2020
@sarutak
Copy link
Member Author

sarutak commented Nov 30, 2020

cc: @HeartSaVioR

@srowen
Copy link
Member

srowen commented Nov 30, 2020

Seems reasonable to me, unless there's a clean way to handle NaN. I only wonder if this value is used elsewhere where a big positive value is unexpected.

@SparkQA
Copy link

SparkQA commented Nov 30, 2020

Test build #131986 has finished for PR 30546 at commit cecf62f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Nov 30, 2020

Nice finding! Is the calculation numRecords / inputTimeSec (= Double.PositiveInfinity) consistent between 2.12 and 2.13? I guess you've experimented for both versions, but just to confirm again.

@sarutak
Copy link
Member Author

sarutak commented Dec 1, 2020

@srowen

I only wonder if this value is used elsewhere where a big positive value is unexpected.

inputTimeSec is a local val and it's used only for numRecords / inputTimeSec so I believe this change doesn't affect elsewhere.

@HeartSaVioR

Is the calculation numRecords / inputTimeSec (= Double.PositiveInfinity) consistent between 2.12 and 2.13? I guess you've experimented for both versions, but just to confirm again.

Yes, I have confirmed the result of numRecords / Double.PositiveInfinity is zero for both 2.12 and 2.13.

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Dec 1, 2020

This is a single line change, and both Jenkins and GA passed. Merging to master. (I'll skip porting back as Scala 2.13 compatible doesn't look to be a target for 3.0.x.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants