Bug: MetricRepository does not correctly store metrics of a Histogram analyzer that has a filter. The metric is stored as a Histogram metric without any filter.
Here is a small snippet that demonstrates the issue, using the latest build, 1.0.4:
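import com.amazon.deequ.analyzers.Histogram
import com.amazon.deequ.analyzers.runners.{AnalysisRunner, AnalyzerContext}
import com.amazon.deequ.repository.ResultKey
import com.amazon.deequ.repository.fs.FileSystemMetricsRepository
import org.apache.spark.sql.SparkSession

val path = <repository_path>
val spark: SparkSession = ...
import spark.implicits._
val inputDF = Seq(("a", 1), ("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("b", 2), ("c", 1)).toDF("id", "value")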
val repository = FileSystemMetricsRepository(spark, path)
val resultKey = ResultKey(System.currentTimeMillis(), Map("tag" -> "test"))
// Collect Histogram metrics with filters and store them in the repository.
val analysisResult = AnalysisRunner
  .onData(inputDF)
  .useRepository(repository)
  .addAnalyzers(Seq(
    Histogram("value", where = Some("id='a'")),
    Histogram("value", where = Some("id='b'")),
    Histogram("value", where = Some("id='c'"))))
  .saveOrAppendResult(resultKey)
  .run()
// Print the collected metrics. The Histogram metrics with filters are computed correctly.
AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult).show(false)
// Print the metrics loaded from the repository. This is where the bug shows:
// the filter is missing, and each metric is stored as a Histogram metric without any filter.
println("Data stored in metric repository: ")
repository.load()
  .withTagValues(Map("tag" -> "test"))
  .getSuccessMetricsAsDataFrame(spark)
  .show(false)
Root cause:
The filter property is missing in AnalyzerSerializer and MetricSerializer.
In /src/main/scala/com/amazon/deequ/repository/AnalysisResultSerde.scala, line 300:
case histogram: Histogram if histogram.binningUdf.isEmpty =>
  result.addProperty(ANALYZER_NAME_FIELD, "Histogram")
  result.addProperty(COLUMN_FIELD, histogram.column)
  result.addProperty("maxDetailBins", histogram.maxDetailBins)
Here, the call result.addProperty(WHERE_FIELD, histogram.where.orNull) is missing.
In /src/main/scala/com/amazon/deequ/repository/AnalysisResultSerde.scala, line 433:
case "Histogram" =>
  Histogram(
    json.get(COLUMN_FIELD).getAsString,
    None,
    json.get("maxDetailBins").getAsInt)
Here, getOptionalWhereParam(json) is missing.
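The round trip described above can be sketched without Deequ or Gson. This is a minimal stand-in, not the library's code: HistogramSpec, WhereRoundTrip, the Map-based Json type, and the field-name strings are all hypothetical names chosen for illustration. Only the idea of writing and then reading back an optional where field comes from the analysis above.

```scala
// Hypothetical stand-in for the Histogram analyzer; only the fields relevant here.
final case class HistogramSpec(column: String, where: Option[String], maxDetailBins: Int)

object WhereRoundTrip {
  // Stand-in for a JSON object: field name -> string value (an assumption, not Gson).
  type Json = Map[String, String]

  // Illustrative field names; the real constants live in AnalysisResultSerde.scala.
  val AnalyzerNameField = "analyzerName"
  val ColumnField = "column"
  val WhereField = "where"
  val MaxDetailBinsField = "maxDetailBins"

  // Mirrors the reported behavior: the filter is never written.
  def serializeWithoutWhere(h: HistogramSpec): Json = Map(
    AnalyzerNameField -> "Histogram",
    ColumnField -> h.column,
    MaxDetailBinsField -> h.maxDetailBins.toString)

  // Sketch of the fix on the write side: add the filter when one is set.
  def serializeWithWhere(h: HistogramSpec): Json = {
    val base = serializeWithoutWhere(h)
    h.where.fold(base)(w => base + (WhereField -> w))
  }

  // Sketch of the fix on the read side: read the filter back as an Option.
  def deserialize(json: Json): HistogramSpec =
    HistogramSpec(json(ColumnField), json.get(WhereField), json(MaxDetailBinsField).toInt)
}
```

For example, with val original = HistogramSpec("value", Some("id='a'"), 1000), calling WhereRoundTrip.deserialize(WhereRoundTrip.serializeWithoutWhere(original)) returns a spec whose where is None, while WhereRoundTrip.deserialize(WhereRoundTrip.serializeWithWhere(original)) equals original. Both sides have to change together: writing the filter without reading it back, or the reverse, would still lose it.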
Could you please fix this bug, or should I fix it and submit a PR myself?
pwzhong changed the title from "MetricRepository cannot store metrics of Histogram analyzer with filter" to "Bug: MetricRepository cannot store metrics of Histogram analyzer with filter" on Aug 4, 2020