Added support for parsing BigQuery data #53

Merged
minskya merged 6 commits into main from DATAFLINT-4341
Mar 22, 2026

minskya commented Mar 19, 2026

No description provided.

minskya added 3 commits March 22, 2026 10:43
…OSS UI

  - Add BigQuery scan detection: BatchScan nodes with "Reading table [...]"
    in their description get isBigQueryRead=true and the fully-qualified
    table name; BQ metrics (bq rows, bq bytes read, bq scan/parse time, etc.)
    are allowlisted, transformed, and renamed
  - Add WriteToIceberg plan parser: extracts tableName and format from
    IcebergWrite(table=..., format=...) descriptions for AppendData,
    ReplaceData, WriteDelta, DeleteFromTable, OverwriteByExpression, and
    OverwritePartitionsDynamic nodes; Iceberg-specific enriched names only
    shown when the plan confirms it's actually an Iceberg write; generic
    fallback names used otherwise
  - Add PlanMetricsProcessor case for WriteToIceberg to display table name
    and format in the UI
  - Move sqlMetricsPatchApplied to shared SqlMetricsPatch object in
    dataflint-common, eliminating duplication between pluginspark3 and
    pluginspark4
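
The two plan-description parsers described in this commit could be sketched roughly as follows. This is an illustration only, not the code from the PR: the function and type names are hypothetical, and the sketch assumes the fully-qualified BigQuery table name sits inside the square brackets after "Reading table". The `IcebergWrite(table=..., format=...)` shape is taken from the commit message.

```typescript
// Hypothetical result shapes; the plugin's real types are not shown in this PR.
interface BigQueryScanInfo {
  isBigQueryRead: boolean;
  tableName?: string;
}

interface IcebergWriteInfo {
  tableName: string;
  format: string;
}

// Assumption: the description contains "Reading table [<fully-qualified name>]".
const BQ_READ_PATTERN = /Reading table \[([^\]]+)\]/;

// Shape from the commit message: IcebergWrite(table=..., format=...).
const ICEBERG_WRITE_PATTERN = /IcebergWrite\(table=([^,)]+),\s*format=([^,)]+)\)/;

// Only BatchScan nodes are considered; anything else is not a BigQuery read.
function parseBigQueryScan(nodeName: string, description: string): BigQueryScanInfo {
  if (!nodeName.startsWith("BatchScan")) {
    return { isBigQueryRead: false };
  }
  const match = BQ_READ_PATTERN.exec(description);
  return match === null
    ? { isBigQueryRead: false }
    : { isBigQueryRead: true, tableName: match[1].trim() };
}

// Returns undefined when the plan does not confirm an Iceberg write,
// so the caller can fall back to a generic node name.
function parseIcebergWrite(description: string): IcebergWriteInfo | undefined {
  const match = ICEBERG_WRITE_PATTERN.exec(description);
  if (match === null) {
    return undefined;
  }
  return { tableName: match[1].trim(), format: match[2].trim() };
}
```

Returning `undefined` from the Iceberg parser mirrors the commit's stated behavior: enriched Iceberg names are shown only when the plan confirms an Iceberg write, and generic names are used otherwise.
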
minskya marked this pull request as ready for review March 22, 2026 09:41
minskya requested a review from menishmueli March 22, 2026 09:41
Comment thread spark-plugin/build.sbt Outdated
libraryDependencies += "com.amazonaws" % "aws-java-sdk-s3" % "1.12.470" % "provided",
libraryDependencies += "org.apache.iceberg" %% "iceberg-spark-runtime-3.5" % "1.5.0" % "provided",
libraryDependencies += "io.delta" %% "delta-spark" % "3.2.0" % "provided",
libraryDependencies += "org.javassist" % "javassist" % "3.30.2-GA" % "provided",

remove

"remote bytes read": "shuffle read (remote)",
"fetch wait time": "fetch wait time",
"data size": "shuffle data size",
"time spent in spark": "bq time in spark",

Change names to our standard names: "number of BQ bytes read" should be "bytes read", and "bq rows" should be "rows".

minskya added 2 commits March 22, 2026 17:30
…mpatibility

  - Add google-cloud-dataproc, google-cloud-storage, and spark-bigquery-metrics JARs to the history server image, with version build args exposed in docker-compose
  - Make Iceberg and BigQuery connector downloads graceful — skip with a warning instead of failing when the artifact doesn't exist for the given Spark version (fixes builds for Spark 4.1+)

  spark-plugin: revert SQLMetrics javassist patch

  - Remove SqlMetricsPatch.scala and its references from both pluginspark3 and pluginspark4 — the patch was reverted
  - Remove javassist provided dependency from build.sbt

  spark-ui: normalize BigQuery metric display names

  - Remove the bq prefix from BigQuery metric labels in SqlReducerUtils.ts ("bq rows" → "rows", "bq scan time" → "scan time", etc.) for cleaner UI display
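
A minimal sketch of the label normalization, assuming it amounts to an explicit rename table plus stripping a leading "bq " prefix. The helper name and the split into table plus prefix rule are hypothetical; the actual change in SqlReducerUtils.ts is not shown here, and the example names come from the review comment and this commit message.

```typescript
// Explicit renames for labels that are not a simple prefix strip.
// The single entry is the example given in the review comment.
const EXPLICIT_RENAMES = new Map<string, string>([
  ["number of BQ bytes read", "bytes read"],
]);

// Matches a leading "bq " prefix, case-insensitively.
const BQ_PREFIX = /^bq\s+/i;

// Hypothetical helper: returns the display label for a BigQuery metric name.
function normalizeBigQueryMetricName(name: string): string {
  const explicit = EXPLICIT_RENAMES.get(name);
  if (explicit !== undefined) {
    return explicit;
  }
  return name.replace(BQ_PREFIX, "");
}

console.log(normalizeBigQueryMetricName("bq scan time"));
```

Names without the prefix, such as "fetch wait time", pass through unchanged, so the helper can be applied to a mixed list of metrics.
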
minskya merged commit 261fd3f into main Mar 22, 2026
6 checks passed
2 participants