feat: add ECDF plot visualization operator#4406

Merged
mengw15 merged 10 commits into apache:main from eugenegujing:feat/ecdf-plot
Apr 21, 2026

Conversation

@eugenegujing
Contributor

@eugenegujing eugenegujing commented Apr 17, 2026

PR Description

Purpose

Adds a new Empirical Cumulative Distribution Function (ECDF) plot operator to the Statistical Visualization group, letting users visualize the cumulative distribution of a numeric column and easily compare distributions across groups.

Summary

  • New operator ECDFPlotOpDesc under operator/visualization/ecdfPlot/, rendered via plotly.express.ecdf.

  • Configurable fields:

    • Value Column (required, numeric): column to compute ECDF on
    • Color Column (optional): group and color lines by category
    • SeparateBy Column (optional): split plot into facets
    • Y Axis Mode: probability / count / sum
    • CDF Mode: standard / reversed / complementary
    • Orientation: vertical / horizontal
    • Show Markers / Show Lines toggles
    • Marginal Plot: "" / histogram / rug
  • Registered in LogicalOp.scala as the ECDFPlot operator type.

  • Added operator icon frontend/src/assets/operator_images/ECDFPlot.png.

  • User-provided enum fields (cdfMode, orientation, marginal) use EncodableString so generated Python safely passes PythonCodeRawInvalidTextSpec.

  • Unit tests in ECDFPlotOpDescSpec covering the empty-value assertion and the generated figure with all optional parameters.
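
For readers unfamiliar with `plotly.express.ecdf`, here is a minimal sketch of how the fields listed above might map onto its keyword arguments. The helper names `ecdf_kwargs` and `render`, the choice of `facet_col` for SeparateBy Column, and the handling of Y Axis Mode are assumptions for illustration; the operator's actual generated code is not shown in this conversation. The "sum" Y Axis Mode is left out because its mapping is not described here.

```python
from typing import Any, Dict, Optional


def ecdf_kwargs(
    value_col: str,
    color_col: Optional[str] = None,
    separate_by: Optional[str] = None,
    y_axis_mode: str = "probability",
    cdf_mode: str = "standard",
    orientation: str = "vertical",
    show_markers: bool = False,
    marginal: str = "",
) -> Dict[str, Any]:
    """Build keyword arguments for plotly.express.ecdf (hypothetical mapping)."""
    if not value_col:
        raise ValueError("Value Column cannot be empty")
    if y_axis_mode not in ("probability", "count"):
        raise ValueError(f"unsupported Y Axis Mode in this sketch: {y_axis_mode}")
    if cdf_mode not in ("standard", "reversed", "complementary"):
        raise ValueError(f"unknown CDF Mode: {cdf_mode}")

    horizontal = orientation == "horizontal"
    kwargs: Dict[str, Any] = {
        # Assumption: the value column goes on x for vertical plots
        # and on y for horizontal ones.
        ("y" if horizontal else "x"): value_col,
        "orientation": "h" if horizontal else "v",
        "ecdfmode": cdf_mode,
        # ecdfnorm=None makes plotly show raw counts instead of probabilities.
        "ecdfnorm": "probability" if y_axis_mode == "probability" else None,
        "markers": show_markers,
    }
    if color_col:
        kwargs["color"] = color_col
    if separate_by:
        kwargs["facet_col"] = separate_by  # assumed; facet_row would also work
    # The description lists "" as the no-marginal option; a later commit
    # renames it to "none", so both are treated as "no marginal plot" here.
    if marginal not in ("", "none"):
        kwargs["marginal"] = marginal
    return kwargs


def render(data_frame, **options):
    """Draw the figure; plotly is imported lazily so the helper above has no dependencies."""
    import plotly.express as px

    return px.ecdf(data_frame, **ecdf_kwargs(**options))
```

With plotly and pandas installed, `render(df, value_col="score", color_col="group", marginal="rug")` would return a figure with one ECDF line per group. Keeping the argument mapping separate from the plotting call makes the mapping easy to unit-test without rendering anything.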

Test

  • sbt scalafmtCheckAll passes
  • sbt "scalafixAll --check" passes
  • sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.visualization.ecdfPlot.ECDFPlotOpDescSpec org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"
    — all tests pass (4/4), 110/110 raw-invalid OK, 110/110 py_compile OK
  • Manually tested end-to-end in the UI with CSV source → ECDF Plot operator; verified the rendered plot in the result panel for all combinations of color/facet/CDF mode/orientation/marginal options.

Screenshots

image [ecdf_demo.csv](https://github.com/user-attachments/files/26914412/ecdf_demo.csv)

@github-actions (bot) added the feature, frontend, and common labels Apr 17, 2026
@chenlica
Contributor

@gracecluvohio please review it. After that, @mengw15 can review it as a committer.

@chenlica chenlica requested a review from mengw15 April 17, 2026 16:15
mengw15 and others added 2 commits April 19, 2026 14:22
  - Rename "facets" to "subplots" in SeparateBy Column description
  - Expand CDF Mode description with formulas for each mode
    (standard / reversed / complementary)
  - Replace technical term "ECDF trace" with "ECDF line" in Show Markers
  - Remove Show Lines field: ECDF is inherently a step-line plot, and the points-only rendering was not a meaningful configuration. This also eliminates the confusing fallback where lines were still drawn when both Show Lines and Show Markers were off.
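
The formulas added to the CDF Mode description are not reproduced in this conversation. As a rough illustration, the sketch below computes the three modes in plain Python, following the meaning plotly documents for `ecdfmode`: standard counts values at or below a point, reversed counts values at or above it, and complementary counts values strictly above it. The function name `ecdf` and the `normalize` flag are hypothetical and not part of the operator.

```python
from typing import List, Sequence, Tuple


def ecdf(
    values: Sequence[float],
    mode: str = "standard",
    normalize: bool = True,
) -> List[Tuple[float, float]]:
    """Return (x, cumulative value) pairs for each distinct x in `values`."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")

    points = []
    for x in sorted(set(values)):
        if mode == "standard":
            count = sum(1 for v in values if v <= x)  # at or below x
        elif mode == "reversed":
            count = sum(1 for v in values if v >= x)  # at or above x
        elif mode == "complementary":
            count = sum(1 for v in values if v > x)   # strictly above x
        else:
            raise ValueError(f"unknown mode: {mode}")
        # normalize=True gives probabilities in [0, 1]; False gives raw counts.
        points.append((x, count / n if normalize else count))
    return points


data = [1, 2, 2, 4]
print(ecdf(data, "standard"))       # [(1, 0.25), (2, 0.75), (4, 1.0)]
print(ecdf(data, "reversed"))       # [(1, 1.0), (2, 0.75), (4, 0.25)]
print(ecdf(data, "complementary"))  # [(1, 0.75), (2, 0.25), (4, 0.0)]
```

The quadratic loop is chosen for readability; a sorted cumulative pass would be the better choice for large columns.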
@eugenegujing
Contributor Author

@mengw15 I have finished what Grace asked me to revise. Could you please review it for finalization?

@mengw15
Contributor

mengw15 commented Apr 20, 2026

> @mengw15 I have finished what Grace asked me to revise. Could you please review it for finalization?

Please make sure all conversations are marked resolved once you have addressed them, and @gracecluvohio please take a look at the changes. I will review after Grace's approval.

@gracecluvohio
Contributor

LGTM! @mengw15 please take a look.

Contributor

@mengw15 mengw15 left a comment

Left some comments

@mengw15
Contributor

mengw15 commented Apr 20, 2026

Remember to run scalafmtAll so the formatting checks pass.

@mengw15
Contributor

mengw15 commented Apr 20, 2026

Please update the link to the test CSV file in the PR description; it does not seem to be working right now.

  - Use OperatorInfo.forVisualization(...) helper instead of constructing OperatorInfo manually, matching other visualization operators
  - Rename the empty-string option in the Marginal Plot enum to "none" for clarity, and update the conditional check accordingly
  - Drop the now-unused showLines assertion from ECDFPlotOpDescSpec (the field was removed in the previous commit)
Contributor

@mengw15 mengw15 left a comment

LGTM!

@mengw15 mengw15 merged commit 782069d into apache:main Apr 21, 2026
11 checks passed
Labels

common feature frontend Changes related to the frontend GUI

4 participants