test(workflow-operator): pin Sklearn OpDesc registry strings by aglinxinyuan · Pull Request #4827 · apache/texera

aglinxinyuan · 2026-05-03T04:11:06Z

What changes were proposed in this PR?

Add SklearnOpDescRegistrySpec covering every concrete SklearnClassifierOpDesc (25 subclasses) and SklearnTrainingOpDesc (26 subclasses) with the exact (importStatement, userFriendlyModelName) pair each one returns. A typo in either string would silently misroute either the generated Python pipeline or the user-facing UI label; pinning them in one table makes that a test failure.

Also covers:

SklearnTrainingOpDesc base default (RandomForest)
generatePythonCode smoke test for a classifier and a training subclass, verifying the import string is embedded
Completeness check: classpath scan (via PythonReflectionUtils.scanCandidates, the same scanner PythonCodeRawInvalidTextSpec uses) verifies that classifierEntries / trainingEntries cover every concrete subclass on the classpath. Adding a new concrete OpDesc without also adding it to the manual list now fails this spec with a "drift" message listing the missing class names.

Note: SklearnClassifierOpDesc's empty-string base defaults are not exercised here. The earlier attempt to do so via new SklearnClassifierOpDesc {} created an anonymous test-package class under org.apache.texera.amber.operator.sklearn, which the PythonCodeRawInvalidTextSpec classpath scan picked up as a descriptor candidate and failed to instantiate (anonymous classes have no accessible no-arg constructor). The base default is never observable in production anyway, since every concrete subclass overrides both methods. A spec-file comment documents this so a future contributor doesn't reflexively re-add it.

Also includes a small fix to SklearnDummyClassifierOpDesc / SklearnTrainingDummyClassifierOpDesc (corrects from sklearn.dummy import dummy → import DummyClassifier) and SklearnTrainingBaggingOpDesc (corrects the duplicated word in Training: Bagging Training → Training: Bagging), surfaced while writing the registry table.

Any related issues, documentation, discussions?

Closes #4826

How was this PR tested?

sbt "WorkflowOperator/Test/testOnly org.apache.texera.amber.operator.sklearn.SklearnOpDescRegistrySpec"
# → 107 tests, all pass

sbt "WorkflowOperator/Test/testOnly org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"
# → raw-invalid SUMMARY ok=116/116, py-compile SUMMARY ok=116/116

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

Add SklearnOpDescRegistrySpec covering every concrete SklearnClassifierOpDesc (24) and SklearnTrainingOpDesc (26) with the exact (importStatement, userFriendlyModelName) pair each one returns, plus the base-class defaults and a generatePythonCode smoke test for both pipelines (UDFOperatorV2 / UDFTableOperator). A typo in either string would silently misroute either the generated Python pipeline or the user-facing UI label; pinning them in one table makes that a test failure. Closes apache#4826 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov-commenter · 2026-05-03T04:13:31Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 47.77%. Comparing base (810263c) to head (95a382b).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #4827      +/-   ##
============================================
+ Coverage     42.03%   47.77%   +5.73%     
- Complexity     2154     2159       +5     
============================================
  Files           980      818     -162     
  Lines         36292    25972   -10320     
  Branches       3783     2346    -1437     
============================================
- Hits          15257    12407    -2850     
+ Misses        20110    12813    -7297     
+ Partials        925      752     -173

Flag	Coverage Δ
access-control-service	`39.53% <ø> (ø)`
amber	`42.86% <100.00%> (+0.01%)`	⬆️
computing-unit-managing-service	`0.00% <ø> (ø)`
config-service	`0.00% <ø> (ø)`
file-service	`33.24% <ø> (ø)`
workflow-compiling-service	`47.72% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot

Pull request overview

Adds a ScalaTest spec to prevent regressions in the string “registry” contract for scikit-learn operator descriptors, since typos in (importStatement, userFriendlyModelName) can silently break generated Python pipelines or UI labels.

Changes:

Introduces SklearnOpDescRegistrySpec to pin (import statement, user-friendly model name) for classifier and training OpDesc subclasses.
Adds assertions for base-class defaults on SklearnClassifierOpDesc and SklearnTrainingOpDesc.
Adds generatePythonCode smoke tests verifying the expected import statement is embedded.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…ing name Per Copilot feedback on apache#4827: - SklearnDummyClassifierOpDesc and SklearnTrainingDummyClassifierOpDesc imported the symbol "dummy" from sklearn.dummy, which does not exist. Correct to "DummyClassifier" to produce a runnable Python pipeline. - SklearnTrainingBaggingOpDesc duplicated the word "Training" in the user-friendly model name ("Training: Bagging Training"). Correct to "Training: Bagging" to match the rest of the SklearnTraining* registry pattern. - SklearnDummyClassifierOpDesc was missing from the spec's classifier registry; add it (now 25 entries) so a typo in this OpDesc is caught. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…eRawInvalidTextSpec CI on apache#4827 was failing in `PythonCodeRawInvalidTextSpec` with: - org.apache.texera.amber.operator.sklearn.SklearnOpDescRegistrySpec$$anon$1 cannot instantiate (needs an accessible no-arg constructor or must be a Scala object) The spec's "base class default" test was instantiating the abstract `SklearnClassifierOpDesc` via `new SklearnClassifierOpDesc {}` — an anonymous test-package class under `org.apache.texera.amber.operator.sklearn`. The `PythonCodeRawInvalidTextSpec` classpath scan picks up everything in `org.apache.texera.amber.operator` that extends `PythonOperatorDescriptor` and tries to instantiate it via no-arg constructor (or a Scala object MODULE$); anonymous classes have no accessible no-arg constructor (they capture an enclosing instance), so the scan fails on it. The base-class empty-string default is never observable in production (every concrete subclass below overrides both methods), so this fix just deletes the test and leaves a comment explaining why no future contributor should re-add an anonymous-subclass instantiation here. PythonCodeRawInvalidTextSpec now reports ok=116/116 (was 116/117). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Address Copilot feedback on apache#4827: - Completeness: add classpath-scan completeness assertions for both classifierEntries and trainingEntries. Reuse PythonReflectionUtils .scanCandidates (the same scanner PythonCodeRawInvalidTextSpec uses) so a new concrete SklearnClassifierOpDesc / SklearnTrainingOpDesc subclass cannot silently bypass this registry — adding one without also adding it to the manual list will fail with a "drift" message listing the missing class names. SklearnTrainingOpDesc itself is concrete (used as a default fallback) so it is excluded from the diff for the training side. - Grammar: rephrase the file-level scaladoc from "would silently misroute downstream UI labels and breakage of the generated Python pipeline" to "...labels and cause breakage in the generated Python pipeline". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

aglinxinyuan requested review from Yicong-Huang and Copilot May 3, 2026 04:11

github-actions Bot assigned aglinxinyuan May 3, 2026

github-actions Bot added the common label May 3, 2026

Copilot started reviewing on behalf of aglinxinyuan May 3, 2026 04:11 View session

Copilot AI reviewed May 3, 2026

View reviewed changes

aglinxinyuan enabled auto-merge (squash) May 3, 2026 08:47

Merge branch 'main' into test-sklearn-op-desc-registry-spec

8be3922

aglinxinyuan requested review from Yicong-Huang and removed request for Yicong-Huang May 4, 2026 04:08

Yicong-Huang approved these changes May 4, 2026

View reviewed changes

Yicong-Huang and others added 11 commits May 3, 2026 21:11

Merge branch 'main' into test-sklearn-op-desc-registry-spec

1b8f1d0

Merge branch 'main' into test-sklearn-op-desc-registry-spec

b22042c

Merge branch 'main' into test-sklearn-op-desc-registry-spec

b75dd54

Merge branch 'main' into test-sklearn-op-desc-registry-spec

8f92a7b

Merge branch 'main' into test-sklearn-op-desc-registry-spec

e14ee49

Merge branch 'main' into test-sklearn-op-desc-registry-spec

237585c

Merge branch 'main' into test-sklearn-op-desc-registry-spec

a225b92

Merge branch 'main' into test-sklearn-op-desc-registry-spec

44974be

Merge branch 'main' into test-sklearn-op-desc-registry-spec

93aae94

Merge branch 'main' into test-sklearn-op-desc-registry-spec

fbd1d31

aglinxinyuan requested a review from Copilot May 4, 2026 08:30

Copilot started reviewing on behalf of aglinxinyuan May 4, 2026 08:31 View session

Copilot AI reviewed May 4, 2026

View reviewed changes

aglinxinyuan requested a review from Copilot May 4, 2026 08:41

Copilot started reviewing on behalf of aglinxinyuan May 4, 2026 08:42 View session

Copilot AI reviewed May 4, 2026

View reviewed changes

aglinxinyuan merged commit ef0634d into apache:main May 4, 2026
25 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(workflow-operator): pin Sklearn OpDesc registry strings#4827

test(workflow-operator): pin Sklearn OpDesc registry strings#4827
aglinxinyuan merged 15 commits into
apache:mainfrom
aglinxinyuan:test-sklearn-op-desc-registry-spec

aglinxinyuan commented May 3, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented May 3, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

aglinxinyuan commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this PR?

Any related issues, documentation, discussions?

How was this PR tested?

Was this PR authored or co-authored using generative AI tooling?

Uh oh!

codecov-commenter commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

aglinxinyuan commented May 3, 2026 •

edited

Loading

codecov-commenter commented May 3, 2026 •

edited

Loading