Task Summary
Feature Summary
The HuggingFace inference operator (#5041) is being landed as a sequence of focused task-family PRs. The dispatcher + per-task codegen architecture was introduced in #5277, and subsequent task families plug into that structure by adding dedicated TaskCodegen implementations and registering their task strings in HuggingFaceInferenceOpDesc.
This issue covers adding the question-answering and ranking task families to the HuggingFace inference operator.
Concretely, landing this would enable:
question-answering
table-question-answering
zero-shot-classification
sentence-similarity
text-ranking
The implementation should keep task-specific Python payload and parse logic in a separate QaRankingCodegen file, while shared validation and table setup stay in PythonCodegenBase.
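The split described above can be pictured with a minimal sketch. The real TaskCodegen and CodegenContext signatures are not shown in this issue, so the member names below (tasks, emit, registry, codegenFor) are hypothetical and only illustrate the idea: one codegen object claims a set of task strings, and the dispatcher resolves a task string to it.

```scala
object CodegenSketch {
  // Hypothetical context; the real CodegenContext carries operator fields.
  final case class CodegenContext(task: String)

  // Hypothetical trait shape: each task family claims its task strings.
  trait TaskCodegen {
    def tasks: Set[String]
    def emit(ctx: CodegenContext): String
  }

  object QaRankingCodegen extends TaskCodegen {
    val tasks: Set[String] = Set(
      "question-answering",
      "table-question-answering",
      "zero-shot-classification",
      "sentence-similarity",
      "text-ranking"
    )

    // Placeholder for the task-specific payload and parse logic.
    def emit(ctx: CodegenContext): String =
      "# task-specific payload and parse logic"
  }

  // Dispatcher: one lookup table built from every registered codegen.
  val registry: Map[String, TaskCodegen] =
    Seq[TaskCodegen](QaRankingCodegen)
      .flatMap(c => c.tasks.map(_ -> c))
      .toMap

  // Returns None for unknown task strings instead of throwing.
  def codegenFor(task: String): Option[TaskCodegen] = registry.get(task)
}
```

With this shape, adding another task family would only mean appending one more object to the sequence that builds the registry.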
Proposed Solution or Design
Add a new file under:
common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/
| File | Purpose |
| --- | --- |
| QaRankingCodegen.scala | Payload and response parsing for QA, zero-shot classification, sentence similarity, and text ranking |
Modify:
| File | Change |
| --- | --- |
| HuggingFaceInferenceOpDesc.scala | Add QA/ranking fields and register QaRankingCodegen |
| TaskCodegen.scala | Extend CodegenContext with QA/ranking fields |
| PythonCodegenBase.scala | Add context/sentences column validation and table-QA table payload setup |
| HuggingFaceInferenceOpDescSpec.scala | Add descriptor/codegen coverage for QA/ranking tasks |
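As a rough illustration of the shared column validation, and of this issue's requirements that user-provided strings be embedded safely and that code generation never throw, here is a hedged sketch. It does not use the real EncodableString or pyb"..." API, which is not shown in this issue; pyString is a hypothetical stand-in based on base64, and the mapping from task to required column is an assumption made for the example.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object ValidationSketch {
  // Assumed mapping of task to the extra column it needs (illustrative only).
  private val requiredColumn: Map[String, String] = Map(
    "question-answering" -> "context",
    "sentence-similarity" -> "sentences"
  )

  // Hypothetical stand-in for EncodableString + pyb"...": the user text is
  // emitted as a base64 literal, so quotes or newlines in it cannot change
  // the structure of the generated Python (which must `import base64`).
  def pyString(raw: String): String = {
    val b64 = Base64.getEncoder.encodeToString(raw.getBytes(StandardCharsets.UTF_8))
    s"""base64.b64decode("$b64").decode("utf-8")"""
  }

  // Total: every input, including null or blank values, yields Python source.
  // A bad configuration becomes a runtime error in the generated code rather
  // than an exception during code generation.
  def columnSetup(task: String, column: String): String = {
    val kind = Option(task).flatMap(requiredColumn.get)
    val name = Option(column).map(_.trim).filter(_.nonEmpty)
    (kind, name) match {
      case (Some(k), None)    => s"""raise ValueError("$k column is required for this task")"""
      case (Some(k), Some(n)) => s"${k}_column = ${pyString(n)}"
      case (None, _)          => "pass"
    }
  }
}
```

For example, `ValidationSketch.columnSetup("question-answering", null)` returns a Python `raise ValueError(...)` line instead of throwing in Scala, so the failure surfaces when the generated operator runs.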
Design constraints:
Keep task-specific payload and parse logic in QaRankingCodegen.scala.
Preserve EncodableString + pyb"..." safety for user-provided string fields.
Keep generatePythonCode total so arbitrary @JsonProperty values do not throw during code generation.
References:
Impact / Priority
(P2) Medium: required for broader HuggingFace operator task coverage. Does not affect existing operators.
Affected Area
Workflow Engine (Amber): HuggingFace operator descriptor and Python codegen.
Task Type
Testing / QA
Other