Add HuggingFace question answering and ranking tasks #5292

@anishshiva7

Task Summary

Feature Summary

The HuggingFace inference operator (#5041) is being landed as a sequence of focused task-family PRs. The dispatcher + per-task codegen architecture was introduced in #5277, and subsequent task families plug into that structure by adding dedicated TaskCodegen implementations and registering their task strings in HuggingFaceInferenceOpDesc.

This issue covers adding the question-answering and ranking task families to the HuggingFace inference operator.

Concretely, landing this would enable the following task strings:

  • question-answering
  • table-question-answering
  • zero-shot-classification
  • sentence-similarity
  • text-ranking

The implementation should keep task-specific Python payload and parse logic in a separate QaRankingCodegen file, while shared validation and table setup stay in PythonCodegenBase.
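As a rough illustration of the task-specific payload logic that would live in QaRankingCodegen, here is a minimal Python sketch. The helper name `build_payload` and the config keys (`question`, `context_column`, `text_column`, `sentences_column`, `candidate_labels`) are hypothetical and not taken from the codebase. The payload shapes follow the request formats commonly documented for the hosted HuggingFace inference tasks and should be checked against the actual provider before use. table-question-answering depends on the shared table setup and text-ranking's payload shape is not specified in this issue, so both are left out here.

```python
from typing import Any, Dict


def build_payload(task: str, row: Dict[str, Any], cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Build a request payload for one input row (hypothetical helper).

    `cfg` holds operator settings; all key names here are assumptions.
    """
    if task == "question-answering":
        # Extractive QA: a question plus the context read from a column.
        return {
            "inputs": {
                "question": cfg["question"],
                "context": row[cfg["context_column"]],
            }
        }
    if task == "zero-shot-classification":
        # Classify the text against user-supplied candidate labels.
        return {
            "inputs": row[cfg["text_column"]],
            "parameters": {"candidate_labels": list(cfg["candidate_labels"])},
        }
    if task == "sentence-similarity":
        # Compare one source sentence against a list of sentences.
        return {
            "inputs": {
                "source_sentence": row[cfg["text_column"]],
                "sentences": list(row[cfg["sentences_column"]]),
            }
        }
    raise ValueError(f"unsupported task: {task}")
```

Keeping each branch as a pure function of the row and config is one way to make the payload logic easy to cover in descriptor/codegen tests without calling a model.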

Proposed Solution or Design

Add a new file under:

common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/

| File | Purpose |
| --- | --- |
| QaRankingCodegen.scala | Payload and response parsing for QA, zero-shot classification, sentence similarity, and text ranking |
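The response-parsing half could look roughly like the Python sketch below. It assumes a question-answering response is a dict with `answer` and `score` keys, and that a sentence-similarity response is a flat list of scores in the same order as the input sentences. Both shapes are assumptions about the inference backend, and the function names are hypothetical.

```python
from typing import Any, Dict, List


def parse_question_answering(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the answer text and its confidence (assumed response keys)."""
    return {
        "answer": resp.get("answer", ""),
        "score": float(resp.get("score", 0.0)),
    }


def parse_sentence_similarity(
    sentences: List[str], scores: List[float]
) -> List[Dict[str, Any]]:
    """Pair each sentence with its score, most similar first."""
    if len(sentences) != len(scores):
        raise ValueError("expected one score per input sentence")
    ranked = sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)
    return [{"sentence": s, "score": float(x)} for s, x in ranked]
```

Failing loudly on a length mismatch, rather than silently truncating with `zip`, makes a malformed response visible to the workflow author.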

Modify:

| File | Change |
| --- | --- |
| HuggingFaceInferenceOpDesc.scala | Add QA/ranking fields and register QaRankingCodegen |
| TaskCodegen.scala | Extend CodegenContext with QA/ranking fields |
| PythonCodegenBase.scala | Add context/sentences column validation and table-QA table payload setup |
| HuggingFaceInferenceOpDescSpec.scala | Add descriptor/codegen coverage for QA/ranking tasks |
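The shared pieces planned for PythonCodegenBase (column validation and table-QA table setup) might emit Python along the lines of the sketch below. The function names are hypothetical. The table layout (column name mapped to a list of string cells, sent with a `query`) is an assumption based on the commonly documented table-question-answering request format, and the sketch assumes every row has the same columns.

```python
from typing import Any, Dict, List


def require_columns(available: List[str], required: List[str]) -> None:
    """Fail early if a configured column (e.g. context or sentences) is absent."""
    missing = [c for c in required if c not in available]
    if missing:
        raise ValueError(f"missing input column(s): {', '.join(missing)}")


def rows_to_table(rows: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Pivot row dicts into a column-oriented table of string cells.

    Assumes all rows share the same columns; None becomes an empty string.
    """
    table: Dict[str, List[str]] = {}
    for row in rows:
        for col, value in row.items():
            table.setdefault(col, []).append("" if value is None else str(value))
    return table


def table_qa_payload(query: str, rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble a table-QA request from a query and the input rows."""
    return {"inputs": {"query": query, "table": rows_to_table(rows)}}
```

For example, `table_qa_payload("Which city is largest?", rows)` would send the whole input table with a single query, which is why the table setup is a natural fit for shared code rather than per-row payload logic.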

Design constraints:

References:

Impact / Priority

(P2) Medium — required for broader HuggingFace operator task coverage. Does not affect existing operators.

Affected Area

Workflow Engine (Amber) — HuggingFace operator descriptor and Python codegen.

Task Type

Testing / QA

Other
