feat: domain classification and sensitivity disposition workflows #45
…rewrite-engine-domain-disposition
andreatgretel
left a comment
clean PR overall - the column-factory pattern is well-suited to the single NddAdapter.run_workflow() call, and the Pydantic validation on LLM outputs is thorough. left a few inline nits: one missing export in schemas/__all__, an unused constant, a minor defensive-handling inconsistency in _enrich_domain, and a suggestion around exhaustiveness coverage for _DOMAIN_LIST. nothing blocking - approving.
PR #48 moves this function to its final resting place in the pipeline, so we handle exports there rather than adding another interim export from |
andreatgretel
left a comment
looks good, ship it!
Summary
Implements the first two rewrite pipeline steps as column factories, part of the
broader single-workflow rewrite architecture tracked in #30.
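A minimal sketch of what "column factories" feeding a single workflow could look like. It is illustrative only: ColumnConfig, the two *Sketch classes, collect_columns, and every column name are hypothetical stand-ins, not the types or names used in this PR.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ColumnConfig:
    """Hypothetical stand-in for a real column config type."""
    name: str
    kind: str  # e.g. "llm_structured" or "custom"


class ColumnFactory(Protocol):
    """A step only describes its columns; it has no run() method."""
    def columns(self) -> list[ColumnConfig]: ...


class DomainClassificationSketch:
    def columns(self) -> list[ColumnConfig]:
        # Assumed shape: one LLM-structured column plus one custom enrichment column.
        return [
            ColumnConfig(name="domain", kind="llm_structured"),
            ColumnConfig(name="domain_supplement", kind="custom"),
        ]


class SensitivityDispositionSketch:
    def columns(self) -> list[ColumnConfig]:
        return [ColumnConfig(name="sensitivity_disposition", kind="llm_structured")]


def collect_columns(steps: list[ColumnFactory]) -> list[ColumnConfig]:
    """Flatten every step's columns, in step order, for one workflow run."""
    collected: list[ColumnConfig] = []
    for step in steps:
        collected.extend(step.columns())
    return collected


all_columns = collect_columns(
    [DomainClassificationSketch(), SensitivityDispositionSketch()]
)
print([column.name for column in all_columns])
```

Because each step only returns configuration, the top-level workflow can concatenate the lists and hand them to one execution call instead of running each step separately.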
- engine/rewrite/domain_classification.py: DomainClassificationWorkflow
  - LLMStructuredColumnConfig
  - _enrich_domain custom column looks up per-domain guidance from DOMAIN_SUPPLEMENT_MAP
  - _DOMAIN_LIST (for the prompt) and DOMAIN_SUPPLEMENT_MAP (for enrichment) are intentionally separate
- engine/rewrite/sensitivity_disposition.py: SensitivityDispositionWorkflow
  - SensitivityDispositionSchema
- engine/constants.py: added COL_DOMAIN_SUPPLEMENT

Design Decisions
- Column factory pattern: workflows expose columns() -> list[ColumnConfigT] only. No run() method: all steps will be collected and passed to a single NddAdapter.run_workflow() call in the top-level RewriteWorkflow (tracked in #30).
- Prompt section headers: standardized to XML tags (<privacy_goal>, <input_tagged_text>, etc.), as XML provides the clearest semantic structure across several model families (gpt-oss, claude, nemotron).
- Data summary label: standardized to "Dataset description:" in prompts. The Python param stays data_summary to match AnonymizerConfig.
- Trust DataDesigner output types: _enrich_domain accesses .domain directly on the DomainClassificationSchema Pydantic object. There is no defensive dict/fallback handling, since LLMStructuredColumnConfig guarantees a valid schema instance.

Follow-ups
- Migrate the remaining prompts (_get_validation_prompt, _get_augment_prompt, _get_latent_prompt) to XML section headers and the "Dataset description:" label (TODO in sensitivity_disposition.py)

Related Issues
Closes #31
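
For reference, a rough sketch of the "Trust DataDesigner output types" decision and the intentionally separate _DOMAIN_LIST / DOMAIN_SUPPLEMENT_MAP described above. The constant contents are invented, a dataclass stands in for the Pydantic schema, and both the empty-string default and the domains_without_supplement helper are assumptions rather than behavior taken from this PR.

```python
from dataclasses import dataclass

# Hypothetical values: the real contents of these constants are not shown here.
_DOMAIN_LIST = ["healthcare", "finance", "other"]
DOMAIN_SUPPLEMENT_MAP = {
    "healthcare": "Treat diagnoses and treatment details as sensitive.",
    "finance": "Treat account numbers and balances as sensitive.",
}


@dataclass(frozen=True)
class DomainClassificationSchema:
    """Dataclass stand-in for the Pydantic schema named in the PR."""
    domain: str


def enrich_domain(result: DomainClassificationSchema) -> str:
    """Read .domain directly, with no dict or isinstance fallback on the input."""
    # Assumption: a domain without a supplement maps to an empty string.
    return DOMAIN_SUPPLEMENT_MAP.get(result.domain, "")


def domains_without_supplement() -> list[str]:
    """List prompt domains that have no enrichment entry (hypothetical helper)."""
    return [d for d in _DOMAIN_LIST if d not in DOMAIN_SUPPLEMENT_MAP]


print(enrich_domain(DomainClassificationSchema(domain="finance")))
print(domains_without_supplement())
```

A check along the lines of domains_without_supplement() is one way to make the gap between the two constants explicit in a test, so that a prompt domain without guidance is a deliberate choice rather than an oversight.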