feat(model): remove coreference resolution task by hanneshapke · Pull Request #286 · dataiku/kiji-proxy

hanneshapke · 2026-03-29T23:55:58Z

Summary

- Remove the coreference resolution task from the entire training pipeline
- Rename MultiTaskPIIDetectionModel → PIIDetectionModel, MultiTaskTrainer → PIIModelTrainer
- Remove MultiTaskLoss, the coref classifier, coref loss weights, and coref metrics
- Simplify preprocessing to skip coreference tokenization
- Remove create_coreference_sample from tokenization.py
- Update the ONNX export to output only pii_logits
- Update the eval, comparison, and pipeline scripts for the single-task model
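
A minimal, framework-free sketch of what the training objective reduces to after this change. It assumes the PII head emits one logit vector per token (the pii_logits output) and that padding and non-first subword tokens carry the Hugging Face-style ignore label -100. `token_cross_entropy`, `multitask_loss`, and `single_task_loss` are hypothetical helpers, not functions from this repository; `multitask_loss` only illustrates the general shape of a weighted two-task objective like the removed MultiTaskLoss, not its actual code.

```python
import math
from typing import Sequence

# Assumption: HF-style label for tokens that should not contribute to the loss.
IGNORE_INDEX = -100


def token_cross_entropy(
    pii_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    ignore_index: int = IGNORE_INDEX,
) -> float:
    """Mean token-level cross-entropy over tokens whose label is not ignored."""
    total, count = 0.0, 0
    for logits, label in zip(pii_logits, labels):
        if label == ignore_index:
            continue
        # Cross-entropy = logsumexp(logits) - logits[label];
        # subtracting the max logit keeps exp() from overflowing.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[label]
        count += 1
    return total / count if count else 0.0


def multitask_loss(pii_loss: float, coref_loss: float,
                   pii_weight: float, coref_weight: float) -> float:
    # Hypothetical shape of a weighted two-task objective (before this PR).
    return pii_weight * pii_loss + coref_weight * coref_loss


def single_task_loss(pii_logits: Sequence[Sequence[float]],
                     labels: Sequence[int]) -> float:
    # After this PR: only the PII token-classification loss remains.
    return token_cross_entropy(pii_logits, labels)
```

A framework implementation would typically compute the same reduction on tensors, for example with PyTorch's `torch.nn.functional.cross_entropy`, whose `ignore_index` defaults to -100; the point of the sketch is that no coref term or loss weight is left in the objective.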

Motivation

The coref task competes with PII detection for shared model capacity while being trained on noisy synthetic supervision with untuned loss weights. Removing it dedicates 100% of encoder gradients to PII detection, the primary and most critical task. It also simplifies the codebase significantly (-693 lines).
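
Assuming the removed MultiTaskLoss combined the two task losses as a weighted sum (its exact form is not shown in this PR), the effect on the shared encoder parameters $\theta$ can be written out, with $\lambda_{\text{pii}}$ and $\lambda_{\text{coref}}$ standing in for the untuned loss weights:

$$
\mathcal{L}_{\text{multi}}(\theta) = \lambda_{\text{pii}}\,\mathcal{L}_{\text{pii}}(\theta) + \lambda_{\text{coref}}\,\mathcal{L}_{\text{coref}}(\theta)
\quad\Longrightarrow\quad
\nabla_\theta \mathcal{L}_{\text{multi}} = \lambda_{\text{pii}}\,\nabla_\theta \mathcal{L}_{\text{pii}} + \lambda_{\text{coref}}\,\nabla_\theta \mathcal{L}_{\text{coref}}
$$

$$
\mathcal{L}(\theta) = \mathcal{L}_{\text{pii}}(\theta)
\quad\Longrightarrow\quad
\nabla_\theta \mathcal{L} = \nabla_\theta \mathcal{L}_{\text{pii}}
$$

With the coref term gone, every encoder update follows $\nabla_\theta \mathcal{L}_{\text{pii}}$ alone; in the multi-task setup the coref term could pull the shared weights in a different direction whenever the two gradients disagree.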

Test plan

- Train the model and verify PII F1 is equal to or better than the multi-task baseline
- Verify the ONNX export produces a correct single-output model
- Run eval_model.py and compare_models.py to ensure they work with the new model
- Verify Go proxy inference returns an empty coref dict for backward compatibility
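
The last two checks can be sketched as pure functions. This is a hypothetical Python illustration only: the actual proxy is written in Go, and the response keys `entities` and `coref` are assumed names, since the proxy's real schema is not shown here. `check_single_output` and `build_response` are likewise hypothetical helpers.

```python
from typing import Any, Dict, List, Sequence

# The only graph output the single-task export should expose.
EXPECTED_OUTPUTS = ["pii_logits"]


def check_single_output(output_names: Sequence[str]) -> None:
    """Raise ValueError if the exported graph exposes anything besides pii_logits."""
    names = list(output_names)
    if names != EXPECTED_OUTPUTS:
        raise ValueError(f"expected outputs {EXPECTED_OUTPUTS}, got {names}")


def build_response(entities: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Keep the "coref" key so clients that still read it do not break;
    # the single-task model never populates it.
    return {"entities": entities, "coref": {}}
```

To feed `check_single_output` from an exported file, ONNX Runtime lists graph outputs via `InferenceSession.get_outputs()`, e.g. `[o.name for o in onnxruntime.InferenceSession("model.onnx", providers=["CPUExecutionProvider"]).get_outputs()]`, where `model.onnx` is a placeholder path.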

Closes #259

Remove the coreference resolution task from the entire training pipeline to dedicate 100% of encoder capacity to PII detection. This simplifies the model architecture, training, evaluation, and ONNX export. Changes span model.py, trainer.py, config.py, preprocessing.py, tokenization.py, quantitize.py, train.py, eval_model.py, eval_model_detailed.py, compare_models.py, and training_pipeline.py. Closes #259

hanneshapke added 6 commits March 29, 2026 16:53

- go-fix (a183ac4)
- removed coref (603c46f)
- linter fix (71989c7)
- fix run (2f87e80)
- WIP (3660bb7)

hanneshapke merged commit cb7fd7b into main Mar 31, 2026
6 checks passed

hanneshapke deleted the feat/remove-coreference-task branch March 31, 2026 18:31
