
feat(model): remove coreference resolution task #286

Merged
hanneshapke merged 6 commits into main from feat/remove-coreference-task
Mar 31, 2026

Conversation

@hanneshapke
Collaborator

Summary

  • Remove coreference resolution task from the entire training pipeline
  • Rename MultiTaskPIIDetectionModel → PIIDetectionModel, MultiTaskTrainer → PIIModelTrainer
  • Remove MultiTaskLoss, coref classifier, coref loss weights, coref metrics
  • Simplify preprocessing to skip coreference tokenization
  • Remove create_coreference_sample from tokenization.py
  • Update ONNX export to only output pii_logits
  • Update eval, comparison, and pipeline scripts for single-task model

Motivation

The coref task competes for model capacity while relying on noisy synthetic supervision and untuned loss weights. Removing it dedicates 100% of encoder gradients to PII detection, the primary and most critical task. This also simplifies the codebase significantly (-693 lines).

Test plan

  • Train model and verify PII F1 is equal to or better than the multi-task baseline
  • Verify ONNX export produces correct single-output model
  • Run eval_model.py and compare_models.py to ensure they work with new model
  • Verify Go proxy inference returns empty coref dict for backward compatibility
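
The ONNX check in the test plan could be scripted roughly as follows. This is a minimal sketch, not the repository's actual test: it assumes the exported output is named exactly `pii_logits` (as stated in the summary), that `onnxruntime` is installed, and the helper names `check_single_output` and `onnx_output_names` are hypothetical.

```python
from typing import List, Sequence

# Output name taken from the PR summary; adjust if the export uses another name.
EXPECTED_OUTPUTS = ("pii_logits",)


def check_single_output(output_names: Sequence[str]) -> None:
    """Raise AssertionError unless the model exposes exactly the PII logits output."""
    names = tuple(output_names)
    if names != EXPECTED_OUTPUTS:
        raise AssertionError(f"expected outputs {EXPECTED_OUTPUTS}, got {names}")


def onnx_output_names(model_path: str) -> List[str]:
    """Read output names from an exported model (requires the onnxruntime package)."""
    # Imported lazily so check_single_output stays dependency-free.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return [out.name for out in session.get_outputs()]
```

Usage would look like `check_single_output(onnx_output_names("model.onnx"))`, where the path is a placeholder for wherever the export script writes the model.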

Closes #259

…tion

Remove the coreference resolution task from the entire training pipeline
to dedicate 100% of encoder capacity to PII detection. This simplifies
the model architecture, training, evaluation, and ONNX export.

Changes across model.py, trainer.py, config.py, preprocessing.py,
tokenization.py, quantitize.py, train.py, eval_model.py,
eval_model_detailed.py, compare_models.py, and training_pipeline.py.

Closes #259
@hanneshapke hanneshapke merged commit cb7fd7b into main Mar 31, 2026
6 checks passed
@hanneshapke hanneshapke deleted the feat/remove-coreference-task branch March 31, 2026 18:31

Development

Successfully merging this pull request may close these issues.

feat(model): disable coreference resolution task and focus training on PII detection
