
Workflow question: Can you restrict automatic NER pre-annotation to simplify the annotation process? #528

@LeJudith


Thanks for the help so far!

Problem

I am trying to use MedCAT to annotate the main clinically relevant findings in pathology reports, using a large pretrained SNOMED International CDB in MedCATtrainer. My goal is not to annotate every SNOMED concept in the text. I am mainly interested in the key findings, for example histological type, grade, ER/PR/HER2/Ki-67, margins and lymphovascular invasion (highlighted in blue in the screenshot). Currently, each document is automatically pre-annotated with many concepts that are not relevant for my use case, such as "material", "size", "cells" and "protocol". I still have to manually mark each of them as "terminate" or "incorrect" before I can submit the document, which makes annotation quite slow. I suspect my use case is quite different from the intended use of MedCAT, but I am wondering whether there is a better way.

I have a curated whitelist of ~200 clinically relevant CUIs per organ. The project's CUI File filter restricts concept lookup but does not restrict automatic pre-annotation, so irrelevant concepts outside the whitelist are still recognised automatically (shown in grey).
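If no built-in option exists, one possible workaround is to post-process the model output before it is loaded as pre-annotations, dropping every detected concept whose CUI is not on the whitelist. The following is a minimal, library-agnostic sketch of that idea only. It assumes the whitelist is stored as a JSON array of CUI strings and that each detected entity is a dict with a `cui` key; `load_cui_whitelist` and `filter_pre_annotations` are hypothetical helpers, not part of the MedCAT or MedCATtrainer API.

```python
import json
from typing import Dict, Iterable, List, Set


def load_cui_whitelist(path: str) -> Set[str]:
    """Load whitelisted CUIs from a JSON file.

    Assumption (hypothetical format): the file is a plain JSON array,
    e.g. ["CUI_A", "CUI_B"].
    """
    with open(path, encoding="utf-8") as fh:
        return {str(cui) for cui in json.load(fh)}


def filter_pre_annotations(entities: Iterable[Dict], whitelist: Set[str]) -> List[Dict]:
    """Keep only automatically detected entities whose CUI is whitelisted.

    Assumption: each entity is a dict with at least a "cui" key. Entities
    without a "cui" key are dropped. This is a post-processing step, not a
    MedCAT or MedCATtrainer setting; the full CDB is left untouched, so it
    stays available for manual concept lookup.
    """
    return [ent for ent in entities if ent.get("cui") in whitelist]
```

For example, with placeholder CUIs, `filter_pre_annotations([{"cui": "CUI_GRADE"}, {"cui": "CUI_SIZE"}], {"CUI_GRADE"})` keeps only the first entity. Filtering the output rather than shrinking the CDB is what would keep manual lookup of non-whitelisted concepts possible.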

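[Screenshot: MedCATtrainer annotation view; relevant findings highlighted in blue, other automatic pre-annotations in grey]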

Questions

  • Is there a supported way to disable automatic NER pre-annotation entirely, while keeping the full CDB available for manual concept lookup?
  • Alternatively, can I restrict automatic pre-annotation to only the CUIs in the project CUI File, while keeping the full CDB available for manual lookup when concepts are missing from the whitelist?
  • Or would it be better to build a small CDB containing only the ~200 whitelisted concepts? If so, can I still manually search for and annotate concepts outside that CDB using the CDB search filter, or does every such concept need to be added in full (CUI, name and synonyms), which also seems inefficient?
  • Or is the best approach to continue with the full CDB and terminate unwanted concepts as negative training examples?
