
NadiaSaeed/MedTCS


MedTCS: Medical terminology-based computing system: a lightweight post-processing solution for out-of-vocabulary multi-word terms [1]

This project provides a Medical Terminology-based Computing System (MedTCS): a lightweight post-processing solution for out-of-vocabulary (OOV) multi-word terms. MedTCS is a natural language processing system that helps distributed representation models (e.g., Word2Vec, GloVe) handle the OOV problem effectively.

The image below shows how the components of biomedical/clinical terms convey meaningful information.

Meta-data Collection

In MedTCS, we have built meta-dictionaries of prefixes, roots, and suffixes that define the meanings of medical term components. The three semantic dictionaries contain 467 root words, 432 prefixes, and 112 suffixes, along with their corresponding meanings, as shown in Fig. 1.
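To make the data structure concrete, the three meta-dictionaries can be pictured as plain mappings from a term component to its meaning. The sketch below is a hypothetical, tiny excerpt: the names `PREFIXES`, `ROOTS`, `SUFFIXES`, and `meaning_of` are illustrative, and the entries are assumptions loosely based on the expansions in the Example section, not the actual MedTCS data files.

```python
from __future__ import annotations

# Hypothetical excerpt of the three MedTCS meta-dictionaries (component -> meaning).
# The real dictionaries hold 432 prefixes, 467 roots and 112 suffixes; these few
# entries are illustrative assumptions, not copied from the MedTCS data files.
PREFIXES: dict[str, str] = {
    "supra": "above excessive superior",
}
ROOTS: dict[str, str] = {
    "noct": "night",
    "pub": "pubis portion of pelvic bone",
    "urethro": "urethra",
    "cyst": "bladder or cyst",
    "trigon": "trigone",
}
SUFFIXES: dict[str, str] = {
    "uria": "urination urine",
    "itis": "inflammation",
}


def meaning_of(component: str) -> str | None:
    """Look a component up in the prefix, root, then suffix dictionary."""
    for table in (PREFIXES, ROOTS, SUFFIXES):
        if component in table:
            return table[component]
    return None  # component not covered by any dictionary
```

For example, `meaning_of("itis")` returns `"inflammation"`, while `meaning_of("ic")` returns `None` because that suffix is not in this toy excerpt.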

MedTCS Framework

The MedTCS module encodes OOV words from a set of sentences or words in the following steps:

  • OOV Word Detector
  • Pluralizer/Singularizer
  • Term Parser
  • Term Segmenter
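The first two steps can be sketched as follows, assuming the embedding model's vocabulary is available as a Python set. The suffix rules in `PLURAL_RULES` and the functions `number_variants`, `lookup`, and `detect_oov` are hypothetical; they only illustrate why a pluralizer/singularizer runs before a word is declared OOV (e.g., `tablets` should resolve to an in-vocabulary `tablet`). The Term Parser and Term Segmenter steps are not covered here.

```python
from __future__ import annotations

from typing import Iterator

# Naive plural -> singular suffix rules (illustrative assumptions, not MedTCS's rules).
PLURAL_RULES = [("ices", "ix"), ("ae", "a"), ("ies", "y"), ("ses", "sis"), ("s", "")]


def number_variants(word: str) -> Iterator[str]:
    """Pluralizer/Singularizer (sketch): yield singular candidates, then a simple plural."""
    for plural, singular in PLURAL_RULES:
        if word.endswith(plural):
            yield word[: -len(plural)] + singular
    yield word + "s"


def lookup(word: str, vocabulary: set[str]) -> str | None:
    """Return the in-vocabulary form of a word, or None if no variant is known."""
    word = word.lower()
    if word in vocabulary:
        return word
    for variant in number_variants(word):
        if variant in vocabulary:
            return variant
    return None


def detect_oov(tokens: list[str], vocabulary: set[str]) -> list[str]:
    """OOV Word Detector (sketch): keep only tokens with no in-vocabulary form."""
    return [tok for tok in tokens if lookup(tok, vocabulary) is None]
```

Checking each generated variant against the vocabulary, instead of trusting a single rule, keeps over-eager rules (such as stripping a final `s`) from producing words the model does not know.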

MedTCS: Term Segmenter Model

The pre-trained term segmenter model returns meaningful sub-words of an unknown term (e.g., seasickness → sea+sick+ness).
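MedTCS uses a pre-trained model for this step. As a stand-in that shows the idea, the sketch below swaps in a plain dictionary-based segmentation: dynamic programming that splits a term into the fewest pieces found in a known sub-word lexicon. The lexicon and the `segment` function are assumptions for illustration; unlike a learned model, this approach fails (returns `None`) whenever part of the term is missing from the lexicon.

```python
from __future__ import annotations


def segment(term: str, lexicon: set[str]) -> list[str] | None:
    """Split a term into the fewest known sub-words, or return None if impossible."""
    n = len(term)
    max_len = max((len(w) for w in lexicon), default=0)
    # best[i]: fewest pieces covering term[:i]; prev[i]: start index of the last piece.
    best: list[float] = [0] + [float("inf")] * n
    prev = [-1] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] + 1 < best[end] and term[start:end] in lexicon:
                best[end] = best[start] + 1
                prev[end] = start
    if best[n] == float("inf"):
        return None  # some part of the term is not in the lexicon
    # Walk the back-pointers from the end of the term to recover the pieces.
    pieces, end = [], n
    while end > 0:
        pieces.append(term[prev[end]:end])
        end = prev[end]
    return pieces[::-1]
```

With a toy lexicon `{"sea", "sick", "ness", "urethro", "cyst", "trigon", "itis"}`, `segment("seasickness", ...)` yields `["sea", "sick", "ness"]` and `segment("urethrocystitis", ...)` yields `["urethro", "cyst", "itis"]`.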

Example

  • Original input: flavoxate hydrochloride tablets are indicated for symptomatic relief of dysuria urgency nocturia suprapubic pain frequency and incontinence as may occur in cystitis prostatitis urethritis urethrocystitis urethrotrigonitis
  • OOV terms: flavoxate hydrochloride tablets are indicated for symptomatic relief of dysuria urgency OOV OOV pain frequency and incontinence as may occur in cystitis prostatitis urethritis OOV OOV
  • MedTCS’s input to encoder: flavoxate hydrochloride tablets are indicated for symptomatic relief of dysuria urgency night urination urine above excessive superior pubis portion of pelvic bone ic pain frequency and incontinence as may occur in cystitis prostatitis urethritis urethra bladder or cyst inflammation urethra trigone inflammation

The above sentence is taken from RxList; the OOV terms are those missing from the GloVe-Twitter-200d vocabulary, and their meanings are estimated with MedTCS.
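The expansion in the example above can be reproduced with a small sketch: each OOV token is replaced by the meanings of its components, and a component with no known meaning (such as `ic` in *suprapubic*) is kept verbatim. The `SEGMENTS` splits, the `MEANINGS` entries, and the `expand` function are assumptions chosen to match the example output; they are not taken from MedTCS itself.

```python
from __future__ import annotations

# Assumed component meanings, chosen to match the expansions in the example.
MEANINGS = {
    "noct": "night",
    "uria": "urination urine",
    "supra": "above excessive superior",
    "pub": "pubis portion of pelvic bone",
    "urethro": "urethra",
    "cyst": "bladder or cyst",
    "trigon": "trigone",
    "itis": "inflammation",
}

# Assumed Term Segmenter output for the four OOV terms in the example.
SEGMENTS = {
    "nocturia": ["noct", "uria"],
    "suprapubic": ["supra", "pub", "ic"],
    "urethrocystitis": ["urethro", "cyst", "itis"],
    "urethrotrigonitis": ["urethro", "trigon", "itis"],
}


def expand(sentence: str, vocabulary: set[str]) -> str:
    """Replace segmentable OOV tokens with the meanings of their components."""
    out: list[str] = []
    for tok in sentence.split():
        if tok in vocabulary or tok not in SEGMENTS:
            out.append(tok)  # known word, or no segmentation available
        else:
            # Components without a known meaning (e.g. "ic") are kept as-is.
            out.extend(MEANINGS.get(part, part) for part in SEGMENTS[tok])
    return " ".join(out)
```

Applied to the original input with a vocabulary that lacks the four OOV terms, this produces the "MedTCS’s input to encoder" sentence shown above.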

The MedTCS module enables word embedding models to effectively encode vectors for OOV terms from within their own search space, i.e., using words the models already know.
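How the expanded words become a vector depends on the encoder. One simple option, assumed here and not necessarily what MedTCS does, is to average the pre-trained vectors of the in-vocabulary expansion words. The `oov_vector` helper and the toy 2-dimensional `embeddings` mapping are hypothetical.

```python
from __future__ import annotations


def oov_vector(expansion: str, embeddings: dict[str, list[float]]) -> list[float] | None:
    """Average the vectors of the in-vocabulary words in an OOV term's expansion."""
    vectors = [embeddings[w] for w in expansion.split() if w in embeddings]
    if not vectors:
        return None  # nothing in the expansion is known to the model
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy usage: an expansion such as "urethra trigone inflammation" maps to the
# mean of the three word vectors.
embeddings = {"urethra": [1.0, 0.0], "trigone": [0.0, 1.0], "inflammation": [1.0, 1.0]}
vec = oov_vector("urethra trigone inflammation", embeddings)
```

Averaging ignores word order and weights every expansion word equally; feeding the expanded sentence to a contextual encoder, as the example does, avoids that limitation.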

References

[1] Saeed, Nadia, and Hammad Naveed. "Medical terminology-based computing system: a lightweight post-processing solution for out-of-vocabulary multi-word terms." Frontiers in Molecular Biosciences 9 (2022): 928530.