# Acronym resolution (`acres`) demo

This is a demo of the `acres` project.

## Configuration

### Working directory

All `acres` code should be run from the repository root folder.

In [62]:
import os

# The repository root is expected to contain the "notebooks" folder.
cwd = os.getcwd()
if "notebooks" not in os.listdir(cwd):
    print("Wrong working directory: " + cwd)

os.getcwd()

'/Users/michel/git/acres'

### Disable gensim warnings

This workaround is only needed with gensim 3.6.0. Note that the filter below silences all Python warnings, not just those raised by gensim.

In [63]:
import warnings
warnings.filterwarnings('ignore')

## Using word2vec

We trained a word2vec model using [gensim](https://radimrehurek.com/gensim/models/word2vec.html) on a cardiology corpus written in German. We obtained the best results with a continuous bag-of-words (CBOW) model and a window size of 2.
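
To make the window size concrete, the sketch below lists the (context, target) pairs that a CBOW model with a window of 2 would learn from. It is a simplified illustration only: the function `cbow_pairs` and the example sentence are hypothetical and not part of `acres` or gensim, and real training involves further details (such as subsampling) that are ignored here.

```python
def cbow_pairs(tokens, window=2):
    """Return (context, target) pairs for a symmetric context window."""
    pairs = []
    for i, target in enumerate(tokens):
        # Up to `window` tokens on each side of the target form its context.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


# Hypothetical example sentence (not taken from the training corpus).
sentence = ["Patient", "in", "gutem", "Allgemeinzustand", "entlassen"]
for context, target in cbow_pairs(sentence):
    print(target, "<-", context)
```

In CBOW, the model predicts each target word from its context, so words that share contexts end up with similar vectors.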

### Raw results from word2vec

In [64]:
from acres.nn import test

def word2vec_raw(acronym):
    return test.find_candidates(acronym)

The model allows us to get words that occur in a similar context.

In [65]:
word2vec_raw("gutem")

['zufriedenstellendem',
 'gebessertem',
 'verbessertem',
 'kardiorespiratstabilem',
 'gebesserten',
 'reduziertem',
 'stabilem',
 'altersentsprechendem',
 'stabilen',
 'stationrer Behandlung']
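
As a rough illustration of what "similar context" means in practice, the sketch below ranks words by the cosine similarity of their vectors, which is the usual similarity measure for word2vec embeddings. The three-dimensional vectors are made up for the example, and `cosine` and `nearest` are hypothetical helpers; how `test.find_candidates` queries the trained model internally is not shown in this notebook.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors, topn=2):
    """Return the `topn` words whose vectors are closest to `word`."""
    query = vectors[word]
    scored = [
        (cosine(query, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:topn]]


# Made-up toy vectors; real embeddings have hundreds of dimensions.
toy_vectors = {
    "gutem": [0.9, 0.1, 0.0],
    "stabilem": [0.8, 0.2, 0.1],
    "reduziertem": [0.7, 0.1, 0.3],
    "Herzfrequenz": [0.0, 0.9, 0.4],
}
print(nearest("gutem", toy_vectors))
```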

It can also be used for spell checking, as typos occur in contexts similar to those of the correct spelling.

In [66]:
word2vec_raw("Cardiomyopathie")

['Kardioymopathie',
 'Kardiomoypathie',
 'Kardiomyopatie',
 'Kardiomypathie',
 'Kardiopmyopathie',
 'Kardiomoyopathie',
 'DCMP',
 'Kardiomyoapthie',
 'Dysplasie',
 'Kardiomyopathien']

It turns out that acronym expansions also occur in contexts similar to those of the acronym itself:

In [67]:
word2vec_raw("SR")

['Sinusrhythmus',
 'biventrikulrer Stimulation',
 'Sr',
 'Schrittmacherrhythmus',
 'Konversion',
 'SMRhythmus',
 'SMEKG',
 'Am Untersuchungsbeginn',
 'bergeleitetes Vorhofflimmern',
 'ÐÐÐJ']

In [68]:
word2vec_raw("HF")

['Herzfrequenz',
 'HFAnstieg',
 'Frequenz',
 'Herzfrequenzanstieg',
 'RRAnstieg',
 'Herzfrequenzen',
 'Gradienten',
 'ÐÐÐmin',
 'Hf',
 'TachykardieZykluslnge']

In [69]:
word2vec_raw("GGT")

['yGT',
 'ALT',
 'Harnstoff',
 'LDH',
 'Trigl',
 'CK',
 'L Thyroxin',
 'Ferritin',
 'Euthyrox',
 'Myoglobin']

Because the model is semantic, it also works for non-trivial expansions whose letters do not match the acronym, like `RR => Blutdruck`.

In [70]:
word2vec_raw("RR")

['Blutdruck',
 'BZ',
 'Mittlerer Nachtwert',
 'Blutdruckes',
 'Leberwerte',
 'Mittlerer Tageswert',
 'Vertrglichkeit',
 'CRP',
 'Blutdruckwerten',
 'Blutruck']

(RR is a common acronym for "blood pressure" in German. It stems from the inventor of the cuff-based sphygmomanometer, the Italian physician Scipione Riva-Rocci 🙂)


It also works for terms borrowed from other languages (e.g. English), such as `EF => Auswurffraktion` (ejection fraction).

In [71]:
word2vec_raw("EF")

['LVEF',
 'Auswurffraktion',
 'LVF',
 'Simpson',
 'LVEDD',
 'global',
 'CO Diffusionskapazitt',
 'GFR',
 'LVFunktion',
 'visuell']

We also enriched the model with common collocations provided by the `Phraser` module so that it also works for acronyms with multiword expansions:

In [72]:
word2vec_raw("AP")

['APSymptomatik',
 'Angina pectoris',
 'Stenokardien',
 'Crescendo AP',
 'Angina pectorisSymptomatik',
 'BelastungsAP',
 'Atemnot',
 'Leistungsminderung',
 'Angina Pectoris',
 'aortenstenosespezifische']
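
The idea behind collocation detection can be sketched as follows: word pairs that co-occur often enough are merged into a single token before training, so that a phrase like "Angina pectoris" gets its own vector. This is a deliberately simplified stand-in, not gensim's implementation: gensim scores candidate pairs statistically rather than with a raw count threshold, and the function `merge_collocations`, the `min_count` value, the space delimiter, and the toy corpus are all assumptions made for this example.

```python
from collections import Counter


def merge_collocations(sentences, min_count=2):
    """Merge adjacent word pairs seen at least `min_count` times."""
    bigrams = Counter()
    for tokens in sentences:
        bigrams.update(zip(tokens, tokens[1:]))
    frequent = {pair for pair, n in bigrams.items() if n >= min_count}

    merged = []
    for tokens in sentences:
        out, i = [], 0
        while i < len(tokens):
            pair = tuple(tokens[i:i + 2])
            if len(pair) == 2 and pair in frequent:
                # Treat the frequent pair as one multiword token.
                out.append(" ".join(pair))
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        merged.append(out)
    return merged


# Hypothetical toy corpus (not taken from the training data).
toy_corpus = [
    ["keine", "Angina", "pectoris"],
    ["typische", "Angina", "pectoris", "bei", "Belastung"],
    ["AP", "in", "Ruhe"],
]
for tokens in merge_collocations(toy_corpus):
    print(tokens)
```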

It does not work, however, for acronyms that are never expanded in the collection:

In [73]:
word2vec_raw("EKG")

['HolterEKG',
 'RuheEKG',
 'LangzeitEKG',
 'KontrollEKG',
 'Am Untersuchungsbeginn',
 'Echo',
 'AnfallsEKG',
 'ÐÐStdEKG',
 'AufnahmeEKG',
 'Schdel CT']

### Filtered results

For proper acronym expansion, we further filter results using hand-crafted rules.

In [74]:
from acres.evaluation import evaluation

def word2vec_filtered(acronym):
    return evaluation.cached_resolve(acronym, "", "", evaluation.Strategy.WORD2VEC)

In [75]:
word2vec_filtered("SR")

['Sinusrhythmus', 'Schrittmacherrhythmus', 'SMRhythmus']

In [76]:
word2vec_filtered("HF")

['Herzfrequenz', 'Herzfrequenzanstieg', 'Herzfrequenzen']

In [77]:
word2vec_filtered("AP")

['Angina pectoris', 'Angina pectorisSymptomatik', 'Angina Pectoris']

Unfortunately, the current rules are too restrictive and reject semantic expansions such as `RR => Blutdruck` and `EF => Auswurffraktion`:

In [78]:
word2vec_filtered("RR")

[]

In [79]:
word2vec_filtered("EF")

[]
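
To show why letter-based rules cannot recover semantic expansions, here is a minimal sketch of one such rule. It is hypothetical and is not the rule set used by `acres`: it only requires that a candidate starts with the acronym's first letter and contains the remaining letters in order. For instance, it would accept `HFAnstieg` for `HF`, which the actual `acres` rules filter out.

```python
def is_plausible_expansion(acronym, candidate):
    """Hypothetical rule: acronym letters appear in order in the candidate."""
    acro = acronym.lower()
    cand = candidate.lower()
    # Reject empty acronyms, mere case variants, and wrong initial letters.
    if not acro or cand == acro or not cand.startswith(acro[0]):
        return False
    position = 0
    for letter in acro:
        position = cand.find(letter, position)
        if position == -1:
            return False
        position += 1
    return True


def filter_candidates(acronym, candidates):
    """Keep only candidates that pass the letter-based rule."""
    return [c for c in candidates if is_plausible_expansion(acronym, c)]


sr_candidates = ["Sinusrhythmus", "Sr", "Schrittmacherrhythmus",
                 "Konversion", "SMRhythmus", "SMEKG"]
print(filter_candidates("SR", sr_candidates))

# A semantic expansion shares no initial letter with its acronym.
print(filter_candidates("RR", ["Blutdruck", "Blutdruckes", "CRP"]))
```

Any rule of this kind rejects `Blutdruck` for `RR`, because the expansion shares no letters with the acronym in the required positions.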