# Map your concepts to UMLS CUIs

We assume that you have some related medical concepts (keywords) paired with images in your dataset.

If so, we recommend using the [UMLS Metathesaurus Browser](https://uts.nlm.nih.gov/uts/umls/home) to map your concepts to existing CUIs, as shown below.

For example, nodule corresponds to C0028259.

<img src="img/mapping.png">
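Once concepts are mapped by hand, it can help to keep the mapping in one place and check that each identifier is well formed. A CUI is the letter `C` followed by seven digits. The sketch below is only an illustration: `CONCEPT_TO_CUI` and `is_valid_cui` are hypothetical names, and only the `nodule` entry comes from the example above.

```python
import re

# A CUI is the letter "C" followed by exactly seven digits.
CUI_PATTERN = re.compile(r"C\d{7}")


def is_valid_cui(cui):
    """Return True if `cui` has the shape of a UMLS CUI (format check only)."""
    return isinstance(cui, str) and CUI_PATTERN.fullmatch(cui) is not None


# Hypothetical lookup table for a dataset; extend it with your own concepts.
CONCEPT_TO_CUI = {"nodule": "C0028259"}

for concept, cui in CONCEPT_TO_CUI.items():
    assert is_valid_cui(cui), f"Malformed CUI for {concept!r}: {cui}"
```

This only checks the format. It does not confirm that the CUI exists in the Metathesaurus.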

# For practical use, we advise using the following API

import requests
from lxml.html import fromstring
from cachetools import cached, TTLCache

# The ticket-granting ticket (TGT) is long-lived, so its URL is cached for 7 hours.
TTL_7HRS = TTLCache(maxsize=2, ttl=25200)

class Auth:
    HEADERS = {
        'Content-type': 'application/x-www-form-urlencoded',
        'Accept': 'text/plain',
        'User-Agent': 'python'
    }

    def __init__(self, api_key):
        self._api_key = api_key

    @cached(TTL_7HRS)
    def _get_ticket_granting_ticket_url(self):
        url = 'https://utslogin.nlm.nih.gov/cas/v1/api-key'
        resp = requests.post(
            url, data={'apikey': self._api_key}, headers=self.HEADERS
        )
        resp.raise_for_status()
        # The response is an HTML form whose action attribute is the TGT URL.
        html = fromstring(resp.text)
        return html.xpath('//form/@action')[0]

    def get_single_use_service_ticket(self):
        # A service ticket can be used only once, so request a fresh one per call
        # instead of caching it.
        resp = requests.post(
            self._get_ticket_granting_ticket_url(),
            data={'service': 'http://umlsks.nlm.nih.gov'},
            headers=self.HEADERS
        )
        resp.raise_for_status()
        return resp.text

class API:
    BASE_URL = 'https://uts-ws.nlm.nih.gov/rest'

    def __init__(self, *, api_key, version='current'):
        self._auth = Auth(api_key=api_key)
        self._version = version

    def get_cui(self, cui):
        url = f'{self.BASE_URL}/content/{self._version}/CUI/{cui}'
        return self._get(url=url)
    
    def get_cui_code(self, keyword):
        url = f'{self.BASE_URL}/search/{self._version}'
        # Pass the keyword as a query parameter so that requests URL-encodes it.
        candidates = self._get(url=url, params={'string': keyword})['result']['results']
        if not candidates:
            return None
        # Print up to three top candidates for manual inspection.
        for i, candidate in enumerate(candidates[:3], start=1):
            cui = candidate['ui']
            name = candidate['name']
            print(f'No.{i} => CUI = {cui}, Concept Name: {name}')
        return candidates[0]['ui']

    def _get(self, url, params=None):
        ticket = self._auth.get_single_use_service_ticket()
        resp = requests.get(url, params={**(params or {}), 'ticket': ticket})
        resp.raise_for_status()
        return resp.json()
    
    
# Replace the placeholder with your own UMLS API key; never commit a real key.
cui_code = API(api_key='YOUR_UMLS_API_KEY').get_cui_code(keyword='nodule')
print(cui_code)

No.1 => CUI = C0028259, Concept Name: Nodule
No.2 => CUI = C2700389, Concept Name: Plant nodule
No.3 => CUI = C0228505, Concept Name: Nodulus cerebelli
C0028259
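A dataset usually contains many keywords, often with repeats and inconsistent casing. The sketch below shows one possible way to map them in bulk while calling the lookup only once per distinct keyword. The names `map_keywords_to_cuis` and `fake_lookup` are hypothetical; in practice `lookup` would be something like the `get_cui_code` method above, which is an assumption about how you wire it in.

```python
def map_keywords_to_cuis(keywords, lookup):
    """Map each keyword to a CUI, calling `lookup` once per distinct keyword."""
    cache = {}    # normalized keyword -> CUI (or None if nothing was found)
    mapping = {}  # original keyword -> CUI
    for keyword in keywords:
        key = keyword.strip().lower()
        if not key:
            continue  # skip empty keywords
        if key not in cache:
            cache[key] = lookup(key)
        mapping[keyword] = cache[key]
    return mapping


# Stand-in lookup for demonstration only; replace it with a real API call.
def fake_lookup(keyword):
    return {"nodule": "C0028259"}.get(keyword)


print(map_keywords_to_cuis(["Nodule", "nodule ", "unknown term"], fake_lookup))
```

Normalizing before the lookup keeps the number of requests down. Keeping the original keyword as the key makes it easy to join the result back onto the dataset.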


# If you do not have concepts paired with your images

In this case, we advise you to extract CUIs from the paired text.

Several tools are available (e.g. [QuickUMLS](https://github.com/Georgetown-IR-Lab/QuickUMLS), [MedCAT](https://github.com/CogStack/MedCAT), [scispaCy](https://github.com/allenai/scispacy)).

Here is an example using **scispaCy**:

import spacy
import scispacy
from scispacy.linking import EntityLinker

spacy.prefer_gpu()
nlp = spacy.load("en_core_sci_scibert")
nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True, "linker_name": "umls"})
linker = nlp.get_pipe("scispacy_linker")

def parse_a_text(text):
    entities = []
    doc = nlp(text)
    for ent in doc.ents:
        # Noise filtering: skip single-character entities.
        if len(ent.text) == 1:
            continue
        # Keep only UMLS candidates with a linking score of at least 0.95.
        # Each item in ent._.kb_ents is a (CUI, score) pair.
        cuis = [cui for cui, score in ent._.kb_ents if score >= 0.95]
        if len(cuis) == 0:
            continue
        # Record the character span, the surface text, and the first remaining CUI.
        entities.append((ent.start_char, ent.end_char, ent.text, cuis[0]))
    return entities

entities = parse_a_text('The breast mammogram (cranio-caudal view) showing an interval development of a suspicious grouped microcalcification in the upper outer quadrant of the right breast')
print(entities)

[(90, 97, 'grouped', 'C0439745'), (98, 116, 'microcalcification', 'C0520594'), (152, 164, 'right breast', 'C0222600')]
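To pair an image with its concepts, the entity tuples usually need to be reduced to a list of CUIs. The sketch below is one possible post-processing step, not part of scispaCy: `unique_cuis` is a hypothetical helper, and the input is copied from the output shown above.

```python
def unique_cuis(entities):
    """Return the CUIs from (start, end, text, cui) tuples, in order, without repeats."""
    seen = set()
    ordered = []
    for _start, _end, _text, cui in entities:
        if cui not in seen:
            seen.add(cui)
            ordered.append(cui)
    return ordered


# Entity tuples as printed in the example output.
entities = [
    (90, 97, 'grouped', 'C0439745'),
    (98, 116, 'microcalcification', 'C0520594'),
    (152, 164, 'right breast', 'C0222600'),
]
print(unique_cuis(entities))
```

Preserving first-occurrence order keeps the result deterministic, which helps when the CUI list is used as a training label.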
