### Extra credit

- Setup instructions for MetaMap and pymetamap are available from [Gary Weissman](https://gweissman.github.io/post/using-metamap-with-python-to-access-the-umls-metathesaurus-a-quick-start-guide/) and the [NIH](https://lhncbc.nlm.nih.gov/ii/tools/MetaMap/documentation/Installation.html#metamap-installation).
- The `test_output_human (2).txt` file is generated in `DL.ipynb`. Its first line is the word "term"; the remaining lines contain the SSIs and AEs extracted from the test data. A line is empty if no SSIs or AEs were extracted.

In [1]:
# import libraries
from pymetamap import MetaMap
import os
import re
from tqdm import tqdm
from time import sleep

# Set up MetaMap paths (adjust to the local installation)
metamap_base_dir = '/Users/randy/Downloads/public_mm/'
metamap_bin_dir = 'bin/metamap18'
metamap_pos_server_dir = 'bin/skrmedpostctl'
metamap_wsd_server_dir = 'bin/wsdserverctl'

# Start the POS tagger and WSD servers; MetaMap requires both
os.system(metamap_base_dir + metamap_pos_server_dir + ' start')
os.system(metamap_base_dir + metamap_wsd_server_dir + ' start')

# The servers need time to finish starting; uncomment to wait before calling MetaMap
# sleep(10)

Starting skrmedpostctl: 
started.
Starting wsdserverctl: 
started.


0

loading properties file /Users/randy/Downloads/public_mm//WSD_Server/config/disambServer.cfg
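
A fixed `sleep` either waits too long or not long enough. As a rough alternative, the sketch below polls a TCP port until something accepts connections. `wait_for_port` is a hypothetical helper and not part of pymetamap or MetaMap; it only checks that the port is open, not that the server has finished loading its data.

```python
import socket
import time


def wait_for_port(port, host="localhost", timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: an open port suggests the server process is up,
    but it does not prove that the server is fully initialised.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet; wait briefly and try again.
            time.sleep(interval)
    return False
```

The WSD server log in this notebook mentions port 5554, so `wait_for_port(5554)` would be the natural check for it. The port of the POS tagger server is not shown here; 1795 is an assumption about the default configuration and should be verified against the local installation.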


In [2]:
# instantiate metamap 
metam = MetaMap.get_instance(metamap_base_dir + metamap_bin_dir)

In [28]:
test_str = "isoechoic nodule of left lobe of thyroid"

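metam.extract_concepts([test_str],
                       word_sense_disambiguation=True,
                       prefer_multiple_concepts=True,
                       term_processing=True,
                       # 'lbtr' and 'fndg' are UMLS semantic types, not source
                       # vocabularies, so this call fails (see the error below);
                       # restrict_to_sts is the matching argument for them
                       restrict_to_sources=['lbtr', 'fndg'])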


### MetaMap ERROR: The UMLS sources [lbtr,fndg] are unknown. Aborting.


([], None)
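
The error above arises because `lbtr` and `fndg` are UMLS semantic-type abbreviations rather than source vocabularies. Besides passing them through pymetamap's `restrict_to_sts` argument, the same restriction can be applied after extraction. The sketch below assumes each concept carries a `semtypes` string in bracketed form such as `[fndg]`, which is how pymetamap's MMI records appear to report it. The `Concept` tuple, the `filter_by_semtypes` helper, and the CUIs are placeholders for illustration, not real MetaMap output.

```python
from collections import namedtuple

# Stand-in for pymetamap's concept records; only the fields used here.
Concept = namedtuple("Concept", ["cui", "preferred_name", "semtypes"])


def filter_by_semtypes(concepts, wanted):
    """Keep concepts whose semtypes string shares at least one type with `wanted`."""
    wanted = set(wanted)
    kept = []
    for con in concepts:
        # Assumed format: "[fndg]" or "[lbtr,fndg]"; missing values count as empty.
        raw = getattr(con, "semtypes", "") or ""
        types = {t.strip() for t in raw.strip("[]").split(",") if t.strip()}
        if types & wanted:
            kept.append(con)
    return kept


# Made-up records, for illustration only.
demo = [
    Concept("C0000001", "placeholder finding", "[fndg]"),
    Concept("C0000002", "placeholder body part", "[bpoc]"),
    Concept("C0000003", "placeholder result", "[lbtr,fndg]"),
]
print([c.cui for c in filter_by_semtypes(demo, ["lbtr", "fndg"])])
# ['C0000001', 'C0000003']
```

Filtering after the fact keeps the MetaMap call itself unrestricted, which makes it easier to inspect what was discarded.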

In [10]:
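with open("test_output_human (2).txt", "r") as f, open("test_output_labelseq.txt", "r") as ff:
    mm = f.readlines()   # predicted terms, one per line, after the "term" header
    ls = ff.readlines()  # label sequences, paired with mm line by line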

In [12]:
result = []
for i in tqdm(range(1, len(mm))):  # start at 1 to skip the "term" header line
    term = mm[i].rstrip("\n")
    # lines with no predicted values get an empty CUI list
    if len(term) == 0:
        result.append([])
    else:
        cons, errs = metam.extract_concepts([term],
                                            word_sense_disambiguation=True,
                                            composite_phrase=1,
                                            prune=99)
        # keep only the CUI of each returned concept
        result.append([con.cui for con in cons])

  1%|          | 8/1259 [00:08<20:02,  1.04it/s]

WSD Server initializing disambiguation methods.
WSD Server databases and disambiguation methods have been initialized.
Could not listen on port : 5554 : Address already in use (Bind failed)


100%|██████████| 1259/1259 [12:41<00:00,  1.65it/s]
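
The loop above reads `con.cui` from every returned record. pymetamap can also return abbreviation/acronym records, which, to my knowledge, do not have a `cui` field, so a defensive variant may be useful. The sketch below uses stand-in tuples (`MMI`, `AA`) and a hypothetical `collect_cuis` helper; the optional de-duplication is an extra and not something the notebook does.

```python
from collections import namedtuple

# Stand-ins for the two record kinds; field lists are trimmed for illustration.
MMI = namedtuple("MMI", ["cui", "score"])
AA = namedtuple("AA", ["short_form", "long_form"])


def collect_cuis(concepts, unique=False):
    """Return the CUIs of `concepts` in order, skipping records without a `cui`."""
    cuis = []
    for con in concepts or []:
        cui = getattr(con, "cui", None)
        if not cui:
            continue  # e.g. an abbreviation record, or an empty CUI
        if unique and cui in cuis:
            continue  # optional: drop repeated CUIs, keeping the first occurrence
        cuis.append(cui)
    return cuis


# Placeholder records, not real MetaMap output.
records = [MMI("C0000001", "3.5"), AA("pt", "patient"), MMI("C0000001", "1.2")]
print(collect_cuis(records))               # ['C0000001', 'C0000001']
print(collect_cuis(records, unique=True))  # ['C0000001']
```

In the loop, `result.append(collect_cuis(cons))` would replace the list comprehension.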


In [16]:
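# write results to file
with open("extra_credit.txt", "w") as f:
    f.write("ID\tTAGSEQ\tCUI\n")
    for labelseq, cuis in zip(ls[1:], result):
        # strip the trailing newline; replacing it with a tab would leave an
        # empty extra column, because a tab is already written before the CUIs
        labelseq = labelseq.rstrip("\n")
        cui = ",".join(cuis)
        f.write(f"{labelseq}\t{cui}\n")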

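# list running Java processes to find leftover MetaMap servers
# (jcmd -l only lists them; it does not kill anything)
# !jcmd -l
# the POS and WSD servers can be stopped with the control scripts that started them
# os.system(metamap_base_dir + metamap_pos_server_dir + ' stop')
# os.system(metamap_base_dir + metamap_wsd_server_dir + ' stop')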
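
The writing step pairs label-sequence lines with CUI lists using `zip`, which silently truncates if the two lists differ in length. A minimal sketch of a stricter writer follows. It assumes each label-sequence line is a tab-separated `ID` and `TAGSEQ` pair, as the header suggests; `write_cui_table` and the sample rows are hypothetical.

```python
import csv
import io


def write_cui_table(label_lines, cui_lists, handle):
    """Write ID, TAGSEQ and comma-joined CUIs as tab-separated rows."""
    if len(label_lines) != len(cui_lists):
        # Fail loudly instead of letting zip() drop unmatched rows.
        raise ValueError(
            f"{len(label_lines)} label lines but {len(cui_lists)} CUI lists"
        )
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", "TAGSEQ", "CUI"])
    for line, cuis in zip(label_lines, cui_lists):
        # Assumed layout: "ID<TAB>TAGSEQ" followed by a newline.
        fields = line.rstrip("\n").split("\t")
        writer.writerow(fields + [",".join(cuis)])


# Placeholder rows, for illustration only.
buf = io.StringIO()
write_cui_table(
    ["1\tO B-AE\n", "2\tO O\n"],
    [["C0000001", "C0000002"], []],
    buf,
)
print(buf.getvalue())
```

With the notebook's variables, the call would be `write_cui_table(ls[1:], result, f)` inside the `with open(...)` block.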