# 🖖🏼 Semantic Focus for Understanding Healthcare Content   
This notebook infers which of the UMLS concept entities recognized in PubMed content are meaningful, by aligning them with a chosen set of semantic types. A summary diagram of the data is in the [README.md](README.md).   
The subject of "mindfulness" is the focus in this example.   

Applications include content summarization, search, and AI.

### Semantic Focus
The UMLS semantic types selected for the target medical focus are:
 * topp | T061 | Therapeutic or Preventive Procedure
 * menp | T041 | Mental Process

 These are selected from the broader spectrum of the 127 possible semantic types that summarize the field of medicine. [Semantic Types](SemanticTypes_2018AB.txt)   
 
 Our [Semantic Focus](SemanticFocus.txt) is on mental processes and therapies in this example. If, as an alternative example, we were concerned with medications, we would select the pharmacological semantic types to focus on.
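
As a minimal sketch of that selection step, the snippet below picks the focus rows out of a pipe-delimited semantic type table by abbreviation. The `select_focus` helper and the three sample rows are illustrative assumptions, not part of the repository; the `abbreviation|TUI|name` layout follows the list above.

```python
import csv
import io

# Sample rows in the abbreviation|TUI|name layout (illustrative data only).
SAMPLE_TYPES = """menp|T041|Mental Process
phsu|T121|Pharmacologic Substance
topp|T061|Therapeutic or Preventive Procedure
"""

def select_focus(types_text, abbreviations):
    """Return the (abbreviation, TUI, name) rows whose abbreviation is in the focus set."""
    reader = csv.reader(io.StringIO(types_text), delimiter="|")
    return [tuple(row) for row in reader if row and row[0] in abbreviations]

# Focus on mental processes and therapies, as in this example.
focus = select_focus(SAMPLE_TYPES, {"topp", "menp"})
for abbreviation, tui, name in focus:
    print(abbreviation, tui, name)
```

Switching the focus to medications would mean passing a different set of abbreviations, for instance `{"phsu"}`.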

### initial configuration

In [None]:
%pip install beautifulsoup4
%pip install spacy
import os  # needed for os.system below
os.system('python -m spacy download en_core_web_sm')

### get the pubmed document title and abstract
This is a single document example of work that could run on batches of content in a pipeline.

In [43]:
import requests
import spacy

docID = '28031068'
pubmedurl='https://pubmed.ncbi.nlm.nih.gov/' + docID + '/?format=pubmed'

x = requests.get(pubmedurl)
the_html = x.text

# beautifulsoup4 (installed above) provides the bs4 package
from bs4 import BeautifulSoup

# name the parser explicitly to avoid the "no parser was explicitly specified" warning
parsed_html = BeautifulSoup(the_html, 'html.parser')
pmdoc = parsed_html.body.find('pre')
just_the_text = pmdoc.text

pubmed_doc_dict = {
  "PMID": "id",
  "TI": "Title",
  "AB": [],
  "MH": [],
  "OT": []
}

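abstract = ''
current_tag = None  # most recent field tag, so continuation lines are attributed correctly
for l in just_the_text.splitlines():
    parts = l.split("- ", 1)  # split on the first "- " only; values may contain "- "
    if len(parts) == 2 and not l[:1].isspace():
        # tagged line such as "TI  - ..." (match/case requires Python 3.10+)
        current_tag = parts[0].strip()
        match current_tag:
            case 'PMID':
                pubmed_doc_dict['PMID'] = parts[1].strip()
            case 'TI':
                pubmed_doc_dict['TI'] = parts[1].strip()
            case 'AB':
                abstract = parts[1].strip()
            case 'MH':
                pubmed_doc_dict['MH'].append(parts[1].strip())
            case 'OT':
                pubmed_doc_dict['OT'].append(parts[1].strip())
    elif current_tag == 'AB':
        # untagged line directly after AB: continuation of the abstract
        abstract = abstract + ' ' + l.strip()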

nlp = spacy.load("en_core_web_sm")
doc = nlp(abstract)
sentences = []
for sent in doc.sents:
    sentences.append(sent)

pubmed_doc_dict['AB'] = sentences
print('title:')
print(pubmed_doc_dict['TI'])
print('')
print('abstract(first sentence):')
print(sentences[0])
# write the title and the first abstract sentence as 'id|text' lines for MetaMapLite
input_file_name = 'mmlinput.txt'
with open(input_file_name, 'w') as output:
    output.write('title|' + pubmed_doc_dict['TI'] + '\nabs1|' + str(sentences[0]))

title:
What defines mindfulness-based programs? The warp and the weft.

abstract(first sentence):
There has been an explosion of interest in mindfulness-based programs (MBPs) such as Mindfulness-Based Stress Reduction (MBSR) and Mindfulness-Based Cognitive Therapy.
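
For batch use, the tag parsing in the cell above could be factored into a function that takes the PubMed-format text and returns a dictionary. The sketch below assumes that tagged lines start in the first column, that the tag and value are separated by `"- "`, and that continuation lines are indented. `parse_pubmed_text` is a hypothetical helper, and the sample record is abridged and re-wrapped for illustration, with a placeholder `OT` value.

```python
def parse_pubmed_text(text):
    """Parse PubMed-format text into a dict, joining continuation lines."""
    record = {"PMID": "", "TI": "", "AB": "", "MH": [], "OT": []}
    tag = None
    for line in text.splitlines():
        if line[:1].isspace():
            # Indented line: continues the most recent tagged field.
            if tag in ("TI", "AB"):
                record[tag] += " " + line.strip()
            continue
        head, sep, value = line.partition("- ")
        if not sep:
            continue  # blank or unrecognized line
        tag = head.strip()
        value = value.strip()
        if tag in ("PMID", "TI", "AB"):
            record[tag] = value
        elif tag in ("MH", "OT"):
            record[tag].append(value)
    return record

# Abridged sample record (line wrapping and the OT value are made up).
SAMPLE = """PMID- 28031068
TI  - What defines mindfulness-based programs? The warp and the weft.
AB  - There has been an explosion of interest in mindfulness-based programs
      (MBPs) such as Mindfulness-Based Stress Reduction (MBSR) and
      Mindfulness-Based Cognitive Therapy.
OT  - example keyword
"""

record = parse_pubmed_text(SAMPLE)
print(record["PMID"])
print(record["AB"])
```

Keeping the parser free of network calls means it can be applied to each record of a batch however the text was fetched.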


### recognize UMLS CUIs
The concepts are recognized from the text in 'mmlinput.txt'.   

The title of a scientific article is assumed (based on experimental evidence) to be its single most representative sentence, and the first sentence of the abstract is assumed to be the most representative of the general intent. We focus on the semantic terms in these two sentences to identify that intent. The whole abstract and other attributes could still be used later for building index attributes or analytical dimensions.
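
A small sketch of how the title and the leading abstract sentences could be turned into `id|text` input lines is given here. The `title` and `abs1` identifiers come from the input file written earlier; the `build_mml_input` helper, the `n_abstract` parameter, the `abs2`, `abs3`, ... numbering, and the replacement of `|` inside the text are assumptions made for illustration.

```python
def build_mml_input(title, sentences, n_abstract=1):
    """Return 'id|text' lines: the title plus the first n_abstract abstract sentences."""
    def clean(text):
        # '|' is the field delimiter, so keep it out of the text itself.
        return str(text).replace("|", " ").strip()

    lines = ["title|" + clean(title)]
    for i, sentence in enumerate(sentences[:n_abstract], start=1):
        lines.append("abs{}|{}".format(i, clean(sentence)))
    return "\n".join(lines) + "\n"

example = build_mml_input(
    "What defines mindfulness-based programs? The warp and the weft.",
    ["First sentence.", "Second sentence."],
)
print(example)
```

Raising `n_abstract` would send more of the abstract for annotation, at the cost of more off-topic concepts to filter out afterwards.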

In [87]:
import os
import pandas as pd

input_file_name = 'mmlinput.txt'
# the input file has no header row, so name the columns explicitly
df_mml_in = pd.read_csv(input_file_name, sep='|', names=["id", "text"])

os.system('python mmlrestclient.py https://ii.nlm.nih.gov/metamaplite/rest/annotate mmlinput.txt --output mmloutput.txt --docformat sldiwi --resultformat mmi')
output_file_name = 'mmloutput.txt'
df_mml_out = pd.read_csv(output_file_name, sep='|', names=["source", "att", "score", "label", "CUI", "semname", "terms", "type", "nuther", "druther"])
df_mml_out = df_mml_out.drop(df_mml_out.columns[[1, 2, 7, 8, 9]],axis='columns')
print(output_file_name)
print(df_mml_out)

mmloutput.txt
   source                                      label       CUI semname  \
0    abs1                                Mindfulness  C3542996  [menp]   
1    abs1                          Cognitive Therapy  C0009244  [topp]   
2    abs1                                    Andorra  C0002838  [geoa]   
3    abs1                             Blast Injuries  C0005700  [inpo]   
4    abs1                                  Explosion  C0015329  [phpr]   
5    abs1                                   Programs  C3484370  [ftcn]   
6    abs1                   Base - General Qualifier  C1705938  [idcn]   
7    abs1                  Basis - conceptual entity  C1527178  [ftcn]   
8    abs1                       Mental concentration  C0086045  [menp]   
9    abs1                     Mindfulness Relaxation  C2985553  [menp]   
10   abs1                                        And  C1515981  [idcn]   
11   abs1                                       Have  C3539897  [qlco]   
12   abs1               

### Observations
We can see semantic "crash blossoms" in the recognized entities "Blast Injuries" and "Explosion" (and even the disastrous "Medical Device Explosion Issue"). They are misread from the figure of speech that begins the abstract, "There has been an explosion " ...


### filter by semantic focus
When we filter by semantic type, we can see clearly that this article is not about exploding medical devices, but is about the concepts of Mindfulness and Cognitive Therapy.

In [88]:
df_semantic_focus = pd.read_csv('SemanticFocus.txt', sep='|', names=["semname", "id", "label"])
# strip the surrounding brackets before comparison
df_mml_out.semname = df_mml_out.semname.str.strip('[]')
semantic_match = df_mml_out.semname.isin(df_semantic_focus.semname)
df_mml_out = df_mml_out[semantic_match]
print(df_mml_out)

# this triggers a VS Code "Launch Data Wrangler" button
df_mml_out.head()


   source                                      label       CUI semname  \
0    abs1                                Mindfulness  C3542996    menp   
1    abs1                          Cognitive Therapy  C0009244    topp   
8    abs1                       Mental concentration  C0086045    menp   
9    abs1                     Mindfulness Relaxation  C2985553    menp   
12   abs1                                   Interest  C0543488    menp   
14   abs1  Meditation-Based Stress Reduction Program  C4527300    topp   
17  title                                Mindfulness  C3542996    menp   
23  title                       Mental concentration  C0086045    menp   

                                                terms  
0   Mindfulness-text-2-"mindfulness"--0,"Mindfulne...  
1    Cognitive Therapy-text-36-"Cognitive Therapy"--0  
8   mindfulness-text-2-"mindfulness"--0,"mindfulne...  
9   Mindfulness-Based Stress Reduction-text-22-"Mi...  
12                      Interest-text-0-"interest"--0

Unnamed: 0,source,label,CUI,semname,terms
0,abs1,Mindfulness,C3542996,menp,"Mindfulness-text-2-""mindfulness""--0,""Mindfulne..."
1,abs1,Cognitive Therapy,C0009244,topp,"Cognitive Therapy-text-36-""Cognitive Therapy""--0"
8,abs1,Mental concentration,C0086045,menp,"mindfulness-text-2-""mindfulness""--0,""mindfulne..."
9,abs1,Mindfulness Relaxation,C2985553,menp,"Mindfulness-Based Stress Reduction-text-22-""Mi..."
12,abs1,Interest,C0543488,menp,"Interest-text-0-""interest""--0"
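
The `isin` comparison above matches the whole `semname` string, so a row whose brackets held more than one semantic type would be dropped even if one of its types is in focus. The sketch below shows set-based matching instead. It assumes multiple types appear as comma-separated abbreviations inside the brackets; `parse_semtypes` and `in_focus` are hypothetical helpers, and the last sample row is made up to show the multi-type case.

```python
def parse_semtypes(cell):
    """Turn a bracketed cell such as '[menp]' or '[phpr,menp]' into a set of abbreviations."""
    return {t.strip() for t in cell.strip().strip("[]").split(",") if t.strip()}

def in_focus(cell, focus):
    """True when at least one semantic type in the cell belongs to the focus set."""
    return bool(parse_semtypes(cell) & focus)

FOCUS = {"topp", "menp"}

rows = [
    ("Mindfulness", "[menp]"),
    ("Explosion", "[phpr]"),
    ("Cognitive Therapy", "[topp]"),
    ("Hypothetical multi-type concept", "[phpr,menp]"),  # made-up row
]

kept = [label for label, cell in rows if in_focus(cell, FOCUS)]
print(kept)
```

With pandas, the same predicate could serve as a boolean mask, for example `df_mml_out[df_mml_out.semname.apply(lambda c: in_focus(c, FOCUS))]`.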


### reference notes ...

pyMeSHSim
 * https://pubmed.ncbi.nlm.nih.gov/32552728/
 * https://pymeshsim.readthedocs.io/en/latest/
 * https://github.com/luozhhub/pyMeSHSim (latest)

UMLS Semantic Types
 * https://lhncbc.nlm.nih.gov/ii/tools/MetaMap/Docs/SemanticTypes_2018AB.txt

More UMLS
 * https://www.nlm.nih.gov/research/umls/new_users/online_learning/UMLST_001.html
 * https://documentation.uts.nlm.nih.gov/rest/rest-api-cookbook/python-scripts.html

"What defines mindfulness-based programs? The warp and the weft"
 * https://pubmed.ncbi.nlm.nih.gov/28031068/?format=pubmed
 * https://pubmed.ncbi.nlm.nih.gov/28031068/

"Meditators' Non-academic Definition of Mindfulness"
 * https://pubmed.ncbi.nlm.nih.gov/35634214/?format=pubmed
 * https://pubmed.ncbi.nlm.nih.gov/35634214/
 * https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9127491/

PubMed search results
 * https://pubmed.ncbi.nlm.nih.gov/?term=%28mindfulness%29+AND+%28Therapy%2FNarrow%5Bfilter%5D%29&filter=pubt.meta-analysis&filter=pubt.review&filter=pubt.systematicreview&sort=date&format=pubmed&size=10