# 🖖🏼 Healthcare AI DIY    
A novel Focused Language Model (FLM) generator for building an indexable, semantically tuned taxonomy of medical content.   
UMLS concepts are semantically inferred from PubMed content to generate a content index and an explainable taxonomy model.   
The subject of "mindfulness" is the focus in this example.   

### step 1: identify semantically aligned UMLS CUIs from the PubMed article title and abstract
The UMLS semantic types selected for the target medical focus are:
 * topp | T061 | Therapeutic or Preventive Procedure
 * menp | T041 | Mental Process

In [None]:
import os

%pip install beautifulsoup4
%pip install spacy
# download the small English pipeline used below for sentence splitting
os.system('python -m spacy download en_core_web_sm')

### get the pubmed document title and abstract

In [111]:
import requests
import spacy
from bs4 import BeautifulSoup

docID = '28031068'
pubmedurl = 'https://pubmed.ncbi.nlm.nih.gov/' + docID + '/?format=pubmed'

# fetch the PubMed-format record and stop on an HTTP error
x = requests.get(pubmedurl, timeout=30)
x.raise_for_status()
the_html = x.text

# the record is the text of the page's <pre> element
parsed_html = BeautifulSoup(the_html, 'html.parser')
pmdoc = parsed_html.body.find('pre')
just_the_text = pmdoc.text

pubmed_doc_dict = {
  "PMID": "id",
  "TI": "Title",
  "AB": [],
  "MH": [],
  "OT": []
}
abstract = ''
current_tag = None
# note: match/case needs Python 3.10 or newer
for l in just_the_text.splitlines():
    if l[4:6] == '- ':
        # tagged line: field tag padded to 4 characters, then "- ", then the value
        current_tag = l[:4].strip()
        value = l[6:].strip()
        match current_tag:
            case 'PMID':
                pubmed_doc_dict['PMID'] = value
            case 'TI':
                pubmed_doc_dict['TI'] = value
            case 'AB':
                abstract = value
            case 'MH':
                pubmed_doc_dict['MH'].append(value)
            case 'OT':
                pubmed_doc_dict['OT'].append(value)
    elif current_tag == 'AB':
        # continuation line of the abstract
        abstract = abstract + ' ' + l.strip()
    elif current_tag == 'TI':
        # continuation line of a title that wraps
        pubmed_doc_dict['TI'] = pubmed_doc_dict['TI'] + ' ' + l.strip()

# split the abstract into sentences with spaCy
nlp = spacy.load("en_core_web_sm")
doc = nlp(abstract)
sentences = list(doc.sents)

pubmed_doc_dict['AB'] = sentences
print('title:')
print(pubmed_doc_dict['TI'])
print('')
print('abstract(first sentence):')
print(sentences[0])

# write one "id|text" record per line as input for MetaMapLite
input_file_name = 'mmlinput.txt'
with open(input_file_name, 'w') as output:
    output.write('title|' + pubmed_doc_dict['TI'] + '\nabs1|' + str(sentences[0]))

title:
What defines mindfulness-based programs? The warp and the weft.

abstract(first sentence):
There has been an explosion of interest in mindfulness-based programs (MBPs) such as Mindfulness-Based Stress Reduction (MBSR) and Mindfulness-Based Cognitive Therapy.
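
The cell above writes only the title and the first abstract sentence to `mmlinput.txt`. A minimal sketch of extending that to every sentence, assuming the same `id|text` line layout. `build_input_lines` and the ids beyond `abs1` are hypothetical names, and replacing `|` and newlines inside the text is my own precaution, not something the notebook does:

```python
def build_input_lines(title, sentences):
    """Return one 'id|text' line for the title and one per abstract sentence."""
    def clean(text):
        # a literal '|' or a newline inside the text would break the id|text layout
        return " ".join(str(text).replace("|", " ").split())

    lines = ["title|" + clean(title)]
    for number, sentence in enumerate(sentences, start=1):
        # hypothetical ids: abs1, abs2, ... in sentence order
        lines.append("abs" + str(number) + "|" + clean(sentence))
    return lines


example_lines = build_input_lines(
    "What defines mindfulness-based programs? The warp and the weft.",
    [
        "There has been an explosion of interest in mindfulness-based programs "
        "(MBPs) such as Mindfulness-Based Stress Reduction (MBSR) and "
        "Mindfulness-Based Cognitive Therapy."
    ],
)
print("\n".join(example_lines))
```

In the notebook, `sentences` from spaCy could be passed directly, since each span is converted with `str()` before it is written.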


### recognize UMLS CUIs
as read from 'mmlinput.txt'

In [112]:
import os
import pandas as pd

input_file_name = 'mmlinput.txt'
output_file_name = 'mmloutput.txt'

# annotate the input with the MetaMapLite REST service; results are written in MMI format
os.system('python mmlrestclient.py https://ii.nlm.nih.gov/metamaplite/rest/annotate mmlinput.txt --output mmloutput.txt --docformat sldiwi --resultformat mmi')

# note: the MMI file has no header row, so read_csv takes the first record as the column labels
df_mml_out = pd.read_csv(output_file_name, sep='|')
# keep the first six columns (id, MMI, score, name, CUI, semantic types)
df_mml_out = df_mml_out.drop(df_mml_out.columns[[6, 7, 8, 9]], axis='columns')
print(output_file_name)
print(df_mml_out)
df_mml_out.head()


mmloutput.txt
     abs1  MMI  7.37                                Mindfulness  C3542996  \
0    abs1  MMI  1.84                          Cognitive Therapy  C0009244   
1    abs1  MMI  1.38                                    Andorra  C0002838   
2    abs1  MMI  1.38                             Blast Injuries  C0005700   
3    abs1  MMI  1.38                                  Explosion  C0015329   
4    abs1  MMI  1.38                                   Programs  C3484370   
5    abs1  MMI  0.92                   Base - General Qualifier  C1705938   
6    abs1  MMI  0.92                  Basis - conceptual entity  C1527178   
7    abs1  MMI  0.92                       Mental concentration  C0086045   
8    abs1  MMI  0.92                     Mindfulness Relaxation  C2985553   
9    abs1  MMI  0.46                                        And  C1515981   
10   abs1  MMI  0.46                                       Have  C3539897   
11   abs1  MMI  0.46                                   Interes

| | abs1 | MMI | 7.37 | Mindfulness | C3542996 | [menp] |
|---|---|---|---|---|---|---|
| 0 | abs1 | MMI | 1.84 | Cognitive Therapy | C0009244 | [topp] |
| 1 | abs1 | MMI | 1.38 | Andorra | C0002838 | [geoa] |
| 2 | abs1 | MMI | 1.38 | Blast Injuries | C0005700 | [inpo] |
| 3 | abs1 | MMI | 1.38 | Explosion | C0015329 | [phpr] |
| 4 | abs1 | MMI | 1.38 | Programs | C3484370 | [ftcn] |
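
In the output above, the first MMI record (`Mindfulness`, `C3542996`) shows up as the column labels because the file has no header row. A minimal sketch of reading it with `header=None` and explicit labels follows. The names in `MMI_COLUMNS` are my own labels, not part of the file, and the sample rows are copied from the output above with the trailing columns left out:

```python
import io

import pandas as pd

# hypothetical labels for the first six MMI fields
MMI_COLUMNS = ["doc_id", "record_type", "score", "name", "cui", "semantic_types"]

# stand-in for mmloutput.txt, using rows shown in the output above
sample = io.StringIO(
    "abs1|MMI|7.37|Mindfulness|C3542996|[menp]\n"
    "abs1|MMI|1.84|Cognitive Therapy|C0009244|[topp]\n"
    "abs1|MMI|1.38|Andorra|C0002838|[geoa]\n"
)

df_mml = pd.read_csv(
    sample,
    sep="|",
    header=None,        # the first line is data, not a header
    usecols=range(6),   # keep only the first six fields if the file has more
    names=MMI_COLUMNS,
)
print(df_mml)
```

With `usecols` in place there is no need to drop the trailing columns afterwards, and the top-scoring concept stays in the data as row 0.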


### filter by semantic focus

In [None]:
#get UMLS CUIs
# https://documentation.uts.nlm.nih.gov/rest/rest-api-cookbook/python-scripts.html

## something here
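
One way to apply the semantic focus, sketched under the assumption that the semantic-type column holds bracketed abbreviations such as `[menp]`, as in the MetaMapLite output earlier in this notebook. A value with several types is assumed to be comma-separated. `FOCUS_TYPES`, `parse_semantic_types`, and the column labels are hypothetical names; the rows are copied from outputs shown in this notebook:

```python
import pandas as pd

# the two semantic types selected in step 1
FOCUS_TYPES = {"topp", "menp"}


def parse_semantic_types(cell):
    """Turn a string such as '[menp]' or '[menp,topp]' into a set of abbreviations."""
    return {part.strip() for part in cell.strip("[]").split(",") if part.strip()}


concepts = pd.DataFrame(
    [
        ("Mindfulness", "C3542996", "[menp]"),
        ("Cognitive Therapy", "C0009244", "[topp]"),
        ("Andorra", "C0002838", "[geoa]"),
        ("Blast Injuries", "C0005700", "[inpo]"),
        ("Mental concentration", "C0086045", "[menp]"),
    ],
    columns=["name", "cui", "semantic_types"],
)

# keep a row when at least one of its semantic types is in the focus set
in_focus = concepts["semantic_types"].apply(
    lambda cell: bool(parse_semantic_types(cell) & FOCUS_TYPES)
)
focused = concepts[in_focus].reset_index(drop=True)
print(focused)
```

Off-topic matches such as `Andorra` (from the word "and") and `Blast Injuries` (from "explosion") fall away because their semantic types are outside the focus set.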

In [19]:
df_data = pd.read_json('data.json')

newdf = df_data.drop("qualified",axis='columns')

print(newdf)

df_data.head()

   Txt001  MMI  3.68                    Mindfulness  C3542996  [menp]  \
0  Txt001  MMI  1.84              Cognitive Therapy  C0009244  [topp]   
1  Txt001  MMI  1.84          Color Blindness, Blue  C0155017  [dsyn]   
2  Txt001  MMI  1.38                        Andorra  C0002838  [geoa]   
3  Txt001  MMI  0.46                            And  C1515981  [idcn]   
4  Txt001  MMI  0.46           Mental concentration  C0086045  [menp]   
5  Txt001  MMI  0.46                    OPN1SW gene  C1412770  [gngm]   
6  Txt001  MMI  0.46  RelationshipConjunction - and  C1550557  [inpr]   

   Mindfulness-text-0-"mindfulness"--0  text  0/11  \
0                  CBT-text-0-"CBT"--0  text  16/3   
1                  CBT-text-0-"CBT"--0  text  16/3   
2                  AND-text-0-"and"--0  text  12/3   
3                  And-text-0-"and"--0  text  12/3   
4  mindfulness-text-0-"mindfulness"--0  text  0/11   
5                  CBT-text-0-"CBT"--0  text  16/3   
6                  and-text-0-"and"--

| | provenance | url | title | qualified | bonus |
|---|---|---|---|---|---|
| 0 | pubmed | https://pubmed.ncbi.nlm.nih.gov/28031068/?form... | What defines mindfulness-based programs? The w... | True | |
| 1 | pubmed | https://pubmed.ncbi.nlm.nih.gov/35634214/?form... | Meditators' Non-academic Definition of Mindful... | True | |
| 2 | github | github | | True | |
| 3 | nlm umls | nlm umls | | False | {'name': 'library'} |
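
A possible next step, sketched here as an assumption rather than as part of the notebook: pick the qualified PubMed rows and pull out their document ids, so that `docID` need not be hard-coded. The truncated URLs in the table are assumed to be the `?format=pubmed` links listed in the notes further down; `extract_pmid` and `df_docs` are hypothetical names:

```python
import re

import pandas as pd

# rows mirror the table above; the full URLs are an assumption (see the notes section)
df_docs = pd.DataFrame(
    {
        "provenance": ["pubmed", "pubmed", "github", "nlm umls"],
        "url": [
            "https://pubmed.ncbi.nlm.nih.gov/28031068/?format=pubmed",
            "https://pubmed.ncbi.nlm.nih.gov/35634214/?format=pubmed",
            "github",
            "nlm umls",
        ],
        "qualified": [True, True, True, False],
    }
)


def extract_pmid(url):
    """Return the numeric id from a pubmed.ncbi.nlm.nih.gov URL, or None."""
    found = re.search(r"pubmed\.ncbi\.nlm\.nih\.gov/(\d+)", url)
    return found.group(1) if found else None


# qualified rows whose provenance is PubMed
selected = df_docs[df_docs["qualified"] & (df_docs["provenance"] == "pubmed")]
doc_ids = [extract_pmid(url) for url in selected["url"]]
print(doc_ids)
```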



### notes ...


pyMeSHSim
https://pubmed.ncbi.nlm.nih.gov/32552728/
https://pymeshsim.readthedocs.io/en/latest/
https://github.com/luozhhub/pyMeSHSim <=latest


UMLS Semantic Types
https://lhncbc.nlm.nih.gov/ii/tools/MetaMap/Docs/SemanticTypes_2018AB.txt


[[{'Value1': 1}, {'value2': 2}], [{'value3': 3}, {'value4': 4}], ...]   
<= for a column holding data like that (one list of dicts per row), use this =>   
value1 = df['column_name'][0][0].get('Value1')   
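
A small runnable version of the note above, assuming each cell of the column holds a list of single-key dicts. `flatten_cell` is a hypothetical helper added for illustration:

```python
import pandas as pd

# each cell holds a list of single-key dicts, as in the note above
df = pd.DataFrame(
    {
        "column_name": [
            [{"Value1": 1}, {"value2": 2}],
            [{"value3": 3}, {"value4": 4}],
        ]
    }
)

# row 0 -> first dict in that row's list -> value stored under "Value1"
value1 = df["column_name"][0][0].get("Value1")


def flatten_cell(cell):
    """Merge a list of dicts into one dict (later keys overwrite earlier ones)."""
    merged = {}
    for item in cell:
        merged.update(item)
    return merged


# one merged dict per row, which makes lookups by key simpler
flat = df["column_name"].apply(flatten_cell)
print(value1)
print(flat[1])
```

`.get()` returns `None` for a missing key instead of raising, which helps when rows carry different keys.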


health data modeling   
novel, focused language modeling   

![weird](/weird.svg)

Documents to get

"What defines mindfulness-based programs? The warp and the weft"
https://pubmed.ncbi.nlm.nih.gov/28031068/?format=pubmed
https://pubmed.ncbi.nlm.nih.gov/28031068/

"Meditators' Non-academic Definition of Mindfulness"
https://pubmed.ncbi.nlm.nih.gov/35634214/?format=pubmed
https://pubmed.ncbi.nlm.nih.gov/35634214/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9127491/

UMLS to get

https://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html
https://www.nlm.nih.gov/research/umls/new_users/online_learning/UMLST_001.html

python mmlrestclient.py https://ii-public2vm.nlm.nih.gov/metamaplite/rest/annotate mmlinput.txt --output mmloutput.txt

python mmlrestclient.py https://ii.nlm.nih.gov/metamaplite/rest/annotate mmlinput.txt --output mmloutput.txt --docformat sldiwi --resultformat mmi
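
`os.system` does not raise when the client fails, so a failed annotation run can go unnoticed. A minimal sketch of the same call through `subprocess`, using only the arguments that appear in the command above. `build_mml_command` and `run_mml` are hypothetical helper names, and `mmlrestclient.py` is assumed to be in the working directory:

```python
import subprocess
import sys


def build_mml_command(service_url, input_path, output_path,
                      docformat="sldiwi", resultformat="mmi",
                      client_script="mmlrestclient.py"):
    """Assemble the mmlrestclient.py call as an argument list."""
    return [
        sys.executable, client_script, service_url, input_path,
        "--output", output_path,
        "--docformat", docformat,
        "--resultformat", resultformat,
    ]


def run_mml(command):
    """Run the client; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(command, check=True)


command = build_mml_command(
    "https://ii.nlm.nih.gov/metamaplite/rest/annotate",
    "mmlinput.txt",
    "mmloutput.txt",
)
# show the command without executing it
print(" ".join(command[1:]))
```

Calling `run_mml(command)` executes the request. Passing a list avoids shell quoting problems, and `sys.executable` runs the client with the same interpreter as the notebook.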

pubmed results
https://pubmed.ncbi.nlm.nih.gov/?term=%28mindfulness%29+AND+%28Therapy%2FNarrow%5Bfilter%5D%29&filter=pubt.meta-analysis&filter=pubt.review&filter=pubt.systematicreview&sort=date&format=pubmed&size=10
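
The search URL above can be rebuilt from readable parameters, which makes the term or the filters easier to change. A minimal sketch using `urllib.parse.urlencode`; the parameter names and values are read off the URL above, and `params` and `search_url` are hypothetical variable names:

```python
from urllib.parse import urlencode

# a list of pairs (not a dict) because "filter" appears more than once
params = [
    ("term", "(mindfulness) AND (Therapy/Narrow[filter])"),
    ("filter", "pubt.meta-analysis"),
    ("filter", "pubt.review"),
    ("filter", "pubt.systematicreview"),
    ("sort", "date"),
    ("format", "pubmed"),
    ("size", "10"),
]

# urlencode percent-encodes the brackets, slash and parentheses, and turns spaces into "+"
search_url = "https://pubmed.ncbi.nlm.nih.gov/?" + urlencode(params)
print(search_url)
```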