# 🖖🏼 Mindfulness AI (mai) builder    
A taxonomy generator for an explainable health AI.
A novel, Focused Language Model builder for generating an indexable, semantically tuned taxonomy of medical content.   
Prompted with a natural English input (statement or query), a content (repository) and (explainable taxonomy model) are emitted/generated from (PubMed) content and (UMLS) concepts.   
The subject of "how to practice mindfulness" is generated in this example.   
The resulting JSON data model is paginated for efficiency of import to an Algolia index.   
The resulting Mindfulness AI is implemented with Algolia search, (here url to Algolia demo UI). 

In [52]:
%pip install -q ipywidgets
#%pip install pypmed
%pip install beautifulsoup4

Note: you may need to restart the kernel to use updated packages.

Collecting beautifulsoup4
  Downloading beautifulsoup4-4.12.2-py3-none-any.whl (142 kB)
     ---------------------------------------- 0.0/143.0 kB ? eta -:--:--
     ----------------------------------- -- 133.1/143.0 kB 3.8 MB/s eta 0:00:01
     -------------------------------------- 143.0/143.0 kB 4.1 MB/s eta 0:00:00
Collecting soupsieve>1.2 (from beautifulsoup4)
  Downloading soupsieve-2.4.1-py3-none-any.whl (36 kB)
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.12.2 soupsieve-2.4.1
Note: you may need to restart the kernel to use updated packages.


## get pubmed document

In [77]:
import requests
docID = '28031068'
pubmedurl='https://pubmed.ncbi.nlm.nih.gov/' + docID + '/?format=pubmed'
# pubmedurl='https://pubmed.ncbi.nlm.nih.gov/?term=%28mindfulness%29+AND+%28Therapy%2FNarrow%5Bfilter%5D%29&filter=pubt.meta-analysis&filter=pubt.review&filter=pubt.systematicreview&sort=date&format=pubmed&size=10'

x = requests.get(pubmedurl)
the_html = x.text

try: 
    from BeautifulSoup import BeautifulSoup
except ImportError:
    from bs4 import BeautifulSoup

parsed_html = BeautifulSoup(the_html)
pmdoc = parsed_html.body.find('pre')
just_the_text = pmdoc.text

this_doc_dict = {
  "PMID": "id",
  "TI": "Title",
  "AB": "abstract",
  "MH": [],
  "OT": []
}
abstract = ''
for l in just_the_text.splitlines():
    parts = l.split("- ")
    if(len(parts)==1):
        abstract = abstract + ' ' + parts[0].strip()
    if(len(parts)==2):
        match parts[0].strip():
            case 'PMID':
                this_doc_dict['PMID'] = parts[1]
            case 'TI':
                this_doc_dict['TI'] = parts[1]
            case 'AB':
                abstract = parts[1].strip()
            case 'MH':
                this_doc_dict['MH'].append(parts[1].strip())
            case 'OT':
                this_doc_dict['OT'].append(parts[1].strip())

this_doc_dict['AB'] = abstract

print(this_doc_dict)

{'PMID': '28031068', 'TI': 'What defines mindfulness-based programs? The warp and the weft.', 'AB': "There has been an explosion of interest in mindfulness-based programs (MBPs) such as Mindfulness-Based Stress Reduction (MBSR) and Mindfulness-Based Cognitive Therapy. This is demonstrated in increased research, implementation of MBPs in healthcare, educational, criminal justice and workplace settings, and in mainstream interest. For the sustainable development of the field there is a need to articulate a definition of what an MBP is and what it is not. This paper provides a framework to define the essential characteristics of the family of MBPs originating from the parent program MBSR, and the processes which inform adaptations of MBPs for different populations or contexts. The framework addresses the essential characteristics of the program and of teacher. MBPs: are informed by theories and practices that draw from a confluence of contemplative traditions, science, and the major disci

In [32]:
import os
import pandas as pd
import ipywidgets as widgets
from IPython.display import YouTubeVideo

input_file_name = 'mmlinput.txt'
df_mml_in = pd.read_csv(input_file_name, sep='|')
print(input_file_name)
print(df_mml_in)

out = widgets.Output(layout={'border': '3px solid green'})
#with out:
    #display(YouTubeVideo('eWzY2nGfkXk'))
out

mmlinput.txt
   t001  mindfulness benefits
0  t002  meditation practices


Output(layout=Layout(border_bottom='3px solid green', border_left='3px solid green', border_right='3px solid g…

### encode recognizable language as NLM IDs
as read from 'mmlinput.txt'

In [30]:
os.system('python mmlrestclient.py https://ii.nlm.nih.gov/metamaplite/rest/annotate mmlinput.txt --output mmloutput.txt --docformat sldiwi --resultformat mmi')
output_file_name = 'mmloutput.txt'
df_mml_out = pd.read_csv(output_file_name, sep='|')
df_mml_out = df_mml_out.drop(df_mml_out.columns[[6, 7, 8, 9]], axis=1)
print(output_file_name)
print(df_mml_out)
# df_mml_out.head()


mmloutput.txt
   t001  MMI  3.68                         Mindfulness  C3542996  [menp]
0  t001  MMI  0.46                             Benefit  C0814225  [qnco]
1  t001  MMI  0.46                Mental concentration  C0086045  [menp]
2  t002  MMI  5.99                          meditation  C0150277  [menp]
3  t002  MMI  0.46                  Meditation Therapy  C0814263  [topp]
4  t002  MMI  0.46                 Practice Experience  C0237607  [menp]
5  t002  MMI  0.46  Religious Affiliation - Meditation  C1552052  [fndg]


## collect relevant NLM UMLS CUI attributes

In [None]:
#get UMLS CUIs
# https://documentation.uts.nlm.nih.gov/rest/rest-api-cookbook/python-scripts.html

## something here

In [19]:
df_data = pd.read_json('data.json')

newdf = df_data.drop("qualified",axis='columns')

print(newdf)

df_data.head()

   Txt001  MMI  3.68                    Mindfulness  C3542996  [menp]  \
0  Txt001  MMI  1.84              Cognitive Therapy  C0009244  [topp]   
1  Txt001  MMI  1.84          Color Blindness, Blue  C0155017  [dsyn]   
2  Txt001  MMI  1.38                        Andorra  C0002838  [geoa]   
3  Txt001  MMI  0.46                            And  C1515981  [idcn]   
4  Txt001  MMI  0.46           Mental concentration  C0086045  [menp]   
5  Txt001  MMI  0.46                    OPN1SW gene  C1412770  [gngm]   
6  Txt001  MMI  0.46  RelationshipConjunction - and  C1550557  [inpr]   

   Mindfulness-text-0-"mindfulness"--0  text  0/11  \
0                  CBT-text-0-"CBT"--0  text  16/3   
1                  CBT-text-0-"CBT"--0  text  16/3   
2                  AND-text-0-"and"--0  text  12/3   
3                  And-text-0-"and"--0  text  12/3   
4  mindfulness-text-0-"mindfulness"--0  text  0/11   
5                  CBT-text-0-"CBT"--0  text  16/3   
6                  and-text-0-"and"--

Unnamed: 0,provenance,url,title,qualified,bonus
0,pubmed,https://pubmed.ncbi.nlm.nih.gov/28031068/?form...,What defines mindfulness-based programs? The w...,True,
1,pubmed,https://pubmed.ncbi.nlm.nih.gov/35634214/?form...,Meditators' Non-academic Definition of Mindful...,True,
2,github,github,,True,
3,nlm umls,nlm umls,,False,{'name': 'library'}



### notes ...


pyMeSHSim
https://pubmed.ncbi.nlm.nih.gov/32552728/
https://pymeshsim.readthedocs.io/en/latest/
https://github.com/luozhhub/pyMeSHSim <=latest


UMLS Semantic Types
https://lhncbc.nlm.nih.gov/ii/tools/MetaMap/Docs/SemanticTypes_2018AB.txt


[[{Value1:1},{value2:2}],[{value3:3},{value4:4}],.....]   
<= for data like that use this =>   
value1 = df['column_name'][0][0].get(Value1)   


health data modeling   
novel, focused language modeling   

![weird](/weird.svg)

Documents to get

"What defines mindfulness-based programs? The warp and the weft"
https://pubmed.ncbi.nlm.nih.gov/28031068/?format=pubmed
https://pubmed.ncbi.nlm.nih.gov/28031068/

"Meditators' Non-academic Definition of Mindfulness"
https://pubmed.ncbi.nlm.nih.gov/35634214/?format=pubmed
https://pubmed.ncbi.nlm.nih.gov/35634214/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9127491/

UMLS to get

https://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html
https://www.nlm.nih.gov/research/umls/new_users/online_learning/UMLST_001.html

python mmlrestclient.py https://ii-public2vm.nlm.nih.gov/metamaplite/rest/annotate mmlinput.txt --output mmloutput.txt

python mmlrestclient.py https://ii.nlm.nih.gov/metamaplite/rest/annotate mmlinput.txt --output mmloutput.txt --docformat sldiwi --resultformat mmi

pubmed results
https://pubmed.ncbi.nlm.nih.gov/?term=%28mindfulness%29+AND+%28Therapy%2FNarrow%5Bfilter%5D%29&filter=pubt.meta-analysis&filter=pubt.review&filter=pubt.systematicreview&sort=date&format=pubmed&size=10