# Extraction and manipulation of MeSH tree/terms

Author: **Pablo Iriarte, University of Geneva - pablo.iriarte@unige.ch**

### Extracting information from the MeSH thesaurus

The process of extracting information from MeSH follows these simple steps:

 1. Download the MeSH thesaurus in XML format from https://www.nlm.nih.gov/mesh/filelist.html
 1. Extract and analyze the MeSH XML tree
 1. Select the information to extract with the UID (loops and secondary loops)
 
 
### Example: extract all the possible entry terms for one particular pharmacological action

This idea was proposed by Mrs Kirsten van Gelderen-Ziesemer during the workshop "[Mining PubMed metadata with Pandas and Jupyter Notebooks](https://www.conftool.com/eahil2019/index.php?page=browseSessions&downloads=show&form_session=39&mode=table&presentations=show)" given at the EAHIL congress in Basel in June 2019.


 

### 2. Extract the MeSH XML tree

Record example:

```xml
<?xml version="1.0"?>
<!DOCTYPE DescriptorRecordSet SYSTEM "https://www.nlm.nih.gov/databases/dtd/nlmdescriptorrecordset_20180101.dtd">
<DescriptorRecordSet LanguageCode = "eng">
<DescriptorRecord DescriptorClass = "1">
  <DescriptorUI>D000001</DescriptorUI>
  <DescriptorName>
   <String>Calcimycin</String>
  </DescriptorName>
  <DateCreated>
   <Year>1974</Year>
   <Month>11</Month>
   <Day>19</Day>
  </DateCreated>
  <DateRevised>
   <Year>2016</Year>
   <Month>05</Month>
   <Day>27</Day>
  </DateRevised>
  <DateEstablished>
   <Year>1984</Year>
   <Month>01</Month>
   <Day>01</Day>
  </DateEstablished>
  <AllowableQualifiersList>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000302</QualifierUI>
      <QualifierName>
      <String>isolation &amp; purification</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>IP</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000276</QualifierUI>
      <QualifierName>
      <String>immunology</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>IM</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000493</QualifierUI>
      <QualifierName>
      <String>pharmacokinetics</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>PK</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000096</QualifierUI>
      <QualifierName>
      <String>biosynthesis</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>BI</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000627</QualifierUI>
      <QualifierName>
      <String>therapeutic use</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>TU</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000819</QualifierUI>
      <QualifierName>
      <String>agonists</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>AG</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000097</QualifierUI>
      <QualifierName>
      <String>blood</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>BL</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000134</QualifierUI>
      <QualifierName>
      <String>cerebrospinal fluid</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>CF</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000138</QualifierUI>
      <QualifierName>
      <String>chemical synthesis</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>CS</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000737</QualifierUI>
      <QualifierName>
      <String>chemistry</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>CH</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000652</QualifierUI>
      <QualifierName>
      <String>urine</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>UR</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000528</QualifierUI>
      <QualifierName>
      <String>radiation effects</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>RE</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000266</QualifierUI>
      <QualifierName>
      <String>history</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>HI</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000032</QualifierUI>
      <QualifierName>
      <String>analysis</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>AN</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000145</QualifierUI>
      <QualifierName>
      <String>classification</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>CL</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000191</QualifierUI>
      <QualifierName>
      <String>economics</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>EC</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000378</QualifierUI>
      <QualifierName>
      <String>metabolism</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>ME</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000494</QualifierUI>
      <QualifierName>
      <String>pharmacology</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>PD</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000592</QualifierUI>
      <QualifierName>
      <String>standards</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>ST</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000008</QualifierUI>
      <QualifierName>
      <String>administration &amp; dosage</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>AD</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000037</QualifierUI>
      <QualifierName>
      <String>antagonists &amp; inhibitors</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>AI</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000031</QualifierUI>
      <QualifierName>
      <String>analogs &amp; derivatives</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>AA</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000600</QualifierUI>
      <QualifierName>
      <String>supply &amp; distribution</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>SD</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000506</QualifierUI>
      <QualifierName>
      <String>poisoning</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>PO</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000009</QualifierUI>
      <QualifierName>
      <String>adverse effects</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>AE</Abbreviation>
   </AllowableQualifier>
   <AllowableQualifier>
    <QualifierReferredTo>
     <QualifierUI>Q000633</QualifierUI>
      <QualifierName>
      <String>toxicity</String>
      </QualifierName>
    </QualifierReferredTo>
    <Abbreviation>TO</Abbreviation>
   </AllowableQualifier>
  </AllowableQualifiersList>
  <HistoryNote>91(75); was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
  </HistoryNote>
  <OnlineNote>use CALCIMYCIN to search A 23187 1975-90
  </OnlineNote>
  <PublicMeSHNote>91; was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
  </PublicMeSHNote>
  <PreviousIndexingList>
   <PreviousIndexing>Antibiotics (1973-1974)</PreviousIndexing>
   <PreviousIndexing>Carboxylic Acids (1973-1974)</PreviousIndexing>
  </PreviousIndexingList>
    <PharmacologicalActionList>
     <PharmacologicalAction>
      <DescriptorReferredTo>
       <DescriptorUI>D000900</DescriptorUI>
        <DescriptorName>
         <String>Anti-Bacterial Agents</String>
        </DescriptorName>
      </DescriptorReferredTo>
     </PharmacologicalAction>
     <PharmacologicalAction>
      <DescriptorReferredTo>
       <DescriptorUI>D061207</DescriptorUI>
        <DescriptorName>
         <String>Calcium Ionophores</String>
        </DescriptorName>
      </DescriptorReferredTo>
     </PharmacologicalAction>
    </PharmacologicalActionList>
  <TreeNumberList>
   <TreeNumber>D03.633.100.221.173</TreeNumber>
  </TreeNumberList>
  <ConceptList>
   <Concept PreferredConceptYN="Y">
    <ConceptUI>M0000001</ConceptUI>
    <ConceptName>
     <String>Calcimycin</String>
    </ConceptName>
    <CASN1Name>4-Benzoxazolecarboxylic acid, 5-(methylamino)-2-((3,9,11-trimethyl-8-(1-methyl-2-oxo-2-(1H-pyrrol-2-yl)ethyl)-1,7-dioxaspiro(5.5)undec-2-yl)methyl)-, (6S-(6alpha(2S*,3S*),8beta(R*),9beta,11alpha))-</CASN1Name>
    <RegistryNumber>37H9VM9WZL</RegistryNumber>
    <ScopeNote>An ionophorous, polyether antibiotic from Streptomyces chartreusensis. It binds and transports CALCIUM and other divalent cations across membranes and uncouples oxidative phosphorylation while inhibiting ATPase of rat liver mitochondria. The substance is used mostly as a biochemical tool to study the role of divalent cations in various biological systems.
    </ScopeNote>
    <RelatedRegistryNumberList>
     <RelatedRegistryNumber>52665-69-7 (Calcimycin)</RelatedRegistryNumber>
    </RelatedRegistryNumberList>
    <ConceptRelationList>
     <ConceptRelation RelationName="NRW">
     <Concept1UI>M0000001</Concept1UI>
     <Concept2UI>M0353609</Concept2UI>
     </ConceptRelation>
    </ConceptRelationList>
    <TermList>
     <Term  ConceptPreferredTermYN="Y"  IsPermutedTermYN="N"  LexicalTag="NON"  RecordPreferredTermYN="Y">
      <TermUI>T000002</TermUI>
      <String>Calcimycin</String>
      <DateCreated>
       <Year>1999</Year>
       <Month>01</Month>
       <Day>01</Day>
      </DateCreated>
      <ThesaurusIDlist>
       <ThesaurusID>FDA SRS (2014)</ThesaurusID>
       <ThesaurusID>NLM (1975)</ThesaurusID>
      </ThesaurusIDlist>
     </Term>
    </TermList>
   </Concept>
   <Concept PreferredConceptYN="N">
    <ConceptUI>M0353609</ConceptUI>
    <ConceptName>
     <String>A-23187</String>
    </ConceptName>
    <RegistryNumber>0</RegistryNumber>
    <ConceptRelationList>
     <ConceptRelation RelationName="NRW">
     <Concept1UI>M0000001</Concept1UI>
     <Concept2UI>M0353609</Concept2UI>
     </ConceptRelation>
    </ConceptRelationList>
    <TermList>
     <Term  ConceptPreferredTermYN="Y"  IsPermutedTermYN="N"  LexicalTag="LAB"  RecordPreferredTermYN="N">
      <TermUI>T000001</TermUI>
      <String>A-23187</String>
      <DateCreated>
       <Year>1990</Year>
       <Month>03</Month>
       <Day>08</Day>
      </DateCreated>
      <ThesaurusIDlist>
       <ThesaurusID>NLM (1991)</ThesaurusID>
      </ThesaurusIDlist>
     </Term>
     <Term  ConceptPreferredTermYN="N"  IsPermutedTermYN="Y"  LexicalTag="LAB"  RecordPreferredTermYN="N">
      <TermUI>T000001</TermUI>
      <String>A 23187</String>
     </Term>
     <Term  ConceptPreferredTermYN="N"  IsPermutedTermYN="N"  LexicalTag="LAB"  RecordPreferredTermYN="N">
      <TermUI>T000004</TermUI>
      <String>A23187</String>
      <DateCreated>
       <Year>1974</Year>
       <Month>11</Month>
       <Day>11</Day>
      </DateCreated>
      <ThesaurusIDlist>
       <ThesaurusID>UNK (19XX)</ThesaurusID>
      </ThesaurusIDlist>
     </Term>
     <Term  ConceptPreferredTermYN="N"  IsPermutedTermYN="N"  LexicalTag="NON"  RecordPreferredTermYN="N">
      <TermUI>T000003</TermUI>
      <String>Antibiotic A23187</String>
      <DateCreated>
       <Year>1990</Year>
       <Month>03</Month>
       <Day>08</Day>
      </DateCreated>
      <ThesaurusIDlist>
       <ThesaurusID>NLM (1991)</ThesaurusID>
      </ThesaurusIDlist>
     </Term>
     <Term  ConceptPreferredTermYN="N"  IsPermutedTermYN="Y"  LexicalTag="NON"  RecordPreferredTermYN="N">
      <TermUI>T000003</TermUI>
      <String>A23187, Antibiotic</String>
     </Term>
    </TermList>
   </Concept>
  </ConceptList>
 </DescriptorRecord>
 ```


We use the MeSH dump in XML format, downloaded by FTP. We chose the 2018 version to maximize the number of corresponding terms.

Export the XML tree:
```shell
$ xmlstarlet el -u desc2018.gz > desc2018_tree.txt
```

Result:
```shell
DescriptorRecordSet
DescriptorRecordSet/DescriptorRecord
DescriptorRecordSet/DescriptorRecord/AllowableQualifiersList
DescriptorRecordSet/DescriptorRecord/AllowableQualifiersList/AllowableQualifier
DescriptorRecordSet/DescriptorRecord/AllowableQualifiersList/AllowableQualifier/Abbreviation
DescriptorRecordSet/DescriptorRecord/AllowableQualifiersList/AllowableQualifier/QualifierReferredTo
DescriptorRecordSet/DescriptorRecord/AllowableQualifiersList/AllowableQualifier/QualifierReferredTo/QualifierName
DescriptorRecordSet/DescriptorRecord/AllowableQualifiersList/AllowableQualifier/QualifierReferredTo/QualifierName/String
DescriptorRecordSet/DescriptorRecord/AllowableQualifiersList/AllowableQualifier/QualifierReferredTo/QualifierUI
DescriptorRecordSet/DescriptorRecord/Annotation
DescriptorRecordSet/DescriptorRecord/ConceptList
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/CASN1Name
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/ConceptName
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/ConceptName/String
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/ConceptRelationList
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/ConceptRelationList/ConceptRelation
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/ConceptRelationList/ConceptRelation/Concept1UI
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/ConceptRelationList/ConceptRelation/Concept2UI
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/ConceptUI
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/RegistryNumber
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/RelatedRegistryNumberList
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/RelatedRegistryNumberList/RelatedRegistryNumber
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/ScopeNote
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term/DateCreated
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term/DateCreated/Day
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term/DateCreated/Month
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term/DateCreated/Year
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term/EntryVersion
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term/SortVersion
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term/String
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term/TermUI
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term/ThesaurusIDlist
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term/ThesaurusIDlist/ThesaurusID
DescriptorRecordSet/DescriptorRecord/ConsiderAlso
DescriptorRecordSet/DescriptorRecord/DateCreated
DescriptorRecordSet/DescriptorRecord/DateCreated/Day
DescriptorRecordSet/DescriptorRecord/DateCreated/Month
DescriptorRecordSet/DescriptorRecord/DateCreated/Year
DescriptorRecordSet/DescriptorRecord/DateEstablished
DescriptorRecordSet/DescriptorRecord/DateEstablished/Day
DescriptorRecordSet/DescriptorRecord/DateEstablished/Month
DescriptorRecordSet/DescriptorRecord/DateEstablished/Year
DescriptorRecordSet/DescriptorRecord/DateRevised
DescriptorRecordSet/DescriptorRecord/DateRevised/Day
DescriptorRecordSet/DescriptorRecord/DateRevised/Month
DescriptorRecordSet/DescriptorRecord/DateRevised/Year
DescriptorRecordSet/DescriptorRecord/DescriptorName
DescriptorRecordSet/DescriptorRecord/DescriptorName/String
DescriptorRecordSet/DescriptorRecord/DescriptorUI
DescriptorRecordSet/DescriptorRecord/EntryCombinationList
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECIN
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECIN/DescriptorReferredTo
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECIN/DescriptorReferredTo/DescriptorName
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECIN/DescriptorReferredTo/DescriptorName/String
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECIN/DescriptorReferredTo/DescriptorUI
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECIN/QualifierReferredTo
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECIN/QualifierReferredTo/QualifierName
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECIN/QualifierReferredTo/QualifierName/String
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECIN/QualifierReferredTo/QualifierUI
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECOUT
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECOUT/DescriptorReferredTo
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECOUT/DescriptorReferredTo/DescriptorName
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECOUT/DescriptorReferredTo/DescriptorName/String
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECOUT/DescriptorReferredTo/DescriptorUI
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECOUT/QualifierReferredTo
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECOUT/QualifierReferredTo/QualifierName
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECOUT/QualifierReferredTo/QualifierName/String
DescriptorRecordSet/DescriptorRecord/EntryCombinationList/EntryCombination/ECOUT/QualifierReferredTo/QualifierUI
DescriptorRecordSet/DescriptorRecord/HistoryNote
DescriptorRecordSet/DescriptorRecord/NLMClassificationNumber
DescriptorRecordSet/DescriptorRecord/OnlineNote
DescriptorRecordSet/DescriptorRecord/PharmacologicalActionList
DescriptorRecordSet/DescriptorRecord/PharmacologicalActionList/PharmacologicalAction
DescriptorRecordSet/DescriptorRecord/PharmacologicalActionList/PharmacologicalAction/DescriptorReferredTo
DescriptorRecordSet/DescriptorRecord/PharmacologicalActionList/PharmacologicalAction/DescriptorReferredTo/DescriptorName
DescriptorRecordSet/DescriptorRecord/PharmacologicalActionList/PharmacologicalAction/DescriptorReferredTo/DescriptorName/String
DescriptorRecordSet/DescriptorRecord/PharmacologicalActionList/PharmacologicalAction/DescriptorReferredTo/DescriptorUI
DescriptorRecordSet/DescriptorRecord/PreviousIndexingList
DescriptorRecordSet/DescriptorRecord/PreviousIndexingList/PreviousIndexing
DescriptorRecordSet/DescriptorRecord/PublicMeSHNote
DescriptorRecordSet/DescriptorRecord/SeeRelatedList
DescriptorRecordSet/DescriptorRecord/SeeRelatedList/SeeRelatedDescriptor
DescriptorRecordSet/DescriptorRecord/SeeRelatedList/SeeRelatedDescriptor/DescriptorReferredTo
DescriptorRecordSet/DescriptorRecord/SeeRelatedList/SeeRelatedDescriptor/DescriptorReferredTo/DescriptorName
DescriptorRecordSet/DescriptorRecord/SeeRelatedList/SeeRelatedDescriptor/DescriptorReferredTo/DescriptorName/String
DescriptorRecordSet/DescriptorRecord/SeeRelatedList/SeeRelatedDescriptor/DescriptorReferredTo/DescriptorUI
DescriptorRecordSet/DescriptorRecord/TreeNumberList
DescriptorRecordSet/DescriptorRecord/TreeNumberList/TreeNumber
```
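The same list of element paths can also be produced from Python. The sketch below is only an illustration: it uses the standard library's `xml.etree.ElementTree` rather than `xmlstarlet`, the function name `unique_element_paths` is hypothetical, and the inline sample is a heavily shortened stand-in for the real file.

```python
import io
import xml.etree.ElementTree as ET


def unique_element_paths(source):
    """Return the sorted set of element paths found in an XML source."""
    paths = set()
    stack = []  # names of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            paths.add("/".join(stack))
        else:
            stack.pop()
            elem.clear()  # drop the content of elements already processed
    return sorted(paths)


# Shortened stand-in for the MeSH descriptor file (assumption: same nesting)
sample = b"""<DescriptorRecordSet>
 <DescriptorRecord>
  <DescriptorUI>D000001</DescriptorUI>
  <DescriptorName><String>Calcimycin</String></DescriptorName>
 </DescriptorRecord>
 <DescriptorRecord>
  <DescriptorUI>D000002</DescriptorUI>
  <DescriptorName><String>Temefos</String></DescriptorName>
 </DescriptorRecord>
</DescriptorRecordSet>"""

paths = unique_element_paths(io.BytesIO(sample))
for path in paths:
    print(path)
```

For the real file, assuming it is gzip-compressed, the function could be given `gzip.open('desc2018.gz', 'rb')` instead of the in-memory sample.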


### 3. Select the information to extract with the UID

In our case we want to extract all the entry terms of MeSH terms that have a particular "Pharmacological Action".

```
Term information (1):
DescriptorRecordSet/DescriptorRecord/DescriptorName/String
DescriptorRecordSet/DescriptorRecord/DescriptorUI

Pharmacological actions (n):
DescriptorRecordSet/DescriptorRecord/PharmacologicalActionList/PharmacologicalAction/DescriptorReferredTo/DescriptorName/String
DescriptorRecordSet/DescriptorRecord/PharmacologicalActionList/PharmacologicalAction/DescriptorReferredTo/DescriptorUI

Entry terms (n):
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term/String
DescriptorRecordSet/DescriptorRecord/ConceptList/Concept/TermList/Term/TermUI
```

A MeSH term can have more than one pharmacological action and more than one entry term. In those cases we extract all of them, repeating the descriptor name and UI on every row.
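To make the "repeat the name and the UI" rule concrete, here is a minimal sketch on a single, shortened record. It is an illustration only: it uses the standard library's `xml.etree.ElementTree` so that it runs without extra dependencies, and the record is trimmed down from the Calcimycin example above.

```python
import xml.etree.ElementTree as ET

# Shortened version of the Calcimycin record shown earlier
record_xml = """<DescriptorRecord>
  <DescriptorUI>D000001</DescriptorUI>
  <DescriptorName><String>Calcimycin</String></DescriptorName>
  <PharmacologicalActionList>
    <PharmacologicalAction><DescriptorReferredTo>
      <DescriptorUI>D000900</DescriptorUI>
      <DescriptorName><String>Anti-Bacterial Agents</String></DescriptorName>
    </DescriptorReferredTo></PharmacologicalAction>
    <PharmacologicalAction><DescriptorReferredTo>
      <DescriptorUI>D061207</DescriptorUI>
      <DescriptorName><String>Calcium Ionophores</String></DescriptorName>
    </DescriptorReferredTo></PharmacologicalAction>
  </PharmacologicalActionList>
  <ConceptList><Concept><TermList>
    <Term><TermUI>T000002</TermUI><String>Calcimycin</String></Term>
    <Term><TermUI>T000004</TermUI><String>A23187</String></Term>
  </TermList></Concept></ConceptList>
</DescriptorRecord>"""

record = ET.fromstring(record_xml)

# Term information: exactly one name and one UI per record
name = record.findtext("DescriptorName/String")
ui = record.findtext("DescriptorUI")

# One row per pharmacological action, repeating name and UI
pa_rows = [
    (name, ui, pa.findtext("DescriptorName/String"), pa.findtext("DescriptorUI"))
    for pa in record.findall(
        "PharmacologicalActionList/PharmacologicalAction/DescriptorReferredTo")
]

# One row per entry term, repeating name and UI
et_rows = [
    (name, ui, term.findtext("String"), term.findtext("TermUI"))
    for term in record.findall("ConceptList/Concept/TermList/Term")
]

for row in pa_rows + et_rows:
    print("\t".join(row))
```

The paths passed to `findall` and `findtext` are the ones listed above, written relative to `DescriptorRecord`.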


```python
import codecs
from lxml import etree

# Path of the MeSH file in gzipped XML format
myfilein = 'desc2018.gz'

# Name of the file with the results
myfileout = 'mesh2018_pharmacological_actions.tsv'

# Parse the XML
root = etree.parse(myfilein)

# Select the descriptor records
mesh_terms = root.xpath('/DescriptorRecordSet/DescriptorRecord')

with codecs.open(myfileout, 'w', 'utf-8') as file:
    # Write the header line
    file.write('name\tui\tpharmacological_action_name\tpharmacological_action_ui\n')

    for mesh_term in mesh_terms:
        mesh_name = mesh_term.xpath('DescriptorName/String')[0].text
        mesh_ui = mesh_term.xpath('DescriptorUI')[0].text

        # Loop over the pharmacological actions (none for many records)
        pharmacological_actions = mesh_term.xpath(
            'PharmacologicalActionList/PharmacologicalAction/DescriptorReferredTo')
        for action in pharmacological_actions:
            pa_name = action.xpath('DescriptorName/String')[0].text
            pa_ui = action.xpath('DescriptorUI')[0].text
            # Write one row per pharmacological action
            file.write('\t'.join([mesh_name, mesh_ui, pa_name, pa_ui]) + '\n')
```

```python
# Same process, but for entry terms

# Path of the MeSH file in gzipped XML format
myfilein = 'desc2018.gz'

# Name of the file with the results
myfileout = 'mesh2018_entry_terms.tsv'

# Parse the XML
root = etree.parse(myfilein)

# Select the descriptor records
mesh_terms = root.xpath('/DescriptorRecordSet/DescriptorRecord')

with codecs.open(myfileout, 'w', 'utf-8') as file:
    # Write the header line
    file.write('name\tui\tentry_term_name\tentry_term_ui\n')

    for mesh_term in mesh_terms:
        mesh_name = mesh_term.xpath('DescriptorName/String')[0].text
        mesh_ui = mesh_term.xpath('DescriptorUI')[0].text

        # Loop over the entry terms of all concepts of the record
        entry_terms = mesh_term.xpath('ConceptList/Concept/TermList/Term')
        for entry_term in entry_terms:
            et_name = entry_term.xpath('String')[0].text
            et_ui = entry_term.xpath('TermUI')[0].text
            # Write one row per entry term
            file.write('\t'.join([mesh_name, mesh_ui, et_name, et_ui]) + '\n')
```
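Both cells above load the whole XML tree into memory before looping over it. If memory is a concern, the records can also be processed one at a time. The following is a sketch under assumptions, not the method used in this notebook: it relies on the standard library (`gzip`, `csv`, `xml.etree.ElementTree`) instead of `lxml`, it assumes the input is gzip-compressed, and the function names `iter_descriptor_records` and `write_entry_terms` are hypothetical.

```python
import csv
import gzip
import xml.etree.ElementTree as ET


def iter_descriptor_records(path):
    """Yield DescriptorRecord elements one by one from a gzipped XML file."""
    with gzip.open(path, "rb") as handle:
        for _, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag == "DescriptorRecord":
                yield elem
                elem.clear()  # free the record once the caller is done with it


def write_entry_terms(path_in, path_out):
    """Write one TSV row per entry term, repeating descriptor name and UI."""
    with open(path_out, "w", encoding="utf-8", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["name", "ui", "entry_term_name", "entry_term_ui"])
        for record in iter_descriptor_records(path_in):
            name = record.findtext("DescriptorName/String")
            ui = record.findtext("DescriptorUI")
            for term in record.findall("ConceptList/Concept/TermList/Term"):
                writer.writerow(
                    [name, ui, term.findtext("String"), term.findtext("TermUI")])
```

A call such as `write_entry_terms('desc2018.gz', 'mesh2018_entry_terms.tsv')` would then produce the same columns as the cell above. Using `csv.writer` also quotes any value that happens to contain a tab or a quote character, which the plain `write` calls do not.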

### Merging pharmacological actions with entry terms

```python
import pandas as pd
# open pharmacological actions
pa = pd.read_csv('mesh2018_pharmacological_actions.tsv', delimiter='\t', header=0)
pa
```

| | name | ui | pharmacological_action_name | pharmacological_action_ui |
|---|---|---|---|---|
| 0 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 |
| 1 | Calcimycin | D000001 | Calcium Ionophores | D061207 |
| 2 | Temefos | D000002 | Insecticides | D007306 |
| 3 | Abscisic Acid | D000040 | Plant Growth Regulators | D010937 |
| 4 | Aripiprazole | D000068180 | Antipsychotic Agents | D014150 |
| 5 | Albumin-Bound Paclitaxel | D000068196 | Antineoplastic Agents | D000970 |
| 6 | Lubiprostone | D000068238 | Chloride Channel Agonists | D065101 |
| 7 | Darbepoetin alfa | D000068256 | Hematinics | D006397 |
| 8 | Efavirenz, Emtricitabine, Tenofovir Disoproxil... | D000068257 | Reverse Transcriptase Inhibitors | D018894 |
| 9 | Efavirenz, Emtricitabine, Tenofovir Disoproxil... | D000068257 | Anti-HIV Agents | D019380 |


```python
# open entry terms
et = pd.read_csv('mesh2018_entry_terms.tsv', delimiter='\t', header=0)
et
```

| | name | ui | entry_term_name | entry_term_ui |
|---|---|---|---|---|
| 0 | Calcimycin | D000001 | Calcimycin | T000002 |
| 1 | Calcimycin | D000001 | A-23187 | T000001 |
| 2 | Calcimycin | D000001 | A 23187 | T000001 |
| 3 | Calcimycin | D000001 | A23187 | T000004 |
| 4 | Calcimycin | D000001 | Antibiotic A23187 | T000003 |
| 5 | Calcimycin | D000001 | A23187, Antibiotic | T000003 |
| 6 | Temefos | D000002 | Temefos | T000008 |
| 7 | Temefos | D000002 | Temephos | T000007 |
| 8 | Temefos | D000002 | Abate | T000005 |
| 9 | Temefos | D000002 | Difos | T000006 |


```python
# merge both
pa_et = pd.merge(pa, et, how='left', on='ui')
pa_et
```

| | name_x | ui | pharmacological_action_name | pharmacological_action_ui | name_y | entry_term_name | entry_term_ui |
|---|---|---|---|---|---|---|---|
| 0 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 | Calcimycin | Calcimycin | T000002 |
| 1 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 | Calcimycin | A-23187 | T000001 |
| 2 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 | Calcimycin | A 23187 | T000001 |
| 3 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 | Calcimycin | A23187 | T000004 |
| 4 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 | Calcimycin | Antibiotic A23187 | T000003 |
| 5 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 | Calcimycin | A23187, Antibiotic | T000003 |
| 6 | Calcimycin | D000001 | Calcium Ionophores | D061207 | Calcimycin | Calcimycin | T000002 |
| 7 | Calcimycin | D000001 | Calcium Ionophores | D061207 | Calcimycin | A-23187 | T000001 |
| 8 | Calcimycin | D000001 | Calcium Ionophores | D061207 | Calcimycin | A 23187 | T000001 |
| 9 | Calcimycin | D000001 | Calcium Ionophores | D061207 | Calcimycin | A23187 | T000004 |
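The merged table has one row for every combination of pharmacological action and entry term that share the same `ui`, and the `name` column, present on both sides, is duplicated as `name_x` and `name_y`. A minimal sketch on toy data (rows copied by hand from the tables in this section, not read from the files) illustrates the row multiplication and one possible way to avoid the duplicated column, namely dropping `name` from one side before merging:

```python
import pandas as pd

# Toy data: two pharmacological actions for one descriptor
pa = pd.DataFrame({
    "name": ["Calcimycin", "Calcimycin"],
    "ui": ["D000001", "D000001"],
    "pharmacological_action_name": ["Anti-Bacterial Agents", "Calcium Ionophores"],
    "pharmacological_action_ui": ["D000900", "D061207"],
})

# Toy data: three entry terms for the same descriptor
et = pd.DataFrame({
    "name": ["Calcimycin"] * 3,
    "ui": ["D000001"] * 3,
    "entry_term_name": ["Calcimycin", "A-23187", "A23187"],
    "entry_term_ui": ["T000002", "T000001", "T000004"],
})

# Dropping "name" on the right side keeps a single, unsuffixed name column
pa_et = pd.merge(pa, et.drop(columns="name"), how="left", on="ui")

print(len(pa_et))            # 2 actions x 3 entry terms
print(list(pa_et.columns))
```

With this variant no `name_x` / `name_y` clean-up is needed afterwards. It is an alternative, not a correction: the delete-and-rename approach gives the same columns.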


```python
# remove repeated name
del pa_et['name_y']

# rename name_x
pa_et = pa_et.rename(columns={'name_x': 'name'})
pa_et
```

| | name | ui | pharmacological_action_name | pharmacological_action_ui | entry_term_name | entry_term_ui |
|---|---|---|---|---|---|---|
| 0 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 | Calcimycin | T000002 |
| 1 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 | A-23187 | T000001 |
| 2 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 | A 23187 | T000001 |
| 3 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 | A23187 | T000004 |
| 4 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 | Antibiotic A23187 | T000003 |
| 5 | Calcimycin | D000001 | Anti-Bacterial Agents | D000900 | A23187, Antibiotic | T000003 |
| 6 | Calcimycin | D000001 | Calcium Ionophores | D061207 | Calcimycin | T000002 |
| 7 | Calcimycin | D000001 | Calcium Ionophores | D061207 | A-23187 | T000001 |
| 8 | Calcimycin | D000001 | Calcium Ionophores | D061207 | A 23187 | T000001 |
| 9 | Calcimycin | D000001 | Calcium Ionophores | D061207 | A23187 | T000004 |


```python
# export to tsv
pa_et[['pharmacological_action_name', 'pharmacological_action_ui',
       'name', 'ui', 'entry_term_name', 'entry_term_ui']].sort_values(
    by='pharmacological_action_name').to_csv(
    'mesh2018_pharmacological_actions_entry_terms.tsv', sep='\t', index=False)
```

```python
# check the terms of one pharmacological action, e.g. Antihypertensive Agents
pa_et.loc[pa_et['pharmacological_action_name'] == 'Antihypertensive Agents']
```

| | name | ui | pharmacological_action_name | pharmacological_action_ui | entry_term_name | entry_term_ui |
|---|---|---|---|---|---|---|
| 308 | Brimonidine Tartrate | D000068438 | Antihypertensive Agents | D000959 | Brimonidine Tartrate | T000875824 |
| 309 | Brimonidine Tartrate | D000068438 | Antihypertensive Agents | D000959 | Tartrate, Brimonidine | T000875824 |
| 310 | Brimonidine Tartrate | D000068438 | Antihypertensive Agents | D000959 | 5-Bromo-6-(2-imidazolin-2-ylamino)quinoxaline ... | T000876771 |
| 311 | Brimonidine Tartrate | D000068438 | Antihypertensive Agents | D000959 | UK 14,304 | T095045 |
| 312 | Brimonidine Tartrate | D000068438 | Antihypertensive Agents | D000959 | UK-14304 | T095051 |
| 313 | Brimonidine Tartrate | D000068438 | Antihypertensive Agents | D000959 | UK14304 | T095051 |
| 314 | Brimonidine Tartrate | D000068438 | Antihypertensive Agents | D000959 | UK 14304 | T095047 |
| 315 | Brimonidine Tartrate | D000068438 | Antihypertensive Agents | D000959 | Ratio-Brimonidine | T000875829 |
| 316 | Brimonidine Tartrate | D000068438 | Antihypertensive Agents | D000959 | Ratio Brimonidine | T000875829 |
| 317 | Brimonidine Tartrate | D000068438 | Antihypertensive Agents | D000959 | UK 14,304-18 | T095046 |


```python
# export entry terms for one pharmacological action
pa_et.loc[pa_et['pharmacological_action_name'] == 'Antihypertensive Agents'][
    ['pharmacological_action_name', 'pharmacological_action_ui',
     'name', 'ui', 'entry_term_name', 'entry_term_ui']].sort_values(
    by='entry_term_name').to_csv(
    'mesh2018_antihypertensive_agents.tsv', sep='\t', index=False)
```

```python
# export only entry terms for one pharmacological action
pa_et.loc[pa_et['pharmacological_action_name'] == 'Antihypertensive Agents'][
    'entry_term_name'].to_csv(
    'mesh2018_antihypertensive_agents_entry_terms.txt', sep='\t', index=False)
```
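As a possible variation on the last export, the selection can be made on the pharmacological action UI instead of its name (`D000959` is the UI shown for Antihypertensive Agents in the table above), and repeated entry term strings can be dropped before writing. The sketch below runs on toy data with made-up repetition, so it only illustrates the filtering logic and is not the actual export:

```python
import pandas as pd

# Toy stand-in for the merged table (assumption: same column names)
pa_et = pd.DataFrame({
    "pharmacological_action_ui": ["D000959", "D000959", "D000959", "D000900"],
    "entry_term_name": ["UK-14304", "Brimonidine Tartrate", "UK-14304", "Calcimycin"],
})

# Select the entry terms of one pharmacological action by its UI
selected = pa_et.loc[
    pa_et["pharmacological_action_ui"] == "D000959", "entry_term_name"]

# Keep each entry term once, in alphabetical order
unique_terms = sorted(selected.drop_duplicates())

for term in unique_terms:
    print(term)
```

Selecting by UI avoids depending on the exact spelling and capitalization of the action name.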