jaytimm/mesh-resources

Some MeSH resources

A simple library that makes the Medical Subject Headings (MeSH) vocabulary and tree structure available as simple data frames. Tables for both Descriptor Records and Supplementary Concept Records are included. Pharmacological Actions have been extracted from both the descriptor and supplementary concept files and collated in a single table. Lastly, descriptor-level word embeddings derived and made available by Noh & Kavuluru (2021) are included. The R code for the XML extraction and restructuring processes is available here. These data are utilized in the R package pubmedtk.

MeSH ontology

Based on MeSH files desc2024, mtrees2024 & supp2024.

descriptor-terms

Raw data

readRDS('data/data_mesh_thesaurus.rds') |>
  head() |> knitr::kable()
| DescriptorUI | DescriptorName | ConceptUI | TermUI | TermName | ConceptPreferredTermYN | IsPermutedTermYN | LexicalTag | RecordPreferredTermYN |
|---|---|---|---|---|---|---|---|---|
| D000001 | Calcimycin | M0000001 | T000002 | Calcimycin | Y | N | NON | Y |
| D000001 | Calcimycin | M0000001 | T001124965 | 4-Benzoxazolecarboxylic acid, 5-(methylamino)-2-((3,9,11-trimethyl-8-(1-methyl-2-oxo-2-(1H-pyrrol-2-yl)ethyl)-1,7-dioxaspiro(5.5)undec-2-yl)methyl)-, (6S-(6alpha(2S*,3S*),8beta(R*),9beta,11alpha))- | N | N | NON | N |
| D000001 | Calcimycin | M0353609 | T000001 | A-23187 | Y | N | LAB | N |
| D000001 | Calcimycin | M0353609 | T000001 | A 23187 | N | Y | LAB | N |
| D000001 | Calcimycin | M0353609 | T000004 | A23187 | N | N | LAB | N |
| D000001 | Calcimycin | M0353609 | T000003 | Antibiotic A23187 | N | N | NON | N |
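
As a minimal sketch of how the term-level flags can be used, the snippet below separates a descriptor's record-preferred term from its other entry terms. It rebuilds a few of the rows shown above as a toy data frame (the long systematic name is left out) so it runs without the `.rds` file. Reading `RecordPreferredTermYN` as "the descriptor's main heading" and `IsPermutedTermYN` as "a permuted variant of another term" is an assumption based on the column names.

```r
# Toy rows copied from the table above; not the full data set.
thesaurus <- data.frame(
  DescriptorUI          = rep("D000001", 5),
  DescriptorName        = rep("Calcimycin", 5),
  TermName              = c("Calcimycin", "A-23187", "A 23187",
                            "A23187", "Antibiotic A23187"),
  IsPermutedTermYN      = c("N", "N", "Y", "N", "N"),
  RecordPreferredTermYN = c("Y", "N", "N", "N", "N"),
  stringsAsFactors      = FALSE
)

# The single term flagged as preferred for the whole record.
preferred <- subset(thesaurus, RecordPreferredTermYN == "Y")$TermName

# Remaining entry terms, dropping permuted variants such as "A 23187".
synonyms <- subset(
  thesaurus,
  RecordPreferredTermYN == "N" & IsPermutedTermYN == "N"
)$TermName

preferred
synonyms
```

The same filter should apply unchanged to the full table returned by `readRDS('data/data_mesh_thesaurus.rds')`, assuming the column names match those printed above.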

descriptor-tree-numbers

Raw data

readRDS('data/data_mesh_trees.rds') |>
  head() |> knitr::kable()
| DescriptorUI | DescriptorName | tree_location | code | cats | mesh1 | mesh2 | tree1 | tree2 |
|---|---|---|---|---|---|---|---|---|
| D000001 | Calcimycin | D02.355.291.933.125 | D | Chemicals and Drugs | Organic Chemicals | Ethers | D02 | D02.355 |
| D000001 | Calcimycin | D02.540.576.625.125 | D | Chemicals and Drugs | Organic Chemicals | Lactones | D02 | D02.540 |
| D000001 | Calcimycin | D03.633.100.221.173 | D | Chemicals and Drugs | Heterocyclic Compounds | Heterocyclic Compounds, Fused-Ring | D03 | D03.633 |
| D000001 | Calcimycin | D04.345.241.654.125 | D | Chemicals and Drugs | Polycyclic Compounds | Macrocyclic Compounds | D04 | D04.345 |
| D000001 | Calcimycin | D04.345.674.625.125 | D | Chemicals and Drugs | Polycyclic Compounds | Macrocyclic Compounds | D04 | D04.345 |
| D000002 | Temefos | D02.705.400.625.800 | D | Chemicals and Drugs | Organic Chemicals | Organophosphorus Compounds | D02 | D02.705 |
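
Because a tree number encodes its ancestors as dot-separated prefixes, a branch of the hierarchy can be selected with a simple prefix match on `tree_location`. The sketch below uses a toy data frame built from the rows shown above; `under_branch()` is a hypothetical helper written for this example and is not part of the repository.

```r
# Toy rows copied from the table above; not the full data set.
trees <- data.frame(
  DescriptorUI   = c(rep("D000001", 5), "D000002"),
  DescriptorName = c(rep("Calcimycin", 5), "Temefos"),
  tree_location  = c("D02.355.291.933.125", "D02.540.576.625.125",
                     "D03.633.100.221.173", "D04.345.241.654.125",
                     "D04.345.674.625.125", "D02.705.400.625.800"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: descriptors located at, or anywhere below, a branch.
# Appending "." before matching keeps "D02.7" from matching "D02.705...".
under_branch <- function(trees, branch) {
  hit <- trees$tree_location == branch |
    startsWith(trees$tree_location, paste0(branch, "."))
  unique(trees$DescriptorName[hit])
}

under_branch(trees, "D02")      # Organic Chemicals branch
under_branch(trees, "D04.345")  # Macrocyclic Compounds branch
```

A descriptor can sit at several tree locations (Calcimycin has five above), so the helper de-duplicates names with `unique()`.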

supplemental-terms

Raw data

readRDS('data/data_scr_thesaurus.rds') |>
  head() |> knitr::kable()
| DescriptorUI | DescriptorName | ConceptUI | TermUI | TermName | ConceptPreferredTermYN | IsPermutedTermYN | LexicalTag | RecordPreferredTermYN |
|---|---|---|---|---|---|---|---|---|
| C114158 | quantum dye macrocyclic europium-chelate | M0294520 | T324525 | quantum dye macrocyclic europium-chelate | Y | N | NON | Y |
| C114158 | quantum dye macrocyclic europium-chelate | M0294520 | T324524 | QD macrocyclic | N | N | NON | N |
| C008718 | osteoclast activating factor | M0052808 | T082811 | osteoclast activating factor | Y | N | NON | Y |
| C008720 | otoline | M0052812 | T082815 | otoline | Y | N | NON | Y |
| C002540 | miracil A | M0043189 | T073192 | miracil A | Y | N | NON | Y |
| C055240 | Leakadine | M0155620 | T185625 | Leakadine | Y | N | TRD | Y |

Pharmacological Actions

Covers drugs from both the MeSH descriptor records and the Supplementary Concept Records.

readRDS('data/data_pharm_action.rds') |>
  head() |> knitr::kable()
| DescriptorUI | DescriptorName | PharmActionUI | PharmActionName |
|---|---|---|---|
| D000001 | Calcimycin | D000900 | Anti-Bacterial Agents |
| D000001 | Calcimycin | D061207 | Calcium Ionophores |
| D000002 | Temefos | D007306 | Insecticides |
| D000040 | Abscisic Acid | D010937 | Plant Growth Regulators |
| D000068180 | Aripiprazole | D000928 | Antidepressive Agents |
| D000068180 | Aripiprazole | D014150 | Antipsychotic Agents |
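
The table is in long format, one row per drug and action pair, so lookups in either direction are simple subsets. The following is a small sketch on a toy copy of the rows shown above, using base R only.

```r
# Toy rows copied from the table above; not the full data set.
pharm <- data.frame(
  DescriptorUI    = c("D000001", "D000001", "D000002",
                      "D000040", "D000068180", "D000068180"),
  DescriptorName  = c("Calcimycin", "Calcimycin", "Temefos",
                      "Abscisic Acid", "Aripiprazole", "Aripiprazole"),
  PharmActionUI   = c("D000900", "D061207", "D007306",
                      "D010937", "D000928", "D014150"),
  PharmActionName = c("Anti-Bacterial Agents", "Calcium Ionophores",
                      "Insecticides", "Plant Growth Regulators",
                      "Antidepressive Agents", "Antipsychotic Agents"),
  stringsAsFactors = FALSE
)

# Drug -> actions: a named list with one character vector per drug.
actions_by_drug <- split(pharm$PharmActionName, pharm$DescriptorName)
actions_by_drug[["Aripiprazole"]]

# Action -> drugs: every descriptor tagged with a given action.
insecticides <- pharm$DescriptorName[pharm$PharmActionName == "Insecticides"]
insecticides
```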

Notes & useful links:

‘SCR records are created for some chemicals, drugs, and other concepts such as rare diseases. They are labeled as “MeSH Supplementary Concept Data” and the unique ID begins with the letter “C.”’

‘Supplementary Concept Records - these are not full MeSH Headings and do not fall under the MeSH tree hierarchy. Many times they are used to identify substances that are not included in the MeSH terms.’

‘These do not belong to the controlled vocabulary as such and are not used for indexing MEDLINE articles; instead they enlarge the thesaurus and contain links to the closest fitting descriptor to be used in a MEDLINE search. Many of these records describe chemical substances.’
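
Since SCR identifiers begin with "C" and the descriptor identifiers in the tables above begin with "D", the record type can be read off the first letter when the two tables are combined. A minimal sketch follows; `record_type()` is a hypothetical helper, and treating "D" as the descriptor prefix is inferred from the sample rows rather than stated in the notes.

```r
# Hypothetical helper: classify a MeSH unique ID by its leading letter.
record_type <- function(ui) {
  ifelse(startsWith(ui, "C"), "supplementary concept",
         ifelse(startsWith(ui, "D"), "descriptor", NA_character_))
}

# IDs taken from the sample tables above.
types <- record_type(c("D000001", "C114158", "D000068180", "C055240"))
types
```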

Descriptor embeddings

Noh, J., & Kavuluru, R. (2021). Improved biomedical word embeddings in the transformer era. Journal of Biomedical Informatics, 120, 103867.

https://zenodo.org/record/4383195#.Y1wDBb7MJhE

“BERT-CRel is a transformer model for fine-tuning biomedical word embeddings that are jointly learned along with concept embeddings using a pre-training phase with fastText and a fine-tuning phase with a transformer setup. The goal is to provide high quality pre-trained biomedical embeddings that can be used in any downstream task by the research community. The corpus used for BERT-CRel contains biomedical citations from PubMed and the concepts are from the Medical Subject Headings (MeSH codes) terminology used to index citations.”

readRDS('data/data_scr_embeddings.rds')
readRDS('data/data_mesh_embeddings.rds')
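
A common use of such embeddings is ranking terms by cosine similarity to a query term. The layout of the `.rds` objects is not documented here, so the sketch below assumes a numeric matrix with one row per term and row names as identifiers. The three vectors and their names are made-up placeholders that only exercise the arithmetic, and `cosine_to()` is a hypothetical helper.

```r
# Hypothetical stand-in for an embedding matrix: one row per term.
# Values are placeholders, not real BERT-CRel vectors.
emb <- rbind(
  term_a = c(1, 0, 1),
  term_b = c(0, 1, 0),
  term_c = c(2, 0, 2)
)

# Hypothetical helper: cosine similarity of every other row to `target`,
# sorted from most to least similar.
cosine_to <- function(emb, target) {
  v <- emb[target, ]
  sims <- as.vector(emb %*% v) / (sqrt(rowSums(emb^2)) * sqrt(sum(v^2)))
  names(sims) <- rownames(emb)
  sort(sims[names(sims) != target], decreasing = TRUE)
}

nearest <- cosine_to(emb, "term_a")
nearest
```

If the real objects turn out to be data frames with an identifier column, they would need converting to a row-named numeric matrix before this helper applies.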
