# Using UMLS Concepts with MeSH

The Medical Subject Headings (MeSH) terms returned by a PubMed search can be analyzed further
by mapping them to Unified Medical Language System (UMLS) concepts and by
filtering the MeSH terms by concept.

For both mapping MeSH to UMLS Concepts and filtering MeSH by concept, the following backends are supported:
* MySQL
* SQLite
* DataFrames

## Set Up

In [1]:
using SQLite
using MySQL
using BioMedQuery.DBUtils
using BioMedQuery.Processes
using BioServices.UMLS
using BioMedQuery.PubMed
using DataFrames

Credentials are read from environment variables (e.g., set in your .juliarc.jl).

In [2]:
umls_user = ENV["UMLS_USER"];
umls_pswd = ENV["UMLS_PSSWD"];
email = ""; # Only needed if you want to contact NCBI with inquiries
search_term = """(obesity[MeSH Major Topic]) AND ("2010"[Date - Publication] : "2012"[Date - Publication])""";
umls_concept = "Disease or Syndrome";
max_articles = 5;
results_dir = ".";
verbose = true;


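Indexing `ENV` directly throws a `KeyError` when a variable is missing. As a minimal sketch, a small helper can stop with a clearer message instead. The name `require_env` is hypothetical, introduced here for illustration only, and is not part of BioMedQuery.

```julia
# Hypothetical helper (not part of BioMedQuery): fetch a required
# environment variable or stop with a descriptive error.
function require_env(name::AbstractString)
    haskey(ENV, name) || error("Environment variable $name is not set")
    return ENV[name]
end

# Usage, assuming the same variable names as in the setup cell:
# umls_user = require_env("UMLS_USER")
# umls_pswd = require_env("UMLS_PSSWD")
```

Failing early this way keeps a missing credential from surfacing later as an opaque authentication error.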
## MySQL

### Map Medical Subject Headings (MeSH) to UMLS

This example demonstrates the typical workflow to populate a MESH2UMLS database
table relating all concepts associated with all MeSH terms in the input database.

*Note: this example reuses the MySQL DB from the PubMed Search and Save example.*

Create MySQL DB connection

In [3]:
host = "127.0.0.1";
mysql_usr = "root";
mysql_pswd = "";
dbname = "pubmed_obesity_2010_2012";

db_mysql = MySQL.connect(host, mysql_usr, mysql_pswd, db = dbname);

Map MeSH to UMLS

In [4]:
@time map_mesh_to_umls_async!(db_mysql, umls_user, umls_pswd; append_results=false, timeout=3);

----------Matching MESH to UMLS-----------
String["Adult", "Aged", "Aged, 80 and over", "Analysis of Variance", "Body Weight", "C-Reactive Protein", "Child", "Cross-Sectional Studies", "Fatigue", "Female", "Fibromyalgia", "Germany", "Health Status", "Humans", "Japan", "Male", "Middle Aged", "Nutrition Surveys", "Obesity", "Pain", "Pain Measurement", "Physical Fitness", "Prognosis", "Quality of Life", "Surveys and Questionnaires", "Reference Values", "Risk Factors", "ROC Curve", "Severity of Illness Index", "Sports", "Television", "Thyrotropin", "Biomarkers", "Weight Gain", "Exercise", "Body Mass Index", "Incidence", "Prevalence", "Logistic Models", "Odds Ratio", "Case-Control Studies", "Age Distribution", "Sex Distribution", "Sleep Apnea, Obstructive", "Metabolic Syndrome", "Overweight", "Waist Circumference", "Young Adult", "Obesity, Abdominal", "Republic of Korea", "Sedentary Lifestyle", "Pediatric Obesity"]
INFO: UTS: Reading TGT from file
243.870113 seconds (2.26 M allocations: 254

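The `_async` suffix suggests that the per-term lookups are issued concurrently rather than one after another. The sketch below only illustrates that general pattern with Julia's `@sync`/`@async`; it is not the library's implementation, and `fake_lookup` is a hypothetical stand-in for a network call to UMLS.

```julia
# Hypothetical stand-in for a remote lookup; sleeps briefly to mimic
# network latency and returns a made-up result.
fake_lookup(term) = (sleep(0.01); uppercase(term))

toy_terms = ["Obesity", "Pain", "Fatigue"]
toy_results = Dict{String,String}()

# @sync waits until every task started with @async has finished.
@sync for t in toy_terms
    @async toy_results[t] = fake_lookup(t)
end
```

Because the tasks yield while waiting, the total wall time is close to that of the slowest lookup rather than the sum of all lookups.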
#### Explore the output table

In [5]:
db_query(db_mysql, "SELECT * FROM mesh2umls")

(Table output with columns `mesh` and `umls`; rows not shown.)


### Filtering MeSH terms by UMLS concept

Getting the descriptor to index dictionary and the occurrence matrix

In [6]:
@time labels2ind, occur = umls_semantic_occurrences(db_mysql, umls_concept);

Filter mesh query string : SELECT mesh FROM mesh2umls WHERE umls IN ('Disease or Syndrome')
-------------------------------------------------------------
Found 0 articles with valid descriptors
-------------------------------------------------------------
  0.844031 seconds (347.45 k allocations: 18.242 MiB)

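The logged filter is a plain SQL `IN` clause over the `mesh2umls` table. As a rough sketch of how such a string could be assembled for one or more concepts, consider the helper below. The function name `concept_filter_query` is hypothetical, and the sketch does not escape quotes inside concept names, so it is only suitable for trusted input.

```julia
# Hypothetical helper: build a filter query like the one printed in the log.
# Assumes concept names contain no single quotes (no escaping is done).
function concept_filter_query(concepts::Vector{String})
    quoted = join(["'$c'" for c in concepts], ", ")
    return "SELECT mesh FROM mesh2umls WHERE umls IN ($quoted)"
end

toy_query = concept_filter_query(["Disease or Syndrome"])
```

Passing several concepts yields a comma-separated list inside the parentheses, which selects MeSH terms mapped to any of them.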

Descriptor to Index Dictionary

In [7]:
labels2ind

Dict{String,Int64} with 0 entries

Output Data Matrix

In [8]:
full(occur)

0×5 Array{Float64,2}

## SQLite

This example demonstrates the typical workflow to populate a MESH2UMLS database
table relating all concepts associated with all MeSH terms in the input database.

*Note: this example reuses the SQLite DB from the PubMed Search and Save example.*

Create SQLite DB connection

In [9]:
db_path = "$(results_dir)/pubmed_obesity_2010_2012.db";

# Start from a fresh database file, then repeat the search-and-save step
# from the PubMed Search and Save example so this section runs on its own
if isfile(db_path)
    rm(db_path)
end
db_sqlite = SQLite.DB(db_path);
PubMed.create_tables!(db_sqlite);
Processes.pubmed_search_and_save!(email, search_term, max_articles, db_sqlite, false)

Getting 5 articles, starting at index 0
------ESearch--------
------EFetch--------
------Save to database--------
Saving 5 articles to database
Finished searching, total number of articles: 5


### Map MeSH to UMLS

In [10]:
@time map_mesh_to_umls_async!(db_sqlite, umls_user, umls_pswd; append_results=false, timeout=3);

----------Matching MESH to UMLS-----------
Union{Missings.Missing, String}["Reference Values", "Republic of Korea", "ROC Curve", "Fatigue", "Obesity", "Risk Factors", "Logistic Models", "Severity of Illness Index", "Male", "Case-Control Studies", "Analysis of Variance", "Sedentary Lifestyle", "Prevalence", "Quality of Life", "Odds Ratio", "Exercise", "Body Mass Index", "Aged", "Child", "Sex Distribution", "Adult", "Germany", "Sports", "Thyrotropin", "Pediatric Obesity", "Humans", "Japan", "Cross-Sectional Studies", "Weight Gain", "Middle Aged", "Surveys and Questionnaires", "Health Status", "Young Adult", "Incidence", "Prognosis", "Body Weight", "Pain Measurement", "Waist Circumference", "Metabolic Syndrome", "Pain", "Nutrition Surveys", "Fibromyalgia", "Sleep Apnea, Obstructive", "Television", "Age Distribution", "Overweight", "Physical Fitness", "Female", "Biomarkers", "Obesity, Abdominal", "C-Reactive Protein", "Aged, 80 and over"]
INFO: UTS: Reading TGT from file
215.963451 seconds

Explore the output table

In [11]:
db_query(db_sqlite, "SELECT * FROM mesh2umls;")

(Table output with columns `mesh` and `umls`; rows not shown.)


### Filtering MeSH terms by UMLS concept

Getting the descriptor to index dictionary and occurrence matrix

In [12]:
@time labels2ind, occur = umls_semantic_occurrences(db_sqlite, umls_concept);

Filter mesh query string : SELECT mesh FROM mesh2umls WHERE umls IN ('Disease or Syndrome')
-------------------------------------------------------------
Found 0 articles with valid descriptors
-------------------------------------------------------------
  0.309646 seconds (104.09 k allocations: 5.471 MiB)


Descriptor to Index Dictionary

In [13]:
labels2ind

Dict{String,Int64} with 0 entries

Output Data Matrix

In [14]:
full(occur)

0×5 Array{Float64,2}

## DataFrames

This example demonstrates the typical workflow to create a MeSH to UMLS map as a DataFrame
relating all concepts associated with all MeSH terms in the input dataframe.

Get the articles (same as example in PubMed Search and Parse)

In [15]:
dfs = Processes.pubmed_search_and_parse(email, search_term, max_articles, verbose)

Getting 5 articles, starting at index 0
------ESearch--------
------EFetch--------
------Save to dataframes--------


Dict{String,DataFrames.DataFrame} with 8 entries:
  "basic"               => 5×13 DataFrames.DataFrame. Omitted printing of 9 col…
  "mesh_desc"           => 52×2 DataFrames.DataFrame…
  "mesh_qual"           => 9×2 DataFrames.DataFrame…
  "pub_type"            => 10×3 DataFrames.DataFrame…
  "abstract_full"       => 5×2 DataFrames.DataFrame. Omitted printing of 1 colu…
  "author_ref"          => 35×8 DataFrames.DataFrame. Omitted printing of 3 col…
  "mesh_heading"        => 78×5 DataFrames.DataFrame…
  "abstract_structured" => 4×4 DataFrames.DataFrame. Omitted printing of 1 colu…

Map MeSH to UMLS and explore the output table

In [16]:
@time res = map_mesh_to_umls_async(dfs["mesh_desc"], umls_user, umls_pswd)

----------Matching MESH to UMLS-----------
Any["Reference Values", "Republic of Korea", "ROC Curve", "Fatigue", "Obesity", "Risk Factors", "Logistic Models", "Severity of Illness Index", "Male", "Case-Control Studies", "Analysis of Variance", "Sedentary Lifestyle", "Prevalence", "Quality of Life", "Odds Ratio", "Exercise", "Body Mass Index", "Aged", "Child", "Sex Distribution", "Adult", "Germany", "Sports", "Thyrotropin", "Pediatric Obesity", "Humans", "Japan", "Cross-Sectional Studies", "Weight Gain", "Middle Aged", "Surveys and Questionnaires", "Health Status", "Young Adult", "Incidence", "Prognosis", "Body Weight", "Pain Measurement", "Waist Circumference", "Metabolic Syndrome", "Pain", "Nutrition Surveys", "Fibromyalgia", "Sleep Apnea, Obstructive", "Television", "Age Distribution", "Overweight", "Physical Fitness", "Female", "Biomarkers", "Obesity, Abdominal", "C-Reactive Protein", "Aged, 80 and over"]
INFO: UTS: Reading TGT from file
201.797513 seconds (1.85 M allocations: 230.66

(Table output with columns `descriptor` and `concept`; rows not shown.)

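The map pairs each MeSH descriptor with a UMLS concept, so filtering by concept amounts to keeping the descriptors whose concept matches. A minimal sketch with plain vectors standing in for the `descriptor` and `concept` columns follows. The rows are toy data invented for illustration, not actual UMLS output.

```julia
# Toy stand-ins for the two columns of the map; values are illustrative only.
toy_descriptor = ["Obesity", "Pain", "Humans", "Metabolic Syndrome"]
toy_concept = ["Disease or Syndrome", "Sign or Symptom",
               "Population Group", "Disease or Syndrome"]

wanted = "Disease or Syndrome"

# Keep only the descriptors mapped to the wanted concept.
toy_kept = [d for (d, c) in zip(toy_descriptor, toy_concept) if c == wanted]
```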

Getting the descriptor to index dictionary and occurrence matrix

In [17]:
@time labels2ind, occur = umls_semantic_occurrences(dfs, res, umls_concept);

-------------------------------------------------------------
Found 0 articles with valid descriptors
-------------------------------------------------------------
  0.270515 seconds (164.13 k allocations: 8.419 MiB)

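To make the two return values concrete, here is a small sketch of how a descriptor to index dictionary and an occurrence matrix relate. It assumes rows index descriptors and columns index articles, as the `0×5` shape for five articles suggests. The data and the `toy_` names are invented for illustration, and a dense matrix is used for simplicity.

```julia
# Toy (article index, descriptor) pairs; invented for illustration.
toy_pairs = [(1, "Obesity"), (1, "Pain"), (2, "Obesity"), (3, "Metabolic Syndrome")]
toy_n_articles = 3

# Assign each distinct descriptor a row index (sorted for a stable order).
toy_labels = sort(unique([d for (a, d) in toy_pairs]))
toy_labels2ind = Dict(d => i for (i, d) in enumerate(toy_labels))

# Entry (i, j) is 1.0 when descriptor i occurs in article j, else 0.0.
toy_occur = zeros(Float64, length(toy_labels), toy_n_articles)
for (a, d) in toy_pairs
    toy_occur[toy_labels2ind[d], a] = 1.0
end
```

With no descriptors matching the concept, the dictionary is empty and the matrix has zero rows.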

Descriptor to Index Dictionary

In [18]:
labels2ind

Dict{String,Int64} with 0 entries

Output Data Matrix

In [19]:
full(occur)

0×5 Array{Float64,2}

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*