# Search PubMed and save results

This example demonstrates the typical workflow for querying PubMed and storing
the results. The following storage backends are supported:

* [MySQL](#MySQL)
* [SQLite](#SQLite)
* [Citations (EndNote/BibTeX)](#citations)

Authors: Isabel Restrepo <br>
BCBI - Brown University <br>
Version: Julia 0.6

In [1]:
using BioMedQuery.Processes
using BioMedQuery.PubMed



In [2]:
email = "" # Only needed if you want to contact NCBI with inquiries
search_term="(obesity[MeSH Major Topic]) AND (\"2010\"[Date - Publication] : \"2012\"[Date - Publication])"
max_articles = 5
results_dir = "./results"
verbose = false;
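
The search term is an ordinary PubMed query string, so it can also be assembled from its parts. The following is a minimal sketch in plain Julia; `mesh_major_topic_query` is a hypothetical helper written for this example and is not part of BioMedQuery.

```julia
# Hypothetical helper (not part of BioMedQuery): builds a PubMed query for a
# MeSH major topic restricted to a range of publication years.
function mesh_major_topic_query(topic, first_year, last_year)
    topic_clause = "($(topic)[MeSH Major Topic])"
    date_clause = "(\"$(first_year)\"[Date - Publication] : \"$(last_year)\"[Date - Publication])"
    return "$(topic_clause) AND $(date_clause)"
end

# Reproduces the search term used in this example.
search_term_built = mesh_major_topic_query("obesity", 2010, 2012)
println(search_term_built)
```

Building the string this way avoids hand-escaping the quotes each time the topic or the years change.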

<a id='MySQL'></a>
## MySQL backend

In [11]:
mysql_config = Dict(:host=>"localhost",
                    :dbname=>"pubmed_obesity_2010_2012",
                    :username=>"root",
                    :pswd=>"",
                    :overwrite=>true)
db_mysql = pubmed_search_and_save(email, search_term, max_articles,
    save_efetch_mysql, mysql_config, verbose)

Getting 5 articles, starting at index 0
------ESearch--------
------EFetch--------
------Save to database--------
Initializing MySQL Database
Set to overwrite MySQL database pubmed_obesity_2010_2012
Database pubmed_obesity_2010_2012 created and initialized
Saving 5 articles to database
Finished searching, total number of articles: 5


MySQL Handle
------------
Host: localhost
User: root
DB:   pubmed_obesity_2010_2012


### Access all PMIDs

In [4]:
display(all_pmids(db_mysql))

5-element DataArrays.DataArray{Int32,1}:
 24315250
 24444198
 24533500
 24694474
 25548090

### Explore tables

* You may run MySQL commands directly, or use BioMedQuery.DBUtils for a common interface to MySQL and SQLite

In [5]:
tables = ["author", "author2article", "mesh_descriptor",
"mesh_qualifier", "mesh_heading"]

for t in tables
    query_str = "SELECT * FROM "*t*" LIMIT 5;"
    q = BioMedQuery.DBUtils.db_query(db_mysql, query_str)
    println(q)
end

5×3 DataFrames.DataFrame
│ Row │ id │ forename            │ lastname          │
├─────┼────┼─────────────────────┼───────────────────┤
│ 1   │ 31 │ "A"                 │ "Carbonell-Baeza" │
│ 2   │ 26 │ "Alexandre Paulino" │ "de Faria"        │
│ 3   │ 23 │ "Andrea Maculano"   │ "Esteves"         │
│ 4   │ 32 │ "C"                 │ "Gatto-Cardia"    │
│ 5   │ 24 │ "Carolina"          │ "Ackel-D'Elia"    │
5×2 DataFrames.DataFrame
│ Row │ aid │ pmid     │
├─────┼─────┼──────────┤
│ 1   │ 29  │ 24315250 │
│ 2   │ 30  │ 24315250 │
│ 3   │ 31  │ 24315250 │
│ 4   │ 32  │ 24315250 │
│ 5   │ 33  │ 24315250 │
5×2 DataFrames.DataFrame
│ Row │ id    │ name                   │
├─────┼───────┼────────────────────────┤
│ 1   │ 328   │ "Adult"                │
│ 2   │ 17677 │ "Age Distribution"     │
│ 3   │ 368   │ "Aged"                 │
│ 4   │ 369   │ "Aged, 80 and over"    │
│ 5   │ 704   │ "Analysis of Variance" │
5×2 DataFrames.DataFrame
│ Row │ id  │ name            │
├─────┼─────┼────────
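
The loop above builds each query by concatenating a table name into the SQL string. If table names ever come from user input, it may be safer to check them against a known list first. A minimal sketch, assuming the five table names used in this example; `KNOWN_TABLES` and `preview_sql` are hypothetical names, not part of BioMedQuery.

```julia
# Table names taken from the loop in this example.
const KNOWN_TABLES = ["author", "author2article", "mesh_descriptor",
                      "mesh_qualifier", "mesh_heading"]

# Hypothetical helper: returns the preview query only for a known table name
# and throws an ArgumentError for anything else.
function preview_sql(table, limit::Integer=5)
    table in KNOWN_TABLES || throw(ArgumentError("unknown table: $(table)"))
    return "SELECT * FROM $(table) LIMIT $(limit);"
end

for t in KNOWN_TABLES
    println(preview_sql(t))
    # With a live handle, the string could be passed on as before:
    # q = BioMedQuery.DBUtils.db_query(db_mysql, preview_sql(t))
end
```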

<a id='SQLite'></a>
## SQLite backend

In [6]:
sqlite_config = Dict(:db_path=>"$(results_dir)/pubmed_obesity_2010_2012.db",
              :overwrite=>true)
db_sqlite = pubmed_search_and_save(email, search_term, max_articles,
    save_efetch_sqlite, sqlite_config, verbose)

Getting 5 articles, starting at index 0
------ESearch--------
------EFetch--------
------Save to database--------
Initializing SQLite Database
Saving 5 articles to database
Finished searching, total number of articles: 5


SQLite.DB("./results/pubmed_obesity_2010_2012.db")

### Access all PMIDs

In [7]:
display(all_pmids(db_sqlite))

5-element Array{Int64,1}:
 24315250
 24444198
 24533500
 24694474
 25548090
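
Note that the two backends return the PMIDs in different containers (`DataArray{Int32,1}` for MySQL, `Array{Int64,1}` for SQLite). To check that both stored the same articles, the values can be converted to a common integer type before comparing. The sketch below uses literal copies of the printed values instead of live database handles, and assumes there are no missing entries; `same_pmids` is a hypothetical helper.

```julia
# Literal copies of the PMIDs printed by the MySQL and SQLite cells.
mysql_pmids  = Int32[24315250, 24444198, 24533500, 24694474, 25548090]
sqlite_pmids = Int64[24315250, 24444198, 24533500, 24694474, 25548090]

# Hypothetical helper: true when both collections hold the same PMIDs,
# regardless of element type or order.
same_pmids(a, b) = sort(Int.(collect(a))) == sort(Int.(collect(b)))

println(same_pmids(mysql_pmids, sqlite_pmids))
```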

### Explore tables

* You may run SQLite commands directly, or use BioMedQuery.DBUtils for a common interface to MySQL and SQLite

In [8]:
tables = ["author", "author2article", "mesh_descriptor",
"mesh_qualifier", "mesh_heading"]

for t in tables
    query_str = "SELECT * FROM "*t*" LIMIT 5;"
    q = BioMedQuery.DBUtils.db_query(db_sqlite, query_str)
    println(q)
end

5×3 DataFrames.DataFrame
│ Row │ id │ forename    │ lastname   │
├─────┼────┼─────────────┼────────────┤
│ 1   │ 1  │ "Eun Sun"   │ "So"       │
│ 2   │ 2  │ "Kwang Soo" │ "Yoo"      │
│ 3   │ 3  │ "Masaru"    │ "Sakurai"  │
│ 4   │ 4  │ "Koshi"     │ "Nakamura" │
│ 5   │ 5  │ "Katsuyuki" │ "Miura"    │
5×2 DataFrames.DataFrame
│ Row │ aid │ pmid     │
├─────┼─────┼──────────┤
│ 1   │ 1   │ 25548090 │
│ 2   │ 2   │ 25548090 │
│ 3   │ 3   │ 24694474 │
│ 4   │ 4   │ 24694474 │
│ 5   │ 5   │ 24694474 │
5×2 DataFrames.DataFrame
│ Row │ id   │ name                   │
├─────┼──────┼────────────────────────┤
│ 1   │ 328  │ "Adult"                │
│ 2   │ 368  │ "Aged"                 │
│ 3   │ 369  │ "Aged, 80 and over"    │
│ 4   │ 704  │ "Analysis of Variance" │
│ 5   │ 1835 │ "Body Weight"          │
5×2 DataFrames.DataFrame
│ Row │ id  │ name            │
├─────┼─────┼─────────────────┤
│ 1   │ 32  │ "analysis"      │
│ 2   │ 97  │ "blood"         │
│ 3   │ 150 │ "complications" │
│ 4  
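
The previews suggest that `author2article.aid` refers to `author.id`: authors 1 and 2 (So and Yoo) are both linked to PMID 25548090. Under that assumption, a join can list the authors of one article. This is a sketch only; the column names are read off the printed tables, the real schema may differ, and `authors_for_pmid_sql` is a hypothetical helper.

```julia
# Hypothetical helper: SQL that lists the authors recorded for one PMID.
# Assumes author2article.aid references author.id, as the previews suggest.
function authors_for_pmid_sql(pmid::Integer)
    return string(
        "SELECT a.forename, a.lastname ",
        "FROM author AS a ",
        "JOIN author2article AS a2a ON a.id = a2a.aid ",
        "WHERE a2a.pmid = ", pmid, ";")
end

query_str = authors_for_pmid_sql(25548090)
println(query_str)
# With a live handle, the string could be passed to the same call used above:
# authors = BioMedQuery.DBUtils.db_query(db_sqlite, query_str)
```

Restricting `pmid` to `Integer` keeps arbitrary text out of the query string.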

<a id='citations'></a>

## Citations

* The citation type can be "endnote" or "bibtex"

In [9]:
enw_file = "$(results_dir)/pubmed_obesity_2010_2012.enw"
endnote_config = Dict(:type => "endnote", 
                      :output_file => enw_file, 
                      :overwrite=> true)
pubmed_search_and_save(email, search_term, max_articles,
    save_article_citations, endnote_config, verbose);

Getting 5 articles, starting at index 0
------ESearch--------
------EFetch--------
------Save to database--------
Saving citation for 5 articles
Finished searching, total number of articles: 5
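
A BibTeX export presumably needs only a different `:type` and output file. The sketch below builds such a configuration with the same keys as `endnote_config`; the `.bib` extension and the `citation_config` helper are assumptions made for this example, not part of BioMedQuery.

```julia
results_dir = "./results"  # same value as in the setup cell

# Hypothetical helper: builds a citation config for either supported type.
function citation_config(citation_type, output_file; overwrite=true)
    citation_type in ("endnote", "bibtex") ||
        throw(ArgumentError("citation type must be \"endnote\" or \"bibtex\""))
    return Dict(:type => citation_type,
                :output_file => output_file,
                :overwrite => overwrite)
end

bib_file = "$(results_dir)/pubmed_obesity_2010_2012.bib"  # assumed file name
bibtex_config = citation_config("bibtex", bib_file)
# With the packages loaded, the call would mirror the EndNote one:
# pubmed_search_and_save(email, search_term, max_articles,
#     save_article_citations, bibtex_config, verbose);
```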


In [10]:
println(readstring(enw_file))

%0 Journal Article
%A So, ES
%A Yoo, KS
%D 2012
%T Waist circumference cutoff points for central obesity in the Korean elderly population.
%J J Appl Gerontol
%V 34
%N 1
%P 102-17
%M 25548090
%U http://www.ncbi.nlm.nih.gov/pubmed/25548090
%X The aim is to determine the appropriate cutoff values of waist circumference (WC) for an increased risk of the metabolic syndrome in the Korean elderly population. We analyzed the WC cutoff values of four groups divided according to sex and age with a total of 2,224 elderly participants aged 65 years old and above from the Fourth Korean National Health and Nutrition Examination Survey using the receiver operating characteristic curve and multiple logistic regression. The WC cutoff values associated with an increased risk of metabolic syndrome were 89.6 cm for men and 90.5 cm for women for those who were 65 to 74 years old, and 89.9 cm for men and 87.9 cm for women for those who were 75 years old or older. WC cutoff points for estimating metabolic ri
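
Each line of the EndNote output starts with a `%` tag (`%A` for an author, `%T` for the title, `%M` for the PMID in this file), so a record is easy to read back. A minimal sketch, assuming one tagged field per line and single-character ASCII tags; `parse_endnote_record` is a hypothetical helper, and the sample is a shortened copy of the record printed above.

```julia
# Hypothetical helper: parses one EndNote tagged record into a Dict that maps
# each tag (for example "%A") to the list of values found for it.
function parse_endnote_record(text)
    fields = Dict{String,Vector{String}}()
    for raw_line in split(text, '\n')
        line = strip(raw_line)
        # Tagged lines look like "%A So, ES": '%', one tag character, then the value.
        if length(line) >= 3 && startswith(line, "%")
            tag = String(line[1:2])
            value = String(strip(line[3:end]))
            push!(get!(fields, tag, String[]), value)
        end
    end
    return fields
end

sample = """
%0 Journal Article
%A So, ES
%A Yoo, KS
%D 2012
%J J Appl Gerontol
%M 25548090
"""

record = parse_endnote_record(sample)
println(join(record["%A"], "; "))
println(record["%M"][1])
```

Repeated tags such as `%A` are collected in order, which keeps the author list intact.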