An R package that provides streamlined access to multiple PubMed databases, including abstracts, bibliometrics, NER annotations, and full texts.

License

MIT (LICENSE.md); a second file, LICENSE, has an unrecognized license type.

jaytimm/puremoe

puremoe

PubMed Unified REtrieval for Multi-Output Exploration

An R package that provides a single interface for accessing a range of NLM/PubMed databases, including PubMed abstract records, iCite bibliometric data, PubTator3 named entity annotations, and full-text entries from PubMed Central (PMC). This unified interface simplifies the data retrieval process, allowing users to interact with multiple PubMed services/APIs/output formats through a single R function.

The package also includes MeSH thesaurus resources as simple data frames: Descriptor Terms, Descriptor Tree Structures, Supplementary Concept Terms, and Pharmacological Actions, along with descriptor-level word embeddings (Noh & Kavuluru 2021). These resources are provided via the mesh-resources library.

Installation

Get the released version from CRAN:

install.packages('puremoe')

Or the development version from GitHub with:

devtools::install_github("jaytimm/puremoe")

Usage

PubMed search

The package has two basic functions: search_pubmed and get_records. The former fetches PMIDs from the PubMed API for a user-supplied search query; the latter retrieves the records for those PMIDs from a user-specified endpoint: pubmed_abstracts, pubmed_affiliations, pubtations, icites, or pmc.

Search syntax is the same as that implemented in standard PubMed search.

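# Search titles and abstracts for an exact phrase, with no publication-year filter
pmids <- puremoe::search_pubmed('("political ideology"[TiAb])',
                                 use_pub_years = FALSE)

# Alternative: restrict the search to a range of publication years
# pmids <- puremoe::search_pubmed('immunity',
#                                  use_pub_years = TRUE,
#                                  start_year = 2022,
#                                  end_year = 2024)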
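Since search strings follow standard PubMed syntax, longer queries can be assembled programmatically before they are passed to search_pubmed. The sketch below is only an illustration: build_query is a hypothetical helper that is not part of puremoe, and the second search term is made up for the example.

```r
# Hypothetical helper (not part of puremoe): quote each phrase, append a
# PubMed field tag, and join the pieces with a boolean operator.
build_query <- function(terms, field = "TiAb", operator = "OR") {
  tagged <- sprintf('"%s"[%s]', terms, field)
  paste0("(", paste(tagged, collapse = paste0(" ", operator, " ")), ")")
}

# "partisanship" is an illustrative term, not taken from the package docs.
query <- build_query(c("political ideology", "partisanship"))
print(query)

# The resulting string could then be handed to the search function, e.g.
# pmids <- puremoe::search_pubmed(query, use_pub_years = FALSE)
```

Keeping the query construction separate from the API call makes it easy to inspect the exact string before any request is sent.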

Get record-level data

pubmed <- pmids |> 
  puremoe::get_records(endpoint = 'pubmed_abstracts', 
                       cores = 3, 
                       sleep = 1) 

affiliations <- pmids |> 
  puremoe::get_records(endpoint = 'pubmed_affiliations', 
                       cores = 1, 
                       sleep = 0.5)

icites <- pmids |>
  puremoe::get_records(endpoint = 'icites',
                       cores = 3,
                       sleep = 0.25)

pubtations <- pmids |> 
  puremoe::get_records(endpoint = 'pubtations',
                       cores = 2)

When the endpoint is pmc, the get_records() function takes a vector of file paths (from the PMC Open Access list) instead of PMIDs.

# Load the PMC Open Access file list, then keep only rows matching the search results
pmclist <- puremoe::data_pmc_list(force_install = FALSE)
pmc_pmids <- pmclist[PMID %in% pmids]

pmc_fulltext <- pmc_pmids$fpath[1:5] |> 
  puremoe::get_records(endpoint = 'pmc', cores = 2)
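The filtering step above can be illustrated offline with a toy table. This is a minimal sketch in base R under stated assumptions: the PMID and fpath column names follow the snippet above, but the rows, PMIDs, and file paths are invented, and the real list may store PMIDs as a different type.

```r
# Toy stand-in for the PMC Open Access list; every value here is made up.
pmclist_toy <- data.frame(
  PMID  = c("111", "222", "333"),
  fpath = c("oa_package/aa/a.tar.gz",
            "oa_package/bb/b.tar.gz",
            "oa_package/cc/c.tar.gz"),
  stringsAsFactors = FALSE
)

# Search results may arrive as numbers or strings, so compare as character.
pmids_toy <- c(222, 333, 444)

# Keep only the rows whose PMID appears in the search results.
matched <- pmclist_toy[pmclist_toy$PMID %in% as.character(pmids_toy), ]
fpaths  <- matched$fpath
print(fpaths)
```

PMIDs with no Open Access entry (444 here) simply drop out, so the vector of file paths can be shorter than the original search result.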
