Investigate any extra data that is available via Pubmed API #1668

Open
peetucket opened this issue Dec 15, 2023 · 2 comments


peetucket commented Dec 15, 2023

Question from Tina in Slack:

I am looking into what information is available to us from PubMed from the import.  
When reviewing the PubMed website, for example https://pubmed.ncbi.nlm.nih.gov/26422724/, 
I can see information such as Cited by, Associated Data, Related Information, and Grants and funding.  
Is any of this information available in the feed we get from PubMed?  Thank you.  
(note:  the PubMed link above is not from a SoM profile, so they are not part of Stanford.)

Investigate what comes back from the Pubmed API, and whether there is a way to request extra data (such as "Cited by", "Related information", "Funding", etc.). See the web page results view for the fields shown.


peetucket commented Dec 15, 2023

I just took a quick look at what the Pubmed API sends back for the example record above. I'll attach the full XML response below, but on a quick scan I didn't see the extra data (Cited by, Related pubs, Grants, Funding, etc.). I did see the reference list in a simple text citation format. At the moment, when we parse that XML we only store what is needed for the current data model response we send back to Profiles; the other parts of the XML are ignored.

rec = Pubmed::Client.new.fetch_records_for_pmid_list('26422724')
puts rec

I briefly looked at the Pubmed API documentation (https://www.ncbi.nlm.nih.gov/books/NBK25500/#chapter1.Searching_a_Database) and didn't see any obvious extra params we could send to increase the amount of data returned. This could use a bit more investigation and reading.
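For anyone picking up that investigation, here is a minimal sketch of what a direct EFetch request URL looks like, so extra parameters can be tried by hand. It assumes the standard E-utilities `efetch.fcgi` endpoint with the `db`, `id`, and `retmode` parameters; `efetch_url` and `extra_params` are hypothetical names for illustration and say nothing about how `Pubmed::Client` builds its request internally.

```ruby
require 'uri'

# Standard E-utilities EFetch endpoint.
EFETCH_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

# Hypothetical helper: builds an EFetch URL for one or more PMIDs.
# extra_params is merged last so experimental parameters can be tried easily.
def efetch_url(pmids, extra_params = {})
  params = { db: 'pubmed', id: Array(pmids).join(','), retmode: 'xml' }
  "#{EFETCH_BASE}?#{URI.encode_www_form(params.merge(extra_params))}"
end

puts efetch_url('26422724')
```

The printed URL can be passed to `curl` or `Net::HTTP.get(URI(url))` to compare responses while experimenting with parameters.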

pubmed_26422724.xml.zip

@edsu edsu changed the title Invesitgate any extra data that is available via Pubmed API Investigate any extra data that is available via Pubmed API Dec 15, 2023

edsu commented Dec 15, 2023

The Cited By results in https://pubmed.ncbi.nlm.nih.gov/26422724/ appear to all be citations from other PubMed articles? It looks like they have a separate API endpoint for those, which would require another lookup by ID to get the metadata?

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&linkname=pubmed_pmc_refs&id=26422724" | xmllint --format -
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD elink 20101123//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20101123/elink.dtd">
<eLinkResult>
  <LinkSet>
    <DbFrom>pubmed</DbFrom>
    <IdList>
      <Id>26422724</Id>
    </IdList>
    <LinkSetDb>
      <DbTo>pmc</DbTo>
      <LinkName>pubmed_pmc_refs</LinkName>
      <Link>
        <Id>10676511</Id>
      </Link>
      <Link>
        <Id>10650975</Id>
      </Link>
      <Link>
        <Id>10439485</Id>
      </Link>
      <Link>
        <Id>10275576</Id>
      </Link>
      <Link>
        <Id>10118745</Id>
      </Link>
      <Link>
        <Id>10106992</Id>
      </Link>
      <Link>
        <Id>10080461</Id>
      </Link>
      ...
    </LinkSetDb>
  </LinkSet>
</eLinkResult>
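A rough sketch of doing the same lookup from Ruby, in case it helps. It builds the same ELink URL as the curl call and pulls the linked IDs out of the response using REXML. The method names `elink_cited_by_url` and `linked_ids` are hypothetical, and nothing here is wired into `Pubmed::Client`. Note that the response above says `DbTo` is `pmc`, so the IDs returned look like PMC IDs rather than PMIDs and would still need a second lookup to get article metadata.

```ruby
require 'rexml/document'
require 'uri'

ELINK_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi'

# Hypothetical helper: the same ELink request as the curl example, for any PMID.
def elink_cited_by_url(pmid)
  query = URI.encode_www_form(dbfrom: 'pubmed', linkname: 'pubmed_pmc_refs', id: pmid)
  "#{ELINK_BASE}?#{query}"
end

# Hypothetical helper: extracts the <Link><Id> values from an eLinkResult
# XML string, keeping only the LinkSetDb that matches the given link name.
def linked_ids(xml, linkname: 'pubmed_pmc_refs')
  doc = REXML::Document.new(xml)
  ids = []
  doc.elements.each('eLinkResult/LinkSet/LinkSetDb') do |set|
    next unless set.elements['LinkName']&.text == linkname

    set.elements.each('Link/Id') { |id| ids << id.text }
  end
  ids
end
```

Fetching would then be something like `linked_ids(Net::HTTP.get(URI(elink_cited_by_url('26422724'))))` after `require 'net/http'`.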

For the Associated Data, it looks like the item that was mentioned is in the XML, but it would require some kind of lookup to get the details:

<DataBankList CompleteYN="Y">
  <DataBank>
    <DataBankName>ClinicalTrials.gov</DataBankName>
    <AccessionNumberList>
      <AccessionNumber>NCT01681875</AccessionNumber>
    </AccessionNumberList>
  </DataBank>
  ...
</DataBankList>
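If we wanted to keep this, a minimal sketch of pulling the accession numbers out of the record XML could look like the following. It assumes the element names shown in the stanza above and uses REXML from the standard distribution; `data_bank_accessions` is a hypothetical name, not something in the current parser.

```ruby
require 'rexml/document'

# Hypothetical helper: maps each DataBankName to its accession numbers,
# e.g. { "ClinicalTrials.gov" => ["NCT01681875"] }.
def data_bank_accessions(xml)
  doc = REXML::Document.new(xml)
  result = {}
  REXML::XPath.each(doc, '//DataBankList/DataBank') do |bank|
    name = bank.elements['DataBankName']&.text
    next if name.nil?

    bank.elements.each('AccessionNumberList/AccessionNumber') do |acc|
      (result[name] ||= []) << acc.text
    end
  end
  result
end
```

Turning an accession number into something useful (for example a trial title) would still be a separate lookup against whichever data bank is named.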

The grants are in another XML stanza:

<GrantList CompleteYN="Y">
  <Grant>
    <GrantID>P30 CA077598</GrantID>
    <Acronym>CA</Acronym>
    <Agency>NCI NIH HHS</Agency>
    <Country>United States</Country>
  </Grant>
  <Grant>
    <GrantID>P30 ES013508</GrantID>
    <Acronym>ES</Acronym>
    <Agency>NIEHS NIH HHS</Agency>
    <Country>United States</Country>
  </Grant>
  <Grant>
    <GrantID>U54 DA031659</GrantID>
    <Acronym>DA</Acronym>
    <Agency>NIDA NIH HHS</Agency>
    <Country>United States</Country>
  </Grant>
</GrantList>
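Since the grants are already in the XML we fetch, extracting them would not need another API call. A small sketch, assuming the element names in the stanza above; the `PubmedGrant` struct and `parse_grants` method are hypothetical names, not part of the existing data model.

```ruby
require 'rexml/document'

# Hypothetical value object for one <Grant> element.
PubmedGrant = Struct.new(:grant_id, :acronym, :agency, :country, keyword_init: true)

# Hypothetical helper: returns one PubmedGrant per <Grant> in the record XML.
# Missing child elements come back as nil rather than raising.
def parse_grants(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//GrantList/Grant').map do |node|
    PubmedGrant.new(
      grant_id: node.elements['GrantID']&.text,
      acronym: node.elements['Acronym']&.text,
      agency: node.elements['Agency']&.text,
      country: node.elements['Country']&.text
    )
  end
end
```

Calling `parse_grants(xml).map(&:agency)` on the example record would list the funding agencies shown above.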

The Related Information appears to use the article ID to link out to various services?

Perhaps some of these have APIs that could be queried if they have valuable information.

Hopefully this helps a bit?
