# Download citation data from NIH-OCC
NIH-OCC: National Institutes of Health's Open Citation Collection https://icite.od.nih.gov/

## 1) Load Python interface to NIH-OCC's API
One NIH entry per PubMed ID (PMID) is downloaded to the `./icite` directory.

Do not add these files to a repository or put them under revision control, because there will be a very large number of them.
Make sure `./icite` is listed in the `.gitignore` file.
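If you prefer to script that check, here is a minimal sketch. The helper name `ensure_gitignored` is hypothetical and not part of *pmidcite*; it simply appends a pattern to `.gitignore` unless an identical line is already present.

```python
from pathlib import Path

def ensure_gitignored(entry, gitignore=Path(".gitignore")):
    """Append `entry` to `gitignore` unless an identical line already exists.

    Returns True if the file was changed, False if the entry was already there.
    (Hypothetical helper, not part of pmidcite.)
    """
    text = gitignore.read_text() if gitignore.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline
    prefix = "\n" if text and not text.endswith("\n") else ""
    with gitignore.open("a") as fout:
        fout.write(prefix + entry + "\n")
    return True
```

For example, `ensure_gitignored("icite/")` run from the repository root adds the pattern once; calling it again leaves the file unchanged.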

NOTE: The Python API always downloads data from the NIH-OCC, even if the same data was requested before. The Python downloader, `NIHiCiteDownloader`, loads NIH-OCC data from a local file if one exists; otherwise, it downloads the data using the API.

In [1]:
from pmidcite.icite.api import NIHiCiteAPI

# Interface to the NIH-OCC API; downloaded data is stored in ./icite
api = NIHiCiteAPI('./icite', prt=None)

## 2) Load the NIH Downloader
The NIH downloader uses the API to download data from NIH if the data is not stored locally, or if the user has asked to always download and overwrite the older citation file so that newly added citations can be seen.

The NIH downloader will read already downloaded NIH-OCC data if it is available. This makes it possible to work offline using previously downloaded citation data.

In [2]:
from pmidcite.icite.pmid_dnlder import NIHiCiteDownloader

# False: reuse data already in ./icite; True: always download and overwrite it
force_download = False
dnldr = NIHiCiteDownloader(force_download, api)
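The load-or-download behavior described above is a simple file cache. The sketch below illustrates the idea only; it is not *pmidcite*'s implementation. The function `load_or_download`, the `fetch` callable (a stand-in for an NIH-OCC API request), and the `<pmid>.json` file naming are all assumptions made for illustration.

```python
import json
from pathlib import Path

def load_or_download(pmid, cache_dir, fetch, force_download=False):
    """Return citation data for `pmid`, reusing a cached file when allowed.

    `fetch` is any callable that takes a PMID and returns a JSON-serializable
    dict; here it stands in for a request to the NIH-OCC API (hypothetical).
    """
    path = Path(cache_dir) / f"{pmid}.json"   # hypothetical file naming
    if path.exists() and not force_download:
        return json.loads(path.read_text())   # offline: reuse an earlier download
    data = fetch(pmid)                        # online: ask the API
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))         # overwrite any older copy
    return data
```

With `force_download=False`, a second request for the same PMID never calls `fetch`, which is what makes offline work possible; with `force_download=True`, the file is refreshed so newer citations show up.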

## 3) Download NIH-OCC data for one PMID

The first paper, `TOP`, is the requested paper. It is followed by a list of citations (`CIT`), then references (`REF`).

Citations are stored in two data members, `cited_by` and `cited_by_clin`. In this example, no clinical papers cited the chosen paper, but we still show how a set union merges the two sets.

In [3]:
pmid = 22882545
pmids = [pmid]
pmid2paper = dnldr.get_pmid2paper(pmids)

paper = pmid2paper[pmid]

# set of NIHiCiteEntry
all_cites = paper.cited_by.union(paper.cited_by_clin)
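The union step can be seen in isolation with plain Python sets. The integers below are made-up IDs standing in for `NIHiCiteEntry` objects; whether two entry objects count as duplicates depends on how that class defines equality and hashing, which is not shown here.

```python
# Made-up integer IDs stand in for NIHiCiteEntry objects
cited_by = {101, 102, 103}
cited_by_clin = {103, 104}      # 103 appears in both sets (hypothetical)

# union() returns a new set and leaves both inputs unchanged;
# it is equivalent to cited_by | cited_by_clin
all_cites = cited_by.union(cited_by_clin)
print(sorted(all_cites))  # [101, 102, 103, 104]
```

An item found in both sets appears only once in the result, so a paper that is both a regular and a clinical citation is not double-counted.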

## 4) By default, NIHiCiteEntry objects sort by PMID

In [4]:
for nih_entry in sorted(all_cites):
    print(nih_entry)

24383934 R. .AM..  61 2 2014    21  0  51 au[17](Marie Louis) Habitat-driven population structure of bottlenose dolphins, Tursiops truncatus, in the North-East Atlantic.
25052415 R. .AM..  66 2 2015    21  0  37 au[09](A E Moura) Phylogenomics of the killer whale indicates ecotype divergence in sympatry.
25244680 R. .A...  60 2 2014    22  0  58 au[10](Andre E Moura) Population genomics of the killer whale indicates ecotype evolution in sympatry involving both selection and drift.
25297864 R. .A...  59 2 2014    16  0  39 au[10](Marie Louis) Ecological opportunities and specializations shaped genetic divergence in a highly mobile marine top predator.
25738698 R. .A...  30 2 2015     5  0   5 au[06](Marta Söffker) The impact of predation by marine mammals on patagonian toothfish longline fisheries.
25883362 .. .A...  84 2 2015    36  0  85 au[02](Neil P Kelley) Vertebrate evolution. Evolutionary innovation and ecology in marine tetrapods from the Triassic to the Anthropocene.
26937049 R
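The source does not show how `NIHiCiteEntry` defines its default ordering. One common way to give a class a "sort by PMID" default is a dataclass that compares only the `pmid` field; the `Entry` class below is a hypothetical stand-in, not the *pmidcite* class.

```python
from dataclasses import dataclass, field

@dataclass(order=True, frozen=True)
class Entry:
    """Hypothetical stand-in for NIHiCiteEntry.

    Only `pmid` takes part in comparisons, equality, and hashing, so sorted()
    orders entries by PMID and a set keeps one entry per PMID.
    """
    pmid: int
    dct: dict = field(compare=False)

entries = {Entry(25883362, {"year": 2015}), Entry(24383934, {"year": 2014})}
for entry in sorted(entries):
    print(entry.pmid)  # 24383934, then 25883362
```

Because `dct` is excluded from comparison, any other ordering needs an explicit `key=` function, as in the next sections.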

## 5) Sort by NIH percentile
NIH entries that are too new to have been assigned an NIH percentile have their percentile set to 999 in *pmidcite*.    

It is important to highlight new papers.    

Because NIH percentiles range from 0 to 100, sorting in descending order places the newest papers (999) right next to the papers with the highest NIH percentiles, so the new papers stand out.

In [5]:
for nih_entry in sorted(all_cites, key=lambda o: o.dct['nih_perc'], reverse=True):
    print(nih_entry)

31631360 .. HA... 999 i 2019     1  0  55 au[01](Jenny A Allen) Community through Culture: From Insects to Whales: How Social Learning and Culture Manifest across Diverse Animal Communities.
31120038 R. .A... 999 i 2019     0  0  14 au[08](Maíra Laeta) Osteochondromatosis (multiple cartilaginous exostoses) in an immature killer whale Orcinus orca.
30992478 R. HA... 999 i 2019     2  0  32 au[09](Salvador J Jorgensen) Killer whales redistribute white shark foraging pressure on seals.
31131963 RP .AM.. 999 i 2019     5  0  72 au[35](Andrew D Foote) Killer whale genomes reveal a complex history of recurrent admixture and vicariance.
31230140 R. .A... 999 i 2019     0  0  36 au[12](Charlotte Curé) Evidence for discrimination between feeding sounds of familiar fish and unfamiliar mammal-eating killer whale ecotypes by long-finned pilot whales.
31215081 R. .A... 999 i 2019     0  0  16 au[04](Paula Sánchez-Hernández) Social interaction analysis in captive orcas (Orcinus orca).
25883362 .. .A
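The sentinel trick works with any data in which a missing percentile has to sort to the top. This sketch assumes, for illustration only, that an unranked paper arrives with a percentile of `None`; *pmidcite* already stores 999 in `dct['nih_perc']`, so with its entries you can sort on that field directly. The labels and percentiles are made up.

```python
NEW_PAPER_PERC = 999  # larger than any real percentile (0 to 100)

def perc_for_sorting(nih_perc):
    """Map a missing percentile (assumed to be None) to the 999 sentinel."""
    return NEW_PAPER_PERC if nih_perc is None else nih_perc

# Made-up percentiles keyed by a label; None marks a paper too new to be ranked
percs = {"older": 84, "newest": None, "low": 30, "top": 96}
ranked = sorted(percs, key=lambda k: perc_for_sorting(percs[k]), reverse=True)
print(ranked)  # ['newest', 'top', 'older', 'low']
```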

## 6) Sort by year first, then citation count

In [6]:
nih_cites = sorted(all_cites, key=lambda o: [o.dct['year'], o.dct['total_cites']], reverse=True)
for nih_entry in nih_cites:
    print(nih_entry)

31131963 RP .AM.. 999 i 2019     5  0  72 au[35](Andrew D Foote) Killer whale genomes reveal a complex history of recurrent admixture and vicariance.
30992478 R. HA... 999 i 2019     2  0  32 au[09](Salvador J Jorgensen) Killer whales redistribute white shark foraging pressure on seals.
31631360 .. HA... 999 i 2019     1  0  55 au[01](Jenny A Allen) Community through Culture: From Insects to Whales: How Social Learning and Culture Manifest across Diverse Animal Communities.
31120038 R. .A... 999 i 2019     0  0  14 au[08](Maíra Laeta) Osteochondromatosis (multiple cartilaginous exostoses) in an immature killer whale Orcinus orca.
31230140 R. .A... 999 i 2019     0  0  36 au[12](Charlotte Curé) Evidence for discrimination between feeding sounds of familiar fish and unfamiliar mammal-eating killer whale ecotypes by long-finned pilot whales.
31215081 R. .A... 999 i 2019     0  0  16 au[04](Paula Sánchez-Hernández) Social interaction analysis in captive orcas (Orcinus orca).
30051821 RP .A
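A key function that returns a tuple (or list) compares element by element: first by year, then by citation count for ties. `reverse=True` flips both keys. To mix directions, negate a numeric key instead of using `reverse`. The records below are made-up dicts standing in for each entry's `dct`.

```python
# Made-up records standing in for NIHiCiteEntry.dct
papers = [
    {"pmid": 1, "year": 2019, "total_cites": 2},
    {"pmid": 2, "year": 2015, "total_cites": 36},
    {"pmid": 3, "year": 2019, "total_cites": 5},
    {"pmid": 4, "year": 2015, "total_cites": 21},
]

# Both keys descending: newest year first, then most-cited first
both_desc = sorted(papers, key=lambda d: (d["year"], d["total_cites"]), reverse=True)
print([d["pmid"] for d in both_desc])  # [3, 1, 2, 4]

# Mixed directions: newest year first, then least-cited first
mixed = sorted(papers, key=lambda d: (-d["year"], d["total_cites"]))
print([d["pmid"] for d in mixed])  # [1, 3, 4, 2]
```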

## 7) Print the keys which can be used for sorting
Pick out one NIH entry (an NIHiCiteEntry object) and print its available keys.

In [7]:
nih_entry = next(iter(nih_cites))
print('\n{N} key-value pairs in an NIH entry:\n'.format(N=len(nih_entry.dct)))
for key, value in nih_entry.dct.items():
    print("{KEY:>27} {VAL}".format(KEY=key, VAL=value))


30 key-value pairs in an NIH entry:

                       pmid 31131963
                       year 2019
                      title Killer whale genomes reveal a complex history of recurrent admixture and vicariance.
                    authors ['Andrew D Foote', 'Michael D Martin', 'Marie Louis', 'George Pacheco', 'Kelly M Robertson', 'Mikkel-Holger S Sinding', 'Ana R Amaral', 'Robin W Baird', 'Charles Scott Baker', 'Lisa Ballance', 'Jay Barlow', 'Andrew Brownlow', 'Tim Collins', 'Rochelle Constantine', 'Willy Dabin', 'Luciano Dalla Rosa', 'Nicholas J Davison', 'John W Durban', 'Ruth Esteban', 'Steven H Ferguson', 'Tim Gerrodette', 'Christophe Guinet', 'M Bradley Hanson', 'Wayne Hoggard', 'Cory J D Matthews', 'Filipa I P Samarra', 'Renaud de Stephanis', 'Sara B Tavares', 'Paul Tixier', 'John A Totterdell', 'Paul Wade', 'Laurent Excoffier', 'M Thomas P Gilbert', 'Jochen B W Wolf', 'Phillip A Morin']
                    journal Mol. Ecol.
        is_research_article True
    relativ
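Any of these keys can drive a sort. The helper `sort_entries` below is hypothetical, not part of *pmidcite*; it assumes, like the cells above, that each object exposes its data in a `dct` mapping and that every object has the requested key (a missing key raises `KeyError`). `SimpleNamespace` objects stand in for real entries.

```python
from types import SimpleNamespace

def sort_entries(entries, key, reverse=False):
    """Sort objects carrying a `dct` mapping by the value stored under `key`."""
    return sorted(entries, key=lambda o: o.dct[key], reverse=reverse)

# Stand-ins for NIHiCiteEntry objects, using PMIDs and years from the output above
entries = [
    SimpleNamespace(dct={"pmid": 31131963, "year": 2019}),
    SimpleNamespace(dct={"pmid": 24383934, "year": 2014}),
    SimpleNamespace(dct={"pmid": 25883362, "year": 2015}),
]

by_year = sort_entries(entries, "year")
print([o.dct["pmid"] for o in by_year])  # [24383934, 25883362, 31131963]
```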

Copyright (C) 2019-present, DV Klopfenstein. All rights reserved.