# Download citation data from NIH-OCC
NIH-OCC: National Institutes of Health's Open Citation Collection https://icite.od.nih.gov/

## 1) Load the NIH Downloader
Download the details about a paper's citations by setting the `NIHiCiteDownloader` argument `details_cites_refs` to `"citations"`.

In [1]:
from pmidcite.icite.downloader import get_downloader

dnldr = get_downloader(details_cites_refs="citations")

Values for the `NIHiCiteDownloader` argument `details_cites_refs` include:
* `"citations"`
* `"references"`
* `"all"` (downloads details for both citations and references)

## 2) Download NIH-OCC data for one PMID

The first paper, `TOP`, is the requested paper. It is followed by a list of citations (`CIT`), then references (`REF`).

Citations are stored in two data members, `cited_by` and `cited_by_clin`. In this example, no clinical papers cite the chosen paper, but the code still calls `union` to show how the two sets are merged.

In [2]:
pmid = 22882545
pmids = [pmid]
pmid2paper = dnldr.get_pmid2paper(pmids)

paper = pmid2paper[pmid]

# set of NIHiCiteEntry
all_cites = paper.cited_by.union(paper.cited_by_clin)
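
The merge itself is ordinary Python set behavior. The sketch below uses plain integers as stand-ins for `NIHiCiteEntry` objects (an assumption made so that it runs without a download) and shows that `union` with an empty set returns the other set's contents unchanged.

```python
# Stand-in PMIDs used in place of NIHiCiteEntry objects (illustrative only)
cited_by = {24383934, 25052415, 25244680}
cited_by_clin = set()  # mirrors the example: no clinical papers cite the paper

# union() returns a new set and leaves both operands untouched
all_cites = cited_by.union(cited_by_clin)

# For two sets, the | operator gives the same result as union()
assert all_cites == cited_by | cited_by_clin

print(sorted(all_cites))  # → [24383934, 25052415, 25244680]
```

Because the result is a new set, later sorting or filtering of `all_cites` does not modify the sets it was built from.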

## 3) Default sort of NIHiCiteEntry objects is by PMIDs

In [3]:
for nih_entry in sorted(all_cites):
    print(nih_entry)

24383934 R. .AM..  58 2 2014    24  0  52 au[17](Marie Louis) Habitat-driven population structure of bottlenose dolphins, Tursiops truncatus, in the North-East Atlantic.
25052415 R. .AM..  63 2 2015    25  0  37 au[09](A E Moura) Phylogenomics of the killer whale indicates ecotype divergence in sympatry.
25244680 R. .A...  57 2 2014    25  0  58 au[10](Andre E Moura) Population genomics of the killer whale indicates ecotype evolution in sympatry involving both selection and drift.
25297864 R. .A...  52 2 2014    18  0  39 au[10](Marie Louis) Ecological opportunities and specializations shaped genetic divergence in a highly mobile marine top predator.
25738698 R. .A...  27 2 2015     6  0   5 au[06](Marta Söffker) The impact of predation by marine mammals on patagonian toothfish longline fisheries.
25883362 .. .A...  84 3 2015    47  0  85 au[02](Neil P Kelley) Vertebrate evolution. Evolutionary innovation and ecology in marine tetrapods from the Triassic to the Anthropocene.
26937049 R

## 4) Sort by NIH percentile
NIH entries that are too new to have been given an NIH percentile are set to 999 in *pmidcite*.

Highlighting new papers is important, so the 999 value places the newest papers next to the papers with the highest NIH percentiles.
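
The effect of a large sentinel on a descending sort can be seen in a minimal sketch. The records below are hypothetical stand-ins rather than `NIHiCiteEntry` objects, and the constant name `NEW_PAPER_PERC` is introduced here only for illustration.

```python
# Sentinel for papers without a percentile, as described in the text above
NEW_PAPER_PERC = 999  # hypothetical constant name

# Hypothetical records; real entries keep this value in entry.dct['nih_perc']
papers = [
    {"pmid": 1, "nih_perc": 63},
    {"pmid": 2, "nih_perc": NEW_PAPER_PERC},  # too new to have a percentile
    {"pmid": 3, "nih_perc": 84},
]

# Descending sort: the sentinel outranks every real percentile (0 to 100)
ranked = sorted(papers, key=lambda p: p["nih_perc"], reverse=True)
ranked_pmids = [p["pmid"] for p in ranked]
print(ranked_pmids)  # → [2, 3, 1]
```

A numeric sentinel also avoids the `TypeError` that Python 3 raises when `None` is compared with an integer during a sort.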

In [4]:
for nih_entry in sorted(all_cites, key=lambda o: o.dct['nih_perc'], reverse=True):
    print(nih_entry)

33798257 R. HA...  -1 i 2021     1  0  21 au[03](Cory J D Matthews) Amino acid δ15N differences consistent with killer whale ecotypes in the Arctic and Northwest Atlantic.
25883362 .. .A...  84 3 2015    47  0  85 au[02](Neil P Kelley) Vertebrate evolution. Evolutionary innovation and ecology in marine tetrapods from the Triassic to the Anthropocene.
26937049 R. .A...  69 2 2015    20  0  34 au[04](Todd R Robeck) Comparisons of life-history parameters between free-ranging and captive killer whale (<i>Orcinus orca</i>) populations for application toward species management.
31131963 RP .AM..  64 2 2019    12  0  72 au[35](Andrew D Foote) Killer whale genomes reveal a complex history of recurrent admixture and vicariance.
25052415 R. .AM..  63 2 2015    25  0  37 au[09](A E Moura) Phylogenomics of the killer whale indicates ecotype divergence in sympatry.
27039511 R. .A...  61 2 2016    11  0  25 au[07](Saana Isojunno) Sperm whales reduce foraging effort during exposure to 1-2 kHz sonar a

## 5) Sort by NIH group, then by year
This places the newest papers (NIH group `i`) first, followed by papers that perform well (NIH groups `2` and above). The lowest-performing papers (NIH groups `0` and `1`) are last.
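
A two-level sort like this relies on Python comparing sequence keys element by element. The sketch below assumes, purely for illustration, that groups are stored as integers and that the newest papers receive the highest code; the records and the code `5` are hypothetical, not taken from *pmidcite*.

```python
# Hypothetical records; the integer group codes are assumed for illustration
records = [
    {"pmid": 101, "nih_group": 2, "year": 2019},
    {"pmid": 102, "nih_group": 3, "year": 2015},
    {"pmid": 103, "nih_group": 2, "year": 2014},
    {"pmid": 104, "nih_group": 5, "year": 2021},  # assumed code for new papers
]

# Keys are compared by group first; year only breaks ties within a group
ordered = sorted(records, key=lambda r: (r["nih_group"], r["year"]), reverse=True)
ordered_pmids = [r["pmid"] for r in ordered]
print(ordered_pmids)  # → [104, 102, 101, 103]
```

A tuple key is used here, but a list key compares the same way. Note that `reverse=True` applies to the whole key, so both the group and the year are in descending order.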

In [5]:
nih_cites = sorted(all_cites, key=lambda o: [o.dct['nih_group'], o.dct['year']], reverse=True)
for nih_entry in nih_cites:
    print(nih_entry)

33798257 R. HA...  -1 i 2021     1  0  21 au[03](Cory J D Matthews) Amino acid δ15N differences consistent with killer whale ecotypes in the Arctic and Northwest Atlantic.
25883362 .. .A...  84 3 2015    47  0  85 au[02](Neil P Kelley) Vertebrate evolution. Evolutionary innovation and ecology in marine tetrapods from the Triassic to the Anthropocene.
31120038 RP .A...  42 2 2019     3  0  14 au[08](Maíra Laeta) Osteochondromatosis (multiple cartilaginous exostoses) in an immature killer whale Orcinus orca.
30992478 RP .A...  50 2 2019     5  0  32 au[09](Salvador J Jorgensen) Killer whales redistribute white shark foraging pressure on seals.
31631360 .P HA...  53 2 2019     6  0  55 au[01](Jenny A Allen) Community through Culture: From Insects to Whales: How Social Learning and Culture Manifest across Diverse Animal Communities.
31131963 RP .AM..  64 2 2019    12  0  72 au[35](Andrew D Foote) Killer whale genomes reveal a complex history of recurrent admixture and vicariance.
29876075 

## 6) Print the keys which can be used for sorting
Pick out one NIH entry (a `NIHiCiteEntry` object) and print its available keys.
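
The cell below prints each key with the format specification `{KEY:>27}`, which right-aligns the key in a 27-character column. A minimal sketch of that formatting, using a small stand-in dictionary with three of the key-value pairs from the printed entry rather than a downloaded one:

```python
# Stand-in for nih_entry.dct with three of the key-value pairs shown below
entry = {"pmid": 33798257, "year": 2021, "journal": "PLoS One"}

# ">27" pads each key on the left so that it ends at column 27
lines = ["{KEY:>27} {VAL}".format(KEY=key, VAL=value) for key, value in entry.items()]
print("\n".join(lines))
```

Any of the printed keys can then be used in a sort, for example `key=lambda o: o.dct['year']`.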

In [6]:
nih_entry = next(iter(nih_cites))
print('\n{N} key-value pairs in an NIH entry:\n'.format(N=len(nih_entry.dct)))
for key, value in nih_entry.dct.items():
    print("{KEY:>27} {VAL}".format(KEY=key, VAL=value))


31 key-value pairs in an NIH entry:

                       pmid 33798257
                       year 2021
                      title Amino acid δ15N differences consistent with killer whale ecotypes in the Arctic and Northwest Atlantic.
                    authors ['Cory J D Matthews', 'Jack W Lawson', 'Steven H Ferguson']
                    journal PLoS One
        is_research_article True
    relative_citation_ratio None
             nih_percentile None
                      human 0.2
                     animal 0.8
         molecular_cellular 0.0
                        apt 0.05
                is_clinical False
             citation_count 1
         citations_per_year 1.0
expected_citations_per_year 1.9875929018390903
        field_citation_rate 2.888821105807652
                provisional False
                    x_coord 0.6928203230275509
                    y_coord -0.2
              cited_by_clin []
                   cited_by [33762671]
                 references [28181

## 7) Expand NIH group `3` (well-performing papers) to include NIH percentiles of 50 or higher
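
As a rough illustration of what lowering a group threshold does, the sketch below uses a hypothetical `assign_group` function. It is not `NihGrouper`: the default cutoff of 84.0 is an assumption, and the lower groups are omitted for brevity.

```python
def assign_group(nih_perc, group3_min=84.0):
    """Hypothetical stand-in for threshold-based grouping (not pmidcite's code)."""
    if nih_perc is None:
        return "i"  # too new to have a percentile
    # Simplified: only the boundary between groups 2 and 3 is modeled
    return 3 if nih_perc >= group3_min else 2

percentiles = [84, 64, 53, 50, None]

default_groups = [assign_group(p) for p in percentiles]
expanded_groups = [assign_group(p, group3_min=50.0) for p in percentiles]

print(default_groups)   # → [3, 2, 2, 2, 'i']
print(expanded_groups)  # → [3, 3, 3, 3, 'i']
```

Lowering the cutoff moves papers from group `2` into group `3` without changing their percentiles, which is the pattern visible in the output of the cell below.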

In [7]:
from pmidcite.icite.nih_grouper import NihGrouper

grpr = NihGrouper(group3_min=50.0)

dnldr = get_downloader(details_cites_refs="citations", nih_grouper=grpr)
paper = dnldr.get_paper(22882545)

for nihentry in sorted(paper.cited_by, key=lambda o: [o.dct['nih_group'], o.dct['year']], reverse=True):
    print(nihentry)

33798257 R. HA...  -1 i 2021     1  0  21 au[03](Cory J D Matthews) Amino acid δ15N differences consistent with killer whale ecotypes in the Arctic and Northwest Atlantic.
31631360 .P HA...  53 3 2019     6  0  55 au[01](Jenny A Allen) Community through Culture: From Insects to Whales: How Social Learning and Culture Manifest across Diverse Animal Communities.
31131963 RP .AM..  64 3 2019    12  0  72 au[35](Andrew D Foote) Killer whale genomes reveal a complex history of recurrent admixture and vicariance.
30992478 RP .A...  50 3 2019     5  0  32 au[09](Salvador J Jorgensen) Killer whales redistribute white shark foraging pressure on seals.
29876075 R. .A...  56 3 2018    10  0  30 au[02](Mauricio Cantor) Simple foraging rules in competitive environments can generate socially structured populations.
28371192 R. .A...  58 3 2017    15  0 121 au[03](Katherine L Moon) Reconsidering connectivity in the sub-Antarctic.
27039511 R. .A...  61 3 2016    11  0  25 au[07](Saana Isojunno) Sperm 

Copyright (C) 2019-present, DV Klopfenstein. All rights reserved.