# Abstract

**Author:** [Charles Tapley Hoyt](https://github.com/cthoyt)

This notebook outlines a simple way to explore the citations, authors, and provenance information in a graph and its subgraphs.

### Notebook Setup

In [1]:
import itertools as itt
import os
import time
from collections import defaultdict, Counter
from operator import itemgetter

import pybel
from pybel.constants import *

import pybel_tools as pbt

### Notebook Provenance

The time of execution and the versions of the software packages used are displayed explicitly.

In [2]:
time.asctime()

'Tue Mar 21 23:25:13 2017'

In [3]:
pybel.__version__

'0.4.4-dev'

In [4]:
pbt.__version__

'0.1.4-dev'

### Local Path Definitions

To make this notebook portable across machines, the paths to the repositories that contain its data are read from environment variables. These are set in `~/.bashrc` to point to where the repositories have been cloned. Assuming the repositories have been cloned with `git clone` into the `~/dev` folder, the entries in `~/.bashrc` should look like:

```bash
...
export BMS_BASE=~/dev/bms
...
```

#### BMS 

The biological model store (BMS) is the internal Fraunhofer SCAI repository for keeping BEL models under version control. It can be downloaded from https://tor-2.scai.fraunhofer.de/gf/project/bms/

In [5]:
bms_base = os.environ['BMS_BASE']
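
If `BMS_BASE` is not set, `os.environ['BMS_BASE']` fails with a bare `KeyError`. The following is a minimal sketch of a lookup that reports what is missing. The helper name `resolve_base` and its `environ` parameter are hypothetical and are not part of PyBEL or of this notebook.

```python
import os


def resolve_base(name, environ=None):
    """Return the directory stored in the environment variable `name`.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to try out without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise RuntimeError(
            '{0} is not set; add `export {0}=...` to ~/.bashrc'.format(name)
        )
    # Expand a leading ~ in case the value was stored unexpanded
    return os.path.expanduser(value)


# Demonstration with an explicit mapping instead of the real environment
print(resolve_base('BMS_BASE', {'BMS_BASE': '/tmp/bms'}))
```

In the notebook this would be called as `resolve_base('BMS_BASE')`, which reads from the real environment.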

## Loading

The graph is loaded from a precompiled gpickle.

In [6]:
pickle_path = os.path.join(bms_base, 'aetionomy', 'alzheimers.gpickle')

In [7]:
graph = pybel.from_pickle(pickle_path)

pbt.summary.print_summary(graph)

Name: Alzheimer's Disease Model
Number of nodes: 11420
Number of edges: 64013
Network density: 0.0004908784925238285
Number weakly connected components: 65
Average in-degree: 5.605341506129597
Average out-degree: 5.605341506129597
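
The density and average degree reported above can be checked by hand. The sketch below assumes the summary uses the standard directed-graph definitions, density = E / (N * (N - 1)) and average in-degree = E / N. The function name `directed_summary` is illustrative only.

```python
def directed_summary(n_nodes, n_edges):
    """Return (density, average degree) for a directed graph."""
    # A directed graph on N nodes has N * (N - 1) possible ordered pairs
    density = n_edges / (n_nodes * (n_nodes - 1))
    # Every edge adds one to the in-degree of exactly one node
    average_degree = n_edges / n_nodes
    return density, average_degree


# Node and edge counts taken from the summary of the full graph
density, average_degree = directed_summary(11420, 64013)
print(density, average_degree)
```

Every edge has exactly one source and one target, which is why the average in-degree and the average out-degree are always equal.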


## Publication Summary

In [8]:
pmid_counter = pbt.summary.count_pmids(graph)

The total number of PubMed references:

In [9]:
len(pmid_counter)

8318

The 35 papers that contributed the most edges:

In [10]:
for pmid, count in pmid_counter.most_common(35):
    print('https://www.ncbi.nlm.nih.gov/pubmed/{}\t{}'.format(pmid, count))

https://www.ncbi.nlm.nih.gov/pubmed/20044591	2535
https://www.ncbi.nlm.nih.gov/pubmed/20938992	1980
https://www.ncbi.nlm.nih.gov/pubmed/19244175	1835
https://www.ncbi.nlm.nih.gov/pubmed/19619570	839
https://www.ncbi.nlm.nih.gov/pubmed/19549813	799
https://www.ncbi.nlm.nih.gov/pubmed/18951874	795
https://www.ncbi.nlm.nih.gov/pubmed/14699072	751
https://www.ncbi.nlm.nih.gov/pubmed/20106945	658
https://www.ncbi.nlm.nih.gov/pubmed/20436886	617
https://www.ncbi.nlm.nih.gov/pubmed/19167446	583
https://www.ncbi.nlm.nih.gov/pubmed/17404688	563
https://www.ncbi.nlm.nih.gov/pubmed/20660070	523
https://www.ncbi.nlm.nih.gov/pubmed/19484750	461
https://www.ncbi.nlm.nih.gov/pubmed/19059307	434
https://www.ncbi.nlm.nih.gov/pubmed/21185374	365
https://www.ncbi.nlm.nih.gov/pubmed/20847424	307
https://www.ncbi.nlm.nih.gov/pubmed/23019147	296
https://www.ncbi.nlm.nih.gov/pubmed/21179406	270
https://www.ncbi.nlm.nih.gov/pubmed/22574217	268
https://www.ncbi.nlm.nih.gov/pubmed/22496686	265
https://www.ncbi.
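
To make "edges contributed" concrete, here is a rough sketch of how such a counter could be built. It is not the `pybel_tools` implementation. The edge data layout (a `citation` dictionary with `type` and `reference` keys) is an assumption for illustration, and the PMIDs are placeholders.

```python
from collections import Counter

# Toy edges as (source, target, data) triples; the PMIDs are placeholders
edges = [
    ('A', 'B', {'citation': {'type': 'PubMed', 'reference': '1'}}),
    ('A', 'C', {'citation': {'type': 'PubMed', 'reference': '1'}}),
    ('B', 'C', {'citation': {'type': 'PubMed', 'reference': ' 2 '}}),
    ('C', 'D', {'citation': {'type': 'Other', 'reference': 'some-url'}}),
    ('D', 'E', {}),  # an edge without any citation
]


def count_pmids_sketch(edges):
    """Count how many edges cite each PubMed identifier."""
    counter = Counter()
    for _, _, data in edges:
        citation = data.get('citation')
        # Skip edges that have no citation or a non-PubMed citation
        if citation is None or citation['type'] != 'PubMed':
            continue
        # Strip stray whitespace so ' 2 ' and '2' count as the same paper
        counter[citation['reference'].strip()] += 1
    return counter


toy_counter = count_pmids_sketch(edges)
print(toy_counter.most_common(2))
```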

## Citation Enrichment

In [11]:
pbt.mutation.parse_authors(graph)

In [12]:
%%time
pbt.mutation.fix_pubmed_citations(graph, stringify_authors=False)



CPU times: user 2.65 s, sys: 185 ms, total: 2.83 s
Wall time: 2min 17s


In [13]:
author_publication_counter = pbt.summary.count_author_publications(graph)

The total number of authors:

In [14]:
len(author_publication_counter)

34596

The top 35 authors, in terms of number of publication contributions:

In [15]:
author_publication_counter.most_common(35)

[('Zhang Y', 59),
 ('Wang Y', 55),
 ('Wang X', 49),
 ('Liu Y', 43),
 ('Zhang J', 41),
 ('Li Y', 41),
 ('Liu J', 37),
 ('Chen J', 35),
 ('Li X', 34),
 ('Zhang H', 33),
 ('Chen Y', 32),
 ('Wang H', 32),
 ('Liu X', 32),
 ('Wang L', 32),
 ('Zhang L', 32),
 ('Zhang X', 32),
 ('Safe S', 32),
 ('Wang J', 31),
 ('Wang Z', 30),
 ('Li J', 30),
 ('Zhang C', 29),
 ('Li H', 29),
 ('Zhang W', 26),
 ('Kim HJ', 26),
 ('Lee JH', 25),
 ('Liu H', 24),
 ('Kim SH', 24),
 ('Aggarwal BB', 24),
 ('Liu W', 23),
 ('Wang Q', 23),
 ('Xu J', 23),
 ('Takeuchi K', 23),
 ('Zhao Y', 22),
 ('Wang S', 21),
 ('Li L', 20)]

The top 35 authors, in terms of the number of edges contributed:

In [16]:
author_counter = pbt.summary.count_authors(graph)

author_counter.most_common(35)

[('Kleinjans JC', 3300),
 ('van Herwijnen MH', 3195),
 ('Briedé JJ', 2537),
 ('van Delft JM', 2537),
 ('de Kok TM', 2537),
 ('Maas LM', 2537),
 ('Gottschalk RW', 2537),
 ('Pogribny IP', 2039),
 ('Wang X', 2034),
 ('Ross SA', 2025),
 ('Tryndyak VP', 2019),
 ('Han T', 2019),
 ('Beland FA', 2019),
 ('Muskhelishvili L', 1986),
 ('Fuscoe JC', 1986),
 ('Kim J', 1873),
 ('Wen S', 1852),
 ('Vakar-Lopez F', 1848),
 ('Menter DG', 1845),
 ('Lippman SM', 1844),
 ('Tsavachidou D', 1843),
 ('McDonnell TJ', 1843),
 ('Pisters LL', 1843),
 ('Pettaway CA', 1843),
 ('Wood CG', 1843),
 ('Do KA', 1843),
 ('Thall PF', 1843),
 ('Stephens C', 1843),
 ('Efstathiou E', 1843),
 ('Taylor R', 1843),
 ('Troncoso P', 1843),
 ('Logothetis CJ', 1843),
 ('Leitman DC', 1471),
 ('Spink BC', 1220),
 ('Spink DC', 1220)]
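
The two rankings differ because a single paper can contribute many edges. The toy example below sketches both counting schemes side by side. It is not the `pybel_tools` implementation: the flat `reference` and `authors` keys, the author names, and the PMIDs are all placeholders chosen for illustration.

```python
from collections import Counter, defaultdict

# Toy edges: 'Roe R' wrote one paper that supports three edges,
# 'Poe P' wrote two papers that support one edge each
edges = [
    ('A', 'B', {'reference': '1', 'authors': ['Roe R']}),
    ('B', 'C', {'reference': '1', 'authors': ['Roe R']}),
    ('C', 'D', {'reference': '1', 'authors': ['Roe R']}),
    ('A', 'D', {'reference': '2', 'authors': ['Poe P']}),
    ('A', 'E', {'reference': '3', 'authors': ['Poe P']}),
]


def count_author_edges(edges):
    """Count, for each author, the edges supported by their papers."""
    return Counter(
        author
        for _, _, citation in edges
        for author in citation['authors']
    )


def count_author_papers(edges):
    """Count, for each author, the distinct papers cited in the graph."""
    papers = defaultdict(set)
    for _, _, citation in edges:
        for author in citation['authors']:
            papers[author].add(citation['reference'])
    return Counter({author: len(refs) for author, refs in papers.items()})


edge_counts = count_author_edges(edges)
paper_counts = count_author_papers(edges)
print(edge_counts.most_common(1), paper_counts.most_common(1))
```

Here `Roe R` leads by edges while `Poe P` leads by publications, the same kind of divergence seen between the two lists above.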

## Filtering

The graph is filtered to a specific subgraph: the Apoptosis signaling subgraph.

In [17]:
target_subgraph = 'Apoptosis signaling subgraph'

In [18]:
subgraph = pbt.selection.get_subgraph_by_annotation(graph, target_subgraph)

pbt.summary.print_summary(subgraph)

Name: Alzheimer's Disease Model - (Subgraph: Apoptosis signaling subgraph)
Number of nodes: 130
Number of edges: 211
Network density: 0.012581991651759094
Number weakly connected components: 10
Average in-degree: 1.623076923076923
Average out-degree: 1.623076923076923
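
Conceptually, this selection keeps the edges annotated with the target subgraph, together with the nodes they touch. The sketch below illustrates the idea on plain tuples. It assumes each edge stores its annotations as a flat dictionary with a `Subgraph` key, which may not match the real PyBEL edge data layout, and the function name is hypothetical.

```python
def filter_by_annotation(edges, value, key='Subgraph'):
    """Return (nodes, edges) restricted to edges where data[key] == value."""
    kept = [(u, v, d) for u, v, d in edges if d.get(key) == value]
    # The subgraph's nodes are the endpoints of the edges that were kept
    nodes = {node for u, v, _ in kept for node in (u, v)}
    return nodes, kept


# Toy edges; only two carry the target annotation
toy_edges = [
    ('A', 'B', {'Subgraph': 'Apoptosis signaling subgraph'}),
    ('B', 'C', {'Subgraph': 'Other subgraph'}),
    ('C', 'D', {'Subgraph': 'Apoptosis signaling subgraph'}),
    ('D', 'E', {}),  # an edge without a Subgraph annotation
]

toy_nodes, toy_kept = filter_by_annotation(toy_edges, 'Apoptosis signaling subgraph')
print(sorted(toy_nodes), len(toy_kept))
```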


The unique citations for every pair of nodes are calculated. This helps to remove the bias from edges that carry many annotations and therefore undergo a Cartesian explosion. This process can be repeated with [pbt.summary.count_pmids](http://pybel-tools.readthedocs.io/en/latest/summary.html#pybel_tools.summary.count_pmids).

In [19]:
# Map each (source, target) node pair to the set of distinct citations supporting it
citations = defaultdict(set)

for u, v, d in subgraph.edges_iter(data=True):
    c = d[CITATION]
    citations[u, v].add((c[CITATION_TYPE], c[CITATION_REFERENCE], c[CITATION_NAME]))

# Count each citation once per node pair, no matter how many parallel edges cite it
counter = Counter(itt.chain.from_iterable(citations.values()))

for (_, pmid, name), v in counter.most_common(35):
    print('https://www.ncbi.nlm.nih.gov/pubmed/{}\t{}\t{}'.format(int(pmid.strip()), v, name))

https://www.ncbi.nlm.nih.gov/pubmed/19499146	27	Acta biochimica et biophysica Sinica
https://www.ncbi.nlm.nih.gov/pubmed/22496686	11	Journal of toxicology
https://www.ncbi.nlm.nih.gov/pubmed/16153637	9	European journal of pharmacology
https://www.ncbi.nlm.nih.gov/pubmed/17869087	7	The Journal of nutritional biochemistry
https://www.ncbi.nlm.nih.gov/pubmed/22122372	7	Journal of neurochemistry
https://www.ncbi.nlm.nih.gov/pubmed/19918364	6	PloS one
https://www.ncbi.nlm.nih.gov/pubmed/11592846	6	Neurobiology of disease
https://www.ncbi.nlm.nih.gov/pubmed/12548636	6	Proteomics
https://www.ncbi.nlm.nih.gov/pubmed/14744432	5	Cell
https://www.ncbi.nlm.nih.gov/pubmed/18997293	4	Journal of Alzheimer's disease : JAD
https://www.ncbi.nlm.nih.gov/pubmed/22236693	4	Journal of negative results in biomedicine
https://www.ncbi.nlm.nih.gov/pubmed/24821282	4	Journal of neurochemistry
https://www.ncbi.nlm.nih.gov/pubmed/17316167	4	Current Alzheimer research
https://www.ncbi.nlm.nih.gov/pubmed/19734902	4	
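
The effect of de-duplicating per node pair can be seen on a toy example. The sketch below assumes that one statement annotated with two values for each of two annotations is stored as four parallel edges. The annotation keys, their values, and the PMIDs are placeholders.

```python
import itertools as itt
from collections import Counter, defaultdict

# One statement A -> B annotated with 2 cell lines x 2 species
# becomes 4 parallel edges that all cite the same placeholder PMID '1'
edges = [
    ('A', 'B', {'reference': '1', 'CellLine': cell_line, 'Species': species})
    for cell_line, species in itt.product(['c1', 'c2'], ['s1', 's2'])
]
# A second statement B -> C with a single edge citing placeholder PMID '2'
edges.append(('B', 'C', {'reference': '2'}))

# Naive count: one vote per edge, so PMID '1' is inflated to 4
raw = Counter(d['reference'] for _, _, d in edges)

# De-duplicated count: one vote per (source, target) pair and citation
citations = defaultdict(set)
for u, v, d in edges:
    citations[u, v].add(d['reference'])
unique = Counter(itt.chain.from_iterable(citations.values()))

print(raw, unique)
```

With the per-pair sets, both placeholder papers receive a single vote, since each supports exactly one pair of nodes.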

# Conclusions

The five papers that supported the most node pairs in the Apoptosis signaling subgraph were:

1. Acta Biochim Biophys Sin (Shanghai). 2009 Jun;41(6):437-45. ([pmid:19499146](https://www.ncbi.nlm.nih.gov/pubmed/19499146)) with 27 node pairs
2. J Toxicol. 2012;2012:187297. Epub 2012 Feb 8. ([pmid:22496686](https://www.ncbi.nlm.nih.gov/pubmed/22496686)) with 11 node pairs
3. Eur J Pharmacol. 2005 Sep 27;520(1-3):1-11. ([pmid:16153637](https://www.ncbi.nlm.nih.gov/pubmed/16153637)) with 9 node pairs
4. J Nutr Biochem. 2008 Jul;19(7):459-66. Epub 2007 Sep 14. ([pmid:17869087](https://www.ncbi.nlm.nih.gov/pubmed/17869087)) with 7 node pairs
5. J Neurochem. 2012 Jan;120 Suppl 1:9-21. doi: 10.1111/j.1471-4159.2011.07519.x. Epub 2011 Nov 28. ([pmid:22122372](https://www.ncbi.nlm.nih.gov/pubmed/22122372)) with 7 node pairs