# Part B 

In [1]:
import sys
import random
from pprint import pprint as pp
random.seed(42)
sys.version

'3.7.4 (default, Oct 15 2019, 22:29:14) \n[GCC 7.4.0]'

In [8]:
import neo4j
import py2neo
print(neo4j.__version__)
print(py2neo.__version__)

1.7.6
4.3.0


In [12]:
%load_ext cypher
# from https://ipython-cypher.readthedocs.io/en/latest/
# pip install ipython-cypher

The cypher extension is already loaded. To reload it, use:
  %reload_ext cypher


In [3]:
from neo4j import GraphDatabase
from py2neo import Graph

# instantiate drivers
NEO4J_URI="bolt://localhost:7687"
gdb = GraphDatabase.driver(uri=NEO4J_URI, auth=None)
graph = Graph(NEO4J_URI)

The graph has the following structure

### Query 1
H-index, From wikipedia

> The h-index is defined as the maximum value of h such that the given author/journal has published h papers that have each been cited at least h times.[6] The index is designed to improve upon simpler measures such as the total number of citations or publications.[citation needed] The index works properly only for comparing scientists working in the same field; citation conventions differ widely among different fields.[citation needed]
>
> Formally, if f is the function that corresponds to the number of citations for each publication, we compute the h-index as follows. First we order the values of f from the largest to the lowest value. Then, we look for the last position in which f is greater than or equal to the position (we call h this position). For example, if we have a researcher with 5 publications A, B, C, D, and E with 10, 8, 5, 4, and 3 citations, respectively, the h-index is equal to 4 because the 4th publication has 4 citations and the 5th has only 3. In contrast, if the same publications have 25, 8, 5, 3, and 3 citations, then the index is 3 because the fourth paper has only 3 citations.
>
>        f(A)=10, f(B)=8, f(C)=5, f(D)=4, f(E)=3　→ h-index=4
>        f(A)=25, f(B)=8, f(C)=5, f(D)=3, f(E)=3　→ h-index=3
>
> If we have the function f ordered in decreasing order from the largest value to the lowest one, we can compute the h-index as follows:
>
>    $$\text{h-index} (f) = {\displaystyle \max _{i}\min(f(i),i)}$$

- obtain $f$, the number of citations of each article
- order $f$ from largest to lowest
- from the last element in f

The citation count can be obtained by matching a pattern

```cypher
(author:Author)-[:AUTHORS]->(article:Article)-[:CITED_BY]->(article2:Article)
```

and counting the number of articles with alias `article2`.

In [11]:
%%cypher
MATCH (author:Author)-[:AUTHORS]->(article:Article)-[:CITED_BY]->(article2:Article)
WHERE article <> article2
WITH author, count(article) as citations
ORDER BY citations DESC
WITH author, collect(citations) as c_citations
WITH author, [i in range(0,size(c_citations)-1) 
    WHERE i <= c_citations[i] | i][-1] + 1 as h_index
RETURN author.name as author, h_index
ORDER BY h_index DESC
LIMIT 10

10 rows affected.


author,h_index
Salih Zeki Kadioglu,1
Ilker Iskender,1
Gonul Sagiroglu,1
Tugba Cosgun,1
Altug Kosar,1
Altan Kir,1
Taka-aki Nakada,1
Masashi Taniguchi,1
Tetsuya Matsuoka,1
Hasan Oguz Kapicibasi,1


### Query 2

In [14]:
%%cypher
MATCH(a1:Article)-[:CITED_BY]->(a2:Article)-[:PUBLISHED_IN]->(ed:Edition)-[:OF]->(cf:Conference)
WHERE a1 <> a2
WITH a1, cf, count(a1) as citations
ORDER BY citations DESC
WITH cf, collect(a1 {.title, citations}) as c_citations
RETURN cf.name AS conference_name, c_citations[..3] as three_most_cited
LIMIT 5

5 rows affected.


conference_name,three_most_cited
Lance Rose,"[{'citations': 66, 'title': 'Pulmonary carcinosarcoma with heterologous component: report of two cases with literature review.'}, {'citations': 63, 'title': 'A Partition-Based Relaxation For Steiner Trees'}, {'citations': 61, 'title': 'Preexcitation Syndromes.'}]"
Rebecca Lane,"[{'citations': 60, 'title': 'Predation, seed size partitioning and the evolution of body size in seed-eating finches'}, {'citations': 54, 'title': 'Adenocarcinoma of the Prostate Presenting as an Obstructing Rectal Mass'}, {'citations': 50, 'title': 'Misunderstanding and potential unintended misuse of acetaminophen among adolescents and young adults.'}]"
Dr. Gregory Arnold,"[{'citations': 58, 'title': 'Interferon-induced Ifit proteins: their role in viral pathogenesis.'}, {'citations': 55, 'title': 'Cardiac arrest due to airway obstruction in hereditary angioedema.'}, {'citations': 55, 'title': ""Type 1 5'-deiodinase activity is inhibited by oxidative stress and restored by alpha-lipoic acid in HepG2 cells.""}]"
Jennifer Smith,"[{'citations': 56, 'title': 'Development of Work Stress Scale for Correctional Officers'}, {'citations': 44, 'title': 'Topical Antimicrobials and the Open Surgical Wound.'}, {'citations': 42, 'title': 'Determination of phenobarbital in hair matrix by liquid phase microextraction (LPME) and gas chromatography-mass spectrometry (GC-MS).'}]"
Stephanie Hoffman,"[{'citations': 55, 'title': 'Effect of erythrosine- and LED-mediated photodynamic therapy on buccal candidiasis infection of immunosuppressed mice and Candida albicans adherence to buccal epithelial cells.'}, {'citations': 53, 'title': 'Predation, seed size partitioning and the evolution of body size in seed-eating finches'}, {'citations': 53, 'title': 'Determination of phenobarbital in hair matrix by liquid phase microextraction (LPME) and gas chromatography-mass spectrometry (GC-MS).'}]"


### Query 3

In [18]:
%%cypher
MATCH (au:Author)-[:AUTHORS]->(ar:Article)-[e1:PUBLISHED_IN]->
(ed:Edition)-[:OF]->(co:Conference)
WITH au, collect(ed) as editions, co
WHERE size(editions) >= 4
RETURN au.name AS author, co.name AS conference, size(editions) AS n_publications
ORDER BY n_publications DESC
LIMIT 5

5 rows affected.


author,conference,n_publications
Kalyan Veeramachaneni,Alyssa Zimmerman,20
Pramod K. Varshney,Alyssa Zimmerman,20
Lisa Ann Osadciw,Alyssa Zimmerman,20
Ilker Iskender,Rebecca Lane,18
Hasan Oguz Kapicibasi,Rebecca Lane,18


### Query 4

This one needs an argument, the year, and I can't pass arguments to the cypher extension.

In [25]:
from pandas import DataFrame

q4 = """MATCH (article:Article {year: $year})
WITH collect(article) AS current_papers
MATCH (article:Article)-[:PUBLISHED_IN]->(:Volume)-[:OF]->(journal:Journal)
WHERE article.year = ($year - 1) OR article.year = ($year - 2)
MATCH (article)-[:CITED_BY]->(citer:Article)
WHERE citer IN current_papers
WITH count(citer) AS n_cites, article, journal
RETURN journal.name as journal, toFloat(SUM(n_cites)/COUNT(article)) AS impact_factor;
"""

q4_out = graph.run(q4, year=2011).data()
DataFrame(q4_out)

Unnamed: 0,journal,impact_factor
0,International journal of cardiology,1.0
1,International journal of offender therapy and ...,1.0
2,BMC Neurology,5.0
3,Eng. Appl. of AI,2.0
4,2009 33rd Annual IEEE International Computer S...,1.0
5,Cell,1.0
6,Child's Nervous System,3.0
7,Human reproduction,1.0
8,Biotechnology advances,3.0
9,Nature,3.0
