# Part B 

In [1]:
import sys
import random
from pprint import pprint as pp
random.seed(42)
sys.version

'3.7.4 (default, Oct 15 2019, 22:29:14) \n[GCC 7.4.0]'

In [2]:
import neo4j
import py2neo
print(neo4j.__version__)
print(py2neo.__version__)

1.7.6
4.3.0


In [4]:
from neo4j import GraphDatabase
from py2neo import Graph

# instantiate drivers
NEO4J_URI="bolt://localhost:7687"
gdb = GraphDatabase.driver(uri=NEO4J_URI, auth=None)
graph = Graph(NEO4J_URI)

The graph has the following structure

![graph](./schemas/dblp_slim_after/graph.png)

## H-index

From wikipedia

> The h-index is defined as the maximum value of h such that the given author/journal has published h papers that have each been cited at least h times.[6] The index is designed to improve upon simpler measures such as the total number of citations or publications.[citation needed] The index works properly only for comparing scientists working in the same field; citation conventions differ widely among different fields.[citation needed]
>
> Formally, if f is the function that corresponds to the number of citations for each publication, we compute the h-index as follows. First we order the values of f from the largest to the lowest value. Then, we look for the last position in which f is greater than or equal to the position (we call h this position). For example, if we have a researcher with 5 publications A, B, C, D, and E with 10, 8, 5, 4, and 3 citations, respectively, the h-index is equal to 4 because the 4th publication has 4 citations and the 5th has only 3. In contrast, if the same publications have 25, 8, 5, 3, and 3 citations, then the index is 3 because the fourth paper has only 3 citations.
>
>        f(A)=10, f(B)=8, f(C)=5, f(D)=4, f(E)=3　→ h-index=4
>        f(A)=25, f(B)=8, f(C)=5, f(D)=3, f(E)=3　→ h-index=3
>
> If we have the function f ordered in decreasing order from the largest value to the lowest one, we can compute the h-index as follows:
>
>    $$\text{h-index} (f) = {\displaystyle \max _{i}\min(f(i),i)}$$

- obtain $f$, the number of citations of each article
- order $f$ from largest to lowest
- from the last element in f

The citation count can be obtained by matching a pattern

```cypher
(a2:Article)<-[:CITED_BY]-(a1:Article)-[:AUTHORED_BY]->(p:Author)
```

and counting the number of articles with alias `a2`.

```
Primera query:
MATCH (a:article)-[hs:has_citation]->(c:cite),
(a)-[:authored_by]->(auth:author)
WITH a,auth, count(a) as citations
ORDER BY citations DESC
WITH auth,collect(citations) as c_citations
WITH auth,[i in range(0,size(c_citations)-1) WHERE i <= c_citations[i] | i][-1]+1 as h_index
RETURN auth.author,h_index
```
```
Segunda query:
MATCH(a:article)-[:published_in]->(j:journal)
WITH a,j,size(()<-[:has_citation]-(a)) as citations
ORDER BY citations DESC
WITH j,collect(a{.title,.author,citations}) as c_citations
RETURN j.journal,c_citations[..3]
```

```cypher
MATCH (a:Article)-[:CITED_BY]->(:Article),
(a)-[:AUTHORED_BY]->(auth:Author)
WITH auth,count(a) as citations
ORDER BY citations DESC
WITH auth, collect(citations) as c_citations
WITH auth, [i in range(0,size(c_citations)-1) WHERE i <= c_citations[i] | i][-1]+1 as h_index
RETURN auth.affiliation_institution_name,h_index
```

In [9]:
q1 = """MATCH (a:Article)-[:CITED_BY]->(:Article),
(auth:Author)-[:AUTHORS]->(a)
WITH auth,count(a) as citations
ORDER BY citations DESC
WITH auth, collect(citations) as c_citations
WITH auth, [i in range(0,size(c_citations)-1) WHERE i <= c_citations[i] | i][-1]+1 as h_index
RETURN auth.name as author_name, h_index"""

graph.run(q1).data()[:5]

[{'author_name': 'Reiner Leidl', 'h_index': 1},
 {'author_name': 'Martin Haupt', 'h_index': 1},
 {'author_name': 'Lutz Frölich', 'h_index': 1},
 {'author_name': 'Hans Förstl', 'h_index': 1},
 {'author_name': 'Thomas Mittendorf', 'h_index': 1}]

### Query 2

In [11]:
q2 = """MATCH(a1:Article)-[:CITED_BY]->(a2:Article),
(a)-[:PUBLISHED_IN]->(ed:Edition)<-[:IN]-(cf:Conference)
WHERE a1<>a2
WITH a1, cf, count(a1) as citations
ORDER BY citations DESC
WITH cf, collect(a1 {.title,citations}) as c_citations
RETURN cf.name AS conference_name, c_citations[..3] as three_most_cited"""

_out_q2 = graph.run(q2).data()
_out_q2[:4]

[{'conference_name': 'Christopher Smith',
  'three_most_cited': [{'title': 'Linear Hypopigmentation After Triamcinolone Injection: A Rare Complication of a Common Procedure',
    'citations': 4464},
   {'title': "Quality of Life as an outcome in Alzheimer's disease and other dementias- obstacles and goals",
    'citations': 4464},
   {'title': 'Acetic acid as a sclerosing agent for renal cysts: Comparison with ethanol in follow-up results',
    'citations': 4371}]},
 {'conference_name': 'Patrick Green',
  'three_most_cited': [{'title': 'Linear Hypopigmentation After Triamcinolone Injection: A Rare Complication of a Common Procedure',
    'citations': 4320},
   {'title': "Quality of Life as an outcome in Alzheimer's disease and other dementias- obstacles and goals",
    'citations': 4320},
   {'title': 'Acetic acid as a sclerosing agent for renal cysts: Comparison with ethanol in follow-up results',
    'citations': 4230}]},
 {'conference_name': 'Betty Jones',
  'three_most_cited': [{'t

### Query 3