# Setup

Install the Neo4j Python driver in your Colab notebook environment and connect to your hosted Neo4j Sandbox.

In [4]:
!pip install neo4j

Collecting neo4j
  Using cached https://files.pythonhosted.org/packages/0b/22/9b6d28613e8a564b9e82cf3b871b6df1f58a99cf5ac8a100439f9e895b5f/neo4j-4.3.1.tar.gz
Building wheels for collected packages: neo4j
  Building wheel for neo4j (setup.py) ... done
  Created wheel for neo4j: filename=neo4j-4.3.1-cp37-none-any.whl size=99332 sha256=d0dc1b1fd13b33850396e4490d45d32db1290a171b73c2d6d725bcdac1875012
  Stored in directory: /root/.cache/pip/wheels/23/13/72/0cc2405898bd9a7baef6512df3abf83873da9ba48c04acc818
Successfully built neo4j
Installing collected packages: neo4j
Successfully installed neo4j-4.3.1


In [5]:
ip = "54.172.14.140"
bolt_port = "7687"
username = "neo4j"
password = "rifle-sponsor-beliefs"

In [6]:
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://" + ip + ":" + bolt_port, auth=(username, password))

print(driver.address) # your-sandbox-ip:your-sandbox-bolt-port

54.172.14.140:7687


# Build the co-author graph

We are going to build an inferred graph of co-authors from the *Author* nodes that have collaborated on the same papers.

We are also going to store two properties on each *CO_AUTHOR* relationship: one for the year of the authors' first collaboration and one for the number of their collaborations.

Complete the query below to create the CO_AUTHOR relationships, each one with *year* and *collaborations* properties. Feel free to experiment in the Neo4j Browser first.

Note: we will be using the [apoc.periodic.iterate](https://neo4j.com/labs/apoc/4.1/overview/apoc.periodic/apoc.periodic.iterate/) procedure to commit in batches to the DB.
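Before writing the Cypher, it can help to see what the aggregation should produce. The following is a minimal pure-Python sketch on hypothetical toy data (the `papers` list and the `co_author_stats` helper are illustrative names, not part of the dataset or of any library): for every unordered pair of authors it keeps the earliest collaboration year and the number of shared papers, which are the two values we want to store as *year* and *collaborations*.

```python
from itertools import combinations

# Hypothetical toy data: (paper id, publication year, author names).
papers = [
    ("P1", 2005, ["Ana", "Bo"]),
    ("P2", 2003, ["Ana", "Bo", "Cy"]),
    ("P3", 2010, ["Bo", "Cy"]),
]

def co_author_stats(papers):
    """Map each unordered author pair to (first collaboration year, count)."""
    stats = {}
    for _paper_id, year, authors in papers:
        # sorted(set(...)) gives one canonical (a1, a2) ordering per pair.
        for a1, a2 in combinations(sorted(set(authors)), 2):
            first_year, count = stats.get((a1, a2), (year, 0))
            stats[(a1, a2)] = (min(first_year, year), count + 1)
    return stats

stats = co_author_stats(papers)
for pair, (year, collaborations) in sorted(stats.items()):
    print(pair, year, collaborations)
```

Unlike this sketch, the Cypher pattern matches every pair in both orders, as (a1, a2) and as (a2, a1), so the relationship has to be created in a way that does not duplicate it.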


In [None]:
query = """
CALL apoc.periodic.iterate(

  " 
  /*
   Step 1: Co-author pattern matching
  */

  MATCH ... // Name the matched Author nodes a1 and a2
  WITH a1, a2, paper 
  ORDER BY ... // Put the oldest collaboration first so we can read the year of the first collaboration
  RETURN a1, a2, ... AS year, ... AS nb_collaborations 
  ",

  "
  /* 
  Step 2: Create CO_AUTHOR relationships and set properties
  */
  
  MERGE ... // Name the CO_AUTHOR relationship coauthor
  SET coauthor.collaborations = ...
  ",

  {batchSize: 100})
"""

In [None]:
#@title Hint

query = """
CALL apoc.periodic.iterate(

  "
  MATCH (a1: ... )<-[...]-(paper)-[...]->(a2: ...)
  WITH a1, a2, paper 
  ORDER BY a1, paper.year  
  RETURN a1, a2, collect(paper)[0].year AS year, ... AS nb_collaborations
  ",

  "
  MERGE (a1)-[...]-(a2)
  SET coauthor.collaborations = ... 
  ",

  {batchSize: 100})
"""

In [8]:
#@title Solution

query = """
CALL apoc.periodic.iterate(

  "
  MATCH (a1:Author)<-[:AUTHOR]-(paper)-[:AUTHOR]->(a2:Author)
  WITH a1, a2, paper 
  ORDER BY a1, paper.year
  RETURN a1, a2, collect(paper)[0].year AS year, COUNT(*) AS nb_collaborations
  ",

  "
  MERGE (a1)-[coauthor:CO_AUTHOR {year: year}]-(a2)
  SET coauthor.collaborations = nb_collaborations
  ",

  {batchSize: 100})
"""

In [9]:
with driver.session() as session:
  result = session.run(query)
  for row in result:
    print(row)

<Record batches=3105 total=310448 timeTaken=26 committedOperations=310448 failedOperations=0 failedBatches=0 retries=0 errorMessages={} batch={'total': 3105, 'committed': 3105, 'failed': 0, 'errors': {}} operations={'total': 310448, 'committed': 310448, 'failed': 0, 'errors': {}} wasTerminated=False failedParams={} updateStatistics={'nodesDeleted': 0, 'labelsAdded': 0, 'relationshipsCreated': 155224, 'nodesCreated': 0, 'propertiesSet': 465672, 'relationshipsDeleted': 0, 'labelsRemoved': 0}>
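The counters in this record can be cross-checked with a little arithmetic. The sketch below only reuses figures printed above; the interpretation in the comments is our reading of them, not something the procedure reports: the MATCH appears to return every author pair twice (once per order), the undirected MERGE creates the relationship only the first time, and the SET runs on every operation.

```python
# Figures copied from the Record printed above.
total_operations = 310448
relationships_created = 155224
properties_set = 465672

# Assumed reading: each pair is matched as (a1, a2) and as (a2, a1),
# so only half of the operations create a new relationship.
distinct_pairs = total_operations // 2

# Assumed reading: `year` is written once per created relationship (by MERGE),
# while `collaborations` is written by SET on every operation.
expected_properties = relationships_created + total_operations

print(distinct_pairs == relationships_created)
print(expected_properties == properties_set)
```

Both checks hold for the numbers above, which is consistent with that reading.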


# Queries

- Check the previous query results by using this new "CO_AUTHOR" relationship. 

Previous query: find the Author with whom "Salvatore Greco" has co-authored the most papers.

Hint: use the *collaborations* property.

In [None]:
#@title Solution

MATCH (a:Author {name: "Salvatore Greco"})-[r:CO_AUTHOR]-(c:Author) 
RETURN a.name, c.name, r.collaborations as collabs
ORDER BY collabs DESC
LIMIT 1
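
The Cypher above is written for the Neo4j Browser. To run it from the notebook instead, one option is to wrap it in a small function that takes the `driver` created earlier. This is a sketch only: the function name `top_co_author`, the `$name` query parameter and the column aliases are our own choices.

```python
def top_co_author(driver, name):
    """Return (co-author name, collaborations) for the strongest link, or None."""
    query = """
    MATCH (a:Author {name: $name})-[r:CO_AUTHOR]-(c:Author)
    RETURN c.name AS co_author, r.collaborations AS collabs
    ORDER BY collabs DESC
    LIMIT 1
    """
    with driver.session() as session:
        # Read the single row while the session is still open.
        record = session.run(query, name=name).single()
    if record is None:
        return None
    return record["co_author"], record["collabs"]

# Usage against the sandbox (needs the live connection from the Setup section):
# print(top_co_author(driver, "Salvatore Greco"))
```

Passing the author name as a query parameter rather than concatenating it into the string avoids quoting problems and lets the same function be reused for other authors.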

- Recommend a new author to collaborate with Salvatore Greco based on the collaborations of his co-authors. 

Hint: find and rank Salvatore's co-authors' own co-authors, with whom Salvatore has not collaborated yet.

In [None]:
#@title Solution

MATCH (a:Author {name:'Salvatore Greco'})-[r1:CO_AUTHOR]-(ca:Author)-[r2:CO_AUTHOR]-(caofca:Author)
WITH a, caofca, r1.collaborations as nb_ca_collabs, r2.collaborations as nb_caofca_collabs
WHERE a <> caofca AND NOT (a)-[:CO_AUTHOR]-(caofca)
RETURN caofca.name, nb_ca_collabs, nb_caofca_collabs 
ORDER BY nb_ca_collabs DESC, nb_caofca_collabs DESC 
LIMIT 1
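
To make the ranking logic of this query easier to follow, here is a minimal pure-Python sketch of the same idea on a hypothetical toy graph. The author names, the `co_author` dictionary and the helpers `neighbours` and `recommend` are all made up for illustration. As in the Cypher, candidates are the co-authors of co-authors who are not yet linked to the starting author, ranked first by the strength of the link to the shared co-author and then by that co-author's link to the candidate.

```python
# Hypothetical toy CO_AUTHOR graph: unordered pair -> number of collaborations.
co_author = {
    frozenset({"Greco", "Ana"}): 5,
    frozenset({"Greco", "Bo"}): 2,
    frozenset({"Ana", "Bo"}): 1,
    frozenset({"Ana", "Cy"}): 3,
    frozenset({"Bo", "Dee"}): 9,
}

def neighbours(graph, author):
    """Yield (co-author, collaborations) for every edge touching `author`."""
    for pair, collaborations in graph.items():
        if author in pair:
            (other,) = pair - {author}
            yield other, collaborations

def recommend(graph, author):
    """Rank co-authors of co-authors that `author` has not worked with yet."""
    direct = {name for name, _ in neighbours(graph, author)}
    candidates = []
    for ca, nb_ca in neighbours(graph, author):
        for caofca, nb_caofca in neighbours(graph, ca):
            if caofca != author and caofca not in direct:
                candidates.append((nb_ca, nb_caofca, caofca))
    # Same ordering as the query: nb_ca DESC, then nb_caofca DESC.
    candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)
    return [(name, nb_ca, nb_caofca) for nb_ca, nb_caofca, name in candidates]

ranking = recommend(co_author, "Greco")
print(ranking[0])
```

In this toy graph "Cy" is ranked above "Dee" even though "Dee" has more collaborations with "Bo", because the first sort key is how often the starting author has worked with the shared co-author.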

Now that we have created our co-author graph, we want an approach that lets us predict the future links (relationships) that will be created between authors.