### Details

This document contains the details of Task 2 for ICS2205. The task will be marked out of 100%; however, it is worth 10% of the total mark for this unit. <br> 
While discussions between individual students are considered healthy, the final deliverable must be your own work and **not plagiarised** in any way. The **deadline** to submit this task is **12:00pm Monday 28th November 2022**.<br>
You need to compile your answer to the task described below in this same notebook. Then upload it, together with a duly filled-in plagiarism form, to the appropriate space on the VLE. Late deliverables will be penalised or may not be accepted.

### Interfacing NetworkX with Neo4j

Neo4j is an important graph platform and is more than persistent storage for graph data: it also provides graph algorithms that are scalable and production-ready. In this task you will combine Neo4j with NetworkX using the **nxneo4j** Python library.


Install the latest version of nxneo4j as follows:

In [1]:
%pip install git+https://github.com/ybaktir/networkx-neo4j

Collecting git+https://github.com/ybaktir/networkx-neo4j
  Cloning https://github.com/ybaktir/networkx-neo4j to c:\users\mvass\appdata\local\temp\pip-req-build-0hcu16v1
  Resolved https://github.com/ybaktir/networkx-neo4j to commit 97dc9563bf992ea9714cbdb99cb9e6a41c7cce65
Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install networkx-neo4j

Note: you may need to restart the kernel to use updated packages.


#### Connect to Neo4j

In [3]:
from neo4j import GraphDatabase, basic_auth

For this task you can use a [Neo4j blank sandbox](https://neo4j.com/sandbox/). Once the instance has started, check the connection details tab to find the **Bolt URL** and the **password**. By default the username is **neo4j**. Update the code below with these details to connect to the Neo4j sandbox. Alternatively, you can use the Neo4j Desktop version.

In [4]:
graph = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j","GoTWi1357!?"))
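Hard-coding the password in the cell above means it ends up in the submitted notebook. A minimal sketch, assuming you export the connection details as environment variables named `NEO4J_URI`, `NEO4J_USER` and `NEO4J_PASSWORD` (these names are our own choice, not something Neo4j reads automatically), builds the arguments for `GraphDatabase.driver`:

```python
import os

def neo4j_settings(env=os.environ):
    """Return (uri, (user, password)) for GraphDatabase.driver(uri, auth=...).

    The variable names are assumptions chosen for this notebook. The URI and
    user fall back to the local Bolt URL and the default 'neo4j' user.
    """
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    user = env.get("NEO4J_USER", "neo4j")
    password = env.get("NEO4J_PASSWORD")
    if not password:
        # Fail early instead of sending an empty password to the server
        raise RuntimeError("Set NEO4J_PASSWORD before connecting to Neo4j")
    return uri, (user, password)

# Usage (needs the neo4j driver and a running instance):
#   uri, auth = neo4j_settings()
#   graph = GraphDatabase.driver(uri, auth=auth)
```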

Access the Neo4j sandbox and inspect the database by opening it in the browser.

In [5]:
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
%matplotlib notebook
print('NetworkX version: {}'.format(nx.__version__))

NetworkX version: 2.7.1


In [6]:
# Define CSV URLs
csv_urls = [
    "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book1-edges.csv",
    "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book2-edges.csv",
    "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book3-edges.csv",
    "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book45-edges.csv"
]

# Create an undirected graph: an interaction between two characters has no direction
G = nx.Graph()

# Load data into the NetworkX graph. `book` runs from 1 to 4, where 4 is the combined
# books 4 and 5 file. add_edge() creates any missing nodes. Because G is a simple
# graph, a pair of characters appearing in several files keeps a single edge, and
# its 'weight' and 'book' attributes are overwritten by the last file processed.
for book, url in enumerate(csv_urls, start=1):
    df = pd.read_csv(url)
    for _, row in df.iterrows():
        G.add_edge(row['Source'], row['Target'], weight=int(row['weight']), book=book)

#### Analyse the Game of Thrones dataset

nxneo4j contains a number of built-in datasets. One of these datasets is built around the popular TV series Game of Thrones. It is based on the dataset created by [Andrew Beveridge](https://networkofthrones.wordpress.com/) and contains the interactions between the characters of the *A Song of Ice and Fire* books on which the series is based. The nodes are labelled "Character", while the relationship types "INTERACTS1", "INTERACTS2", "INTERACTS3" and "INTERACTS45" represent the interactions between the characters in books 1, 2 and 3, and in books 4 and 5 combined, respectively.
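Because the dataset keeps one relationship type per book, a pair of characters who interact in several books has several relationships in Neo4j. A simple undirected `nx.Graph`, as used in the loading cell above, holds at most one edge per pair, and calling `add_edge` again on an existing pair overwrites its `weight` and `book` attributes. A minimal sketch of one alternative, summing the weights across books, on hypothetical toy rows (the names `A`, `B`, `C` and the weights are made up for illustration):

```python
from collections import defaultdict

# Hypothetical rows: (book file suffix, Source, Target, weight).
# The suffixes follow the relationship types INTERACTS1, INTERACTS2,
# INTERACTS3 and INTERACTS45.
rows = [
    ("1", "A", "B", 3),
    ("2", "B", "A", 2),   # same pair as above, reversed, in another book
    ("2", "B", "C", 5),
    ("45", "A", "C", 1),
]

def merge_interactions(rows):
    """Collapse per-book rows into one undirected edge per character pair."""
    merged = defaultdict(lambda: {"weight": 0, "types": set()})
    for suffix, source, target, weight in rows:
        key = tuple(sorted((source, target)))  # order-independent pair key
        merged[key]["weight"] += weight        # sum the weights across books
        merged[key]["types"].add("INTERACTS" + suffix)
    return dict(merged)

edges = merge_interactions(rows)
for (u, v), attrs in sorted(edges.items()):
    print(u, v, attrs["weight"], sorted(attrs["types"]))
```

Each merged pair could then be added with `G.add_edge(u, v, weight=attrs["weight"])`. Whether summing is the right way to combine books depends on the analysis, so treat it as one option rather than the intended approach.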

Draw the graph using nxneo4j **(5 marks)**

In [7]:
# Draw the graph with matplotlib, labelling each node with the character name
nx.draw(G, with_labels=True)


Find how many nodes the graph contains **(5 marks)**

In [8]:
# Get the number of nodes using the number_of_nodes() method of graph G
G.number_of_nodes()

796

Compute PageRank, sort the results and print out the first 5 results **(20 marks)**

In [9]:
# Using nx.pagerank() to get the PageRank for graph G
pr = nx.pagerank(G)

# Sort the PageRank scores
sorted_pagerank = sorted(pr.items(), key=lambda x: x[1], reverse=True)

# Print the top 5 results
print("Top 5 PageRank results:")
for node, score in sorted_pagerank[:5]:
    print(f"Node {node}: {score}")

Top 5 PageRank results:
Node Jon-Snow: 0.028341863510436185
Node Tyrion-Lannister: 0.025287510501955164
Node Cersei-Lannister: 0.02007497006626844
Node Daenerys-Targaryen: 0.018307913097464816
Node Jaime-Lannister: 0.01789478227089134
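To show what `nx.pagerank` computes, here is a minimal power-iteration sketch with damping factor 0.85 (NetworkX's default `alpha`) on a hypothetical four-node toy graph. Unlike the call above, it ignores edge weights; `nx.pagerank` uses the `weight` edge attribute by default, so the weights loaded from the CSV files influence the ranking above. It also uses `heapq.nlargest`, which picks the top k entries without sorting the whole dictionary.

```python
import heapq

def pagerank(adj, alpha=0.85, tol=1e-6, max_iter=100):
    """Unweighted PageRank by power iteration on an undirected graph.

    adj maps each node to the set of its neighbours; every node is assumed
    to have at least one neighbour, so there are no dangling nodes.
    """
    n = len(adj)
    rank = dict.fromkeys(adj, 1.0 / n)
    for _ in range(max_iter):
        new_rank = {}
        for v in adj:
            # Each neighbour u passes on an equal share of its rank per edge
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new_rank[v] = (1 - alpha) / n + alpha * incoming
        # Stop once the total absolute change drops below n * tol
        change = sum(abs(new_rank[v] - rank[v]) for v in adj)
        rank = new_rank
        if change < n * tol:
            break
    return rank

# Hypothetical toy graph: A is linked to B, C and D; B is also linked to C
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
scores = pagerank(adj)

# Top 2 nodes by score
top = heapq.nlargest(2, scores.items(), key=lambda item: item[1])
for node, score in top:
    print(f"Node {node}: {score:.4f}")
```

As in the real results, the best-connected node ranks first, and the scores sum to 1.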


Compute Betweenness Centrality. Sort the results and print out the first 5 results. **(20 marks)**

In [10]:
# Using nx.betweenness_centrality() to get the Betweenness Centrality for graph G
bc = nx.betweenness_centrality(G)

# Sort the Betweenness Centrality scores
sorted_betweenness = sorted(bc.items(), key=lambda x: x[1], reverse=True)

# Print the top 5 results
print("Top 5 Betweenness Centrality results:")
for node, score in sorted_betweenness[:5]:
    print(f"Node {node}: {score}")

Top 5 Betweenness Centrality results:
Node Jon-Snow: 0.19211961968354493
Node Tyrion-Lannister: 0.16219109611159838
Node Daenerys-Targaryen: 0.11841801916269228
Node Theon-Greyjoy: 0.11128331813470263
Node Stannis-Baratheon: 0.11013955266679575
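Called with its defaults as above, `nx.betweenness_centrality(G)` ignores edge weights and normalises the scores. A minimal sketch of Brandes' algorithm for an unweighted, undirected graph, using the normalisation 1/((n-1)(n-2)) that NetworkX applies to undirected graphs, tested on two hypothetical toy graphs:

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours. Scores are scaled by
    1 / ((n - 1) * (n - 2)), NetworkX's default for undirected graphs.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        order = []                           # nodes in order of distance from s
        preds = {v: [] for v in adj}         # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    if n > 2:
        scale = 1 / ((n - 1) * (n - 2))
        bc = {v: score * scale for v, score in bc.items()}
    return bc

# Hypothetical toy graphs: a path A-B-C and a star with centre H
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
star = {"H": {"L1", "L2", "L3", "L4"}, "L1": {"H"}, "L2": {"H"}, "L3": {"H"}, "L4": {"H"}}
print(betweenness_centrality(path))
print(betweenness_centrality(star))
```

In both toy graphs the middle node lies on every shortest path between the other nodes, so it scores 1.0 and the endpoints score 0.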


Now switch to the Neo4j sandbox (or your desktop version) and access the database through the browser. Query the database directly using Cypher to find the following:

1. Count the number of edges. **(10 marks)**
2. Display the graph based on the relationships of the character with the highest PageRank (from above). **(20 marks)**
3. Degree centrality is the number of connections that a node has in the network. In this context, the degree centrality of a character is the number of other characters that interacted with it. Compute the degree centrality by considering **only** the **INTERACTS2** relationship. **(20 marks)**

**Add the Cypher queries below:**

Cypher query (1)<br>
MATCH ()-[r]->() RETURN count(r)
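Because Neo4j stores a separate relationship per book, a Cypher query that counts relationships and `G.number_of_edges()` on the undirected NetworkX graph built earlier need not agree. A small sketch on hypothetical toy relationships (placeholder names, not the real data) shows the two counts side by side:

```python
# Hypothetical relationships as stored in Neo4j: (source, type, target)
relationships = [
    ("A", "INTERACTS1", "B"),
    ("A", "INTERACTS2", "B"),
    ("B", "INTERACTS2", "C"),
    ("C", "INTERACTS3", "A"),
]

# A relationship count sees every stored relationship once
relationship_count = len(relationships)

# An undirected nx.Graph keeps one edge per unordered character pair
pair_count = len({frozenset((s, t)) for s, _, t in relationships})

print(relationship_count, pair_count)  # 4 3
```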

Cypher query (2)

// Jon-Snow is the character with the highest PageRank in the results above

MATCH path = (c:Character {name: 'Jon-Snow'})-[r]-(related)
RETURN path
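The undirected pattern `(c)-[r]-(related)` returns every relationship that touches the character, whichever way it is stored. A minimal Python sketch of the same selection on hypothetical toy relationships (placeholder names):

```python
# Hypothetical relationships: (source, type, target)
relationships = [
    ("A", "INTERACTS1", "B"),
    ("A", "INTERACTS2", "B"),
    ("B", "INTERACTS2", "C"),
    ("C", "INTERACTS3", "A"),
]

def incident_relationships(relationships, character):
    """Relationships touching `character` in either direction,
    like the undirected Cypher pattern (c)-[r]-(related)."""
    return [rel for rel in relationships if character in (rel[0], rel[2])]

related = incident_relationships(relationships, "A")
# The other endpoint of each relationship is a related character
neighbours = {target if source == "A" else source for source, _, target in related}
print(len(related), sorted(neighbours))  # 3 ['B', 'C']
```

On the NetworkX side, `nx.ego_graph(G, "Jon-Snow", radius=1)` gives a comparable subgraph, although it also keeps the edges between the neighbours themselves.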

Cypher query (3)<br>
MATCH (c:Character)-[:INTERACTS2]-(other)
WITH c, COUNT(DISTINCT other) AS degreeCentrality
RETURN c.name AS characterName, degreeCentrality
ORDER BY degreeCentrality DESC;
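The degree query counts distinct neighbours over a single relationship type and ignores direction. A minimal Python sketch of the same computation on hypothetical toy relationships (placeholder names); like a `MATCH`-based query, it leaves out characters with no relationship of the chosen type:

```python
from collections import defaultdict

# Hypothetical relationships: (source, type, target)
relationships = [
    ("A", "INTERACTS2", "B"),
    ("B", "INTERACTS2", "A"),   # same pair again: counted once per character
    ("B", "INTERACTS2", "C"),
    ("D", "INTERACTS2", "B"),
    ("A", "INTERACTS1", "D"),   # different book: ignored for INTERACTS2
]

def degree_by_type(relationships, rel_type):
    """Number of distinct neighbours per character over one relationship
    type, ignoring direction, sorted from highest to lowest."""
    neighbours = defaultdict(set)
    for source, r_type, target in relationships:
        if r_type == rel_type:
            neighbours[source].add(target)
            neighbours[target].add(source)
    degrees = [(name, len(others)) for name, others in neighbours.items()]
    return sorted(degrees, key=lambda item: item[1], reverse=True)

for name, degree in degree_by_type(relationships, "INTERACTS2"):
    print(name, degree)
```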

#### References

1. Further information on how to use Neo4j from Python: https://neo4j.com/developer/python/