# Lord Of The Rings PyRaphtory Example Notebook 🧝🏻‍♀️🧙🏻‍♂️💍

## Setup environment and download data 💾

Import the dependencies needed to build a graph from your data in PyRaphtory, then download the CSV data from GitHub into your tmp folder (file path: /tmp/lotr.csv).

In [None]:
%pip install pyvis

In [None]:
from pathlib import Path
from pyraphtory.context import PyRaphtory
from pyraphtory.vertex import Vertex
from pyraphtory.spouts import FileSpout
from pyraphtory.builder import *
from pyvis.network import Network
import csv
import pandas as pd
import numpy as np

!curl -o /tmp/lotr.csv https://raw.githubusercontent.com/Raphtory/Data/main/lotr.csv

## Preview data  👀

Preview the head of the dataset.

In [None]:
!head /tmp/lotr.csv

In [None]:
filename = '/tmp/lotr.csv'

## Create a new Raphtory graph 📊

Create a new, empty graph with `PyRaphtory.new_graph()`. Vertices and edges are added to it in the next step.

In [None]:
graph = PyRaphtory.new_graph()

## Ingest the data into a graph 😋

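Parse the CSV file row by row and add each character pair to the graph as two vertices and one edge. Each row is expected to hold a source character, a target character and an integer timestamp.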

In [None]:
with open(filename, 'r', newline='') as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        # Each row holds: source character, target character, timestamp
        source_node = row[0]
        src_id = graph.assign_id(source_node)
        target_node = row[1]
        tar_id = graph.assign_id(target_node)
        time_stamp = int(row[2])
        # Add both characters as vertices, then the edge between them
        graph.add_vertex(time_stamp, src_id, Properties(ImmutableProperty("name", source_node)), Type("Character"))
        graph.add_vertex(time_stamp, tar_id, Properties(ImmutableProperty("name", target_node)), Type("Character"))
        graph.add_edge(time_stamp, src_id, tar_id, Type("Character_Co-occurence"))
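
The loop above assumes that every row has exactly three fields and that the third is an integer. A minimal sketch of the same parsing step with basic validation, independent of PyRaphtory, might look like the following. The `parse_edges` helper and the sample rows are hypothetical and only illustrate the assumed `source,target,timestamp` layout.

```python
import csv
import io


def parse_edges(lines):
    """Yield (source, target, timestamp) tuples from CSV lines.

    Hypothetical helper: rows that do not have three fields, or whose
    third field is not an integer, are skipped.
    """
    for row in csv.reader(lines):
        if len(row) != 3:
            continue
        source, target, raw_time = (field.strip() for field in row)
        try:
            timestamp = int(raw_time)
        except ValueError:
            continue
        yield source, target, timestamp


# Illustrative rows only, not taken from lotr.csv
sample = io.StringIO(
    "Gandalf,Elrond,33\n"
    "Frodo,Bilbo,114\n"
    "broken,row\n"
    "Sam,Frodo,not_a_number\n"
)
edges = list(parse_edges(sample))
print(edges)
```

Passing an open file object instead of the `StringIO` buffer would work the same way, since `csv.reader` accepts any iterable of lines.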

## Collect simple metrics 📈

Select the metrics to include in the output dataframe. Here we select each vertex's name, degree, out-degree and in-degree.

In [None]:
from pyraphtory.graph import Row
df = graph \
      .select(lambda vertex: Row(vertex.name(), vertex.degree(), vertex.out_degree(), vertex.in_degree())) \
      .to_df(["name", "degree", "out_degree", "in_degree"])
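
To make the three metrics concrete, here is a small plain-Python sketch that derives them from a list of directed edges. It is not how PyRaphtory computes them; in particular, it assumes that each value counts distinct neighbours, so repeated edges between the same pair count once. The `degree_table` helper and the sample edges are hypothetical.

```python
from collections import defaultdict


def degree_table(edges):
    """Return {vertex: (degree, out_degree, in_degree)} for directed edges.

    Assumption: every value counts distinct neighbours, so repeated
    edges between the same pair of vertices are counted once.
    """
    out_neighbours = defaultdict(set)
    in_neighbours = defaultdict(set)
    for src, dst in edges:
        out_neighbours[src].add(dst)
        in_neighbours[dst].add(src)
    vertices = set(out_neighbours) | set(in_neighbours)
    return {
        v: (
            len(out_neighbours[v] | in_neighbours[v]),
            len(out_neighbours[v]),
            len(in_neighbours[v]),
        )
        for v in vertices
    }


# Illustrative edges only
edges = [("Frodo", "Sam"), ("Sam", "Frodo"), ("Frodo", "Gandalf"), ("Frodo", "Sam")]
table = degree_table(edges)
print(table["Frodo"])  # (degree, out_degree, in_degree)
```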

In [None]:
df

**Clean the dataframe by dropping the unused `window` column.** 🧹

In [None]:
# Drop the unused 'window' column
df.drop(columns=['window'], inplace=True)

### Preview the dataframe  👀

In [None]:
df

**Sort by highest degree, top 10**

In [None]:
df.sort_values(['degree'], ascending=False)[:10]

**Sort by highest in-degree, top 10**

In [None]:
df.sort_values(['in_degree'], ascending=False)[:10]

**Sort by highest out-degree, top 10**

In [None]:
df.sort_values(['out_degree'], ascending=False)[:10]

## Run a PageRank algorithm 📑

Run an algorithm on your graph; here we run PageRank. Algorithms are available under `PyRaphtory.algorithms`. Pass the names of the result columns you want in the output dataframe to `NodeList`, here `prlabel`.

In [None]:
cols = ["prlabel"]

df_pagerank = graph.at(32674) \
                .past() \
                .transform(PyRaphtory.algorithms.generic.centrality.PageRank())\
                .execute(PyRaphtory.algorithms.generic.NodeList(*cols)) \
                .to_df(["name"] + cols)
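
For intuition about what the `prlabel` column represents, the following is a minimal power-iteration PageRank in plain Python. It is a sketch of the general technique, not Raphtory's implementation: the damping factor of 0.85, the fixed iteration count, the normalisation so that scores sum to 1, and the sample edges are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: set() for v in vertices}
    for src, dst in edges:
        out_links[src].add(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is shared evenly
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        # Every vertex passes its rank on in equal parts along its out-edges
        for src in vertices:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Illustrative edges only
edges = [("Frodo", "Gandalf"), ("Sam", "Gandalf"), ("Merry", "Gandalf"), ("Gandalf", "Frodo")]
scores = pagerank(edges)
top = max(scores, key=scores.get)
print(top)
```

In this toy graph the vertex with the most incoming edges ends up with the highest score, which is the behaviour to look for when reading the sorted dataframe.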

**Clean your dataframe** 🧹

In [None]:
# Drop the unused 'window' column
df_pagerank.drop(columns=['window'], inplace=True)

In [None]:
df_pagerank

**Top ten vertices by PageRank score**

In [None]:
df_pagerank.sort_values(['prlabel'], ascending=False)[:10]

## Run a connected components algorithm 

Run the connected components algorithm on the graph. Each vertex is labelled with the component it belongs to in the `cclabel` column.

In [None]:
cols = ["cclabel"]
df_cc = graph.at(32674) \
                .past() \
                .transform(PyRaphtory.algorithms.generic.ConnectedComponents)\
                .execute(PyRaphtory.algorithms.generic.NodeList(*cols)) \
                .to_df(["name"] + cols)

**Clean dataframe.**

In [None]:
# Drop the unused 'window' column
df_cc.drop(columns=['window'], inplace=True)

**Preview dataframe.**

In [None]:
df_cc

### Number of distinct components 

Extract the number of distinct components, which is 3 in this dataframe.

In [None]:
df_cc['cclabel'].nunique()

### Size of components 

Calculate the size of the 3 connected components.

In [None]:
df_cc.groupby(['cclabel']).count().reset_index().drop(columns=['timestamp'])
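
The component count and component sizes can also be illustrated without the library. The sketch below labels weakly connected components with a small union-find structure and then counts them, mirroring the two cells above. It is one common technique rather than Raphtory's algorithm, and the `connected_components` helper and sample edges are hypothetical.

```python
from collections import Counter


def connected_components(edges):
    """Label each vertex with a representative of its weakly connected component."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps the trees shallow
            v = parent[v]
        return v

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src
    return {v: find(v) for v in list(parent)}


# Illustrative edges only; the last one is a self-loop on an isolated vertex
edges = [("Frodo", "Sam"), ("Sam", "Merry"), ("Gandalf", "Saruman"), ("Gollum", "Gollum")]
labels = connected_components(edges)
num_components = len(set(labels.values()))
sizes = sorted(Counter(labels.values()).values(), reverse=True)
print(num_components, sizes)
```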

## Run chained algorithms at once

In this example, we chain PageRank, Connected Components and Degree algorithms, running them one after another on the graph. Specify all the columns in the output dataframe, including an output column for each algorithm in the chain.

In [None]:
cols = ["inDegree", "outDegree", "degree","prlabel","cclabel"]

df_chained = graph.at(32674) \
                .past() \
                .transform(PyRaphtory.algorithms.generic.centrality.PageRank())\
                .transform(PyRaphtory.algorithms.generic.ConnectedComponents)\
                .transform(PyRaphtory.algorithms.generic.centrality.Degree())\
                .execute(PyRaphtory.algorithms.generic.NodeList(*cols)) \
                .to_df(["name"] + cols)

In [None]:
df_chained.drop(columns=['window'], inplace=True)

In [None]:
df_chained['degree_numeric'] = df_chained['degree'].astype(float)

In [None]:
df_chained

## Create visualisation by adding nodes 🔎

In [None]:
def visualise(graph, df_chained):
    # Create network object
    net = Network(notebook=True, height='750px', width='100%', bgcolor='#222222', font_color='white')
    # Set visualisation tool
    net.force_atlas_2based()
    # Get the node list 
    df_node_list = graph.at(32674) \
                .past() \
                .execute(PyRaphtory.algorithms.generic.NodeList()) \
                .to_df(['name'])
    
    nodes = df_node_list['name'].tolist()
    
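    node_data = []
    ignore_items = ['timestamp', 'name', 'window']
    for node_name in nodes:
        for i, row in df_chained.iterrows():
            if row['name'] == node_name:
                # Build the hover text from every column that is not ignored
                data = ''
                for k, v in row.items():  # Series.iteritems() was removed in pandas 2.0
                    if k not in ignore_items:
                        data = data + str(k) + ': ' + str(v) + '\n'
                node_data.append(data)
                break  # stop searching once the matching row is found
    # Add the nodes, sized by their PageRank score
    net.add_nodes(nodes, title=node_data, value=df_chained['prlabel'].tolist())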
    # Get the edge list
    df_edge_list = graph.at(32674) \
            .past() \
            .execute(PyRaphtory.algorithms.generic.EdgeList()) \
            .to_df(['from', 'to'])
    edges = []
    for i, row in df_edge_list[['from', 'to']].iterrows():
        edges.append([row['from'], row['to']])
    # Add the edges
    net.add_edges(edges)
    # Toggle physics
    net.toggle_physics(True)
    return net
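
The nested loops in `visualise` scan the whole dataframe once per node. One possible alternative, sketched below in plain Python, is to index the rows by name first and then build each hover text with a single lookup. The `build_titles` helper and the sample rows are hypothetical. Unlike the loop above, it keeps the last row when a name appears more than once and returns an empty string for a node with no row, so the result always lines up with the node list.

```python
def build_titles(node_names, rows, ignore=("timestamp", "name", "window")):
    """Return one hover-text string per node name, in the same order.

    `rows` is a list of dicts that each have a "name" key.
    """
    by_name = {row["name"]: row for row in rows}
    titles = []
    for name in node_names:
        row = by_name.get(name, {})
        titles.append("".join(f"{k}: {v}\n" for k, v in row.items() if k not in ignore))
    return titles


# Illustrative rows only
rows = [
    {"timestamp": 32674, "name": "Frodo", "degree": 3, "cclabel": 1},
    {"timestamp": 32674, "name": "Sam", "degree": 1, "cclabel": 1},
]
titles = build_titles(["Sam", "Frodo", "Gollum"], rows)
print(titles)
```

With a pandas dataframe, `df_chained.to_dict("records")` produces a list of dicts in this shape.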

In [None]:
net = visualise(graph, df_chained)

## Show the HTML file of the visualisation

In [None]:
# Write the visualisation to preview.html and display it in the notebook
net.show('preview.html')

## Shut down PyRaphtory  🛑

In [None]:
PyRaphtory.close_graphs()