# Getting data into a graph

Now that we know PyRaphtory is installed and running, let's look at the different ways to get some real data into a graph. 

For this first set of tutorials we are going to be building graphs from a Lord of the Rings 🧝🏻‍♀️🧙🏻‍♂️💍 dataset, looking at when characters interact throughout the trilogy.
 
<p align="center">
 <img src="../images/lotr-graphic.png" width="700px" style="padding: 15px" alt="Intro Graphic of LOTR slices"/>
</p>

As with the quick start install guide, this and all following Python pages are built as IPython notebooks. If you want to follow along on your own machine, click the `open on github` link in the top right of this page.

## Let's have a look at the example data

The data is a `csv` file (comma-separated values) pulled from our <a href="https://github.com/Raphtory/Data/blob/main/lotr.csv" target="_blank">Github data repository</a>. Each line contains two characters that appeared in the same sentence, along with the sentence number, which we will use as a `timestamp`. The first line of the file is `Gandalf,Elrond,33`, which tells us that Gandalf and Elrond appear together in sentence 33.
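Each row therefore maps to a `(source, target, timestamp)` triple. Below is a minimal sketch of reading rows in that shape with Python's standard `csv` module. It uses the first three lines of the file as inline sample data. The `parse_rows` helper is hypothetical and not part of PyRaphtory.

```python
import csv
import io


def parse_rows(text):
    """Yield (source, target, timestamp) triples from edge-list CSV text."""
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        source, target, timestamp = row[0], row[1], int(row[2])
        yield source, target, timestamp


sample = "Gandalf,Elrond,33\nFrodo,Bilbo,114\nBlanco,Marcho,146\n"
rows = list(parse_rows(sample))
print(rows[0])  # ('Gandalf', 'Elrond', 33)
```

The timestamp is converted to `int` here because the sentence number is used as a point in time, not as a label.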

### Downloading the csv from Github 💾

The following `curl` command will download the csv file and save it in the `/tmp` directory on your computer, and `head` then prints its first ten lines. Files in `/tmp` are usually cleared when you restart your computer, but the file is only about 50 KB in any case.





In [1]:
!curl -o /tmp/lotr.csv https://raw.githubusercontent.com/Raphtory/Data/main/lotr.csv
!head /tmp/lotr.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 52206  100 52206    0     0   161k      0 --:--:-- --:--:-- --:--:--  161k
Gandalf,Elrond,33
Frodo,Bilbo,114
Blanco,Marcho,146
Frodo,Bilbo,205
Thorin,Gandalf,270
Thorin,Bilbo,270
Gandalf,Bilbo,270
Gollum,Bilbo,286
Gollum,Bilbo,306
Gollum,Bilbo,308


## Setting up our imports and Raphtory Context

In [2]:
import os
from pathlib import Path
import csv
import pandas as pd
import numpy as np

os.environ["RAPHTORY_CORE_LOG"] = "ERROR" 

from pyraphtory.context import PyRaphtory
from pyraphtory.input import ImmutableProperty
from pyraphtory.input import Type
from pyraphtory.input import Properties
from pyraphtory.input import GraphBuilder
from pyraphtory.spouts import FileSpout
from pyraphtory.sources import CSVEdgeListSource
from pyraphtory.sources import Source
from pyraphtory.graph import Row

filename = "/tmp/lotr.csv"
ctx = PyRaphtory.local()

def graph_degree(graph):
    # Return each vertex's name, out-degree and in-degree as a DataFrame
    return (
        graph
        .select(lambda vertex: Row(vertex.name(), vertex.out_degree(), vertex.in_degree()))
        .to_df(["name", "out_degree", "in_degree"])
    )


openjdk version "11.0.15" 2022-04-19 LTS
OpenJDK Runtime Environment Corretto-11.0.15.9.1 (build 11.0.15+9-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.15.9.1 (build 11.0.15+9-LTS, mixed mode)


Java found!
Getting JAVA_HOME
JAVA_HOME found = /Users/bensteer/.sdkman/candidates/java/current/bin/java




## Adding data directly into the Graph

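The simplest way to add data into a graph is to call the `add_vertex` and `add_edge` functions directly, as we saw in the quick start guide. These functions also take several optional arguments that let us add `properties` and `types` to both vertices and edges. In the cell below, each row of the csv becomes two vertex additions and one edge addition between them, all at the row's timestamp. Each vertex carries an immutable `name` property and the type `Character`. Because vertex IDs are numbers, we use `graph.assign_id` to turn each character's name into an ID.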



In [3]:
graph = ctx.new_graph()
with open(filename, 'r') as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        source_node = row[0]
        src_id = graph.assign_id(source_node)
        target_node = row[1]
        tar_id = graph.assign_id(target_node)
        time_stamp = int(row[2])
        graph.add_vertex(time_stamp, src_id, Properties(ImmutableProperty("name", source_node)), Type("Character"))
        graph.add_vertex(time_stamp, tar_id, Properties(ImmutableProperty("name", target_node)), Type("Character"))
        graph.add_edge(time_stamp, src_id, tar_id, Type("Character_Co-occurence"))

### Let's check the data has been ingested

In [4]:
df = graph_degree(graph)
df

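|     | timestamp | name      | out_degree | in_degree |
|-----|-----------|-----------|------------|-----------|
| 0   | 32674     | Hirgon    | 2          | 0         |
| 1   | 32674     | Hador     | 1          | 2         |
| 2   | 32674     | Horn      | 1          | 3         |
| 3   | 32674     | Galadriel | 6          | 16        |
| 4   | 32674     | Isildur   | 18         | 0         |
| ... | ...       | ...       | ...        | ...       |
| 134 | 32674     | Faramir   | 3          | 29        |
| 135 | 32674     | Bain      | 1          | 1         |
| 136 | 32674     | Walda     | 3          | 10        |
| 137 | 32674     | Thranduil | 0          | 2         |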
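As a sanity check that does not depend on Raphtory, a similar degree table can be computed directly from the raw triples. This sketch rests on one assumption: that `out_degree` and `in_degree` count distinct neighbours. Under that assumption, repeated co-occurrences of the same ordered pair count only once, such as the three `Gollum,Bilbo` rows. The `degrees` helper is hypothetical. The sample is the ten lines printed by `head` above.

```python
from collections import defaultdict


def degrees(triples):
    """Count distinct out- and in-neighbours per vertex name.

    Assumes degree means distinct neighbours, so repeated interactions
    between the same ordered pair contribute only once.
    """
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    for source, target, _timestamp in triples:
        out_nbrs[source].add(target)
        in_nbrs[target].add(source)
    names = sorted(set(out_nbrs) | set(in_nbrs))
    return {n: (len(out_nbrs[n]), len(in_nbrs[n])) for n in names}


sample = [
    ("Gandalf", "Elrond", 33), ("Frodo", "Bilbo", 114), ("Blanco", "Marcho", 146),
    ("Frodo", "Bilbo", 205), ("Thorin", "Gandalf", 270), ("Thorin", "Bilbo", 270),
    ("Gandalf", "Bilbo", 270), ("Gollum", "Bilbo", 286), ("Gollum", "Bilbo", 306),
    ("Gollum", "Bilbo", 308),
]
result = degrees(sample)
print(result["Bilbo"])  # (0, 4)
```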


## Ingesting data with in-built Sources

As this data is in an `edge list` format, another way to convert it into a graph is to use our `CSVEdgeListSource`. `Sources` let Raphtory know where to pull the data from and how to convert them into `graph updates`. This particular source will parse each line as two `vertex additions` and an `edge addition` at the given timestamp. 

Notably, if the vertex IDs in the file are `Strings` rather than `Integers`, this `Source` turns each string into a number and stores the original value as a property called `name` on the vertex.
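To make that translation concrete, here is a plain-Python sketch of what a single line could become. It illustrates the idea only and is not PyRaphtory's implementation:

- `string_to_id` and `line_to_updates` are hypothetical names.
- The BLAKE2 hash is an arbitrary stand-in for whatever Raphtory uses internally.
- The tuples only mimic the shape of vertex and edge additions.

```python
import hashlib


def string_to_id(name):
    """Hypothetical string-to-ID function: a deterministic 64-bit integer
    derived from a hash of the name (Raphtory may hash differently)."""
    digest = hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def line_to_updates(line):
    """Turn one 'source,target,timestamp' line into two vertex additions,
    each carrying the original string as a 'name' property, and one edge addition."""
    source, target, ts = [part.strip() for part in line.split(",")]
    timestamp = int(ts)
    src_id, dst_id = string_to_id(source), string_to_id(target)
    return [
        ("vertex", timestamp, src_id, {"name": source}),
        ("vertex", timestamp, dst_id, {"name": target}),
        ("edge", timestamp, src_id, dst_id),
    ]


updates = line_to_updates("Gandalf,Elrond,33")
for update in updates:
    print(update)
```

A deterministic mapping has a useful side effect: the same name always yields the same ID. Graphs built separately from the same file therefore agree on vertex IDs without sharing any lookup table.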

In [5]:
graph2 = ctx.new_graph()
spout = FileSpout("/tmp/lotr.csv")
source = CSVEdgeListSource(spout)

graph2.load(source)

com.raphtory.api.analysis.graphview.PyDeployedTemporalGraph@40b57c2d

### Let's compare the graphs

In [6]:
df2 = graph_degree(graph2)
different_results = df.compare(df2)
len(different_results)

0
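`DataFrame.compare` aligns the two frames by index and column labels and raises a `ValueError` if they are not identically labelled. A zero-length result therefore also relies on both queries returning vertices in the same row order. A minimal order-independent alternative is to compare the rows as multisets. It is sketched here with plain lists of `(name, out_degree, in_degree)` tuples, and the `same_rows` helper is hypothetical.

```python
from collections import Counter


def same_rows(rows_a, rows_b):
    """Return True if both tables hold the same rows, ignoring row order."""
    return Counter(rows_a) == Counter(rows_b)


a = [("Hirgon", 2, 0), ("Hador", 1, 2), ("Horn", 1, 3)]
b = [("Horn", 1, 3), ("Hirgon", 2, 0), ("Hador", 1, 2)]
print(same_rows(a, b))  # True
```

With the DataFrames above, the rows could be passed in as `df.itertuples(index=False)` and `df2.itertuples(index=False)`.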

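## Creating our own custom Source

If the built-in sources don't fit your data, you can write your own. A `Source` combines two parts:

- a `Spout`, which says where the data comes from (here a `FileSpout` reading our csv);
- a `GraphBuilder`, which wraps a parse function that turns each line into graph updates.

The `parse` function below performs the same vertex and edge additions as our manual loop above.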

In [7]:
def parse(graph, line: str):
    # Split the csv line and strip surrounding whitespace from each field
    parts = [v.strip() for v in line.split(",")]
    source_node = parts[0]
    src_id = graph.assign_id(source_node)
    target_node = parts[1]
    tar_id = graph.assign_id(target_node)
    time_stamp = int(parts[2])

    graph.add_vertex(time_stamp, src_id, Properties(ImmutableProperty("name", source_node)), Type("Character"))
    graph.add_vertex(time_stamp, tar_id, Properties(ImmutableProperty("name", target_node)), Type("Character"))
    graph.add_edge(time_stamp, src_id, tar_id, Type("Character_Co-occurence"))

graph3 = ctx.new_graph()
spout = FileSpout("/tmp/lotr.csv")
source = Source(spout,GraphBuilder(parse))

graph3.load(source)

graph_degree(graph3)

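|     | timestamp | name      | out_degree | in_degree |
|-----|-----------|-----------|------------|-----------|
| 0   | 29084     | Hirgon    | 2          | 0         |
| 1   | 29084     | Hador     | 1          | 2         |
| 2   | 29084     | Horn      | 1          | 3         |
| 3   | 29084     | Galadriel | 4          | 7         |
| 4   | 29084     | Isildur   | 18         | 0         |
| ... | ...       | ...       | ...        | ...       |
| 119 | 29084     | Odo       | 1          | 0         |
| 120 | 29084     | Faramir   | 2          | 23        |
| 121 | 29084     | Bain      | 1          | 1         |
| 122 | 29084     | Thranduil | 0          | 2         |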
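`parse` only calls `assign_id`, `add_vertex` and `add_edge` on the object it receives. Its logic can therefore be checked without starting Raphtory by handing it a stand-in that records calls. Everything in this sketch is hypothetical:

- `RecordingGraph` is not a PyRaphtory class, and its `assign_id` simply numbers names in first-seen order.
- `parse_plain` mirrors `parse` but uses plain dicts and strings instead of `Properties`, `ImmutableProperty` and `Type`, so it runs without PyRaphtory installed.

```python
class RecordingGraph:
    """Hypothetical stand-in exposing only the methods `parse` uses.
    It records calls instead of building a graph."""

    def __init__(self):
        self._ids = {}
        self.calls = []

    def assign_id(self, name):
        # Deterministic within one test: names get 0, 1, 2, ... in first-seen order
        return self._ids.setdefault(name, len(self._ids))

    def add_vertex(self, timestamp, vertex_id, *args):
        self.calls.append(("vertex", timestamp, vertex_id, args))

    def add_edge(self, timestamp, src_id, dst_id, *args):
        self.calls.append(("edge", timestamp, src_id, dst_id, args))


def parse_plain(graph, line):
    """Same steps as `parse`, with plain values in place of
    PyRaphtory's Properties/ImmutableProperty/Type wrappers."""
    parts = [v.strip() for v in line.split(",")]
    source_node, target_node, time_stamp = parts[0], parts[1], int(parts[2])
    src_id = graph.assign_id(source_node)
    tar_id = graph.assign_id(target_node)
    graph.add_vertex(time_stamp, src_id, {"name": source_node}, "Character")
    graph.add_vertex(time_stamp, tar_id, {"name": target_node}, "Character")
    graph.add_edge(time_stamp, src_id, tar_id, "Character_Co-occurence")


g = RecordingGraph()
parse_plain(g, "Gandalf,Elrond,33")
parse_plain(g, "Frodo,Bilbo,114")
print(len(g.calls))  # 6
```

Each input line should produce exactly three recorded calls: two vertex additions followed by one edge addition.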


## Closing down the context

In [8]:
ctx.close()