<img src="https://user-images.githubusercontent.com/6665739/130641943-fa7fcdb8-a0e7-4aa4-863f-3df61b5de775.png" width="400"/>

# Getting Started

Welcome to this PyRaphtory binder demo.

This notebook will explain and demo how to:
- Run PyRaphtory
- Create a graph
- Add data to your graph
- Run an algorithm 

If you want an extended example, please see the Lord Of The Rings notebook. 

# Running PyRaphtory

Once PyRaphtory is installed, let's set up the most bare-bones graph, check that we can add some data to it, and run our first query. Once this is all working, we can move on to some much more exciting examples in the next section!

To do this, we first need to import `PyRaphtory`, the object we will use to create our context and graph.

You will see some references to `Java` in the logs here. This is because, under the hood, Raphtory is written in `Scala`. You don't have to worry about any of that though, as it's all hidden away!

In [7]:
from pyraphtory.context import PyRaphtory

## Creating your first graph

Once Raphtory is installed we can create our first graph! To do this we first need a `context` which we can get from the PyRaphtory object. 

Our two options here are `local` and `remote`. As we are just testing on our laptops, we can use `local`, meaning the Raphtory code will run within your Python process. We will dig into `remote` contexts later, when you want to deploy in a separate process or scale your graph past what your laptop can handle.

Once we have our context we can call `new_graph()` to create a graph which we can add data into and run queries on. 

If you check the logs you will see you (the client) and your graph get an auto-generated name. This isn't important now, but will be when you leave your graphs running or want other people to access them.

In [8]:
context = PyRaphtory.local()
graph = context.new_graph()

## Adding data to your Graph

Once a graph is created, we need to add some data to it if we want to run anything interesting. There are loads of ways of doing this in Raphtory, which we will cover in the next section, but for simplicity let's just add some vertices and edges without any properties.

As Raphtory is focused on dynamic and temporal analysis, all events in the graph's history (adding, updating or deleting nodes/edges) must happen at a given time. These can all share the same time (if, for example, you are working with snapshots), but every update still needs a time.

As such, when we add a vertex we have two arguments: the `timestamp` and the `vertex ID`. Similarly, when adding an edge, we have three arguments: the `timestamp`, the `source vertex` and the `destination vertex`.

**Note:** All graphs are directed by default in Raphtory, but can be `projected` into an undirected graph. We will go in depth into graph projections later in the tutorial.

In the following code block we have five updates for our graph, adding three vertices (`1`, `2`, `3`) at time `1` and two edges (`1->2`, `1->3`) at time `2`.






In [9]:
graph.add_vertex(1, 1)
graph.add_vertex(1, 2)
graph.add_vertex(1, 3)
graph.add_edge(2, 1, 2)
graph.add_edge(2, 1, 3)
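
To make the role of the timestamps concrete, here is a minimal plain-Python sketch of the same five updates stored as a list of timestamped events. It does not use PyRaphtory and is not how Raphtory stores data internally; the `updates` list and the `graph_at` helper are hypothetical names invented for this illustration.

```python
# Plain-Python sketch (not PyRaphtory): the five updates as timestamped events.
# Each entry is (timestamp, kind, payload).
updates = [
    (1, "vertex", 1),
    (1, "vertex", 2),
    (1, "vertex", 3),
    (2, "edge", (1, 2)),  # payload is (source, destination)
    (2, "edge", (1, 3)),
]


def graph_at(updates, time):
    """Return the vertices and edges added at or before `time`."""
    vertices, edges = set(), set()
    for timestamp, kind, payload in updates:
        if timestamp > time:
            continue  # this update has not happened yet at `time`
        if kind == "vertex":
            vertices.add(payload)
        else:
            edges.add(payload)
    return vertices, edges


print(graph_at(updates, 1))  # state of the graph at time 1
print(graph_at(updates, 2))  # state of the graph at time 2
```

Looking at the graph at time `1` gives the three vertices and no edges, while looking at time `2` gives the vertices together with both edges.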

## Running your first Query
Now that our data is loaded we can start interrogating it! 

While we can write some very complicated algorithms in Raphtory, let's start off with something simple: getting the `indegree` and `outdegree` of our nodes.

For this we call `step` on the graph, which takes a function to run on every vertex; here each function stores a value in the vertex's state. We then call `select` with the names of the state values we want returned. This gives us a `Table` full of `Rows`, each representing the result for one node. From this point we can either write our results to a `Sink` (file, database, etc.), which we will cover later in the tutorial, or convert them into a dataframe for further analysis.

In this example we have called `to_df` to get a dataframe containing the variables we named in `select`. If you are wondering where these variable names come from, we will be explaining very shortly, so bear with us!

If you have a look in the logs you can see that your query is given a `Job ID` and Raphtory will report how long it took for it to run.

In [10]:
df = (
    graph
    .step(lambda vertex: vertex.set_state("name", vertex.name()))
    .step(lambda vertex: vertex.set_state("out_degree", vertex.out_degree()))
    .step(lambda vertex: vertex.set_state("in_degree", vertex.in_degree()))
    .select("name", "out_degree", "in_degree")
    .to_df()
)
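
As a cross-check on what the query above computes, the degrees can also be worked out by hand in plain Python. This is only a sketch of the calculation, not of how Raphtory executes it; the `vertices`, `edges` and `rows` names are made up for this example.

```python
from collections import Counter

# Plain-Python sketch (not PyRaphtory): the graph we built above.
vertices = [1, 2, 3]
edges = [(1, 2), (1, 3)]  # (source, destination)

# Count how many edges leave and enter each vertex.
out_degree = Counter(source for source, _ in edges)
in_degree = Counter(destination for _, destination in edges)

# One row per vertex, with the same three variables the query selects.
rows = [
    {"name": v, "out_degree": out_degree[v], "in_degree": in_degree[v]}
    for v in vertices
]

for row in rows:
    print(row)
```

A `Counter` returns `0` for keys it has never seen, so vertices with no outgoing or incoming edges need no special handling.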


## Checking out the output
Finally, once our query has run and we have got our dataframe, we can take a look at the results. 

One aspect which is notable here is that we requested three variables, but we have five columns. This is because algorithms in Raphtory run at set points in time, meaning the values for each vertex must be associated with a `timestamp` (in this case the most recent one `2`). 

Sometimes when we are running queries we may also apply a `window`, which filters updates outside a given bound into the past or future. We, therefore, also have a column to specify if one has been applied. As we didn't do any of that here, the `window` column states `None`. 
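
The idea of a window can be sketched in plain Python as a filter over update timestamps. This is an illustration only and not the PyRaphtory API: the `visible_updates` helper is hypothetical, it only looks into the past, and the exact boundary handling (a half-open range) is an assumption made for this example.

```python
# Plain-Python sketch of windowing; this is not the PyRaphtory API.
update_times = [1, 1, 1, 2, 2]  # timestamps of the five updates added earlier


def visible_updates(times, at, window=None):
    """Return the update times a query run at time `at` would consider.

    Assumption: a window of size w keeps updates in the half-open range
    (at - w, at]; without a window, everything up to `at` is kept.
    """
    if window is None:
        return [t for t in times if t <= at]
    return [t for t in times if at - window < t <= at]


print(visible_updates(update_times, at=2))            # no window applied
print(visible_updates(update_times, at=2, window=1))  # only the latest updates
```

Under these assumptions, a query at time `2` with no window sees all five updates, while a window of size `1` leaves only the two edge additions.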

As with every other cool feature we have hinted at, you will soon be an expert in queries, windowing and much more. All you have to do is continue on to the next page!

In [11]:
df

|   | timestamp | name | out_degree | in_degree |
|---|-----------|------|------------|-----------|
| 0 | 2         | 1    | 2          | 0         |
| 1 | 2         | 2    | 0          | 1         |
| 2 | 2         | 3    | 0          | 1         |
