# GraphletLift tutorial
In this notebook, we take you through an example use case of our GraphletLift algorithm.

We cover three steps: opening a graph file, running a graphlet estimate, and interpreting the results.

In [9]:
import networkx as nx
import lift as lt

Suppose we have an edge list file in our graph directory. Each line of this file specifies one edge as a pair of node IDs, "starting_node_id terminating_node_id", separated by a space. For example,

```
2 1
2 3
2 4
```

This represents a graph with edges between node 2 and each of nodes 1, 3, and 4. Since our method is for undirected graphs, the order of the two IDs on a line does not matter, and this file describes the 3-star graph.
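
To make the format concrete, here is a minimal sketch that parses the three lines above into an undirected adjacency map using only the standard library. The helper name `parse_edgelist` is ours, introduced for illustration only; it is not part of GraphletLift.

```python
from collections import defaultdict


def parse_edgelist(text):
    """Parse whitespace-separated 'u v' lines into an undirected adjacency map."""
    adjacency = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        u, v = parts[0], parts[1]
        # Undirected graph: record the edge in both directions.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


example = "2 1\n2 3\n2 4\n"
adjacency = parse_edgelist(example)
degrees = {node: len(neighbors) for node, neighbors in adjacency.items()}
print(degrees)
```

Node 2 ends up with degree 3 and every other node with degree 1, which is exactly the star shape described above.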

The graphs that come with our repo can be loaded by name. Here we load the graph "as-caida" and get an estimate of the count of the 3-node graphlets. The num_steps parameter controls the number of samples to use.

In [2]:
lift = lt.Lift("as-caida", k=3)
lift.get_graphlet_count(num_steps=5000)

{'2-star': 14450372, '3-cycle': 40749}

Compare this with the true counts for the "as-caida" data set (obtained via PGD):

```
2-star: 14797176
3-cycle: 36365
```
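
One simple way to read this comparison is the signed relative error of each estimate. The sketch below uses only the numbers shown above; the helper name `relative_errors` is ours and not part of GraphletLift.

```python
def relative_errors(estimate, truth):
    """Return (estimate - truth) / truth for every graphlet named in truth."""
    return {name: (estimate[name] - truth[name]) / truth[name] for name in truth}


# Values copied from the estimate and the true counts above.
estimate = {"2-star": 14450372, "3-cycle": 40749}
truth = {"2-star": 14797176, "3-cycle": 36365}

errors = relative_errors(estimate, truth)
for name, err in errors.items():
    print(f"{name}: {err:+.1%}")
```

With these numbers the 2-star estimate is about 2% below the true count and the 3-cycle estimate is about 12% above it.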

By the way, the names correspond to the following graphlets (we provide names for graphlets up to k=4).

![title](graphlets.png)

From top left to bottom right: 2-path, 2-star, 3-cycle, 3-star, 4-path, 4-tailedtriangle, 4-cycle, 4-chordcycle, 4-clique.
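
To make the 3-node names concrete, the following sketch counts 2-stars and 3-cycles exactly on a tiny graph by checking every triple of nodes. It assumes the counts refer to induced subgraphs (a triple with exactly two edges is a 2-star, a triple with three edges is a 3-cycle). The function name `count_3node_graphlets` is ours, and this brute-force check is not how GraphletLift works; its cost grows cubically with the number of nodes, so it is only suitable for sanity checks on small inputs.

```python
from itertools import combinations


def count_3node_graphlets(edges):
    """Exactly count induced 2-stars and 3-cycles by checking every node triple."""
    edge_set = {frozenset(edge) for edge in edges}
    nodes = sorted({node for edge in edges for node in edge})
    counts = {"2-star": 0, "3-cycle": 0}
    for triple in combinations(nodes, 3):
        # Number of edges present among the three possible pairs.
        present = sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        if present == 2:
            counts["2-star"] += 1
        elif present == 3:
            counts["3-cycle"] += 1
    return counts


star = [(2, 1), (2, 3), (2, 4)]                        # the 3-star
tailed_triangle = [(1, 2), (2, 3), (1, 3), (3, 4)]     # the 4-tailedtriangle
print(count_3node_graphlets(star))
print(count_3node_graphlets(tailed_triangle))
```

The 3-star contains three 2-stars and no 3-cycle, while the tailed triangle contains one 3-cycle and two 2-stars.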

We can also feed GraphletLift networkx graphs. Here we open "as-caida" from its .edgelist file using networkx and feed it to GraphletLift.

In [3]:
graph = nx.read_edgelist('Graphs/as-caida.edgelist', create_using=nx.Graph())
lift = lt.Lift(graph, k=4)
lift.get_graphlet_count(num_steps=8000)

{'3-star': 7610211465,
 '4-chordcycle': 1281472,
 '4-clique': 13536,
 '4-cycle': 18402,
 '4-path': 301312379,
 '4-tailedtriangle': 64512379}

Compare this to the true counts:
```
'3-star': 7788726198,
'4-chordcycle': 1719022,
'4-clique': 53875,
'4-cycle': 406702,
'4-path': 284784486,
'4-tailedtriangle': 47227249
```

This should be enough to get you using GraphletLift. To get better estimates, increase the num_steps parameter, keeping in mind that the computation time is linear in num_steps.

The code is not parallelized, but in principle it is embarrassingly parallel. We believe that someone with some parallelization experience could easily scale this algorithm to multiple cores using, for instance, the multiprocessing package.
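
As a rough illustration of that idea, here is a minimal sketch that runs several independent estimates in a process pool and averages them graphlet by graphlet. This is not something the repo provides. We assume that separate runs are independent and that averaging their counts is a valid way to combine them. `toy_estimate` is a hypothetical stand-in for one GraphletLift run, and `average_counts` and `parallel_estimate` are names introduced here.

```python
import random
from multiprocessing import Pool


def average_counts(estimates):
    """Average a list of {graphlet_name: count} dicts key by key."""
    names = set().union(*estimates)
    # A graphlet missing from one run is treated as an estimate of 0 for that run.
    return {
        name: sum(est.get(name, 0) for est in estimates) / len(estimates)
        for name in sorted(names)
    }


def toy_estimate(seed):
    """Hypothetical stand-in for one independent run (NOT the real estimator)."""
    rng = random.Random(seed)
    return {"2-star": 1000 + rng.randint(-50, 50), "3-cycle": 100 + rng.randint(-10, 10)}


def parallel_estimate(worker, seeds, processes=4):
    """Run worker(seed) for each seed in a process pool and average the results."""
    with Pool(processes) as pool:
        return average_counts(pool.map(worker, seeds))


# Sequential demonstration of the combining step.
runs = [toy_estimate(seed) for seed in range(4)]
print(average_counts(runs))
```

In real use, the worker would build a `lt.Lift` object and call `get_graphlet_count(num_steps=...)` instead of `toy_estimate`, and it must be a module-level function so that it can be sent to the worker processes. Call `parallel_estimate(toy_estimate, range(8))` from under an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that spawn fresh worker processes.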