# cuGraph Intro

*Original Authors: Bradley Rees and James Wyles, updated and edited by Adam Breindel*

### cuGraph is a RAPIDS library focusing on graph analytics and algorithms

Its API is inspired by the popular NetworkX library for Python (https://networkx.github.io/).

## Zachary Karate Club Data

We will use a small, well-known graph dataset representing a university karate club.

You can read an overview here: https://en.wikipedia.org/wiki/Zachary%27s_karate_club

Or read the original paper, *W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977)* (paywalled, but limited free trial: https://www.jstor.org/stable/3629752)


![Karate Club](images/zgraph.png)

This is a small graph which allows for easy visual inspection to validate results. 

### Read the data

In [None]:
! head data/karate-data.csv

In [None]:
import cudf

karate = cudf.read_csv('data/karate-data.csv', names=['src','dst'], delimiter='\t', dtype={'src':'int32', 'dst':'int32'} )

In [None]:
karate.head()

In [None]:
import cugraph

G = cugraph.Graph()
G.from_cudf_edgelist(karate, source='src', destination='dst')
G.view_edge_list()

## Breadth-First Search (BFS) 

First, we'll compute the Breadth-First Search path from a starting vertex to every other vertex in our graph.

As the name implies, BFS traverses the given graph in a breadth-first manner. Starting at a specified vertex, the algorithm iteratively searches neighboring vertices, visiting all vertices at one distance before moving on to the next. (See https://en.wikipedia.org/wiki/Breadth-first_search)

To compute BFS in cuGraph use: __bfs(G, start_id)__

* __G__: A cugraph.Graph object
* __start_id__ : the starting vertex ID

Returns

* __df__: cudf.DataFrame with three named columns:
    * df["vertex"]: vertex id
    * df["distance"]: distance to the starting vertex
    * df["predecessor"]: id of the vertex that was used to reach this vertex
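
To make the `distance` and `predecessor` columns concrete, here is a minimal CPU-only sketch of BFS in plain Python. It is not how cuGraph implements the search on the GPU. The function name `bfs_sketch`, the toy graph, and the use of `-1` to mark a vertex with no predecessor are all assumptions made for illustration.

```python
from collections import deque

def bfs_sketch(adjacency, start):
    """Return (distance, predecessor) dicts for every vertex reachable from start."""
    distance = {start: 0}
    predecessor = {start: -1}  # assumed convention: -1 means "no predecessor"
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            # The first time we see a vertex is also the shortest hop count to it
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                predecessor[neighbor] = current
                queue.append(neighbor)
    return distance, predecessor

# Hypothetical toy graph (adjacency lists), not the karate club data
toy_graph = {
    1: [2, 3],
    2: [1, 4],
    3: [1, 4],
    4: [2, 3, 5],
    5: [4],
}

distance, predecessor = bfs_sketch(toy_graph, 1)
print(distance)
print(predecessor)
```

Each vertex records the neighbor from which it was first reached, which is what later allows a path to be traced back to the starting vertex.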

In [None]:
df = cugraph.bfs(G,1)
df

In [None]:
# Define a print-path function that takes the BFS result dataframe and a vertex ID

def print_path(df, vertex_id):
    
    # Use the BFS predecessors and distances to trace the path 
    # from vertex_id back to the starting vertex (vertex 1 in this example)
    
    # .iloc[0] extracts a scalar rather than a one-element array
    dist = int(df[df.vertex == vertex_id]['distance'].iloc[0])
    last_vert = vertex_id
    
    for i in range(dist):
        row = df[df.vertex == last_vert]
        next_vert = int(row['predecessor'].iloc[0])
        d = int(row['distance'].iloc[0])
        print(f"Vertex: {last_vert} was reached from vertex {next_vert} "
              f"and distance to start is {d}")
        last_vert = next_vert

In [None]:
print_path(df, 22)

In [None]:
print_path(df, 30)
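
If you would rather get the path back as a list than print it, the same predecessor-following idea can be sketched on a plain Python dict. The helper name `trace_path`, the toy predecessor map, and the `-1` sentinel for the starting vertex are assumptions for illustration. Converting a cuDF result into such a dict is left out here.

```python
def trace_path(predecessor, vertex, no_predecessor=-1):
    """Return the list of vertices from the BFS start to `vertex`."""
    path = [vertex]
    # Walk backwards until we hit the vertex that has no predecessor (the start)
    while predecessor[path[-1]] != no_predecessor:
        path.append(predecessor[path[-1]])
    path.reverse()  # we collected the path end-to-start, so flip it
    return path

# Hypothetical predecessor map for a BFS that started at vertex 1
toy_predecessor = {1: -1, 2: 1, 3: 1, 4: 2, 5: 4}

path_to_5 = trace_path(toy_predecessor, 5)
print(path_to_5)
```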

__Since we can see in the graph illustration above that vertex 17 is at the edge of the graph, let's run BFS with that as the starting vertex__

In [None]:
# Call BFS on the graph starting from vertex 17
df2 = cugraph.bfs(G,17)

In [None]:
# Print the max distance
df2["distance"].max()

In [None]:
# Print path to vertex 30

print_path(df2, 30)

## Single-Source Shortest Path (SSSP)

We can use cuGraph to compute single-source shortest paths: the shortest path from a given starting vertex to every other reachable vertex in our graph.

To compute SSSP for a graph in cuGraph we use:
**cugraph.sssp(G, source)**

Input
* __G__: cugraph.Graph object
* __source__: int, Index of the source vertex

Returns 
* __df__: a cudf.DataFrame object that includes these columns
    * df['vertex']: vertex identifier for the vertex
    * df['distance']: computed distance from the source vertex to this vertex
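
For intuition about what the `distance` column holds, here is a minimal CPU-only sketch using Dijkstra's algorithm, one standard approach for graphs with non-negative edge weights. It is not presented as cuGraph's GPU implementation. The name `sssp_sketch` and the toy weighted graph are assumptions for illustration.

```python
import heapq

def sssp_sketch(weighted_adjacency, source):
    """Return a dict of shortest distances from source to each reachable vertex."""
    distance = {source: 0.0}
    heap = [(0.0, source)]  # (distance so far, vertex)
    while heap:
        dist, vertex = heapq.heappop(heap)
        if dist > distance.get(vertex, float("inf")):
            continue  # stale heap entry: a shorter route was already found
        for neighbor, weight in weighted_adjacency[vertex]:
            candidate = dist + weight
            if candidate < distance.get(neighbor, float("inf")):
                distance[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distance

# Hypothetical toy graph: vertex -> list of (neighbor, edge weight)
toy_weighted = {
    1: [(2, 1.0), (3, 0.5)],
    2: [(1, 1.0), (4, 1.0)],
    3: [(1, 0.5), (4, 0.5)],
    4: [(2, 1.0), (3, 0.5)],
}

toy_distances = sssp_sketch(toy_weighted, 1)
print(toy_distances)
```

In this toy graph, vertex 4 is reached more cheaply through vertex 3 (0.5 + 0.5) than through vertex 2 (1.0 + 1.0).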

In [None]:
# Call cugraph.sssp to get the distances from vertex 1:
shortest_paths_from_node_1 = cugraph.sssp(G, 1)
shortest_paths_from_node_1

__What are the farthest vertices from the source?__

In [None]:
shortest_paths_from_node_1.sort_values('distance', ascending=False)

## Shortest Paths with Unequal Costs

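BFS looks a lot like Shortest Paths when all of the edges have weight 1.0: the BFS distance is a hop count, and a hop count equals the path cost when every edge costs 1.0.

Let's see how the results change if we set the cost of every edge to or from node 3 to 0.5.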
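
As a small worked example of the arithmetic, the sketch below sums edge costs along a path under both weightings. The helper names `edge_cost` and `path_cost` and the two vertex sequences are hypothetical and are not claimed to be paths in the karate club graph.

```python
def edge_cost(u, v, shortcut_vertex=None):
    """Cost 0.5 for any edge touching shortcut_vertex, otherwise 1.0."""
    if shortcut_vertex is not None and shortcut_vertex in (u, v):
        return 0.5
    return 1.0

def path_cost(path, shortcut_vertex=None):
    """Sum the edge costs between consecutive vertices in the path."""
    return sum(edge_cost(u, v, shortcut_vertex) for u, v in zip(path, path[1:]))

# Hypothetical vertex sequences, each two hops long
via_3 = [1, 3, 10]
avoiding_3 = [1, 2, 31]

# With unit weights, cost equals the number of hops
print(path_cost(via_3), path_cost(avoiding_3))

# With the node-3 shortcut, only the path through node 3 gets cheaper
print(path_cost(via_3, shortcut_vertex=3), path_cost(avoiding_3, shortcut_vertex=3))
```

Two paths with the same hop count can therefore have different costs once weights are unequal, which is why SSSP results can diverge from BFS results.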

In [None]:
karate.head()

First, we need to add a weight column, giving every edge a default cost of 1.0.

In [None]:
karate['weight'] = 1.0

In [None]:
karate.head()

In [None]:
# Lower the cost of every edge that starts or ends at node 3
karate.loc[karate.src == 3, 'weight'] = 0.5
karate.loc[karate.dst == 3, 'weight'] = 0.5

In [None]:
G2 = cugraph.Graph()
G2.from_cudf_edgelist(karate, source='src', destination='dst', edge_attr='weight')
G3 = G2.to_directed()

In [None]:
shortest_paths_with_node_3_shortcut = cugraph.sssp(G3, 1)
shortest_paths_with_node_3_shortcut

For comparison:

In [None]:
shortest_paths_from_node_1.sort_values('distance', ascending=False).head(10)

In [None]:
shortest_paths_with_node_3_shortcut.sort_values('distance', ascending=False).head(10)