# Symmetrize

In this notebook, we will use the _symmetrize_ function to create bi-directional edges in an undirected graph.

Notebook Credits
* Original Authors: Bradley Rees and James Wyles
* Created:   08/13/2019
* Updated:   10/28/2019

RAPIDS Versions: 0.10.0    

Test Hardware

* GV100 32G, CUDA 10.0


## Introduction
In many cases, an undirected graph is saved with a single edge between each vertex pair, which saves a lot of space in the data file.  However, to process that data as an undirected graph in cuGraph, there needs to be an edge in each direction.  Converting from a single edge to two edges, one in each direction, is called symmetrization.

To symmetrize an edge list (COO data), use:

**cugraph.symmetrize(source, destination, value)**
* __source__: cudf.Series
* __destination__: cudf.Series
* __value__: cudf.Series


Returns:
* __triplet__: three variables are returned:
    * __source__: cudf.Series
    * __destination__: cudf.Series
    * __value__: cudf.Series
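
The idea behind the triplet interface can be sketched in plain Python, without a GPU. This is only an illustration of symmetrization, not cuGraph's implementation: the function name `symmetrize_edges` is hypothetical, and keeping the first weight seen when both directions are already present is an assumption of this sketch.

```python
def symmetrize_edges(source, destination, value):
    """Return (source, destination, value) lists with an edge in each direction.

    Hypothetical helper for illustration only; it is not part of cuGraph.
    """
    weights = {}
    for s, d, w in zip(source, destination, value):
        # Assumption: if an edge already exists, keep the first weight seen.
        weights.setdefault((s, d), w)
        weights.setdefault((d, s), w)

    # Sort so the output order is deterministic.
    edges = sorted(weights)
    out_src = [s for s, _ in edges]
    out_dst = [d for _, d in edges]
    out_val = [weights[e] for e in edges]
    return out_src, out_dst, out_val


# Three undirected edges stored once each become six directed edges.
src, dst, val = symmetrize_edges([1, 1, 2], [2, 3, 3], [0.5, 1.0, 2.0])
print(list(zip(src, dst, val)))
```

Each input edge contributes its reverse with the same value, so an edge list that stores every undirected edge exactly once doubles in length.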


### Test Data
We will be using an undirected, unsymmetrized version of the Zachary Karate club dataset.  The result of symmetrization should be a dataset equal to the version used in the PageRank notebook.

*W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of
Anthropological Research 33, 452-473 (1977).*


![Karate Club](../img/zachary_black_lines.png)


In [1]:
# Import needed libraries
import cugraph
import cudf

In [2]:
# Read the unsymmetrized data
unsym_data = '../data/karate_undirected.csv'
gdf = cudf.read_csv(unsym_data, names=["src", "dst"], delimiter='\t', dtype=["int32", "int32"])

In [3]:
# Load the full symmetrized dataset for comparison
datafile = '../data/karate-data.csv'
test_gdf = cudf.read_csv(datafile, names=["src", "dst"], delimiter='\t', dtype=["int32", "int32"])

In [4]:
# len() of an edge list is the number of edges (rows), not vertices
print("Unsymmetrized Graph")
print("\tNumber of Edges: " + str(len(gdf)))
print("Baseline Graph")
print("\tNumber of Edges: " + str(len(test_gdf)))

Unsymmetrized Graph
	Number of Edges: 78
Baseline Graph
	Number of Edges: 156
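
Note that the length of an edge list is its number of edges (rows). The number of vertices is the count of distinct IDs across both columns; the karate club graph has 34 vertices. A minimal pure-Python sketch of the difference, using a hypothetical helper name and a small made-up edge list rather than the notebook's data:

```python
def count_vertices_and_edges(source, destination):
    """Return (number of distinct vertices, number of edges) for an edge list.

    Hypothetical helper for illustration only.
    """
    vertices = set(source) | set(destination)
    return len(vertices), len(source)


# Five edges among four vertices.
n_vertices, n_edges = count_vertices_and_edges([1, 1, 2, 3, 4], [2, 3, 3, 4, 1])
print("vertices:", n_vertices, "edges:", n_edges)
```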


_Since the unsymmetrized graph has only one edge between each pair of connected vertices, the underlying code treats it as a directed graph._
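
To see why the direction matters for PageRank, here is a minimal power-iteration sketch in plain Python. It is not cuGraph's implementation: the function name, the damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from vertices without out-edges are all assumptions of this sketch.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a list of directed (src, dst) edges.

    Illustrative sketch only; parameters and dangling-node handling are assumptions.
    """
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for s, d in edges:
        out_links[s].append(d)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no out-edges is spread evenly over all vertices.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        for v in vertices:
            for d in out_links[v]:
                new_rank[d] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


# A three-vertex path stored with one edge per pair, then symmetrized.
one_way = [(1, 2), (2, 3)]
both_ways = one_way + [(d, s) for s, d in one_way]

directed_rank = pagerank(one_way)
symmetric_rank = pagerank(both_ways)
print("top vertex, one edge per pair:", max(directed_rank, key=directed_rank.get))
print("top vertex, symmetrized:", max(symmetric_rank, key=symmetric_rank.get))
```

With one edge per pair, rank flows only toward the end of the path, so the last vertex scores highest. After symmetrization the middle vertex, which has the most neighbors, scores highest.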

In [5]:
G = cugraph.Graph()
G.add_edge_list(gdf["src"], gdf["dst"])
gdf_page = cugraph.pagerank(G)

In [6]:
# Find the vertex with the highest PageRank score
m = gdf_page['pagerank'].max()
df = gdf_page.query('pagerank == @m')
df

|    | vertex | pagerank |
|----|--------|----------|
| 34 | 34     | 0.255203 |


### Now Symmetrize the dataset

In [8]:
df = cugraph.symmetrize_df(gdf, 'src', 'dst')

In [9]:
# len() of an edge list is the number of edges (rows), not vertices
print("Unsymmetrized Graph")
print("\tNumber of Edges: " + str(len(gdf)))
print("Symmetrized Graph")
print("\tNumber of Edges: " + str(len(df)))
print("Baseline Graph")
print("\tNumber of Edges: " + str(len(test_gdf)))

Unsymmetrized Graph
	Number of Edges: 78
Symmetrized Graph
	Number of Edges: 156
Baseline Graph
	Number of Edges: 156
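
Matching row counts are a necessary check but not a sufficient one: two edge lists can have the same length and still contain different edges. A stricter comparison treats each list as a set of (src, dst) pairs. The sketch below uses plain Python lists and a hypothetical helper name; applying it to the notebook's data would first require copying the cuDF columns to host memory.

```python
def same_edge_set(src_a, dst_a, src_b, dst_b):
    """Return True when both edge lists contain exactly the same (src, dst) pairs.

    Hypothetical helper for illustration; row order and duplicates are ignored.
    """
    return set(zip(src_a, dst_a)) == set(zip(src_b, dst_b))


# Same edges in a different row order compare equal.
matches = same_edge_set([1, 2, 1, 3], [2, 1, 3, 1], [1, 1, 2, 3], [2, 3, 1, 1])

# Same length, but one edge differs.
differs = same_edge_set([1, 2, 1, 3], [2, 1, 3, 1], [1, 2, 1, 4], [2, 1, 4, 1])
print(matches, differs)
```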
