### **Introduction to Arachne**
Arachne is a Python package for graph analysis, built as an extension to Arkouda, a Python package for the analysis of tabular data akin to NumPy and Pandas. In this notebook we show how to run each implemented algorithm on different types of graphs: undirected, directed, and property graphs.

In [1]:
import arkouda as ak
import arachne as ar
import networkx as nx
import os

    _         _                   _       
   / \   _ __| | _____  _   _  __| | __ _ 
  / _ \ | '__| |/ / _ \| | | |/ _` |/ _` |
 / ___ \| |  |   < (_) | |_| | (_| | (_| |
/_/   \_\_|  |_|\_\___/ \__,_|\__,_|\__,_|
                                          

Client Version: v2024.03.18


In [2]:
# NOTE: Make sure to change the server name to whatever is applicable in your environment. If running locally, then use only ak.connect().
ak.connect("n21", 5555)

connected to arkouda server tcp://*:5555


### **Graph Generation and Loading**
Graphs can be built from existing data or generated with our suite of random graph generators. The preferred way to load a graph into memory is from Arkouda arrays; however, we also provide methods to read a graph from a Matrix Market file and to generate random graphs.

In [3]:
# Read in a graph from a matrix market file.
absolute_path_to_karate = os.path.abspath("data/karate.mtx")
karate = ar.read_matrix_market_file(absolute_path_to_karate)
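
For readers unfamiliar with the Matrix Market coordinate format, the following is a minimal pure-Python sketch of how such a file maps to an edge list. It is not Arachne's reader: the function name `parse_matrix_market_edges` and the four-vertex sample are hypothetical, and the sample is not the contents of `data/karate.mtx`. The sketch only relies on the format's basic layout: a `%%MatrixMarket` header line, optional `%` comment lines, a size line, and then 1-indexed `row col [value]` entries.

```python
import io

def parse_matrix_market_edges(handle):
    """Return (num_rows, num_cols, edges) from a coordinate-format stream."""
    header = handle.readline()
    if not header.startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket header line")
    # Skip comment lines (starting with '%') and blank lines up to the size line.
    line = handle.readline()
    while line and (line.startswith("%") or not line.strip()):
        line = handle.readline()
    num_rows, num_cols, num_entries = (int(tok) for tok in line.split())
    edges = []
    for line in handle:
        tokens = line.split()
        if not tokens:
            continue
        # Entries are 1-indexed "row col [value]"; shift to 0-indexed vertex ids.
        edges.append((int(tokens[0]) - 1, int(tokens[1]) - 1))
    if len(edges) != num_entries:
        raise ValueError("entry count does not match the size line")
    return num_rows, num_cols, edges

# Hypothetical sample graph, made up for illustration only.
sample = io.StringIO(
    "%%MatrixMarket matrix coordinate pattern symmetric\n"
    "% a tiny made-up graph\n"
    "4 4 3\n"
    "2 1\n"
    "3 1\n"
    "4 3\n"
)
rows, cols, edges = parse_matrix_market_edges(sample)
print(rows, cols, edges)
```

The resulting list of `(source, destination)` pairs is the same kind of edge data that the rest of this notebook passes around as Arkouda arrays.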

In [4]:
# Generate random graphs using several of the available random graph generators.
n = 10_000
rmat_graph = ar.rmat(25, create_using=ar.Graph)
gnp_graph = ar.gnp(n, 0.75, create_using=ar.Graph)
rtree_graph = ar.random_tree(n, create_using=ar.Graph)
ws_graph = ar.watts_strogatz_graph(n, 25, 0.56, create_using=ar.Graph)
graph_list = [rmat_graph, gnp_graph, rtree_graph, ws_graph]
for g in graph_list:
    print(f"Generated graph has {len(g)} vertices and {g.size()} edges")

Generated graph has 17060912 vertices and 523606315 edges
Generated graph has 10000 vertices and 26381136 edges
Generated graph has 10000 vertices and 10000 edges
Generated graph has 10000 vertices and 249439 edges


In [5]:
# Create a property graph from Arkouda dataframes, which are usually read in from HDF5, Parquet, or CSV files. For demonstration purposes, we create a random dataset here.
n = 1_000           # Number of vertices.
m = 1_000_000       # Number of edges.
k = 2               # Upper bound for the randomly generated values.

In [6]:
# Create random arrays of different types: integers, unsigned integers, floats, booleans, strings, and categoricals.
src_array = ak.randint(0, n, m, dtype=ak.dtype('int64'), seed=2)
dst_array = ak.randint(0, n, m, dtype=ak.dtype('int64'), seed=4)
int_array = ak.randint(-1, k, m, dtype=ak.dtype('int64'), seed=6)
uint_array = ak.randint(0, k, m, dtype=ak.dtype('uint64'), seed=8)
real_array = ak.randint(0, k, m, dtype=ak.dtype('float64'), seed=10)
bool_array = ak.randint(0, k, m, dtype=ak.dtype('bool'), seed=12)
strings_array = ak.random_strings_uniform(0, k, m, characters="abcdefghijklmonpqrstuvwxyz", seed=14)
categorical_array = ak.Categorical(ak.random_strings_uniform(0, k, m, characters="abcdefghijklmonpqrstuvwxyz", seed=14))

In [7]:
# Initialize an empty graph object.
prop_graph = ar.PropGraph()

In [8]:
# Create a dataframe with the edge data.
test_edge_dict = {
    "src":src_array,
    "dst":dst_array,
    "data1":int_array,
    "data2":uint_array,
    "data3":real_array,
    "data4":bool_array,
    "data5":strings_array,
    "data6":categorical_array
}
test_edge_df = ak.DataFrame(test_edge_dict)

In [9]:
# Load in the edge attributes, which sorts the edges and handles storing their data.
prop_graph.load_edge_attributes(test_edge_df, source_column="src", destination_column="dst", relationship_columns=["data5", "data1"])
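
To illustrate what "sorting the edges while keeping their data attached" means, here is a small pure-Python sketch. It is an assumption about the general technique, not a description of Arachne's internals: it computes one permutation that orders the edges by `(src, dst)` and applies that same permutation to every attribute column. The toy columns and the name `perm` are made up for this example.

```python
# Hypothetical toy edge list with one attribute column.
src = [2, 0, 1, 0]
dst = [1, 3, 2, 1]
weight = [0.5, 1.5, 2.5, 3.5]

# Permutation that orders the edges by (src, dst), in the spirit of an argsort.
perm = sorted(range(len(src)), key=lambda i: (src[i], dst[i]))

# Apply the same permutation to every column so each row stays intact.
sorted_src = [src[i] for i in perm]
sorted_dst = [dst[i] for i in perm]
sorted_weight = [weight[i] for i in perm]

print(perm)
print(list(zip(sorted_src, sorted_dst, sorted_weight)))
```

Because every column is reordered by the same index list, each attribute value still lines up with the edge it was loaded with.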

In [10]:
# Set the sizes for the vertex data. Note that m now holds the number of vertices.
m = len(prop_graph)
k = 2

In [11]:
# Create data of different types for vertices.
int_array = ak.randint(-1, k, m, dtype=ak.dtype('int64'), seed=6)
uint_array = ak.randint(0, k, m, dtype=ak.dtype('uint64'), seed=8)
real_array = ak.randint(0, k, m, dtype=ak.dtype('float64'), seed=10)
bool_array = ak.randint(0, k, m, dtype=ak.dtype('bool'), seed=12)
strings_array = ak.random_strings_uniform(0, k, m, characters="abcdefghijklmonpqrstuvwxyz", seed=14)
categorical_array = ak.Categorical(ak.random_strings_uniform(0, k, m, characters="abcdefghijklmonpqrstuvwxyz", seed=14))

In [12]:
# Create a dataframe with vertex data.
test_node_dict = {
    "nodes":prop_graph.nodes(),
    "data1":int_array,
    "data2":uint_array,
    "data3":real_array,
    "data4":bool_array,
    "data5":strings_array,
    "data6":categorical_array
}
test_node_df = ak.DataFrame(test_node_dict)

In [13]:
# Load in the vertex data.
prop_graph.load_node_attributes(test_node_df, node_column="nodes", label_columns=["data5", "data2"])

### **Graph Processing and Querying**
Treating the graphs as dataframes allows us to exploit Arkouda's array searches to generate subgraphs in seconds.

In [14]:
# Create filters for vertices.
def node_filter(node_attributes):
    return node_attributes["data2"] == 0

In [15]:
# Create filters for edges.
def edge_filter(edge_attributes):
    return edge_attributes["data1"] > -1

In [19]:
# Create subgraphs from each filter on its own and from both filters together.
subgraph_nodes = prop_graph.subgraph_view(filter_node=node_filter)
print(f"Subgraph generated with edge size: {subgraph_nodes.size()}")
subgraph_edges = prop_graph.subgraph_view(filter_edge=edge_filter)
print(f"Subgraph generated with edge size: {subgraph_edges.size()}")
subgraph_together = prop_graph.subgraph_view(filter_node=node_filter, filter_edge=edge_filter)
print(f"Subgraph generated with edge size: {subgraph_together.size()}")

Subgraph generated with edge size: 467134
Subgraph generated with edge size: 420930
Subgraph generated with edge size: 100891
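
The idea of filtering with boolean masks can be sketched in plain Python on a toy graph. This is only an illustration under stated assumptions: the columns below are made up, and the rule for combining the two filters (keep an edge when it passes the edge filter and both of its endpoints pass the node filter) is an assumed convention that may differ from how `subgraph_view` combines them.

```python
# Hypothetical toy property graph stored as parallel columns.
nodes = [0, 1, 2, 3]
node_data2 = [0, 1, 0, 0]        # vertex attribute
src = [0, 0, 1, 2, 3]
dst = [1, 2, 2, 3, 0]
edge_data1 = [-1, 0, 1, 1, 0]    # edge attribute

# Each filter yields one boolean per row, like the notebook's filter functions.
node_mask = [value == 0 for value in node_data2]
edge_mask = [value > -1 for value in edge_data1]

kept_nodes = {n for n, keep in zip(nodes, node_mask) if keep}

# Assumed convention: an edge survives if it passes the edge filter
# and both of its endpoints pass the node filter.
toy_subgraph_edges = [
    (u, v)
    for u, v, keep in zip(src, dst, edge_mask)
    if keep and u in kept_nodes and v in kept_nodes
]
print(toy_subgraph_edges)
```

The masks are computed once per column and then reused, which is the same pattern that makes array-based filtering fast on much larger inputs.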


### **Graph Algorithms**
Let's run some graph algorithms and see how we can use the returned information to draw a full picture of our datasets from above.