# PageRank Algorithm with Neptune Analytics

This notebook demonstrates how to offload PageRank computation to a remote AWS Neptune Analytics instance, and compares the results of local NetworkX execution with those of Neptune Analytics execution.

## Setup and Imports

First, let's import the necessary libraries and set up logging.

In [None]:
import networkx as nx
from nx_neptune import NeptuneGraph, export_csv_to_s3
from nx_neptune.clients import Node
from nx_neptune.utils.utils import get_stdout_logger
import logging
import os
import matplotlib.pyplot as plt

In [None]:
logger = get_stdout_logger(__name__, [
    'nx_neptune.algorithms.link_analysis.pagerank',
    'nx_neptune.na_graph',
    'nx_neptune.utils.decorators',
    'nx_neptune.instance_management',
    __name__,
])

# Ignore cache warnings
nx.config.warnings_to_ignore.add("cache")

## Check for Neptune Analytics Graph ID

The `NETWORKX_GRAPH_ID` environment variable must be set to the ID of your Neptune Analytics graph. You can also set it directly in this notebook.

In [None]:
# Read and load graphId from environment variable
graph_id = os.getenv('NETWORKX_GRAPH_ID')

# If not set, you can set it here
if not graph_id:
    # Uncomment and set your Graph ID
    # %env NETWORKX_GRAPH_ID=your-neptune-analytics-graph-id
    # graph_id = os.getenv('NETWORKX_GRAPH_ID')
    print("Warning: Environment Variable NETWORKX_GRAPH_ID is not defined")
    print("You can set it using: %env NETWORKX_GRAPH_ID=your-neptune-analytics-graph-id")
else:
    print(f"Using Neptune Analytics Graph ID: {graph_id}")
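
If you would rather have the notebook stop early than fail later inside a backend call, a small guard along these lines can help. This is only a minimal sketch: `require_graph_id` is a hypothetical helper written for this notebook, not part of `nx_neptune`, and it only checks that the variable is non-empty, not that the graph exists.

```python
import os


def require_graph_id(env_var: str = "NETWORKX_GRAPH_ID") -> str:
    """Return the graph ID from the environment or raise a descriptive error.

    Hypothetical helper for this notebook; it is not part of nx_neptune.
    """
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(
            f"{env_var} is not set. In a notebook, run "
            f"'%env {env_var}=your-neptune-analytics-graph-id' first."
        )
    return value
```

Calling `graph_id = require_graph_id()` in place of the plain `os.getenv` lookup turns the warning into a hard stop, which may be preferable when the notebook is run end to end.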

## Example 1: Directed Graph

Let's start with a clean slate by clearing any existing data in the Neptune Analytics graph, then build a small directed graph.

In [None]:
# Clear the Neptune Analytics graph
g = nx.Graph()
na_graph = NeptuneGraph.from_config(graph=g)
na_graph.clear_graph()
print("Neptune Analytics graph cleared successfully")

In [None]:
# Create a directed graph
g = nx.DiGraph()

# Test data - explicitly defining the graph with alphabetical nodes and directed edges
# Add nodes
nodes = ['A', 'B', 'C', 'D', 'E']
g.add_nodes_from(nodes)

# Add directed edges to create a directed path graph (A→B→C→D→E)
g.add_edge('A', 'B')
g.add_edge('B', 'C')
g.add_edge('C', 'D')
g.add_edge('D', 'E')

# Add a cycle by connecting E back to C
g.add_edge('E', 'C')

# Add an isolated node
g.add_node("X(DCd)")

# Visualize the graph
plt.figure(figsize=(10, 6))
pos = nx.spring_layout(g, seed=42)
nx.draw(g, pos, with_labels=True, node_color='lightblue', 
        node_size=1500, font_size=12, arrows=True, arrowsize=20)
plt.title("Directed Graph for PageRank Demonstration")
plt.show()

print("Graph structure:")
print("""
   A→B→C→D→E
       ↑   |
       └───┘

   X(DCd)
""")

## Run PageRank with Standard NetworkX

First, let's run the PageRank algorithm using the standard NetworkX implementation.

In [None]:
# Scenario: Local execution
r_local = nx.pagerank(g)
print("PageRank results using standard NetworkX:")
for key, value in sorted(r_local.items(), key=lambda x: (x[1], x[0]), reverse=True):
    print(f"{key}: {value:.6f}")
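
To see where these numbers come from, the following is a minimal pure-Python sketch of PageRank by power iteration on the same graph. It assumes the defaults documented for `nx.pagerank` (damping factor 0.85, rank held by nodes without outgoing edges redistributed uniformly); `pagerank_power_iteration` is a name chosen for this illustration and is not how NetworkX or Neptune Analytics implement the algorithm internally.

```python
def pagerank_power_iteration(edges, nodes, damping=0.85, tol=1e-10, max_iter=1000):
    """Plain power iteration on a directed graph given as (source, target) pairs."""
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread evenly over all nodes
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Every node passes its damped rank in equal shares to its successors
        for source, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        converged = sum(abs(new_rank[node] - rank[node]) for node in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Same structure as the notebook's directed graph, including the isolated node
nodes = ["A", "B", "C", "D", "E", "X(DCd)"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "C")]
ranks = pagerank_power_iteration(edges, nodes)
for node, value in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{node}: {value:.6f}")
```

In this sketch C ends up with the largest score, and A and the isolated node share the smallest one because neither has any incoming edge. The values should agree closely with the `nx.pagerank` output above, up to the convergence tolerance.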

## Run PageRank with Neptune Analytics

Now, let's run the same algorithm using Neptune Analytics as the backend.

In [None]:
# Scenario: AWS Neptune Analytics
try:
    r_neptune = nx.pagerank(g, backend="neptune")
    print("PageRank results using Neptune Analytics:")
    for key, value in sorted(r_neptune.items(), key=lambda x: (x[1], x[0]), reverse=True):
        print(f"{key}: {value:.6f}")
except Exception as e:
    print(f"Error: {e}")
    print("Make sure NETWORKX_GRAPH_ID is set and your AWS credentials are configured correctly.")
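
Since the goal is to compare the two executions, it can help to quantify the difference rather than compare the printed lists by eye. The helper below is a sketch: `compare_rankings` and the tolerance of `1e-3` are assumptions made for this illustration, and the sample dictionaries contain made-up placeholder values, not output from either backend.

```python
import math


def compare_rankings(reference, candidate, abs_tol=1e-3):
    """Summarize how closely two PageRank result dictionaries agree."""
    shared = sorted(reference.keys() & candidate.keys(), key=str)
    only_one_side = sorted(reference.keys() ^ candidate.keys(), key=str)
    diffs = {node: abs(reference[node] - candidate[node]) for node in shared}
    return {
        "max_abs_diff": max(diffs.values(), default=0.0),
        "within_tolerance": all(
            math.isclose(reference[node], candidate[node], abs_tol=abs_tol)
            for node in shared
        ),
        "unmatched_nodes": only_one_side,
    }


# Made-up placeholder values for illustration; not output from either backend
local_example = {"A": 0.10, "B": 0.30, "C": 0.60}
remote_example = {"A": 0.1004, "B": 0.2998, "C": 0.5998}
print(compare_rankings(local_example, remote_example))
```

In the notebook, `compare_rankings(r_local, r_neptune)` would apply the same check to the real results. A non-empty `unmatched_nodes` list usually points to a key mismatch (for example string versus integer node labels) rather than a numerical difference.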

## Visualize PageRank Results

Let's visualize the PageRank results by adjusting the node sizes according to their PageRank values.

In [None]:
def visualize_pagerank(graph, pagerank_values, title):
    plt.figure(figsize=(10, 6))
    pos = nx.spring_layout(graph, seed=42)
    
    # Scale node sizes based on PageRank values
    node_sizes = [pagerank_values[node] * 10000 for node in graph.nodes()]
    
    # Draw the graph
    nx.draw(graph, pos, with_labels=True, 
            node_color='lightblue', 
            node_size=node_sizes, 
            font_size=12, 
            arrows=True, 
            arrowsize=20)
    
    plt.title(title)
    plt.show()

# Visualize NetworkX PageRank results
visualize_pagerank(g, r_local, "PageRank Results with NetworkX (Node Size = PageRank Value)")

# Visualize Neptune Analytics PageRank results if available
try:
    visualize_pagerank(g, r_neptune, "PageRank Results with Neptune Analytics (Node Size = PageRank Value)")
except NameError:
    print("Cannot visualize Neptune Analytics results because execution failed.")
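
Multiplying every score by a fixed factor works for this small graph, but very low scores produce nodes that are hard to see, and a node missing from the results raises `KeyError`. One possible alternative, sketched here under the assumption that a linear rescaling into a fixed size range is acceptable, is the hypothetical helper `scale_node_sizes` (the size bounds are arbitrary choices):

```python
def scale_node_sizes(nodes, pagerank_values, min_size=300.0, max_size=3000.0):
    """Map PageRank values linearly to marker sizes in [min_size, max_size].

    Nodes missing from pagerank_values are treated as having a score of 0.0
    instead of raising KeyError.
    """
    values = [pagerank_values.get(node, 0.0) for node in nodes]
    low = min(values, default=0.0)
    high = max(values, default=0.0)
    span = high - low
    if span == 0:
        # All scores are equal, so use one mid-range size for every node
        return [(min_size + max_size) / 2 for _ in values]
    return [min_size + (value - low) / span * (max_size - min_size) for value in values]
```

Inside `visualize_pagerank`, the result could be passed as `node_size=scale_node_sizes(graph.nodes(), pagerank_values)`. Note that this changes what the plot conveys: sizes then show relative rank within the graph rather than values proportional to the scores.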

## Example 2: Undirected Graph

Clear the Neptune Analytics graph again, then create an undirected graph to run the same comparison. The tadpole graph used here has 9 nodes, labeled 0 to 8.

In [None]:
# Clear the Neptune Analytics graph
na_graph = NeptuneGraph.from_config()
na_graph.clear_graph()
print("Neptune Analytics graph cleared successfully")

In [None]:
# Create an undirected graph
g = nx.tadpole_graph(5,4,create_using=nx.Graph())

# Visualize the graph
plt.figure(figsize=(10, 6))
pos = nx.spring_layout(g, seed=42)
nx.draw(g, pos, with_labels=True, node_color='lightblue',
        node_size=1500, font_size=12, arrows=True, arrowsize=20)
plt.title("Undirected Graph for PageRank Demonstration")
plt.show()


## Run PageRank with Standard NetworkX

First, let's run the PageRank algorithm using the standard NetworkX implementation.

In [None]:
# Scenario: Local execution
r_local = nx.pagerank(g)
print("PageRank results using standard NetworkX:")
for key, value in sorted(r_local.items(), key=lambda x: (x[1], x[0]), reverse=True):
    print(f"{key}: {value:.6f}")

## Run PageRank with Neptune Analytics

Now, let's run the same algorithm using Neptune Analytics as the backend.

In [None]:
# Scenario: AWS Neptune Analytics
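try:
    r_neptune_string_keys = nx.pagerank(g, backend="neptune")
    # The keys come back as strings; convert them to the integer labels used by
    # the local graph. Rebuilding the dict also discards the results from Example 1.
    r_neptune = {int(key): value for key, value in r_neptune_string_keys.items()}
    print("PageRank results using Neptune Analytics:")
    for key, value in sorted(r_neptune.items(), key=lambda x: (x[1], x[0]), reverse=True):
        print(f"{key}: {value:.6f}")
except Exception as e:
    print(f"Error: {e}")
    print("Make sure NETWORKX_GRAPH_ID is set and your AWS credentials are configured correctly.")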
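
The `int(key)` conversion above only works because every node in this graph is an integer. A more general option, sketched below, is to map each returned key back to the original node object. This assumes the backend reports each node as `str(node)`, which is inferred from this example rather than taken from the `nx_neptune` documentation; `restore_node_keys` is a hypothetical helper.

```python
def restore_node_keys(results, original_nodes):
    """Re-key backend results using the original node objects.

    Assumes the backend reports each node as str(node); keys without a
    matching original node are kept unchanged.
    """
    original_nodes = list(original_nodes)
    lookup = {str(node): node for node in original_nodes}
    if len(lookup) != len(set(original_nodes)):
        raise ValueError("Two distinct nodes share the same string form")
    return {lookup.get(key, key): value for key, value in results.items()}
```

With this helper, `r_neptune = restore_node_keys(r_neptune_string_keys, g.nodes())` would handle integer and string labels alike, so the same cell could serve both the lettered graph from Example 1 and the tadpole graph here.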

## Visualize PageRank Results

Let's visualize the PageRank results by adjusting the node sizes according to their PageRank values.

In [None]:
def visualize_pagerank(graph, pagerank_values, title):
    plt.figure(figsize=(10, 6))
    pos = nx.spring_layout(graph, seed=42)

    # Scale node sizes based on PageRank values
    node_sizes = [pagerank_values[node] * 10000 for node in graph.nodes()]

    # Draw the graph
    nx.draw(graph, pos, with_labels=True,
            node_color='lightblue',
            node_size=node_sizes,
            font_size=12,
            arrows=True,
            arrowsize=20)

    plt.title(title)
    plt.show()

# Visualize NetworkX PageRank results
visualize_pagerank(g, r_local, "PageRank Results with NetworkX (Node Size = PageRank Value)")

# Visualize Neptune Analytics PageRank results if available
try:
    visualize_pagerank(g, r_neptune, "PageRank Results with Neptune Analytics (Node Size = PageRank Value)")
except NameError:
    print("Cannot visualize Neptune Analytics results because execution failed.")

## Example 3: Execute Personalized PageRank Algorithm

Run the Personalized PageRank algorithm against the dataset on the remote Neptune Analytics instance. The `personalization` dictionary assigns a weight to each selected node.

In [None]:
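# g is still the undirected graph from Example 2, whose nodes are the integers 0 to 8,
# so the personalization keys must be nodes of that graph
r = nx.pagerank(g, backend="neptune", personalization={0: 1, 1: 2.4})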
logger.info("Algorithm execution - Neptune Analytics: ")
logger.info(r)
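
To make the effect of `personalization` concrete, here is a pure-Python sketch of personalized PageRank on the five lettered nodes from Example 1, using the weights 1 and 2.4 for A and B. It rests on stated assumptions: the weights are normalized to sum to 1, nodes absent from the dictionary get zero teleport probability, and rank from nodes without outgoing edges follows the same teleport vector. `personalized_pagerank` is a name chosen for this illustration, and Neptune Analytics may handle these details differently.

```python
def personalized_pagerank(edges, nodes, personalization, damping=0.85,
                          tol=1e-10, max_iter=1000):
    """Power iteration where teleportation follows a personalization vector."""
    # Only weights of nodes that exist in the graph count toward the total
    total = sum(personalization.get(node, 0.0) for node in nodes)
    if total <= 0:
        raise ValueError("personalization needs a positive weight on a graph node")
    teleport = {node: personalization.get(node, 0.0) / total for node in nodes}

    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(max_iter):
        # Assumption: rank from dangling nodes is redistributed along teleport
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping + damping * dangling) * teleport[node]
            for node in nodes
        }
        for source, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        converged = sum(abs(new_rank[node] - rank[node]) for node in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "C")]
ranks = personalized_pagerank(edges, nodes, {"A": 1, "B": 2.4})
for node, value in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{node}: {value:.6f}")
```

The sketch raises `ValueError` when none of the personalization keys is a node of the graph, which is a useful reminder that the keys must match the node labels of the graph actually passed to `nx.pagerank`.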

## Example 4: Execute PageRank Mutation Algorithm

Run the PageRank mutation algorithm against the dataset on the remote Neptune Analytics instance, writing each node's score to the `rank` property. Then retrieve the first 10 nodes to verify that the results were written to the remote graph.

In [None]:
r = nx.pagerank(g, backend="neptune", write_property="rank")

logger.info("Algorithm execution - Neptune Analytics: ")
for item in na_graph.get_all_nodes()[:10]:
    logger.info(Node.from_neptune_response(item))

### Example 4.1: Export to S3 Bucket

Because the mutation algorithm writes its results directly to the designated property without returning a value, we recommend exporting the graph to S3 afterward so that the mutated graph's state is preserved.

In [None]:
# Export - blocking
nx.config.backends.neptune.s3_iam_role = "<your-role>"
nx.config.backends.neptune.export_s3_bucket = "<your-s3-bucket>/export"

r = nx.pagerank(g, backend="neptune", write_property="rank")

logger.info("Algorithm execution - Neptune Analytics: ")
for item in na_graph.get_all_nodes()[:10]:
    logger.info(Node.from_neptune_response(item))


## Conclusion

This notebook demonstrated how to run the PageRank algorithm with both standard NetworkX and Neptune Analytics as a backend. For the directed graph in Example 1, the results show that:

1. Node C has the highest PageRank value, which makes sense because it receives links from both B and E.
2. Nodes D and E also have high PageRank values because they are part of the cycle with C.
3. The isolated node X(DCd) has a low PageRank value because no other node links to it.
4. There may be slight differences between the NetworkX and Neptune Analytics implementations.

Using Neptune Analytics as a backend allows you to offload the computation to AWS, which can be beneficial for larger graphs where local computation would be resource-intensive.