# Label Propagation algorithms with Neptune Analytics

This notebook demonstrates how to offload Label Propagation computations from NetworkX to a remote AWS Neptune Analytics instance.

## Setup and Imports

First, let's import the necessary libraries and set up logging.

In [None]:
import networkx as nx
from nx_neptune import NeptuneGraph
from nx_neptune.clients import Node
import logging
import os
import matplotlib.pyplot as plt
from nx_neptune.utils.utils import get_stdout_logger
import requests
import pandas as pd

In [None]:
logger = get_stdout_logger(__name__, [
    'nx_neptune.algorithms.communities.label_propagation',
    'nx_neptune.na_graph',
    'nx_neptune.utils.decorators',
    'nx_neptune.instance_management',
    __name__,
])

# Ignore cache warnings
nx.config.warnings_to_ignore.add("cache")

## Check for Neptune Analytics Graph ID

We need to ensure that the NETWORKX_GRAPH_ID environment variable is set. You can also set it directly in this notebook.

In [None]:
# Read and load graphId from environment variable
graph_id = os.getenv('NETWORKX_GRAPH_ID')

# If not set, you can set it here
if not graph_id:
    # Uncomment and set your Graph ID
    # %env NETWORKX_GRAPH_ID=your-neptune-analytics-graph-id
    # graph_id = os.getenv('NETWORKX_GRAPH_ID')
    print("Warning: Environment Variable NETWORKX_GRAPH_ID is not defined")
    print("You can set it using: %env NETWORKX_GRAPH_ID=your-neptune-analytics-graph-id")
else:
    print(f"Using Neptune Analytics Graph ID: {graph_id}")
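
For scripts that should stop early rather than only print a warning, the check above can be wrapped in a small helper. This is a minimal sketch: `require_graph_id` is a hypothetical name, not part of nx_neptune, and `g-example` is a placeholder value.

```python
import os


def require_graph_id(env=None, var_name="NETWORKX_GRAPH_ID"):
    """Return the graph ID from the environment, or raise if it is missing.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    graph_id = (env.get(var_name) or "").strip()
    if not graph_id:
        raise RuntimeError(f"Environment variable {var_name} is not defined")
    return graph_id


# Example with an explicit mapping instead of the real environment
print(require_graph_id({"NETWORKX_GRAPH_ID": "g-example"}))
```

Passing a mapping explicitly keeps the helper easy to test without touching the real environment.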

## Download and configure Air route dataset

Next, download the air route dataset for testing and load it into a NetworkX graph.

In [None]:
# Download routes data
routes_url = "https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat"
routes_file = "resources/notebook_test_data_routes.dat"

# Ensure the directory exists
os.makedirs(os.path.dirname(routes_file), exist_ok=True)

# Download only if file doesn't exist
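if not os.path.isfile(routes_file):
    response = requests.get(routes_url, timeout=30)
    response.raise_for_status()  # fail loudly instead of saving an error page
    with open(routes_file, "wb") as f:
        f.write(response.content)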


cols = [
    "airline", "airline_id", "source_airport", "source_airport_id",
    "dest_airport", "dest_airport_id", "codeshare", "stops", "equipment"
]

routes_df = pd.read_csv(routes_file, names=cols, header=None)

air_route_graph = nx.Graph()  # use DiGraph for directed air routes

for _, row in routes_df.iterrows():
    src = row["source_airport"]
    dst = row["dest_airport"]
    if pd.notnull(src) and pd.notnull(dst):
        air_route_graph.add_edge(src, dst)
logger.info(f'Populated test dataset with nodes:{air_route_graph.number_of_nodes()} and edges:{air_route_graph.number_of_edges()}')
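
To make the graph construction concrete, here is a small sketch that builds the same kind of undirected edge set from a few rows in the `routes.dat` column layout, using only the standard library. The sample rows are made up for illustration, and `edges_from_routes` is a hypothetical helper.

```python
import csv
import io

# Illustrative rows in the routes.dat column layout (not real data)
SAMPLE = """\
XX,1,AAA,10,BBB,20,,0,320
XX,1,BBB,20,AAA,10,,0,320
YY,2,BBB,20,CCC,30,Y,0,738
YY,2,,\\N,CCC,30,,0,738
"""


def edges_from_routes(text):
    """Collect undirected (source, destination) airport pairs."""
    edges = set()
    for row in csv.reader(io.StringIO(text)):
        src, dst = row[2], row[4]
        if src and dst:  # skip rows with a missing airport code
            # frozenset makes A->B and B->A the same undirected edge
            edges.add(frozenset((src, dst)))
    return edges


edges = edges_from_routes(SAMPLE)
print(len(edges))
```

As with `nx.Graph`, routes flown in both directions collapse into a single edge.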

### Example 1: Label Propagation Communities

Let's start by running the label_propagation_communities() algorithm against the air route data, then list the first 3 members of each of the 5 largest communities.

In [None]:
result_na = nx.community.label_propagation_communities(air_route_graph, backend="neptune")
logger.info("Algorithm execution - Neptune Analytics: ")

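# Sort communities by size, largest first
sorted_lists = sorted(result_na, key=len, reverse=True)
# Log the first 3 members of each of the 5 largest communities
for i, lst in enumerate(sorted_lists[:5], 1):
    # list() allows slicing whether a community is a set or a list
    logger.info(f"Top {i}: {list(lst)[:3]}")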
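
The same sort-and-sample step is repeated in the examples that follow, so it can be factored into a helper. This is a minimal sketch: `top_communities` is a hypothetical name, and it converts each community to a sorted list so that it works whether communities arrive as sets or lists.

```python
def top_communities(communities, top=5, sample=3):
    """Return (size, first `sample` members) for the `top` largest communities."""
    ordered = sorted((sorted(c) for c in communities), key=len, reverse=True)
    return [(len(c), c[:sample]) for c in ordered[:top]]


# Toy input for illustration; real input would be the algorithm result
demo = [{"JFK", "LHR"}, {"SYD", "MEL", "AKL", "BNE"}, ["NRT"]]
for size, members in top_communities(demo, top=2):
    print(size, members)
```

Sorting the members first makes the sample stable across runs, which a bare slice of a set would not guarantee.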

### Example 2: Fast Label Propagation Communities

Execute **fast_label_propagation_communities()** against the air route dataset on the remote Neptune Analytics instance.

**In Neptune Analytics, label propagation maps all NetworkX variants to the same algorithm, using a fixed label update strategy. Variant-specific control over the update method (e.g., synchronous vs. asynchronous) is not configurable.**

Reference: https://docs.aws.amazon.com/neptune-analytics/latest/userguide/label-propagation.html
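
To illustrate what "label update strategy" refers to, the following is a pure-Python sketch of asynchronous label propagation, where each node adopts the most common label among its neighbours and updates take effect immediately. It is illustrative only and is not the Neptune Analytics implementation. Breaking ties by the smallest label is a simplification chosen here to keep the example deterministic, and `label_propagation` is a hypothetical name.

```python
import random
from collections import Counter


def label_propagation(adjacency, seed=0, max_rounds=100):
    """Asynchronous label propagation on an undirected adjacency mapping."""
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # every node starts alone
    nodes = list(adjacency)
    for _ in range(max_rounds):
        rng.shuffle(nodes)  # visit nodes in a random order each round
        changed = False
        for node in nodes:
            counts = Counter(labels[n] for n in adjacency[node])
            if not counts:
                continue  # an isolated node keeps its own label
            best = max(counts.values())
            # Tie-break: smallest label among the most frequent ones
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label  # visible to later nodes at once
                changed = True
        if not changed:
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return list(communities.values())


# Two disconnected triangles should settle into two communities
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "d": ["e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
for community in label_propagation(graph):
    print(sorted(community))
```

A synchronous variant would instead compute every node's new label from the previous round's labels before applying any of them.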

In [None]:
result_na = nx.community.fast_label_propagation_communities(air_route_graph, backend="neptune")

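# Sort communities by size, largest first
sorted_lists = sorted(result_na, key=len, reverse=True)
# Log the first 3 members of each of the 5 largest communities
for i, lst in enumerate(sorted_lists[:5], 1):
    # list() allows slicing whether a community is a set or a list
    logger.info(f"Top {i}: {list(lst)[:3]}")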

### Example 3: Async Label Propagation Communities

Execute **asyn_lpa_communities()** against the air route dataset on the remote Neptune Analytics instance.

**In Neptune Analytics, label propagation maps all NetworkX variants to the same algorithm, using a fixed label update strategy. Variant-specific control over the update method (e.g., synchronous vs. asynchronous) is not configurable.**

Reference: https://docs.aws.amazon.com/neptune-analytics/latest/userguide/label-propagation.html

In [None]:
result_na = nx.community.asyn_lpa_communities(air_route_graph, backend="neptune")

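# Sort communities by size, largest first
sorted_lists = sorted(result_na, key=len, reverse=True)
# Log the first 3 members of each of the 5 largest communities
for i, lst in enumerate(sorted_lists[:5], 1):
    # list() allows slicing whether a community is a set or a list
    logger.info(f"Top {i}: {list(lst)[:3]}")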
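
Since the three variants map to the same Neptune Analytics algorithm, it can be useful to check whether two results describe the same partition. Community order and member order carry no meaning, so a comparison should ignore both. This is a minimal sketch with a hypothetical helper name, `same_partition`. Whether two runs actually agree is not something this notebook establishes.

```python
def same_partition(a, b):
    """True if two community collections contain the same groups, ignoring order."""
    return {frozenset(c) for c in a} == {frozenset(c) for c in b}


# Toy partitions for illustration
first = [{"AAA", "BBB"}, {"CCC"}]
second = [["CCC"], ["BBB", "AAA"]]
print(same_partition(first, second))
```

To use it with the examples above, store each result under its own name instead of reusing `result_na`.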

### Example 4: Execute Label Propagation Communities Mutation Algorithm

Run the mutation variant of Label Propagation Communities against the air route dataset on the remote Neptune Analytics instance by passing `write_property`. Then retrieve the first 10 nodes to verify that the result was written to each node under the property name 'communities'.

In [None]:
result_na = nx.community.label_propagation_communities(air_route_graph, backend="neptune", write_property="communities")
logger.info("Algorithm execution - Neptune Analytics: ")

# List the first 10 nodes
nx_graph = NeptuneGraph.from_config(graph=air_route_graph)
for item in nx_graph.get_all_nodes()[:10]:
    logger.info(Node.from_neptune_response(item))
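
Once the property is written, nodes can be regrouped by its value to recover the communities. The sketch below assumes each node has already been reduced to a plain `(node_id, properties)` pair. That shape is hypothetical and is not the nx_neptune response format, and the node IDs and community values are made up.

```python
from collections import defaultdict


def group_by_property(nodes, prop="communities"):
    """Group node IDs by the value stored under `prop`.

    `nodes` is assumed to be an iterable of (node_id, properties_dict) pairs.
    """
    groups = defaultdict(set)
    for node_id, props in nodes:
        if prop in props:  # skip nodes where the property was not written
            groups[props[prop]].add(node_id)
    return dict(groups)


# Made-up nodes and community values for illustration
demo_nodes = [
    ("AAA", {"communities": 7}),
    ("BBB", {"communities": 7}),
    ("CCC", {"communities": 9}),
    ("DDD", {}),
]
print(group_by_property(demo_nodes))
```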