# Neptune Analytics Instance Management With S3 Table Projections

This notebook uses the `SessionManager` to create projections from an S3 Tables datalake and load them into Neptune Analytics through S3. We then use the Louvain algorithm to find potentially fraudulent nodes and export the mutated graph back to S3 for the datalake.

This notebook demonstrates how to:
1. Create a projection from an S3 Tables bucket.
2. Import the projection into Neptune Analytics.
3. Run the Louvain algorithm on the provisioned instance to create communities.
4. Export the graph back into the S3 Tables bucket.

## Setup

Import the necessary libraries and set up logging.

In [None]:
import logging
import sys
import os
from pprint import pprint

import dotenv

dotenv.load_dotenv()

from nx_neptune.session_manager import SessionManager

In [None]:
# Configure logging to see detailed information about the instance creation process
logging.basicConfig(
    level=logging.INFO,
    format='%(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    stream=sys.stdout  # Explicitly set output to stdout
)
# Set the log level for the nx_neptune modules (switch to logging.DEBUG for more detail)
for logger_name in [
    'nx_neptune.instance_management',
    'nx_neptune.session_manager',
]:
    logging.getLogger(logger_name).setLevel(logging.INFO)
logger = logging.getLogger(__name__)

## Configuration

Check for environment variables necessary for the notebook.

In [None]:
def check_env_vars(var_names):
    values = {}
    for var_name in var_names:
        value = os.getenv(var_name)
        if not value:
            print(f"Warning: Environment Variable {var_name} is not defined")
            print(f"You can set it using: %env {var_name}=your-value")
        else:
            print(f"Using {var_name}: {value}")
        values[var_name] = value
    return values
    
# Check the environment variables used by this notebook
env_vars = check_env_vars([
    'NETWORKX_S3_IMPORT_BUCKET_PATH',
    'NETWORKX_S3_EXPORT_BUCKET_PATH',
    'NETWORKX_S3_TABLES_CATALOG',
    'NETWORKX_S3_TABLES_DATABASE',
    'NETWORKX_S3_TABLES_TABLENAME',
])

# Get environment variables
s3_location_import = os.getenv('NETWORKX_S3_IMPORT_BUCKET_PATH')
s3_location_export = os.getenv('NETWORKX_S3_EXPORT_BUCKET_PATH')
s3_tables_catalog = os.getenv('NETWORKX_S3_TABLES_CATALOG')
s3_tables_database = os.getenv('NETWORKX_S3_TABLES_DATABASE')
s3_tables_tablename = os.getenv('NETWORKX_S3_TABLES_TABLENAME')
session_name = "nx-athena-test-full"
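
The check above only prints a warning, so a missing variable tends to surface later as a less obvious failure. A minimal sketch of a stricter variant follows. `require_env_vars` is a hypothetical helper, not part of `nx_neptune`, and the bucket path in the example is a placeholder.

```python
import os


def require_env_vars(var_names, environ=None):
    """Return the requested variables as a dict, raising if any is unset or empty.

    Hypothetical helper for illustration; it is not part of nx_neptune.
    """
    environ = os.environ if environ is None else environ
    values = {name: environ.get(name) for name in var_names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values


# Example using an explicit mapping instead of the real environment
example_env = {"NETWORKX_S3_IMPORT_BUCKET_PATH": "s3://example-bucket/import/"}
example_values = require_env_vars(["NETWORKX_S3_IMPORT_BUCKET_PATH"], example_env)
print(example_values)
```

Passing `environ` explicitly keeps the helper easy to test; calling it without that argument reads from `os.environ`.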

## Create a New or Get an Existing Neptune Analytics Instance

Provision a new Neptune Analytics instance on demand, or retrieve an existing graph. Creating a new instance may take several minutes to complete.

In [None]:
session = SessionManager.session(session_name)
graph_list = session.list_graphs(with_details=False)
print("The following graphs are available:")
pprint(graph_list)

In [None]:
session = SessionManager.session(session_name)
graph = await session.get_or_create_graph(config={"provisionedMemory": 32})
print("Retrieved graph:")
pprint(graph)

## Import Data from S3

Import data from S3 into the Neptune Analytics graph and wait for the operation to complete. <br>
IAM permissions required for import: <br>
 - s3:GetObject, kms:Decrypt, kms:GenerateDataKey, kms:DescribeKey
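
As a rough illustration, the four actions listed above could be expressed as an IAM policy document. The sketch below builds one in Python. The bucket and KMS key ARNs are placeholders, and the split into two statements is an assumption rather than a policy taken from this project; it covers only the actions named above.

```python
import json

# Placeholder ARNs: replace with your own import bucket and KMS key.
IMPORT_BUCKET_ARN = "arn:aws:s3:::example-import-bucket/*"
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"

import_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Read the projected files from the import location
            "Sid": "ReadImportObjects",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": IMPORT_BUCKET_ARN,
        },
        {
            # Use the KMS key that encrypts the import location
            "Sid": "UseImportKey",
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:GenerateDataKey", "kms:DescribeKey"],
            "Resource": KMS_KEY_ARN,
        },
    ],
}

print(json.dumps(import_policy, indent=2))
```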

In [6]:
SOURCE_AND_DESTINATION_BANK_CUSTOMERS = f"""
SELECT DISTINCT "~id", 'customer' AS "~label"
FROM (
     SELECT "nameOrig" as "~id"
     FROM {s3_tables_tablename}
     WHERE "nameOrig" IS NOT NULL
     UNION ALL
     SELECT "nameDest" as "~id"
     FROM {s3_tables_tablename}
     WHERE "nameDest" IS NOT NULL
);"""

BANK_TRANSACTIONS = f"""
SELECT
    "nameOrig" as "~from",
    "nameDest" as "~to",
    "type" AS "~label",
    "step" AS "step:Int",
    "amount" AS "amount:Float",
    "oldbalanceOrg" AS "oldbalanceOrg:Float",
    "newbalanceOrig" AS "newbalanceOrig:Float",
    "oldbalanceDest" AS "oldbalanceDest:Float",
    "newbalanceDest" AS "newbalanceDest:Float",
    "isFraud" AS "isFraud:Int",
    "isFlaggedFraud" AS "isFlaggedFraud:Int"
FROM {s3_tables_tablename}
WHERE "nameOrig" IS NOT NULL AND "nameDest" IS NOT NULL"""

await session.import_from_table(
    graph["id"],
    s3_location_import,
    [SOURCE_AND_DESTINATION_BANK_CUSTOMERS, BANK_TRANSACTIONS],
    catalog=s3_tables_catalog,
    database=s3_tables_database
)

INFO - Importing to graph g-d6ucxnv1f3
INFO - Creating table using statement:
SELECT DISTINCT "~id", 'customer' AS "~label"
FROM (
     SELECT "nameOrig" as "~id"
     FROM transactions
     WHERE "nameOrig" IS NOT NULL
     UNION ALL
     SELECT "nameDest" as "~id"
     FROM transactions
     WHERE "nameDest" IS NOT NULL
);
INFO - Executing query: 9c61c8e0-1dbd-4328-9bf8-b3d36142f100
INFO - Creating table using statement:
SELECT
    "nameOrig" as "~from",
    "nameDest" as "~to",
    "type" AS "~label",
    "step" AS "step:Int",
    "amount" AS "amount:Float",
    "oldbalanceOrg" AS "oldbalanceOrg:Float",
    "newbalanceOrig" AS "newbalanceOrig:Float",
    "oldbalanceDest" AS "oldbalanceDest:Float",
    "newbalanceDest" AS "newbalanceDest:Float",
    "isFraud" AS "isFraud:Int",
    "isFlaggedFraud" AS "isFlaggedFraud:Int"
FROM transactions
WHERE "nameOrig" IS NOT NULL AND "nameDest" IS NOT NULL
INFO - Executing query: 9a0634d3-7bce-4647-8378-76d08af73d2f
INFO - [2025-12-30 15:34:42] T

ValueError: Invalid ARN format for '9c61c8e0-1dbd-4328-9bf8-b3d36142f100': Provided ARN: 9c61c8e0-1dbd-4328-9bf8-b3d36142f100 must be of the format: arn:partition:service:region:account:resource
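
To see the shape of the two projections without an AWS account, the queries can be tried against a small in-memory table. The sketch below uses SQLite from the Python standard library as a stand-in for Athena, with invented sample rows and only a subset of the columns, so it illustrates the vertex and edge layout rather than the actual import path.

```python
import sqlite3

# Hypothetical sample rows; the real table is queried through Athena.
rows = [
    ("C1", "C2", "TRANSFER", 1, 100.0),
    ("C2", "C3", "PAYMENT", 1, 25.5),
    ("C1", "C3", "TRANSFER", 2, 10.0),
    (None, "C4", "PAYMENT", 2, 5.0),  # no origin account
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE transactions '
    '("nameOrig" TEXT, "nameDest" TEXT, "type" TEXT, "step" INTEGER, "amount" REAL)'
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)

# Vertex projection: one row per distinct account seen as origin or destination
vertex_sql = """
SELECT DISTINCT "~id", 'customer' AS "~label"
FROM (
     SELECT "nameOrig" AS "~id" FROM transactions WHERE "nameOrig" IS NOT NULL
     UNION ALL
     SELECT "nameDest" AS "~id" FROM transactions WHERE "nameDest" IS NOT NULL
)
ORDER BY "~id"
"""

# Edge projection: one row per transaction with both endpoints present
edge_sql = """
SELECT "nameOrig" AS "~from", "nameDest" AS "~to", "type" AS "~label",
       "step" AS "step:Int", "amount" AS "amount:Float"
FROM transactions
WHERE "nameOrig" IS NOT NULL AND "nameDest" IS NOT NULL
ORDER BY "step", "~from", "~to"
"""

vertices = conn.execute(vertex_sql).fetchall()
edge_cursor = conn.execute(edge_sql)
edge_columns = [column[0] for column in edge_cursor.description]
edge_rows = edge_cursor.fetchall()
conn.close()

print("vertices:", vertices)
print("edge columns:", edge_columns)
print("edges:", edge_rows)
```

Note that the row with a missing origin still contributes its destination account to the vertex projection, while the edge projection drops the row entirely, so such an account becomes a vertex without that edge.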

## Execute Louvain Algorithm

Create a NetworkX graph and initialize the connection to the Neptune Analytics instance.

We will run the Louvain community detection algorithm and mutate the graph, storing each vertex's community in the `community` property.

You can see the results in the console by removing the `write_property` argument.

In [None]:
from nx_neptune import NeptuneGraph, set_config_graph_id

config = set_config_graph_id(graph["id"])
na_graph = NeptuneGraph.from_config(config)

# sanity check: print out 10 vertices and edges from the Neptune Analytics graph
ALL_NODES = "MATCH (n) RETURN n LIMIT 10"
all_nodes = na_graph.execute_call(ALL_NODES)
print(f"all nodes: {all_nodes}")

ALL_EDGES = "MATCH ()-[r]-() RETURN r LIMIT 10"
all_edges = na_graph.execute_call(ALL_EDGES)
print(f"all edges: {all_edges}")

In [None]:
import networkx as nx

nx.config.backends.neptune.graph_id = graph["id"]

# using Neptune Analytics, run the Louvain Community Detection Algorithm and mutate
# the graph storing the results of the vertex community in the "community" property
result = nx.community.louvain_communities(nx.Graph(), backend="neptune", write_property="community")
print(f"louvain result: \n{result}")
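
Louvain works by greedily maximizing modularity, a score that rewards partitions with many edges inside communities and few between them. The sketch below is not the Neptune Analytics implementation. It is a plain-Python illustration of that score on a hypothetical six-node graph, assuming an undirected, unweighted graph without self-loops.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of: L_c / m - (d_c / (2 * m)) ** 2
    where m is the edge count, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    community_of = {
        node: index for index, members in enumerate(communities) for node in members
    }
    internal_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        # Each edge adds one to the degree of both endpoints
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge (c-d)
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]

two_triangles = modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}])
one_group = modularity(edges, [{"a", "b", "c", "d", "e", "f"}])
print(f"two communities: {two_triangles:.4f}")
print(f"one community:   {one_group:.4f}")
```

Splitting the graph at the bridge scores higher than keeping all six nodes in one group, which is the kind of partition a modularity-maximizing method favors.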


## Export the Neptune Analytics data and add it to S3 Tables as an Iceberg table

Export the Neptune Analytics graph as CSV and convert it to Iceberg format. Athena is used to add the result to the S3 Tables bucket.

In [None]:
# for the CSV table
csv_catalog = 'AwsDataCatalog'
csv_database = 'bank_fraud_full'
csv_table_name = 'transactions_csv'

# for the iceberg table
iceberg_vertices_table_name = 'customers_updated'
iceberg_edges_table_name = 'transactions_updated'
iceberg_catalog = 's3tablescatalog/nx-fraud-detection-data'
iceberg_database = 'bank_fraud_full'

await session.export_to_table(
    graph["id"],
    s3_location_export,
    csv_table_name,
    csv_catalog,
    csv_database,
    iceberg_vertices_table_name,
    iceberg_edges_table_name,
    iceberg_catalog,
    iceberg_database
)

## Conclusion

This notebook demonstrated the complete lifecycle of running analytics on a datalake projection in a Neptune Analytics instance:

1. **Creation**: We created a new Neptune Analytics instance on demand, or retrieved an existing one
2. **Import**: We imported a projection of the datalake
3. **Usage**: We ran a graph algorithm (Louvain) on the instance and mutated the data
4. **Export**: We exported the updated data back into the datalake as an Iceberg table

The session manager (`SessionManager`) wraps these steps (graph provisioning, projection import, and export) behind a single interface.