# Neptune Analytics S3 Import/Export Demo

This notebook demonstrates how to:
1. Import data from an S3 bucket into a Neptune Analytics graph
2. Export graph data to an S3 bucket

The notebook uses boto3 to interact with the Neptune Analytics API and includes functions to wait for operations to complete before proceeding.

## Setup

Import the necessary libraries and set up logging.

In [None]:
import asyncio
import logging
import sys
import os

import boto3
import networkx as nx
from nx_neptune import NeptuneGraph, import_csv_from_s3, export_csv_to_s3
import matplotlib.pyplot as plt

In [None]:
logging.basicConfig(
    level=logging.WARNING,
    format='%(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    stream=sys.stdout,  # Explicitly send log output to stdout
)
for logger_name in ['IAMClient', 'nx_neptune.clients.instance_management']:
    logging.getLogger(logger_name).setLevel(logging.DEBUG)
logger = logging.getLogger(__name__)

## Configuration

Set up the necessary environment variables for S3 bucket location and IAM role ARN. You can either set these as environment variables before starting the notebook or define them directly here.

In [None]:
def check_env_vars(var_names):
    values = {}
    for var_name in var_names:
        value = os.getenv(var_name)
        if not value:
            print(f"Warning: Environment Variable {var_name} is not defined")
            print(f"You can set it using: %env {var_name}=your-value")
        else:
            print(f"Using {var_name}: {value}")
        values[var_name] = value
    return values
    
env_vars = check_env_vars([
    'NETWORKX_S3_IMPORT_BUCKET_PATH',
    'NETWORKX_S3_EXPORT_BUCKET_PATH',
    'NETWORKX_ARN_IAM_ROLE',
    'NETWORKX_GRAPH_ID'
])

# Read the values from the environment, or replace these calls with literal values.
# Expected format: s3://BUCKET_NAME/FOLDER_NAME
s3_location_import = os.getenv('NETWORKX_S3_IMPORT_BUCKET_PATH')
s3_location_export = os.getenv('NETWORKX_S3_EXPORT_BUCKET_PATH')
# Expected format: arn:aws:iam::AWS_ACCOUNT:role/IAM_ROLE_NAME
role_arn = os.getenv('NETWORKX_ARN_IAM_ROLE')
# Optional: ID of an existing Neptune Analytics graph
graph_id = os.getenv('NETWORKX_GRAPH_ID')
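
Because an unset or malformed variable only produces a warning here, a bad path may not surface until an import or export job fails. A minimal sketch of an up-front format check follows. `parse_s3_uri` and `looks_like_role_arn` are hypothetical helpers, not part of `nx_neptune`, and the patterns only approximate the `s3://BUCKET_NAME/FOLDER_NAME` and `arn:aws:iam::AWS_ACCOUNT:role/IAM_ROLE_NAME` shapes noted in the comments above.

```python
import re
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split 's3://bucket/prefix' into (bucket, prefix); raise ValueError otherwise."""
    parsed = urlparse(uri or "")
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected s3://BUCKET_NAME/FOLDER_NAME, got: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Assumption: a 12-digit account ID followed by role/<name>.
# Partitions other than "aws" are not covered by this pattern.
_ROLE_ARN = re.compile(r"^arn:aws:iam::\d{12}:role/.+$")


def looks_like_role_arn(arn):
    """Return True if `arn` roughly matches the IAM role ARN shape."""
    return bool(arn) and _ROLE_ARN.match(arn) is not None
```

In this notebook, such a check could be applied as `parse_s3_uri(s3_location_import)` before starting a job.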



## Initialize Neptune Graph

Create a NetworkX graph and initialize the Neptune Analytics graph.

In [None]:
# Initialize a directed graph
g = nx.DiGraph()

# Create a Neptune Analytics graph instance
na_graph = NeptuneGraph.from_config(graph=g)
BACKEND = "neptune"

## Import Data from S3 (Blocking)

Import data from S3 into the Neptune Analytics graph and wait for the operation to complete. <br>
IAM permissions required for import: <br>
 - s3:GetObject, kms:Decrypt, kms:GenerateDataKey, kms:DescribeKey
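
As an illustration, the import permissions listed above could be written as an IAM policy document along the following lines. This is a sketch only: `build_import_policy` is a hypothetical helper, the bucket name and KMS key ARN are placeholders, and the resource scoping should be adapted to your own account.

```python
import json


def build_import_policy(bucket_name, kms_key_arn):
    """Return an IAM policy document (as a dict) granting the import permissions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey", "kms:DescribeKey"],
                "Resource": kms_key_arn,
            },
        ],
    }


# Placeholder values for illustration only.
policy = build_import_policy(
    "BUCKET_NAME", "arn:aws:kms:REGION:AWS_ACCOUNT:key/KEY_ID"
)
print(json.dumps(policy, indent=2))
```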

In [None]:
future = import_csv_from_s3(na_graph, s3_location_import)
# Awaiting the future blocks this cell until the import job finishes
import_blocking_status = await future
print(f"Import completed with status: {import_blocking_status}")

## Import Data from S3 (Non-blocking)

Import data from S3 into the Neptune Analytics graph while performing other operations.
Here the import is not awaited immediately, so you can run other Python workloads, check the job status periodically, and continue once the job completes.

In [None]:
future = import_csv_from_s3(na_graph, s3_location_import)
# Carry on with other (non-Neptune Analytics) Python workloads here
# ...

# Check the job status periodically
while not future.done():
    print("Simulating a local analytics workload")
    await asyncio.sleep(60)
print("Import completed asynchronously.")
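
The polling pattern can be tried without a Neptune Analytics graph. In the sketch below, `fake_import_job` is a hypothetical stand-in for `import_csv_from_s3`, and the `"SUCCEEDED"` status and the delays are made up. It assumes the returned object behaves like an asyncio future, as the `await` and `.done()` usage in this notebook suggests.

```python
import asyncio


async def fake_import_job(delay_seconds):
    """Hypothetical stand-in for a long-running import job."""
    await asyncio.sleep(delay_seconds)
    return "SUCCEEDED"  # made-up status value


async def poll_until_done(poll_interval=0.05):
    # Schedule the job without awaiting it, so control returns immediately.
    future = asyncio.ensure_future(fake_import_job(0.2))
    checks = 0
    while not future.done():
        checks += 1  # local work would go here
        await asyncio.sleep(poll_interval)  # yield so the job can make progress
    # result() returns the job's value, or re-raises the exception it failed with.
    return future.result(), checks


status, checks = asyncio.run(poll_until_done())
print(f"Job finished with status {status} after {checks} checks")
```

`asyncio.run` is used so the sketch runs as a script; inside a notebook cell, where an event loop is already running, `await poll_until_done()` would be used instead. Note that the loop yields with `await asyncio.sleep(...)`: a blocking `time.sleep` would stall the event loop and could prevent an asyncio-driven job from progressing.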

## BFS Execution

Run a breadth-first search on the imported air-routes data using the Neptune Analytics backend.

In [None]:
# BFS on the air-routes data
r = list(nx.bfs_edges(g, source="48", backend=BACKEND))
print('BFS on Neptune Analytics with source=48 (Vancouver International Airport):')
print(f"Total size of the result: {len(r)}")
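
To make the shape of the result concrete, the sketch below reproduces what NetworkX's `bfs_edges` yields, a sequence of `(parent, child)` tree edges in breadth-first order, using plain Python on a made-up toy graph. `bfs_edges_plain` and `toy_routes` are hypothetical names, and this is not how the Neptune Analytics backend computes its result.

```python
from collections import deque


def bfs_edges_plain(adjacency, source):
    """Yield (parent, child) edges in breadth-first order from `source`."""
    visited = {source}
    queue = deque([source])
    while queue:
        parent = queue.popleft()
        for child in adjacency.get(parent, []):
            if child not in visited:
                visited.add(child)
                queue.append(child)
                yield parent, child


# Made-up toy graph; node IDs are strings, as with source="48" above.
toy_routes = {"48": ["1", "2"], "1": ["3"], "2": ["3"], "3": []}
edges = list(bfs_edges_plain(toy_routes, "48"))
print(edges)  # [('48', '1'), ('48', '2'), ('1', '3')]
```

Each reachable node other than the source appears exactly once as a child, so the length of the list equals the number of nodes reachable from the source.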


## Export Data to S3 (Blocking)

Export data from the Neptune Analytics graph to S3 and wait for the operation to complete. <br>
After the job completes, a new folder named after the job ID is created under the S3 path you specified; it contains the exported files.

IAM permissions required for export: <br>
 - s3:PutObject, kms:Decrypt, kms:GenerateDataKey, kms:DescribeKey


In [None]:
future = export_csv_to_s3(na_graph, s3_location_export)
await future
print(f"Export completed with export location: {s3_location_export}")
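
Because the exported files land in a per-job folder rather than directly under the export path, it can help to compute that prefix explicitly. The sketch below assumes the folder name is exactly the job ID. `export_prefix` and `list_exported_keys` are hypothetical helpers, and `"JOB_ID"` is a placeholder: this notebook does not show how to obtain the job ID from `nx_neptune`. The listing uses the boto3 S3 client's `list_objects_v2` call.

```python
from urllib.parse import urlparse


def export_prefix(s3_location, job_id):
    """Return (bucket, key_prefix) for the per-job folder under the export path."""
    parsed = urlparse(s3_location)
    base = parsed.path.strip("/")
    prefix = f"{base}/{job_id}/" if base else f"{job_id}/"
    return parsed.netloc, prefix


def list_exported_keys(s3_location, job_id):
    """List object keys under the per-job folder (first page only, up to 1000 keys)."""
    import boto3  # imported lazily so the path helper works without AWS access

    bucket, prefix = export_prefix(s3_location, job_id)
    response = boto3.client("s3").list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


# "JOB_ID" is a placeholder, not a real export job identifier.
print(export_prefix("s3://BUCKET_NAME/FOLDER_NAME", "JOB_ID"))
```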

## Export Data to S3 (Non-blocking)

Export data from the Neptune Analytics graph to S3 while performing other operations.

In [None]:
future = export_csv_to_s3(na_graph, s3_location_export)
# Check the job status periodically
while not future.done():
    print("Simulating a local analytics workload")
    await asyncio.sleep(60)
print("Export completed asynchronously.")
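
A polling loop like this one waits for as long as the job takes. One way to bound the wait is the standard library's `asyncio.wait_for`, sketched below with `fake_export_job` as a hypothetical stand-in for `export_csv_to_s3`. The delays and the `"done"` value are made up, and the sketch again assumes an asyncio-compatible future.

```python
import asyncio


async def fake_export_job(delay_seconds):
    """Hypothetical stand-in for a long-running export job."""
    await asyncio.sleep(delay_seconds)
    return "done"  # made-up result value


async def wait_with_timeout(delay_seconds, timeout_seconds):
    try:
        # wait_for cancels the awaited job if the timeout expires.
        return await asyncio.wait_for(
            fake_export_job(delay_seconds), timeout=timeout_seconds
        )
    except asyncio.TimeoutError:
        return None


fast = asyncio.run(wait_with_timeout(0.05, 1.0))  # finishes within the timeout
slow = asyncio.run(wait_with_timeout(1.0, 0.05))  # times out
print(fast, slow)
```

Cancelling the local awaitable does not necessarily stop a job already running on the service side; this notebook does not establish what `nx_neptune` does in that case. If the local future should keep running after a timeout, wrapping it in `asyncio.shield` is one option.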

## Conclusion

This notebook demonstrated how to import data from S3 into a Neptune Analytics graph and export data from the graph to S3. Both blocking and non-blocking approaches were shown, allowing you to choose the most appropriate method for your workflow.