Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

# Implementing Time-to-Live in Amazon Neptune

Time-to-Live (TTL) determines the lifespan of an object, whether that object is data, a resource, a file, or even a whole environment. Common uses for TTL include caching data for user sessions, security auditing, compliance, and removing data that is no longer relevant after a certain amount of time. In our example, we use TTL to determine when a particular node or edge should be removed.

The following examples will walk you through how to test the deployed TTL setup.

### Configuring neptune_python_utils

Before we begin, we'll need to fetch and install `neptune-python-utils`. [`neptune-python-utils`](https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-python-utils) is a Python 3 library that simplifies using Gremlin-Python to connect to Amazon Neptune. The library makes it easy to configure your driver to support IAM DB Authentication, create sessioned interactions with Neptune, and write data to Amazon Neptune from AWS Glue jobs.

In [None]:
!git clone https://github.com/awslabs/amazon-neptune-tools.git

In [None]:
%%bash

amt_dir=$(pwd)
amt_dir+="/amazon-neptune-tools/neptune-python-utils/neptune_python_utils/"
npu_dir=$(python -c 'import site; print(site.getsitepackages()[0])')
npu_dir+="/neptune_python_utils/"
echo "Copying from $amt_dir to $npu_dir"
cp -r "$amt_dir" "$npu_dir"
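
Before moving on, it can help to confirm that the copy succeeded. The following is a minimal sketch, not part of the original setup, that checks whether a module can be found on the current import path. The helper name `is_installed` is hypothetical.

```python
import importlib
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located on the import path."""
    # Refresh the import system's caches so freshly copied packages are seen.
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


print("neptune_python_utils importable:", is_installed("neptune_python_utils"))
```

If this prints `False`, re-check the source and destination paths echoed by the previous cell.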


### Preparing sample data

Run the following to set up a Gremlin connection to your Neptune cluster. It also defines example functions that will be used to create your graph with associated TTL values.

In [None]:
import os
import json
import sys
import subprocess
import boto3
import time
import random
import math
from neptune_python_utils.endpoints import Endpoints
from neptune_python_utils.gremlin_utils import GremlinUtils
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.aiohttp.transport import AiohttpTransport
from gremlin_python.process.traversal import *

# get neptune endpoint from environment vars
stream = os.popen("source ~/.bashrc ; echo $GRAPH_NOTEBOOK_HOST; echo $GRAPH_NOTEBOOK_PORT ; echo $NEPTUNE_TTL_PROPERTY_NAME ; echo $AWS_REGION;")
nep_settings = stream.read().split("\n")

endpoints = Endpoints(
        neptune_endpoint=nep_settings[0], 
        region_name=nep_settings[3])

# create gremlin connection, get traversal object
GremlinUtils.init_statics(globals())
gremlin_utils = GremlinUtils(endpoints)

conn = gremlin_utils.remote_connection()
g = gremlin_utils.traversal_source(connection=conn)

TTL_PROP_NAME = nep_settings[2]
LABEL='TTL-test'
ID_PREFIX='ttl_'
PROP1='prop1'
PROP1_VAL='xyz'
PROP2='prop2'
PROP2_VAL='abc'

EDGE_LABEL='linkedTo'

'''
Create a vertex with label LABEL and id ID_PREFIX + str(idx + id_offset).
It has props PROP1 and PROP2.
To set TTL:
   If ttl_offset >= 0, set prop TTL_PROP_NAME to the current time + ttl_offset.
   Otherwise, set prop "no" + TTL_PROP_NAME to 0, so the vertex carries no TTL property.
'''
def create_vertex(g, idx, id_offset, ttl_offset):

    vid = ID_PREFIX + str(idx + id_offset)
    
    ttl_prop = TTL_PROP_NAME
    ttl = int(time.time()) + ttl_offset
    if ttl_offset < 0:
        ttl_prop = "no" + TTL_PROP_NAME
        ttl = 0
        
    try:
        g.addV(LABEL).property(T.id, vid).property('testBatch', id_offset) \
            .property(PROP1, PROP1_VAL).property(PROP2, PROP2_VAL) \
            .property(ttl_prop, ttl).next()
        #print("Created vertex " + vid +  " ttl " + str(ttl))
    except Exception as ex:
        template = "An exception of type {0} occurred. Arguments:\n{1!r}"
        message = template.format(type(ex).__name__, ex.args)
        print("Exception processing Neptune object " + vid + " exception " + message)
        raise
      
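'''
Create edges with label EDGE_LABEL from one source vertex to a range of target vertices.
Each edge has props PROP1 and PROP2 and can have a TTL, which works the same as for a vertex.
You pass a single source vertex and a range of target vertices (min_tgt_idx up to, but not including, max_tgt_idx).
prob_edge is the probability that an edge is created to each target.
The "<src>-<target>" string is used only to identify the edge in error messages.
'''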
def create_edges(g, src_idx, id_offset, min_tgt_idx, max_tgt_idx, prob_edge, ttl_offset):
    # source vertex
    src_vid = ID_PREFIX + str(src_idx + id_offset)
    
    # edge TTL
    ttl_prop = TTL_PROP_NAME
    ttl = int(time.time()) + ttl_offset
    if ttl_offset < 0:
        ttl_prop = "no" + TTL_PROP_NAME
        ttl = 0
        
    # for each possible target vertex, create an edge with probability prob_edge
    for e in range(min_tgt_idx, max_tgt_idx):
        if (e - min_tgt_idx) % 1000 == 0:
            print(str(e - min_tgt_idx))

        if random.random() > (1.0 - prob_edge):
            tgt_vid = ID_PREFIX + str(e + id_offset)
            tgt_node = g.V(tgt_vid).toList()[-1]
            edge_id = src_vid + "-" + tgt_vid
            try:
                g.V(src_vid).addE(EDGE_LABEL).to(tgt_node).property('testBatch', id_offset) \
                    .property(PROP1, PROP1_VAL).property(PROP2, PROP2_VAL) \
                    .property(ttl_prop, ttl).next()
                #print("Edge " + edge_id + " ttl " + str(ttl))

            except Exception as ex:
                template = "An exception of type {0} occurred. Arguments:\n{1!r}"
                message = template.format(type(ex).__name__, ex.args)
                print("Exception processing Neptune object " + edge_id + " exception " + message)
                raise
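
Both helpers above pick the TTL property name and value in the same way. The sketch below isolates that rule in a pure function so it can be checked without a Neptune connection. The function name `ttl_property` and the property name `"ttl"` are hypothetical and used for illustration only; in the notebook the name comes from `TTL_PROP_NAME`.

```python
import time


def ttl_property(ttl_prop_name, ttl_offset, now=None):
    """Return the (property name, value) pair that the helpers would write."""
    if now is None:
        now = int(time.time())
    if ttl_offset < 0:
        # A negative offset means "no TTL": write a differently named property
        # so the object is not picked up as having a TTL.
        return "no" + ttl_prop_name, 0
    # Otherwise the TTL is an epoch timestamp (in seconds) in the future.
    return ttl_prop_name, now + ttl_offset


print(ttl_property("ttl", 30, now=1_700_000_000))   # ('ttl', 1700000030)
print(ttl_property("ttl", -1, now=1_700_000_000))   # ('nottl', 0)
```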



### Validating an empty graph

Run the following to validate that your graph is currently empty. The expected output is 0.

In [None]:
%%gremlin

g.V().count()

### Create nodes and edges with object TTL

Use the following variables to set the minimum and maximum TTL (in seconds) for your test objects. These values bound the TTL values used for your sample nodes and edges. For example, a `min_ttl` of 30 and a `max_ttl` of 60 means that the following script creates nodes and edges that are set to expire 30 to 60 seconds after they are created.

In [None]:
min_ttl = 30

In [None]:
max_ttl = 60

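Now run the following to add 100 nodes and a random number of edges. Each node receives a random TTL between the minimum and maximum values you specified, measured from the time the cell runs. The exception is the source node, which is created with a fixed TTL of 1000 seconds. All edges share a single random TTL drawn from the same range.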

In [None]:
id_offset = 100000
object_batch = id_offset
start_idx = id_offset
src_node = create_vertex(g, start_idx, id_offset, 1000)
src_graph_size = 100
for i in range(1, src_graph_size):
    src_node = create_vertex(g, start_idx + i, id_offset, random.randint(min_ttl, max_ttl))
create_edges(g, start_idx, id_offset, start_idx + 1, start_idx + src_graph_size, 0.5, random.randint(min_ttl, max_ttl))
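
The number of edges varies from run to run because each of the 99 candidate targets is linked with probability 0.5. As a rough sketch of what to expect, the following reproduces the same sampling rule locally, without touching Neptune. The function name `count_sampled_edges` is hypothetical.

```python
import random


def count_sampled_edges(num_targets, prob_edge, seed=None):
    """Count how many targets would receive an edge under the sampling rule."""
    rng = random.Random(seed)
    # Same test as in create_edges: keep a target when random() > 1 - prob_edge.
    return sum(1 for _ in range(num_targets) if rng.random() > (1.0 - prob_edge))


num_targets = 99   # targets start_idx + 1 .. start_idx + 99
prob_edge = 0.5
print("expected number of edges:", num_targets * prob_edge)
print("one simulated run:", count_sampled_edges(num_targets, prob_edge, seed=42))
```

On average, about half of the candidate targets receive an edge, so a result somewhere around 50 edges is typical.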


Run the following to view the nodes - there should be 100 entries.

In [None]:
%%gremlin

g.V().hasLabel('${LABEL}').has('testBatch', ${object_batch}).elementMap()

Run the following to view the edges - there should be a non-zero number of entries.

In [None]:
%%gremlin

g.E().hasLabel('${EDGE_LABEL}').has('testBatch', ${object_batch}).elementMap()

We can also use the `%stream_viewer` magic to look at the records in Neptune Streams. You should see multiple entries added. Make sure the dropdown option is set to `PropertyGraph`.

In [None]:
%stream_viewer

### Validating dropped objects

Navigate to the [DynamoDB](https://us-east-1.console.aws.amazon.com/dynamodbv2/) console, and select "Explore items" from the left-hand sidebar. Select the table that is prefixed with `NeptuneObject2TTL`. You should see the objects that you added, along with their TTL values. 

Once the objects have been deleted from DynamoDB, run the following query to validate that they were also dropped from Neptune. Note that DynamoDB TTL is a background process, so the time at which an expired item is actually deleted can vary; DynamoDB typically deletes expired items within a few days of expiration. For more details, see [How DynamoDB TTL works](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html).

In [None]:
%%gremlin

g.V().hasLabel('${LABEL}').has('testBatch', ${object_batch}).elementMap()
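
Because the deletion time varies, re-running the query by hand can be tedious. The following is a minimal polling sketch, assuming you supply a function that returns the number of remaining objects. The helper name `wait_until_empty` and its parameters are hypothetical; the demo at the end uses a fake counter so that it runs without a Neptune connection.

```python
import time


def wait_until_empty(count_fn, interval_s=60, timeout_s=3600, sleep=time.sleep):
    """Poll count_fn until it returns 0. Return False if timeout_s elapses first."""
    waited = 0
    while True:
        remaining = count_fn()
        print(f"{remaining} objects remaining after {waited}s")
        if remaining == 0:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Demo with a fake counter standing in for a real query.
fake_counts = iter([3, 1, 0])
print(wait_until_empty(lambda: next(fake_counts), interval_s=1, sleep=lambda s: None))

# In the notebook, with the connection created earlier, the counter could be:
# wait_until_empty(
#     lambda: g.V().hasLabel(LABEL).has('testBatch', object_batch).count().next())
```

Passing `sleep` as a parameter keeps the helper easy to test; in normal use the default `time.sleep` applies.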

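We can also distinguish nodes whose TTL has not yet passed from nodes that have "expired" by comparing the TTL property with the current time. The following query returns nodes whose TTL is still in the future; replace `gt` with `lt` to list nodes whose TTL has passed but which are still present in the graph. The query assumes the TTL property is named `TTLBlog#TTL`; if your `NEPTUNE_TTL_PROPERTY_NAME` value differs, substitute that name.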

In [None]:
import time
current_time = round(time.time())

In [None]:
%%gremlin

g.V().has('TTLBlog#TTL',gt(${current_time}))

You may notice nodes that still exist in the graph even though the current time is past their TTL. This is because the solution relies on [DynamoDB TTL](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html) to expire items, and DynamoDB makes a "best effort" to delete expired items within a few days. To learn more about this process, refer to the [documentation](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html).
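
To make the distinction concrete, the sketch below splits a set of objects into those whose TTL is still in the future and those that are overdue but may not have been deleted yet. The function name `split_by_expiry` and the sample IDs and timestamps are hypothetical; in practice the mapping could be built from the `elementMap()` results shown earlier.

```python
import time


def split_by_expiry(ttl_by_id, now=None):
    """Split object IDs into (live, overdue) sets based on their TTL timestamps."""
    if now is None:
        now = int(time.time())
    # Mirrors the gt() comparison in the query: a TTL strictly greater than
    # the current time means the object has not yet expired.
    live = {obj_id for obj_id, ttl in ttl_by_id.items() if ttl > now}
    overdue = set(ttl_by_id) - live
    return live, overdue


# Hypothetical sample data: object ID -> TTL as an epoch timestamp in seconds.
sample = {"ttl_200001": 1_700_000_030, "ttl_200002": 1_699_999_990}
live, overdue = split_by_expiry(sample, now=1_700_000_000)
print("live:", sorted(live))        # live: ['ttl_200001']
print("overdue:", sorted(overdue))  # overdue: ['ttl_200002']
```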