# Inductive Inference

In this notebook, we go through an example of real-time inductive inference with Amazon Neptune ML. Inductive inference allows customers to enable machine learning (ML) predictions on nodes, edges and properties (entities) that were added to the graph after the ML model training process.

Specifically, we will be doing inductive inference on an unseen news node that we will add to the database. We will also add an edge between the new news node and an existing user node to show that the news was spread by the specified user.

### Setup

In [None]:
# import required libraries
import boto3
import sagemaker
import pandas as pd
import utils.neptune_ml_utils as neptune_ml
# Check to make sure your Neptune cluster is configured to run Neptune ML.
neptune_ml.check_ml_enabled()

### Understanding the existing graph data

First, let's check the status of the Neptune database to make sure it is healthy before we run any Gremlin queries.

In [None]:
%status

#### Overview of nodes and edges
Let's take a look at the nodes and edges, each counted by type.

In [None]:
%%gremlin
g.V().groupCount().by(label).unfold().order().by(keys)

In [None]:
%%gremlin
g.E().groupCount().by(label).unfold().order().by(keys)

We can see that we have the same number of nodes and edges we saw in the previous notebooks.
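
For readers less familiar with Gremlin, the counting queries above can be pictured in plain Python. This is only a rough analogue: the label list is hypothetical and stands in for whatever labels the graph actually contains.

```python
from collections import Counter

# Hypothetical vertex labels, standing in for the labels of the vertices in the graph.
vertex_labels = ["news", "user", "news", "source", "user", "user"]

# groupCount().by(label): count how many vertices carry each label.
counts = Counter(vertex_labels)

# unfold().order().by(keys): one (label, count) entry per row, sorted by label.
rows = sorted(counts.items())

for label, count in rows:
    print(f"{label}: {count}")
```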

## View the existing user
Since we are going to connect the news node to a user node, let's select a user to serve as the target vertex of the edge we create. The edge indicates that the news was spread by this user.

In [None]:
%%gremlin
g.V().hasLabel('user')
        .elementMap().limit(2)

We can see that the users contain a lot of features. Let's take a look at user_1 to keep things simple for the example.

In [None]:
%%gremlin

g.V().hasLabel("user").hasId("user_1")
        .elementMap()

## Create the unseen news node and edge

First, let's define the news title. We can make it whatever we want. In reality, this would be the unseen news title you want to add to the database. 

In [None]:
news_title_str = "This is dummy news title used for the inductive inference!"

Now, let's add a node with this news title and create a `spread_by` edge connecting it to user_1. Note that we use the step labels `node_1` and `node_2` to refer to the news node and the user node, respectively.

In [None]:
%%gremlin
g.addV('news').
    property('news_title','${news_title_str}')
    .as('node_1')
    .V().hasLabel("user")
        .hasId("user_1")
    .as('node_2')
    .addE('spread_by').from('node_1').to('node_2')
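
The query above places `${news_title_str}` inside a single-quoted Gremlin string. Assuming the notebook substitutes the variable as plain text, a title containing an apostrophe would end the string early. The helper below is a hypothetical sketch (not part of `neptune_ml_utils`) of escaping a title before it is used this way.

```python
def escape_gremlin_literal(text: str) -> str:
    """Escape backslashes and single quotes for a single-quoted Gremlin string."""
    # Backslashes are escaped first so the ones added for quotes are not doubled.
    return text.replace("\\", "\\\\").replace("'", "\\'")

# Hypothetical title that would otherwise break the single-quoted literal.
raw_title = "Here's a headline with an apostrophe"
safe_title = escape_gremlin_literal(raw_title)

# Shown as a plain string here; in the notebook the %%gremlin cell would use the variable.
query = f"g.V().hasLabel('news').has('news_title', '{safe_title}')"
print(query)
```

The dummy title used in this notebook contains no quotes, so it works unescaped.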

##### Verify Successful Node Creation
Run the cell below and click the Graph tab. You should see a news node connected to a user node by a `spread_by` edge.

In [None]:
%%gremlin -p v,oute,inv

g.V().hasLabel('news')
    .has('news_title','${news_title_str}')
    .outE().inV().path()
    .by(elementMap())

## Inference
Now that we have our unseen news node and relevant edges added, let's perform inference to get the node classification.

### Define endpoint

First, let's define the name of our endpoint so that we can use it to make the prediction. If you have previously run the detect-fake-news notebook without restarting the kernel, you should be able to retrieve the endpoint using the cell below.

In [None]:
%%capture captured_output 
%store -r endpoint 

In [None]:
from IPython.display import display, Markdown
if 'no stored variable or alias' in captured_output.stdout:
    display(Markdown("Looks like you don't have the endpoint variable, so you'll need to follow the steps below to get your endpoint name: \n\n 1. Open the [SageMaker console](https://console.aws.amazon.com/sagemaker/)   \n  2. Select **Inference** in the left hand panel   \n  3. Click on **Endpoints**  \n   4. Copy the name of the endpoint that you want to invoke \n\n Once you have the name of the endpoint, change it in the line below:"))
else:
    print("successfully loaded the endpoint name. No action required - continue running the cells.")

In [None]:
if "endpoint" not in locals():
    endpoint = "yourEndpointNameHereIfNotAlreadyDefined" # only required if above cell outputs instructions to find it
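
As an alternative to looking up the name in the console, the endpoint names can also be listed programmatically. The sketch below assumes your credentials allow the SageMaker `list_endpoints` call. The helper name `find_in_service_endpoints` is hypothetical, and it reads only the first page of results.

```python
def find_in_service_endpoints(sagemaker_client, name_contains=None):
    """Return the names of InService SageMaker endpoints, newest first.

    Only the first page of results is read; pagination is not handled here.
    """
    params = {
        "StatusEquals": "InService",
        "SortBy": "CreationTime",
        "SortOrder": "Descending",
    }
    if name_contains:
        # Optional substring filter on the endpoint name.
        params["NameContains"] = name_contains
    response = sagemaker_client.list_endpoints(**params)
    return [item["EndpointName"] for item in response["Endpoints"]]

# Usage in the notebook (requires AWS credentials with SageMaker read access):
#   import boto3
#   print(find_in_service_endpoints(boto3.client("sagemaker")))
```

Passing the client in as an argument keeps the helper easy to try with a stub before pointing it at a real account.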

### Transductive and Inductive Inference
Information on each type of inference is described in the corresponding sections below. Click [here](https://docs.aws.amazon.com/neptune/latest/userguide/machine-learning-overview.html) for a detailed overview about these two types of inference.

#### Transductive inference

When performing transductive inference, Neptune looks up and returns predictions that were pre-computed at the time of training.

The query below uses transductive inference. Because the newly created node was not present when the Neptune ML model was trained in the previous notebook, no prediction was pre-computed for it, so the expected output is empty.

In [None]:
%%gremlin 
g.with("Neptune#ml.endpoint", "${endpoint}")
    .V().has('news_title', "${news_title_str}")
    .properties("news_type","Neptune#ml.score")
    .with("Neptune#ml.classification")
    .value()

#### Inductive Inference

In the scenario below, we explicitly request inductive inference on the new node (and its edges) using the specified endpoint. This means we are making a prediction on data that was not part of the Neptune ML model training process. As a result, the cell returns a predicted class along with a confidence score indicating how certain the model is.

In [None]:
%%gremlin
g.with("Neptune#ml.endpoint", "${endpoint}").
    V().hasLabel("news").
    has('news_title', '${news_title_str}').
    properties("news_type",  "Neptune#ml.score").
    with("Neptune#ml.inductiveInference").
    with("Neptune#ml.classification").value()
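
The difference between the two modes can be pictured with a toy sketch. Everything in it is hypothetical: the node ids, the class labels, and the scoring rule are made up for illustration, and Neptune ML itself relies on a trained graph neural network, not on the averaging shown here.

```python
# Toy illustration only; not how Neptune ML computes predictions.

# Predictions computed once, at training time, for nodes that existed then.
precomputed = {"news_1": "fake", "news_2": "real"}

def transductive_predict(node_id):
    """Look up a stored prediction; unseen nodes have none."""
    return precomputed.get(node_id)

# Hypothetical per-user scores standing in for learned neighbor information.
user_fake_score = {"user_1": 0.8, "user_2": 0.1}

def inductive_predict(neighbor_ids):
    """Score a node on the fly from the neighbors it is connected to."""
    scores = [user_fake_score[n] for n in neighbor_ids if n in user_fake_score]
    if not scores:
        return None
    mean = sum(scores) / len(scores)
    label = "fake" if mean >= 0.5 else "real"
    confidence = mean if label == "fake" else 1 - mean
    return label, confidence

print(transductive_predict("news_new"))  # unseen node: nothing was stored for it
print(inductive_predict(["user_1"]))     # computed from the new node's edge
```

The lookup has nothing to return for a node added after training, while the on-the-fly path can still produce a label and a score because it only needs the node's current neighborhood.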
    

#### Inductive inference with minimum 60% confidence
Now let's add a confidence threshold of 60%. This means that only predictions with a score of 0.6 or higher are returned; lower-scoring predictions are filtered out.

In [None]:
%%gremlin
g.with("Neptune#ml.endpoint", "${endpoint}").
    V().hasLabel("news").
    has('news_title', '${news_title_str}').
    properties("news_type",  "Neptune#ml.score").
    with("Neptune#ml.inductiveInference").
    with("Neptune#ml.threshold", 0.6D).
    with("Neptune#ml.classification").value()
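
The effect of the threshold can be sketched in a few lines of Python. The labels and scores below are hypothetical, and the helper `apply_threshold` only illustrates the filtering idea. It is not Neptune ML's implementation.

```python
def apply_threshold(scored_labels, threshold):
    """Keep only the (label, score) pairs whose score meets the threshold."""
    return [(label, score) for label, score in scored_labels if score >= threshold]

# Hypothetical model output for one node: class labels with their scores.
candidates = [("fake", 0.72), ("real", 0.28)]

print(apply_threshold(candidates, 0.6))  # only the 0.72 entry is kept
print(apply_threshold(candidates, 0.9))  # nothing meets the threshold
```

If no score reaches the threshold, the result is empty, so an empty output from the query above does not necessarily indicate an error.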
    

## Cleaning Up

Now we can delete the inference endpoint to avoid recurring costs!

#### First get training_job_name

First, let's define the training job name so that we can use it to delete the endpoint. If you have previously run the detect-fake-news notebook without restarting the kernel, you should be able to retrieve the training job name using the cell below.

In [None]:
%%capture captured_output_training_job
%store -r training_job_name

In [None]:
from IPython.display import display, Markdown
if 'no stored variable or alias' in captured_output_training_job.stdout:
    display(Markdown("Looks like you don't have the training job name variable, so you'll need to follow the steps below to get your training job name: \n\n 1. Open the [SageMaker console](https://console.aws.amazon.com/sagemaker/)   \n 2. Select **Training** in the left hand panel   \n 3. Click on **Training Jobs**  \n 4. Copy the name of the training job you created. \n\n Once you have the name of the training job, change it in the line below:"))
else:
    print("successfully loaded the training job name. No action required - continue running the cells.")

In [None]:
if "training_job_name" not in locals():
    training_job_name = "yourTrainingJobNameHereIfNotAlreadyDefined" # only required if above cell outputs instructions to find it

#### Now delete the endpoint

In [None]:
neptune_ml.delete_endpoint(training_job_name)