# Training a GNN model for user-cell link prediction and making predictions with transductive inference

This notebook shows how to use Neptune ML to train a GNN model and deploy it to a machine learning endpoint.
The deployed endpoint can then be used to make predictions with Gremlin queries.

In [None]:
import neptune_ml_utils as neptune_ml
neptune_ml.check_ml_enabled()


s3_bucket_uri="<your s3 bucket for model training>"

Before starting model training, we drop some `user_live_cell` edges between existing users and cells, so that the model trained in transductive mode can later be asked to predict them:
- for user_0 and cell_62000, we drop ALL edges
- for user_1500 and cell_56500, we drop ALL edges
- for user_4570, we drop three edges: user_live_cell_1678350, user_live_cell_734598, user_live_cell_487137
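
The hold-out plan above can be written down once and turned into Gremlin drop statements. The sketch below is only an illustration: `build_drop_query` and `held_out` are hypothetical names that are not part of `neptune_ml_utils`, and the generated strings simply mirror the traversal patterns used in the cells of this notebook. It prints the statements for review instead of running them, since dropping edges is destructive.

```python
# Hypothetical helper for illustration; not part of neptune_ml_utils.
EDGE_LABEL = "user_live_cell"


def build_drop_query(user_id, cell_id=None, edge_ids=None):
    """Return a Gremlin string that drops held-out edges for one user."""
    if edge_ids:
        # Drop specific edges by id.
        ids = ", ".join(f"'{e}'" for e in edge_ids)
        return f"g.V('{user_id}').outE('{EDGE_LABEL}').hasId({ids}).drop()"
    if cell_id is None:
        raise ValueError("Provide either cell_id or edge_ids")
    # Drop every edge between the user and the given cell.
    return f"g.V('{user_id}').bothE().where(otherV().hasId('{cell_id}')).drop()"


# Hold-out plan taken from the list above.
held_out = [
    {"user_id": "user_0", "cell_id": "cell_62000"},
    {"user_id": "user_1500", "cell_id": "cell_56500"},
    {
        "user_id": "user_4570",
        "edge_ids": [
            "user_live_cell_1678350",
            "user_live_cell_734598",
            "user_live_cell_487137",
        ],
    },
]

drop_queries = [build_drop_query(**spec) for spec in held_out]
for query in drop_queries:
    print(query)
```

Each printed statement can be pasted into a `%%gremlin` cell after checking that it targets the intended edges.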

### Select the users whose user_live_cell edges will be dropped

In [None]:
%%gremlin
g.V()
.hasId("user_0")
.outE()
.hasLabel("user_live_cell")
.inV()
.valueMap(true, "name")
.groupCount()
.unfold()
.order()
.by(values, desc)

In [None]:
%%gremlin
g.V('user_0').outE()

In [None]:
%%gremlin
g.V('user_0').bothE().where(otherV().hasId('cell_62000'))

In [None]:
#g.V('user_0').bothE().where(otherV().hasId('cell_62000')).drop()

In [None]:
%%gremlin
g.V()
.hasId("user_1500")
.outE()
.hasLabel("user_live_cell")
.inV()
.valueMap(true, "name")
.groupCount()
.unfold()
.order()
.by(values, desc)

In [None]:
%%gremlin
g.V('user_1500').bothE().where(otherV().hasId('cell_56500'))

In [None]:
%%gremlin
g.V()
.hasId("user_4570")
.outE()
.hasLabel("user_live_cell")
.inV()
.valueMap(true, "name")
.groupCount()
.unfold()
.order()
.by(values, desc)

In [None]:
%%gremlin
g.V('user_4570').bothE().where(otherV().hasId('cell_10570'))

In [None]:
%%gremlin
g.V('user_4570').outE('user_live_cell').hasId('user_live_cell_734598').drop()

In [None]:
#g.V('user_4570').outE('user_live_cell').hasId('user_live_cell_1678350').drop()

In [None]:
%%gremlin
g.V('user_4570').bothE().where(otherV().hasId('cell_10570'))

## Launch the export job

In [None]:
export_params={ 
"command": "export-pg", 
"params": { "endpoint": neptune_ml.get_host(),
            "profile": "neptune_ml",
            "useIamAuth": neptune_ml.get_iam(),
            "cloneCluster": False,
            "nodeLabels": ["user", "cell"],
            "edgeLabels": ["user_live_cell"]
            }, 
"outputS3Path": f'{s3_bucket_uri}/neptune-export',
"additionalParams": {
        "neptune_ml": {
          "version": "v2.0",
          "targets": [
            {
                "edge": ["user", "user_live_cell", "cell"],
                "type" : "link_prediction",
                "split_rate": [0.8, 0.1, 0.1]
            }
         ]
        }
      },
"jobSize": "xlarge"}
export_params
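
The `split_rate` of `[0.8, 0.1, 0.1]` in the export configuration asks for roughly 80% of the target edges for training, 10% for validation and 10% for testing. The sketch below only illustrates that arithmetic. `split_counts` is a hypothetical helper, and the actual split is performed by Neptune ML during data processing, possibly with a different rounding rule.

```python
def split_counts(num_edges, split_rate):
    """Approximate train/validation/test sizes for a given split_rate.

    Illustration only: the rounding rule here is an assumption, not
    Neptune ML's implementation.
    """
    if len(split_rate) != 3:
        raise ValueError("split_rate must contain exactly three values")
    if any(r < 0 for r in split_rate) or abs(sum(split_rate) - 1.0) > 1e-9:
        raise ValueError("split_rate values must be non-negative and sum to 1")
    train = int(num_edges * split_rate[0])
    valid = int(num_edges * split_rate[1])
    # Give the remainder to the test set so the three counts add up exactly.
    test = num_edges - train - valid
    return train, valid, test


# 1000 is a made-up edge count used only to show the proportions.
train_n, valid_n, test_n = split_counts(1000, [0.8, 0.1, 0.1])
print(train_n, valid_n, test_n)
```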

In [None]:
%%neptune_ml export start --export-url {neptune_ml.get_export_service_host()} --export-iam --wait --store-to export_results
${export_params}

## Data processing/Preparation of graph data for Training

In [None]:
# The training_job_name can be set to a unique value below, otherwise one will be auto generated
training_job_name=neptune_ml.get_training_job_name('link-prediction')

processing_params = f"""
--config-file-name training-data-configuration.json
--job-id {training_job_name} 
--instance-type ml.r5.16xlarge
--s3-input-uri {export_results['outputS3Uri']}
--s3-processed-uri {str(s3_bucket_uri)}/preloading """

%neptune_ml dataprocessing start --wait --store-to processing_results {processing_params}

## Training

In [None]:
<div style="background-color:#eeeeee; padding:20px; text-align:left; border-radius:10px; margin-top:10px; margin-bottom:10px; "><b>Information</b>: Link prediction is computationally more demanding than classification or regression, so training can take longer.</div>

In [None]:
training_params=f"""
--job-id {training_job_name}
--data-processing-id {training_job_name}
--instance-type ml.g4dn.16xlarge
--s3-output-uri {str(s3_bucket_uri)}/training
--max-hpo-number 2
--max-hpo-parallel 2 """

In [None]:
training_params

In [None]:
%neptune_ml training start --wait --store-to training_results {training_params}


The model training above uses the default model configuration and a small HPO budget (`--max-hpo-number 2`) to minimize running time and cost. You can modify these settings to get a stronger model, for example by setting `--max-hpo-number 9 --max-hpo-parallel 3`.

You can also modify additional model and hyperparameter configurations by following the instructions [here](https://docs.aws.amazon.com/neptune/latest/userguide/machine-learning-customizing-hyperparams.html).

For example, you can set `use-edge-features` to `True` by modifying the `model-hpo-configuration.json` file. Note, however, that using edge features will lead to errors during inductive inference in Notebook 3b.
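
To get a feel for what a larger HPO budget means in wall-clock terms, the sketch below estimates how many sequential rounds of training jobs a configuration implies. It assumes that jobs run in full parallel batches of similar duration, which is a simplification of how the tuning job actually schedules work. `hpo_rounds` is a hypothetical helper.

```python
import math


def hpo_rounds(max_hpo_number, max_hpo_parallel):
    """Rough number of sequential rounds, assuming full parallel batches."""
    if max_hpo_number < 1 or max_hpo_parallel < 1:
        raise ValueError("both values must be at least 1")
    return math.ceil(max_hpo_number / max_hpo_parallel)


# Settings used in this notebook: 2 jobs, 2 in parallel.
print(hpo_rounds(2, 2))
# Larger budget mentioned in the text: 9 jobs, 3 in parallel.
print(hpo_rounds(9, 3))
```

Under this assumption, the larger budget takes about three times as long as the notebook's setting while exploring more than four times as many configurations.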

## Inference

### Endpoint creation

In [None]:
endpoint_params=f"""
--id {training_job_name}
--model-training-job-id {training_job_name}"""
endpoint_params

In [None]:
%neptune_ml endpoint create --wait --store-to endpoint_results {endpoint_params}

endpoint_transductive=endpoint_results['endpoint']['name']

### Reminder

- user_0 and cell_62000: ALL edges dropped

- user_1500 and cell_56500: ALL edges dropped

- user_4570 and cell_10570: three edges dropped

<img src="attachment:2a0d7696-5e11-42af-a56f-f7acc5d63572.png" alt="image.png" width="1000"/>

<div style="background-color:#eeeeee; padding:20px; text-align:left; border-radius:10px; margin-top:10px; margin-bottom:10px; "><b>Experiment 1</b>: the GNN is expected to predict that user_0 is connected to cell_62000.</div>

In [None]:
%%gremlin
g.with("Neptune#ml.endpoint","${endpoint_transductive}")
.with("Neptune#ml.limit",10)
.V("cell_62000")
.in("user_live_cell").with("Neptune#ml.prediction").hasLabel("user")
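
The inference query above has a fixed shape: set the endpoint and result limit on `g`, start from a known vertex, traverse the target edge label with `Neptune#ml.prediction`, and keep only vertices with the expected label. The sketch below assembles such a string in Python. `build_prediction_query` is a hypothetical helper, `"my-endpoint"` is a placeholder endpoint name, and only predicates that already appear in this notebook are used.

```python
def build_prediction_query(endpoint, start_id, edge_label, target_label,
                           direction="out", limit=10):
    """Assemble a Neptune ML link-prediction Gremlin query as a string."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    return (
        f'g.with("Neptune#ml.endpoint","{endpoint}")'
        f'.with("Neptune#ml.limit",{limit})'
        f'.V("{start_id}")'
        f'.{direction}("{edge_label}").with("Neptune#ml.prediction")'
        f'.hasLabel("{target_label}")'
    )


# Example: top 5 predicted cells for user_1500 (placeholder endpoint name).
query = build_prediction_query(
    "my-endpoint", "user_1500", "user_live_cell", "cell",
    direction="out", limit=5,
)
print(query)
```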

<div style="background-color:#eeeeee; padding:20px; text-align:left; border-radius:10px; margin-top:10px; margin-bottom:10px; "><b>Experiment 3</b>: the GNN is expected to predict that user_1500 is connected to cell_56500.</div>

In [None]:
%%gremlin
g.with("Neptune#ml.endpoint","${endpoint_transductive}")
.with("Neptune#ml.limit",5)
.V("user_1500")
.out("user_live_cell").with("Neptune#ml.prediction").with("Neptune#ml.filterExistingEdges").hasLabel("cell")
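
Once a prediction query returns its candidates, a quick way to judge the experiment is to check whether the held-out vertex appears among them and at which position. The sketch below assumes the results have been collected into a Python list of vertex ids in ranked order. `rank_of` is a hypothetical helper, and the `predicted` list is a made-up placeholder, not actual model output.

```python
def rank_of(expected_id, predicted_ids):
    """Return the 1-based position of expected_id, or None if it is absent."""
    try:
        return predicted_ids.index(expected_id) + 1
    except ValueError:
        return None


# Placeholder ids for illustration only; not real predictions.
predicted = ["cell_00001", "cell_56500", "cell_00002"]

print(rank_of("cell_56500", predicted))  # position of the held-out cell
print(rank_of("cell_99999", predicted))  # None: not among the candidates
```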

### End-to-end architecture for multiple scenarios

<img src="attachment:f8b596d3-26cd-4127-aebb-132900d153db.png" alt="image.png" width="1000"/>

# Note on GNN evaluation 

- HITS@10 measures how often the model ranks the correct item within its top 10 recommendations.

- MR (mean rank) is the average position of the correct item in the ranked list, so lower is better.

- SageMaker evaluates the model on the validation and test sets.

- Results on the test set:

    * HITS at top 1 (HITS@1): 0.4010819758391616
    * HITS at top 3 (HITS@3): 0.6301810418539388
    * HITS at top 10 (HITS@10): 0.9438622262173598
    * mean rank (MR): 3.719502285632852
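
The two metrics can be computed from the rank that the model assigns to the correct item in each test case. The sketch below shows the arithmetic on a toy list of ranks. The values are invented for illustration and are unrelated to the results reported above, and `hits_at_k` and `mean_rank` are hypothetical helper names.

```python
def hits_at_k(ranks, k):
    """Fraction of test cases whose correct item is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_rank(ranks):
    """Average 1-based rank of the correct item (lower is better)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(ranks) / len(ranks)


# Toy ranks of the correct item in five test cases (made-up values).
ranks = [1, 3, 2, 12, 1]

print("HITS@1 :", hits_at_k(ranks, 1))
print("HITS@3 :", hits_at_k(ranks, 3))
print("HITS@10:", hits_at_k(ranks, 10))
print("MR     :", mean_rank(ranks))
```

Note that a single badly ranked case (rank 12 here) pulls MR up noticeably while leaving HITS@1 and HITS@3 untouched, which is why the two metrics are usually read together.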
