# TigerGraph AMLSim Demo

To get started, you will need a running instance of TigerGraph. The fastest way to get a box running is using **https://tgcloud.io**.

If it's your first time using the cloud portal, check out [**Getting Started with TigerGraph 3.0**](https://www.tigergraph.com/blog/getting-started-with-tigergraph-3-0/).

## Installing AMLSim Graph

To use the graph solution shown in this demo, you will want to **import an existing solution**: the file `AMLSim_3_0_6.gz` located inside this repository. Once the solution is imported, add the data files to the cloud server by following the Getting Started with TigerGraph 3.0 blog. The last step is to install the default queries that came with the starter kit. These instructions are also in the blog mentioned above.

## About AMLSim
The AMLSim project is intended to provide a multi-agent based simulator that generates synthetic banking transaction data together with a set of known money laundering patterns - mainly for the purpose of testing machine learning models and graph algorithms.

**AMLSim Official GitHub** https://github.com/IBM/AMLSim

**AMLSim WIKI** https://github.com/IBM/AMLSim/wiki

**AMLSim Data** https://github.com/IBM/AMLSim/wiki/Download-Example-Data-Set

## Exploring the data

Let's take a quick look at the data that we will be using for this lab. To do this, we can simply load the CSV files that come with this lab into dataframes.

In [None]:
import pandas as pd
df1 = pd.read_csv('/home/ec2-user/SageMaker/AMLSim_Python_Lab/data/accounts.csv')
df2 = pd.read_csv('/home/ec2-user/SageMaker/AMLSim_Python_Lab/data/alerts.csv')
df3 = pd.read_csv('/home/ec2-user/SageMaker/AMLSim_Python_Lab/data/transactions.csv')

The first file has information about Customers and Accounts.

In [None]:
df1.head()

The second file includes information regarding the labeled fraud transactions. 

In [None]:
df2.head()

Our last file includes all the transactions with amounts, timestamps, sender, receiver, and ids.

In [None]:
df3.head()

### Installing and Importing packages

For this lab, the critical package we will need is `pyTigerGraph`, a community-built Python connector. If you would like to use the `beta` version of the package, that is acceptable too. The beta version includes the latest features that are being developed.

In [None]:
!pip install pyTigerGraphBeta
!pip install flat-table

import pyTigerGraphBeta as tg
import flat_table
import IPython.display as disp

After we have our packages, we will want to provide a few details about our box. You will need to fill in the parameters below according to your provisioned solution. If you're unfamiliar with how to generate a secret, check out this short [blog](https://towardsdatascience.com/generating-a-secret-in-tigergraph-e5139d52dff6) for a step-by-step walkthrough.

In [None]:
conn = tg.TigerGraphConnection(host="https://aml-sim-sagemaker.i.tgcloud.io", username="tigergraph", password="tigergraph", graphname = "AMLSim")
conn.apiToken = conn.getToken("0dqle85rg436lg25qabtki0lqenunfvj")
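
If you would rather not keep the password and secret in the notebook itself, one option is to read them from environment variables. The following is only a minimal sketch: the variable names (`TG_HOST`, `TG_USERNAME`, `TG_PASSWORD`, `TG_GRAPHNAME`, `TG_SECRET`) and the helper `load_tg_settings` are hypothetical and not part of `pyTigerGraph`.

```python
import os


def load_tg_settings(env=os.environ):
    """Collect connection settings from environment variables.

    All variable names here are hypothetical; pick whatever fits your setup.
    """
    required = ["TG_HOST", "TG_PASSWORD", "TG_SECRET"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env["TG_HOST"],
        "username": env.get("TG_USERNAME", "tigergraph"),
        "password": env["TG_PASSWORD"],
        "graphname": env.get("TG_GRAPHNAME", "AMLSim"),
        "secret": env["TG_SECRET"],
    }


# Example with an in-memory mapping instead of the real environment.
settings = load_tg_settings({
    "TG_HOST": "https://example.i.tgcloud.io",
    "TG_PASSWORD": "change-me",
    "TG_SECRET": "replace-with-your-secret",
})
print(settings["graphname"])
```

The returned values can then be passed to the same `tg.TigerGraphConnection(...)` and `conn.getToken(...)` calls used in the cell above.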

Let's conduct a simple test to see if everything works. We will make a call and fetch the endpoints that are available for us to use.

In [None]:
results = conn.getEndpoints()
disp.JSON(results)

Let's call a few of the endpoints that are included in this package. Remember that every time you create and install a GSQL query, that query gets compiled and exposed as a REST service, making it VERY easy to interact with TigerGraph. Let's test it out here.

In [None]:
info = conn.runInstalledQuery("accountInfo", {"limit_x":"2"})
disp.JSON(info)

Take note that it looks like we are missing details about users. That is because we haven't generated those attributes yet. We can simply generate those new features by running the `accountActivity` query that was provided.

In [None]:
print(conn.runInstalledQuery("accountActivity", {}))

Let's re-run the query and see the features that were generated. Perfect. We have everything except two attributes called `label` and `pagerank`.

In [None]:
info = conn.runInstalledQuery("accountInfo", {"limit_x":"2"})
disp.JSON(info)

## Using Graph Structures to Generate new Features using Algorithms
#### PageRank
**Description**

The PageRank algorithm measures the influence of each vertex on every other vertex. PageRank influence is defined recursively: a vertex's influence is based on the influence of the vertices which refer to it. A vertex's influence tends to increase if (1) it has more referring vertices or if (2) its referring vertices have higher influence. The analogy to social influence is clear.

[Full detailed explanation](https://docs.tigergraph.com/tigergraph-platform-overview/graph-algorithm-library#pagerank)
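
To make the recursive definition concrete, here is a minimal pure-Python sketch of PageRank by power iteration. It is not the GSQL implementation from the algorithm library, and its scores may be scaled differently; the default values simply mirror the `damping`, `max_iter`, and `max_change` parameters used later in this lab, and the toy edge list is hypothetical.

```python
def pagerank(edges, damping=0.85, max_iter=25, max_change=0.001):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Every vertex keeps a baseline share, independent of its in-links.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for src in nodes:
            # A vertex without out-links spreads its rank over all vertices.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        delta = max(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < max_change:
            break
    return rank


# Hypothetical toy graph: three accounts all send money to account "D".
toy_edges = [("A", "D"), ("B", "D"), ("C", "D"), ("D", "A")]
scores = pagerank(toy_edges)
print(max(scores, key=scores.get))
```

In this toy graph, "D" has the most referring vertices, so it ends up with the highest score, and "A" benefits from being referred to by the influential "D".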

#### Label Propagation
**Description**

Label Propagation is a heuristic method for determining communities.  The idea is simple: If the plurality of your neighbors all bear the label X, then you should label yourself as also a member of X. The algorithm begins with each vertex having its own unique label. Then we iteratively update labels based on the neighbor influence described above. It is important that the order for updating the vertices be random. The algorithm is favored for its efficiency and simplicity, but it is not guaranteed to produce the same results every time.
In a variant version, some vertices could initially be known to belong to the same community. If they are well-connected to one another, they are likely to preserve their common membership and influence their neighbors.

[Full detailed explanation](https://docs.tigergraph.com/tigergraph-platform-overview/graph-algorithm-library#label-propagation)
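
The update rule described above can be sketched in a few lines of plain Python. This is an illustration only, not the GSQL `label_prop` query: it treats edges as undirected, and the tie-breaking rule (keep the current label if it is among the most common, otherwise take the smallest) is an assumption made to keep the example simple.

```python
import random
from collections import Counter


def label_propagation(edges, max_iter=10, seed=0):
    """Asynchronous label propagation on an undirected edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = {v: v for v in neighbors}  # every vertex starts with its own label
    rng = random.Random(seed)
    order = sorted(neighbors)
    for _ in range(max_iter):
        rng.shuffle(order)  # visit the vertices in random order
        changed = False
        for v in order:
            counts = Counter(labels[nb] for nb in neighbors[v])
            top = max(counts.values())
            best = sorted(lab for lab, c in counts.items() if c == top)
            # Assumption: keep the current label on a tie, else take the smallest.
            new_label = labels[v] if labels[v] in best else best[0]
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy graph: two separate triangles of accounts.
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]
communities = label_propagation(toy_edges)
print(communities)
```

Each triangle settles on a single shared label, so the two groups come out as two communities. Which label value wins depends on the visiting order, which is why results can differ between runs.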

### Installing Queries

To load these algorithms into our TigerGraph Cloud instance, we will use the Python extension called [tgcloud-jupyter 0.9.6](https://pypi.org/project/tgcloud-jupyter/). This extension can be added to your JupyterLab by running `pip install tgcloud-jupyter`. If you would like to manually create these queries, you can fetch them from [https://github.com/tigergraph/gsql-graph-algorithms](https://github.com/tigergraph/gsql-graph-algorithms).

If the extension isn't showing after running the install, you may need to execute the command `jlpm && jlpm build && jupyter-labextension link .` in the extension directory to rebuild extensions. Your extension should be located in `/home/ec2-user/anaconda3/envs/JupyterSystemEnv/share/jupyter/labextensions/tgcloud-jupyter`.


### Running Queries

Once those queries are installed let's execute them to generate those new features.

In [None]:
print(conn.runInstalledQuery("pageRank", {"v_type":"Account", "e_type":"Send_To", "max_change":"0.001", "max_iter":"25", "damping":"0.85", "top_k":"10", "print_accum":"TRUE", "result_attr":"pagerank", "file_path":"", "display_edges":"FALSE"}))

In [None]:
# Assign first, then print: print() does not accept label_prop as a keyword argument.
label_prop = conn.runInstalledQuery("label_prop", {"v_type":"Account", "e_type":"Send_To", "max_iter":"10", "output_limit":"10", "print_accum":"TRUE", "file_path":"", "attr":"label"})
print(label_prop)

Let's take a look at some account information to see if everything looks like it was generated properly.

In [None]:
info = conn.runInstalledQuery("accountInfo", {"limit_x":"2"})
disp.JSON(info)

## Extract Features
Now that we've generated a few interesting features let's grab those transactions to use for training a fraud model. 

In [None]:
tx_hop = conn.runInstalledQuery("txMultiHopLimit", {}, timeout="10000000000000000", sizeLimit="1500000000")

Let's convert that JSON into a dataframe to see what it looks like.

In [None]:
df_tx_hop = pd.DataFrame(tx_hop[0]["@@txRecords"])
df_tx_hop = flat_table.normalize(df_tx_hop)
df_tx_hop.head()
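
The idea behind flattening nested records into plain columns can be sketched in plain Python. This is not how `flat_table` is implemented, and the record shape below is hypothetical; the real fields in `@@txRecords` may differ.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts and prefix their keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical record shape, for illustration only.
sample = {"tx_id": 1, "amount": 250.0, "sender": {"id": "a1", "pagerank": 0.4}}
row = flatten(sample)
print(row)
```

Each nested field becomes its own column, which is the shape a model training script usually expects.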

Next, let's save that data into a `tmp` folder so we can send it to an S3 bucket. Create the folder first if it does not already exist.

In [None]:
df_tx_hop.to_csv(r'./tmp/20210412AMLsim.csv', index = False, header=True)

# SageMaker Model Generation 

During this part of the lab we will be working with a few of the AWS services. To use these services, you must have an execution role that can be used with this notebook session.

First let's grab a few SageMaker packages.

In [None]:
import sagemaker
from sagemaker import get_execution_role

Next, let's assign this session to a variable.

In [None]:
sagemaker_session = sagemaker.Session()

Also, let's assign our execution role to the variable `role`.

In [None]:
# Get a SageMaker-compatible role used by this Notebook Instance.
role = get_execution_role()

Test to see if it was assigned.

In [None]:
role

## Upload Data to S3 for Training

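Next we will upload our data to S3. Don’t worry, this is simple: all you need to do is call the `upload_data` function. With no other arguments, it uploads the `tmp` folder to the session's default bucket and returns the S3 location.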

In [None]:
train_input = sagemaker_session.upload_data("tmp")

Test to see if it was uploaded.

In [None]:
train_input

## Create SageMaker Scikit Estimator

Next we will pass our `fraud-prediction.py` script, which contains our training and model generation logic. Let's explore that script by opening `fraud-prediction.py`.
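
The contents of `fraud-prediction.py` are not reproduced here. As a rough sketch of how a script-mode entry point typically learns where its data and model directory live, the snippet below reads the `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` environment variables that SageMaker sets inside the training container. The `--max-depth` hyperparameter and the local fallback paths are hypothetical.

```python
import argparse
import os


def parse_args(argv=None, env=os.environ):
    """Parse the directories a SageMaker training container passes in."""
    parser = argparse.ArgumentParser()
    # SageMaker sets these environment variables inside the training container.
    parser.add_argument("--model-dir", default=env.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--train", default=env.get("SM_CHANNEL_TRAIN", "./tmp"))
    # Hypothetical hyperparameter, only for illustration.
    parser.add_argument("--max-depth", type=int, default=5)
    return parser.parse_args(argv)


# Simulate the container environment with an in-memory mapping.
args = parse_args([], env={
    "SM_MODEL_DIR": "/opt/ml/model",
    "SM_CHANNEL_TRAIN": "/opt/ml/input/data/train",
})
print(args.model_dir, args.train, args.max_depth)
```

The channel name `train` matters: it is the key later passed to `fit`, and it is what gives `SM_CHANNEL_TRAIN` its name.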

Once we have our script ready, we will assign that script as our `entry_point`. You will also want to fill out a few more parameters below, depending on your requirements.

This information includes the `role` and `session` parameters that we generated above.


In [None]:
from sagemaker.sklearn.estimator import SKLearn

script_path = 'fraud-prediction.py'

sklearn = SKLearn(
    entry_point=script_path,
    instance_type="ml.m4.xlarge",
    framework_version="0.20.0",
    py_version="py3",
    role=role,
    sagemaker_session=sagemaker_session)

## Train SKLearn Estimator on AML Data

Once we have everything set up, let's kick off the training process. This will create an instance based on the parameters you provided above. It will then train a model and save that model for you.

In [None]:
sklearn.fit({'train': train_input})

## Deploy the model 

On to the last step. Once you have a model generated, let's deploy that model in AWS. Simply call the `deploy` method and pass information about the instance you would like to deploy the model on.

In [None]:
deployment = sklearn.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

If all went well you will now see your endpoint that was generated and can now use it for predictions. 

In [None]:
deployment.endpoint  # newer SageMaker SDK versions name this attribute endpoint_name

# Congratulations!

You've now not only generated graph-based features using algorithms, but you've also extracted those features, created an S3 bucket, deployed a training server, created a machine learning model using the training server, and lastly deployed that model to an endpoint!

#### Where to go if you would like help with this tutorial?

[TigerGraph's Community Forum](https://community.tigergraph.com/): simply ask questions by creating a new topic or replying to an existing topic.

[TigerGraph's Developer Chat](https://discord.gg/F2c9b9v): talk with other TigerGraph developers across the world.