<img src="img/GraphAISummitNEW.png" alt="Drawing" width="1000" height="100"/>

<h1>Intro to Recommendations with TigerGraph, Docker and Python</h1>

<p style="margin-left: 40px">Alex Infanzon & Bob Hardaway</p>
<h2>
<p style="margin-left: 60px">- Professional Sales Engineers, Amateur Data Scientists</p>
<h3>
    TigerGraph is a graph database that addresses some of the issues that have
    plagued other graph databases. This notebook demonstrates the basic commands
    to connect to TigerGraph, create a schema and load data using the
    pyTigerGraph Python module.

    In the next 40 minutes, we will introduce the pyTigerGraph Python package and
    develop a simple recommendation engine running in a portable Docker container.
    Every component developed here can then be migrated to a large-scale
    production environment once the data scientist is confident in the
    results.
    
    In this way, TigerGraph can enable every data user in the enterprise to rapidly
    develop graph-based solutions.
    
    The 6 steps to enable this solution are simple:
        1) Import python packages
        2) Connect to TigerGraph database
        3) Design a Schema
        4) Load Source Data
        5) Explore the Graph
        6) Write Queries (aka Ask Questions)

<h2> TigerGraph Architecture</h2>
<img src="img/Architecture_diagram.png" alt="Drawing" width="1000" height="100"/>

<h2>This tutorial is based on two components that make Python-based analytics solutions easy to create and deploy.</h2>

<h3><a href="https://docs.tigergraph.com/start/get-started/docker">TigerGraph Docker Image</a> - A packaged TigerGraph image is available to easily create an environment.

<h3><a href="https://github.com/pyTigerGraph/pyTigerGraph">pyTigerGraph Python Package</a> - The Python package used in this notebook to connect to and work with TigerGraph.</h3>




## STEP 0: Set up Docker and pull the TigerGraph single-node image

<h3> Docker is a container deployment tool that makes setting up and running applications easy. Once Docker is installed, the following command pulls the latest pre-built TigerGraph image (docker.tigergraph.com/tigergraph:latest) and starts it as a container.</h3>

$ docker run -d -p 14022:22 -p 9000:9000 -p 14240:14240 --name tigergraph --ulimit nofile=1000000:1000000 -v ~/data:/home/tigergraph/mydata -t docker.tigergraph.com/tigergraph:latest

Once the container is running, you can access the Studio UI from your host computer at:
    
       http://localhost:14240
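Before opening the URL, it can help to confirm that the container is actually listening. The following is a minimal sketch using only the Python standard library. The helper name `port_is_open` is hypothetical, and an open port only indicates that something is listening on 14240, not that all TigerGraph services have finished starting.

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False

# 14240 is the Studio port published by the docker run command above.
print("Studio port reachable:", port_is_open("localhost", 14240))
```

If the check reports `False` shortly after `docker run`, waiting a little and retrying is usually the first thing to try.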
                

## STEP 1: Import Packages

Note: This step assumes you have installed the pyTigerGraph package. If not, install it using:
        
   ```# pip install -U pyTigerGraph pyTigerDriver```

In [None]:
from IPython.display import display
from PIL import Image

import pyTigerGraph as tg
import pandas as pd
import json

print(tg.__version__)

## STEP 2: Establishing the connection to a TigerGraph database

<div>
  <img style="vertical-align:top" src="img/connected-icon.png" width="30" height="30"/>
  <span style="">The functionality of pyTigerGraph is implemented by the TigerGraphConnection class. To establish the connection to the database you need to provide the hostname, username and password to access the database.</span>
</div>


<table>
  <tr>
    <th>Connect to localhost</th>
    <th>Connect to TG Cloud</th>
    <th>Connect to AWS EC2</th>
  </tr>
  <tr>
    <td>conn = tg.TigerGraphConnection(<br>host='http://localhost',<br> username="tigergraph",<br> password='tigergraph'<br>) </td>
    <td>conn = tg.TigerGraphConnection(<br>host='https://tgcloud.io/app/solutions',<br> graphname="test",<br> username=userName,<br> password=password,<br> apiToken=apiToken)<br> authToken = conn.getToken(secret)</td>
    <td>conn = tg.TigerGraphConnection(<br>host='https://ec2-52-44-226-118.compute-1.amazonaws.com/',<br> graphname="test",<br> username=userName,<br> password=password,<br> apiToken=apiToken)<br> authToken = conn.getToken(secret)</td>
   </tr>

</table>

<div> TigerGraph supports multiple authentication methods, including simple username/password and token-based authentication. For this tutorial we use only a password.
</div>

In [None]:
conn = tg.TigerGraphConnection(
    host='http://localhost',
    username="tigergraph",
    password='tigergraph')

<div> In order to start from scratch, all existing elements can be deleted! This will delete existing graphs and elements. Execute the next cell ONLY if you would like to start the notebook lab from the beginning.</div>

In [None]:
print(conn.gsql('''DROP ALL''', options=[]))

## STEP 3: Design Schema

<div>
  <img style="vertical-align:top" src="img/graph_img.png" width="30" height="30"/>
  <span style="">Before data can be loaded into the graph store, the user must define a graph schema. A graph schema is a "dictionary" that defines the types of entities, vertices and edges, in the graph and how those types of entities are related to one another.</span>
</div>

### WARNING: DROP ALL deletes everything in your graph!

Execute the DROP ALL cell above only if you would like to start the notebook lab from the beginning.

----
<img src="img/graph_sch.png" alt="Drawing" width="500" height="100"/>

The CREATE VERTEX statement defines a new global vertex type, with a name and an attribute list. 

The CREATE EDGE statement defines a new global edge type. There are two forms of the CREATE EDGE statement, one for directed edges and one for undirected edges.  Each edge type must specify that it connects FROM one vertex type TO another vertex type.

In [None]:
print(conn.gsql('''

CREATE VERTEX person (PRIMARY_ID Id STRING, id INT, gender STRING, name STRING, age INT, state STRING) 

CREATE UNDIRECTED EDGE friendship (FROM person, TO person, connect_day datetime)

CREATE GRAPH social (person, friendship)'''
                
, options=[]))
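The schema above uses only the undirected form of CREATE EDGE. As a sketch of how the directed form differs, the hypothetical helper below assembles both kinds of statement as plain text. The `follows` edge and its `since` attribute are illustrative and not part of this notebook's schema, and nothing is sent to the database.

```python
def create_edge_stmt(name, from_type, to_type, directed=False, attrs=None):
    """Build a GSQL CREATE EDGE statement as a string (hypothetical helper)."""
    kind = "DIRECTED" if directed else "UNDIRECTED"
    parts = [f"FROM {from_type}", f"TO {to_type}"]
    # Attributes are appended as "name TYPE" pairs after FROM/TO.
    parts += [f"{attr} {gsql_type}" for attr, gsql_type in (attrs or {}).items()]
    return f"CREATE {kind} EDGE {name} ({', '.join(parts)})"

# The edge type defined in this notebook.
print(create_edge_stmt("friendship", "person", "person",
                       attrs={"connect_day": "DATETIME"}))
# An illustrative directed edge (not part of the social graph).
print(create_edge_stmt("follows", "person", "person",
                       directed=True, attrs={"since": "DATETIME"}))
```

A string produced this way could be passed to `conn.gsql(...)` alongside the other schema statements.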

<h3>The gsql function executes any GSQL statement against the database. The next cell shows how to list the catalog of schema elements created by the previous command.</h3>


In [None]:
print(conn.gsql('''ls''', options=[]))

<h3> Specify the graph to be used (social)

In [None]:
conn.graphname = 'social'

## STEP 4: Load data

<div>
  <img style="vertical-align:top" src="img/load_data.png" width="30" height="30"/>
  <span style="">pyTigerGraph can return results from various built-in endpoints as pandas DataFrames, and it can also load data from DataFrames. To load data, first read each CSV file into a DataFrame inside the notebook.
</span>
</div>

In [None]:
persons = pd.read_csv('data/people.csv')
persons

In [None]:
friendships = pd.read_csv('./data/friendships.csv')
friendships

<h3> Use the pandas DataFrame to create and populate the person vertices</h3>

In [None]:
v_person = conn.upsertVertexDataFrame(
      persons, "person", "name"
    , attributes={"id":"id", "name": "name", "gender": "gender", "age": "age", "state": "state"})
print(str(v_person) + " person vertices upserted")
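Conceptually, the DataFrame upsert turns each row into a vertex ID plus a dictionary of attributes. The sketch below reproduces that mapping with the standard library only, producing the list of `(id, attributes)` tuples that `conn.upsertVertices` accepts. The sample rows are invented for illustration and are not the contents of `data/people.csv`. The upsert call itself is left commented out because it needs a live connection.

```python
import csv
import io

# Invented sample data with the columns the notebook maps (id, name, gender, age, state).
SAMPLE_CSV = """id,name,gender,age,state
1,Tom,male,40,ca
2,Jenny,female,25,tx
"""

def rows_to_vertices(csv_text, id_column, attributes):
    """Map CSV rows to (vertex_id, {attribute: value}) tuples.

    `attributes` maps vertex attribute names to CSV column names,
    mirroring the `attributes` argument used in the cell above.
    """
    int_attrs = {"id", "age"}  # INT attributes in the person vertex type
    vertices = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        attrs = {}
        for attr, column in attributes.items():
            value = row[column]
            attrs[attr] = int(value) if attr in int_attrs else value
        vertices.append((row[id_column], attrs))
    return vertices

mapping = {"id": "id", "name": "name", "gender": "gender", "age": "age", "state": "state"}
vertices = rows_to_vertices(SAMPLE_CSV, "name", mapping)
print(vertices[0])
# conn.upsertVertices("person", vertices)  # requires the connection from STEP 2
```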

<h3> Create the friendship edges

In [None]:
v_friendships = conn.upsertEdgeDataFrame(friendships,"person", "friendship", "person", from_id="person1", to_id="person2", attributes={"connect_day":"date"})
print(str(v_friendships) + " friendship edges upserted")

<h3> List the resulting elements

In [None]:
numPersons = conn.getVertexCount("person")
print(f"There are currently {numPersons} vertices of type person")

In [None]:
numFriends = conn.getEdgeCount("friendship")
print(f"There are currently {numFriends} edges of type friendship")

## STEP 5: Explore Graph

### The Functions

The functions below are grouped by:

- Schema related functions - get schema information or load data into the graph
- Query related functions - run installed or interpreted GSQL queries
- Vertex related functions - inspect, count, upsert and delete vertices
- Edge related functions - inspect, count, upsert and delete edges
- Token management - request, refresh and delete authentication tokens
- Other functions - miscellaneous utilities such as version, endpoint and license information


| Schema related functions | Query related functions | Vertex related functions | Edge related functions | Token management | Other functions |
| :----------------------- | :---------------------- | :----------------------- | :--------------------- | :--------------- | :-------------- |
| getSchema | runInstalledQuery | getVertexTypes | getEdgeTypes | getToken | echo |
| getUDTs | runInterpretedQuery | getVertexType | getEdgeType | refreshToken | getEndpoints |
| getUDT | | getVertexCount | getEdgeCount | deleteToken | getStatistics |
| upsertData | | upsertVertex | upsertEdge | | getVersion |
| | | upsertVertices | upsertEdges | | getVer |
| | | getVertices | getEdges | | getLicenseInfo |
| | | getVerticesById | getEdgeStats | | |
| | | getVertexStats | delEdges | | |
| | | delVertices | | | |
| | | delVerticesById | | | |

<h3> We can now use the pyTigerGraph API directly to explore the elements (vertices and edges) of the social graph.</h3>

In [None]:
print(conn.gsql('''ls''', options=[]))

In [None]:
conn.getVertexTypes()

In [None]:
print(conn.getVertexType('person'))

In [None]:
conn.getVertexStats('person')

In [None]:
conn.getEdgeTypes()

In [None]:
conn.getEdgeStats('friendship', skipNA=False)

## STEP 6: Write Queries

<div>
  <img style="vertical-align:top" src="img/query.png" width="28" height="28"/>
  <span style=""> Next we begin to explore the graph to discover key relationships and insights within its structure. We can use the pyTigerGraph APIs directly. 
</span>
</div>

<h3> Discover friends of Jenny

In [None]:
display(conn.getEdgesDataframe("person", "Jenny"))
img = Image.open("img/Explore_fig1.png")
newsize = (500, 300)
img = img.resize(newsize)
display(img)

In [None]:
def flatten(obj):
    output = []
    for e in obj:
        element = {}
        element["v_id"] = e["v_id"]
        element["v_type"] = e["v_type"]
        for k in e["attributes"]:
            element[k] = e["attributes"][k]
        output.append(element)
    return output
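To make the reshaping concrete, here is a small self-contained run of the same `flatten` logic on a hand-written list that imitates the vertex format (`v_id`, `v_type`, nested `attributes`). The sample records are made up for illustration.

```python
def flatten(obj):
    """Lift each vertex's nested attributes up to the top level."""
    output = []
    for e in obj:
        element = {"v_id": e["v_id"], "v_type": e["v_type"]}
        element.update(e["attributes"])
        output.append(element)
    return output

# Made-up records in the shape the notebook's flatten() expects.
sample = [
    {"v_id": "Tom", "v_type": "person", "attributes": {"name": "Tom", "age": 40}},
    {"v_id": "Jenny", "v_type": "person", "attributes": {"name": "Jenny", "age": 25}},
]

for row in flatten(sample):
    print(row)
```

Each resulting dictionary is flat, which is what lets `pd.DataFrame.from_records` turn the list into one column per attribute.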

<h3> Run simple SQL-like statements. GSQL supports basic SELECT queries with WHERE and LIMIT, so any analyst who can write simple SQL queries can get started easily</h3>

In [None]:
rs = conn.gsql('''SELECT * FROM person LIMIT 4''')
display(pd.DataFrame.from_records(flatten(json.loads(rs))))

In [None]:
rs = conn.gsql('''SELECT * FROM person WHERE gender=="female"''')
display(pd.DataFrame.from_records(flatten(json.loads(rs))))

In [None]:
rs = conn.gsql('''select * from person where primary_id=="Tom"''')
display(pd.DataFrame.from_records(flatten(json.loads(rs))))

<h3>The GSQL capabilities are exposed through the Python library. Here we begin to explore relationships embedded in the graph</h3>

In [None]:
rs=conn.getVertices('person', select='name,age,gender', where='gender=="female"')
display(pd.DataFrame.from_records(flatten(rs)))

<h3> TigerGraph supports 2 types of query execution:</h3>
<h4>Interpreted - Ad hoc, with no pre-compilation</h4>
<h4>Installed - Precompiled and optimized at compile time</h4>

In [None]:
conn.runInterpretedQuery('''
  INTERPRET QUERY () FOR GRAPH social {
    PRINT "Hello World"; 
}
''')

<h3> Using the GSQL syntax, we can query to find Tom's friends using a WHERE clause</h3>

In [None]:
conn.runInterpretedQuery('''INTERPRET QUERY () FOR GRAPH social {
    users = {person.*};
    Result = SELECT p FROM users:u-(friendship)->:p WHERE u.name == "Tom";
  PRINT Result; 
}''')

<h3> We can also keep the person name in a query variable instead of hard-coding it in the WHERE clause</h3>

In [None]:
conn.runInterpretedQuery('''
  INTERPRET QUERY () FOR GRAPH social {
  # declaration statements
  STRING uid = "Tom";
  users = {person.*};
  # body statements
  friends = SELECT p
    FROM users:u-(friendship)->:p
    WHERE u.name == uid;
  PRINT friends; 
}
''')

<h3>And, we can create and compile queries on the server to improve performance

In [None]:
conn.gsql('''
    CREATE QUERY getFriends(STRING uid) FOR GRAPH social {
  users = {person.*};
  # body statements
  friends = SELECT p
    FROM users:u-(friendship)->:p
    WHERE u.name == uid;
  PRINT friends; 
}
''')

<h3>Next, we install the query - NOTE: this fails on local docker

In [None]:
conn.gsql('''INSTALL QUERY getFriends''')

In [None]:
conn.runInterpretedQuery('''
  INTERPRET QUERY () FOR GRAPH social {
    person1 = {person.*};
    Result = SELECT tgt
           FROM person1:s-(friendship:e)-person:tgt;
    PRINT Result; 
}
''')

In [None]:
sourceVertexType='person'
sourceVertexId='Dan'
conn.getEdges(sourceVertexType, sourceVertexId, edgeType=None, targetVertexType=None, targetVertexId=None, select="", where="", limit="", sort="", timeout=0)

In [None]:
conn.getEdges('person', 'Jenny'
              , edgeType='friendship'
              , targetVertexType='person'
              , targetVertexId=None, select="connect_day", where="", limit="", sort="", timeout=0)
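The friendship edges returned above are the raw material for a simple recommender. One common heuristic, used here purely as an illustration and not taken from this notebook, is to suggest friends of friends ranked by the number of mutual friends. The sketch works on a made-up list of undirected pairs in plain Python. In practice the pairs might come from the `friendships` DataFrame or from `conn.getEdges`.

```python
from collections import Counter, defaultdict

# Made-up undirected friendship pairs (not the contents of friendships.csv).
PAIRS = [("Tom", "Dan"), ("Tom", "Jenny"), ("Dan", "Kevin"),
         ("Jenny", "Kevin"), ("Jenny", "Amily"), ("Dan", "Jenny")]

def build_adjacency(pairs):
    """Turn undirected pairs into a dict of person -> set of friends."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def recommend(adj, person, top_n=3):
    """Rank non-friends by how many mutual friends they share with `person`."""
    counts = Counter()
    for friend in adj[person]:
        for candidate in adj[friend]:
            if candidate != person and candidate not in adj[person]:
                counts[candidate] += 1
    # Sort by mutual-friend count (descending), then by name for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

adj = build_adjacency(PAIRS)
print(recommend(adj, "Tom"))
```

The same two-hop traversal could be expressed as a GSQL query so that the ranking runs inside the database rather than in the notebook.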

<h2> Recommender Framework in 40min</h2>
<h3>   Using Docker, Python, pyTigerGraph and TigerGraph</h3>
<h4>    In about 40 minutes, we have built a framework for developing large-scale analytics solutions for the enterprise.</h4>

<img src="img/pyTGSolution.png" alt="Drawing" width="1000" height="100"/>

<h3> For more information, visit <a href="http://tigergraph.com">TigerGraph</a></h3>