

<img src="img/GraphAI.png" alt="Drawing" width="800" height="100"/>

# GraphAI Summit - San Francisco Oct 5th, 2021
##   TigerGraph 101
###      Alex Infanzon & Bob Hardaway
####        Professional Sales Engineers, Amateur Data Scientists

Intro to Recommendations with TigerGraph, Docker and Python

### In the next 40 minutes, we will introduce the pyTigerGraph Python package and 
### develop a simple recommendation engine running on a portable Docker container.


<img src="img/headerImg.jpeg" alt="Drawing" width="1000" height="100"/>

# Create Your First Graph
## Recommendations on large scale data in eight simple steps

TigerGraph is a graph database with a broad set of features and solutions to many of the critical challenges facing enterprises today. This notebook demonstrates how any Python user, from analyst to data scientist, can quickly connect to a TigerGraph server, use simple and powerful Python constructs such as the DataFrame together with machine learning algorithms, and produce high-quality data models in minutes, not months.

Note: The pyTigerGraph package is available via the pip package manager. Install it with:
```pip install pyTigerGraph```

## STEP 1: Import Packages

The solution is based on some basic and powerful packages, including pyTigerGraph, pandas (a very common data management package) and json.

Note: This step assumes you have already installed the pyTigerGraph package. If not, install it using:

```pip install pyTigerGraph```

In [17]:
import pyTigerGraph as tg
import pandas as pd
import json

## STEP 2: Establishing the connection to a TigerGraph database

<div>
  <img style="vertical-align:top" src="img/connected-icon.png" width="30" height="30"/>
  <span style="">The functionality of pyTigerGraph is implemented by the TigerGraphConnection class. To establish the connection to the database, you need to provide the hostname, username and password.

Note that this connection is to a Docker container running locally and exposing the default TigerGraph UI port (14240).</span>
</div>


In [2]:
conn = tg.TigerGraphConnection(
    host='http://localhost',
    username="tigergraph",
    password='tigergraph')
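
The credentials above are hard-coded for the local Docker container. A minimal sketch of one alternative is to read them from environment variables and fall back to the notebook defaults. The variable names `TG_HOST`, `TG_USERNAME` and `TG_PASSWORD` and the helper `connection_settings` are hypothetical, chosen only for illustration.

```python
import os


def connection_settings(env=os.environ):
    """Collect TigerGraphConnection keyword arguments from a mapping.

    TG_HOST, TG_USERNAME and TG_PASSWORD are hypothetical variable names;
    the defaults match the local Docker setup used in this notebook.
    """
    return {
        "host": env.get("TG_HOST", "http://localhost"),
        "username": env.get("TG_USERNAME", "tigergraph"),
        "password": env.get("TG_PASSWORD", "tigergraph"),
    }


# An empty mapping yields the notebook defaults.
settings = connection_settings({})
print(settings["host"])

# With pyTigerGraph imported as tg, the connection would then be created with:
# conn = tg.TigerGraphConnection(**settings)
```

Keeping the settings in a plain dictionary makes it easy to point the same notebook at a different server without editing the connection cell.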

## STEP 3: Define a Graph Schema

<div>
  <img style="vertical-align:top" src="img/graph_img.png" width="30" height="30"/>
  <span style="">The first step is to define a graph schema. A graph schema is a "dictionary" that defines the types of entities, vertices and edges, in the graph and how those types of entities are related to one another.</span>
</div>

### First, we will drop any existing entities to keep things simple!

Execute this cell if you would like to start the notebook lab from the beginning.

In [3]:
print(conn.gsql('''DROP ALL''', options=[]))

Dropping all, about 1 minute ...
Abort all active loading jobs
Try to abort all loading jobs on graph ldbc_snb, it may take a while ...
[ABORT_SUCCESS] No active Loading Job to abort.
Resetting GPE...
Successfully reset GPE
Stopping GPE GSE
Successfully stopped GPE GSE in 7.134 seconds
Clearing graph store...
Successfully cleared graph store
Starting GPE GSE RESTPP
Successfully started GPE GSE RESTPP in 0.096 seconds
Everything is dropped.


<div>
  <img style="vertical-align:top" src="img/graph_img.png" width="30" height="30"/>
  <span style="">The CREATE VERTEX statement defines a new global vertex type, with a name and an attribute list. The CREATE EDGE statement defines a new global edge type. There are two forms of the CREATE EDGE statement, one for directed edges and one for undirected edges.  Each edge type must specify that it connects FROM one vertex type TO another vertex type.</span>
</div>

In [4]:
print(conn.gsql('''
CREATE VERTEX person (PRIMARY_ID name STRING, gender STRING, name STRING, age INT, state STRING)                     
CREATE UNDIRECTED EDGE friendship (FROM person, TO person, connect_day datetime)
CREATE GRAPH social (person, friendship)
''', options=[]))

The vertex type person is created.
The edge type friendship is created.
Stopping GPE GSE RESTPP
Successfully stopped GPE GSE RESTPP in 0.543 seconds
Starting GPE GSE RESTPP
Successfully started GPE GSE RESTPP in 0.063 seconds
The graph social is created.


### Once this gsql is executed, you will see the vertex/edges created along with the graph in Studio.

<img src="img/SocialSchema.png" alt="Drawing" width="800" height="200"/>


The GSQL command 'ls' provides a complete listing of all existing entities, graphs and jobs.

In [20]:
conn.graphname = 'social'
print(conn.gsql('''ls''', options=[]))

---- Global vertices, edges, and all graphs
Vertex Types:
- VERTEX person(PRIMARY_ID name STRING, gender STRING, name STRING, age INT, state STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE"
Edge Types:
- UNDIRECTED EDGE friendship(FROM person, TO person, connect_day DATETIME)

Graphs:
- Graph social(person:v, friendship:e)
Jobs:


JSON API version: v2
Syntax version: v1



## STEP 4: Load data

<div>
  <img style="vertical-align:top" src="img/load_data.png" width="30" height="30"/>
  
  <span style="">In pandas, the DataFrame is a fundamental construct for managing and working with tabular data. We can load data from a local file very easily. 
</span>
</div>

In [32]:
persons = pd.read_csv('data/people.csv')
persons

Unnamed: 0,id,name,gender,age,state
0,1,Tom,male,40,english
1,2,Dan,male,34,russian
2,3,Jenny,female,25,english
3,4,Kevin,male,28,dutch
4,5,Emily,female,22,spanish
5,6,Nancy,female,20,spanish
6,7,Jack,male,26,english
7,8,Bob,male,52,english
8,9,Alex,male,52,spanish
9,10,Margie,female,53,english


In [7]:
friendships = pd.read_csv('./data/friendships.csv')
friendships

Unnamed: 0,person1,person2,date
0,Tom,Dan,2017-06-03
1,Tom,Jenny,2015-01-01
2,Dan,Jenny,2016-08-03
3,Jenny,Amily,2015-06-08
4,Dan,Nancy,2016-01-03
5,Nancy,Jack,2017-03-02
6,Dan,Kevin,2015-12-30
7,Bob,Margie,1998-10-22
8,Lacy,Bob,2004-01-21
9,Margie,Lacy,1992-12-02


## STEP 5: Map DataFrame to Vertex and Edge types

Next, we map the DataFrames to create and populate TigerGraph vertices and edges. The pyTigerGraph package makes this very easy.

In [33]:
v_person = conn.upsertVertexDataFrame(
      persons, "person", "name"
    , attributes={"name": "name", "gender": "gender", "age": "age", "state": "state"})
print(str(v_person) + " Customer VERTICES Upserted")

11 Customer VERTICES Upserted


In [34]:
numPersons = conn.getVertexCount("person")
print(f"There are currently {numPersons} in of vertex type person, prior to map")

There are currently 12 in of vertex type person, prior to map


In [35]:
v_friendships = conn.upsertEdgeDataFrame(friendships,"person", "friendship", "person", from_id="person1", to_id="person2", attributes={"connect_day":"date"})
print(str(v_friendships) + " Friendships Edges Upserted")

10 Friendships Edges Upserted
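
The friendships table shown in STEP 4 mentions two names, Amily and Lacy, that do not appear in the people table. A quick check before upserting edges can surface such mismatches. The sketch below uses plain Python with the names copied from the two tables; the helper `missing_endpoints` is hypothetical and not part of pyTigerGraph.

```python
# Names copied from the people and friendships tables shown in STEP 4.
people = ["Tom", "Dan", "Jenny", "Kevin", "Emily",
          "Nancy", "Jack", "Bob", "Alex", "Margie"]
friend_pairs = [
    ("Tom", "Dan"), ("Tom", "Jenny"), ("Dan", "Jenny"), ("Jenny", "Amily"),
    ("Dan", "Nancy"), ("Nancy", "Jack"), ("Dan", "Kevin"), ("Bob", "Margie"),
    ("Lacy", "Bob"), ("Margie", "Lacy"),
]


def missing_endpoints(vertex_ids, edges):
    """Return edge endpoints that have no matching vertex id, sorted by name."""
    known = set(vertex_ids)
    endpoints = {name for pair in edges for name in pair}
    return sorted(endpoints - known)


print(missing_endpoints(people, friend_pairs))
# → ['Amily', 'Lacy']
```

With the DataFrames from STEP 4, the same helper could be called as `missing_endpoints(persons["name"], zip(friendships["person1"], friendships["person2"]))`.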


## STEP 6: Explore Graph

<div>
  <img style="vertical-align:top" src="img/inquiry.jpeg" width="28" height="28"/>
  <span style="">With the basics of the graph created, we now begin to explore it using the pyTigerGraph package and GSQL. First, we can look up some statistics on the person vertex type and the friendship edge type.
</span>
</div>

In [28]:
conn.getVertexTypes()

['person']

In [27]:
conn.getVertexStats('person')

{'person': {'age': {'MAX': 53, 'MIN': 20, 'AVG': 34.54545}}}

In [29]:
conn.getEdgeTypes()

['friendship']

In [30]:
conn.getEdgeStats('friendship', skipNA=False)

{'friendship': {'connect_day': {'MAX': 1496448000,
   'MIN': 723254400,
   'AVG': 1291896000}}}
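
The `connect_day` statistics above are returned as integers. Assuming they are Unix epoch seconds in UTC, which is consistent with the earliest and latest dates in the friendships table, they can be converted to readable dates with the standard library. This is a sketch under that assumption, not a documented pyTigerGraph feature.

```python
from datetime import datetime, timezone

# Values copied from the getEdgeStats output above.
stats = {"MAX": 1496448000, "MIN": 723254400, "AVG": 1291896000}


def epoch_to_utc(seconds):
    """Interpret an integer as Unix epoch seconds and return a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


readable = {key: epoch_to_utc(value).strftime("%Y-%m-%d")
            for key, value in stats.items()}
print(readable["MIN"], readable["MAX"])
# → 1992-12-02 2017-06-03
```

The two dates match the oldest (Margie and Lacy) and newest (Tom and Dan) rows in the friendships table.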

### Next, a simple call to find friendships for the person Jenny. We can see Jenny has 3 friends: Amily, Tom and Dan.

In [36]:
conn.getEdges('person', 'Jenny'
              , edgeType='friendship'
              , targetVertexType='person'
              , targetVertexId=None, select="connect_day", where="", limit="", sort="", timeout=0)

[{'e_type': 'friendship',
  'directed': False,
  'from_id': 'Jenny',
  'from_type': 'person',
  'to_id': 'Amily',
  'to_type': 'person',
  'attributes': {'connect_day': '2015-06-08 00:00:00'}},
 {'e_type': 'friendship',
  'directed': False,
  'from_id': 'Jenny',
  'from_type': 'person',
  'to_id': 'Tom',
  'to_type': 'person',
  'attributes': {'connect_day': '2015-01-01 00:00:00'}},
 {'e_type': 'friendship',
  'directed': False,
  'from_id': 'Jenny',
  'from_type': 'person',
  'to_id': 'Dan',
  'to_type': 'person',
  'attributes': {'connect_day': '2016-08-03 00:00:00'}}]

### This network for friends of Jenny looks like this in Studio

<br>
<div>
  <img style="vertical-align:top" src="img/JennyVertex.png" width="400" height="400"/>
</div>

### We can also load results directly into a pandas DataFrame, for example a frame containing Dan's friends

In [37]:
#getEdgesDataframe(sourceVertexType, sourceVerticies, edgeType=None, targetVertexType=None, targetVertexId=None, select="", where="", limit="", sort="", timeout=0)

df=conn.getEdgesDataframe("person", "Dan")
df

Unnamed: 0,from_type,from_id,to_type,to_id,connect_day
0,person,Dan,person,Tom,2017-06-03 00:00:00
1,person,Dan,person,Kevin,2015-12-30 00:00:00
2,person,Dan,person,Jenny,2016-08-03 00:00:00
3,person,Dan,person,Nancy,2016-01-03 00:00:00


## STEP 7: Execute GSQL Queries

<div>
  <img style="vertical-align:top" src="img/inquiry.jpeg" width="28" height="28"/>
  <span style="">Use the GSQL API to write some basic queries. 
</span>
</div>

### For example, select vertex attributes for 3 people

In [50]:
resultSet1 = conn.gsql('''use graph social 
   SELECT * FROM person LIMIT 3''')
resultSet1

'Using graph \'social\'\n[\n{\n"v_id": "Nancy",\n"attributes": {\n"gender": "female",\n"name": "Nancy",\n"state": "spanish",\n"age": 20\n},\n"v_type": "person"\n},\n{\n"v_id": "Emily",\n"attributes": {\n"gender": "female",\n"name": "Emily",\n"state": "spanish",\n"age": 22\n},\n"v_type": "person"\n},\n{\n"v_id": "Jenny",\n"attributes": {\n"gender": "female",\n"name": "Jenny",\n"state": "english",\n"age": 25\n},\n"v_type": "person"\n}\n]'

### And use the JSON package to prettify/format the result

In [51]:
import json
rsj = '[\n{\n"v_id": "Nancy",\n"attributes": {\n"gender": "female",\n"name": "Nancy",\n"state": "spanish",\n"age": 20\n},\n"v_type": "person"\n},\n{\n"v_id": "Dave",\n"attributes": {\n"gender": "male",\n"name": "Dave",\n"state": "wa",\n"age": 54\n},\n"v_type": "person"\n},\n{\n"v_id": "Jenny",\n"attributes": {\n"gender": "female",\n"name": "Jenny",\n"state": "english",\n"age": 25\n},\n"v_type": "person"\n}\n]'

# Skip any text before the JSON array (such as the "Using graph ..." banner), then parse it.
json.loads(rsj[rsj.index('['):])


[{'v_id': 'Nancy',
  'attributes': {'gender': 'female',
   'name': 'Nancy',
   'state': 'spanish',
   'age': 20},
  'v_type': 'person'},
 {'v_id': 'Dave',
  'attributes': {'gender': 'male', 'name': 'Dave', 'state': 'wa', 'age': 54},
  'v_type': 'person'},
 {'v_id': 'Jenny',
  'attributes': {'gender': 'female',
   'name': 'Jenny',
   'state': 'english',
   'age': 25},
  'v_type': 'person'}]

In [52]:
results = conn.gsql('''USE GRAPH social

INTERPRET QUERY () SYNTAX v2 {
   #1-hop pattern.
   friends = SELECT p
             FROM person:s -(friendship:f)- person:p
             WHERE s.name == "Bob"
             ORDER BY p.age ASC
             LIMIT 3;

    PRINT  friends[friends.name, friends.gender, friends.age];
}''')

results


'Using graph \'social\'\n{\n"error": false,\n"message": "",\n"version": {\n"schema": 0,\n"edition": "enterprise",\n"api": "v2"\n},\n"results": [{"friends": [\n{\n"v_id": "Lacy",\n"attributes": {\n"friends.name": "Lacy",\n"friends.gender": "female",\n"friends.age": 28\n},\n"v_type": "person"\n},\n{\n"v_id": "Margie",\n"attributes": {\n"friends.name": "Margie",\n"friends.gender": "female",\n"friends.age": 53\n},\n"v_type": "person"\n}\n]}]\n}'
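
The string returned above mixes a banner line with a JSON document, so it cannot be passed to `json.loads` as is. The sketch below strips everything before the first `[` or `{` and parses the rest. The helper `parse_gsql_json` is hypothetical, and the sample string is an abridged copy of the output above; it assumes the banner itself contains no bracket or brace.

```python
import json


def parse_gsql_json(text):
    """Return the JSON document embedded in a text response.

    Assumes the JSON starts at the first '[' or '{' and runs to the end.
    """
    starts = [i for i in (text.find("["), text.find("{")) if i != -1]
    if not starts:
        raise ValueError("no JSON found in response")
    return json.loads(text[min(starts):])


# Abridged copy of the response shown above.
raw = (
    "Using graph 'social'\n"
    '{"error": false, "message": "", "results": [{"friends": ['
    '{"v_id": "Lacy", "attributes": {"friends.name": "Lacy", '
    '"friends.gender": "female", "friends.age": 28}, "v_type": "person"}, '
    '{"v_id": "Margie", "attributes": {"friends.name": "Margie", '
    '"friends.gender": "female", "friends.age": 53}, "v_type": "person"}'
    "]}]}"
)

doc = parse_gsql_json(raw)
names = [vertex["v_id"] for vertex in doc["results"][0]["friends"]]
print(names)
# → ['Lacy', 'Margie']
```

Once parsed, the result is an ordinary Python dictionary, so Bob's friends can be read from `doc["results"][0]["friends"]`.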

In [42]:
results = conn.gsql('''use graph social
     SELECT * from person:p-(friendship:f)-person:t LIMIT 5''')
results

ExceptionCodeRet: 211

In [47]:
results = conn.gsql('''use graph social
                        SELECT * FROM person WHERE gender=="female"''')
results
#df = pd.DataFrame(results[0]['attributes'])
#df
#print(json.dumps(results, indent=2))

'Using graph \'social\'\n[\n{\n"v_id": "Nancy",\n"attributes": {\n"gender": "female",\n"name": "Nancy",\n"state": "spanish",\n"age": 20\n},\n"v_type": "person"\n},\n{\n"v_id": "Emily",\n"attributes": {\n"gender": "female",\n"name": "Emily",\n"state": "spanish",\n"age": 22\n},\n"v_type": "person"\n},\n{\n"v_id": "Jenny",\n"attributes": {\n"gender": "female",\n"name": "Jenny",\n"state": "english",\n"age": 25\n},\n"v_type": "person"\n},\n{\n"v_id": "Margie",\n"attributes": {\n"gender": "female",\n"name": "Margie",\n"state": "english",\n"age": 53\n},\n"v_type": "person"\n},\n{\n"v_id": "Lacy",\n"attributes": {\n"gender": "female",\n"name": "Lacy",\n"state": "spanish",\n"age": 28\n},\n"v_type": "person"\n},\n{\n"v_id": "Amily",\n"attributes": {\n"gender": "female",\n"name": "Amily",\n"state": "spanish",\n"age": 22\n},\n"v_type": "person"\n}\n]'

In [48]:
res=conn.getVertices('person', select='name,age,gender', where='gender=="female"')
res

[{'v_id': 'Nancy',
  'v_type': 'person',
  'attributes': {'name': 'Nancy', 'age': 20, 'gender': 'female'}},
 {'v_id': 'Emily',
  'v_type': 'person',
  'attributes': {'name': 'Emily', 'age': 22, 'gender': 'female'}},
 {'v_id': 'Jenny',
  'v_type': 'person',
  'attributes': {'name': 'Jenny', 'age': 25, 'gender': 'female'}},
 {'v_id': 'Margie',
  'v_type': 'person',
  'attributes': {'name': 'Margie', 'age': 53, 'gender': 'female'}},
 {'v_id': 'Lacy',
  'v_type': 'person',
  'attributes': {'name': 'Lacy', 'age': 28, 'gender': 'female'}},
 {'v_id': 'Amily',
  'v_type': 'person',
  'attributes': {'name': 'Amily', 'age': 22, 'gender': 'female'}}]

In [49]:
type(res)

list

In [29]:
attrs=[x['attributes'] for x in res]
attrs

[{'name': 'Nancy', 'age': 20, 'gender': 'female'},
 {'name': 'Jenny', 'age': 25, 'gender': 'female'},
 {'name': 'Margie', 'age': 53, 'gender': 'female'},
 {'name': 'Lacy', 'age': 28, 'gender': 'female'},
 {'name': 'Amily', 'age': 22, 'gender': 'female'}]

In [30]:
type(attrs)

list

In [31]:
attrs=res[-2]['attributes']
attrs

{'name': 'Lacy', 'age': 28, 'gender': 'female'}

In [32]:
df = pd.DataFrame(res)
df

Unnamed: 0,v_id,v_type,attributes
0,Nancy,person,"{'name': 'Nancy', 'age': 20, 'gender': 'female'}"
1,Jenny,person,"{'name': 'Jenny', 'age': 25, 'gender': 'female'}"
2,Margie,person,"{'name': 'Margie', 'age': 53, 'gender': 'female'}"
3,Lacy,person,"{'name': 'Lacy', 'age': 28, 'gender': 'female'}"
4,Amily,person,"{'name': 'Amily', 'age': 22, 'gender': 'female'}"
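
In the frame above, the `attributes` column still holds a dictionary per row. One way to get one column per attribute is to flatten each vertex record before building the frame. The sketch below uses plain Python on two records copied from the output above; the helper `flatten_vertices` is hypothetical.

```python
# Two vertex records copied from the getVertices output above.
res = [
    {"v_id": "Nancy", "v_type": "person",
     "attributes": {"name": "Nancy", "age": 20, "gender": "female"}},
    {"v_id": "Jenny", "v_type": "person",
     "attributes": {"name": "Jenny", "age": 25, "gender": "female"}},
]


def flatten_vertices(vertices):
    """Merge each vertex's attributes into the top-level record."""
    return [
        {"v_id": v["v_id"], "v_type": v["v_type"], **v["attributes"]}
        for v in vertices
    ]


rows = flatten_vertices(res)
print(rows[0])
# → {'v_id': 'Nancy', 'v_type': 'person', 'name': 'Nancy', 'age': 20, 'gender': 'female'}
```

Passing `rows` to `pd.DataFrame(rows)` would then give a frame with separate `name`, `age` and `gender` columns.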


In [33]:
sourceVertexType='person'
sourceVertexId='Dan'
conn.getEdges(sourceVertexType, sourceVertexId, edgeType=None, targetVertexType=None, targetVertexId=None, select="", where="", limit="", sort="", timeout=0)

[{'e_type': 'friendship',
  'directed': False,
  'from_id': 'Dan',
  'from_type': 'person',
  'to_id': 'Tom',
  'to_type': 'person',
  'attributes': {'connect_day': '2017-06-03 00:00:00'}},
 {'e_type': 'friendship',
  'directed': False,
  'from_id': 'Dan',
  'from_type': 'person',
  'to_id': 'Kevin',
  'to_type': 'person',
  'attributes': {'connect_day': '2015-12-30 00:00:00'}},
 {'e_type': 'friendship',
  'directed': False,
  'from_id': 'Dan',
  'from_type': 'person',
  'to_id': 'Jenny',
  'to_type': 'person',
  'attributes': {'connect_day': '2016-08-03 00:00:00'}},
 {'e_type': 'friendship',
  'directed': False,
  'from_id': 'Dan',
  'from_type': 'person',
  'to_id': 'Nancy',
  'to_type': 'person',
  'attributes': {'connect_day': '2016-01-03 00:00:00'}}]

## STEP 8: Develop Hop queries

<div>
  <img style="vertical-align:top" src="img/query.png" width="28" height="28"/>
  <span style="">As a final step, we use TigerGraph's advanced analytics to begin exploring relationships and developing the recommender. 
</span>
</div>

In [34]:
conn.gsql('''use graph social
                 select * from person where primary_id=="Tom"''')

'Using graph \'social\'\n[{\n"v_id": "Tom",\n"attributes": {\n"gender": "male",\n"name": "Tom",\n"state": "english",\n"age": 40\n},\n"v_type": "person"\n}]'

### We create an ad hoc query to explore Tom's network

In [35]:
conn.runInterpretedQuery('''
  INTERPRET QUERY x() FOR GRAPH social {
  # declaration statements
  STRING uid = "Tom";
  users = {person.*};
  # body statements
  posts = SELECT p
    FROM users:u-(friendship)->:p
    WHERE u.name == uid;
  PRINT posts; 
}
''')

[{'posts': [{'v_id': 'Dan',
    'v_type': 'person',
    'attributes': {'gender': 'male',
     'name': 'Dan',
     'age': 34,
     'state': 'russian'}},
   {'v_id': 'Jenny',
    'v_type': 'person',
    'attributes': {'gender': 'female',
     'name': 'Jenny',
     'age': 25,
     'state': 'english'}}]}]
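
The one-hop lookup above can be extended to two hops, which is a common basis for a simple friend recommendation: suggest people who are friends of a person's friends but not yet that person's friends. The sketch below illustrates the idea in plain Python over the friendship pairs listed in STEP 4. It is not GSQL and does not run on the TigerGraph server; the helpers `build_adjacency` and `recommend` are hypothetical.

```python
from collections import Counter, defaultdict

# Friendship pairs copied from the friendships table in STEP 4.
friend_pairs = [
    ("Tom", "Dan"), ("Tom", "Jenny"), ("Dan", "Jenny"), ("Jenny", "Amily"),
    ("Dan", "Nancy"), ("Nancy", "Jack"), ("Dan", "Kevin"), ("Bob", "Margie"),
    ("Lacy", "Bob"), ("Margie", "Lacy"),
]


def build_adjacency(pairs):
    """Build an undirected adjacency map: person -> set of friends."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def recommend(person, adjacency):
    """Rank friends-of-friends by how many mutual friends they share."""
    direct = adjacency[person]
    counts = Counter(
        candidate
        for friend in direct
        for candidate in adjacency[friend]
        if candidate != person and candidate not in direct
    )
    # Most mutual friends first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


adjacency = build_adjacency(friend_pairs)
print(recommend("Tom", adjacency))
# → [('Amily', 1), ('Kevin', 1), ('Nancy', 1)]
```

Tom's direct friends are Dan and Jenny, matching the query result above, so the candidates are the people reachable through them: Nancy and Kevin via Dan, and Amily via Jenny.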

In [36]:
conn.runInterpretedQuery('''
  INTERPRET QUERY () FOR GRAPH social {
    PRINT "Hello World"; 
}
''')

[{'"Hello World"': 'Hello World'}]

In [37]:
conn.runInterpretedQuery('''
  INTERPRET QUERY () FOR GRAPH social {
    person1 = {person.*};
    Result = SELECT tgt
           FROM person1:s-(friendship:e)-person:tgt;
    PRINT Result; 
}
''')

[{'Result': [{'v_id': 'Amily',
    'v_type': 'person',
    'attributes': {'gender': 'female',
     'name': 'Amily',
     'age': 22,
     'state': 'spanish'}},
   {'v_id': 'Bob',
    'v_type': 'person',
    'attributes': {'gender': 'male',
     'name': 'Bob',
     'age': 52,
     'state': 'english'}},
   {'v_id': 'Lacy',
    'v_type': 'person',
    'attributes': {'gender': 'female',
     'name': 'Lacy',
     'age': 28,
     'state': 'spanish'}},
   {'v_id': 'Tom',
    'v_type': 'person',
    'attributes': {'gender': 'male',
     'name': 'Tom',
     'age': 40,
     'state': 'english'}},
   {'v_id': 'Kevin',
    'v_type': 'person',
    'attributes': {'gender': 'male',
     'name': 'Kevin',
     'age': 28,
     'state': 'dutch'}},
   {'v_id': 'Dan',
    'v_type': 'person',
    'attributes': {'gender': 'male',
     'name': 'Dan',
     'age': 34,
     'state': 'russian'}},
   {'v_id': 'Margie',
    'v_type': 'person',
    'attributes': {'gender': 'female',
     'name': 'Margie',
     'age': 