# <img src="img/headerImg.jpeg" alt="Drawing" width="1000" height="100"/>

# Create a Recommendation Solution in Hours on TigerGraph
## TigerGraph 201

TigerGraph is a graph database platform that addresses several issues that have limited other graph databases. This notebook demonstrates the basic commands to connect to TigerGraph, define a schema, and load data using the pyTigerGraph Python module.

This tutorial takes you through the steps to develop a recommendation engine in a few hours.

# Create a Recommendation Graph based on the LDBC benchmark dataset
## TigerGraph 201 - Recommendation Engine

## STEP 1: Import Packages

Note: This step assumes the pyTigerGraph package is already installed. If it is not, install it with:
```pip install pyTigerGraph```

In [1]:
import pyTigerGraph as tg
import pandas as pd
import json

## STEP 2: Establishing the connection to a TigerGraph database

<div>
  <img style="vertical-align:top" src="img/connected-icon.png" width="30" height="30"/>
  <span style="">The functionality of pyTigerGraph is implemented by the TigerGraphConnection class. To establish the connection to the database you need to provide the hostname, username and password to access the database.</span>
</div>

In [2]:
conn = tg.TigerGraphConnection(
    host='http://localhost',
    username="tigergraph",
    password='tigergraph')
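
Hard-coded credentials are easy to leak when a notebook is shared. As a minimal sketch, the connection arguments could instead be read from environment variables. The helper `connection_settings` and the variable names `TG_HOST`, `TG_USERNAME`, and `TG_PASSWORD` are hypothetical names chosen for this example; the defaults mirror the values used in the cell above.

```python
import os


def connection_settings(env=None):
    """Collect TigerGraphConnection keyword arguments from environment variables.

    TG_HOST, TG_USERNAME and TG_PASSWORD are hypothetical names used only in this sketch.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("TG_HOST", "http://localhost"),
        "username": env.get("TG_USERNAME", "tigergraph"),
        "password": env.get("TG_PASSWORD", "tigergraph"),
    }


# Example with an explicit mapping instead of the real environment.
settings = connection_settings({"TG_HOST": "http://tg.example.internal"})
print(settings["host"])
```

The resulting dictionary could then be unpacked into the constructor, for example `conn = tg.TigerGraphConnection(**connection_settings())`.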

## STEP 3: Define a Graph Schema

<div>
  <img style="vertical-align:top" src="img/graph_img.png" width="30" height="30"/>
  <span style="">Before data can be loaded into the graph store, the user must define a graph schema. A graph schema is a "dictionary" that defines the types of entities, vertices and edges, in the graph and how those types of entities are related to one another.</span>
</div>

### WARNING: DROP ALL - Will Delete everything in your graph!

Execute this cell if you would like to start the notebook lab from the beginning.

In [None]:
print(conn.gsql('''DROP ALL''', options=[]))

In [None]:
----
The CREATE VERTEX statement defines a new global vertex type, with a name and an attribute list. 

The CREATE EDGE statement defines a new global edge type. There are two forms of the CREATE EDGE statement, one for directed edges and one for undirected edges.  Each edge type must specify that it connects FROM one vertex type TO another vertex type.
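
Most vertex types in this schema share the same shape: a `UINT` primary id plus a short attribute list. As a rough sketch, such statements could be generated from Python rather than typed by hand. The helper `create_vertex_stmt` is hypothetical and only builds a string; it does not talk to the database.

```python
def create_vertex_stmt(name, attributes, id_type="UINT"):
    """Build a CREATE VERTEX statement; `attributes` maps attribute name -> GSQL type."""
    attr_list = "".join(f", {attr} {gsql_type}" for attr, gsql_type in attributes.items())
    return (
        f"CREATE VERTEX {name} (PRIMARY_ID id {id_type}{attr_list}) "
        'WITH primary_id_as_attribute="TRUE"'
    )


# Reproduces the shape of the Tag vertex definition used in this notebook.
tag_stmt = create_vertex_stmt("Tag", {"name": "STRING", "url": "STRING"})
print(tag_stmt)
```

The generated string can be passed to `conn.gsql` in the same way as the hand-written statements.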

In [3]:
print(conn.gsql('''
CREATE VERTEX Comment (PRIMARY_ID id UINT, creationDate DATETIME, locationIP STRING, browserUsed STRING, content STRING, length UINT) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX Post (PRIMARY_ID id UINT, imageFile STRING, creationDate DATETIME, locationIP STRING, browserUsed STRING, lang STRING, content STRING, length UINT) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX Company (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX University (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX City (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX Country (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX Continent (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX Forum (PRIMARY_ID id UINT, title STRING, creationDate DATETIME) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX Person (PRIMARY_ID id UINT, firstName STRING, lastName STRING, gender STRING, birthday DATETIME, creationDate DATETIME, locationIP STRING, browserUsed STRING, speaks set<STRING>, email set<STRING>) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX Tag (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"
CREATE VERTEX TagClass (PRIMARY_ID id UINT, name STRING, url STRING) WITH primary_id_as_attribute="TRUE"

CREATE DIRECTED EDGE CONTAINER_OF (FROM Forum, TO Post) WITH REVERSE_EDGE="CONTAINER_OF_REVERSE"
CREATE DIRECTED EDGE HAS_CREATOR (FROM Comment|Post, TO Person) WITH REVERSE_EDGE="HAS_CREATOR_REVERSE"
CREATE DIRECTED EDGE HAS_INTEREST (FROM Person, TO Tag) WITH REVERSE_EDGE="HAS_INTEREST_REVERSE"
CREATE DIRECTED EDGE HAS_MEMBER (FROM Forum, TO Person, joinDate DATETIME) WITH REVERSE_EDGE="HAS_MEMBER_REVERSE"
CREATE DIRECTED EDGE HAS_MODERATOR (FROM Forum, TO Person) WITH REVERSE_EDGE="HAS_MODERATOR_REVERSE"
CREATE DIRECTED EDGE HAS_TAG (FROM Comment|Post|Forum, TO Tag) WITH REVERSE_EDGE="HAS_TAG_REVERSE"
CREATE DIRECTED EDGE HAS_TYPE (FROM Tag, TO TagClass) WITH REVERSE_EDGE="HAS_TYPE_REVERSE"
CREATE DIRECTED EDGE IS_LOCATED_IN (FROM Comment, TO Country | FROM Post, TO Country | FROM Company, TO Country | FROM Person, TO City | FROM University, TO City) WITH REVERSE_EDGE="IS_LOCATED_IN_REVERSE"
CREATE DIRECTED EDGE IS_PART_OF (FROM City, TO Country | FROM Country, TO Continent) WITH REVERSE_EDGE="IS_PART_OF_REVERSE"
CREATE DIRECTED EDGE IS_SUBCLASS_OF (FROM TagClass, TO TagClass) WITH REVERSE_EDGE="IS_SUBCLASS_OF_REVERSE"
CREATE UNDIRECTED EDGE KNOWS (FROM Person, TO Person, creationDate DATETIME)
CREATE DIRECTED EDGE LIKES (FROM Person, TO Comment|Post, creationDate DATETIME) WITH REVERSE_EDGE="LIKES_REVERSE"
CREATE DIRECTED EDGE REPLY_OF (FROM Comment, TO Comment|Post) WITH REVERSE_EDGE="REPLY_OF_REVERSE"
CREATE DIRECTED EDGE STUDY_AT (FROM Person, TO University, classYear INT) WITH REVERSE_EDGE="STUDY_AT_REVERSE"
CREATE DIRECTED EDGE WORK_AT (FROM Person, TO Company, workFrom INT) WITH REVERSE_EDGE="WORK_AT_REVERSE"
''', options=[]))

The vertex type Comment is created.
The vertex type Post is created.
The vertex type Company is created.
The vertex type University is created.
The vertex type City is created.
The vertex type Country is created.
The vertex type Continent is created.
The vertex type Forum is created.
The vertex type Person is created.
The vertex type Tag is created.
The vertex type TagClass is created.
The edge type CONTAINER_OF is created.
The reverse edge type CONTAINER_OF_REVERSE is created.
The edge type HAS_CREATOR is created.
The reverse edge type HAS_CREATOR_REVERSE is created.
The edge type HAS_INTEREST is created.
The reverse edge type HAS_INTEREST_REVERSE is created.
The edge type HAS_MEMBER is created.
The reverse edge type HAS_MEMBER_REVERSE is created.
The edge type HAS_MODERATOR is created.
The reverse edge type HAS_MODERATOR_REVERSE is created.
The edge type HAS_TAG is created.
The reverse edge type HAS_TAG_REVERSE is created.
The edge type HAS_TYPE is created.
The reverse edge type HAS_

Next, run the CREATE GRAPH statement to create the graph. A CREATE LOADING JOB statement will then be used to populate it.

In [None]:
print(conn.gsql('''
CREATE GRAPH ldbc_snb (*)
''', options=[]))


In [18]:
print(conn.gsql('''
USE GRAPH ldbc_snb
''', options=[]))


Using graph 'ldbc_snb'


## STEP 4: Load data

<div>
  <img style="vertical-align:top" src="img/load_data.png" width="30" height="30"/>
  <span style="">LDBC provides sample datasets for benchmarking graph database performance. Add the sample archive file to your TigerGraph Docker instance.
</span>
</div>

In [None]:
Use CREATE LOADING JOB to define a job that populates the ldbc_snb graph with data.

In [13]:
print(conn.gsql('''
USE GRAPH ldbc_snb
CREATE LOADING JOB load_ldbc_snb FOR GRAPH ldbc_snb {
  DEFINE FILENAME v_comment_file;
  LOAD v_comment_file 
    TO VERTEX Comment VALUES ($0, $1, $2, $3, $4, $5) USING header="true", separator="|";
}
''', options=[]))

Using graph 'ldbc_snb'
The job load_ldbc_snb is created.


In [14]:
print(conn.gsql('''
USE GRAPH ldbc_snb
CREATE LOADING JOB load_ldbc_snb FOR GRAPH ldbc_snb {
  DEFINE FILENAME v_comment_file;
  DEFINE FILENAME v_post_file;
  DEFINE FILENAME v_organisation_file;
  DEFINE FILENAME v_place_file;
  DEFINE FILENAME v_forum_file;
  DEFINE FILENAME v_person_file;
  DEFINE FILENAME v_tag_file;
  DEFINE FILENAME v_tagclass_file;
  
  DEFINE FILENAME forum_containerOf_post_file;
  DEFINE FILENAME comment_hasCreator_person_file;
  DEFINE FILENAME post_hasCreator_person_file;
  DEFINE FILENAME person_hasInterest_tag_file;
  DEFINE FILENAME forum_hasMember_person_file;
  DEFINE FILENAME forum_hasModerator_person_file;
  DEFINE FILENAME comment_hasTag_tag_file;
  DEFINE FILENAME post_hasTag_tag_file;
  DEFINE FILENAME forum_hasTag_tag_file;
  DEFINE FILENAME tag_hasType_tagclass_file;
  DEFINE FILENAME organisation_isLocatedIn_place_file;
  DEFINE FILENAME comment_isLocatedIn_place_file;
  DEFINE FILENAME post_isLocatedIn_place_file;
  DEFINE FILENAME person_isLocatedIn_place_file;
  DEFINE FILENAME place_isPartOf_place_file;
  DEFINE FILENAME tagclass_isSubclassOf_tagclass_file;
  DEFINE FILENAME person_knows_person_file;
  DEFINE FILENAME person_likes_comment_file;
  DEFINE FILENAME person_likes_post_file;
  DEFINE FILENAME comment_replyOf_comment_file;
  DEFINE FILENAME comment_replyOf_post_file;
  DEFINE FILENAME person_studyAt_organisation_file;
  DEFINE FILENAME person_workAt_organisation_file;

  LOAD v_comment_file 
    TO VERTEX Comment VALUES ($0, $1, $2, $3, $4, $5) USING header="true", separator="|";
  LOAD v_post_file
    TO VERTEX Post VALUES ($0, $1, $2, $3, $4, $5, $6, $7) USING header="true", separator="|";
  LOAD v_organisation_file
    TO VERTEX Company VALUES ($0, $2, $3) WHERE $1=="company",
    TO VERTEX University VALUES ($0, $2, $3) WHERE $1=="university" USING header="true", separator="|";
  LOAD v_place_file
    TO VERTEX City VALUES ($0, $1, $2) WHERE $3=="city",
    TO VERTEX Country VALUES ($0, $1, $2) WHERE $3=="country",
    TO VERTEX Continent VALUES ($0, $1, $2) WHERE $3=="continent" USING header="true", separator="|";
  LOAD v_forum_file
    TO VERTEX Forum VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD v_person_file
    TO VERTEX Person VALUES ($0, $1, $2, $3, $4, $5, $6, $7, SPLIT($8,";"), SPLIT($9,";")) USING header="true", separator="|";
  LOAD v_tag_file
    TO VERTEX Tag VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD v_tagclass_file
    TO VERTEX TagClass VALUES ($0, $1, $2) USING header="true", separator="|";

  LOAD forum_containerOf_post_file
    TO EDGE CONTAINER_OF VALUES ($0, $1) USING header="true", separator="|";
  LOAD comment_hasCreator_person_file
    TO EDGE HAS_CREATOR VALUES ($0 Comment, $1) USING header="true", separator="|";
  LOAD post_hasCreator_person_file
    TO EDGE HAS_CREATOR VALUES ($0 Post, $1) USING header="true", separator="|";
  LOAD person_hasInterest_tag_file
    TO EDGE HAS_INTEREST VALUES ($0, $1) USING header="true", separator="|";
  LOAD forum_hasMember_person_file
    TO EDGE HAS_MEMBER VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD forum_hasModerator_person_file
    TO EDGE HAS_MODERATOR VALUES ($0, $1) USING header="true", separator="|";
  LOAD comment_hasTag_tag_file
    TO EDGE HAS_TAG VALUES ($0 Comment, $1) USING header="true", separator="|";
  LOAD post_hasTag_tag_file
    TO EDGE HAS_TAG VALUES ($0 Post, $1) USING header="true", separator="|";
  LOAD forum_hasTag_tag_file
    TO EDGE HAS_TAG VALUES ($0 Forum, $1) USING header="true", separator="|";
  LOAD tag_hasType_tagclass_file
    TO EDGE HAS_TYPE VALUES ($0, $1) USING header="true", separator="|";
  LOAD organisation_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Company, $1 Country) WHERE to_int($1) < 111, 
    TO EDGE IS_LOCATED_IN VALUES ($0 University, $1 City) WHERE to_int($1) > 110 USING header="true", separator="|";
  LOAD comment_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Comment, $1 Country) USING header="true", separator="|";
  LOAD post_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Post, $1 Country) USING header="true", separator="|";
  LOAD person_isLocatedIn_place_file
    TO EDGE IS_LOCATED_IN VALUES ($0 Person, $1 City) USING header="true", separator="|";
  LOAD place_isPartOf_place_file
    TO EDGE IS_PART_OF VALUES ($0 Country, $1 Continent) WHERE to_int($0) < 111,
    TO EDGE IS_PART_OF VALUES ($0 City, $1 Country) WHERE to_int($0) > 110 USING header="true", separator="|";
  LOAD tagclass_isSubclassOf_tagclass_file
    TO EDGE IS_SUBCLASS_OF VALUES ($0, $1) USING header="true", separator="|";
  LOAD person_knows_person_file
    TO EDGE KNOWS VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD person_likes_comment_file
    TO EDGE LIKES VALUES ($0, $1 Comment, $2) USING header="true", separator="|";
  LOAD person_likes_post_file
    TO EDGE LIKES VALUES ($0, $1 Post, $2) USING header="true", separator="|";
  LOAD comment_replyOf_comment_file
    TO EDGE REPLY_OF VALUES ($0, $1 Comment) USING header="true", separator="|";
  LOAD comment_replyOf_post_file
    TO EDGE REPLY_OF VALUES ($0, $1 Post) USING header="true", separator="|";
  LOAD person_studyAt_organisation_file
    TO EDGE STUDY_AT VALUES ($0, $1, $2) USING header="true", separator="|";
  LOAD person_workAt_organisation_file
    TO EDGE WORK_AT VALUES ($0, $1, $2) USING header="true", separator="|";
}
''', options=[]))

Using graph 'ldbc_snb'
The job load_ldbc_snb is created.


In [None]:
TigerGraph has generated an LDBC dataset with scale factor 1 (approximately 1 GB).

You can download it from https://s3-us-west-1.amazonaws.com/tigergraph-benchmark-dataset/LDBC/SF-1/ldbc_snb_data-sf1.tar.gz 
    

In [None]:
Load the vertex data

In [25]:
print(conn.gsql('''
USE GRAPH ldbc_snb

RUN LOADING JOB load_ldbc_snb USING v_person_file="/home/tigergraph/ldbc_snb_data_sample/social_network/person_0_0.csv"
RUN LOADING JOB load_ldbc_snb USING v_post_file="/home/tigergraph/ldbc_snb_data_sample/social_network/post_0_0.csv"
RUN LOADING JOB load_ldbc_snb USING v_tag_file="/home/tigergraph/ldbc_snb_data_sample/social_network/tag_0_0.csv"
RUN LOADING JOB load_ldbc_snb USING v_place_file="/home/tigergraph/ldbc_snb_data_sample/social_network/place_0_0.csv"
RUN LOADING JOB load_ldbc_snb USING v_comment_file="/home/tigergraph/ldbc_snb_data_sample/social_network/comment_0_0.csv"
RUN LOADING JOB load_ldbc_snb USING v_forum_file="/home/tigergraph/ldbc_snb_data_sample/social_network/forum_0_0.csv"
RUN LOADING JOB load_ldbc_snb USING v_organisation_file="/home/tigergraph/ldbc_snb_data_sample/social_network/organisation_0_0.csv"
RUN LOADING JOB load_ldbc_snb USING v_tagclass_file="/home/tigergraph/ldbc_snb_data_sample/social_network/tagclass_0_0.csv"
''', options=[]))

Using graph 'ldbc_snb'
[Tip: Use "CTRL + C" to stop displaying the loading status update, then use "SHOW LOADING STATUS jobid" to track the loading progress again]
[Tip: Manage loading jobs with "ABORT/RESUME LOADING JOB jobid"]
Starting the following job, i.e.
JobName: load_ldbc_snb, jobid: ldbc_snb.load_ldbc_snb.file.m1.1631050315809
Loading log: '/home/tigergraph/tigergraph/log/restpp/restpp_loader_logs/ldbc_snb/ldbc_snb.load_ldbc_snb.file.m1.1631050315809.log'

Job "ldbc_snb.load_ldbc_snb.file.m1.1631050315809" loading status
[WAITING] m1 ( Finished: 0 / Total: 0 )
Job "ldbc_snb.load_ldbc_snb.file.m1.1631050315809" loading status
[FINISHED] m1 ( Finished: 1 / Total: 1 )
[LOADED]
+---------------------------------------------------------------------------------------------------------------+
|                                                           FILENAME |   L

Load the edge data

In [30]:
print(conn.gsql('''
USE GRAPH ldbc_snb

RUN LOADING JOB load_ldbc_snb USING person_knows_person_file="/home/tigergraph/ldbc_snb_data_sample/social_network/person_knows_person_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING comment_replyOf_post_file="/home/tigergraph/ldbc_snb_data_sample/social_network/comment_replyOf_post_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING comment_replyOf_comment_file="/home/tigergraph/ldbc_snb_data_sample/social_network/comment_replyOf_comment_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING post_hasCreator_person_file="/home/tigergraph/ldbc_snb_data_sample/social_network/post_hasCreator_person_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING post_hasTag_tag_file="/home/tigergraph/ldbc_snb_data_sample/social_network/post_hasTag_tag_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING comment_hasCreator_person_file="/home/tigergraph/ldbc_snb_data_sample/social_network/comment_hasCreator_person_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING post_isLocatedIn_place_file="/home/tigergraph/ldbc_snb_data_sample/social_network/post_isLocatedIn_place_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING comment_hasTag_tag_file="/home/tigergraph/ldbc_snb_data_sample/social_network/comment_hasTag_tag_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING comment_isLocatedIn_place_file="/home/tigergraph/ldbc_snb_data_sample/social_network/comment_isLocatedIn_place_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING forum_containerOf_post_file="/home/tigergraph/ldbc_snb_data_sample/social_network/forum_containerOf_post_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING forum_hasMember_person_file="/home/tigergraph/ldbc_snb_data_sample/social_network/forum_hasMember_person_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING forum_hasModerator_person_file="/home/tigergraph/ldbc_snb_data_sample/social_network/forum_hasModerator_person_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING forum_hasTag_tag_file="/home/tigergraph/ldbc_snb_data_sample/social_network/forum_hasTag_tag_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING organisation_isLocatedIn_place_file="/home/tigergraph/ldbc_snb_data_sample/social_network/organisation_isLocatedIn_place_0_0.csv"
RUN LOADING JOB load_ldbc_snb USING person_hasInterest_tag_file="/home/tigergraph/ldbc_snb_data_sample/social_network/person_hasInterest_tag_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING person_isLocatedIn_place_file="/home/tigergraph/ldbc_snb_data_sample/social_network/person_isLocatedIn_place_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING person_likes_comment_file="/home/tigergraph/ldbc_snb_data_sample/social_network/person_likes_comment_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING person_likes_post_file="/home/tigergraph/ldbc_snb_data_sample/social_network/person_likes_post_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING person_studyAt_organisation_file="/home/tigergraph/ldbc_snb_data_sample/social_network/person_studyAt_organisation_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING person_workAt_organisation_file="/home/tigergraph/ldbc_snb_data_sample/social_network/person_workAt_organisation_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING place_isPartOf_place_file="/home/tigergraph/ldbc_snb_data_sample/social_network/place_isPartOf_place_0_0.csv"
RUN LOADING JOB load_ldbc_snb USING tag_hasType_tagclass_file="/home/tigergraph/ldbc_snb_data_sample/social_network/tag_hasType_tagclass_0_0.csv" 
RUN LOADING JOB load_ldbc_snb USING tagclass_isSubclassOf_tagclass_file="/home/tigergraph/ldbc_snb_data_sample/social_network/tagclass_isSubclassOf_tagclass_0_0.csv" 
''', options=[]))

Using graph 'ldbc_snb'
[Tip: Use "CTRL + C" to stop displaying the loading status update, then use "SHOW LOADING STATUS jobid" to track the loading progress again]
[Tip: Manage loading jobs with "ABORT/RESUME LOADING JOB jobid"]
Starting the following job, i.e.
JobName: load_ldbc_snb, jobid: ldbc_snb.load_ldbc_snb.file.m1.1631051349946
Loading log: '/home/tigergraph/tigergraph/log/restpp/restpp_loader_logs/ldbc_snb/ldbc_snb.load_ldbc_snb.file.m1.1631051349946.log'

Job "ldbc_snb.load_ldbc_snb.file.m1.1631051349946" loading status
[WAITING] m1 ( Finished: 0 / Total: 0 )
Job "ldbc_snb.load_ldbc_snb.file.m1.1631051349946" loading status
[FINISHED] m1
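
The `RUN LOADING JOB` statements in the two loading cells follow one pattern: the filename variable, minus its `_file` suffix and any `v_` prefix, names a CSV file ending in `_0_0.csv`. A minimal sketch that generates the statements from the variable names, assuming that naming pattern holds for every file; the helper `run_loading_statements` is hypothetical.

```python
DATA_DIR = "/home/tigergraph/ldbc_snb_data_sample/social_network"


def run_loading_statements(job, file_vars, data_dir=DATA_DIR):
    """Return one RUN LOADING JOB statement per filename variable."""
    statements = []
    for var in file_vars:
        # v_person_file -> person ; person_knows_person_file -> person_knows_person
        stem = var[: -len("_file")] if var.endswith("_file") else var
        if stem.startswith("v_"):
            stem = stem[len("v_"):]
        statements.append(
            f'RUN LOADING JOB {job} USING {var}="{data_dir}/{stem}_0_0.csv"'
        )
    return statements


stmts = run_loading_statements(
    "load_ldbc_snb", ["v_person_file", "person_knows_person_file"]
)
print("\n".join(stmts))
```

The joined statements could be sent in one call, for example `conn.gsql("USE GRAPH ldbc_snb\n" + "\n".join(stmts), options=[])`.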

----
The `conn.gsql` method sends arbitrary GSQL statements to the database. The next cell shows how to check that the schema was created successfully.

In [31]:
conn.graphname = 'ldbc_snb'
print(conn.gsql('''ls''', options=[]))

---- Global vertices, edges, and all graphs
Vertex Types:
- VERTEX Comment(PRIMARY_ID id UINT, creationDate DATETIME, locationIP STRING, browserUsed STRING, content STRING, length UINT) WITH STATS="OUTDEGREE_BY_EDGETYPE", PRIMARY_ID_AS_ATTRIBUTE="TRUE"
- VERTEX Post(PRIMARY_ID id UINT, imageFile STRING, creationDate DATETIME, locationIP STRING, browserUsed STRING, lang STRING, content STRING, length UINT) WITH STATS="OUTDEGREE_BY_EDGETYPE", PRIMARY_ID_AS_ATTRIBUTE="TRUE"
- VERTEX Company(PRIMARY_ID id UINT, name STRING, url STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE", PRIMARY_ID_AS_ATTRIBUTE="TRUE"
- VERTEX University(PRIMARY_ID id UINT, name STRING, url STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE", PRIMARY_ID_AS_ATTRIBUTE="TRUE"
- VERTEX City(PRIMARY_ID id UINT, name STRING, url STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE", PRIMARY_ID_AS_ATTRIBUTE="TRUE"
- VERTEX Country(PRIMARY_ID id UINT, name STRING, url STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE", PRIMARY_ID_AS_ATTRIBUTE="TRUE"
- VERTEX

In [89]:
numPersons = conn.getVertexCount("Person")
print(f"There are currently {numPersons} vertices of type Person")

There are currently 9892 vertices of type Person


## STEP 5: Explore Graph

<div>
  <img style="vertical-align:top" src="img/inquiry.jpeg" width="28" height="28"/>
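  <span style="">pyTigerGraph provides helper functions for inspecting the loaded graph: vertex and edge statistics, vertex types, and the vertices and edges themselves. GSQL statements can also be sent directly with conn.gsql.
</span>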
</div>

In [90]:
conn.getVertexStats('Person')

{'Person': {'id': {'MAX': 35184372099695,
   'MIN': 65,
   'AVG': 16635393076606.418},
  'birthday': {'MAX': 633744000, 'MIN': 318211200, 'AVG': 474343450.38415},
  'creationDate': {'MAX': 1347522126,
   'MIN': 1262307686,
   'AVG': 1305029388.1271734}}}
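
The `birthday` and `creationDate` statistics above are plain integers. Assuming they are Unix epoch seconds in UTC (an assumption, although the converted values match the date format shown elsewhere in this notebook), a short sketch converts them back to readable timestamps. The helper `epoch_to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Format epoch seconds as a 'YYYY-MM-DD HH:MM:SS' string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# MIN and MAX of the birthday attribute, copied from the output above.
birthday_stats = {"MIN": 318211200, "MAX": 633744000}
for key, value in birthday_stats.items():
    print(key, epoch_to_utc(value))
# MIN 1980-02-01 00:00:00
# MAX 1990-01-31 00:00:00
```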

In [91]:
conn.getVertexTypes()

['Comment',
 'Post',
 'Company',
 'University',
 'City',
 'Country',
 'Continent',
 'Forum',
 'Person',
 'Tag',
 'TagClass']

In [92]:
conn.getEdgeStats('LIKES', skipNA=False)

{'LIKES': {'creationDate': {'MAX': 1347528932,
   'MIN': 1267091759,
   'AVG': 1325267172.6366217}}}

In [93]:
conn.getEdges('Person', '4398046511904',
              edgeType='LIKES',
              targetVertexType='Comment',
              targetVertexId=None, select="creationDate", where="", limit="", sort="", timeout=0)

[{'e_type': 'LIKES',
  'directed': True,
  'from_id': '4398046511904',
  'from_type': 'Person',
  'to_id': '1924145533450',
  'to_type': 'Comment',
  'attributes': {'creationDate': '2012-05-15 16:27:01'}},
 {'e_type': 'LIKES',
  'directed': True,
  'from_id': '4398046511904',
  'from_type': 'Person',
  'to_id': '274878030603',
  'to_type': 'Comment',
  'attributes': {'creationDate': '2010-06-27 04:36:49'}},
 {'e_type': 'LIKES',
  'directed': True,
  'from_id': '4398046511904',
  'from_type': 'Person',
  'to_id': '2061584486442',
  'to_type': 'Comment',
  'attributes': {'creationDate': '2012-08-30 00:15:14'}},
 {'e_type': 'LIKES',
  'directed': True,
  'from_id': '4398046511904',
  'from_type': 'Person',
  'to_id': '824633843838',
  'to_type': 'Comment',
  'attributes': {'creationDate': '2011-01-06 14:52:35'}},
 {'e_type': 'LIKES',
  'directed': True,
  'from_id': '4398046511904',
  'from_type': 'Person',
  'to_id': '412316984040',
  'to_type': 'Comment',
  'attributes': {'creationDate'

In [94]:
#getEdgesDataframe(sourceVertexType, sourceVerticies, edgeType=None, targetVertexType=None, targetVertexId=None, select="", where="", limit="", sort="", timeout=0)

df=conn.getEdgesDataframe("Person", "32985348834284")
df

| | from_type | from_id | to_type | to_id | creationDate | classYear |
|---|---|---|---|---|---|---|
| 0 | Person | 32985348834284 | Forum | 2061584370709 | | |
| 1 | Person | 32985348834284 | City | 805 | | |
| 2 | Person | 32985348834284 | Person | 10995116279461 | 2012-08-26 09:38:50 | |
| 3 | Person | 32985348834284 | Person | 21990232558713 | 2012-08-24 14:48:25 | |
| 4 | Person | 32985348834284 | Person | 2199023265049 | 2012-08-26 16:19:34 | |
| ... | ... | ... | ... | ... | ... | ... |
| 57 | Person | 32985348834284 | Person | 6597069771199 | 2012-08-03 20:02:29 | |
| 58 | Person | 32985348834284 | Person | 15393162794383 | 2012-08-10 01:32:29 | |
| 59 | Person | 32985348834284 | Person | 17592186048441 | 2012-07-31 08:59:35 | |
| 60 | Person | 32985348834284 | Person | 13194139534185 | 2012-08-19 04:54:26 | |


In [95]:
resultSet1 = conn.gsql('''use graph ldbc_snb 
   SELECT * FROM City LIMIT 3''')
resultSet1

'Using graph \'ldbc_snb\'\n[\n{\n"v_id": "1406",\n"attributes": {\n"name": "Bursa",\n"id": 1406,\n"url": "http://dbpedia.org/resource/Bursa"\n},\n"v_type": "City"\n},\n{\n"v_id": "1292",\n"attributes": {\n"name": "Coimbra",\n"id": 1292,\n"url": "http://dbpedia.org/resource/Coimbra"\n},\n"v_type": "City"\n},\n{\n"v_id": "1220",\n"attributes": {\n"name": "Comrat",\n"id": 1220,\n"url": "http://dbpedia.org/resource/Comrat"\n},\n"v_type": "City"\n}\n]'

In [96]:
import json
rsj = '[\n{\n"v_id": "Nancy",\n"attributes": {\n"gender": "female",\n"name": "Nancy",\n"state": "spanish",\n"age": 20\n},\n"v_type": "person"\n},\n{\n"v_id": "Dave",\n"attributes": {\n"gender": "male",\n"name": "Dave",\n"state": "wa",\n"age": 54\n},\n"v_type": "person"\n},\n{\n"v_id": "Jenny",\n"attributes": {\n"gender": "female",\n"name": "Jenny",\n"state": "english",\n"age": 25\n},\n"v_type": "person"\n}\n]'

# rsj is already a valid JSON array, so it can be parsed directly without any regex splitting.
json.loads(rsj)


[{'v_id': 'Nancy',
  'attributes': {'gender': 'female',
   'name': 'Nancy',
   'state': 'spanish',
   'age': 20},
  'v_type': 'person'},
 {'v_id': 'Dave',
  'attributes': {'gender': 'male', 'name': 'Dave', 'state': 'wa', 'age': 54},
  'v_type': 'person'},
 {'v_id': 'Jenny',
  'attributes': {'gender': 'female',
   'name': 'Jenny',
   'state': 'english',
   'age': 25},
  'v_type': 'person'}]
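
The string returned by `conn.gsql` for a `SELECT` starts with a status line such as `Using graph 'ldbc_snb'` before the JSON payload. A minimal sketch for separating the two, assuming the status text itself contains no `[` or `{` characters; the helper `parse_gsql_json` is hypothetical, and the sample is a shortened copy of the City response shown earlier.

```python
import json


def parse_gsql_json(response):
    """Return the parsed JSON payload of a gsql() response, skipping leading status text."""
    starts = [i for i in (response.find("["), response.find("{")) if i != -1]
    if not starts:
        raise ValueError("no JSON payload found in response")
    return json.loads(response[min(starts):])


sample = (
    "Using graph 'ldbc_snb'\n"
    '[\n{\n"v_id": "1406",\n"attributes": {\n"name": "Bursa",\n"id": 1406,\n'
    '"url": "http://dbpedia.org/resource/Bursa"\n},\n"v_type": "City"\n}\n]'
)
cities = parse_gsql_json(sample)
print(cities[0]["attributes"]["name"])  # Bursa
```

Responses whose status text contains brackets, such as the loading-job tips, would need a stricter rule for locating the payload.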

In [97]:
print(conn.gsql('''
USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {
   #1-hop pattern.
   friends = SELECT p
             FROM Person:s -(KNOWS:e)- Person:p
             WHERE s.firstName == "Bobby" AND s.lastName == "Sotto"
             ORDER BY p.birthday ASC
             LIMIT 3;

    PRINT  friends[friends.firstName, friends.lastName, friends.birthday];
}
''', options=[]))


Using graph 'ldbc_snb'
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"friends": [
{
"v_id": "10995116279461",
"attributes": {
"friends.birthday": "1980-05-13 00:00:00",
"friends.lastName": "Cajes",
"friends.firstName": "Gregorio"
},
"v_type": "Person"
},
{
"v_id": "30786325583918",
"attributes": {
"friends.birthday": "1981-01-04 00:00:00",
"friends.lastName": "Garcia",
"friends.firstName": "Bobby"
},
"v_type": "Person"
},
{
"v_id": "19791209308415",
"attributes": {
"friends.birthday": "1980-08-17 00:00:00",
"friends.lastName": "Reyes",
"friends.firstName": "Jose"
},
"v_type": "Person"
}
]}]
}


In [98]:
print(conn.gsql('''
USE GRAPH ldbc_snb
set query_timeout=60000

INTERPRET QUERY () SYNTAX v2{
  SumAccum<int> @commentCnt= 0;
  SumAccum<int> @postCnt= 0;

  Result = SELECT tgt
           FROM Person:tgt -(<HAS_CREATOR|LIKES>)- (Comment|Post):src
           WHERE tgt.firstName == "Bob" AND tgt.lastName == "Choi"
           ACCUM CASE WHEN src.type == "Comment" THEN
                          tgt.@commentCnt += 1
                      WHEN src.type == "Post" THEN
                          tgt.@postCnt += 1
                 END;

  PRINT Result[Result.@commentCnt, Result.@postCnt];
}

''', options=[]))

Using graph 'ldbc_snb'
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"Result": [{
"v_id": "21990232556497",
"attributes": {
"Result.@commentCnt": 31,
"Result.@postCnt": 16
},
"v_type": "Person"
}]}]
}


In [99]:
results = conn.gsql('''use graph ldbc_snb
                        SELECT * FROM Person WHERE gender=="female" AND browserUsed=="Firefox" AND firstName=="Bob"''')
results
#df = pd.DataFrame(results[0]['attributes'])
#df
#print(json.dumps(results, indent=2))

'Using graph \'ldbc_snb\'\n[{\n"v_id": "4398046521266",\n"attributes": {\n"birthday": "1986-05-03 00:00:00",\n"firstName": "Bob",\n"lastName": "Calvert",\n"gender": "female",\n"speaks": [\n"en",\n"af"\n],\n"browserUsed": "Firefox",\n"locationIP": "41.204.199.69",\n"id": 4398046521266,\n"creationDate": "2010-06-24 05:47:05",\n"email": [\n"Bob4398046521266@gmail.com",\n"Bob4398046521266@gmx.com",\n"Bob4398046521266@hotmail.com"\n]\n},\n"v_type": "Person"\n}]'

In [100]:
res=conn.getVertices('Person', select='lastName,birthday,gender', where='firstName=="Alexander"')
res

[{'v_id': '19791209303325',
  'v_type': 'Person',
  'attributes': {'lastName': 'Morales',
   'birthday': '1980-12-25 00:00:00',
   'gender': 'female'}},
 {'v_id': '26388279072457',
  'v_type': 'Person',
  'attributes': {'lastName': 'Khizhynkova',
   'birthday': '1980-08-03 00:00:00',
   'gender': 'female'}},
 {'v_id': '6597069772578',
  'v_type': 'Person',
  'attributes': {'lastName': 'Golovin',
   'birthday': '1989-11-30 00:00:00',
   'gender': 'male'}},
 {'v_id': '28587302322814',
  'v_type': 'Person',
  'attributes': {'lastName': 'Basov',
   'birthday': '1983-11-27 00:00:00',
   'gender': 'male'}},
 {'v_id': '8796093029142',
  'v_type': 'Person',
  'attributes': {'lastName': 'Dobrunov',
   'birthday': '1984-02-10 00:00:00',
   'gender': 'male'}},
 {'v_id': '2199023260548',
  'v_type': 'Person',
  'attributes': {'lastName': 'Berman',
   'birthday': '1983-07-30 00:00:00',
   'gender': 'male'}},
 {'v_id': '4398046519334',
  'v_type': 'Person',
  'attributes': {'lastName': 'Ivanov',
   

In [101]:
attrs=[x['attributes'] for x in res]
attrs

[{'lastName': 'Morales',
  'birthday': '1980-12-25 00:00:00',
  'gender': 'female'},
 {'lastName': 'Khizhynkova',
  'birthday': '1980-08-03 00:00:00',
  'gender': 'female'},
 {'lastName': 'Golovin', 'birthday': '1989-11-30 00:00:00', 'gender': 'male'},
 {'lastName': 'Basov', 'birthday': '1983-11-27 00:00:00', 'gender': 'male'},
 {'lastName': 'Dobrunov', 'birthday': '1984-02-10 00:00:00', 'gender': 'male'},
 {'lastName': 'Berman', 'birthday': '1983-07-30 00:00:00', 'gender': 'male'},
 {'lastName': 'Ivanov', 'birthday': '1989-01-03 00:00:00', 'gender': 'male'},
 {'lastName': 'Popov', 'birthday': '1981-05-18 00:00:00', 'gender': 'male'},
 {'lastName': 'Popov', 'birthday': '1983-11-04 00:00:00', 'gender': 'male'},
 {'lastName': 'Shevchenko',
  'birthday': '1981-04-30 00:00:00',
  'gender': 'male'},
 {'lastName': 'Eduard', 'birthday': '1985-01-09 00:00:00', 'gender': 'male'},
 {'lastName': 'Gusev', 'birthday': '1985-04-17 00:00:00', 'gender': 'male'},
 {'lastName': 'Dobrunov', 'birthday': '

In [102]:
type(attrs)

list

In [103]:
attrs=res[-2]['attributes']
attrs

{'lastName': 'Basov', 'birthday': '1982-01-23 00:00:00', 'gender': 'male'}

In [104]:
df = pd.DataFrame(res)
df

| | v_id | v_type | attributes |
|---|---|---|---|
| 0 | 19791209303325 | Person | {'lastName': 'Morales', 'birthday': '1980-12-2... |
| 1 | 26388279072457 | Person | {'lastName': 'Khizhynkova', 'birthday': '1980-... |
| 2 | 6597069772578 | Person | {'lastName': 'Golovin', 'birthday': '1989-11-3... |
| 3 | 28587302322814 | Person | {'lastName': 'Basov', 'birthday': '1983-11-27 ... |
| 4 | 8796093029142 | Person | {'lastName': 'Dobrunov', 'birthday': '1984-02-... |
| ... | ... | ... | ... |
| 58 | 13194139535786 | Person | {'lastName': 'Dobrunov', 'birthday': '1983-03-... |
| 59 | 28587302331009 | Person | {'lastName': 'Basov', 'birthday': '1989-04-07 ... |
| 60 | 4398046512766 | Person | {'lastName': 'Kuzmina', 'birthday': '1988-12-2... |
| 61 | 2199023255688 | Person | {'lastName': 'Basov', 'birthday': '1982-01-23 ... |
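
Building a DataFrame directly from the vertex list leaves every attribute packed inside a single `attributes` column. One possible approach is to flatten each record first so that every attribute becomes its own column. The helper `flatten_vertices` is hypothetical; the sample rows are the first two records from the `getVertices` output above.

```python
def flatten_vertices(vertices):
    """Merge each vertex's 'attributes' dict into one flat row per vertex."""
    return [
        {"v_id": v["v_id"], "v_type": v["v_type"], **v["attributes"]}
        for v in vertices
    ]


sample = [
    {"v_id": "19791209303325", "v_type": "Person",
     "attributes": {"lastName": "Morales", "birthday": "1980-12-25 00:00:00",
                    "gender": "female"}},
    {"v_id": "26388279072457", "v_type": "Person",
     "attributes": {"lastName": "Khizhynkova", "birthday": "1980-08-03 00:00:00",
                    "gender": "female"}},
]
rows = flatten_vertices(sample)
print(rows[0])
```

Passing the flattened rows to pandas, as in `pd.DataFrame(flatten_vertices(res))`, would then give one column each for `lastName`, `birthday`, and `gender`.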


In [105]:
sourceVertexType='Person'
sourceVertexId='4398046512766'
conn.getEdges(sourceVertexType, sourceVertexId, edgeType=None, targetVertexType=None, targetVertexId=None, select="", where="", limit="", sort="", timeout=0)

[{'e_type': 'HAS_CREATOR_REVERSE',
  'directed': True,
  'from_id': '4398046512766',
  'from_type': 'Person',
  'to_id': '824633799633',
  'to_type': 'Comment',
  'attributes': {}},
 {'e_type': 'HAS_CREATOR_REVERSE',
  'directed': True,
  'from_id': '4398046512766',
  'from_type': 'Person',
  'to_id': '824633798089',
  'to_type': 'Comment',
  'attributes': {}},
 {'e_type': 'HAS_CREATOR_REVERSE',
  'directed': True,
  'from_id': '4398046512766',
  'from_type': 'Person',
  'to_id': '824633797789',
  'to_type': 'Comment',
  'attributes': {}},
 {'e_type': 'HAS_CREATOR_REVERSE',
  'directed': True,
  'from_id': '4398046512766',
  'from_type': 'Person',
  'to_id': '824633798648',
  'to_type': 'Comment',
  'attributes': {}},
 {'e_type': 'HAS_CREATOR_REVERSE',
  'directed': True,
  'from_id': '4398046512766',
  'from_type': 'Person',
  'to_id': '824633797812',
  'to_type': 'Comment',
  'attributes': {}},
 {'e_type': 'HAS_CREATOR_REVERSE',
  'directed': True,
  'from_id': '4398046512766',
  'fr
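
Each record returned by `getEdges` is a dictionary with an `e_type` key, so a long edge list can be summarised by counting records per edge type. A small sketch using `collections.Counter`; the helper `count_by_edge_type` is hypothetical, and the sample records are copied from `getEdges` outputs shown in this notebook.

```python
from collections import Counter


def count_by_edge_type(edges):
    """Count edge records per edge type."""
    return Counter(edge["e_type"] for edge in edges)


sample_edges = [
    {"e_type": "HAS_CREATOR_REVERSE", "directed": True,
     "from_id": "4398046512766", "from_type": "Person",
     "to_id": "824633799633", "to_type": "Comment", "attributes": {}},
    {"e_type": "HAS_CREATOR_REVERSE", "directed": True,
     "from_id": "4398046512766", "from_type": "Person",
     "to_id": "824633798089", "to_type": "Comment", "attributes": {}},
    {"e_type": "LIKES", "directed": True,
     "from_id": "4398046511904", "from_type": "Person",
     "to_id": "1924145533450", "to_type": "Comment",
     "attributes": {"creationDate": "2012-05-15 16:27:01"}},
]
counts = count_by_edge_type(sample_edges)
print(counts.most_common())
```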

## STEP 6: Write Queries

<div>
  <img style="vertical-align:top" src="img/query.png" width="28" height="28"/>
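  <span style="">Queries can be sent as GSQL statements with conn.gsql or run as interpreted queries with conn.runInterpretedQuery.
</span>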
</div>

In [106]:
conn.gsql('''use graph ldbc_snb
                 select * from Post where browserUsed=="Chrome" LIMIT 10''')

'Using graph \'ldbc_snb\'\n[\n{\n"v_id": "274877939768",\n"attributes": {\n"imageFile": "photo274877939768.jpg",\n"browserUsed": "Chrome",\n"length": 0,\n"locationIP": "196.216.245.233",\n"id": 274877939768,\n"creationDate": "2010-06-03 16:25:55",\n"lang": "",\n"content": ""\n},\n"v_type": "Post"\n},\n{\n"v_id": "1511828520988",\n"attributes": {\n"imageFile": "photo1511828520988.jpg",\n"browserUsed": "Chrome",\n"length": 0,\n"locationIP": "196.216.245.233",\n"id": 1511828520988,\n"creationDate": "2011-12-22 05:37:32",\n"lang": "",\n"content": ""\n},\n"v_type": "Post"\n},\n{\n"v_id": "274877939701",\n"attributes": {\n"imageFile": "photo274877939701.jpg",\n"browserUsed": "Chrome",\n"length": 0,\n"locationIP": "196.216.245.233",\n"id": 274877939701,\n"creationDate": "2010-06-23 02:36:13",\n"lang": "",\n"content": ""\n},\n"v_type": "Post"\n},\n{\n"v_id": "1649267474305",\n"attributes": {\n"imageFile": "photo1649267474305.jpg",\n"browserUsed": "Chrome",\n"length": 0,\n"locationIP": "196.216

In [107]:
conn.gsql('''
USE GRAPH ldbc_snb
INTERPRET QUERY () SYNTAX v2 {
  SumAccum<int> @@cnt= 0;

  Result = SELECT tgt
           FROM Person:tgt -(<_)- (Comment|Post):src
           WHERE tgt.firstName == "David" AND tgt.lastName == "Hunter"
           ACCUM  @@cnt += 1;

  PRINT @@cnt;
}
''')

'Using graph \'ldbc_snb\'\n{\n"error": false,\n"message": "",\n"version": {\n"schema": 0,\n"edition": "enterprise",\n"api": "v2"\n},\n"results": [{"@@cnt": 16}]\n}'

In [108]:
conn.runInterpretedQuery('''
  INTERPRET QUERY () FOR GRAPH ldbc_snb {
    PRINT "Hello World"; 
}
''')

[{'"Hello World"': 'Hello World'}]

In [109]:
conn.gsql('''
  
  USE GRAPH ldbc_snb

INTERPRET QUERY () SYNTAX v2 {

  TagClass1 =  SELECT t
               FROM TagClass:s - (IS_SUBCLASS_OF>*) - TagClass:t
               WHERE s.name == "TennisPlayer";

    PRINT  TagClass1;
}
''')

'Using graph \'ldbc_snb\'\n{\n"error": false,\n"message": "",\n"version": {\n"schema": 0,\n"edition": "enterprise",\n"api": "v2"\n},\n"results": [{"TagClass1": [\n{\n"v_id": "211",\n"attributes": {\n"name": "Person",\n"id": 211,\n"url": "http://dbpedia.org/ontology/Person"\n},\n"v_type": "TagClass"\n},\n{\n"v_id": "239",\n"attributes": {\n"name": "Agent",\n"id": 239,\n"url": "http://dbpedia.org/ontology/Agent"\n},\n"v_type": "TagClass"\n},\n{\n"v_id": "0",\n"attributes": {\n"name": "Thing",\n"id": 0,\n"url": "http://www.w3.org/2002/07/owl#Thing"\n},\n"v_type": "TagClass"\n},\n{\n"v_id": "149",\n"attributes": {\n"name": "Athlete",\n"id": 149,\n"url": "http://dbpedia.org/ontology/Athlete"\n},\n"v_type": "TagClass"\n},\n{\n"v_id": "59",\n"attributes": {\n"name": "TennisPlayer",\n"id": 59,\n"url": "http://dbpedia.org/ontology/TennisPlayer"\n},\n"v_type": "TagClass"\n}\n]}]\n}'
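
The `*` in `IS_SUBCLASS_OF>*` repeats the edge zero or more times, which is why `TennisPlayer` itself appears in the result alongside its ancestors. The sketch below mimics that traversal in plain Python. The parent mapping is an assumption based on the class names in the output (the actual edges are not shown in this notebook), and it further assumes each class has a single parent.

```python
def reachable(start, parent_of):
    """Follow parent links zero or more times, collecting every class visited."""
    seen = []
    current = start
    while current is not None and current not in seen:
        seen.append(current)              # zero hops: the start itself is included
        current = parent_of.get(current)  # one more IS_SUBCLASS_OF hop
    return seen


# Assumed hierarchy, for illustration only.
parent_of = {
    "TennisPlayer": "Athlete",
    "Athlete": "Person",
    "Person": "Agent",
    "Agent": "Thing",
}
classes = reachable("TennisPlayer", parent_of)
print(classes)
# ['TennisPlayer', 'Athlete', 'Person', 'Agent', 'Thing']
```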