<p><img alt="TigerGraph logo" height="45px" src="https://blobscdn.gitbook.com/v0/b/gitbook-28427.appspot.com/o/spaces%2F-LHvjxIN4__6bA0T-QmU%2Favatar.png?generation=1532158270801864&amp;alt=media" align="left" hspace="10px" vspace="0px"></p>

# TigerGraph 2 TensorFlow
------
Using Pandas + TensorFlow for User Rating Predictions
## Introduction
This notebook walks through a basic example of using <a href="https://www.tensorflow.org/tutorials" target="_blank">TensorFlow</a> with the output of a graph query to predict whether a user will like a movie. The data comes from a TigerGraph database through the Python package <a href="https://github.com/parkererickson/pyTigerGraph" target="_blank">pyTigerGraph</a>, which makes a REST call to a user-built query. The response arrives in the notebook as JSON, and <a href="https://pypi.org/project/pandas" target="_blank">Pandas</a> turns it into a dataframe. The dataframe is then prepared for a basic model with two hidden layers. In the last step, the model is used to predict on movies the user hasn't rated.


## Install Queries on TigerGraph Server
You need to create and install one query on the TigerGraph server, named `userData`.

```
CREATE QUERY userData(VERTEX<USER> user) FOR GRAPH Recommender {
  // Feature extraction for a user: movieID, movieTitle, userRating, genre
  // Sample Param = 271

  SumAccum<float> @rating;
  SetAccum<STRING> @genre;

  src = {user}; // Start from the user

  S1 = SELECT tgt FROM src:s -(rate:e)-> MOVIE:tgt // Grab all the movies the user rated
       ACCUM tgt.@rating += e.rating; // Store that user's rating as a local variable on each movie

  S2 = SELECT s FROM S1:s -(movie_term:e)-> TERM:tgt // From those movies, grab the genres (terms)
       ACCUM s.@genre += tgt.term; // Collect them on each movie as a set of genres

  PRINT S2[S2.movie_id as movieID, S2.name as movieTitle, S2.@rating as userRating, S2.@genre as genre];
}
```
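
To make the two traversal steps concrete, here is a small pure-Python sketch of what the query computes. The toy edge lists, the `movie_names` lookup, and the helper name `user_data` are made up for illustration; only the output keys (`movieID`, `movieTitle`, `userRating`, `genre`) come from the query's `PRINT` statement.

```python
from collections import defaultdict

# Hypothetical toy graph: (user, movie, rating) edges and (movie, term) edges.
rate_edges = [("217", "566", 0.39344), ("217", "568", 0.36875), ("5", "566", -1.0)]
movie_term_edges = [("566", "Thriller"), ("566", "Action"),
                    ("568", "Action"), ("568", "Romance")]
movie_names = {"566": "Clear and Present Danger (1994)", "568": "Speed (1994)"}


def user_data(user):
    """Mimic the userData query: rated movies, their rating, and their genres."""
    # Step S1: follow rate edges from the user and accumulate the rating per movie.
    rating = defaultdict(float)
    for src, movie, value in rate_edges:
        if src == user:
            rating[movie] += value  # plays the role of SumAccum<float> @rating

    # Step S2: follow movie_term edges from those movies and collect the genres.
    genre = defaultdict(set)
    for movie, term in movie_term_edges:
        if movie in rating:
            genre[movie].add(term)  # plays the role of SetAccum<STRING> @genre

    # PRINT: one record per movie that has at least one movie_term edge.
    return [
        {"movieID": m, "movieTitle": movie_names[m],
         "userRating": rating[m], "genre": sorted(genre[m])}
        for m in genre
    ]


rows = user_data("217")
```

Each element of `rows` has the same keys as the `attributes` object in the REST response further down.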


## Setup Your Notebook

#### Install Latest Version of Packages

In [1]:
!pip install pyTigerGraph
!pip install pandas
!pip install flat-table
!pip install tensorflow
!pip install -q scikit-learn

Collecting pyTigerGraph
  Downloading https://files.pythonhosted.org/packages/59/d1/cc51f54b48d65e3fd6430ec4162b5a84be78cf9532fa8efdbd63773ed340/pyTigerGraph-0.0.4.6.tar.gz
Collecting validators
  Downloading https://files.pythonhosted.org/packages/4d/56/9b48c918ef118ea12b90f227c4498ed4703b418bdd8fb49479dfcbeae4ef/validators-0.14.2.tar.gz
Building wheels for collected packages: pyTigerGraph, validators
  Building wheel for pyTigerGraph (setup.py) ... done
  Created wheel for pyTigerGraph: filename=pyTigerGraph-0.0.4.6-cp36-none-any.whl size=3410 sha256=2bdfefecd824b537078ad62c622aae8cf18ce6ed9e92763b01917a82d04084b8
  Stored in directory: /root/.cache/pip/wheels/b6/3a/d3/24bb96c355fdda37c0bee211275b98a3cfc08b33432169ed26
  Building wheel for validators (setup.py) ... done
  Created wheel for validators: filename=validators-0.14.2-cp36-none-any.whl size=17249 sha256=e47c14e0540019d31846c16cd9a9a5feeb3a112684cda4f9a1e36b0f202879f5
  Stored in directory: /root/.cac

#### Import Tools

In [2]:
from __future__ import absolute_import, division, print_function, unicode_literals

import pyTigerGraph as tg
import pandas as pd
import flat_table
import tensorflow as tf

from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Eager execution is on by default in TensorFlow 2.x; only 1.x needs this call.
if hasattr(tf, "enable_eager_execution"):
    tf.enable_eager_execution()

In [4]:
graph = tg.TigerGraphConnection(
    ipAddress="https://graphml.i.tgcloud.io", 
    graphname="Recommender", 
    apiToken="k794knbbp8uam2gqgghqusohe5on44gs")

results = graph.getEndpoints()
print(results)


{'DELETE /graph/delete_by_type/vertices/{vertex_type}/': {'parameters': {'ack': {'default': 'all', 'max_count': 1, 'min_count': 1, 'options': ['all', 'none'], 'type': 'STRING'}, 'permanent': {'default': 'false', 'max_count': 1, 'min_count': 1, 'type': 'BOOL'}, 'vertex_type': {'type': 'TYPENAME'}}}, 'DELETE /graph/edges/{source_vertex_type}/{source_vertex_id}/{edge_type}/{target_vertex_type}/{target_vertex_id}': {'parameters': {'edge_type': {'max_count': 1, 'min_count': 0, 'type': 'EDGETYPENAME'}, 'filter': {'max_count': 1, 'max_length': 2560, 'min_count': 0, 'type': 'STRING'}, 'limit': {'max_count': 1, 'min_count': 0, 'type': 'UINT64'}, 'not_wildcard': {'max_count': 1, 'min_count': 0, 'type': 'BOOL'}, 'permanent': {'default': 'false', 'max_count': 1, 'min_count': 1, 'type': 'BOOL'}, 'select': {'max_count': 1, 'max_length': 2560, 'min_count': 0, 'type': 'STRING'}, 'sort': {'max_count': 1, 'max_length': 2560, 'min_count': 0, 'type': 'STRING'}, 'source_vertex_id': {'id_type': '$source_ver

## Data Manipulation using Pandas

#### Grab Data From REST endpoint

In [5]:
preInstalledResult = graph.runInstalledQuery("userData", {"user":"217"})
parsR = preInstalledResult

print(parsR) # full return of REST call



{'version': {'edition': 'enterprise', 'api': 'v2', 'schema': 0}, 'error': False, 'message': '', 'results': [{'S2': [{'v_id': '566', 'v_type': 'MOVIE', 'attributes': {'movieID': '566', 'movieTitle': 'Clear and Present Danger (1994)', 'userRating': 0.39344, 'genre': ['Thriller', 'Action', 'Adventure']}}, {'v_id': '568', 'v_type': 'MOVIE', 'attributes': {'movieID': '568', 'movieTitle': 'Speed (1994)', 'userRating': 0.36875, 'genre': ['Thriller', 'Action', 'Romance']}}, {'v_id': '840', 'v_type': 'MOVIE', 'attributes': {'movieID': '840', 'movieTitle': 'Last Man Standing (1996)', 'userRating': -1.61538, 'genre': ['Western', 'Drama', 'Action']}}, {'v_id': '1034', 'v_type': 'MOVIE', 'attributes': {'movieID': '1034', 'movieTitle': 'Quest, The (1996)', 'userRating': 0.84615, 'genre': ['Action', 'Adventure']}}, {'v_id': '79', 'v_type': 'MOVIE', 'attributes': {'movieID': '79', 'movieTitle': 'Fugitive, The (1993)', 'userRating': 0.96537, 'genre': ['Thriller', 'Action']}}, {'v_id': '53', 'v_type': '
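
The response nests the movie records under `results`, then the name of the printed vertex set (`S2`), then `attributes`. A minimal sketch of walking that structure, using a trimmed copy of the output above; the helper name `extract_movies` is hypothetical and not part of pyTigerGraph.

```python
# A trimmed copy of the response shown above (two movies only).
response = {
    "version": {"edition": "enterprise", "api": "v2", "schema": 0},
    "error": False,
    "message": "",
    "results": [{"S2": [
        {"v_id": "566", "v_type": "MOVIE", "attributes": {
            "movieID": "566", "movieTitle": "Clear and Present Danger (1994)",
            "userRating": 0.39344, "genre": ["Thriller", "Action", "Adventure"]}},
        {"v_id": "568", "v_type": "MOVIE", "attributes": {
            "movieID": "568", "movieTitle": "Speed (1994)",
            "userRating": 0.36875, "genre": ["Thriller", "Action", "Romance"]}},
    ]}],
}


def extract_movies(resp, key="S2"):
    """Return the attribute dicts of the vertex set printed under `key`."""
    if resp.get("error"):
        raise RuntimeError(resp.get("message") or "query failed")
    # The printed vertex set sits in the first element of "results".
    vertices = resp["results"][0][key]
    return [v["attributes"] for v in vertices]


movies = extract_movies(response)
print(movies[0]["movieTitle"], movies[0]["genre"])
```

Checking the `error` flag first avoids a confusing `KeyError` later if the query fails.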

#### Put JSON Data into Data Frame

In [6]:
# Convert JSON to dataframe
df = pd.DataFrame(parsR["results"][0]["S2"]) # Grab only the data we are returning

df

| | v_id | v_type | attributes |
|---|---|---|---|
| 0 | 566 | MOVIE | {'movieID': '566', 'movieTitle': 'Clear and Pr... |
| 1 | 568 | MOVIE | {'movieID': '568', 'movieTitle': 'Speed (1994)... |
| 2 | 840 | MOVIE | {'movieID': '840', 'movieTitle': 'Last Man Sta... |
| 3 | 1034 | MOVIE | {'movieID': '1034', 'movieTitle': 'Quest, The ... |
| 4 | 79 | MOVIE | {'movieID': '79', 'movieTitle': 'Fugitive, The... |
| ... | ... | ... | ... |
| 71 | 825 | MOVIE | {'movieID': '825', 'movieTitle': 'Arrival, The... |
| 72 | 636 | MOVIE | {'movieID': '636', 'movieTitle': 'Escape from ... |
| 73 | 181 | MOVIE | {'movieID': '181', 'movieTitle': 'Return of th... |
| 74 | 300 | MOVIE | {'movieID': '300', 'movieTitle': 'Air Force On... |


In [7]:
# Normalize Data  
df_t1 = flat_table.normalize(df)
df_t1 # Output DataFrame

| | index | v_id | v_type | attributes.genre | attributes.userRating | attributes.movieTitle | attributes.movieID |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 566 | MOVIE | Thriller | 0.39344 | Clear and Present Danger (1994) | 566 |
| 1 | 0 | 566 | MOVIE | Action | 0.39344 | Clear and Present Danger (1994) | 566 |
| 2 | 0 | 566 | MOVIE | Adventure | 0.39344 | Clear and Present Danger (1994) | 566 |
| 3 | 1 | 568 | MOVIE | Thriller | 0.36875 | Speed (1994) | 568 |
| 4 | 1 | 568 | MOVIE | Action | 0.36875 | Speed (1994) | 568 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 193 | 73 | 181 | MOVIE | Adventure | -2.99725 | Return of the Jedi (1983) | 181 |
| 194 | 74 | 300 | MOVIE | Thriller | 0.39322 | Air Force One (1997) | 300 |
| 195 | 74 | 300 | MOVIE | Action | 0.39322 | Air Force One (1997) | 300 |
| 196 | 75 | 391 | MOVIE | Action | 1.26829 | Last Action Hero (1993) | 391 |
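
The same long format (one row per movie and genre) can also be produced with pandas alone. This is a sketch under the assumption that the records have the shape shown in the REST output; it is an alternative to `flat_table.normalize`, not a description of how that package works internally.

```python
import pandas as pd

# Two records in the shape returned by the query (trimmed for brevity).
records = [
    {"v_id": "566", "v_type": "MOVIE", "attributes": {
        "movieID": "566", "movieTitle": "Clear and Present Danger (1994)",
        "userRating": 0.39344, "genre": ["Thriller", "Action", "Adventure"]}},
    {"v_id": "568", "v_type": "MOVIE", "attributes": {
        "movieID": "568", "movieTitle": "Speed (1994)",
        "userRating": 0.36875, "genre": ["Thriller", "Action", "Romance"]}},
]

# json_normalize expands the nested dict into "attributes.<key>" columns.
flat = pd.json_normalize(records)

# explode turns each element of the genre list into its own row.
long = flat.explode("attributes.genre").reset_index(drop=True)
print(long[["v_id", "attributes.genre", "attributes.userRating"]])
```

Two movies with three genres each give six rows, with the rating repeated on every genre row of the same movie.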


### Prepping Data for TensorFlow

In [8]:
# Rename Columns
df_t2 = df_t1.rename(columns={
    'v_id':'ID',
    'v_type':'Type',
    'attributes.movieID':'movieID',
    'attributes.movieTitle':'movieTitle',
    'attributes.userRating':'userRating',
    'attributes.genre':'genre'
    })

df_t2 # Output DataFrame

| | index | ID | Type | genre | userRating | movieTitle | movieID |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 566 | MOVIE | Thriller | 0.39344 | Clear and Present Danger (1994) | 566 |
| 1 | 0 | 566 | MOVIE | Action | 0.39344 | Clear and Present Danger (1994) | 566 |
| 2 | 0 | 566 | MOVIE | Adventure | 0.39344 | Clear and Present Danger (1994) | 566 |
| 3 | 1 | 568 | MOVIE | Thriller | 0.36875 | Speed (1994) | 568 |
| 4 | 1 | 568 | MOVIE | Action | 0.36875 | Speed (1994) | 568 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 193 | 73 | 181 | MOVIE | Adventure | -2.99725 | Return of the Jedi (1983) | 181 |
| 194 | 74 | 300 | MOVIE | Thriller | 0.39322 | Air Force One (1997) | 300 |
| 195 | 74 | 300 | MOVIE | Action | 0.39322 | Air Force One (1997) | 300 |
| 196 | 75 | 391 | MOVIE | Action | 1.26829 | Last Action Hero (1993) | 391 |


In [9]:
df_t3 = df_t2.pivot(index='movieID', columns='genre', values='userRating')
df_t3 = df_t3.fillna(0)
df_t3 = df_t3.rename(columns={
    'Children\'s':'Childrens'
    })
df_t3 # Output DataFrame

| movieID | Action | Adventure | Childrens | Comedy | Crime | Drama | Fantasy | Horror | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1034 | 0.84615 | 0.84615 | 0.0 | 0.0 | 0.0000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| 11 | 0.00000 | 0.00000 | 0.0 | 0.0 | 0.1375 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.13750 | 0.00000 | 0.00000 |
| 117 | 0.39063 | 0.39063 | 0.0 | 0.0 | 0.0000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.39063 | 0.00000 | 0.00000 |
| 118 | 0.77033 | 0.77033 | 0.0 | 0.0 | 0.0000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.77033 | 0.00000 | 0.00000 |
| 121 | -2.52098 | 0.00000 | 0.0 | 0.0 | 0.0000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | -2.52098 | 0.00000 | -2.52098 | 0.00000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 82 | 1.23295 | 1.23295 | 0.0 | 0.0 | 0.0000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 1.23295 | 0.00000 | 0.00000 | 0.00000 |
| 825 | 0.03448 | 0.00000 | 0.0 | 0.0 | 0.0000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.03448 | 0.03448 | 0.00000 | 0.00000 |
| 827 | -0.71111 | -0.71111 | 0.0 | 0.0 | 0.0000 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | -0.71111 | 0.00000 | 0.00000 |
| 840 | -1.61538 | 0.00000 | 0.0 | 0.0 | 0.0000 | -1.61538 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.00000 | 0.00000 | -1.61538 |
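
The pivot is easier to follow on a toy frame. The sketch below uses made-up rows in the same long format. It also shows an alternative that the notebook does not use: 0/1 genre flags built with `pd.crosstab`. That variant is only a suggestion; it may be worth considering because the pivoted cells hold the rating itself, so the feature columns carry the same value the model is later asked to predict.

```python
import pandas as pd

# Toy long-format frame: one row per (movie, genre) pair.
long = pd.DataFrame({
    "movieID": ["566", "566", "568", "568"],
    "genre": ["Thriller", "Action", "Action", "Romance"],
    "userRating": [0.39344, 0.39344, 0.36875, 0.36875],
})

# One row per movie, one column per genre; each cell holds the user's rating,
# and genres a movie does not have are filled with 0.
wide = long.pivot(index="movieID", columns="genre", values="userRating").fillna(0)

# Alternative (an assumption, not what the notebook does): 0/1 genre flags,
# which keep the rating out of the feature columns.
flags = pd.crosstab(long["movieID"], long["genre"])

print(wide)
print(flags)
```

Note that `pivot` raises a `ValueError` if the same movie and genre pair appears twice, so it relies on each pair being unique.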


In [10]:
#Putting dataframes together
df_t4 = pd.merge(df_t2, df_t3, how='outer', on=['movieID'])
df_t5 = df_t4.drop(columns=['genre'])

df_t6 = df_t5.drop_duplicates(subset ="movieID")
df_t6 # Output DataFrame

| | index | ID | Type | userRating | movieTitle | movieID | Action | Adventure | Childrens | Comedy | Crime | Drama | Fantasy | Horror | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 566 | MOVIE | 0.39344 | Clear and Present Danger (1994) | 566 | 0.39344 | 0.39344 | 0.0 | 0.00000 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.00000 | 0.39344 | 0.00000 | 0.00000 |
| 3 | 1 | 568 | MOVIE | 0.36875 | Speed (1994) | 568 | 0.36875 | 0.00000 | 0.0 | 0.00000 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.36875 | 0.00000 | 0.36875 | 0.00000 | 0.00000 |
| 6 | 2 | 840 | MOVIE | -1.61538 | Last Man Standing (1996) | 840 | -1.61538 | 0.00000 | 0.0 | 0.00000 | 0.0 | -1.61538 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | -1.61538 |
| 9 | 3 | 1034 | MOVIE | 0.84615 | Quest, The (1996) | 1034 | 0.84615 | 0.84615 | 0.0 | 0.00000 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| 11 | 4 | 79 | MOVIE | 0.96537 | Fugitive, The (1993) | 79 | 0.96537 | 0.00000 | 0.0 | 0.00000 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.00000 | 0.96537 | 0.00000 | 0.00000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 182 | 71 | 825 | MOVIE | 0.03448 | Arrival, The (1996) | 825 | 0.03448 | 0.00000 | 0.0 | 0.00000 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.03448 | 0.03448 | 0.00000 | 0.00000 |
| 185 | 72 | 636 | MOVIE | -1.27273 | Escape from New York (1981) | 636 | -1.27273 | -1.27273 | 0.0 | 0.00000 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.00000 | -1.27273 | -1.27273 | 0.00000 | 0.00000 |
| 189 | 73 | 181 | MOVIE | -2.99725 | Return of the Jedi (1983) | 181 | -2.99725 | -2.99725 | 0.0 | 0.00000 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | -2.99725 | -2.99725 | 0.00000 | -2.99725 | 0.00000 |
| 194 | 74 | 300 | MOVIE | 0.39322 | Air Force One (1997) | 300 | 0.39322 | 0.00000 | 0.0 | 0.00000 | 0.0 | 0.00000 | 0.0 | 0.0 | 0.0 | 0.00000 | 0.00000 | 0.39322 | 0.00000 | 0.00000 |
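
A possible variation on this step, sketched on toy data: drop the `genre` column and the repeated rows first, then attach the wide genre columns. The frame contents are made up and the variable names are illustrative. The end result has the same shape as the table above, but the intermediate frame stays at one row per movie.

```python
import pandas as pd

# Toy long-format frame: one row per (movie, genre) pair.
long = pd.DataFrame({
    "movieID": ["566", "566", "568"],
    "movieTitle": ["Clear and Present Danger (1994)"] * 2 + ["Speed (1994)"],
    "genre": ["Thriller", "Action", "Action"],
    "userRating": [0.39344, 0.39344, 0.36875],
})
wide = long.pivot(index="movieID", columns="genre", values="userRating").fillna(0)

# Drop the genre column and the repeated rows first, then attach the wide
# genre columns by matching movieID against the pivot table's index.
movies = long.drop(columns=["genre"]).drop_duplicates(subset="movieID")
merged = movies.merge(wide, left_on="movieID", right_index=True, how="left")

print(merged)
```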


## Setup TensorFlow

### Grab Features and Targets

In [0]:
# Features to grab
features = ['Action', 'Adventure', 'Childrens', 'Comedy', 'Crime', 'Drama', 'Fantasy', 'Horror', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

In [0]:
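# Setting up your data for TensorFlow as (features, target) pairs.
# Note: casting userRating to int32 truncates toward zero (0.39 -> 0, -1.6 -> -1).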
dataset = (
    tf.data.Dataset.from_tensor_slices(
        (
            tf.cast(df_t6[features].values, tf.float32),
            tf.cast(df_t6['userRating'].values, tf.int32)
        )
    )
)

### Data Set Check


In [14]:
# Checking Data
for feat, targ in dataset.take(5):
    print('Features: {}, Target: {}'.format(feat, targ))

Features: [0.39344 0.39344 0.      0.      0.      0.      0.      0.      0.
 0.      0.      0.39344 0.      0.     ], Target: 0
Features: [0.36875 0.      0.      0.      0.      0.      0.      0.      0.
 0.36875 0.      0.36875 0.      0.     ], Target: 0
Features: [-1.61538  0.       0.       0.       0.      -1.61538  0.       0.
  0.       0.       0.       0.       0.      -1.61538], Target: -1
Features: [0.84615 0.84615 0.      0.      0.      0.      0.      0.      0.
 0.      0.      0.      0.      0.     ], Target: 0
Features: [0.96537 0.      0.      0.      0.      0.      0.      0.      0.
 0.      0.      0.96537 0.      0.     ], Target: 0
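
The targets printed above are truncated ratings (0, -1, and so on), while a binary cross-entropy loss expects labels of 0 or 1. One way to get such labels, sketched here with NumPy, is to threshold the rating. The rule "positive rating means liked" is an assumption made for this example; the notebook itself does not define what counts as a like.

```python
import numpy as np

# The first five ratings from the output above.
ratings = np.array([0.39344, 0.36875, -1.61538, 0.84615, 0.96537])

# Casting to an integer type truncates toward zero, as in the targets above.
truncated = ratings.astype(np.int32)      # 0, 0, -1, 0, 0

# Hypothetical rule: treat a positive rating as "liked" (1), otherwise 0.
liked = (ratings > 0).astype(np.int32)    # 1, 1, 0, 1, 1

print(truncated, liked)
```

With a rule like this, the second element passed to `from_tensor_slices` could be the `liked` array instead of the cast ratings.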


In [15]:
# Checking Data
tf.constant(df_t6['userRating'])

<tf.Tensor: id=23, shape=(76,), dtype=float64, numpy=
array([ 0.39344,  0.36875, -1.61538,  0.84615,  0.96537, -1.76744,
       -1.01622, -1.95714, -2.10526, -0.14063,  0.71264,  0.86538,
        0.77033,  0.06635, -0.35556,  1.03883, -0.25   , -3.24016,
        2.21053, -2.17347, -1.5    ,  1.23295, -0.54545, -1.29568,
        0.07692, -0.15702,  0.02679, -0.71111,  0.1375 ,  0.09091,
       -0.31765, -1.14563, -0.80769, -1.59155,  0.16846, -1.95541,
        1.38247, -1.53846, -1.33333, -0.46875,  0.85926,  0.06936,
       -1.1018 ,  0.81818, -2.39091,  0.12821, -1.22222,  1.61765,
       -1.04608, -0.43411,  2.     , -1.5625 , -0.72368,  0.61765,
       -2.52098,  1.16418, -0.02439,  1.68333,  0.14286, -3.36408,
        0.91241, -2.7932 ,  0.06404, -0.18085,  1.13043, -1.1954 ,
       -0.7    ,  1.64545,  0.1831 ,  0.39063,  1.20325,  0.03448,
       -1.27273, -2.99725,  0.39322,  1.26829])>

### Get Train and Test Data Set 

In [16]:
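# Prep datasets, one for model creation, other for model validation.
# reshuffle_each_iteration=False keeps the shuffled order fixed, so the
# take/skip split below yields the same train and test rows on every pass.
dataset = dataset.shuffle(len(df_t6), reshuffle_each_iteration=False).batch(1)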
train_dataset = dataset.take(int(len(df_t6)*.8))
test_dataset = dataset.skip(int(len(df_t6)*.8))
print(len(list(train_dataset)))

60
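
An alternative that does not depend on how the dataset is shuffled is to split the row positions once, up front, and build the two datasets from the selected rows. A minimal NumPy sketch; the seed and the variable names are illustrative.

```python
import numpy as np

n_rows = 76                      # len(df_t6) in this notebook
n_train = int(n_rows * 0.8)      # 60, matching the count printed above

rng = np.random.default_rng(seed=42)   # fixed seed so the split is repeatable
order = rng.permutation(n_rows)

train_idx = order[:n_train]
test_idx = order[n_train:]

# The positions can then select rows, e.g. df_t6.iloc[train_idx], before
# each part is passed to tf.data.Dataset.from_tensor_slices.
print(len(train_idx), len(test_idx))
```

Because the two index arrays are disjoint by construction, no row can end up in both the training and the test set.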


### Setup Model Parameters

In [0]:
# Setting up layers sizes, and defining which optimizer to use
def get_compiled_model():
  model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu'), #layer 1
    tf.keras.layers.Dense(20, activation='relu'), #layer 2
    tf.keras.layers.Dense(1, activation='sigmoid') #out
  ])

  model.compile(optimizer='adam', 
                loss='binary_crossentropy',
                metrics=['accuracy'])
  return model
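
To see what this architecture amounts to, the sketch below counts its parameters and runs one forward pass in plain NumPy. The weights are random placeholders, not trained values, and the input width of 14 is taken from the `features` list; Keras infers it on the first batch.

```python
import numpy as np

layer_sizes = [14, 10, 20, 1]   # 14 genre features, two hidden layers, one output

# A Dense layer has (inputs x units) weights plus one bias per unit.
param_counts = [n_in * n_out + n_out
                for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
# param_counts == [150, 220, 21], 391 in total

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]


def forward(x):
    """Forward pass: ReLU on the hidden layers, sigmoid on the output."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ w + b, 0.0)
    logits = x @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-logits))


batch = np.zeros((1, 14), dtype=np.float32)
prob = forward(batch)            # shape (1, 1), a value between 0 and 1
print(sum(param_counts), prob.shape)
```

The sigmoid output is a single number between 0 and 1, which is why the loss expects 0/1 labels.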

## Train and Test Your Model

In [18]:
# Training Data
model = get_compiled_model()
model.fit(train_dataset, epochs=20)

Epoch 1/20
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7fcf10a01b38>

In [19]:
# Testing Data
results = model.evaluate(test_dataset)
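
With `metrics=['accuracy']`, `model.evaluate` returns the loss followed by the accuracy. The NumPy sketch below shows what those two numbers measure for a binary classifier. The labels and probabilities are made up for illustration, and Keras' own implementation differs in detail.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(y_prob)
                           + (1.0 - y_true) * np.log(1.0 - y_prob))))


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Share of predictions that fall on the correct side of the threshold."""
    return float(np.mean((y_prob > threshold) == (y_true == 1)))


# Made-up labels and predicted probabilities, for illustration only.
y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.6])

loss = binary_crossentropy(y_true, y_prob)
acc = binary_accuracy(y_true, y_prob)   # 0.5: two of the four are correct
print(loss, acc)
```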

