# Featurizer
This notebook demonstrates how to use `pyTigerGraph` for feature engineering tasks on graphs stored in `TigerGraph`.

**NOTE**: Currently, your database needs to be activated (only once) before you can use all the functions provided by the ML Workbench. If you are using ML Workbench on Cloud, the activator is included; uncomment the cell below and run it to activate. For other versions of the Workbench, download the activator from https://act.tigergraphlabs.com. Detailed instructions are also available on that website.

In [None]:
# Uncomment below and fill out the necessary information. For detailed instructions, please see https://act.tigergraphlabs.com
# !mlwb activate [database address] -u [username] -p [password] -s [secret]

## Connection to Database

The `TigerGraphConnection` class represents a connection to the TigerGraph database. Under the hood, it stores the information needed to communicate with the database, and it can perform many database tasks. Please see its [documentation](https://docs.tigergraph.com/pytigergraph/current/intro/) for details.

In [None]:
from pyTigerGraph import TigerGraphConnection

conn = TigerGraphConnection(
    host="http://127.0.0.1", # Change the address to your database server's
    graphname="Cora",
    username="tigergraph",
    password="tigergraph",
)

<span style="color:red">Uncomment the cell below and run it to get and set a token if token authentication is enabled</span>. 
* This is required for all databases on tgcloud.
* `<secret>` is your user secret. See https://docs.tigergraph.com/tigergraph-server/current/user-access/managing-credentials#_secrets for details.
* If you don't know your secret, you can use `secret=conn.createSecret()` to create one.

In [None]:
#conn.getToken(<secret>)

In [None]:
# Graph schema and other information.
print(conn.gsql("ls"))

In [None]:
# Number of vertices for every vertex type
conn.getVertexCount('*')

In [None]:
# Number of vertices of a specific type
conn.getVertexCount("Paper")

In [None]:
# Number of edges for every type
conn.getEdgeCount()

In [None]:
# Number of edges of a specific type
conn.getEdgeCount("Cite")

## Feature Engineering
The ML Workbench includes many graph algorithms for feature engineering tasks. The key functions are:

1. `listAlgorithms()`: Given a class of algorithms (e.g. Centrality) as input, prints the available algorithms in that category; otherwise, prints all available algorithms.
2. `installAlgorithm()`: Takes the name of an algorithm as input and installs it if it is not already installed.
3. `runAlgorithm()`: Takes the algorithm name, schema type (vertex or edge; vertex by default), attribute name (if the result needs to be stored as an attribute in the database), and a list of schema type names (the vertex or edge types in which the attribute should be saved; all of them by default).
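
The install and run calls above are usually chained. The helper below is a minimal sketch of that sequence. `install_and_run` is a hypothetical name, not part of pyTigerGraph; it relies only on the `installAlgorithm` and `runAlgorithm` calls used in this notebook and accepts any featurizer-like object.

```python
def install_and_run(featurizer, query_name, params, feat_name=None):
    """Install an algorithm, then run it with the given parameters.

    Hypothetical convenience wrapper, not a pyTigerGraph function.
    `featurizer` is expected to be the object returned by
    `conn.gds.featurizer()`.
    """
    # Install first; as described above, the algorithm is only
    # installed if it is not already present.
    featurizer.installAlgorithm(query_name)

    # Pass feat_name only when the result should be stored as an attribute.
    kwargs = {"params": params}
    if feat_name is not None:
        kwargs["feat_name"] = feat_name
    return featurizer.runAlgorithm(query_name, **kwargs)
```

With a live connection, this could be called as `install_and_run(f, "tg_pagerank", params, feat_name="pagerank")`, where `f` and `params` are defined as in the cells of this notebook.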

In [None]:
f = conn.gds.featurizer()

In [None]:
f.listAlgorithms()

### Examples of running graph algorithms from GDS library
The following sections provide one example for each class of algorithms. Some algorithms generate a feature per vertex or edge; others calculate a single number or statistic about the graph. For example, the common neighbors algorithm calculates the number of common neighbors between two vertices.

#### Get PageRank as a feature
The PageRank algorithm is available in the GDS library as `tg_pagerank`, under the class of centrality algorithms: https://github.com/tigergraph/gsql-graph-algorithms/blob/master/algorithms/Centrality/pagerank/global/unweighted/tg_pagerank.gsql

In [None]:
f.installAlgorithm("tg_pagerank")

In [None]:
params = {'v_type': 'Paper', 'e_type': 'Cite', 'max_change': 0.001, 'max_iter': 25, 'damping': 0.85,
          'top_k': 10, 'print_accum': True, 'result_attr': '', 'file_path': '', 'display_edges': False}

f.runAlgorithm(
    'tg_pagerank', 
    params=params, 
    feat_name="pagerank",
    global_schema=False, # If a global schema change is needed to add an attribute, set it to True.
    timeout=2147480, 
    sizeLimit=2000000
)
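
To give a feel for what `damping`, `max_change`, and `max_iter` typically control, here is a minimal plain-Python sketch of iterative PageRank on a toy graph. It is not the `tg_pagerank` GSQL code: the function name `pagerank`, the toy input format, and the score normalization (scores sum to 1) are assumptions made for illustration, and TigerGraph's output may be scaled differently.

```python
def pagerank(out_edges, damping=0.85, max_change=0.001, max_iter=25):
    """Iterative PageRank on a dict {vertex: [target vertices]}.

    Illustrative sketch only; not the tg_pagerank implementation.
    """
    # Collect every vertex, including ones that only appear as targets.
    vertices = set(out_edges)
    for targets in out_edges.values():
        vertices.update(targets)
    n = len(vertices)

    scores = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Every vertex starts the round with the "teleport" share.
        new_scores = {v: (1.0 - damping) / n for v in vertices}
        for v in vertices:
            targets = out_edges.get(v, [])
            if targets:
                # Split the damped score evenly among outgoing edges.
                share = damping * scores[v] / len(targets)
                for t in targets:
                    new_scores[t] += share
            else:
                # Dangling vertex: spread its score over all vertices.
                share = damping * scores[v] / n
                for u in vertices:
                    new_scores[u] += share
        # Stop early once no score moved by max_change or more.
        diff = max(abs(new_scores[v] - scores[v]) for v in vertices)
        scores = new_scores
        if diff < max_change:
            break
    return scores


toy_scores = pagerank({"b": ["a"], "c": ["a"]})
print(sorted(toy_scores, key=toy_scores.get, reverse=True)[0])  # most cited vertex
```

In this sketch, a smaller `max_change` or a larger `max_iter` lets the iteration run longer before stopping, at the cost of more passes over the graph.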


#### Run Maximal Independent Set
The Maximal Independent Set algorithm is available in the GDS library as `tg_maximal_indep_set`, under the class of classification algorithms: https://github.com/tigergraph/gsql-graph-algorithms/blob/master/algorithms/Classification/maximal_independent_set/deterministic/tg_maximal_indep_set.gsql

In [None]:
f.installAlgorithm("tg_maximal_indep_set")

In [None]:
params = {'v_type': 'Paper', 'e_type': 'Cite',
          'max_iter': 100, 'print_accum': False, 'file_path': ''}

f.runAlgorithm('tg_maximal_indep_set', params=params)
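
An independent set contains no two adjacent vertices, and it is maximal when no further vertex can be added without breaking that property. The sketch below shows one simple greedy way to build such a set in plain Python. It may differ from the approach used in `tg_maximal_indep_set`; the function name and the adjacency-dict input (assumed undirected, so each edge is listed in both directions) are illustrative assumptions.

```python
def maximal_independent_set(adjacency):
    """Greedy maximal independent set on {vertex: set of neighbors}.

    Assumes an undirected graph where each edge appears in both
    neighbor sets. Illustrative sketch only.
    """
    selected = set()
    blocked = set()  # vertices that are selected or adjacent to a selected one
    # Visit vertices in sorted order so the result is deterministic.
    for v in sorted(adjacency):
        if v in blocked:
            continue
        selected.add(v)
        blocked.add(v)
        blocked.update(adjacency[v])
    return selected


# Path graph 1 - 2 - 3 - 4
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(maximal_independent_set(path))  # {1, 3}
```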


#### Get Louvain as a feature
The Louvain algorithm is available in the GDS library as `tg_louvain`, under the class of community detection algorithms: https://github.com/tigergraph/gsql-graph-algorithms/blob/master/algorithms/Community/louvain/tg_louvain.gsql

In [None]:
f.installAlgorithm(query_name='tg_louvain')

In [None]:
params = {'v_type': 'Paper', 'e_type': ['Cite', 'reverse_Cite'], 'wt_attr': "",
          'max_iter': 10, 'result_attr': "cid", 'file_path': "", 'print_info': True}

f.runAlgorithm(
    'tg_louvain', 
    params, 
    feat_name="cid",
    global_schema=False # If a global schema change is needed to add an attribute, set it to True.
)


#### Get FastRP as a feature
The FastRP algorithm is available in the GDS library as `tg_fastRP`, under the class of graph embedding algorithms: https://github.com/tigergraph/gsql-graph-algorithms/blob/master/algorithms/GraphML/Embeddings/FastRP/tg_fastRP.gsql

In [None]:
f.installAlgorithm(
    "tg_fastRP", 
    global_change = False # If a global schema change is needed to add an attribute, set it to True.
)

In [None]:
params = {'v_type': 'Paper', 'e_type': ['Cite', 'reverse_Cite'], 'weights': '1,1,2', 'beta': -0.85, 'k': 3, 'reduced_dim': 128,
          'sampling_constant': 1, 'random_seed': 42, 'print_accum': False, 'result_attr': "", 'file_path': ""}
f.runAlgorithm(
    'tg_fastRP', 
    params, 
    feat_name="fastrp_embedding"
)


#### Run Breadth-First Search Algorithm from a single source node
The Breadth-First Search algorithm is available in the GDS library as `tg_bfs`, under the class of path algorithms: https://github.com/tigergraph/gsql-graph-algorithms/blob/master/algorithms/Path/bfs/tg_bfs.gsql

In [None]:
f.installAlgorithm(query_name='tg_bfs')

In [None]:
params = {'v_type': 'Paper', 'e_type': ['Cite', 'reverse_Cite'], 'max_hops': 10, "v_start": ("2180", "Paper"),
          'print_accum': False, 'result_attr': "", 'file_path': "", 'display_edges': False}

f.runAlgorithm('tg_bfs', params, feat_name="bfs")
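
For intuition about a single-source search with a hop limit, the following plain-Python sketch records how many hops each reachable vertex is from the start vertex. It is not the `tg_bfs` query; the function name `bfs_hops` and the adjacency-dict input are assumptions for illustration, and `max_hops` is assumed to cap how far the search expands.

```python
from collections import deque


def bfs_hops(adjacency, start, max_hops=10):
    """Return {vertex: hop distance from start} for vertices within max_hops.

    `adjacency` maps each vertex to an iterable of its neighbors.
    Illustrative sketch only.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if distance[v] == max_hops:
            continue  # Do not expand beyond the hop limit.
        for u in adjacency.get(v, ()):
            if u not in distance:
                distance[u] = distance[v] + 1
                queue.append(u)
    return distance


print(bfs_hops({"a": ["b"], "b": ["c"]}, "a", max_hops=1))  # {'a': 0, 'b': 1}
```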


#### Calculate the number of common neighbors between two vertices
The common neighbors algorithm is available in the GDS library as `tg_common_neighbors`, under the class of Topological Link Prediction algorithms: https://github.com/tigergraph/gsql-graph-algorithms/blob/master/algorithms/Topological%20Link%20Prediction/common_neighbors/tg_common_neighbors.gsql


In [None]:
f.installAlgorithm(query_name='tg_common_neighbors')  

In [None]:
params = {"a": ("2180", "Paper"), "b": ("431", "Paper"),
          "e_type": "Cite", "print_res": True}

f.runAlgorithm('tg_common_neighbors', params)
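
The quantity being computed is simply the size of the intersection of the two vertices' neighbor sets. Here is a minimal plain-Python sketch on a toy graph; the function name, the vertex IDs, and the adjacency-dict input are made up for illustration and are unrelated to the Cora data.

```python
def common_neighbors(adjacency, a, b):
    """Count the vertices adjacent to both a and b.

    `adjacency` maps each vertex to an iterable of its neighbors.
    Illustrative sketch only; not the tg_common_neighbors query.
    """
    neighbors_a = set(adjacency.get(a, ()))
    neighbors_b = set(adjacency.get(b, ()))
    return len(neighbors_a & neighbors_b)


toy = {"p1": {"p3", "p4", "p5"}, "p2": {"p4", "p5", "p6"}}
print(common_neighbors(toy, "p1", "p2"))  # 2
```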


### Use a User-Defined Query

In [None]:
user_defined_query1 = '''CREATE QUERY user_defined_query1() FOR GRAPH Cora { 
  PRINT "user_defined_query works!"; 
}'''

print(user_defined_query1)

In [None]:
# Write the query text to a file so it can be installed by path.
outFileName = "./user_defined_query1.gsql"
with open(outFileName, "w") as outFile:
    outFile.write(user_defined_query1)

In [None]:
f.installAlgorithm(query_name="user_defined_query1", query_path="./user_defined_query1.gsql")

In [None]:
f.runAlgorithm(query_name="user_defined_query1", custom_query=True)