# A NEAT Demonstration

NEAT provides a way to define graph machine learning tasks with minimal coding, an uncomplicated interface, and a process created with cloud compute in mind.

This notebook provides a demonstration of how to set up a NEAT configuration file. We define the values for parameters, write them to a YAML file, then pass that file to NEAT to generate graph embeddings.

You're likely reading this notebook from within the NEAT repository. If you haven't installed NEAT yet, do so now with the next code block, which assumes the notebook is running from the repository's `notebooks/` directory.

In [None]:
%cd ..
!pip install .
%cd notebooks/

## Define graph parameters

For demonstration purposes, we'll use a copy of the [ECTO ontology](https://obofoundry.org/ontology/ecto.html), pre-processed to graph form by [KG-OBO](https://github.com/Knowledge-Graph-Hub/kg-obo).

In [None]:
!wget https://kg-hub.berkeleybop.io/kg-obo/ecto/2022-03-09/ecto_kgx_tsv.tar.gz
!tar xvzf ecto_kgx_tsv.tar.gz

Now define the graph parameters below, or keep the default values, which point to the files extracted from the archive.

In [None]:
directed = False # Yes, this is technically a directed network, but we'll treat it as undirected
node_path = "ecto_kgx_tsv_nodes.tsv"
edge_path = "ecto_kgx_tsv_edges.tsv"
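
Before going further, it can help to confirm that the extracted files contain the columns the configuration refers to later in this notebook (`id` and `category` for nodes, `subject` and `object` for edges). The following is a minimal sketch using only the standard library; `tsv_header` and `missing_columns` are hypothetical helper names introduced here, not part of NEAT.

```python
import csv
import os


def tsv_header(path):
    """Return the column names from the first row of a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter="\t"), [])


def missing_columns(path, required):
    """Return the required column names that are absent from the file header."""
    header = set(tsv_header(path))
    return [name for name in required if name not in header]


# Column names taken from the configuration template later in this notebook.
checks = {
    "ecto_kgx_tsv_nodes.tsv": ["id", "category"],
    "ecto_kgx_tsv_edges.tsv": ["subject", "object"],
}

for path, required in checks.items():
    if not os.path.exists(path):
        print(f"{path}: not found (has the archive been extracted?)")
        continue
    absent = missing_columns(path, required)
    print(f"{path}: missing columns {absent}" if absent else f"{path}: OK")
```

If a column is reported missing, the corresponding `nodes_column`, `sources_column`, or `destinations_column` value in the configuration would need to change to match the file.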

## Define embedding parameters

These values are deliberately small so that the demonstration runs quickly. The comments note more typical settings.

In [None]:
embedding_file_name = "demo_embeddings.tsv"
embedding_history_file_name = "embedding_history.json"
node_embedding_method_name = "CBOW" # one of 'CBOW', 'GloVe', 'SkipGram', 'Siamese', 'TransE', 'SimplE', 'TransH', 'TransR'
walk_length = 10 # typically 100 or so
batch_size = 128 # typically 512 or more
window_size = 4
iterations = 5 # typically 20 or more

## Define classifier parameters

Here, we define a single classifier, but NEAT will accept a list of multiple classifier types.

In [None]:
edge_method = "Average" # one of EdgeTransformer.methods: Hadamard, Sum, Average, L1, AbsoluteL1, L2, or alternatively a lambda
classifier_type = "Logistic Regression"
classifier_model_outfile = "model_lr_demo"
classifier_model_type = "sklearn.linear_model.LogisticRegression"
classifier_model_random_state = 42
classifier_model_max_iter = 1000
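
To make the `edge_method` setting more concrete: assuming the classifier is trained on edges, it needs one feature vector per edge, so the embeddings of the two endpoint nodes have to be combined. The sketch below shows the usual elementwise definitions of three of the listed methods in plain Python. It is an illustration under that assumption, not the library's implementation, and the function names and toy vectors are made up for this example.

```python
def average(source, destination):
    """Elementwise mean of two node embeddings."""
    return [(s + d) / 2 for s, d in zip(source, destination)]


def hadamard(source, destination):
    """Elementwise product of two node embeddings."""
    return [s * d for s, d in zip(source, destination)]


def elementwise_sum(source, destination):
    """Elementwise sum of two node embeddings."""
    return [s + d for s, d in zip(source, destination)]


# Toy 3-dimensional embeddings for the two endpoints of one edge (made-up values).
source_embedding = [1.0, 2.0, 4.0]
destination_embedding = [3.0, 0.0, 2.0]

print(average(source_embedding, destination_embedding))          # [2.0, 1.0, 3.0]
print(hadamard(source_embedding, destination_embedding))         # [3.0, 0.0, 8.0]
print(elementwise_sum(source_embedding, destination_embedding))  # [4.0, 2.0, 6.0]
```

All three are symmetric in their arguments, which fits a graph that is treated as undirected.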

## Define output parameters

We specify a local output path here, but NEAT can also upload to S3, given a bucket name and directory.

In [None]:
output_directory = "./"

config_filename = "demonstrate.yaml"

## Wrap it all up

In [None]:
outstring = f"""
graph_data:
  graph:
    directed: {directed}
    node_path: {node_path}
    edge_path: {edge_path}
    verbose: True
    nodes_column: 'id'
    node_list_node_types_column: 'category'
    default_node_type: 'biolink:NamedThing'
    sources_column: 'subject'
    destinations_column: 'object'
    default_edge_type: 'biolink:related_to'

embeddings:
  embedding_file_name: {embedding_file_name}
  embedding_history_file_name: {embedding_history_file_name}
  node_embedding_params:
    node_embedding_method_name: {node_embedding_method_name}
    walk_length: {walk_length}
    batch_size: {batch_size}
    window_size: {window_size}
    return_weight: 1.0
    explore_weight: 1.0
    iterations: {iterations}
    use_mirrored_strategy: False

  tsne:
    tsne_file_name: tsne.png

classifier:
  edge_method: {edge_method}
  classifiers:
    - type: {classifier_type}
      model:
        outfile: {classifier_model_outfile}
        type: {classifier_model_type}
        parameters:
          random_state: {classifier_model_random_state}
          max_iter: {classifier_model_max_iter}

output_directory: {output_directory}
"""
print(outstring)
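
The template above has a single entry under `classifiers:`. Since NEAT accepts a list there, one way to render several entries in the same shape is sketched below. `render_classifier` and `classifier_specs` are hypothetical names introduced for this example, and the second entry (a decision tree) is purely illustrative: whether NEAT accepts that particular model type should be checked against its documentation.

```python
def render_classifier(spec):
    """Render one classifier entry with the indentation used in the template."""
    lines = [
        f"    - type: {spec['type']}",
        "      model:",
        f"        outfile: {spec['outfile']}",
        f"        type: {spec['model_type']}",
        "        parameters:",
    ]
    for name, value in spec["parameters"].items():
        lines.append(f"          {name}: {value}")
    return "\n".join(lines)


classifier_specs = [
    {
        # Same values as the single classifier defined in this notebook.
        "type": "Logistic Regression",
        "outfile": "model_lr_demo",
        "model_type": "sklearn.linear_model.LogisticRegression",
        "parameters": {"random_state": 42, "max_iter": 1000},
    },
    {
        # Illustrative second entry; not part of the original demonstration.
        "type": "Decision Tree",
        "outfile": "model_dt_demo",
        "model_type": "sklearn.tree.DecisionTreeClassifier",
        "parameters": {"random_state": 42, "max_depth": 5},
    },
]

classifiers_block = "\n".join(render_classifier(spec) for spec in classifier_specs)
print("  classifiers:\n" + classifiers_block)
```

The printed block could replace the `classifiers:` portion of the template string before the file is written.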

In [None]:
with open(config_filename, "w") as outfile:
    outfile.write(outstring)

In [None]:
!neat run --config $config_filename
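
Before displaying the plot, it may be worth checking which outputs the run produced. This sketch assumes NEAT writes the files named in the configuration into `output_directory`; `find_missing` is a hypothetical helper, and the file names simply repeat the values set earlier in this notebook.

```python
import os


def find_missing(directory, file_names):
    """Return the file names that do not exist inside the given directory."""
    return [
        name for name in file_names
        if not os.path.isfile(os.path.join(directory, name))
    ]


# Values repeated from the parameter cells and the template above.
expected_outputs = ["demo_embeddings.tsv", "embedding_history.json", "tsne.png"]

missing = find_missing("./", expected_outputs)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All expected output files are present.")
```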

In [None]:
from IPython.display import Image
Image(filename='tsne.png')