
Commit

adds config description to help
Justin Sybrandt committed May 18, 2020
1 parent 74e88c9 commit b22e763
1 changed file with 99 additions and 2 deletions: docs/help/embed_semantic_graph.md
Note that the argument passed with `--relations` should be a string of
space-separated relationship types. Each relationship is a two-character
string: the first character is the source entity type and the second is the
destination entity type. Relationships are also directed in PTBG, meaning that if you would
like to select both `UMLS -> predicate` edges, as well as `predicate -> UMLS`
edges, you will need to specify both edge types.

**WARNING:** You will need to remember the order in which you list the
relationships, because the relationships in the PTBG config must be listed in
the same order.
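As a rough sketch of how the `--relations` string maps onto the config, the helper below splits the space-separated argument into an ordered list that could be pasted into `RELATIONS`. The name `relations_from_arg` and its checks are hypothetical (they are not part of Agatha or PTBG). It assumes the default entity types `selmnp` and that each relationship reads as source type followed by destination type, matching how the example config below uses `lhs=rel[0]` and `rhs=rel[1]`.

```python
# Hypothetical helper, not part of Agatha or PTBG.
def relations_from_arg(relations_arg, ent_types="selmnp"):
    """Split a `--relations`-style string into an ordered list of relationships.

    Each relationship must be exactly two characters long, and both characters
    must be known entity types. The input order is preserved.
    """
    relations = relations_arg.split()
    for rel in relations:
        if len(rel) != 2:
            raise ValueError(f"relationship {rel!r} is not two characters long")
        unknown = sorted(set(rel) - set(ent_types))
        if unknown:
            raise ValueError(
                f"relationship {rel!r} uses unknown entity types {unknown}"
            )
    return relations


if __name__ == "__main__":
    # Edges are directed, so "sp" and "ps" are two separate edge types.
    print(relations_from_arg("sp ps pm mp"))  # ['sp', 'ps', 'pm', 'mp']
```

Because `str.split()` keeps the input order, the resulting list lines up with the order you passed to `--relations`, which is what the warning above requires.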

## Create a PTBG Config

Now that you have converted the Agatha semantic graph for PTBG, you need to
write a configuration script (see the [official docs for the PTBG
config](https://torchbiggraph.readthedocs.io/en/latest/configuration_file.html)).
The following is an example PTBG config. The only values you should need to
change are in the `CHANGE THESE` section at the top of the
`get_torchbiggraph_config` function. Copy this example and adjust those values
to match your data.

```python3
#!/usr/bin/env python3
def get_torchbiggraph_config():

    # CHANGE THESE #########################################################

    DATA_ROOT = "/path/to/data/root"
    """ This is the location you specified with the `-o` flag when running
    `convert_graph_for_pytorch_biggraph`. That tool should have created
    `DATA_ROOT/entities` and `DATA_ROOT/edges`. Training will write its
    checkpoints to `DATA_ROOT/embeddings`. """

    PARTS = 100
    """ This is the number of partitions that all nodes and edges were split
    into when running `convert_graph_for_pytorch_biggraph`. By default, we
    create 100 partitions. If you specified `--partition-count` (`-c`), then
    you need to change this value to match that partition count. """

    ENT_TYPES = "selmnp"
    """ This is the set of entity types specified when running
    `convert_graph_for_pytorch_biggraph`. The above value is the default. If
    you used the `--types` flag, then you need to set this value accordingly. """

    RELATIONS = [
        "ss", "se", "es", "sl", "ls", "sm", "ms", "sn", "ns", "sp",
        "ps", "pn", "np", "pm", "mp", "pl", "lp", "pe", "ep",
    ]
    """ This is the ordered list of relationships that you specified when
    running `convert_graph_for_pytorch_biggraph`. The above is the default. If
    you specified `--relations`, then you need to set this value accordingly.
    WARNING: The order of relationships matters! This list must be in the same
    order as the relationships given in the `--relations` argument. """

    EMBEDDING_DIM = 512
    """ This is the number of floats in each node's embedding. """

    NUM_COMPUTE_NODES = 20
    """ This is the number of machines used to compute the embedding. In our
    experience, around 20 machines is the sweet spot; using more or fewer
    machines made training slower. """

    THREADS_PER_NODE = 24
    """ This is the number of threads that each machine will use to compute
    embeddings. """

    #########################################################################

    config = dict(
        # IO paths
        entity_path=DATA_ROOT + "/entities",
        edge_paths=[DATA_ROOT + "/edges"],
        checkpoint_path=DATA_ROOT + "/embeddings",

        # Graph structure: every entity type uses the same partition count,
        # and each relationship goes from type rel[0] to type rel[1].
        entities={t: {"num_partitions": PARTS} for t in ENT_TYPES},
        relations=[
            dict(name=rel, lhs=rel[0], rhs=rel[1], operator="translation")
            for rel in RELATIONS
        ],

        # Scoring model
        dimension=EMBEDDING_DIM,
        comparator="dot",
        bias=True,

        # Training
        num_epochs=5,
        num_uniform_negs=50,
        loss_fn="softmax",
        lr=0.02,

        # Evaluation during training (disabled)
        eval_fraction=0,

        # Distributed training: one worker per allowed thread on each machine
        workers=THREADS_PER_NODE,
        num_machines=NUM_COMPUTE_NODES,
        distributed_init_method="env://",
        num_partition_servers=-1,
    )

    return config
```
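Because the header values must agree with how the graph was converted, it can help to sanity-check the dictionary returned by `get_torchbiggraph_config` before launching training. The sketch below is not an Agatha or PTBG tool: the function name `check_ptbg_config` is hypothetical, and it only inspects the plain dictionary, checking that every relation's `lhs` and `rhs` refer to a declared entity type, that relation names are unique, and that the entity and edge directories exist.

```python
import os


# Hypothetical checker, not part of Agatha or PTBG.
def check_ptbg_config(config):
    """Return a list of human-readable problems found in a PTBG config dict."""
    problems = []
    entity_types = set(config["entities"])
    names = [rel["name"] for rel in config["relations"]]

    # Every relation must connect two declared entity types.
    for rel in config["relations"]:
        for side in ("lhs", "rhs"):
            if rel[side] not in entity_types:
                problems.append(
                    f"relation {rel['name']!r}: {side} type {rel[side]!r} "
                    "is not a declared entity type"
                )

    # Duplicate relation names would make the relation order ambiguous.
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicate relation names: {duplicates}")

    # The conversion step should already have created these directories.
    for path in [config["entity_path"], *config["edge_paths"]]:
        if not os.path.isdir(path):
            problems.append(f"missing directory: {path}")

    return problems


if __name__ == "__main__":
    # A deliberately broken toy config: "x" is not a declared entity type,
    # and the data directories do not exist.
    toy = dict(
        entity_path="/nonexistent/entities",
        edge_paths=["/nonexistent/edges"],
        entities={t: {"num_partitions": 1} for t in "sp"},
        relations=[
            dict(name="sp", lhs="s", rhs="p", operator="translation"),
            dict(name="sx", lhs="s", rhs="x", operator="translation"),
        ],
    )
    for problem in check_ptbg_config(toy):
        print("WARNING:", problem)
```

With your own config file, you would pass it the result of calling `get_torchbiggraph_config()` (for example, after importing the config file as a module) and fix anything it reports before starting the cluster.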

## Launch the PTBG training cluster

Now you are ready to start training!
