Integrating a language model with ULTRA #9

Open
daniel4x opened this issue Dec 24, 2023 · 4 comments

@daniel4x
Contributor

daniel4x commented Dec 24, 2023

Hi @migalkin,
First of all, kudos for your work!!!! (both ULTRA and NodePiece 😄).

I'm curious to hear your thoughts about integrating a language model (LM) with ULTRA.
Previously, with other KG models such as nodepiece, it was straightforward to integrate a language model to enrich the graph embeddings with textual embeddings.
I would concatenate the entity's textual and graph representations and optionally apply additional layers to match the desired dimensions.

example:

# code from the pykeen framework + modifications
x_e, x_r = self.entity_representations[0](), self.relation_representations[0]()
indices = torch.arange(self.text_representation.weight.data.shape[0])
x_e = self.merge_model(self.text_representation(indices), x_e)  # concat + linear layer

# perform message passing and get updated states
for layer in self.gnn_encoder:
    x_e, x_r = layer(
        x_e=x_e,
        x_r=x_r,
        edge_index=getattr(self, f"{mode}_edge_index"),
        edge_type=getattr(self, f"{mode}_edge_type"),
    )

So far this has worked well, boosting the model's performance by roughly 50% with TransE and by up to ~30% with NodePiece on my datasets.

With ULTRA I guess that I have some additional work to do :)...
I started by understanding how the entity representations are "generated" on the fly:
https://github.com/DeepGraphLearning/ULTRA/blob/33c6e6b8e522aed3d33f6ce5d3a1883ca9284718/ultra/models.py#L166-L174C4

I understand that from that point only the tail representations are used to feed the MLP.

I replaced the MLP with my own MLP to match the dimension of the concatenation of both representations. Then I tried to concatenate the ULTRA output with the textual entity representation. As far as I understand, due to this "late" concatenation only the tail entity's textual representation is used.
When tested, I got (almost) the same results with/without the textual representation.

Not sure what I expect to hear :), but I hope you may have an idea for combining both representations.

@migalkin
Collaborator

Hi!

I understand that from that point only the tail representations are used to feed the MLP.

Those aren't really tail representations anymore because message passing updates all node states and starts with initial node states called boundary (where you can append LLM features):

ULTRA/ultra/models.py

Lines 137 to 140 in 33c6e6b

# initial (boundary) condition - initialize all node states as zeros
boundary = torch.zeros(batch_size, data.num_nodes, self.dims[0], device=h_index.device)
# by the scatter operation we put query (relation) embeddings as init features of source (index) nodes
boundary.scatter_add_(1, index.unsqueeze(1), query.unsqueeze(1))

In the GNN layer code, we use those boundary states together with the current message (e.g., with the sum aggregation):

ULTRA/ultra/layers.py

Lines 192 to 194 in 33c6e6b

if self.aggregate_func == "sum":
    update = generalized_rspmm(edge_index, edge_type, edge_weight, relation, input, sum="add", mul=mul)
    update = update + boundary

That is, in each GNN layer we actually do have an interaction function of (initial) head and (current) node states.
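As a toy illustration of that boundary initialization (toy sizes, same idiom as the repo code quoted above):

import torch

batch_size, num_nodes, dim = 2, 5, 4
h_index = torch.tensor([0, 3])            # head node of each query in the batch
query = torch.randn(batch_size, dim)      # query relation embeddings

# all node states start at zero, except the head node of each query,
# which receives the query embedding via scatter_add_
boundary = torch.zeros(batch_size, num_nodes, dim)
index = h_index.unsqueeze(-1).expand_as(query)
boundary.scatter_add_(1, index.unsqueeze(1), query.unsqueeze(1))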

Adding other entity/relation features seems quite straightforward; I see two possible ways:

  1. Early interaction - you'd need to re-train the model from scratch because all weight matrices will be of different dimensions

    • You can add LLM features of relations right into the graph of relations (so structural and LLM features will be mixed in the GNN) by appending those extra features to the initialized boundary:

      ULTRA/ultra/models.py

      Lines 57 to 65 in 33c6e6b

      # initialize initial nodes (relations of interest in the batch) with all ones
      query = torch.ones(h_index.shape[0], self.dims[0], device=h_index.device, dtype=torch.float)
      index = h_index.unsqueeze(-1).expand_as(query)
      # initial (boundary) condition - initialize all node states as zeros
      boundary = torch.zeros(batch_size, data.num_nodes, self.dims[0], device=h_index.device)
      #boundary = torch.zeros(data.num_nodes, *query.shape, device=h_index.device)
      # Indicator function: by the scatter operation we put ones as init features of source (index) nodes
      boundary.scatter_add_(1, index.unsqueeze(1), query.unsqueeze(1))
    • Or you can take the basic relational features produced by the graph of relations and append LLM relational features to them in the entity-level NBFNet:

      ULTRA/ultra/models.py

      Lines 182 to 184 in 33c6e6b

      # initialize relations in each NBFNet layer (with unique projection internally)
      for layer in self.layers:
          layer.relation = relation_representations
    • Add entity-level LLM features to the boundary in the entity-level NBFNet:

      ULTRA/ultra/models.py

      Lines 137 to 140 in 33c6e6b

      # initial (boundary) condition - initialize all node states as zeros
      boundary = torch.zeros(batch_size, data.num_nodes, self.dims[0], device=h_index.device)
      # by the scatter operation we put query (relation) embeddings as init features of source (index) nodes
      boundary.scatter_add_(1, index.unsqueeze(1), query.unsqueeze(1))
  2. Late interaction - you can freeze the main GNN models and only change the final MLP by adding LLM features to the output:

    ULTRA/ultra/models.py

    Lines 199 to 200 in 33c6e6b

    output = self.bellmanford(data, h_index[:, 0], r_index[:, 0]) # (num_nodes, batch_size, feature_dim)
    feature = output["node_feature"]

This is a less expressive way, but it won't require re-training the model from scratch (a rough sketch of this variant follows below).

In any case, if your graphs have >10k nodes, I'd recommend projecting the LLM features (usually 768d or more, depending on the LLM) down to a smaller dimension (32/64d) in order to fit the full-batch GNN layer onto a GPU.
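For instance, a minimal sketch of the late-interaction variant (option 2), with hypothetical names and sizes (lm_proj, score_mlp, 768d LLM features projected down to 64d); only the projection and final MLP would be trained:

import torch
from torch import nn

# hypothetical sizes for illustration only
batch_size, num_nodes, feature_dim, lm_dim = 4, 1000, 64, 768

feature = torch.randn(batch_size, num_nodes, feature_dim)  # stands in for output["node_feature"]
lm_features = torch.randn(num_nodes, lm_dim)               # frozen per-entity LLM embeddings

lm_proj = nn.Linear(lm_dim, 64)                            # project the LLM features down
score_mlp = nn.Sequential(
    nn.Linear(feature_dim + 64, feature_dim),
    nn.ReLU(),
    nn.Linear(feature_dim, 1),
)

# append (projected) LLM features to the GNN output, then score every node
lm_small = lm_proj(lm_features).unsqueeze(0).expand(batch_size, -1, -1)
score = score_mlp(torch.cat([feature, lm_small], dim=-1)).squeeze(-1)  # (batch_size, num_nodes)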

@daniel4x
Contributor Author

daniel4x commented Jan 13, 2024

Just an update, I tested all three suggested methods.
I'll add a side branch later if you are interested, along with an example of language model integration for future reference.

In general, the pre-trained embeddings were added in the EntityNBFNet __init__:

ULTRA/ultra/models.py

Lines 106 to 127 in 04d5c13

def __init__(self, input_dim, hidden_dims, num_relation=1, **kwargs):
    # dummy num_relation = 1 as we won't use it in the NBFNet layer
    super().__init__(input_dim, hidden_dims, num_relation, **kwargs)
    self.layers = nn.ModuleList()
    for i in range(len(self.dims) - 1):
        self.layers.append(
            layers.GeneralizedRelationalConv(
                self.dims[i], self.dims[i + 1], num_relation,
                self.dims[0], self.message_func, self.aggregate_func, self.layer_norm,
                self.activation, dependent=False, project_relations=True)
        )

    feature_dim = (sum(hidden_dims) if self.concat_hidden else hidden_dims[-1]) + input_dim
    self.mlp = nn.Sequential()
    mlp = []
    for i in range(self.num_mlp_layers - 1):
        mlp.append(nn.Linear(feature_dim, feature_dim))
        mlp.append(nn.ReLU())
    mlp.append(nn.Linear(feature_dim, 1))
    self.mlp = nn.Sequential(*mlp)

Slightly modified the code and added the following:

        if lm_vectors is not None:
            # can decide whether to freeze or not...
            self.lm_vectors = nn.Embedding.from_pretrained(lm_vectors, freeze=True)
            self.merge_linear = nn.Linear(feature_dim, 64)

Per your 1st suggestion, this requires training from scratch, with the following:

        # .....original code.....
        boundary.scatter_add_(1, index.unsqueeze(1), query.unsqueeze(1))
        
        # interaction of boundary with lm vectors
        if self.lm_vectors is not None:
            lm_vectors = self.lm_vectors(h_index)  # mistake - see @migalkin's reply below
            lm_vectors = lm_vectors.unsqueeze(1).expand(-1, data.num_nodes, -1)
            boundary = torch.cat([boundary, lm_vectors], dim=-1)
            boundary = self.merge_linear(boundary)

The merge_linear may not be the best option and can be replaced with any other interaction that fits into the Conv layers.
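One alternative interaction, sketched below with hypothetical names, is a learned gate instead of a plain concat + linear; it keeps the boundary dimension unchanged:

import torch
from torch import nn

class GatedMerge(nn.Module):
    # hypothetical alternative to merge_linear: gate the LM features into the boundary
    def __init__(self, boundary_dim, lm_dim):
        super().__init__()
        self.lm_proj = nn.Linear(lm_dim, boundary_dim)
        self.gate = nn.Linear(boundary_dim + lm_dim, boundary_dim)

    def forward(self, boundary, lm_vectors):
        # boundary: (batch_size, num_nodes, boundary_dim), lm_vectors: (batch_size, num_nodes, lm_dim)
        g = torch.sigmoid(self.gate(torch.cat([boundary, lm_vectors], dim=-1)))
        return boundary + g * self.lm_proj(lm_vectors)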

@migalkin
Collaborator

migalkin commented Jan 13, 2024

Those lines

lm_vectors = self.lm_vectors(h_index)
lm_vectors = lm_vectors.unsqueeze(1).expand(-1, data.num_nodes, -1)

would take only lm features of head nodes in the batch and copy them to all nodes in the graph - is this what you want?

If you want to initialize each node with its own LM feature, then you don't need to call the embedding layer with h_index; instead, take all of its weights and repeat them along the batch dimension, e.g. self.lm_vectors.weight.repeat(bs, 1, 1), or pass all node indices, e.g. self.lm_vectors(torch.arange(data.num_nodes)).repeat(bs, 1, 1).
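Concretely, the corrected initialization could look roughly like this (same hypothetical attribute names as in the snippet above):

        # give every node its own LM feature, repeated along the batch dimension
        if self.lm_vectors is not None:
            batch_size = h_index.shape[0]
            lm_vectors = self.lm_vectors.weight.repeat(batch_size, 1, 1)  # (batch_size, num_nodes, lm_dim)
            boundary = torch.cat([boundary, lm_vectors], dim=-1)
            boundary = self.merge_linear(boundary)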

@dhall1995

@daniel4x Have you made this into a separate branch? I would be really interested to see your code and hear which of the integration methods performed best for you.

In my use case I have a mixture of LLM features (on edges) and a set of pre-trained embedding features for each of the node types. Your experience that node features offer significant performance benefits tallies with mine, so it would be great to integrate this into my code.
