Integrating a language model with ULTRA #9

Open
daniel4x opened this issue Dec 24, 2023 · 4 comments

@daniel4x
Contributor

daniel4x commented Dec 24, 2023

Hi @migalkin,
First of all, kudos for your work!!!! (both ULTRA and NodePiece 😄).

I'm curious to hear your thoughts about integrating a language model (LM) with ULTRA.
Previously, with other KG models such as nodepiece, it was straightforward to integrate a language model to enrich the graph embeddings with textual embeddings.
I would concatenate the entity's textual and graph representations and optionally apply additional layers to match the desired dimensions.

example:

# code from the pykeen framework + modifications
x_e, x_r = self.entity_representations[0](), self.relation_representations[0]()
indices = torch.arange(self.text_representation.weight.data.shape[0])
x_e = self.merge_model(self.text_representation(indices), x_e)  # concat + linear layer

# perform message passing and get updated states
for layer in self.gnn_encoder:
    x_e, x_r = layer(
        x_e=x_e,
        x_r=x_r,
        edge_index=getattr(self, f"{mode}_edge_index"),
        edge_type=getattr(self, f"{mode}_edge_type"),
    )

So far this has worked well, boosting the model's performance by roughly 50% with TransE and by up to ~30% with NodePiece on my datasets.

With ULTRA I guess that I have some additional work to do :)...
I started by understanding how the entity representations are "generated" on the fly:
https://github.com/DeepGraphLearning/ULTRA/blob/33c6e6b8e522aed3d33f6ce5d3a1883ca9284718/ultra/models.py#L166-L174C4

I understand that from that point only the tail representations are used to feed the MLP.

I replaced the MLP with my own MLP to match the dimension of the concatenation of both representations. Then I tried to concatenate the ULTRA output with the textual entity representation. As far as I understand, due to this "late" concatenation only the tail entity's textual representation is used.
When tested, I got (almost) the same results with/without the textual representation.

Not sure what I expect to hear :), but I hope you may have an idea for combining both representations.

@migalkin
Collaborator

Hi!

I understand that from that point only the tail representations are used to feed the MLP.

Those aren't really tail representations anymore because message passing updates all node states and starts with initial node states called boundary (where you can append LLM features):

ULTRA/ultra/models.py

Lines 137 to 140 in 33c6e6b

# initial (boundary) condition - initialize all node states as zeros
boundary = torch.zeros(batch_size, data.num_nodes, self.dims[0], device=h_index.device)
# by the scatter operation we put query (relation) embeddings as init features of source (index) nodes
boundary.scatter_add_(1, index.unsqueeze(1), query.unsqueeze(1))

In the GNN layer code, we use those boundary states together with the current message (e.g., with the sum aggregation):

ULTRA/ultra/layers.py

Lines 192 to 194 in 33c6e6b

if self.aggregate_func == "sum":
    update = generalized_rspmm(edge_index, edge_type, edge_weight, relation, input, sum="add", mul=mul)
    update = update + boundary

That is, in each GNN layer we actually do have an interaction function of (initial) head and (current) node states.
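As a toy illustration of that boundary initialization (toy sizes, same idiom as the repo code quoted above):

import torch

batch_size, num_nodes, dim = 2, 5, 4
h_index = torch.tensor([0, 3])            # head node of each query in the batch
query = torch.randn(batch_size, dim)      # query relation embeddings

# all node states start at zero, except the head node of each query,
# which receives the query embedding via scatter_add_
boundary = torch.zeros(batch_size, num_nodes, dim)
index = h_index.unsqueeze(-1).expand_as(query)
boundary.scatter_add_(1, index.unsqueeze(1), query.unsqueeze(1))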

Adding other entity/relation features seems quite straightforward; I see two possible ways:

  1. Early interaction - you'd need to re-train the model from scratch because all weight matrices will be of different dimensions

    • You can add LLM features of relations right into the graph of relations (so structural and LLM features will be mixed in the GNN) by appending those extra features to the initialized boundary:

      ULTRA/ultra/models.py

      Lines 57 to 65 in 33c6e6b

      # initialize initial nodes (relations of interest in the batch) with all ones
      query = torch.ones(h_index.shape[0], self.dims[0], device=h_index.device, dtype=torch.float)
      index = h_index.unsqueeze(-1).expand_as(query)
      # initial (boundary) condition - initialize all node states as zeros
      boundary = torch.zeros(batch_size, data.num_nodes, self.dims[0], device=h_index.device)
      #boundary = torch.zeros(data.num_nodes, *query.shape, device=h_index.device)
      # Indicator function: by the scatter operation we put ones as init features of source (index) nodes
      boundary.scatter_add_(1, index.unsqueeze(1), query.unsqueeze(1))
    • Or you can take the basic relational features produced by the graph of relations and append LLM relational features to them in the entity-level NBFNet:

      ULTRA/ultra/models.py

      Lines 182 to 184 in 33c6e6b

      # initialize relations in each NBFNet layer (with unique projection internally)
      for layer in self.layers:
          layer.relation = relation_representations
    • Add entity-level LLM features to the boundary in the entity-level NBFNet:

      ULTRA/ultra/models.py

      Lines 137 to 140 in 33c6e6b

      # initial (boundary) condition - initialize all node states as zeros
      boundary = torch.zeros(batch_size, data.num_nodes, self.dims[0], device=h_index.device)
      # by the scatter operation we put query (relation) embeddings as init features of source (index) nodes
      boundary.scatter_add_(1, index.unsqueeze(1), query.unsqueeze(1))
  2. Late interaction - you can freeze the main GNN models and only change the final MLP by adding LLM features to the output:

    ULTRA/ultra/models.py

    Lines 199 to 200 in 33c6e6b

    output = self.bellmanford(data, h_index[:, 0], r_index[:, 0]) # (num_nodes, batch_size, feature_dim)
    feature = output["node_feature"]

This is a less expressive way, but it won't require re-training the model from scratch (a rough sketch of this variant follows below).

In any case, if your graphs have >10k nodes, I'd recommend projecting the LLM features (usually 768d or more, depending on the LLM) down to a smaller dimension (32/64d) in order to fit the full-batch GNN layer onto a GPU.
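For instance, a minimal sketch of the late-interaction variant (option 2), with hypothetical names and sizes (lm_proj, score_mlp, 768d LLM features projected down to 64d); only the projection and final MLP would be trained:

import torch
from torch import nn

# hypothetical sizes for illustration only
batch_size, num_nodes, feature_dim, lm_dim = 4, 1000, 64, 768

feature = torch.randn(batch_size, num_nodes, feature_dim)  # stands in for output["node_feature"]
lm_features = torch.randn(num_nodes, lm_dim)               # frozen per-entity LLM embeddings

lm_proj = nn.Linear(lm_dim, 64)                            # project the LLM features down
score_mlp = nn.Sequential(
    nn.Linear(feature_dim + 64, feature_dim),
    nn.ReLU(),
    nn.Linear(feature_dim, 1),
)

# append (projected) LLM features to the GNN output, then score every node
lm_small = lm_proj(lm_features).unsqueeze(0).expand(batch_size, -1, -1)
score = score_mlp(torch.cat([feature, lm_small], dim=-1)).squeeze(-1)  # (batch_size, num_nodes)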

@daniel4x
Contributor Author

daniel4x commented Jan 13, 2024

Just an update, I tested all three suggested methods.
I'll add a side branch later if you are interested, along with an example of language model integration for future reference.

In general, the pre-trained embeddings were added in the EntityNBFNet __init__:

ULTRA/ultra/models.py

Lines 106 to 127 in 04d5c13

def __init__(self, input_dim, hidden_dims, num_relation=1, **kwargs):
    # dummy num_relation = 1 as we won't use it in the NBFNet layer
    super().__init__(input_dim, hidden_dims, num_relation, **kwargs)
    self.layers = nn.ModuleList()
    for i in range(len(self.dims) - 1):
        self.layers.append(
            layers.GeneralizedRelationalConv(
                self.dims[i], self.dims[i + 1], num_relation,
                self.dims[0], self.message_func, self.aggregate_func, self.layer_norm,
                self.activation, dependent=False, project_relations=True)
        )

    feature_dim = (sum(hidden_dims) if self.concat_hidden else hidden_dims[-1]) + input_dim
    self.mlp = nn.Sequential()
    mlp = []
    for i in range(self.num_mlp_layers - 1):
        mlp.append(nn.Linear(feature_dim, feature_dim))
        mlp.append(nn.ReLU())
    mlp.append(nn.Linear(feature_dim, 1))
    self.mlp = nn.Sequential(*mlp)

Slightly modified the code and added the following:

        if lm_vectors is not None:
            # can decide whether to freeze or not...
            self.lm_vectors = nn.Embedding.from_pretrained(lm_vectors, freeze=True)
            self.merge_linear = nn.Linear(feature_dim, 64)

Per your 1st suggestion, this requires training from scratch, with the following:

        # .....original code.....
        boundary.scatter_add_(1, index.unsqueeze(1), query.unsqueeze(1))
        
        # interaction of boundary with lm vectors
        if self.lm_vectors is not None:
            lm_vectors = self.lm_vectors(h_index)  # mistake - see @migalkin's reply below
            lm_vectors = lm_vectors.unsqueeze(1).expand(-1, data.num_nodes, -1)
            boundary = torch.cat([boundary, lm_vectors], dim=-1)
            boundary = self.merge_linear(boundary)

The merge_linear may not be the best option and can be replaced with any other interaction that fits into the Conv layers.
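One alternative interaction, sketched below with hypothetical names, is a learned gate instead of a plain concat + linear; it keeps the boundary dimension unchanged:

import torch
from torch import nn

class GatedMerge(nn.Module):
    # hypothetical alternative to merge_linear: gate the LM features into the boundary
    def __init__(self, boundary_dim, lm_dim):
        super().__init__()
        self.lm_proj = nn.Linear(lm_dim, boundary_dim)
        self.gate = nn.Linear(boundary_dim + lm_dim, boundary_dim)

    def forward(self, boundary, lm_vectors):
        # boundary: (batch_size, num_nodes, boundary_dim), lm_vectors: (batch_size, num_nodes, lm_dim)
        g = torch.sigmoid(self.gate(torch.cat([boundary, lm_vectors], dim=-1)))
        return boundary + g * self.lm_proj(lm_vectors)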

@migalkin
Collaborator

migalkin commented Jan 13, 2024

Those lines

lm_vectors = self.lm_vectors(h_index)
lm_vectors = lm_vectors.unsqueeze(1).expand(-1, data.num_nodes, -1)

would take only lm features of head nodes in the batch and copy them to all nodes in the graph - is this what you want?

If you want to initialize each node with its own LM feature, then you don't need to call the embedding layer with h_index; instead, take all of its weights and repeat them along the batch dimension, e.g. self.lm_vectors.weight.repeat(bs, 1, 1), or pass all node indices, e.g. self.lm_vectors(torch.arange(data.num_nodes)).repeat(bs, 1, 1).
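Concretely, the corrected initialization could look roughly like this (same hypothetical attribute names as in the snippet above):

        # give every node its own LM feature, repeated along the batch dimension
        if self.lm_vectors is not None:
            batch_size = h_index.shape[0]
            lm_vectors = self.lm_vectors.weight.repeat(batch_size, 1, 1)  # (batch_size, num_nodes, lm_dim)
            boundary = torch.cat([boundary, lm_vectors], dim=-1)
            boundary = self.merge_linear(boundary)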

@dhall1995

@daniel4x Have you made this into a separate branch? I would be really interested to see your code and hear which of the integration methods performed best for you.

In my use case I have a mixture of LLM features (on edges) and a set of pre-trained embedding features for each of the node types. Your experience that node features offer significant performance benefits tallies with mine, so it would be great to integrate this into my code.
