
[FEATURE]: Extract atomic encodings before aggregating them into molecular encodings #722

Open
zarkoivkovicc opened this issue Mar 12, 2024 · 3 comments
Assignees
Labels
enhancement a new feature request
Milestone

Comments

@zarkoivkovicc

It would be nice to have an interface for extracting the atomic encodings from the last message-passing layer, before they are aggregated into the molecular fingerprint.

Desired solution/workflow
Provide a clear interface, similar to chemprop.fingerprint, for getting atomic encodings. Perhaps something like chemprop.atomic_encodings(molecule) -> list of vectors (one atomic encoding per atom).

Discussion
This would make transfer learning with chemprop for atomistic predictions much easier. It should be almost trivial to implement, because we already have a function that returns the aggregated atomic encodings as the molecular fingerprint.

Additional context
Some other libraries already provide this feature, and it has proven useful. The community would benefit greatly from it.

@zarkoivkovicc zarkoivkovicc added the enhancement a new feature request label Mar 12, 2024
@JacksonBurns
Member

Hi @zarkoivkovicc, could you link to some of the other libraries you mention that implement this feature, so that we can learn from them? I will also add that we are not accepting feature requests for v1, so this feature would be implemented in v2.

@zarkoivkovicc
Author

Hi @JacksonBurns, thanks for the fast reply. Sure, here are some examples:

unimol
MACE

Is there any potential release date for v2? I am currently working on a project that compares atomic representations from the latent spaces of different models. This should be relatively easy to implement, but I can't do it alone: the code base is too large and I don't have much time.

@davidegraff
Contributor

You can already do this in v2 without any new features:

import chemprop
import lightning as L
import torch
from torch.utils.data import DataLoader
from torch_scatter import scatter_sum

trainer: L.Trainer = ...
model: chemprop.MPNN = ...
train: DataLoader = ...
val: DataLoader = ...

trainer.fit(model, train, val)  # fit your MPNN

H_vs = []
for batch in train:  # could use any dataloader here
    bmg, V_d, *_ = batch
    H_v = model.message_passing(bmg, V_d)  # per-atom hidden representations
    # count the atoms belonging to each molecule in the batch...
    split_sizes = scatter_sum(
        torch.ones_like(bmg.batch), bmg.batch, dim=0, dim_size=len(bmg)
    ).tolist()
    # ...and split the flat atom dimension back into per-molecule tensors
    H_vs.extend(H_v.split(split_sizes))

H_vs is a list[Tensor] of length $n$, where each tensor has shape $\ast \times d_h$: $n$ is the number of molecules in the dataloader, $\ast$ is the number of atoms in the corresponding molecule, and $d_h$ is the encoding dimensionality.
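For readers unfamiliar with the scatter/split idiom, the grouping step above can be sketched in plain Python. This is a minimal illustration of the same logic, with made-up atom encodings and batch indices standing in for the real H_v and bmg.batch tensors:

```python
# Flat list of per-atom encodings for a batch of 3 molecules, plus a
# parallel index list saying which molecule each atom belongs to
# (the roles played by H_v and bmg.batch above).
atom_encodings = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
batch_index = [0, 0, 1, 2, 2, 2]  # molecule id per atom

n_mols = max(batch_index) + 1

# Equivalent of scatter_sum over a tensor of ones: count atoms per molecule.
split_sizes = [0] * n_mols
for mol_id in batch_index:
    split_sizes[mol_id] += 1

# Equivalent of Tensor.split: slice the flat atom axis into
# per-molecule chunks.
per_molecule = []
start = 0
for size in split_sizes:
    per_molecule.append(atom_encodings[start:start + size])
    start += size

print(split_sizes)      # [2, 1, 3]
print(per_molecule[1])  # [[0.3]]
```

The key assumption (true of chemprop v2's BatchMolGraph) is that atoms are stored contiguously per molecule, so counting followed by sequential slicing recovers the per-molecule grouping.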

@kevingreenman kevingreenman added this to the v2.0.0 milestone Mar 12, 2024
@KnathanM KnathanM modified the milestones: v2.0.0, v2.1.0 Apr 4, 2024
@KnathanM KnathanM mentioned this issue May 9, 2024
2 tasks
@KnathanM KnathanM self-assigned this May 9, 2024