It would be nice to have an interface for extracting atomic encodings from the last layer, before they are aggregated into the molecular fingerprint.
Desired solution/workflow
Provide a clear interface, similar to chemprop.fingerprint, for getting atomic encodings. Perhaps something like chemprop.atomic_encodings(molecule) -> list of vectors (atomic encodings).
Discussion
This would make transfer learning with chemprop for atomistic predictions much easier. It should be almost trivial to implement, because there is already a function that returns the aggregated encodings as a molecular fingerprint.
Additional context
Some other libraries already provide this feature and it's proving useful. The community would greatly benefit from this feature.
Hi @zarkoivkovicc, could you link to some of the other libraries you mentioned that implement this feature, so that we can learn from them? I will also add that we are not accepting feature requests for v1, so this feature would be implemented in v2.
Is there a potential release date for v2? I am currently working on a project that compares atomic representations from the latent spaces of different models. This should be relatively easy to implement, but I can't do it alone because the code base is too large and I don't have much time.
You can already do this in v2 without any new features:
import chemprop
import lightning as L
import torch
from torch.utils.data import DataLoader
from torch_scatter import scatter_sum

trainer: L.Trainer = ...
model: chemprop.MPNN = ...
train: DataLoader = ...
val: DataLoader = ...
trainer.fit(model, train, val)  # fit your MPNN

H_vs = []
for batch in train:  # could use any dataloader here
    bmg, V_d, *_ = batch
    H_v = model.message_passing(bmg, V_d)  # per-atom encodings, before aggregation
    # count the atoms in each molecule of the batch
    split_sizes = scatter_sum(torch.ones_like(bmg.batch), bmg.batch, dim=0, dim_size=len(bmg)).tolist()
    H_vs.extend(H_v.split(split_sizes))
H_vs is a list[Tensor] of length $n$ in which each tensor has shape $\ast \times d_h$, where $n$ is the number of molecules in the dataloader, $\ast$ is the number of atoms in the given molecule, and $d_h$ is the encoding dimensionality.
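To make the split step in the snippet above concrete, here is a dependency-free sketch that uses plain Python lists in place of tensors. The helper names `split_sizes_from_batch` and `split_rows` are hypothetical and not part of chemprop. The sketch assumes that the atoms of each molecule are stored contiguously and in molecule order, which is what splitting by per-molecule counts relies on.

```python
from collections import Counter


def split_sizes_from_batch(batch_index, n_mols):
    """Count the atoms belonging to each molecule (the role scatter_sum plays above)."""
    counts = Counter(batch_index)
    return [counts.get(i, 0) for i in range(n_mols)]


def split_rows(rows, sizes):
    """Cut a flat list of per-atom rows into consecutive per-molecule chunks."""
    chunks, start = [], 0
    for size in sizes:
        chunks.append(rows[start:start + size])
        start += size
    return chunks


# Toy batch: 3 molecules with 2, 3, and 1 atoms; encoding dimensionality d_h = 2.
batch_index = [0, 0, 1, 1, 1, 2]  # molecule index of each atom
H_v = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0], [1.1, 1.2]]

sizes = split_sizes_from_batch(batch_index, n_mols=3)
per_molecule = split_rows(H_v, sizes)

print(sizes)                           # [2, 3, 1]
print([len(m) for m in per_molecule])  # [2, 3, 1]
```

Each entry of `per_molecule` then holds one row per atom, analogous to one tensor in H_vs.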