Multitask ASE interface #224

laserkelvin · 2024-05-24T15:28:25Z

This PR closes #217 by providing an extended interface for mapping multitask models to ase workflows.

The main changes include:

Introduces the matsciml.interfaces.ase.multitask.AbstractStrategy class, whose subclasses let us implement different ways of aggregating the outputs of multitask models.
Implements the sole concrete class (for now), AverageTasks, which takes the mean across the different dataset heads to compute energies/forces. Essentially, if you had a ForceRegressionTask trained on multiple datasets, we average the outputs from all of them.
Implements a custom ase_calculate method for MultiTaskModule, which streamlines inference with multitask modules in scenarios such as acting as an ASE calculator. It's named specifically for ase, but in the future we can use it as a dedicated inference method.

The AbstractStrategy should be flexible enough to do other potentially more intelligent aggregations, e.g. a weighted average, MoE approaches, etc.
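To make the aggregation idea concrete, here is a minimal, framework-free sketch of the strategy pattern described above. It is not the code from this PR: the `merge` method name, the `{dataset: {"energy", "forces"}}` output layout, and the `WeightedTasks` class are assumptions made for illustration, and plain Python lists stand in for tensors.

```python
from abc import ABC, abstractmethod
from statistics import fmean


class AbstractStrategy(ABC):
    """Hypothetical stand-in for the PR's base class, not its real signature."""

    @abstractmethod
    def merge(self, outputs):
        """Combine {dataset: {"energy": float, "forces": [[x, y, z], ...]}}."""

    def __call__(self, outputs):
        return self.merge(outputs)


def _combine_forces(per_head_forces, reduce_fn):
    # The outer zip groups the rows belonging to the same atom across heads;
    # the inner zip groups each x/y/z component across heads before reducing.
    return [
        [reduce_fn(component) for component in zip(*atom_rows)]
        for atom_rows in zip(*per_head_forces)
    ]


class AverageTasks(AbstractStrategy):
    """Unweighted mean over every dataset head."""

    def merge(self, outputs):
        heads = list(outputs.values())
        return {
            "energy": fmean([head["energy"] for head in heads]),
            "forces": _combine_forces([head["forces"] for head in heads], fmean),
        }


class WeightedTasks(AbstractStrategy):
    """Assumed extension: a weighted mean with user-supplied head weights."""

    def __init__(self, weights):
        total = sum(weights.values())
        self.weights = {name: w / total for name, w in weights.items()}

    def merge(self, outputs):
        names = list(outputs)
        w = [self.weights[name] for name in names]

        def reduce_fn(values):
            return sum(wi * v for wi, v in zip(w, values))

        return {
            "energy": reduce_fn([outputs[n]["energy"] for n in names]),
            "forces": _combine_forces([outputs[n]["forces"] for n in names], reduce_fn),
        }


# Toy outputs from two dataset heads for a two-atom system.
outputs = {
    "dataset_a": {"energy": -1.0, "forces": [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]},
    "dataset_b": {"energy": -3.0, "forces": [[3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]},
}
averaged = AverageTasks()(outputs)
weighted = WeightedTasks({"dataset_a": 3.0, "dataset_b": 1.0})(outputs)
```

Keeping the reduction behind a single `merge` hook means a different aggregation only needs a new subclass, and the calculator code that calls the strategy stays unchanged.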

Signed-off-by: Lee, Kin Long Kelvin <kin.long.kelvin.lee@intel.com>

This isn't actually originally part of the PR scope, but apparently this typo never triggered as an issue until now!

This substantially simplifies the workflow, albeit at the cost of adding yet another method to multitask modules. The new method simply passes input data into the encoder and maps the encoder output to every single subtask regardless of dataset, instead of requiring that the batch share the dataset keys.
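A rough sketch of that flow, under stated assumptions: `run_all_subtasks`, the toy encoder, and the lambda heads below are hypothetical stand-ins, not the actual `ase_calculate` implementation. The point is only that the encoder runs once and its output is fed to every head in the nested task map.

```python
def run_all_subtasks(encoder, task_map, batch):
    """Encode the batch once, then apply every head of every dataset."""
    embedding = encoder(batch)  # single encoder pass, shared by all heads
    return {
        dataset: {name: head(embedding) for name, head in tasks.items()}
        for dataset, tasks in task_map.items()
    }


calls = {"encoder": 0}


def toy_encoder(batch):
    # Stand-in for a real encoder: count calls and return a scalar "embedding".
    calls["encoder"] += 1
    return sum(batch["atomic_numbers"])


# Nested map in the same {dataset: {task_name: task}} layout as the diff.
task_map = {
    "DatasetA": {"EnergyHead": lambda z: z * 1.0},
    "DatasetB": {
        "EnergyHead": lambda z: z * 2.0,
        "ForceHead": lambda z: [float(z), 0.0, 0.0],
    },
}

# The batch carries no dataset keys at all, yet every head produces an output.
results = run_all_subtasks(toy_encoder, task_map, {"atomic_numbers": [1, 8, 1]})
```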

laserkelvin · 2024-05-24T15:45:08Z

Ergh, issue with one of the tests...

This adjusts the logic, albeit in a way that may be inconsistent with the rest of multitask: we now check the incoming batch for dataset names at the top level to determine whether it's a multidata batch, instead of relying on the model's expectations. This fixes the ase calculate behavior, which would otherwise have been mismatched, since the module is inherently multidata but the incoming batch is not.
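The check described here could look roughly like the sketch below. The function name and the choice of `any` (rather than requiring every dataset name to be present) are assumptions for illustration; the thread does not spell out the exact condition.

```python
def is_multidata_batch(batch, dataset_names):
    """Treat a batch as multidata if a known dataset name is a top-level key."""
    return any(name in batch for name in dataset_names)


dataset_names = ["DatasetA", "DatasetB"]

# A batch built from a single ASE structure has no dataset names at the top level.
ase_batch = {"graph": object(), "pos": [[0.0, 0.0, 0.0]]}

# A training batch from a multidata loader nests samples under dataset names.
multidata_batch = {"DatasetA": {"graph": object()}, "DatasetB": {"graph": object()}}

single = is_multidata_batch(ase_batch, dataset_names)
multi = is_multidata_batch(multidata_batch, dataset_names)
```

Deciding from the batch itself lets a module trained on several datasets still accept a flat, single-structure batch at inference time.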

melo-gonzo · 2024-05-24T18:22:06Z

matsciml/models/base.py

@@ -2107,7 +2107,7 @@ def __init__(
            if index != 0:
                task.encoder = self.encoder
            # nest the task based on its category
-            task_map[dset_name][task.__task__] = task
+            task_map[dset_name][task.__class__.__name__] = task


Was this added for the single data multi-task case?

Not 100% sure what you mean - I don't think this part of the code matters if it's single data or not?
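For readers following the diff, the toy example below shows how the two keying schemes differ. It assumes, based on the "nest the task based on its category" comment, that `__task__` holds a coarse category string shared by several task classes; the classes here are empty stand-ins, and the example does not claim to capture the author's reasoning for the change.

```python
class ScalarRegressionTask:
    __task__ = "regression"  # assumed category value, for illustration only


class ForceRegressionTask:
    __task__ = "regression"  # same assumed category as the scalar task


def build_task_map(dataset_name, tasks, key_fn):
    """Nest tasks under a dataset name using the given key function."""
    task_map = {dataset_name: {}}
    for task in tasks:
        task_map[dataset_name][key_fn(task)] = task
    return task_map


tasks = [ScalarRegressionTask(), ForceRegressionTask()]

# Keyed by category: both tasks share "regression", so the second overwrites the first.
by_category = build_task_map("DatasetA", tasks, lambda t: t.__task__)

# Keyed by class name: each task keeps its own entry.
by_class = build_task_map("DatasetA", tasks, lambda t: t.__class__.__name__)
```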

melo-gonzo

Overall looks good, I'll get some practical testing done once this is merged. Can you add an example similar to ase_from_pretrained.py? Otherwise good to merge!

Signed-off-by: Lee, Kin Long Kelvin <kin.long.kelvin.lee@intel.com>

laserkelvin · 2024-05-28T14:58:53Z

@melo-gonzo just wanted you to take a look at the example before I merge, and to ask if you could clarify what you meant in your question?

melo-gonzo · 2024-05-28T15:10:59Z

@laserkelvin thanks for the example! I am just curious why you made the change in that line. Feel free to merge ✅

laserkelvin added 16 commits May 16, 2024 15:10

feat: added a base method for extracting multitask results

c965377

refactor: setting signature for merge output function

aed05b2

refactor: returning per-key results

0f3066d

refactor: adding abstract run and __call__

a0cb969

feat: added merge method for averaging method

907a403

feat: added run call for averaging strategy

d542fa3

feat: defining __all__ in multitask strategies

db53652

refactor: adding multi task strategy interface in calculator

8c0ed40

refactor: adding multi task strategy application to calculate

17aa627

test: added unit tests for multi task aggregations

c982773

Signed-off-by: Lee, Kin Long Kelvin <kin.long.kelvin.lee@intel.com>

test: added tests to check force output shape

e5cdb8e

refactor: added temporary step to ensure force key consistency

951c124

refactor: making multitask output keys refer to task class names

3bcc446

test: updating test to make things work

b0d161e

Signed-off-by: Lee, Kin Long Kelvin <kin.long.kelvin.lee@intel.com>

fix: correcting graph key retrieval from batch

c7c25ff

This isn't actually originally part of the PR scope, but apparently this typo never triggered as an issue until now!

laserkelvin added the enhancement (New feature or request) and inference (Issues related to model inference and testing) labels May 24, 2024

laserkelvin requested a review from melo-gonzo May 24, 2024 15:28

laserkelvin added 2 commits May 24, 2024 08:45

fix: correcting graph get retrieval

2f8894c

melo-gonzo reviewed May 24, 2024


melo-gonzo approved these changes May 24, 2024


script: added pretrained example from multitask

22c5f36

Signed-off-by: Lee, Kin Long Kelvin <kin.long.kelvin.lee@intel.com>

laserkelvin merged commit 92f1600 into IntelLabs:main May 28, 2024
3 of 4 checks passed

laserkelvin deleted the multitask-ase-interface branch May 28, 2024 15:16
