Loading ASE calculator from model trained with basic config template fails #211

Closed
alikhamze opened this issue Dec 4, 2023 · 5 comments

@alikhamze

alikhamze commented Dec 4, 2023

Hello,

I am trying to create an ASE calculator from a trained model like this:

from pathlib import Path
from apax.md.ase_calc import ASECalculator
calc = ASECalculator(Path("test_model-1/"))

Inside test_model-1 is the config file used to train the model, as well as the outputs (in models/apax):

n_epochs: 2000

data:
  directory: models
  experiment: apax
  train_data_path: "./training.traj"
  val_data_path: "validation.traj"

  n_train: 7616
  n_valid: 853
  batch_size: 32
  valid_batch_size: 100

metrics:
  - name: energy
    reductions: [mae]
  - name: forces
    reductions: [mae, mse]

loss:
  - name: energy
  - name: forces
    weight: 4.0

This config file is the one generated by apax template train with the training and validation data paths filled in and number of training and validation structures updated.

When I try to make the calculator (to use for testing the model), I get this error:

----> 1 ASECalculator(Path("test_model-1/"))

File ~/venv3.10-apax/lib/python3.10/site-packages/apax/md/ase_calc.py:117, in ASECalculator.__init__(self, model_dir, dr_threshold, transformations, padding_factor, **kwargs)
    114 self.transformations = transformations
    115 self.n_models = 1 if isinstance(model_dir, (Path, str)) else len(model_dir)
--> 117 self.model_config, self.params = restore_parameters(model_dir)
    118 self.padding_factor = padding_factor
    120 if self.model_config.model.calc_stress:

File ~/venv3.10-apax/lib/python3.10/site-packages/apax/train/checkpoints.py:102, in restore_parameters(model_dir)
     98 """Restores one or more model configs and parameters.
     99 Parameters are stacked for ensembling.
    100 """
    101 if isinstance(model_dir, Path) or isinstance(model_dir, str):
--> 102     config, params = restore_single_parameters(model_dir)
    104 elif isinstance(model_dir, list):
    105     param_list = []

File ~/venv3.10-apax/lib/python3.10/site-packages/apax/train/checkpoints.py:94, in restore_single_parameters(model_dir)
     92 model_config = parse_config(Path(model_dir) / "config.yaml")
     93 ckpt_dir = model_config.data.model_version_path()
---> 94 return model_config, load_params(ckpt_dir)

File ~/venv3.10-apax/lib/python3.10/site-packages/apax/train/checkpoints.py:84, in load_params(model_version_path, best)
     82 except FileNotFoundError:
     83     print(f"No checkpoint found at {model_version_path}")
---> 84 params = jax.tree_map(jnp.asarray, raw_restored["model"]["params"])
     86 return params

TypeError: 'NoneType' object is not subscriptable

I have tried adding in the default model section from the apax template train --full to my config, but I still got the same error.
I also tried setting the directory argument in the ASECalculator to test_model-1/models/apax with the same results.

Am I loading the calculator incorrectly?

Thanks,
-Ali

@M-R-Schaefer
Contributor

Hi Ali,

Sorry for the late response, we were all attending a conference these last few days.

The path you need to specify is the one to ".../directory/experiment".
We don't assume a particular input file name, and a directory may contain multiple experiments (different model trainings).
Hence, pointing the ASE calculator to the directory with the configuration file is not sufficient.

We could, however, provide a better error message.
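
For illustration, one possible shape for a clearer error: check the restored object before indexing into it, and report the resolved path together with the current working directory. This is a minimal standalone sketch, not apax's code; `restore_or_none`, `load_params_checked`, and the `checkpoint.txt` file name are hypothetical stand-ins.

```python
from pathlib import Path
from typing import Optional


def restore_or_none(ckpt_dir: Path) -> Optional[dict]:
    """Hypothetical stand-in for a restore call that returns None when nothing is found."""
    marker = ckpt_dir / "checkpoint.txt"  # assumed file name, for illustration only
    if not marker.is_file():
        return None
    return {"model": {"params": marker.read_text()}}


def load_params_checked(ckpt_dir) -> str:
    """Fail early with the resolved path instead of indexing into None."""
    ckpt_dir = Path(ckpt_dir)
    restored = restore_or_none(ckpt_dir)
    if restored is None:
        raise FileNotFoundError(
            f"No checkpoint found at {ckpt_dir.resolve()} "
            f"(path {ckpt_dir} was resolved from working directory {Path.cwd()})"
        )
    return restored["model"]["params"]
```

With a check like this, a wrong location would surface as a `FileNotFoundError` naming the directory that was actually searched, rather than as a `TypeError` further down.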

Best
Moritz

@alikhamze
Author

No worries, Moritz, I hope the conference went well.

I have tried that directory and get the same error, unfortunately...
From my config file, I should be pointing the ASECalculator class to models/apax (from the directory and experiment entries under the data heading of the YAML file), but doing so results in the same error as above. This directory has a config.yaml file generated by apax with more details than my initial input file.

Any idea what could be going wrong?

@M-R-Schaefer
Contributor

M-R-Schaefer commented Dec 13, 2023

Currently, model loading assumes that you are in the same directory the training was started from.
So, if you are in mydir and start a training run from there with

directory: models
experiment: exp

then you should be able to load a model into ASE via
ASECalculator("models/exp") from a Python file located in mydir.

I am not sure if you are attempting to load the model from a different relative path, but I suspect that might be the issue here.
It will be fixed by #212.
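
The working-directory dependence described above can be reproduced without apax: a relative path such as `models/exp` only exists when looked up from the directory the training was started in. The sketch below shows that, plus a possible stopgap of temporarily changing into the training directory. `working_directory` and `model_dir_visible` are hypothetical helpers, not part of apax.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily change the working directory and restore it afterwards."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def model_dir_visible(model_dir: str) -> bool:
    """Relative paths are resolved against the current working directory."""
    return Path(model_dir).is_dir()
```

Assuming the training was started inside `test_model-1`, constructing the calculator inside `with working_directory("test_model-1"):` as `ASECalculator("models/apax")` should match the setup described above; this has not been verified against apax.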

The setup described above is what I use for working with apax standalone.
We usually hand over path handling to our workflow manager, IPSuite, so it's certainly true that training at and loading from arbitrary locations was not well tested until now.

@alikhamze
Author

Ah, yes, I was trying to load it from one directory above the training directory. Other codes I use (and want to compare against) don't interface with IPSuite so I wasn't using it, hence this issue.

Thank you for your help!

@M-R-Schaefer
Contributor

No problem! Thanks for raising this issue.
