# Fine-tuning a classical interatomic potential on a ColabFit dataset

This notebook demonstrates fine-tuning Tersoff's 1988 potential for carbon using the Orchestrator functions for accessing the [KLIFF](https://kliff.readthedocs.io) fitting framework and the local instance of the [ColabFit](https://colabfit.org) database.

First, let's find a dataset to train on by initializing a `storage` object and seeing what's available.

---
**NOTE**

Throughout this example, we define a separate input dictionary for each module for clarity. In practice, you may combine them into a single dictionary for convenience.

---
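
As a minimal sketch of the combined form mentioned in the note, the per-module dictionaries can be merged because each one uses a distinct top-level key (`"storage"`, `"potential"`, `"trainer"`). The arguments below are abbreviated from the cells in this notebook, `combined_input_dict` is an illustrative name, and the sketch assumes each module reads only its own top-level key when the dictionary is passed to `init_and_validate_module_type`.

```python
# Per-module input dictionaries, abbreviated from the cells in this notebook
storage_input_dict = {
    "storage": {"storage_type": "COLABFIT", "storage_args": {}},
}
potential_input_dict = {
    "potential": {"potential_type": "KIM", "potential_args": {"species": ["C"]}},
}
trainer_input_dict = {
    "trainer": {"trainer_type": "KLIFF", "trainer_args": {"loss_method": "mse"}},
}

# Each dictionary has a distinct top-level key, so a plain merge loses nothing
combined_input_dict = {
    **storage_input_dict,
    **potential_input_dict,
    **trainer_input_dict,
}
print(sorted(combined_input_dict))
```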

In [None]:
from orchestrator.utils.setup_input import init_and_validate_module_type

storage_input_dict = {
    "storage":{
        "storage_type":"COLABFIT",
        "storage_args":{
            "credential_file":"/usr/gapps/iap/kim-storage/iap-storage/test_colabfit_credentials.json"
        }
    }
}

storage = init_and_validate_module_type("storage", storage_input_dict)

print(storage.list_data())

The first dataset, "ChIMES_C_2_0_Small_model", seems just fine for a demo. We will need to use its `colabfit-id`, `DS_rbzxtis1zggr_0`, later.

To initialize the potential, we will install Tersoff's 1988 carbon model from OpenKIM, if it is not already installed. You can find the name of the model and its [dedicated page](https://openkim.org/id/Tersoff_LAMMPS_Tersoff_1988_C__MO_579868029681_004) by going to [openkim.org](https://openkim.org), choosing "Browse->Models", and searching for `C` under "Narrow species selection".

In [None]:
!kim-api-collections-management install system Tersoff_LAMMPS_Tersoff_1988_C__MO_579868029681_004

Now, let's create an Orchestrator `Potential` object from the OpenKIM model we installed:

In [None]:
potential_input_dict = {
    "potential":{
        "potential_type":"KIM",
        "potential_args":{
            "kim_id":"Tersoff_LAMMPS_Tersoff_1988_C__MO_579868029681_004",
            "kim_api":"kim-api-collections-management",
            "model_driver": "Tersoff_LAMMPS__MD_077075034781_005",
            "species":["C"]
        }
    },
}
potential = init_and_validate_module_type("potential", potential_input_dict)


Next, we should examine which parameters the model exposes. There are several ways to do this; here we will use the KLIFF `model` object internal to the `Potential` object to inspect the parameters:

In [None]:
potential.model.echo_model_params()

To understand the meaning of these parameters, we should read the README or Wiki entry (if available) of the OpenKIM Model Driver powering the model. At the [bottom of the model's page](https://openkim.org/id/Tersoff_LAMMPS_Tersoff_1988_C__MO_579868029681_004#files), you can see that the model depends on the Model Driver [Tersoff_LAMMPS__MD_077075034781_005](https://openkim.org/id/Tersoff_LAMMPS__MD_077075034781_005). At the bottom of the Model Driver's page, there is a [Wiki entry](https://openkim.org/id/Tersoff_LAMMPS__MD_077075034781_005#wiki) explaining the parameters.

If we tried to train all of the parameters, we would find that *m* cannot be optimized (it is an integer) and that *beta* causes an error if it becomes negative (this can be handled, but we omit *beta* here for simplicity). We also probably don't want to train the cutoff parameters, *Rc* and *Dc*.
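
The selection described above can be sketched as a simple filter. The list of names is assembled by hand from the parameters mentioned in this notebook and should be checked against the output of `echo_model_params()`; `all_params` and `excluded` are illustrative names, not part of the Orchestrator or KLIFF APIs.

```python
# Parameter names mentioned in this notebook (hard-coded here as an assumption;
# verify against the echo_model_params() output for your model)
all_params = ["A", "B", "lambda1", "lambda2", "beta", "n", "m",
              "lambda3", "gamma", "c", "d", "h", "Rc", "Dc"]

# Parameters we choose not to train, with the reason for each
excluded = {
    "m": "integer, so not continuously optimizable",
    "beta": "causes an error if it becomes negative",
    "Rc": "cutoff parameter",
    "Dc": "cutoff parameter",
}

# Keep the original ordering and drop the excluded names
params_to_update = [name for name in all_params if name not in excluded]
print(params_to_update)
```

The resulting list is the one passed as `params_to_update` to the trainer.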

Before we do any training, let's take a look at a prediction of our potential on a configuration from the training dataset.

In [None]:
import numpy as np
dataset_handle = "DS_rbzxtis1zggr_0"
example_config = storage.get_data(dataset_handle)[0]
ref_forces = example_config.get_array("atomic_forces_forces")
_,forces,_ = potential.evaluate(example_config)
print(f'Pre-training force error on first config: {np.linalg.norm(forces-ref_forces)}')
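
The error printed above is the Frobenius norm of the difference between the predicted and reference force arrays, so it grows with the number of atoms. The sketch below uses small synthetic arrays (not data from the dataset) to show how that number relates to a per-component RMSE and to per-atom error magnitudes; all variable names are illustrative.

```python
import numpy as np

# Synthetic (n_atoms, 3) force arrays, for illustration only
ref_forces_demo = np.zeros((4, 3))
pred_forces_demo = np.full((4, 3), 0.1)

diff = pred_forces_demo - ref_forces_demo

# Frobenius norm over all components: the quantity printed in the cell above
frobenius = np.linalg.norm(diff)

# Root-mean-square error per force component: independent of system size
rmse = np.sqrt(np.mean(diff**2))

# Euclidean norm of the force error on each atom
per_atom = np.linalg.norm(diff, axis=1)

print(f"Frobenius norm: {frobenius:.4f}")
print(f"Per-component RMSE: {rmse:.4f}")
print(f"Largest per-atom error: {per_atom.max():.4f}")
```

The two aggregate measures are related by `frobenius == rmse * sqrt(diff.size)`, which makes the RMSE the more convenient choice when comparing configurations of different sizes.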

Now, set up the inputs to the `Trainer` and train the model.

In [None]:
trainer_input_dict = {
    "trainer":{
        "trainer_type":"KLIFF",
        "trainer_args": {
            "params_to_update": ['A', 'B', 'lambda1', 'lambda2', 'n', 'lambda3', 'gamma', 'c', 'd', 'h'], 
            "loss_method": "mse",
            "max_evals": 1,
            "optimization_method": "L-BFGS-B",
            "model_name":potential.kim_id,
            #"model_driver":potential.model_driver
        }
    }
}

trainer = init_and_validate_module_type("trainer", trainer_input_dict)

_, training_loss = trainer.train(
    "training_example",
    potential,
    storage,
    dataset_handle,
    eweight=0.1, 
    fweight=1.0,
    upload_to_kimkit=False,
)
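
The `eweight` and `fweight` arguments control the relative importance of energy and force errors during the fit. The following is a minimal sketch of how such weights commonly enter a least-squares loss, assuming each residual is scaled by its weight before squaring. The function `weighted_loss` is hypothetical and is not the Orchestrator or KLIFF implementation, whose normalization may differ.

```python
import numpy as np

def weighted_loss(e_pred, e_ref, f_pred, f_ref, eweight=0.1, fweight=1.0):
    """Hypothetical weighted sum-of-squares loss for a single configuration."""
    # Scale the scalar energy residual by the energy weight
    e_res = eweight * (e_pred - e_ref)
    # Scale every force-component residual by the force weight
    f_res = fweight * (np.asarray(f_pred) - np.asarray(f_ref)).ravel()
    residual = np.concatenate(([e_res], f_res))
    return 0.5 * np.sum(residual**2)

# Synthetic values: an energy error of 1 and a unit error on each of 6 force components
loss = weighted_loss(
    e_pred=1.0, e_ref=0.0,
    f_pred=np.zeros((2, 3)), f_ref=np.ones((2, 3)),
)
print(loss)
```

Under this assumption, with `eweight=0.1` a unit energy error contributes 100 times less to the loss than a unit error in a single force component, so the fit is dominated by the forces.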

Now we evaluate the model again. We first rename the model to avoid name collisions; renaming reinitializes its calculator, which automatically installs the model into the KIM API. The output shows that the force error on the same configuration is indeed lower than before.

In [None]:
def reset_and_rename(potential, prefix):
    # Drop the existing calculator, if there is one, then assign a new KIM ID
    try:
        del potential.model_calculator
    except AttributeError:
        pass
    potential.generate_new_kim_id(id_prefix=prefix, kim_item_type='portable-model')

prefix = "Tersoff_LAMMPS_Orchestrator_2025"
reset_and_rename(potential, prefix)

try:
    _, forces, _ = potential.evaluate(example_config)
except FileExistsError:
    # A FileExistsError signals a name collision: rename once more and retry
    reset_and_rename(potential, prefix)
    _, forces, _ = potential.evaluate(example_config)

print(f'Post-training force error on first config: {np.linalg.norm(forces-ref_forces)}')

To see how the parameters have changed, let's call `potential.model.echo_model_params()` once more and compare the result with the earlier output.

In [None]:
potential.model.echo_model_params()
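
Comparing two printed parameter listings by eye is error-prone, so it can help to tabulate relative changes. The sketch below assumes the before and after values have been copied into plain dictionaries; how to extract them programmatically from the KLIFF model is not shown here. The helper `relative_changes` is hypothetical, and the numbers are placeholders, not actual Tersoff parameters.

```python
def relative_changes(before, after):
    """Return {name: (after - before) / |before|} for parameters present in both."""
    changes = {}
    for name, old in before.items():
        # Skip parameters missing from `after` and avoid division by zero
        if name in after and old != 0:
            changes[name] = (after[name] - old) / abs(old)
    return changes

# Placeholder numbers for illustration only; NOT actual Tersoff parameters
params_before = {"A": 100.0, "B": 50.0, "h": -0.5}
params_after = {"A": 110.0, "B": 50.0, "h": -0.4}

for name, change in relative_changes(params_before, params_after).items():
    print(f"{name}: {change:+.1%}")
```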

The two cells below should remove everything this notebook created from your working directory, the KIM API, and KIMKit.

In [None]:
import kimkit

repository_list = kimkit.models.enumerate_repository()
for item in repository_list:
    if prefix in item:
        try:
            kimkit.models.delete(item)
        except kimkit.src.config.NotRunAsEditorError:
            pass

In [None]:
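import os

# Remove files created in the working directory (note: this deletes every *.log file here)
!rm -r Tersoff_LAMMPS_Orchestrator_2025* Tersoff_LAMMPS_Tersoff_1988_C__MO_579868029681_004 *.log
# Remove the renamed model from the KIM API
os.system(f"kim-api-collections-management remove --force {potential.kim_id}")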