# Quick tour with QM9

This notebook showcases a simple example of training a neural network potential on the QM9 dataset with PiNN.

In [1]:
from pinn.io import load_ase
from ase import Atoms
import tensorflow as tf
traj = [Atoms('Cu') for i in range(10)]
tf.data.DatasetSpec.from_value(load_ase(traj))._serialize()[0]

{'elems': TensorSpec(shape=(None,), dtype=tf.int32, name=None),
 'coord': TensorSpec(shape=(None, 3), dtype=tf.float32, name=None)}

In [5]:
import os, warnings
import tensorflow as tf
from glob import glob
from ase.collections import g2
from pinn.io import load_qm9, sparse_batch
from pinn import get_model, get_calc
# CPU is used for documentation generation, feel free to use your GPU!
os.environ['CUDA_VISIBLE_DEVICES'] = '' 
# PiNN relies heavily on indexed slices for sparse summations, which makes
# TensorFlow emit a warning. We believe this warning is safe to ignore.
index_warning = 'Converting sparse IndexedSlices'
warnings.filterwarnings('ignore', index_warning)

## Getting the dataset

PiNN adapts TensorFlow's dataset API to handle different datasets.

For this and the following notebooks the QM9 dataset (https://doi.org/10.6084/m9.figshare.978904) is used.  
To follow the notebooks, download the dataset and change the directory accordingly.

The dataset is automatically split into subsets according to the weights passed as `splits` (here 8:2 for training and test).  
Note that the estimator expects each input to be a function that returns a dataset, not a dataset object, which is why the datasets below are wrapped in `lambda`.
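
As a rough illustration of these two points, the sketch below splits a list of file names by integer weights and wraps each subset in a zero-argument function. It is plain Python rather than PiNN code: `split_by_weights` and `make_input_fn` are hypothetical helpers, the file names are made up, and the way `load_qm9` actually assigns structures to subsets may differ.

```python
import random

def split_by_weights(items, weights, seed=0):
    """Partition items into named subsets in proportion to integer weights."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded, so every call gives the same split
    total = sum(weights.values())
    subsets, start, cumulative = {}, 0, 0
    for name, weight in weights.items():
        cumulative += weight
        end = len(items) * cumulative // total
        subsets[name] = items[start:end]
        start = end
    return subsets

def make_input_fn(items, weights, subset):
    """Return a zero-argument callable that rebuilds the subset on each call."""
    return lambda: split_by_weights(items, weights)[subset]

# Made-up file names standing in for the QM9 .xyz files
files = [f"mol_{i:03d}.xyz" for i in range(10)]
weights = {'train': 8, 'test': 2}
train_fn = make_input_fn(files, weights, 'train')
test_fn = make_input_fn(files, weights, 'test')
print(len(train_fn()), len(test_fn()))
```

Passing a function rather than a ready-made object lets the consumer rebuild the input whenever it needs to, for example at the start of each evaluation.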

In [9]:
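# Adjust this path to wherever the QM9 .xyz files were extracted.
filelist = glob('/home/yunqi/datasets/QM9/dsgdb9nsd/*.xyz')
# Split the structures 8:2 into training and test subsets.
dataset = lambda: load_qm9(filelist, splits={'train':8, 'test':2})
# Repeat indefinitely, shuffle with a buffer of 1000, and batch 100 structures at a time.
train = lambda: dataset()['train'].repeat().shuffle(1000).apply(sparse_batch(100))
test = lambda: dataset()['test'].repeat().apply(sparse_batch(100))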
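
Molecules contain different numbers of atoms, so they cannot simply be stacked into a dense batch. The sketch below shows the general idea behind a sparse batch in plain Python: the atoms of all molecules are concatenated into flat lists, and an index list records which molecule each atom belongs to, so per-atom values can later be summed per molecule. The function names and the `ind` key are illustrative assumptions, not PiNN's actual data layout.

```python
def sparse_batch_sketch(molecules):
    """Concatenate variable-sized molecules and record an atom-to-molecule index."""
    elems, coord, ind = [], [], []
    for mol_idx, mol in enumerate(molecules):
        elems.extend(mol['elems'])
        coord.extend(mol['coord'])
        ind.extend([mol_idx] * len(mol['elems']))
    return {'elems': elems, 'coord': coord, 'ind': ind}

def segment_sum(values, ind, n_segments):
    """Sum per-atom values into one total per molecule (a sparse summation)."""
    totals = [0.0] * n_segments
    for value, i in zip(values, ind):
        totals[i] += value
    return totals

# Two toy molecules with rough, illustrative coordinates
water = {'elems': [8, 1, 1],
         'coord': [[0.0, 0.0, 0.0], [0.0, 0.76, 0.59], [0.0, -0.76, 0.59]]}
h2 = {'elems': [1, 1],
      'coord': [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]]}

batch = sparse_batch_sketch([water, h2])
# Toy per-atom contribution: the atomic number itself
per_atom = [float(z) for z in batch['elems']]
per_molecule = segment_sum(per_atom, batch['ind'], n_segments=2)
print(batch['ind'])    # [0, 0, 0, 1, 1]
print(per_molecule)    # [10.0, 2.0]
```

Summations of this kind are what give rise to the `IndexedSlices` warning silenced at the top of the notebook.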

## Defining the model
In PiNN, models are defined at two levels: models and networks. 

- A model (model_fn) defines the target, loss and training details of the network.
- A network defines the structure of the neural network.

In this example, we will use the potential model, and the PiNet network.

The configuration of a model is stored in a nested dictionary like: 

```Python
{'model_dir': 'PiNet_QM9',
 'network': {
     'name': 'PiNet',
     'params': {
         'depth': 4,
         'rc':4.0,
         'atom_types':[1,6,7,8,9]}},
 'model':{
     'name': 'potential_model',
     'params':{
         'learning_rate': 3e-4,
         ...
         }
     }
}
```

Available options of the network and model can be found in the documentation.

The easiest way to specify the network function is with its name.  
You can also provide a custom function here.
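
To make the name-or-function idea concrete, here is a minimal sketch of how a configuration entry that is either a registered name or a callable could be resolved. Everything in it is hypothetical: `NETWORKS`, `toy_network`, and `resolve_network` are not part of PiNN, and placing the callable under `'name'` is an assumption made for illustration only.

```python
def toy_network(tensors, depth=1):
    """Stand-in for a network function; returns a label instead of a model."""
    return f"toy(depth={depth})"

# Hypothetical registry mapping names to network functions
NETWORKS = {'toy': toy_network}

def resolve_network(spec):
    """Accept either a registered name or a callable, and bind its parameters."""
    name_or_fn = spec['name']
    fn = name_or_fn if callable(name_or_fn) else NETWORKS[name_or_fn]
    kwargs = spec.get('params', {})
    return lambda tensors: fn(tensors, **kwargs)

def custom_network(tensors, scale=1.0):
    """A user-supplied function with its own parameters."""
    return f"custom(scale={scale})"

by_name = resolve_network({'name': 'toy', 'params': {'depth': 4}})
by_function = resolve_network({'name': custom_network, 'params': {'scale': 2.0}})
print(by_name(None))       # toy(depth=4)
print(by_function(None))   # custom(scale=2.0)
```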

In [4]:
params = {'model_dir': '/tmp/PiNet_QM9',
          'network': {
              'name': 'PiNet',
              'params': {
                  'depth': 4,
                  'rc':4.0,
                  'atom_types':[1,6,7,8,9]
              },
          },
          'model': {
              'name': 'potential_model',
              'params': {
                  'learning_rate': 1e-3
              }
          }
}
model = get_model(params)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/PiNet_QM9', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


## Configuring the training process
The model defined above is a `tf.estimator.Estimator` object, so training can be controlled with the standard Estimator API.

In [4]:
train_spec = tf.estimator.TrainSpec(input_fn=train, max_steps=1000)
eval_spec = tf.estimator.EvalSpec(input_fn=test, steps=100)

## Train and evaluate

In [5]:
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)

## Using the model

The trained model can be used as an ASE calculator.

In [10]:
get_model(params).predict(train)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/PiNet_QM9', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


<generator object Estimator.predict at 0x7fa4eb432ac0>

In [2]:
from ase.collections import g2
from pinn import get_calc
params = {'model_dir': '/tmp/PiNet_QM92',
          'network': {
              'name': 'PiNet',
              'params': {
                  'depth': 4,
                  'rc':4.0,
                  'atom_types':[1,6,7,8,9]
              },
          },
          'model': {
              'name': 'potential_model',
              'params': {
                  'learning_rate': 1e-3
              }
          }
}

calc = get_calc(params)
calc.properties = ['energy']
atoms = g2['C2H4']
atoms.set_calculator(calc)
atoms.get_forces(), atoms.get_potential_energy()

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/PiNet_QM92', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tens

(array([[ 0.0000000e+00, -4.7683716e-07, -1.0249622e+01],
        [ 0.0000000e+00,  4.7683716e-07,  1.0249622e+01],
        [ 0.0000000e+00, -9.5919666e+00, -1.0566272e+01],
        [ 0.0000000e+00,  9.5919666e+00, -1.0566273e+01],
        [ 0.0000000e+00, -9.5919657e+00,  1.0566273e+01],
        [ 0.0000000e+00,  9.5919647e+00,  1.0566273e+01]], dtype=float32),
 -79.43273162841797)

## Conclusion

You have trained your first PiNN model, though its accuracy is far from satisfactory
(RMSE = 21 Hartree!). Training is also slow, because it is limited by I/O and
data pre-processing.

The following notebooks show that:

- Proper scaling of the energy will improve the accuracy of the model.
- The training speed can be enhanced by caching and pre-processing the data.
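
As a hint of what energy scaling can look like, the sketch below standardizes a list of energies to zero mean and unit spread and maps values back afterwards. It is a generic illustration in plain Python, not necessarily the scheme used in the following notebooks; the helper names are hypothetical and the energies are arbitrary toy numbers, not QM9 data.

```python
from statistics import mean, pstdev

def fit_scaler(energies):
    """Return the (shift, spread) that standardizes the given energies."""
    return mean(energies), pstdev(energies)

def to_scaled(energy, shift, spread):
    """Map a raw energy to the standardized scale the network would train on."""
    return (energy - shift) / spread

def from_scaled(value, shift, spread):
    """Map a standardized value back to the original energy units."""
    return value * spread + shift

# Arbitrary toy values, for illustration only
train_energies = [-40.0, -56.0, -76.0, -78.0, -115.0]
shift, spread = fit_scaler(train_energies)
scaled = [to_scaled(e, shift, spread) for e in train_energies]
restored = [from_scaled(s, shift, spread) for s in scaled]
```

With targets of order one, the network no longer has to learn a large constant offset before it can fit the differences between molecules.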