# 1. Quick tour

This notebook walks through the simplest example of training a neural network potential with PiNN.

In [1]:
import os, warnings
import tensorflow as tf
from glob import glob
from ase.collections import g2
from pinn.io import load_qm9, sparse_batch
from pinn.models import potential_model
from pinn.calculator import PiNN_calc
# The CPU is used for documentation generation; feel free to use your GPU!
os.environ['CUDA_VISIBLE_DEVICES'] = ''
# We rely heavily on indexed slices for sparse summations, which makes
# TensorFlow emit a warning. We believe it is safe to ignore.
index_warning = 'Converting sparse IndexedSlices'
warnings.filterwarnings('ignore', index_warning)

## Getting the dataset

PiNN adapts TensorFlow's dataset API to handle different datasets.

For this and the following notebooks the QM9 dataset (https://doi.org/10.6084/m9.figshare.978904) is used.  
To follow the notebooks, download the dataset and change the directory accordingly.

The dataset is automatically split into subsets according to `split_ratio`.  
Note that the estimator expects each dataset to be passed as a function that returns a dataset object, not as the dataset object itself.
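
The two points above can be illustrated without TensorFlow. The following is a minimal sketch, not PiNN's implementation: `split_by_ratio` is a hypothetical helper that deals items into subsets in proportion to integer weights (PiNN's own splitting may shuffle or assign items differently), and `train_fn` mimics an input function, that is, a zero-argument callable that builds a fresh iterator each time it is called. The estimator needs a callable because it constructs the input pipeline itself, inside its own graph.

```python
from itertools import cycle

def split_by_ratio(items, split_ratio):
    """Deal items into named subsets in proportion to integer weights.

    Hypothetical helper for illustration; not part of PiNN.
    """
    # {'train': 8, 'test': 2} -> ['train'] * 8 + ['test'] * 2
    pattern = [name for name, weight in split_ratio.items() for _ in range(weight)]
    subsets = {name: [] for name in split_ratio}
    for item, name in zip(items, cycle(pattern)):
        subsets[name].append(item)
    return subsets

# Made-up file names standing in for the QM9 .xyz files
files = [f"mol_{i}.xyz" for i in range(10)]
subsets = split_by_ratio(files, {'train': 8, 'test': 2})

# An "input function": calling it builds a new iterator every time,
# whereas a bare iterator object would be exhausted after one pass.
train_fn = lambda: iter(subsets['train'])
first_pass = list(train_fn())
second_pass = list(train_fn())
```

Both passes yield the same eight items, which is the property the estimator relies on when it calls the input function for training and again for evaluation.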

In [2]:
filelist = glob('/home/yunqi/datasets/QM9/dsgdb9nsd/*.xyz')
dataset = lambda: load_qm9(filelist, split_ratio={'train':8, 'test':2})
train = lambda: dataset()['train'].repeat().shuffle(1000).apply(sparse_batch(100))
test = lambda: dataset()['test'].repeat().apply(sparse_batch(100))

## Defining the model
In PiNN, models are defined at two levels: models and networks. 
- A model (model_fn) defines the target, loss and training details of the network
- A network defines the structure of the neural network  

In this example, we will use the potential model, and the PiNN network.

The configuration of a model is stored in a nested, JSON-like structure:

```Python
{'model_dir': 'training_models/PiNN_QM9',
 'network': 'pinn_network',
 'netparam': {'depth': 4,
              'rc':4.0,
              'atom_types':[1,6,7,8,9]},
 'train':{
     'learning_rate': 3e-4,
     'regularization': 'clip',
     ...}}
```
A detailed description of the configuration **is NOT written yet**.
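
Because the configuration is a plain nested dictionary, it can be combined and serialized with standard Python tools. The sketch below assumes a hypothetical `merge_config` helper that overlays user settings on a set of defaults; the default values are simply copied from the example above and are not PiNN's actual defaults, and how PiNN fills in unspecified entries internally is not shown in this notebook.

```python
import copy
import json

def merge_config(defaults, overrides):
    """Recursively overlay `overrides` on a copy of `defaults`.

    Hypothetical helper for illustration; not part of PiNN.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections such as 'netparam' or 'train'
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

# Illustrative values taken from the example above (assumed, not PiNN's defaults)
defaults = {'network': 'pinn_network',
            'netparam': {'depth': 4, 'rc': 4.0, 'atom_types': [1, 6, 7, 8, 9]},
            'train': {'learning_rate': 3e-4, 'regularization': 'clip'}}
overrides = {'model_dir': '/tmp/PiNN_QM9',
             'netparam': {},
             'train': {'learning_rate': 1e-3}}

merged_params = merge_config(defaults, overrides)

# The nested structure survives a round trip through JSON unchanged
restored = json.loads(json.dumps(merged_params))
```

Only the keys given in `overrides` change; an empty section such as `'netparam': {}` leaves the corresponding defaults untouched.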

The easiest way to specify the network function is by name.  
You can also provide a custom function here.

In [3]:
params = {'model_dir': '/tmp/PiNN_QM9',
          'network': 'pinn_network',
          'netparam': {},
          'train': {'learning_rate': 1e-3}}
model = potential_model(params)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/PiNN_QM9', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f13c38cb9e8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


## Configuring the training process
The model defined above is a `tf.estimator.Estimator` object, so training can be controlled through the standard Estimator API, here with a `TrainSpec` and an `EvalSpec`.

In [4]:
train_spec = tf.estimator.TrainSpec(input_fn=train, max_steps=1000)
eval_spec = tf.estimator.EvalSpec(input_fn=test, steps=100)

## Train and evaluate

In [5]:
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)

INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Skipping training since max_steps has already saved.


(None, None)
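
The log line "Skipping training since max_steps has already saved" appears because `model_dir` already contained a checkpoint at step 1000 from an earlier run, and an Estimator resumes from the latest checkpoint it finds there. To train from scratch, the directory can be cleared first. The following is a minimal sketch using only the standard library; `reset_model_dir` is a hypothetical helper, and the demonstration runs on a throwaway directory rather than on `/tmp/PiNN_QM9`.

```python
import os
import shutil
import tempfile

def reset_model_dir(model_dir):
    """Delete model_dir (if it exists) so the next run starts from step 0.

    Hypothetical helper for illustration; not part of PiNN.
    """
    shutil.rmtree(model_dir, ignore_errors=True)

# Demonstration on a temporary directory with a stand-in checkpoint file
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, 'model.ckpt-1000.index'), 'w').close()

reset_model_dir(demo_dir)
dir_removed = not os.path.exists(demo_dir)
```

Alternatively, keep the directory and raise `max_steps` in the `TrainSpec`: since `max_steps` counts total steps, training then continues from the existing checkpoint.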

## Using the model

The trained model can be used as an ASE calculator.

In [6]:
calc = PiNN_calc(model)
atoms = g2['C2H4']
atoms.set_calculator(calc)
atoms.get_forces(), atoms.get_potential_energy()

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
    tf.py_function, which takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    
INFO:tensorflow:Calling model_fn.
Instructions for updating:
Use keras.layers.dense instead.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /tmp/PiNN_QM9/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


(array([[-0.0000000e+00, -2.5331974e-07, -2.7395887e+00],
        [-0.0000000e+00,  4.1723251e-07,  2.7395873e+00],
        [-0.0000000e+00,  1.4450596e+01,  1.1502363e+01],
        [-0.0000000e+00, -1.4450594e+01,  1.1502361e+01],
        [-0.0000000e+00,  1.4450591e+01, -1.1502361e+01],
        [-0.0000000e+00, -1.4450592e+01, -1.1502361e+01]], dtype=float32),
 -98.2684326171875)
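
A quick sanity check on these predictions: for an isolated molecule, a potential that depends only on relative atomic positions should give forces that sum to approximately zero. The sketch below applies this check to the force array printed above, using plain Python; the variable names are chosen here for illustration.

```python
# Forces copied from the output above (one row per atom: x, y, z)
forces = [
    [-0.0000000e+00, -2.5331974e-07, -2.7395887e+00],
    [-0.0000000e+00,  4.1723251e-07,  2.7395873e+00],
    [-0.0000000e+00,  1.4450596e+01,  1.1502363e+01],
    [-0.0000000e+00, -1.4450594e+01,  1.1502361e+01],
    [-0.0000000e+00,  1.4450591e+01, -1.1502361e+01],
    [-0.0000000e+00, -1.4450592e+01, -1.1502361e+01],
]

# Net force on the molecule, component by component
net_force = [sum(row[axis] for row in forces) for axis in range(3)]

# Compare the residual with the largest single force component
largest_component = max(abs(value) for row in forces for value in row)
relative_residual = max(abs(value) for value in net_force) / largest_component

print('net force:', net_force)
print('relative residual:', relative_residual)
```

The residual is tiny compared with the individual force components, which is consistent with float32 round-off. The check says nothing about accuracy, only that the predicted forces are internally consistent.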

## Conclusion

You have trained your first PiNN model, though the accuracy is not yet satisfactory
(RMSE = 21 Hartree!). Training is also slow because it is limited by the I/O and
pre-processing of the data.  

We will show in the following notebooks that:
- Proper scaling of the energy improves the accuracy of the model
- The training speed can be improved by caching and pre-processing the data