# Introduction to CP2K
<hr>
<h2>Prepare <code>input.data</code> for N2P2</h2>

<blockquote>
Date: 08-04-2024<br>
Written by:<br>
    &nbsp; Lei Lei: <a href="mailto:lei.lei2@nottingham.ac.uk">Lei.Lei2@nottingham.ac.uk</a><br>
    &nbsp; Sanliang Ling: <a href="mailto:sanliang.ling@nottingham.ac.uk">Sanliang.Ling@nottingham.ac.uk</a>
</blockquote>

## Install [AML](https://github.com/MarsalekGroup/aml) package
<span style="color: #b2182b; font-size:150%; font-weight: bold;">Note:</span> We made minor changes to `aml`:
- modified the `aml.io.cp2k` module so that frames can handle changing cell parameters
- added a `random_split` method to the `Structures` class in the `structures` module

In [4]:
!cd aml #change to your actual aml directory

In [None]:
!source env.sh

If you'd like to install the original `AML`:

```shell
git clone https://github.com/MarsalekGroup/aml.git
cd aml
source env.sh
```

## Import modules

In [4]:
import os
import numpy as np
import pandas as pd
from aml.io import cp2k
from aml.structures import Structures

Define a function to extract the cell parameters from the CP2K `PROJECT-1.cell` file:

In [5]:
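def get_cell(folder, project="sys"):
    """Return the cell vectors of every frame as an array of shape (n_frames, 3, 3)."""
    # Columns in the .cell file are separated by two or more spaces.
    cell_df = pd.read_csv(f"{folder}/{project}-1.cell", sep="[ ]{2,}", engine="python")
    # Keep only the cell-vector columns, i.e. those whose name ends in "[Angstrom]".
    names = [name for name in cell_df.columns if name.endswith("[Angstrom]")]
    cells = cell_df[names].values.reshape((len(cell_df), 3, 3))
    return cells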
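
To make the layout of the returned array concrete, here is a minimal sketch of the same reshaping on an in-memory sample. It uses `np.loadtxt` rather than the pandas call above, and it assumes the usual CP2K `.cell` column order (step, time, the nine cell-vector components `Ax` to `Cz`, then the volume). The sample values and the helper name `cells_from_text` are made up for illustration.

```python
import io
import numpy as np

# Made-up two-frame sample in the assumed CP2K .cell layout:
# step, time [fs], Ax Ay Az Bx By Bz Cx Cy Cz [Angstrom], volume [Angstrom^3]
SAMPLE_CELL = """\
#   Step   Time [fs]   Ax [Angstrom]   Ay [Angstrom]   Az [Angstrom]   Bx [Angstrom]   By [Angstrom]   Bz [Angstrom]   Cx [Angstrom]   Cy [Angstrom]   Cz [Angstrom]   Volume [Angstrom^3]
     0     0.000   10.0  0.0  0.0   0.0  10.0  0.0   0.0  0.0  10.0   1000.0
     1     0.500   10.1  0.0  0.0   0.0  10.1  0.0   0.0  0.0  10.1   1030.301
"""


def cells_from_text(text):
    """Hypothetical helper: parse .cell-style text into shape (n_frames, 3, 3)."""
    # np.loadtxt skips the '#' header line; ndmin=2 keeps a single frame 2-D.
    data = np.loadtxt(io.StringIO(text), ndmin=2)
    # Columns 2..10 hold the nine cell-vector components of each frame.
    return data[:, 2:11].reshape(-1, 3, 3)


cells = cells_from_text(SAMPLE_CELL)
print(cells.shape)  # one 3x3 matrix per frame
print(cells[1])     # rows are the A, B and C cell vectors of the second frame
```

Each 3x3 block stores one cell vector per row, which is the layout `get_cell` produces by reshaping the nine `[Angstrom]` columns.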

Define a function to convert CP2K trajectory data into the RuNNer/N2P2 format:

In [18]:
def write_input(folder, project = "sys",
                random_split = False,
                test_size = None
               ):
    fn_positions =  f'{folder}/{project}-pos-1.xyz'
    fn_forces =  f'{folder}/{project}-frc-1.xyz'
    cells = get_cell(folder, project = project)
    
    frames = cp2k.read_frames_cp2k(
        fn_positions = fn_positions,
        fn_forces = fn_forces,
        cells = cells
    )
    
    structures = Structures.from_frames(frames)
    
    if random_split:
        assert test_size is None or isinstance(test_size, list), \
            "Please pass test_size as a list, e.g. [0.9, 0.1]."
        if test_size is None:
            print("Using default test size!")
            test_size = [0.9, 0.1]

        # NOTE: test_size is not forwarded here, so random_split() uses its own defaults.
        train_structs, test_structs = structures.random_split()
        
        train_structs.to_file(f'{folder}/input.data', label_prop='reference')
        test_structs.to_file(f'{folder}/hold_test.data', label_prop='reference')
        
        print(
            f"Done! CP2K reference data converted to N2P2 input file: 'input.data'\n \
            {test_size[-1] * 100} % held as test data: 'hold_test.data'"
        )
    
    else:
        structures.to_file(f'{folder}/input.data', label_prop='reference')
        print("Done! CP2K reference data converted to N2P2 input file: 'input.data'")
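
The modified `random_split` method itself is not shown in this notebook. As a rough illustration of what a fractional random split does, the sketch below shuffles frame indices and cuts them according to `[train, test]` fractions. The function `split_indices` and its `seed` parameter are hypothetical and are not part of `aml`.

```python
import random


def split_indices(n_frames, fractions=(0.9, 0.1), seed=0):
    """Hypothetical helper: shuffle frame indices and split them by fraction."""
    if abs(sum(fractions) - 1.0) > 1e-8:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_frames))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    n_train = round(fractions[0] * n_frames)  # size of the training part
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, test_idx = split_indices(100, fractions=(0.9, 0.1))
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the training and hold-out sets identical between runs, which makes later comparisons of trained potentials easier to reproduce.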

## Convert files using the functions defined above

In [19]:
write_input("Mg_ML")

Done! CP2K reference data converted to N2P2 input file: 'input.data'


In [14]:
write_input("Mg_ML", random_split = True)

Using default test size!
Done! CP2K reference data converted to N2P2 input file: 'input.data'
             10.0 % held as test data: 'hold_test.data'
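
A quick sanity check after conversion is to count the structures and atoms in the generated file. The sketch below assumes the RuNNer-style layout read by N2P2, in which each structure is wrapped in `begin` and `end` lines and each atom occupies one `atom` line. `summarize_input_data` is a hypothetical helper, and the sample text is made up.

```python
def summarize_input_data(lines):
    """Hypothetical helper: count structures and atoms in input.data-style text."""
    n_structures = 0
    atoms_per_structure = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "begin":
            # A new structure block starts here.
            n_structures += 1
            atoms_per_structure.append(0)
        elif fields[0] == "atom" and atoms_per_structure:
            # Count this atom towards the structure currently being read.
            atoms_per_structure[-1] += 1
    return n_structures, atoms_per_structure


# Made-up single-structure sample in the assumed layout.
sample = """\
begin
comment made-up example frame
lattice 10.0 0.0 0.0
lattice 0.0 10.0 0.0
lattice 0.0 0.0 10.0
atom 0.0 0.0 0.0 Mg 0.0 0.0 0.01 0.00 -0.01
atom 2.5 2.5 2.5 Mg 0.0 0.0 -0.01 0.00 0.01
energy -1.234
charge 0.0
end
"""
print(summarize_input_data(sample.splitlines()))
```

For a real file, an open file handle can be passed instead of a list of lines, for example `with open("Mg_ML/input.data") as fh: summarize_input_data(fh)`.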


## Appendix

### N2P2 docs
https://compphysvienna.github.io/n2p2/index.html

### N2P2 training procedure
https://compphysvienna.github.io/n2p2/topics/training.html