In [1]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [2]:
import pandas as pd

In [5]:
from pyace.preparedata import (
    StructuresDatasetSpecification,  # used below to build the dataset spec
    WEIGHTS_ENERGY_COLUMN,
    WEIGHTS_FORCES_COLUMN,
    normalize_energy_forces_weights,
)

# 1. Loading reference dataframe

## 1.1 Load from cache/dataset

In [3]:
config = {
    "calculator": StructuresDatasetSpecification.FHI_AIMS_PBE_TIGHT,  # = 'FHI-aims/PBE/tight'
    "element": "Al",
    "seed": 42,  # for shuffling the data
}

Create a `StructuresDatasetSpecification` object and set `datapath` to the location of the cached pickle files

In [4]:
spec = StructuresDatasetSpecification(
    config=config,
    datapath="../data",  # where to look for and store cached pickle files
#     parallel=3,  # uncomment to process the data in parallel
)

There are two dataframes:
 * the `reference` dataframe, which holds the original data: energies, forces, corrected energies, ASE atoms and atomic environments (neighbour lists)

In [5]:
df = spec.get_ref_dataframe() 

2020-07-06 12:17:01,778 - pyace.preparedata - INFO - Search for cache ref-file: ../data/df-FHI-aims_PBE_tight-Al-ref.pckl.gzip
2020-07-06 12:17:01,779 - pyace.preparedata - INFO - ../data/df-FHI-aims_PBE_tight-Al-ref.pckl.gzip found, try to load
2020-07-06 12:17:01,779 - pyace.preparedata - INFO - Loading dataframe from pickle file: ../data/df-FHI-aims_PBE_tight-Al-ref.pckl.gzip
2020-07-06 12:17:29,062 - pyace.preparedata - INFO - Setting up structures dataframe - please be patient...
2020-07-06 12:17:29,072 - pyace.preparedata - INFO - Processing structures dataframe. Shape: (95969, 18)
2020-07-06 12:17:29,076 - pyace.preparedata - INFO - Total number of atoms: 798245
2020-07-06 12:17:29,077 - pyace.preparedata - INFO - Mean number of atoms per structure: 8.31773801956882
2020-07-06 12:17:29,078 - pyace.preparedata - INFO - 'energy' columns found
2020-07-06 12:17:29,080 - pyace.preparedata - INFO - 'forces' columns found
2020-07-06 12:17:29,082 - pyace.preparedata - INFO - 'pbc' colum

## 1.2 Load custom pickled dataframe

In [6]:
df=pd.read_pickle("df-FHI-aims_PBE_tight-Al-ref.pckl.gzip", compression="gzip")

In [7]:
df.shape

(6, 18)

# 2. Generate custom energy/forces weights

Example: energy weights are inversely proportional to the number of atoms

In [12]:
df[WEIGHTS_ENERGY_COLUMN] = 1./df["NUMBER_OF_ATOMS"]

Example: force weights are inversely proportional to the number of atoms

In [11]:
def generate_force_weights(row):
    # One weight per atom: an array of length n with every entry equal to 1/n
    n = row["NUMBER_OF_ATOMS"]
    return np.ones(n) / n

In [14]:
df[WEIGHTS_FORCES_COLUMN] =  df.apply(generate_force_weights, axis=1)
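To make the shapes of the two weight columns concrete, here is a minimal sketch on a toy dataframe. The column names `w_energy_demo` and `w_forces_demo` are hypothetical stand-ins for this example only; in the notebook the real names come from `WEIGHTS_ENERGY_COLUMN` and `WEIGHTS_FORCES_COLUMN`.

```python
import numpy as np
import pandas as pd

# Hypothetical column names for this toy example only.
W_ENERGY = "w_energy_demo"
W_FORCES = "w_forces_demo"

# Three made-up structures with 1, 2 and 4 atoms.
toy = pd.DataFrame({"NUMBER_OF_ATOMS": [1, 2, 4]})

# Energy weight: one scalar per structure, equal to 1/N.
toy[W_ENERGY] = 1.0 / toy["NUMBER_OF_ATOMS"]

# Force weight: one array per structure, with one entry (1/N) per atom.
toy[W_FORCES] = toy["NUMBER_OF_ATOMS"].map(lambda n: np.ones(n) / n)

print(toy[W_ENERGY].tolist())
print([w.tolist() for w in toy[W_FORCES]])
```

With this rule the energy column holds scalars, while each entry of the forces column is an array whose length matches the number of atoms in that structure.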

Normalize the weights

In [18]:
normalize_energy_forces_weights(df);
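The notebook does not spell out what the normalization does. The sketch below shows one plausible convention, assumed here for illustration and not taken from the pyace source: scale the energy weights so they sum to 1 over all structures, and scale the force weights so they sum to 1 over all atoms. The function `normalize_weights_sketch` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def normalize_weights_sketch(df, w_energy, w_forces):
    """Return a copy whose energy and force weights each sum to 1.

    Assumed convention for illustration only; not the pyace implementation.
    """
    out = df.copy()
    # Scalar energy weights: divide by their total over all structures.
    out[w_energy] = out[w_energy] / out[w_energy].sum()
    # Per-atom force weights: divide every array by the total over all atoms.
    total_force_weight = out[w_forces].map(np.sum).sum()
    out[w_forces] = out[w_forces].map(lambda w: w / total_force_weight)
    return out


# Toy input with hypothetical column names.
toy = pd.DataFrame({"NUMBER_OF_ATOMS": [1, 2, 4]})
toy["w_energy_demo"] = 1.0 / toy["NUMBER_OF_ATOMS"]
toy["w_forces_demo"] = toy["NUMBER_OF_ATOMS"].map(lambda n: np.ones(n) / n)

normalized = normalize_weights_sketch(toy, "w_energy_demo", "w_forces_demo")
print(normalized["w_energy_demo"].sum())
print(sum(w.sum() for w in normalized["w_forces_demo"]))
```

Unlike the library call above, this sketch returns a new dataframe and leaves its input untouched.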

Store the dataframe with weights in a gzip-compressed pickle file

In [19]:
df.to_pickle("df_weights.pckl.gzip", compression="gzip")
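A quick way to check that the array-valued weight column survives the save is a round trip through a temporary file. This is a minimal sketch on a toy dataframe with a hypothetical column name. Passing `compression="gzip"` explicitly on both calls matters because pandas infers gzip from a `.gz` suffix, not from `.gzip`.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy dataframe with one array-valued column (hypothetical name).
toy = pd.DataFrame({"NUMBER_OF_ATOMS": [1, 2]})
toy["w_forces_demo"] = toy["NUMBER_OF_ATOMS"].map(lambda n: np.ones(n) / n)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy_weights.pckl.gzip")
    # Write and read back with the same explicit compression setting.
    toy.to_pickle(path, compression="gzip")
    restored = pd.read_pickle(path, compression="gzip")

# Compare column names and the per-structure weight arrays.
round_trip_ok = list(restored.columns) == list(toy.columns) and all(
    np.array_equal(a, b)
    for a, b in zip(toy["w_forces_demo"], restored["w_forces_demo"])
)
print(round_trip_ok)
```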

# Usage

**input.yaml:**

```
...

data:
    filename: df_weights.pckl.gzip

...
``` 

To use the dataframe but ignore the provided weights:

**input.yaml:**

```
...

data:
    filename: df_weights.pckl.gzip
    ignore_weights: True

...
``` 