This notebook shows the functions provided by TINC to create parameter spaces from configuration and output files.

In [1]:
from tinc import *

# Extracting from configuration files

To extract a parameter space from configuration files, you must provide a data root directory and the name of the configuration file found in its subdirectories. All configuration files are assumed to share the same name.
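
As a rough illustration of the directory layout this assumes, the sketch below walks a root directory and collects every file with a given name. It is not TINC's implementation; `find_config_files` and the temporary directory names are hypothetical and only standard-library calls are used.

```python
import os
import tempfile


def find_config_files(data_root, config_name):
    """Return sorted paths of every file named config_name below data_root."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(data_root):
        if config_name in filenames:
            matches.append(os.path.join(dirpath, config_name))
    return sorted(matches)


# Build a throwaway directory tree to demonstrate the search.
with tempfile.TemporaryDirectory() as root:
    for sub in ("A_3.8B_-19.8", "A_3.9B_-19.8"):
        os.makedirs(os.path.join(root, sub))
        with open(os.path.join(root, sub, "mc_settings.json"), "w") as f:
            f.write("{}")
    found = find_config_files(root, "mc_settings.json")
    # Keep only the name of the directory that holds each match.
    found_dirs = [os.path.basename(os.path.dirname(p)) for p in found]

print(found_dirs)
```

Each matching file would then be parsed and its values merged into one parameter space.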

You will also need to describe how to extract the information from the configuration files. This is done by specifying the keys where the parameter data is found. For example, if the configuration files look like:

```json
{
  "driver" : {
    "mode" : "incremental", 
    "motif" : {
      "configname" : "restricted_auto",
      "_configname" : "SCEL1_1_1_1_0_0_0/0",
      "_configdof" : "$HOME/laptop_share/NbO_rocksalt_gs/mc_runs/fit_13.02/coarse_grid/set2_cooling_grid2/A_3.9B_-19.1/conditions.298/tto/final_state.json"
    },
    "initial_conditions" : {
      "param_chem_pot" : {
        "a" : 3.90,
        "b" : -19.80
      },
      "temperature" : 20.0,
      "tolerance" : 0.001
    },
    "final_conditions" : {
      "param_chem_pot" : {
        "a" : 3.90,
        "b" : -19.80
      },
      "temperature" :2800.0,
      "tolerance" : 0.001
    },
    "incremental_conditions" : {
      "param_chem_pot" : {
        "a" : 0.0,
        "b" : 0.0
      },
      "temperature" : 10.0,
      "tolerance" : 0.001
    }
  }
}

```

You specify the starting value key as ```driver/initial_conditions/*``` because the starting values are the entries nested under the "driver" and "initial_conditions" keys. A similar string needs to be constructed for the end and increment keys.
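
To make the key syntax concrete, here is a minimal sketch of how such a key could be resolved against the parsed JSON. The traversal logic and the helper names `resolve_key` and `flatten_leaves` are assumptions for illustration, not TINC code; only the `param_chem_pot(a)` naming style is taken from the output shown later in this notebook.

```python
def resolve_key(config, key):
    """Follow a '/'-separated key; a '*' selects every leaf below that point."""
    node = config
    for part in key.split("/"):
        if part == "*":
            return flatten_leaves(node)
        node = node[part]
    return node


def flatten_leaves(node, prefix=""):
    """Flatten nested dicts; nested names become 'outer(inner)'."""
    flat = {}
    for name, value in node.items():
        label = f"{prefix}({name})" if prefix else name
        if isinstance(value, dict):
            flat.update(flatten_leaves(value, label))
        else:
            flat[label] = value
    return flat


# A trimmed copy of the configuration shown above.
config = {
    "driver": {
        "initial_conditions": {
            "param_chem_pot": {"a": 3.90, "b": -19.80},
            "temperature": 20.0,
            "tolerance": 0.001,
        }
    }
}

start = resolve_key(config, "driver/initial_conditions/*")
print(start)
```

This sketch returns every leaf it finds, including `tolerance`; how TINC decides which leaves become parameters is not covered here.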


Current limitations:
  * JSON only
  * space must be described by its boundaries and the increment
  * Limited format to describe how to extract the information. Currently values must be leaf nodes.
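
The boundary-plus-increment description can be expanded into explicit values along the lines of the sketch below. `expand_range` is a hypothetical helper, and treating a zero increment as a single fixed value is an assumption suggested by the example configuration, where the chemical potentials have an increment of 0.0.

```python
def expand_range(start, end, increment):
    """Return evenly spaced values from start to end, inclusive."""
    if increment == 0:
        # A zero increment means the parameter is fixed in this file.
        return [start]
    count = int(round((end - start) / increment)) + 1
    return [start + i * increment for i in range(count)]


# Temperature settings from the example configuration above.
temperatures = expand_range(20.0, 2800.0, 10.0)
print(len(temperatures), temperatures[0], temperatures[-1])

# A parameter with zero increment contributes a single value.
print(expand_range(3.9, 3.9, 0.0))
```

The temperature range expands to 279 values, which matches the slice length reported further down in this notebook.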

## Extracting parameter space values

The function ```extract_parameter_space_data``` will extract the parameter values as a dictionary. This can be useful as an initial step to ensure values are being extracted correctly.

In [2]:
data_dir = r'C:\Users\Andres\source\repos\vdv_data\nbO_2chempot'
config_file = 'mc_settings.json'
parameter_start_key = 'driver/initial_conditions/*'
parameter_end_key = 'driver/final_conditions/*'
parameter_increment_key = 'driver/incremental_conditions/*'

extract_parameter_space_data(data_dir, config_file, parameter_start_key, parameter_end_key, parameter_increment_key)

{'param_chem_pot(a)': array([3.8, 3.9]),
 'param_chem_pot(b)': array([-19.9, -19.8]),
 'temperature': array([  20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,  100.,
         110.,  120.,  130.,  140.,  150.,  160.,  170.,  180.,  190.,
         200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
         290.,  300.,  310.,  320.,  330.,  340.,  350.,  360.,  370.,
         380.,  390.,  400.,  410.,  420.,  430.,  440.,  450.,  460.,
         470.,  480.,  490.,  500.,  510.,  520.,  530.,  540.,  550.,
         560.,  570.,  580.,  590.,  600.,  610.,  620.,  630.,  640.,
         650.,  660.,  670.,  680.,  690.,  700.,  710.,  720.,  730.,
         740.,  750.,  760.,  770.,  780.,  790.,  800.,  810.,  820.,
         830.,  840.,  850.,  860.,  870.,  880.,  890.,  900.,  910.,
         920.,  930.,  940.,  950.,  960.,  970.,  980.,  990., 1000.,
        1010., 1020., 1030., 1040., 1050., 1060., 1070., 1080., 1090.,
        1100., 1110., 1120., 1130., 1140., 1150

## Creating parameter spaces

The ```make_parameter_space``` function returns a fully created parameter space from the configuration files. The only remaining step to perform to make the parameter space usable is to set the path template using ```set_current_path_template```. This template describes how the parameter values map to the filesystem.

In [3]:
data_dir = r'C:\Users\Andres\source\repos\vdv_data\nbO_2chempot'
config_file = 'mc_settings.json'
parameter_start_key = 'driver/initial_conditions/*'
parameter_end_key = 'driver/final_conditions/*'
parameter_increment_key = 'driver/incremental_conditions/*'

ps = make_parameter_space(data_dir, config_file, parameter_start_key, parameter_end_key, parameter_increment_key, ps_name="casmParams")

ps.set_current_path_template("A_%%param_chem_pot(a)%%B_%%param_chem_pot(b)%%")
ps.print()

 ** ParameterSpace None: <tinc.parameter_space.ParameterSpace object at 0x000001A5C9930850>
   -- Parameter param_chem_pot(a) /param_chem_pot(a)
   -- Parameter param_chem_pot(b) /param_chem_pot(b)
   -- Parameter temperature /temperature
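
The path template used above can be pictured as a simple token substitution. The sketch below is an assumption about the idea rather than TINC's implementation: `resolve_template` is a hypothetical helper that replaces each ```%%name%%``` token with the current value of that parameter, and it does not handle suffixed tokens such as ```%%T:ID%%```.

```python
import re


def resolve_template(template, values):
    """Replace each %%name%% token with the current value of that parameter."""
    def substitute(match):
        return str(values[match.group(1)])
    return re.sub(r"%%(.+?)%%", substitute, template)


# Hypothetical current values for the three parameters.
current = {"param_chem_pot(a)": 3.8, "param_chem_pot(b)": -19.8, "temperature": 20.0}
path = resolve_template("A_%%param_chem_pot(a)%%B_%%param_chem_pot(b)%%", current)
print(path)
```

Parameters that do not appear in the template, such as the temperature here, do not affect the resolved path.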


## Data pools for the new parameter space

Once the parameter space has been extracted, a DataPool can be created to access the data across all the directories. After creating the data pool, you need to register the files that contain the data. In this case, the "results.json" file spans the temperature parameter. These data files must be located in the path defined through ```set_current_path_template()``` above.

In [4]:
dp = DataPoolJson("results", ps, "slice_dir")
dp.register_data_file("results.json", "temperature")

In [5]:
dp.get_current_files()

['C:\\Users\\Andres\\source\\repos\\vdv_data\\nbO_2chempot/A_3.8B_-19.8/results.json']

In [6]:
ps.get_current_relative_path()

'A_3.8B_-19.8'

You can query the fields available in the data files:

In [7]:
dp.list_fields()

['<atom_frac(Nb)>',
 '<atom_frac(O)>',
 '<comp(a)>',
 '<comp(b)>',
 '<comp_n(Nb)>',
 '<comp_n(O)>',
 '<comp_n(Va)>',
 '<formation_energy>',
 '<potential_energy>',
 '<site_frac(Nb)>',
 '<site_frac(O)>',
 '<site_frac(Va)>',
 'Beta',
 'N_avg_samples',
 'N_equil_samples',
 'T',
 'heat_capacity',
 'is_converged',
 'is_equilibrated',
 'param_chem_pot(a)',
 'param_chem_pot(b)',
 'prec(<atom_frac(Nb)>)',
 'prec(<atom_frac(O)>)',
 'prec(<comp(a)>)',
 'prec(<comp(b)>)',
 'prec(<comp_n(Nb)>)',
 'prec(<comp_n(O)>)',
 'prec(<comp_n(Va)>)',
 'prec(<formation_energy>)',
 'prec(<potential_energy>)',
 'prec(<site_frac(Nb)>)',
 'prec(<site_frac(O)>)',
 'prec(<site_frac(Va)>)',
 'susc_n(Nb,Nb)',
 'susc_n(Nb,O)',
 'susc_n(Nb,Va)',
 'susc_n(O,O)',
 'susc_n(S,Nb)',
 'susc_n(S,O)',
 'susc_n(S,Va)',
 'susc_n(Va,O)',
 'susc_n(Va,Va)',
 'susc_x(S,a)',
 'susc_x(S,b)',
 'susc_x(a,a)',
 'susc_x(a,b)',
 'susc_x(b,b)']

In [8]:
ps.get_dimension("param_chem_pot(a)").value = 3.9

You can request slices of data from a single data file (temperature is the parameter contained in the individual files):

In [9]:
#dp.debug = True
dp.get_slice("<formation_energy>", "temperature")

masked_array(data=[-20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115576, -20.11559 ,
                   -20.115597, -20.115618, -20.11561 , -20.115568,
                   -20.115623, -20.115692, -20.115662, -20.1156  ,
                   -20.115719, -20.11563 , -20.115742, -20.11576 ,
                   -20.115826, -20.11572 , -20.115856, -20.115

In [10]:
len(dp.get_slice("<formation_energy>", "temperature")), len(ps.get_parameter('temperature').values)

(279, 279)

Or you can request slices that take a single value from each of a number of result files, when the requested slicing dimension is a dimension that affects the current path:

In [11]:
dp.get_slice("<formation_energy>", "param_chem_pot(a)")

masked_array(data=[-20.115568, -20.115568],
             mask=False,
       fill_value=1e+20,
            dtype=float32)

In [12]:
len(dp.get_slice("<formation_energy>", "param_chem_pot(a)")), len(ps.get_parameter('param_chem_pot(a)').values)

(2, 2)
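
The difference between the two kinds of slice can be sketched without TINC. The helper names, the directory names, and the file layout below are hypothetical and chosen only to illustrate the idea: one slice reads a whole column from a single file, the other picks one value from each directory's file.

```python
import json
import os
import tempfile


def slice_within_file(path, field):
    """Slice along the parameter stored inside one file: return the whole column."""
    with open(path) as f:
        return json.load(f)[field]


def slice_across_dirs(root, subdirs, filename, field, index):
    """Slice along a filesystem parameter: one value per directory."""
    values = []
    for sub in subdirs:
        with open(os.path.join(root, sub, filename)) as f:
            values.append(json.load(f)[field][index])
    return values


with tempfile.TemporaryDirectory() as root:
    # Two directories, one per value of a hypothetical filesystem parameter.
    subdirs = ["A_3.8", "A_3.9"]
    for offset, sub in enumerate(subdirs):
        os.makedirs(os.path.join(root, sub))
        data = {
            "T": [20.0, 30.0, 40.0],
            "energy": [float(10 * offset + k) for k in range(3)],
        }
        with open(os.path.join(root, sub, "results.json"), "w") as f:
            json.dump(data, f)

    along_t = slice_within_file(os.path.join(root, "A_3.8", "results.json"), "energy")
    along_a = slice_across_dirs(root, subdirs, "results.json", "energy", index=0)

print(along_t)
print(along_a)
```

In both cases the slice has one entry per value of the sliced parameter, which is what the length comparisons above check.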

## Making parameter space from output files

You can also extract parameter spaces from output files, either to analyze output data or when the input parameters are recorded in the output data files.

To do this, you must provide a root path and define a function that reads a data file and returns a dictionary with the possible values each parameter can take according to that file. The format should be the same as that returned by the ```extract_parameter_space_data``` function above.

In [13]:
import json  # explicit import; do not rely on the star import above to provide it

def read_func(path):
    with open(path) as f:
        j = json.load(f)
    return j

data_dir = r'C:\Users\Andres\source\repos\vdv_data\nbO_2chempot'
ps = extract_parameter_space_from_output(data_dir, "results.json", read_func)
ps.print()

Found more than one potential parameter in files:['Beta', 'T'] 
Using:Beta 
 ** ParameterSpace ps: <tinc.parameter_space.ParameterSpace object at 0x000001A5EA1140D0>
   -- Parameter Beta /Beta
   -- Parameter param_chem_pot(a) /param_chem_pot(a)
   -- Parameter param_chem_pot(b) /param_chem_pot(b)


If the result files contain more than one potential parameter, the first one found will be used. You can specify which ones to ignore using the ```ignore_params=``` argument:

In [14]:
ps = extract_parameter_space_from_output(data_dir, "results.json", read_func, ignore_params=['Beta'])
ps.print()

Found more than one potential parameter in files:['Beta', 'T'] 
Using:T 
 ** ParameterSpace ps: <tinc.parameter_space.ParameterSpace object at 0x000001A5EA1392B0>
   -- Parameter T /T
   -- Parameter param_chem_pot(a) /param_chem_pot(a)
   -- Parameter param_chem_pot(b) /param_chem_pot(b)
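
The selection rule described above (first candidate wins unless it is ignored) amounts to the following sketch. `pick_parameter` is a hypothetical helper that mirrors the behavior seen in the two outputs above; it is not taken from the TINC source.

```python
def pick_parameter(candidates, ignore_params=()):
    """Return the first candidate that is not listed in ignore_params."""
    for name in candidates:
        if name not in ignore_params:
            return name
    raise ValueError("every candidate parameter was ignored")


# Candidates as reported in the messages above.
first_choice = pick_parameter(["Beta", "T"])
second_choice = pick_parameter(["Beta", "T"], ignore_params=["Beta"])
print(first_choice, second_choice)
```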


You can create a data pool directly from results using ```create_datapool_from_output()```.

In [15]:
dp = create_datapool_from_output(data_dir, "results.json", read_func, ignore_params=['Beta'])

Found more than one potential parameter in files:['Beta', 'T'] 
Using:T 


In [16]:
dp.list_fields()

['<atom_frac(Nb)>',
 '<atom_frac(O)>',
 '<comp(a)>',
 '<comp(b)>',
 '<comp_n(Nb)>',
 '<comp_n(O)>',
 '<comp_n(Va)>',
 '<formation_energy>',
 '<potential_energy>',
 '<site_frac(Nb)>',
 '<site_frac(O)>',
 '<site_frac(Va)>',
 'Beta',
 'N_avg_samples',
 'N_equil_samples',
 'T',
 'heat_capacity',
 'is_converged',
 'is_equilibrated',
 'param_chem_pot(a)',
 'param_chem_pot(b)',
 'prec(<atom_frac(Nb)>)',
 'prec(<atom_frac(O)>)',
 'prec(<comp(a)>)',
 'prec(<comp(b)>)',
 'prec(<comp_n(Nb)>)',
 'prec(<comp_n(O)>)',
 'prec(<comp_n(Va)>)',
 'prec(<formation_energy>)',
 'prec(<potential_energy>)',
 'prec(<site_frac(Nb)>)',
 'prec(<site_frac(O)>)',
 'prec(<site_frac(Va)>)',
 'susc_n(Nb,Nb)',
 'susc_n(Nb,O)',
 'susc_n(Nb,Va)',
 'susc_n(O,O)',
 'susc_n(S,Nb)',
 'susc_n(S,O)',
 'susc_n(S,Va)',
 'susc_n(Va,O)',
 'susc_n(Va,Va)',
 'susc_x(S,a)',
 'susc_x(S,b)',
 'susc_x(a,a)',
 'susc_x(a,b)',
 'susc_x(b,b)']

In [17]:
ps = dp.get_parameter_space()

In [18]:
ps.get_current_relative_path()

'C:\\Users\\Andres\\source\\repos\\vdv_data\\nbO_2chempot\\A_3.8B_-19.9'

In [19]:
ps.get_root_path()

''

In [20]:
#dp.debug = True
dp.get_slice("<formation_energy>", "T")

masked_array(data=[-20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115568, -20.115576, -20.115568,
                   -20.115568, -20.115568, -20.115568, -20.115568,
                   -20.115568, -20.115717, -20.115568, -20.115568,
                   -20.115602, -20.115568, -20.115625, -20.115576,
                   -20.11562 , -20.115593, -20.115604, -20.115654,
                   -20.115614, -20.115604, -20.115646, -20.115593,
                   -20.115671, -20.115788, -20.11571 , -20.115

In [21]:
dp.get_slice("Beta", "param_chem_pot(b)")

masked_array(data=[580.2253   , 386.81686  , 290.11264  , 232.09012  ,
                   193.40843  , 165.77866  , 145.05632  , 128.93895  ,
                   116.04506  , 105.49551  ,  96.704216 ,  89.26543  ,
                    82.88933  ,  77.36337  ,  72.52816  ,  68.2618   ,
                    64.469475 ,  61.076347 ,  58.02253  ,  55.259552 ,
                    52.747753 ,  50.454372 ,  48.352108 ,  46.418022 ,
                    44.632713 ,  42.979652 ,  41.444664 ,  40.015537 ,
                    38.681686 ,  37.43389  ,  36.26408  ,  35.16517  ,
                    34.1309   ,  33.15573  ,  32.234737 ,  31.36353  ,
                    30.538174 ,  29.755144 ,  29.011265 ,  28.303673 ,
                    27.629776 ,  26.987223 ,  26.373877 ,  25.78779  ,
                    25.227186 ,  24.690437 ,  24.176054 ,  23.682665 ,
                    23.209011 ,  22.753933 ,  22.316357 ,  21.895294 ,
                    21.489826 ,  21.099102 ,  20.722332 ,  20.358782 ,
      

TODO... Ignore below...

In [22]:
import netCDF4

ps_filename = "parameter_space.nc"
sub_dir = sub_dirs[0]
full_path = data_dir + sub_dir + ps_filename
ps_file = netCDF4.Dataset(full_path, "w", format="NETCDF4")

params = ps_file.createGroup("internal_parameters")
mapped_params = ps_file.createGroup("mapped_parameters")
index_params = ps_file.createGroup("index_parameters")

for param_name, space in param_space.items():
    param_group = rootgrp.createVariable("values","f8",("internal_parameters",))
    mapped_group = rootgrp.createVariable("values","f8",("mapped_parameters",))
    mapped_var_ids = rootgrp.createVariable("ids","s",("mapped_parameters",))
    index_group = rootgrp.createVariable("values","f8",("index_parameters",))


param_var = rootgrp.createVariable("values","f8",("internal_parameters",))
mapped_var = rootgrp.createVariable("values","f8",("mapped_parameters",))
mapped_var_ids = rootgrp.createVariable("ids","s",("mapped_parameters",))
index_params = rootgrp.createVariable("values","f8",("index_parameters",))

ps_file.close()

NameError: name 'sub_dirs' is not defined

In [23]:
def read_func(path):
    with open(path) as f:
        j = json.load(f)
    return j

data_dir = r'C:\Users\Andres\source\repos\vdv_data\MonteCarlo_0'
dp = create_datapool_from_output(data_dir, "results.json", read_func, ignore_params=["Beta"])
ps = dp.get_parameter_space()
ps.print()

Found more than one potential parameter in files:['param_chem_pot(a)'] 
Using:param_chem_pot(a) 
 ** ParameterSpace ps: <tinc.parameter_space.ParameterSpace object at 0x000001A5EA139E20>
   -- Parameter param_chem_pot(a) /param_chem_pot(a)
   -- Parameter T /T


In [24]:
ps._path_template

'%%T:ID%%'

In [25]:
ps.get_current_relative_path()

'C:\\Users\\Andres\\source\\repos\\vdv_data\\MonteCarlo_0\\T_300'

In [26]:
dp.list_fields()

['<atom_frac(Li)>',
 '<atom_frac(Nb)>',
 '<atom_frac(O)>',
 '<atom_frac(P)>',
 '<comp(a)>',
 '<comp_n(Li)>',
 '<comp_n(Nb)>',
 '<comp_n(O)>',
 '<comp_n(P)>',
 '<comp_n(Va)>',
 '<corr(0)>',
 '<corr(1)>',
 '<corr(10)>',
 '<corr(100)>',
 '<corr(101)>',
 '<corr(102)>',
 '<corr(103)>',
 '<corr(104)>',
 '<corr(105)>',
 '<corr(106)>',
 '<corr(107)>',
 '<corr(108)>',
 '<corr(109)>',
 '<corr(11)>',
 '<corr(110)>',
 '<corr(111)>',
 '<corr(112)>',
 '<corr(113)>',
 '<corr(114)>',
 '<corr(115)>',
 '<corr(116)>',
 '<corr(117)>',
 '<corr(118)>',
 '<corr(119)>',
 '<corr(12)>',
 '<corr(120)>',
 '<corr(121)>',
 '<corr(122)>',
 '<corr(123)>',
 '<corr(124)>',
 '<corr(125)>',
 '<corr(126)>',
 '<corr(127)>',
 '<corr(128)>',
 '<corr(129)>',
 '<corr(13)>',
 '<corr(130)>',
 '<corr(131)>',
 '<corr(132)>',
 '<corr(133)>',
 '<corr(134)>',
 '<corr(135)>',
 '<corr(136)>',
 '<corr(137)>',
 '<corr(138)>',
 '<corr(139)>',
 '<corr(14)>',
 '<corr(140)>',
 '<corr(141)>',
 '<corr(142)>',
 '<corr(143)>',
 '<corr(144)>',
 '

In [27]:
dp.get_slice("<formation_energy>", "T")

masked_array(data=[-6.514254 , -6.5118012, -6.501532 , -6.494609 ,
                   -6.481887 ],
             mask=False,
       fill_value=1e+20,
            dtype=float32)

In [28]:
dp.get_slice("<formation_energy>", "param_chem_pot(a)")

masked_array(data=[-0.12356226, -0.12449209, -0.12752177, -0.13584016,
                   -0.15942748, -0.23135354, -0.48277244, -1.0396816 ,
                   -1.3671179 , -1.503868  , -1.5821247 , -1.6454239 ,
                   -1.7325352 , -1.8850185 , -2.1147645 , -2.4558954 ,
                   -2.6337516 , -2.683888  , -2.7110467 , -2.748431  ,
                   -3.3086352 , -3.7896414 , -3.933403  , -4.0647483 ,
                   -4.201793  , -4.3348927 , -4.429213  , -4.5827074 ,
                   -4.78959   , -5.035594  , -5.2768717 , -5.4365582 ,
                   -5.608032  , -5.7185006 , -5.8714495 , -5.937009  ,
                   -6.018578  , -6.0812917 , -6.1558046 , -6.219416  ,
                   -6.27296   , -6.3190727 , -6.354755  , -6.3892612 ,
                   -6.422431  , -6.4496083 , -6.4690537 , -6.4874907 ,
                   -6.5026927 , -6.5113926 , -6.514254  , -6.5142307 ,
                   -6.506843  , -6.495154  , -6.477317  , -6.456723  ,
      

In [29]:
ps.is_filesystem_dimension("T")

True