In [1]:
%%bash 
rm tmp/*

# ase-db: A database for ASE calculations

The ASE database features offer a powerful way to explore and manage batches of calculations. It can be very convenient to parse a number of output files, scattered across a directory structure, into a single file containing all the useful data. The [official documentation](https://wiki.fysik.dtu.dk/ase/ase/db/db.html#module-ase.db) is not comprehensive, but contains some useful information. **ase-db** has somewhat suffered from a lack of interest and a couple of API changes, but it seems like nothing a few serious users couldn't sort out!

ASE supports three backends:

* [SQLite3](https://en.wikipedia.org/wiki/SQLite) is a fully-featured relational database system which stores databases in local files. It's fast and widely used for data serialisation in software. The file format is binary. You can inspect and modify ASE-generated SQLite3 files with regular sqlite3 tools, but some parameters (e.g. atomic positions) are stored as binary blobs.
* [JSON](https://en.wikipedia.org/wiki/JSON) is a simple text-based format for data serialisation. This can be a good choice for long-term archiving and publication support. It will always be readable, but its human-friendliness is overrated...
* [PostgreSQL](https://en.wikipedia.org/wiki/PostgreSQL) is a more traditional server-hosted relational database system. This might be suitable for a group sharing data, but the configuration is correspondingly more complex.
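
To make the "binary blob" remark concrete, here is a minimal sketch using only the standard library. The table name `demo` and the float64 packing are assumptions made for illustration; this is not ASE's actual schema or byte layout. It shows that a blob column is opaque until it is unpacked with the right layout, whereas the JSON text form of the same data can be read directly.

```python
import json
import sqlite3
import struct

# Two atoms, flattened as x, y, z, x, y, z (illustrative data only)
positions = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

# Pack the coordinates as little-endian float64 values: a binary blob
blob = struct.pack('<6d', *positions)

# Hypothetical table, NOT the layout ASE uses internally
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE demo (id INTEGER PRIMARY KEY, positions BLOB)')
conn.execute('INSERT INTO demo (positions) VALUES (?)',
             (sqlite3.Binary(blob),))
stored = conn.execute('SELECT positions FROM demo WHERE id = 1').fetchone()[0]
conn.close()

# The blob only becomes meaningful once unpacked with the matching format
decoded = list(struct.unpack('<6d', bytes(stored)))

# The same data as JSON text needs no decoding step to be read by eye
as_text = json.dumps(positions)

print('Blob length in bytes: {0}'.format(len(blob)))
print('Decoded from blob:    {0}'.format(decoded))
print('JSON text:            {0}'.format(as_text))
```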

## Exploring an existing DB 

An ASE-DB can be a useful way of storing an interesting set of calculations. You can apply search/filter operations, and add your own metadata.

**data/sulfur_pbe0_96.db** is a set of data from calculations on sulfur clusters S2-S8, stored using the SQLite3 backend. As well as the default attributes including geometry and energy, we have added "name" fields and packaged vibrational frequencies into the "data" field. Suppose we wish to examine the relative energies of the clusters with size S4-S7:

In [12]:
import ase
import ase.db

db = ase.db.connect('data/sulfur_pbe0_96.db')
rows = db.select('S>3,S<8')
ref_energy = db.get(name='S8').energy/8
print 'Species name     Total energy      Relative energy/atom'
for row in rows:
    print '{0:14s} {1:12.2f} {2:12.2f}'.format(row.name, row.energy,
                                              row.energy/row.natoms - ref_energy)

Species name     Total energy      Relative energy/atom
S5_ring           -54397.19         0.16
S7_ring           -76156.82         0.05
S6_buckled        -65276.60         0.16
S7_branched       -76156.00         0.17
S6_chain_63       -65275.98         0.26
S6_stack_S3       -65276.31         0.21
S4_buckled        -43516.41         0.49
S6_branched       -65276.18         0.23
S4_eclipsed       -43516.84         0.38


There are some useful tools provided to explore the database. 
Have a look at the options for the command line tool with `ase-db -h`.
If you have the **Flask** Python library installed, you can use an attractive web interface by calling `ase-db -w mydatabase.db`.
When the flask server is running, your locally-hosted page will appear at http://0.0.0.0:5000 .

## Writing calculation parameters to the DB

Let's invent a Morse potential so we have a calculation to play with.

In [2]:
from ase.calculators.morse import MorsePotential

h2 = ase.Atoms('H2',positions=[[0,0,0],[0,0,1]])
calc = MorsePotential(epsilon=2., rho0=5.0, r0=1.2)
h2.set_calculator(calc)
# Trigger a calculation
print "Forces: ", h2.get_forces()

Forces:  [[  0.           0.         -49.89190266]
 [  0.           0.          49.89190266]]
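
As a sanity check, the force can be reproduced by hand. The sketch below assumes the Morse form `E(r) = epsilon * x * (x - 2)` with `x = exp(-rho0 * (r / r0 - 1))`; this form is an assumption here rather than something quoted from the ASE source, but it is consistent with the magnitude printed above. The function names are made up for this example.

```python
from math import exp


def morse_energy(r, epsilon=2.0, rho0=5.0, r0=1.2):
    """Pair energy for the assumed Morse form."""
    x = exp(-rho0 * (r / r0 - 1.0))
    return epsilon * x * (x - 2.0)


def morse_force(r, epsilon=2.0, rho0=5.0, r0=1.2):
    """Return -dE/dr; positive values push the two atoms apart."""
    x = exp(-rho0 * (r / r0 - 1.0))
    return 2.0 * epsilon * rho0 / r0 * x * (x - 1.0)


# Bond length used above (1.0) and the equilibrium distance r0 (1.2)
print('Force at r = 1.0: {0:.4f}'.format(morse_force(1.0)))
print('Force at r = 1.2: {0:.4f}'.format(morse_force(1.2)))
```

At `r = 1.0` the bond is compressed relative to `r0 = 1.2`, so the force is repulsive; at `r = r0` it vanishes.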


We open a database connection and write the Atoms object (which includes computed properties and the Calculator) into the db. This returns an ID.

(In older versions of ASE you could specify the ID yourself, but this causes errors in recent versions.)

In [3]:
# Filenames with the .db suffix automatically use the sqlite3 backend
db = ase.db.connect('tmp/morse_calcs.db')
# The write method returns a unique id
id = db.write(h2)

Now let's pull the row back out of the database and have a look at it.

In [4]:
h2_from_db = db.get(id)
print h2_from_db
print "Methods list: ", [key for key in h2_from_db]

<ase.db.row.AtomsRow instance at 0x106123878>
Methods list:  ['pbc', 'energy', 'calculator_parameters', 'numbers', 'mtime', 'ctime', 'positions', 'id', 'cell', 'forces', 'calculator', 'unique_id', 'user']


The "calculator parameters" entry is interesting as this allows us to reproduce calculations.

In [5]:
# Parameters are stored, but in a string representation
print h2_from_db.calculator_parameters
print type(h2_from_db.calculator_parameters)

{"epsilon": 2.0, "rho0": 5.0, "r0": 1.2}
<type 'unicode'>


Note that the data is *serialised* into a string. Python's regular serialisation tools **pickle**, **shelve** and **marshal** are not safe to use on untrusted data; unpickling, for example, can execute arbitrary code. Database modules tend to use "safe" string-based serialisation of a subset of Python objects. Here we can use a function provided by the **json** module.

In [6]:
# the json module has a safe tool for getting these back into Python objects
from json import loads
def dbrow2params(db_row):
    return loads(db_row.calculator_parameters)

params = dbrow2params(h2_from_db)
print params
print type(params)

{u'epsilon': 2.0, u'rho0': 5.0, u'r0': 1.2}
<type 'dict'>
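
The security remark above can be illustrated with a small, harmless sketch. The `Sneaky` class is invented for this example: its `__reduce__` method tells pickle to call `eval` on a string when the object is loaded. Here the string is only arithmetic, but it could be anything. `json.loads`, by contrast, only ever builds plain data types.

```python
import json
import pickle


class Sneaky(object):
    """Hypothetical object whose pickle runs code when loaded."""

    def __reduce__(self):
        # On loading, pickle calls eval("6 * 7") instead of rebuilding Sneaky
        return (eval, ("6 * 7",))


payload = pickle.dumps(Sneaky())
unpickled = pickle.loads(payload)  # eval runs here; we get its result back

# json can only produce dicts, lists, strings, numbers, booleans and None
params = json.loads('{"epsilon": 2.0, "rho0": 5.0, "r0": 1.2}')

print('pickle.loads returned: {0!r}'.format(unpickled))
print('json.loads returned a {0}'.format(type(params).__name__))
```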


Let's initialise a new calculation closer to the optimum, using the imported parameters.

In [7]:
new_atoms = h2_from_db.toatoms()
new_calc = MorsePotential(**params) # In Python, **args expands a dict {arg1:x, arg2:y,...} to a 
                                # list of optional args in format arg1=x, arg2=y, ...
new_atoms.set_calculator(new_calc)
new_atoms.positions = [[0.,0.,0.],[0.,0.,1.199]]
print "Forces: ", new_atoms.get_forces()

Forces:  [[ 0.          0.         -0.06987988]
 [ 0.          0.          0.06987988]]


### More chemistry packages

The various chemistry codes that have ASE interfaces include a "read" method which can get information about finished calculations from their output files. Here we look at an output file from GPAW. Note that the resulting Atoms object does *not* have the correct Calculator object attached.

In [8]:
import ase.io
gpaw_h2_txt = ase.io.read('data/h2.gpaw.0.1.txt')
print "Calculator type: ", gpaw_h2_txt.calc

Calculator type:  <ase.calculators.singlepoint.SinglePointDFTCalculator instance at 0x106123ef0>


Lots of data *is* available:

In [9]:
calc = gpaw_h2_txt.calc
print "HOMO and LUMO: ", calc.get_homo_lumo(), " Fermi energy: ", calc.eFermi
print "Spin-polarised? ", calc.get_spin_polarized(), "      K-points: ", calc.kpts

HOMO and LUMO:  (-10.091749999999999, 0.37676999999999999)  Fermi energy:  -4.85749
Spin-polarised?  False       K-points:  [<ase.calculators.singlepoint.SinglePointKPoint instance at 0x106123dd0>]


but not the GPAW parameters:

In [10]:
calc.parameters

{}

There is clearly an advantage to writing to the ASE-db directly from the original ASE Atoms object, which has this information. This is the origin of the file "H2.gpaw.db", in which the calculator is correctly identified...

In [11]:
gpaw_db = ase.db.connect('data/H2.gpaw.db')
H2_GPAW_dict = gpaw_db.get(4)
print "Calculator: ", H2_GPAW_dict.calculator

Calculator:  gpaw


but the parameters are still lost!

In [12]:
H2_GPAW_dict.calculator_parameters

u'{}'

The cause of this can be found in the GPAW source code. ASE uses the "todict" method to combine all the parameters into one dictionary for saving to the DB.

    # name, nolabel, check_state, todict and get_property are hacks
    # for compatibility with ASE-3.8's new calculator specification.
    # In the future, we will get this stuff for free by inheriting from
    # ase.calculators.calculator.Calculator.
    name = 'GPAW'
    nolabel = True

    ...
      
    def todict(self):
        return {}

Clearly some implementation work remains to be done. However, it is actually quite easy to work around this and add your parameters manually! For example, let's set up an FHI-aims calculation with spin polarisation.

In [13]:
from ase.calculators.aims import Aims
atoms = gpaw_db.get_atoms(4)
my_params = {'xc':'PBE', 'spin':'collinear', 'default_initial_moment':2}
calc = Aims(**my_params)
atoms.set_calculator(calc)

The FHI-aims interface isn't actually set up here, so we'll skip doing the calculation.
When dumping this into the DB, arbitrary *strings* can be added to the DB entry, so we just add our parameter dictionary after serialising to a string.

In [14]:
from json import dumps
others_db = ase.db.connect('tmp/H2.others.db')
my_params_str = dumps(my_params)
others_db.write(atoms, my_params=my_params_str)

1

And these are available with the Atoms object:

In [15]:
new_aims_H2 = others_db.select('H=2').next()
new_aims_H2.my_params

u'{"xc": "PBE", "default_initial_moment": 2, "spin": "collinear"}'

We can see that this is essentially the same as the string stored automatically for the aims calculator; only the case of the `xc` value differs.

In [16]:
new_aims_H2.calculator_parameters

u'{"xc": "pbe", "default_initial_moment": 2, "spin": "collinear"}'

This isn't very searchable, though. We can put some of these fields in at a higher level if we like...

In [25]:
for xc, forces in (('PBE', 1e-3), ('LDA', 1e-4), ('PBEsol', 1e-5)):
    my_params = {'xc':xc, 'spin':'collinear', 'default_initial_moment':2, 'sc_accuracy_forces':forces}
    calc = Aims(**my_params)
    atoms.set_calculator(calc)
    others_db.write(atoms, **my_params)

These are now searchable!

In [28]:
for row in others_db.select('xc=LDA'):
    print "LDA calculation has force convergence: ", row.sc_accuracy_forces

LDA calculation has force convergence:  0.0001
LDA calculation has force convergence:  0.0001
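
Why top-level fields are easier to search than a serialised dictionary can be sketched with plain `sqlite3`. The `calcs` table below is hypothetical and is not the schema ASE uses; it only contrasts querying a real column with decoding a JSON string for every row.

```python
import json
import sqlite3

# Hypothetical table for illustration, NOT ASE's internal layout
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE calcs (id INTEGER PRIMARY KEY, xc TEXT, '
             'sc_accuracy_forces REAL, params TEXT)')

for xc, forces in (('PBE', 1e-3), ('LDA', 1e-4), ('PBEsol', 1e-5)):
    params = {'xc': xc, 'sc_accuracy_forces': forces}
    conn.execute('INSERT INTO calcs (xc, sc_accuracy_forces, params) '
                 'VALUES (?, ?, ?)', (xc, forces, json.dumps(params)))

# Top-level columns: the database does the filtering
column_hits = conn.execute(
    "SELECT sc_accuracy_forces FROM calcs WHERE xc = 'LDA'").fetchall()

# Serialised string: every row has to be fetched and decoded in Python
decoded = [json.loads(text)
           for (text,) in conn.execute('SELECT params FROM calcs')]
string_hits = [p['sc_accuracy_forces'] for p in decoded if p['xc'] == 'LDA']
conn.close()

print('From columns:     {0}'.format(column_hits))
print('From JSON string: {0}'.format(string_hits))
```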


If you open the web interface, you'll be able to add these new columns.

![](images/db_addfield.png)

### (Lack of) Vasp implementation

The Vasp calculator has a slightly non-standard base and so lacks a `todict` method: it cannot dump its parameters, and writing it to the database actually raises an error. A GPAW-like hack would sort this out, although it would be nice if someone contributed the real solution...

In [19]:
from ase.calculators.vasp import Vasp
my_params = {'encut':500., 'algo':'Fast', 'isif':3, 'nsw':10}
calc = Vasp(**my_params)
atoms.set_calculator(calc)
others_db.write(atoms, params=dumps(my_params))

AttributeError: Vasp instance has no attribute 'todict'
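
The shape of such a workaround can be sketched without ASE installed. Everything below is a stand-in: `LegacyCalculator` is a hypothetical class that, like the calculator in the traceback, has no `todict`, and the attribute name `input_params` is invented for the example. The real Vasp calculator stores its settings differently, so treat this as an outline rather than a patch.

```python
import json


class LegacyCalculator(object):
    """Hypothetical calculator that keeps its settings but has no todict()."""

    def __init__(self, **kwargs):
        self.input_params = dict(kwargs)  # assumed attribute name


class PatchedCalculator(LegacyCalculator):
    """Subclass that adds the missing method."""

    def todict(self):
        # Return a copy so callers cannot alter the calculator's settings
        return dict(self.input_params)


calc = PatchedCalculator(encut=500., algo='Fast', isif=3, nsw=10)
serialised = json.dumps(calc.todict(), sort_keys=True)

print('Unpatched class has todict: {0}'.format(
    hasattr(LegacyCalculator(), 'todict')))
print('Serialised parameters: {0}'.format(serialised))
```

Unlike the GPAW version quoted earlier, which returns an empty dictionary, this `todict` returns the stored settings, so they would survive the trip into the database.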