Geometry unit conversion factor #8

Open
loriab opened this issue Aug 1, 2017 · 10 comments

@loriab
Collaborator

loriab commented Aug 1, 2017

posting for @andysim

Instead of providing the units, it may make sense to provide conversion factors to atomic units because they can vary fairly significantly between packages.

Providing an input_units_to_au field handles both differing units and differing physconst conversion factors in one go. It also helps with universal printing labels like Geometry (in Bohr * 1.00000000):.
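A minimal sketch of how such a field might be consumed. Only the field name input_units_to_au comes from the proposal; the layout of the molecule block, the helper names geometry_in_au and geometry_label, and the reading of the print label (one input unit equals that many Bohr) are assumptions made for illustration.

```python
# Illustrative only: the layout of this "molecule" block is an assumption;
# only the field name input_units_to_au comes from the proposal.
ANGSTROM_TO_BOHR = 1.0 / 0.52917721067  # reciprocal of the CODATA 2014 Bohr radius in Angstrom

molecule = {
    "atoms": ["He", "He"],
    "geometry": [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # written in Angstrom
    "input_units_to_au": ANGSTROM_TO_BOHR,           # the producer's own factor
}

def geometry_in_au(mol):
    """Scale every coordinate by the supplied factor (1.0 means already in Bohr)."""
    factor = mol.get("input_units_to_au", 1.0)
    return [[coord * factor for coord in atom] for atom in mol["geometry"]]

def geometry_label(mol):
    """Build a unit-agnostic header from the factor alone."""
    return "Geometry (in Bohr * {:.8f}):".format(mol.get("input_units_to_au", 1.0))

print(geometry_label(molecule))
for symbol, xyz in zip(molecule["atoms"], geometry_in_au(molecule)):
    print(symbol, *("{:.8f}".format(c) for c in xyz))
```

Because the factor travels with the data, the reader never needs its own Angstrom-to-Bohr constant, which is the point of the proposal.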

May also consider OpenMM's units solution: https://github.com/pandegroup/openmm/blob/master/wrappers/python/simtk/unit/unit_definitions.py

@chrisjsewell

chrisjsewell commented Aug 10, 2017

Hey guys, big fan of your aspirations here; I wish people would put as much thought into their output formats as they do into the rest of the program! I actually wrote the jsonextended package to help me parse and manipulate the data I'm working with from Gaussian, CRYSTAL, LAMMPS, etc., in the same kind of format you envisage. In particular, I thought you might be interested in how I am handling unit standardisation, with a "combine-apply-split" methodology that uses the pint package. Here's a quick demo:

  1. Read in your front page example output:
import json
from jsonextended import edict
# json.load expects an open file object, not a path string
with open('test.json') as f:
    test = json.load(f)
edict.pprint(test,depth=1)
driver:     energy
error:      
method:     {...}
molecule:   {...}
provenance: {...}
raw_output: Output storing was not requested.
return_value: {...}
success:    True
variables:  {...}
  2. Combine all ('val','units') leaf nodes into pint.Quantity objects:
from jsonextended import units as eunits
withunits = eunits.combine_quantities(test,'units','val')
edict.pprint(withunits,depth=2)
driver:       energy
error:        
method:       
  basis:      sto-3g
  expression: SCF
molecule:     
  atoms:    [He, He]
  geometry: [[0 0 0] [0 0 1]] Å
provenance:   
  creator: QM Program
  routine: program.run_json
  version: 1.1rc1
raw_output:   Output storing was not requested.
return_value: -5.433191881443323 E_h
success:      True
variables:    
  NUCLEAR REPULSION ENERGY: 2.11670883436 E_h
  ONE-ELECTRON ENERGY:      -11.67399006298957 E_h
  SCF DIPOLE X:             0.0 E_h
  SCF DIPOLE Y:             0.0 E_h
  SCF DIPOLE Z:             0.0 E_h
  SCF N ITERS:              2.0
  SCF TOTAL ENERGY:         -5.433191881443323 E_h
  SCF TWO-ELECTRON ENERGY:  4.124089347186247 E_h
  3. Apply a unit schema to the data, to convert specified fields to the required units:
newunits = eunits.apply_unitschema(withunits,{'geometry':'nm',
                                              'return_value':'kcal',
                                              'variables':{'SCF*':'eV'}},
                                   use_wildcards=True)
edict.pprint(newunits,depth=2)
driver:       energy
error:        
method:       
  basis:      sto-3g
  expression: SCF
molecule:     
  atoms:    [He, He]
  geometry: [[ 0. 0. 0. ] [ 0. 0. 0.1]] nm
provenance:   
  creator: QM Program
  routine: program.run_json
  version: 1.1rc1
raw_output:   Output storing was not requested.
return_value: -5.661406639574504e-21 kcal
success:      True
variables:    
  NUCLEAR REPULSION ENERGY: 2.11670883436 E_h
  ONE-ELECTRON ENERGY:      -11.67399006298957 E_h
  SCF DIPOLE X:             0.0 eV
  SCF DIPOLE Y:             0.0 eV
  SCF DIPOLE Z:             0.0 eV
  SCF N ITERS:              2.0 eV
  SCF TOTAL ENERGY:         -147.84466590569593 eV
  SCF TWO-ELECTRON ENERGY:  112.22217528934715 eV
  4. Split the pint.Quantity objects back into their ('val','units') pairs:
removeunits = eunits.split_quantities(newunits,'units','val')
edict.pprint(removeunits,depth=3)
driver:     energy
error:      
method:     
  basis:      sto-3g
  expression: SCF
molecule:   
  atoms: [He, He]
  geometry: 
    units: nanometer
    val:   [[ 0. 0. 0. ] [ 0. 0. 0.1]]
provenance: 
  creator: QM Program
  routine: program.run_json
  version: 1.1rc1
raw_output: Output storing was not requested.
return_value: 
  units: kilocalorie
  val:   -5.661406639574504e-21
success:    True
variables:  
  NUCLEAR REPULSION ENERGY: 
    units: hartree
    val:   2.11670883436
  ONE-ELECTRON ENERGY: 
    units: hartree
    val:   -11.67399006298957
  SCF DIPOLE X: 
    units: electron_volt
    val:   0.0
  SCF DIPOLE Y: 
    units: electron_volt
    val:   0.0
  SCF DIPOLE Z: 
    units: electron_volt
    val:   0.0
  SCF N ITERS: 
    units: electron_volt
    val:   2.0
  SCF TOTAL ENERGY: 
    units: electron_volt
    val:   -147.84466590569593
  SCF TWO-ELECTRON ENERGY: 
    units: electron_volt
    val:   112.22217528934715

Ta,
Chris

@tovrstra
Contributor

jsonextended and pint are very impressive but I guess, for the sake of defining a JSON schema, they may add too much complexity? It would be nice though to design the schema such that it plays nice with these packages.

jsonextended and pint do not seem to solve the original problem mentioned by @loriab, namely that different QC codes have different definitions of unit conversion factors, e.g. they use (slightly) different numbers to convert from Bohr to Angstrom. Is there a way to get around this?
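To make the size of the problem concrete, here is a small sketch of a round trip in which the writer and the reader disagree on the Bohr radius. The three values are the Bohr radius in Angstrom as I recall them from the CODATA 2006, 2010 and 2014 adjustments; please check them against NIST before relying on them. The function name round_trip is made up for this example.

```python
# Bohr radius in Angstrom from three CODATA adjustments (quoted from memory;
# verify against NIST before use).
BOHR_IN_ANGSTROM = {
    "CODATA 2006": 0.52917720859,
    "CODATA 2010": 0.52917721092,
    "CODATA 2014": 0.52917721067,
}

def round_trip(x_angstrom, writer, reader):
    """Writer converts Angstrom -> Bohr, reader converts Bohr -> Angstrom."""
    x_bohr = x_angstrom / BOHR_IN_ANGSTROM[writer]
    return x_bohr * BOHR_IN_ANGSTROM[reader]

x = 10.0  # Angstrom
for reader in BOHR_IN_ANGSTROM:
    back = round_trip(x, "CODATA 2006", reader)
    print("{}: {:.10f} Angstrom (drift {:+.2e})".format(reader, back, back - x))
```

The mismatch is only a few parts in 10^9, but it is enough to break bitwise comparisons of geometries between codes.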

@dgasmith
Collaborator

@tovrstra Agreed, I think we can recommend tools. However, the spec itself is tool independent.

Using slightly different conversion factors is tricky. We could take the following steps:

  • Request that all input/output values to QM programs be in atomic units (e.g. Hartree for energies)
  • MolSSI could build a repository holding up-to-date values of the conversion factors for everyone to use.

@tovrstra
Contributor

@dgasmith So you suggest to drop any support for different units and require all numbers to use atomic units?

@wadejong
Collaborator

wadejong commented Aug 18, 2017 via email

@andysim

andysim commented Aug 18, 2017

This is a very tricky problem, with many different codes using different conversion factors and units in their output. In a JSON context, one possible approach would be to have an extra field that specifies the conversion factor for each quantity (length, energy, etc.) used by the program of interest to some specific convention, e.g. atomic units. This would allow a.u. input to be converted internally by any code, using their native conventions, as usual. It would also provide a mechanism for converting output received to a 'standard' form (a.u. in the example I provided).
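A rough sketch of what the output side of this could look like. Everything here is hypothetical: the field names units_to_au, quantities and kind, the helper to_au, and the numbers, which are made up. The kcal/mol factor is the commonly quoted 627.509474 kcal/mol per Hartree and stands in for whatever value the producing program actually uses.

```python
# Hypothetical output fragment: values in the program's native units, plus the
# factors that *this program* uses to reach atomic units, one per quantity type.
result = {
    "units_to_au": {
        "length": 1.0 / 0.52917721067,  # Angstrom -> Bohr, as used by the producer
        "energy": 1.0 / 627.509474,     # kcal/mol -> Hartree, as used by the producer
    },
    "quantities": {
        "bond_length": {"kind": "length", "val": 1.0},        # made-up value
        "binding_energy": {"kind": "energy", "val": -0.0219},  # made-up value
    },
}

def to_au(res):
    """Convert every quantity to atomic units with the producer's own factors."""
    factors = res["units_to_au"]
    return {name: q["val"] * factors[q["kind"]]
            for name, q in res["quantities"].items()}

for name, value in to_au(result).items():
    print("{}: {:.10f} a.u.".format(name, value))
```

The receiver reproduces exactly the a.u. numbers the producer had internally, whatever constants the producer was compiled with.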

@matt-chan

Instead of accepting a variety of units, it would be nice to work with one set. That way, a simple project implementing the spec wouldn't be required to include code to convert from a plethora of possible units.

As others have suggested, we would need an agreed standard (MolSSI or IUPAC) for conversion.

We could include test cases, which would help codes that don't natively work with those units to minimize bugs. (Even if we decide to accept multiple unit systems in the spec, it'd still be a good idea to have the tests.)

@cryos
Collaborator

cryos commented Aug 30, 2017

Agreed; I strongly recommend one set of units. Support others, but have a recommended set of units for the format. Agreed-upon conversion factors would then be available.

@tovrstra
Contributor

@dgasmith @wadejong @matt-chan @cryos Default units and conversion factors cannot work; some of the reasons are explained in earlier comments. I'll try to summarize the problem:

Different programs work in different units internally, and they usually already have conversion factors to transform results to other units before printing. These aspects of existing software will not change. If you settle on standard units and conversion factors, one of the following two things is going to happen, and neither is great:

  1. Such programs use the conversion factors of the JSON spec to write results in an agreed unit, which may be inconsistent with the usual output of that program.
  2. Such programs may ignore the standard conversion factors and become inconsistent with the spec.

A cleaner solution would be to let every program write results to a JSON file in its internal units, and to let it specify what these units mean. Then the receiver of the JSON data is free to handle the units in whichever way they like. If conversion is needed, the most reasonable choice would be to take the conversion factors from the NIST website (which are refined occasionally as more literature becomes available). The disadvantage is that the spec becomes more complicated.
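As a sketch of the receiver side under this scheme: the writer only names its units, and the receiver converts with constants of its own choosing. The ('units', 'val') pairs and unit names follow the split output shown earlier in this thread; the table TO_SI, the helpers scale and convert, and the choice of CODATA 2014 values (quoted from memory) are assumptions.

```python
# Receiver-side table (hypothetical): size of one named unit in SI base units.
# The bohr, hartree and electron_volt entries are the receiver's choice of
# constants, here CODATA 2014 as I recall them; verify against NIST.
TO_SI = {
    "length": {"bohr": 0.52917721067e-10, "angstrom": 1e-10, "nanometer": 1e-9},
    "energy": {"hartree": 4.359744650e-18, "electron_volt": 1.6021766208e-19},
}

def scale(value, factor):
    """Multiply a number, or an arbitrarily nested list of numbers, by factor."""
    if isinstance(value, list):
        return [scale(v, factor) for v in value]
    return value * factor

def convert(quantity, kind, target):
    """Convert a {'units': ..., 'val': ...} pair to the receiver's target unit."""
    table = TO_SI[kind]
    factor = table[quantity["units"]] / table[target]
    return {"units": target, "val": scale(quantity["val"], factor)}

geometry = {"units": "nanometer", "val": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.1]]}
energy = {"units": "hartree", "val": -5.433191881443323}

print(convert(geometry, "length", "angstrom"))
print(convert(energy, "energy", "electron_volt"))
```

The spec then only has to fix the unit names, not the numerical constants, which is where the extra complexity mentioned above comes in.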

P.S. Most QC programs work in atomic units, so this may not cause too much trouble there. However, as soon as you want to exchange data with MM programs, all sorts of units come into play.
