davidlmobley/open-forcefield-data

 
 

open-forcefield-data

Datasets for open forcefield parameterization and development

General protocol for filtering ThermoML data to obtain pure solvent data

ThermoML data were compiled and filtered using the ThermoPyL tool developed by the Chodera Lab at MSKCC (https://github.com/choderalab/thermopyl).

FILTER PROCEDURE:

  1. Pull full ThermoML archive

  2. Discard known erroneous data (j.fluid.2013.12.014 is the only such source I know of so far)

  3. Define properties of interest to pass filter

  4. Keep only components made up solely of C, O, and H atoms

  5. Generate SMILES strings from component names (CIRpy, a Python interface to the NIH Chemical Identifier Resolver)

  6. Discard components whose SMILES strings contain "=" or "#" (removes molecules with double or triple bonds)

  7. Generate CAS registry numbers from component names (CIRpy)

  8. Apply temperature and pressure filters (250 K - 400 K and 1 atm - 1000 atm)

  9. Keep only liquid phase data points

  10. Separate final large dataframe into subframes by property of interest

      a. Remove data with no associated uncertainties from subframes

  11. Generate counts by component and journal article for all dataframes

  12. Save each dataframe as a separate .csv text file
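
The later steps of the procedure above (6 and 8 through 12) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used to build the datasets in this repository: the field names (`smiles`, `temperature_K`, `pressure_atm`, `phase`, `uncertainty`, `source`), the `"Liquid"` phase label, and every row in the toy table are hypothetical placeholders, and plain dictionaries stand in for dataframe rows so the sketch has no dependencies. Steps 1 through 5 and 7 are omitted because they need the ThermoML archive or network access.

```python
import csv
import os
import tempfile
from collections import Counter, defaultdict

# Hypothetical record layout; real ThermoPyL column names may differ.
FIELDS = ("component", "smiles", "temperature_K", "pressure_atm",
          "phase", "property", "value", "uncertainty", "source")

# Made-up placeholder rows, NOT real measurements.
ROWS = [
    ("ethanol", "CCO", 298.15, 1.0, "Liquid", "density", 1.0, 0.1, "ref-A"),
    ("ethanol", "CCO", 310.00, 1.0, "Liquid", "density", 1.0, None, "ref-B"),
    ("acetone", "CC(C)=O", 298.15, 1.0, "Liquid", "density", 1.0, 0.1, "ref-A"),
    ("diethyl ether", "CCOCC", 450.00, 1.0, "Liquid", "density", 1.0, 0.1, "ref-A"),
    ("methanol", "CO", 300.00, 1.0, "Gas", "density", 1.0, 0.1, "ref-B"),
    ("hexane", "CCCCCC", 300.00, 1.0, "Liquid", "heat_capacity", 1.0, 0.1, "ref-A"),
    ("ethanol", "CCO", 320.00, 1.0, "Liquid", "heat_capacity", 1.0, 0.1, "ref-B"),
]
records = [dict(zip(FIELDS, row)) for row in ROWS]


def passes_filters(rec):
    """Apply steps 6, 8 and 9 to a single record."""
    # Step 6: "=" or "#" in the SMILES string marks a double or triple bond.
    has_multiple_bond = "=" in rec["smiles"] or "#" in rec["smiles"]
    # Step 8: temperature and pressure windows (bounds taken as inclusive here).
    in_t_range = 250.0 <= rec["temperature_K"] <= 400.0
    in_p_range = 1.0 <= rec["pressure_atm"] <= 1000.0
    # Step 9: liquid phase only ("Liquid" is an assumed label).
    is_liquid = rec["phase"] == "Liquid"
    return (not has_multiple_bond) and in_t_range and in_p_range and is_liquid


def split_by_property(recs):
    """Step 10: group by property; step 10a: drop rows lacking an uncertainty."""
    groups = defaultdict(list)
    for rec in recs:
        if rec["uncertainty"] is not None:
            groups[rec["property"]].append(rec)
    return dict(groups)


def count_by(recs, key):
    """Step 11: count data points per value of `key` (component or source)."""
    return Counter(rec[key] for rec in recs)


def save_csv(recs, path):
    """Step 12: write one group of records to its own .csv file."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(FIELDS))
        writer.writeheader()
        writer.writerows(recs)


filtered = [rec for rec in records if passes_filters(rec)]
subsets = split_by_property(filtered)

out_dir = tempfile.mkdtemp()  # scratch directory, so nothing is overwritten
for name, recs in subsets.items():
    save_csv(recs, os.path.join(out_dir, name + ".csv"))
    print(name, dict(count_by(recs, "component")), dict(count_by(recs, "source")))
```

Treating each filter as a simple predicate on one record keeps the steps independent, so any of them can be tightened or reordered without touching the others.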

Energetic data for model systems

Toy dataset for alkanes, alcohols, and ethers

Christopher I. Bayly developed a toy dataset of potential molecules of interest, which is deposited in the "AlkEthOH_distrib" subdirectory of the "Model Systems" directory. Construction of this set is described in the README.txt there (which should be converted to Markdown).
