License: Apache

Exabyte Parser (ExaParser)

ExaParser is a Python package that extracts materials modeling data (e.g. Density Functional Theory, Molecular Dynamics) stored on disk and converts it to the ESSE/EDC format.


ExaParser provides the following functionality:

  • Extract structural information and material properties from simulation data
  • Serialize extracted information according to ESSE/EDC
  • Store serialized data on disk or remote databases
  • Support multiple simulation engines

The package is modular and easy to extend to additional applications and properties of interest. Contributions are welcome, either as additional functionality or as bug/issue reports.


ExaParser can be installed as follows.

  1. Install git-lfs in order to pull the files stored on Git LFS.

  2. Clone repository:

    git clone
  3. Install virtualenv using pip if not already present:

    pip install virtualenv
  4. Create virtual environment and install required packages:

    cd exaparser
    virtualenv venv
    source venv/bin/activate
    export GIT_LFS_SKIP_SMUDGE=1
    pip install -r requirements.txt


  1. ExaParser looks for the config file in the following locations and uses the first one it finds:

    • The existing file in the root of this repository, if installed as editable source. This does not work for production installs and is intended for testing only.
    • Your user's home directory at ~/.exabyte/exaparser/config
    • A global system configuration at /etc/exabyte/exaparser/config

    Copy the config file from the root of this repo to one of the above locations and edit it.

  2. Edit the config file and adjust parameters as necessary. The most important ones are listed below.

    • Add ExabyteRESTfulAPI to the comma-separated data_handlers parameter list, if not already present. This enables uploading the data to your Exabyte account.

      • New users can register on the Exabyte platform to obtain an account.
    • Set owner_slug, project_slug, api_account_id, and api_auth_token if ExabyteRESTfulAPI is enabled.

    • Adjust workflow_template_name parameter in case a different template should be used.

    • Adjust the properties parameter to select the properties to extract; the parser attempts to extract every property listed.

  3. Run the commands below to extract the data:

    source venv/bin/activate
    exaparser -w PATH_TO_JOB_WORKING_DIRECTORY

    Alternatively, call exaparser with the explicit path to the virtualenv binary:

    venv/bin/exaparser -w PATH_TO_JOB_WORKING_DIRECTORY
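
The config lookup order from step 1 and the comma-separated data_handlers parameter from step 2 can be illustrated with a short Python sketch. It is not ExaParser's own code: the function names are hypothetical, and it assumes that the file in the repository root is named "config", that the file is INI-formatted, and that data_handlers sits in a section called "global".

```python
import configparser
from pathlib import Path
from typing import Iterable, List, Optional


def default_candidates(repo_root: Path) -> List[Path]:
    """Candidate locations in lookup order: repository root, user home, system-wide."""
    return [
        repo_root / "config",  # assumption: file name in the repository root
        Path.home() / ".exabyte" / "exaparser" / "config",
        Path("/etc/exabyte/exaparser/config"),
    ]


def find_config(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None if none does."""
    for path in candidates:
        if path.is_file():
            return path
    return None


def read_data_handlers(config_path: Path, section: str = "global") -> List[str]:
    """Split the comma-separated data_handlers value into a list of handler names.

    Assumption: INI format with data_handlers in the given section.
    A missing section or option yields an empty list.
    """
    parser = configparser.ConfigParser()
    parser.read(config_path)
    raw = parser.get(section, "data_handlers", fallback="")
    return [name.strip() for name in raw.split(",") if name.strip()]


if __name__ == "__main__":
    config = find_config(default_candidates(Path.cwd()))
    if config is None:
        print("No config file found in any of the expected locations.")
    else:
        print(f"Using {config}; data handlers: {read_data_handlers(config)}")
```

Returning the first existing file, rather than merging all of them, mirrors the "use the first one it finds" behavior described above.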


Run the following command to run the tests.


All the passed parameters are optional, with the defaults being python3, venv, and unit, respectively.

The script will create a virtual environment and populate it, so there's no need to create one manually for testing.

Note that the testing virtualenv uses the requirements-dev.txt file, whereas production usage should use the requirements.txt file. This avoids installing test dependencies when they are not needed.


This repository is an open-source work in progress, and we welcome contributions. We suggest forking this repository and making the adjustments there. Changes in the fork can then be considered for merging into this repository, as explained in GitHub Standard Fork and Pull Request Workflow.


The following diagram presents the package architecture.


Here's an example flow of data/events:

  • User invokes the parser with a path to a job working directory.

  • The parser initializes a Job class to extract and serialize the job.

  • Job class uses Workflow parser to extract and serialize the workflow.

  • The Workflow is initialized with a Template that helps the parser construct the workflow.

    • Users can add new templates or adjust the current ones to support complex workflows.
  • The Workflow parser iterates over the Units to extract:

    • application-related data
    • input and output files
    • materials (initial/final structures) and properties
  • The job utilizes Compute classes to extract compute configuration from the resource management system.

  • Once the job is formed, it is passed to Data Handler classes that handle the data, e.g. by storing it on the Exabyte platform.
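
The flow above can be sketched in a few lines of Python. The classes below are simplified, hypothetical stand-ins that only mirror the roles named in the list (Job, Workflow, Unit, data handlers); they are not the actual ExaParser classes, and the dictionary keys other than workDir are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Unit:
    """One step of the workflow (hypothetical stand-in)."""
    name: str
    work_dir: str

    def to_dict(self) -> Dict:
        return {"name": self.name, "workDir": self.work_dir}


@dataclass
class Workflow:
    """Constructed from a template, then serialized unit by unit."""
    units: List[Unit] = field(default_factory=list)

    @classmethod
    def from_template(cls, template: Dict) -> "Workflow":
        return cls([Unit(u["name"], u["workDir"]) for u in template["units"]])

    def to_dict(self) -> Dict:
        return {"units": [unit.to_dict() for unit in self.units]}


@dataclass
class Job:
    """Ties a job working directory to its workflow."""
    work_dir: str
    workflow: Workflow

    def to_dict(self) -> Dict:
        return {"workDir": self.work_dir, "workflow": self.workflow.to_dict()}


def run(work_dir: str, template: Dict, handlers: List[Callable[[Dict], None]]) -> Dict:
    """Form the job, serialize it, and pass the result to every data handler."""
    job = Job(work_dir, Workflow.from_template(template)).to_dict()
    for handle in handlers:
        handle(job)
    return job


if __name__ == "__main__":
    template = {"units": [{"name": "scf", "workDir": "."}]}
    # A handler is any callable taking the serialized job; this one prints it.
    run("/path/to/job", template, [lambda job: print(json.dumps(job, indent=2))])
```

Treating handlers as plain callables keeps the sketch short; a handler that uploads to a remote database would plug in the same way as the one that prints.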


Workflow templates help the parser extract the data, since users follow different approaches to naming their input/output files and organizing their job directories. Readers are referred to the Documentation for more information about the structure of workflows. As explained above, a Shell Workflow Template is used by default to construct the workflow. For each unit of the workflow, one should specify stdoutFile (the relative path to the file containing the standard output of the job), workDir (the relative path to the directory containing data for the unit), and the names of the input files.
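
As a rough illustration of the per-unit fields described above, the sketch below builds one template unit as a Python dictionary and checks it. Only stdoutFile and workDir come from the description; the "units", "name", and "input" keys, the file names, and the validate_unit helper are assumptions, so the real template files may be laid out differently.

```python
import json
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical template with a single unit.
template: Dict = {
    "units": [
        {
            "name": "relax",                  # hypothetical unit name
            "stdoutFile": "relax/job.out",    # relative path to the standard output file
            "workDir": "relax",               # relative path to the unit's data directory
            "input": [{"name": "input.in"}],  # assumption: input files listed by name
        }
    ]
}


def validate_unit(unit: Dict) -> List[str]:
    """Return the problems found in one template unit (empty list if none)."""
    problems = []
    for key in ("stdoutFile", "workDir"):
        value = unit.get(key)
        if not value:
            problems.append(f"missing {key}")
        elif PurePosixPath(value).is_absolute():
            problems.append(f"{key} must be a relative path")
    if not unit.get("input"):
        problems.append("no input files listed")
    return problems


if __name__ == "__main__":
    for unit in template["units"]:
        print(unit["name"], validate_unit(unit) or "ok")
    print(json.dumps(template, indent=2))
```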


Desirable features for implementation:

  • Implement PBS/Torque and SLURM compute parsers
  • Implement VASP and Espresso execution unit parsers
  • Add other data handlers
  • Add complex workflow templates


  1. Exabyte Source of Schemas and Examples (ESSE), GitHub repository
  2. Vienna Ab-initio Simulation Package (VASP), official website
  3. Quantum ESPRESSO, official website


A Python package that converts materials modeling (e.g. DFT) data on disk to a structured representation according to ESSE, ready for indexing and database storage.






