A Python package that converts materials modeling (e.g. DFT) data on disk to a structured representation according to ESSE, ready for indexing and database storage.
Repository contents:

  • bin
  • src
  • tests
  • .gitattributes
  • .gitignore
  • LICENSE
  • README.md
  • config
  • requirements.txt
  • run-tests.sh


Exabyte Parser (ExaParser)

ExaParser is a Python package that extracts materials modeling data (e.g. Density Functional Theory, Molecular Dynamics) from disk and converts it to ESSE/EDC format.

Functionality

Main features:

  • Extract structural information and material properties from simulation data
  • Serialize extracted information according to ESSE/EDC
  • Store serialized data on disk or remote databases
  • Support for multiple simulation engines, including:

    • Vienna Ab-initio Simulation Package (VASP)
    • Quantum ESPRESSO

The package is modular and easy to extend with additional applications and properties of interest. Contributions are welcome in the form of additional functionality as well as bug and issue reports.
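As a rough illustration of that modularity, the sketch below registers a new application parser under a name so it can be looked up later. The registry, `register_parser`, `BaseParser`, and `ExampleAppParser` are hypothetical names chosen for this example, not ExaParser's actual API.

```python
from typing import Callable, Dict

# Hypothetical registry mapping an application name to a parser class;
# this is NOT ExaParser's actual API, only an illustration of the idea.
PARSERS: Dict[str, type] = {}


def register_parser(application: str) -> Callable[[type], type]:
    """Class decorator that registers a parser under an application name."""
    def decorator(cls: type) -> type:
        if application in PARSERS:
            raise ValueError(f"parser already registered for {application!r}")
        PARSERS[application] = cls
        return cls
    return decorator


class BaseParser:
    """Hypothetical base class: subclasses extract properties from a work dir."""

    def __init__(self, work_dir: str) -> None:
        self.work_dir = work_dir

    def properties(self) -> Dict[str, object]:
        raise NotImplementedError


@register_parser("example_app")
class ExampleAppParser(BaseParser):
    def properties(self) -> Dict[str, object]:
        # A real parser would read output files under self.work_dir.
        return {"total_energy": None}


def get_parser(application: str, work_dir: str) -> BaseParser:
    """Look up and instantiate the parser registered for an application."""
    try:
        return PARSERS[application](work_dir)
    except KeyError:
        raise KeyError(f"no parser registered for {application!r}") from None
```

With this pattern, supporting a new application means adding one decorated class; the code that dispatches on the application name does not change.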

Installation

ExaParser can be installed as follows.

  1. Install git-lfs to pull the files stored in Git LFS.

  2. Clone repository:

    git clone git@github.com:Exabyte-io/exaparser.git
  3. Install virtualenv using pip if not already present:

    pip install virtualenv
  4. Create virtual environment and install required packages:

    cd exaparser
    virtualenv venv
    source venv/bin/activate
    export GIT_LFS_SKIP_SMUDGE=1
    pip install -r requirements.txt
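If git-lfs was not installed before cloning, files tracked in Git LFS remain as small text pointers instead of their real content. A minimal Python sketch, relying only on the Git LFS v1 pointer format (whose first line is `version https://git-lfs.github.com/spec/v1`), lists such leftover pointers in a checkout; `find_lfs_pointers` is a hypothetical helper, not part of ExaParser.

```python
import os
from typing import List

# First line of every Git LFS pointer file (Git LFS spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: str) -> bool:
    """Return True if the file starts like an un-fetched Git LFS pointer."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(root: str) -> List[str]:
    """Walk a checkout and list files that are still LFS pointers."""
    pointers = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and is_lfs_pointer(path):
                pointers.append(path)
    return sorted(pointers)
```

After cloning, `find_lfs_pointers("exaparser")` would ideally return an empty list; if it does not, running `git lfs pull` inside the repository fetches the missing content.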

Usage

  1. Open config and adjust parameters as necessary. The most important ones are listed below.

    • Add ExabyteRESTfulAPI to the data_handlers parameter list (comma-separated), if not already present. This enables uploading the data to an Exabyte.io account.

      • New users can register on the Exabyte.io platform to obtain an account.
    • Set owner_slug, project_slug, api_account_id, and api_auth_token if ExabyteRESTfulAPI is enabled.

    • Adjust the workflow_template_name parameter if a different template should be used.

    • Adjust the properties parameter to extract the desired properties; the parser attempts to extract every listed property.

  2. Run the following commands to extract the data.

    source venv/bin/activate
    ./bin/exaparser -w PATH_TO_JOB_WORKING_DIRECTORY
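The checks in step 1 can be automated before running the parser. The sketch below assumes config is an INI-style file readable by Python's configparser; the section names `global` and `exabyte_api_data_handler` are hypothetical placeholders, while the option names (data_handlers, owner_slug, project_slug, api_account_id, api_auth_token) come from this README.

```python
import configparser
from typing import List

# Assumption: INI-style config. Section names are hypothetical placeholders;
# only the option names are taken from the README.
GLOBAL_SECTION = "global"
API_SECTION = "exabyte_api_data_handler"
API_KEYS = ("owner_slug", "project_slug", "api_account_id", "api_auth_token")


def enabled_handlers(config: configparser.ConfigParser) -> List[str]:
    """Split the comma-separated data_handlers option into a clean list."""
    raw = config.get(GLOBAL_SECTION, "data_handlers", fallback="")
    return [name.strip() for name in raw.split(",") if name.strip()]


def missing_api_settings(config: configparser.ConfigParser) -> List[str]:
    """List API settings left empty while ExabyteRESTfulAPI is enabled."""
    if "ExabyteRESTfulAPI" not in enabled_handlers(config):
        return []
    return [key for key in API_KEYS
            if not config.get(API_SECTION, key, fallback="").strip()]
```

For example, create `configparser.ConfigParser(interpolation=None)`, call `read("config")` on it, and pass it to `missing_api_settings`; a non-empty result means the upload handler is enabled but not fully configured.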

Tests

Run the following commands to run the tests.

source venv/bin/activate
sh run-tests.sh

Contribution

This repository is an open-source work in progress, and we welcome contributions. We suggest forking this repository and introducing the adjustments there; changes in the fork can then be considered for merging into this repository, as explained in the GitHub Standard Fork and Pull Request Workflow.

Architecture

The following diagram presents the package architecture.

[Figure: ExaParser package architecture diagram]

Here's an example flow of data/events:

  • User invokes the parser with a path to a job working directory.

  • The parser initializes a Job class to extract and serialize the job.

  • The Job class uses the Workflow parser to extract and serialize the workflow.

  • The Workflow is initialized with a Template to help the parser construct the workflow.

    • Users can add new templates or adjust the current ones to support complex workflows.
  • The Workflow parser iterates over the Units to extract:

    • application-related data
    • input and output files
    • materials (initial/final structures) and properties
  • The job utilizes Compute classes to extract compute configuration from the resource management system.

  • Once the job is formed, it is passed to Data Handler classes, e.g. to store the data on the Exabyte platform.
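The flow above can be condensed into a small Python sketch. All class and function names here (`Unit`, `Workflow.from_template`, `Job.to_dict`, `InMemoryHandler`, `run`) are hypothetical simplifications of the described roles, not ExaParser's actual classes, and the serialized dictionary layout is illustrative rather than the ESSE schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Unit:
    """One workflow step; fields loosely mirror the bullets above."""
    name: str
    application: Dict[str, str] = field(default_factory=dict)
    files: List[str] = field(default_factory=list)
    properties: Dict[str, object] = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {"name": self.name, "application": self.application,
                "files": self.files, "properties": self.properties}


@dataclass
class Workflow:
    """Built from a template: one Unit per template entry."""
    units: List[Unit]

    @classmethod
    def from_template(cls, template: List[dict]) -> "Workflow":
        return cls([Unit(name=entry["name"]) for entry in template])

    def to_dict(self) -> dict:
        return {"units": [unit.to_dict() for unit in self.units]}


class DataHandler(Protocol):
    def handle(self, job: dict) -> None: ...


class InMemoryHandler:
    """Stand-in for a real handler such as an upload to the Exabyte platform."""

    def __init__(self) -> None:
        self.stored: List[dict] = []

    def handle(self, job: dict) -> None:
        self.stored.append(job)


@dataclass
class Job:
    work_dir: str
    workflow: Workflow
    # A Compute class would fill this from the resource manager; empty here.
    compute: Dict[str, object] = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {"workDir": self.work_dir, "workflow": self.workflow.to_dict(),
                "compute": self.compute}


def run(work_dir: str, template: List[dict],
        handlers: List[DataHandler]) -> dict:
    """Job -> Workflow (from Template) -> serialize -> Data Handlers."""
    job = Job(work_dir, Workflow.from_template(template))
    serialized = job.to_dict()
    for handler in handlers:
        handler.handle(serialized)
    return serialized
```

Keeping every data handler behind a single `handle` method is what lets new destinations be added without touching the extraction code.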

Templates

Workflow templates help the parser extract data, since users follow different conventions for naming their input/output files and organizing their job directories. Readers are referred to the Exabyte.io Documentation for more information about the structure of workflows. As explained above, a Shell Workflow Template is used by default to construct the workflow. For each unit of the workflow, one should specify stdoutFile, the relative path to the file containing the standard output of the job; workDir, the relative path to the directory containing data for the unit; and the names of the input files.
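A minimal sketch of how one template unit might be resolved against a job directory. It assumes stdoutFile is relative to the job working directory and that input files live inside the unit's workDir; the `input` key and all file names are hypothetical, since the exact template syntax is not given here.

```python
import os
from typing import List

# Hypothetical template unit: stdoutFile and workDir are the fields named in
# the README; the "input" key and all file names are illustrative assumptions.
EXAMPLE_UNIT = {"stdoutFile": "job.out", "workDir": "scf", "input": ["pw.in"]}


def resolve_unit_paths(job_dir: str, unit: dict) -> dict:
    """Turn the relative paths of one template unit into normalized paths.

    Assumes stdoutFile is relative to the job directory and input files
    sit inside the unit's workDir.
    """
    work_dir = os.path.normpath(os.path.join(job_dir, unit.get("workDir", ".")))
    return {
        "stdoutFile": os.path.normpath(os.path.join(job_dir, unit["stdoutFile"])),
        "workDir": work_dir,
        "input": [os.path.join(work_dir, name) for name in unit.get("input", [])],
    }


def missing_files(paths: dict) -> List[str]:
    """List resolved stdout/input files that do not exist on disk."""
    candidates = [paths["stdoutFile"], *paths["input"]]
    return [path for path in candidates if not os.path.isfile(path)]
```

Reporting missing files per unit gives a quick way to tell whether a template matches how a particular job directory is actually organized.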

TODO List

Desirable features for implementation:

  • Implement PBS/Torque and SLURM compute parsers
  • Implement VASP and Espresso execution unit parsers
  • Add other data handlers
  • Add complex workflow templates
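As a possible starting point for the first item, a compute parser could read the environment variables that SLURM (`SLURM_JOB_ID`, `SLURM_JOB_PARTITION`, `SLURM_JOB_NUM_NODES`) and PBS/Torque (`PBS_JOBID`, `PBS_QUEUE`, `PBS_NODEFILE`) export to running jobs. `parse_compute` and the keys of its result are hypothetical and do not follow the ESSE compute schema.

```python
import os
from typing import Dict, Mapping, Optional


def parse_compute(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Guess the resource manager and basic job info from environment variables.

    Hypothetical helper; the result keys are illustrative, not the ESSE schema.
    A real compute parser might also query the scheduler for more details.
    """
    env = os.environ if env is None else env
    if "SLURM_JOB_ID" in env:
        return {
            "rms": "SLURM",
            "jobId": env["SLURM_JOB_ID"],
            "queue": env.get("SLURM_JOB_PARTITION"),
            "nodes": int(env.get("SLURM_JOB_NUM_NODES", "1")),
        }
    if "PBS_JOBID" in env:
        nodes = None
        nodefile = env.get("PBS_NODEFILE")
        if nodefile and os.path.isfile(nodefile):
            # PBS_NODEFILE repeats a host once per allocated core; count unique hosts.
            with open(nodefile) as handle:
                nodes = len({line.strip() for line in handle if line.strip()})
        return {"rms": "PBS", "jobId": env["PBS_JOBID"],
                "queue": env.get("PBS_QUEUE"), "nodes": nodes}
    # Not running under a recognized resource manager.
    return {"rms": None}
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the function easy to test outside a batch job.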

Links

  1. Exabyte Source of Schemas and Examples (ESSE), GitHub Repository
  2. Vienna Ab-initio Simulation Package (VASP), Official Website
  3. Quantum ESPRESSO, Official Website