SURF: a simple user-friendly reaction format

This repository contains example files and code to read, write and transform SURF files containing chemical reaction data.


First, clone this repository and then install the necessary requirements by running the following command (assuming you have pip installed):

pip install -r requirements.txt

Now you should be ready to go.

SURF Structure

SURF is a tabular file structure that is both human- and machine-readable. In a SURF spreadsheet, each row stores the data of one reaction. The column headers structure the data and are split into constant (CC) and flexible (FC) categories.

CCs never change and should always be present, independent of the number of reaction components. They capture the identifiers and provenance of the reaction as well as basic characteristics (reaction type, named reaction, reaction technology) and conditions (temperature, time, atmosphere, scale, concentration, stirring/shaking). Add-ons, such as the procedure or comments, belong to the CCs as well.

The FCs describe the more variable part of a reaction: the different starting material(s), solvent(s), reagent(s) and product(s). Each reaction component is described by an identifier, such as the CAS number or molecule name, and a SMILES or InChI string storing the chemical structure. While the SMILES/InChI string is available for every compound and can also serve as structural input for machine learning models, the CAS number, even though not always available, can be handy for chemists in the lab to order, itemize and find chemicals. For the starting material(s) and reagent(s), e.g. catalyst, ligand, additive, a third column is added to the two identifiers to cover the stoichiometric amount (equivalents). For products, the respective yield and yield type are recorded. The flexibility of SURF allows capturing multiple starting materials and reagents, as each one can be accommodated by adding three additional columns (CAS, SMILES/InChI, and equivalents). If desired, further columns for additional identifiers like names or lot numbers can be added.

The file data/surf_template.txt provides an example of the SURF structure with five Minisci-type reactions from the literature.
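
The split between constant and flexible columns described above can be illustrated with a short sketch. It assumes a tab-delimited table and a hypothetical header pattern of the form `<role>_<index>_<field>` (for example `reagent_2_smiles`); the column names and the example row below are illustrative and are not copied from data/surf_template.txt, so adjust the pattern to the headers of your own files.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical flexible-column pattern: <role>_<index>_<field>,
# e.g. "reagent_2_smiles". Every other header is treated as constant.
FLEXIBLE = re.compile(r"^(startingmat|reagent|solvent|product)_(\d+)_(\w+)$")


def split_row(row):
    """Split one row into constant fields and per-component flexible fields."""
    constant = {}
    components = defaultdict(dict)
    for header, value in row.items():
        match = FLEXIBLE.match(header)
        if match is None:
            constant[header] = value
        elif value:  # skip empty cells of unused component slots
            role, index, field = match.groups()
            components[(role, int(index))][field] = value
    return constant, dict(components)


# Illustrative table with one reaction; the reagent slot is left empty.
EXAMPLE = (
    "rxn_id\ttemperature_deg_c\tstartingmat_1_smiles\tstartingmat_1_eq\t"
    "reagent_1_smiles\treagent_1_eq\tproduct_1_smiles\tproduct_1_yield\n"
    "rxn-1\t25\tc1ccncc1\t1.0\t\t\tCc1ccccn1\t42\n"
)

reader = csv.DictReader(io.StringIO(EXAMPLE), delimiter="\t")
constant, components = split_row(next(reader))
print(constant)
print(components)
```

Grouping by `(role, index)` keeps each component's identifiers and amounts together, which mirrors how a further starting material or reagent is added as one more group of columns.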

Combining Multiple SURF Files

To concatenate multiple SURF files into one larger file, put all SURF files to be combined into one folder and run the following:

python <folder path> <output file>


ls data/hte-data
    hte-1.txt hte-2.txt hte-3.txt hte-4.txt ...
python data/hte-data data/hte-data.txt
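
For readers who want to see what such a concatenation involves, here is a minimal sketch using only the standard library. It is not the repository's script: the function name `combine_surf_files` is hypothetical, and it assumes tab-delimited `.txt` files whose flexible columns may differ from file to file. The output header is the union of all input headers, and cells missing from a given file are left empty.

```python
import csv
from pathlib import Path


def combine_surf_files(folder, output_file):
    """Concatenate all *.txt tables in `folder`, keeping the union of columns.

    Write the output outside `folder` so it is not picked up on a later run.
    """
    headers, rows = [], []
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            # Collect new column names in first-seen order.
            for name in reader.fieldnames or []:
                if name not in headers:
                    headers.append(name)
            rows.extend(reader)
    with open(output_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=headers, delimiter="\t", restval=""
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `combine_surf_files("data/hte-data", "data/hte-data.txt")` would correspond to the example above and returns the number of reactions written.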

Transformations for Interoperability

The idea of SURF is not to replace existing reaction file structures, but to provide a human- and machine-readable format that is interoperable with them. Below we describe examples of how to transform SURF into other existing formats and back.


The Open Reaction Database (ORD) is an open-access schema and infrastructure for structuring and sharing organic reaction data. To translate SURF files into the protocol buffers format used by the Open Reaction Database, run the following:

python <input SURF file> <output ORD file>

To get all the options of the script, run:

python --help

If the SURF file contains no or only partial provenance information, the user can provide personal information with the --username, --email, --orcid and --organization options.


python data/surf_template.txt data/surf_template.pbtxt --username "Alex Mueller" --email ""


To translate protocol buffers files used by the Open Reaction Database back into the SURF format, run the following:

python <input ORD file> <output SURF file>

To get all the options of the script, run:

python --help


python data/ord_search_result.pb data/ord_search_result.txt --validate

SURF to Reaction SMILES

Reaction SMILES are a frequently used representation for chemical reactions. However, they just represent the molecular structures involved in the reaction and lack detailed information on conditions, equivalents and analytics. We therefore only provide a way to translate SURF files into Reaction SMILES but not vice versa:

python <input SURF file> <output RXNSMILES file>

To get all the options of the script, run:

python --help


python data/surf_template.txt data/surf_template.rxnsmi
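
The one-way nature of this translation can be seen in a small sketch. A reaction SMILES has the form `reactants>agents>products`, with molecules on each side separated by `.`. The function below is an illustration, not the repository's implementation: the name `row_to_reaction_smiles` and the `<role>_<index>_smiles` header pattern are assumptions, and placing solvents on the agent side is a design choice made here for simplicity.

```python
import re

# Hypothetical column pattern: <role>_<index>_smiles.
SMILES_COLUMN = re.compile(r"^(startingmat|reagent|solvent|product)_(\d+)_smiles$")

# Which side of the reaction SMILES each role is written to (an assumption).
SIDE = {
    "startingmat": "reactants",
    "reagent": "agents",
    "solvent": "agents",
    "product": "products",
}


def row_to_reaction_smiles(row):
    """Build 'reactants>agents>products' from the SMILES cells of one row."""
    parts = {"reactants": [], "agents": [], "products": []}
    for header, value in row.items():
        match = SMILES_COLUMN.match(header)
        if match and value:  # ignore non-SMILES columns and empty slots
            parts[SIDE[match.group(1)]].append(value)
    return ">".join(
        ".".join(parts[key]) for key in ("reactants", "agents", "products")
    )


# Illustrative row, not taken from data/surf_template.txt.
row = {
    "rxn_id": "rxn-1",
    "startingmat_1_smiles": "c1ccncc1",
    "startingmat_1_eq": "1.0",
    "startingmat_2_smiles": "CC(=O)O",
    "reagent_1_smiles": "",
    "solvent_1_smiles": "CS(C)=O",
    "product_1_smiles": "Cc1ccccn1",
    "product_1_yield": "42",
}
print(row_to_reaction_smiles(row))  # c1ccncc1.CC(=O)O>CS(C)=O>Cc1ccccn1
```

Equivalents, yields and conditions never reach the output string, which is why the reverse translation cannot recover a complete SURF row.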


The Unified Data Model (UDM) is an open, extendable and freely available data format for the exchange of experimental information about compound synthesis and testing, developed by the Pistoia Alliance. To translate SURF files into UDM XML files, run the following:

python <input SURF file> <output UDM file>

To get all the options of the script, run:

python --help


python data/surf_template.txt data/surf_template.xml


To translate UDM XML files into SURF files, run the following:

python <input UDM file> <output SURF file>

To get all the options of the script, run:

python --help


python data/udm_file.xml data/surf_file.txt


If you are using SURF in your project, please cite the following reference:

  title={Simple User-Friendly Reaction Format},
  author={Nippa, David F. and M{\"u}ller, Alex T. and Atz, Kenneth and Konrad, David B. and Grether, Uwe and Martin, Rainer E. and Schneider, Gisbert},

