Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reworked boolean-only Option processing framework for the OMR Compiler #3675

wants to merge 10 commits into
base: master
Choose a base branch


Copy link

@nbhuiyan nbhuiyan commented Mar 18, 2019

This PR introduces some of the main components of the new option processing framework for the OMR Compiler that aims to replace the existing Options class. This was motivated by the fact that the current Options class is difficult for developers to work with and not in an ideal state for an open-source project (see #2835). This is the first of a handful of PRs in both OMR and OpenJ9 necessary to migrate to a new option processing framework. During the transition period, the new and old implementations will co-exist, but not used at the same time by default.


Some of the main components of the new options processing framework being introduced in this PR are the following:

  • OMROptions.json
  • options-gen
  • A new option table design
  • OptionProcessors
  • OptionsBuilder
  • CompilerOptionsManager (will be renamed as OptionsManager once existing Options class is deprecated)
  • CompilerOptions (will be renamed as Options once existing Options class is deprecated)

The sections below discusses the roles of each of these components, and how they work together.


OMROptions.json contains all the OMR compiler options as an array of JSON objects and is consumed by the options-gen script at build time. Here is an example of an entry in the JSON array followed by the description of the fields:

    "name": "x86UseMFENCE",
    "category": "M",
    "desc": "Enable to use mfence to handle volatile store",
    "option-member": "TR_X86UseMFENCE",
    "type": "bool",
    "default": "false",
    "processing-fn": "setTrue",
    "subsettable": "no"
  • name: The name field is used as an identifier for an option. Every entry should have a unique name. Options provided in the command line and the environment are matched against this field.
  • category: The category field is used to group the options into different categories (eg, codegen, optimization, logging, etc). In this case, "M" stands for miscellaneous options. This information will be used when outputting options help text.
  • desc: Option description text explaining what this option does. It will be used when outputting options help text.
  • option-member: This field is used to identify the field in the options class that stores the value of the option.
  • type: Type of the option data.
  • default: This is the value the option is initialized as, and will remain that way unless changed through the command line, environment variable, or some other logic after options have been processed. For boolean options, the default is false. Numeric and textual data may use this field to set a default value, and will also have the option of using an additional field in the json object to provide as a parameter to the processing function.
  • processing-fn: This is the name of the function in the TR::OptionProcessors class that will be used to process this option if it appears on the command line or environment variable.
  • subsettable: This field is used to set whether an option can be a part of an option subset to apply to specific method compilations. This capability will be added in the near future, so currently this field is not used during the processing of options.

To add a new option, all that you would need to do is add a new entry anywhere in the json file and re-build. If the available option processing functions do not meet your requirements, you may define a new function in TR::OptionProcessors and specify its name in the processing-fn field. Currenlty, OMROptions.json is a very large file because it contains downstream project options as well. Downstream project options should exist in their own Options.json. Since there are a lot of options that can be moved downstream, this migration will happen over time as downstream projects get support for the new option processing framework.


Simplifying the process of adding new options was possible through the use of the options-gen script run at build time. The options-gen script takes 2 input files: OMROptions.json containing the OMR options, and Options.json (or any other name you may define as a command line option to the script) containing the project-specific options. Other command-line options include the path to the input files, as well as the -platform option that will exclude or include platform-specific options (this mechanism is yet to be implemented in the near future, but it will basically be another field in the JSON options object allowing you to specify platform(s) to apply an option to)

The script takes the entries in OMROptions.json and Options.json, combining them into a single list, and processing them into an internal representation as an instance of class OptionTable. The internal representation is then used to generate a hash table that is used at run-time when processing the options. At build time, options-gen determines the index The name of each entry is hashed using a hashing algorithm that is also used at run-time to get the index into the bucket of the hash table containing these processed entries.

The options-gen script currently outputs 6 files:

  • an array of case-insensitive values associated with the characters used in the hashing function to determine the index into the hash table. This array is indexed into using the ascii values.
  • contains macro definitions regarding the option table properties such as min/max hash value, table size, etc
  • contains an aggregate initialization list for the hash table
  • contains the data members of the CompilerOptions class
  • contains the body of a switch statement that helps translating between the option word masks used in the existing Options class and the corresponding pointer to member of the new CompilerOptions class. This is temporarily in place to enable querying the new options class using the existing getOption(TR_CompilationOptions o) and setOption(TR_CompilationOption o) functions.
  • contains the body of a switch statement that translates the existing option masks to strings that can be used for debugging cases of option setting mismatches

A new option table design

Previously, the option table was an array alphabetically ordered by the option names. Finding the entry corresponding to a command line option string in the array required a binary search in the array, comparing the string each time. In the new framework, the option table is a hash table that is indexed into using a hash function.


{ {aa},


{ {{aa}},

In the example above, notice how the hash table is not perfect, or minimal, although a lot of work was done to get close to that trying to mimic gperf's perfect hash function generating mechanism. This is a possible area of improvement we could perhaps revisit later to completely eliminate string comparisons during the processing of command line options. The hash function simply gives us the index into a bucket in the hash table that contain a single entry, or a number of entries that hash to the same value, requiring us to perform string comparisons to get the right entry. At the moment, it doesn't seem worth the trouble as we only need to use the option table during once during the initialization phase.


OptionProcessors is an extensible class that contains the option processing functions. There will be a set of standard processing functions (eg, setTrue, setFalse, setInt32, setStringFromOptionArg, etc) that you may use for processing an option you add to the json file. If you would need your option to be processed in a very specific way not handled by existing processors (eg, option affecting multiple members of the option class), you may simply define your own and specify its name in the processing-fn field of the json object.


This class handles the processing of command line and environment variable options and processing of the options (as well as the option sets) during the initialization.


This class contains all of the options data members by including the file generated by options-gen.


CompilerOptionsManager is an extensible class handling the initialization of options, looking up entries in the option table, obtaining the CompilerOptions object specific to a compilation, setting a project's default options (ones that can only be determined at runtime).

In addition to the components above, this PR also implements a solution using preprocessor directives to have both old and new option processing frameworks coexist within #if/#else blocks. It is also possible to verify that the outcome of querying CompilerOptions is same as the existing Options class when NEW_OPTIONS_DEBUG is defined.

Currently, the new options are initialized with the existing options to make the transition easier, but eventually this will be managed by the CompilerOptionsManager.

You may build using CMake to test adding and querying of options. To use the new options, run CMake configure with -DOMR_COMPILER_NEW_OPTIONS.

To enable checking for discrepancies between the new and old options, run CMake configure with the additional flag: -DOMR_COMPILER_NEW_OPTIONS_DEBUG.

@0xdaryl 0xdaryl self-assigned this Mar 21, 2019
Copy link

@fjeremic fjeremic commented Mar 25, 2019

Simplifying the process of adding new options was possible through the use of the options-gen script run at build time.

@dnakamura FYI for CMake changes.

@fjeremic fjeremic added the cmake label Mar 25, 2019
@nbhuiyan nbhuiyan force-pushed the options-gen branch 2 times, most recently from 8e06fa5 to 6192dc7 May 27, 2019
@nbhuiyan nbhuiyan marked this pull request as ready for review May 28, 2019
@nbhuiyan nbhuiyan changed the title WIP: Reworked boolean-only Option processing framework for the OMR Compiler Reworked boolean-only Option processing framework for the OMR Compiler May 29, 2019
Copy link

@0xdaryl 0xdaryl commented Jul 30, 2019

I know this doesn't strictly need it (according to GitHub), but would you mind rebasing this? I'd like to schedule a few tests with it.

nbhuiyan added 10 commits Jul 31, 2019
 * OMROptions.json
 * A new option table design
 * A new option processing and querying framework

Signed-off-by: Nazim Uddin Bhuiyan <>
This change was necessary due to some build compilers not having
full C++11 support. A new file, is
generated by options-gen.

Signed-off-by: Nazim Uddin Bhuiyan <>
Previously, a vector of OptionTableItem was used to represent
a bucket in the hash table, and the table consisted of an array
of such vectors representing buckets.

Unfortunately, older build compilers with incomplete C++11 support
may not have support for using initializer lists for initializing
std::vector, which is a non-aggregate type.

Signed-off-by: Nazim Uddin Bhuiyan <>
This commit adds a number of changes:
* Added new CMake configuration option OMR_COMPILER_NEW_OPTIONS. By
  default, this option is OFF, and would need to be turned on if you
  want to use the new options instead. Another configuration option,
  OMR_COMPILER_NEW_OPTIONS_DEBUG enables comparing results of querying
  the old and the new options.
* Add -outputdir commandline option to options-gen script to specify
  output dir to write generated files to
* Generate .inc files in build/compiler/control instead of generating
  them in the source tree
* Make generate-options target as a requirement for compiler projects
  instead of being part of ALL (provided OMR_COMPILER_NEW_OPTIONS is ON)

Signed-off-by: Nazim Uddin Bhuiyan <>
options-gen now generates the body of a swith statement that returns
the pointer to member in the new CompilerOptions class based on the
TR_CompilationOptions enum value. This opens up a way to remove the
need to replace all instances of boolean option querying using
getOption(TR_CompilationOptions o), simplifying the transition phase.

Signed-off-by: Nazim Uddin Bhuiyan <>
Signed-off-by: Nazim Uddin Bhuiyan <>
This is required for having a mechanism to translate option
masks to strings to help with debugging options.

Signed-off-by: Nazim Uddin Bhuiyan <>
This is implement the initial option set support in the new

Signed-off-by: Nazim Uddin Bhuiyan <>
Some of the major changes this commit introduces:
* The new options are initialized during the initialization
  of TR::Options as a temporary solution for the transition phase.
* The option processing responsibility of looking up entries
  in the options hash table is now performed by OptionsBuilder
  instead of CompilerOptionsManager.
* The TR::Options::getOptions and TR::Options::setOptions methods
  now set/query both old and new options, and comparing the results.
  This comparision is in place to identify any mismatches early on
  during the transition phase.
* Every instance of TR::Options class has a corresponding
  TR::CompilerOptions object.
* OptionsBuilder's option processing mechanism has been improved
  to handle option filter expressions and processing them into
  option sets.

Signed-off-by: Nazim Uddin Bhuiyan <>
Added/removed entries in OMROptions.json to reflect the
recent additions/deletions made in OMROptions.hpp and
the option table.

Signed-off-by: Nazim Uddin Bhuiyan <>
@nbhuiyan nbhuiyan requested a review from rwy0717 as a code owner Jul 31, 2019
Copy link
Member Author

@nbhuiyan nbhuiyan commented Jul 31, 2019

@0xdaryl I have rebased the branch. To get a build that uses the new options, you will need to run CMake configure with -DOMR_COMPILER_NEW_OPTIONS=ON as the new options are off by default.

Copy link

@0xdaryl 0xdaryl commented Sep 12, 2019

This PR introduces a new methodology in the core OMR compiler for constructing some of the data structures needed to integrate the compiler technology in different language environments. Namely, the use of build-time tooling to synthesize static data tables from a metadata description of each according to the requirements of the target language environment. As this marks a new direction in the compiler architecture and will impose a tooling dependency on downstream consuming projects, I will provide some more context around this design choice.

The compiler functionality in OMR is intended to be integrated into the fabric of language runtimes. It was architected to be extendable by providing a set of APIs, data structures, and default functionality but allows consuming projects to extend or override them if needed. Many operations in the compiler are driven by data in static tables or enumerated lists. For example, IL opcodes, IL datatypes, compiler options, IL opcode evaluator tables, instruction opcodes, etc. However, the ability to construct this information in such a way as to allow extension and customization is challenging due to limitations in the C++ language and a desire to represent these structures in a format that is straightforward to understand. Generating such data at build-time driven by a metadata description of the structure in JSON format allows the build to synthesize uniform C++ structures composed of OMR, optional processor architecture, and optional downstream project specializations.

Python was selected as the tool to transform the JSON representation from each specialization and produce the C++ files to be included in the build. The specific choice of Python 3.x was driven by a number of factors:

  • The availability of a rich set of standard modules integrated with Python as well as language features for easily performing a number of tasks required by this PR (e.g., JSON evaluation, command-line parsing)
  • The ubiquity of Python as a DevOps tool and its availability in all build environments supported by OMR
  • OMR already has a build-time dependency on Python 3.x via the JitBuilder language bindings introduced in 2018, although this dependency was only exposed to downstream projects if they are consuming JitBuilder and using a language binding
  • Developer familiarity with Python syntax and semantics over other scripting languages
  • Python 3.x was chosen over Python 2.7 due to the end-of-life of Python 2.7 on January 1, 2020

One of the implications of this design choice is that downstream projects consuming and building with the OMR compiler will now require Python 3.x as part of their build pipeline (this is a build-only dependence, not runtime). This is not unlike other compiler frameworks such as LLVM that impose a build-time dependency on Python.

During the design discussions for this PR, developers raised concerns about the possible extension of this approach to generating C++ functions. The consensus, however, was that at this time this approach should be limited to auto-generating data only as no reasonable use cases for functions were conceived (other than getters/setters and simple initializers). Any future proposal for auto-generated functions should be discussed by the community for merit.

Looking beyond this PR, this technique of generating data tables at build-time to facilitate extension and integration with downstream projects will be applied to other data structures and enums in the OMR compiler, such as the IL opcode table, IL data types, simplifier|value propagation|tree evaluator tables, etc. Furthermore, the availability of Python as part of the DevOps toolchain will allow the OMR compiler (or other parts of OMR) to explore other ways to simplify the build, test, or integration of OMR into downstream projects.

Copy link

@DanHeidinga DanHeidinga commented Feb 18, 2021

Some discussion about changes to option processing from a downstream consumer of OMR: eclipse-openj9/openj9#6919 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants