Add debug flag -RegexBytecodeDebug and initial RegExp bytecode refactoring #3915

dilijev · 2017-10-10T04:37:49Z

(Note: I'll rebase this to a single commit shortly.)

-RegexDebug displays various views of RegExp compilation in ChakraCore. In an effort to improve the memory efficiency of the RegExp bytecode that makes up our compiled RegExp programs, this change adds a new view under the additional flag -RegexBytecodeDebug to view the bytecode binary layout.

For example, for this test case:

let re = /[ab]c+(d|e)?/i;
console.log(re.test('abcce')); // true

Running ch.exe -RegexDebug will display some output including the AST which is used to generate the bytecode. The bytecode is displayed last (in a human-readable disassembly), and looks like this:

REGEX PROGRAM /[ab]c+(d|e)?/
Program {
    source:       [ab]c+(d|e)?
    flags:        ignorecase
    numGroups:    2
    numLoops:     0
    instructions: {
        L0000: SyncToSetAndBackup(set: [Cc], backup: [1-inf])
        L0040: SyncToSetAndConsume(set: [ABab])
        L0078: ChompSet<Plus>(set: [Cc])
        L00b0: TryMatchSet(set: [DEde], failLabel: L0110)
        L00f0: DefineGroupFixed(groupId: 1, length: 1, noNeedToSave: true)
        L0110: Succ()
    }
}

Now you can run ch.exe -RegexDebug -RegexBytecodeDebug and see the following more detailed view of the disassembly which includes the actual bytecode layout.

REGEX PROGRAM /[ab]c+(d|e)?/
Program {
    source:       [ab]c+(d|e)?
    flags:        ignorecase
    numGroups:    2
    numLoops:     0
    instructions: {
        L0000: (0x040 bytes) SyncToNegatedSetAndBackup(set: [Cc], backup: [1-inf])
            0x000001C19403C128[+0x008](0x008) [Inst]: 2f000000 00000000
            0x000001C19403C130[+0x010](0x028) [SetMixin<false>]: 00000000 00000000 00000000 00000000 08000000 08000000 00000000 00000000 00000000 00000000
            0x000001C19403C158[+0x038](0x008) [BackupMixin]: 01000000 ffffffff
        L0040: (0x038 bytes) SyncToNegatedSetAndConsume(set: [ABab])
            0x000001C19403C168[+0x008](0x008) [Inst]: 27000000 00000000
            0x000001C19403C170[+0x010](0x028) [SetMixin<false>]: 00000000 00000000 00000000 00000000 06000000 06000000 00000000 00000000 00000000 00000000
        L0078: (0x038 bytes) ChompSet<Plus>(set: [Cc])
            0x000001C19403C1A0[+0x008](0x008) [Inst]: 4c000000 00000000
            0x000001C19403C1A8[+0x010](0x028) [SetMixin]: 00000000 00000000 00000000 00000000 08000000 08000000 00000000 00000000 00000000 00000000
        L00b0: (0x040 bytes) TryMatchSet(set: [DEde], failLabel: L0110)
            0x000001C19403C1D8[+0x008](0x008) [Inst]: 58000000 00000000
            0x000001C19403C1E0[+0x010](0x028) [SetMixin<false>]: 00000000 00000000 00000000 00000000 30000000 30000000 00000000 00000000 00000000 00000000
            0x000001C19403C208[+0x038](0x004) [TryMixin]: 10010000
        L00f0: (0x020 bytes) DefineGroupFixed(groupId: 1, length: 1, noNeedToSave: true)
            0x000001C19403C218[+0x008](0x008) [Inst]: 3a000000 00000000
            0x000001C19403C220[+0x010](0x004) [GroupMixin]: 01000000
            0x000001C19403C224[+0x014](0x004) [FixedLengthMixin]: 01000000
            0x000001C19403C228[+0x018](0x001) [NoNeedToSaveMixin]:01
        L0110: (0x010 bytes) Succ()
            0x000001C19403C238[+0x008](0x008) [Inst]: 02000000 00000000
            0x000001C19403C230[+0x000](0x010) [NopInst]: (no unique data -- skipping)
    }
}

Background and Future Opportunities

The following is some background on what you're seeing above and possible optimizations this exercise has illuminated.

The bytecode is constructed by using placement-new to allocate structs representing bytecode operations into a memory buffer. Since every opcode (aka instruction "tag") value fits into a single byte (at present), the above view makes it clear how much internal fragmentation results from adding padding for small quantities like the opcode (aka instruction tag), and small quantities.

For many opcodes for small regexes, many quantities also fit into a single byte or a short, instead of an int32, so there is a lot of support for the idea of adding variable-length encoding to save space.

Some portions of the opcode data like SetMixin are wasteful by always including space for a pointer (when we might know from the AST that this is guaranteed to be nullptr before compiling the bytecode), and we could split that opcode into two separate operations to save the 8 bytes used for the pointer when it is not needed. This would also remove a branch from execution and save on runtime cost.

Additionally, the bit vector representing the first 256 characters may be wasteful because 128-255 are not ASCII and characters in that range may be uncommon enough that it would be more efficient overall to encode that information in the same data structure used for other unicode code points (the data structure stored by the aforementioned sometimes-null pointer).

Many opcodes encode one or more booleans which get padded out to 4 or 8 bytes (depending) for a single bit of information. Encoding that information in a template param would follow the practice of other places in the bytecode and significantly save on the size of those bytecode layouts.

All of the bytecodes start at offset [+0x008] because I'm skipping over the Debug-only vtable pointer when displaying the layout. I'm planning on refactoring out the virtual Print method (in favor of a static alternative) because this code already uses static dispatch based on the bytecode's opcode in many places, and this vtable pointer distorts how the bytecode would be laid-out in production.

Throughout this exercise, I've been bouncing ideas off of @Penguinwizzard. Thanks to him for listening and reviewing my work in progress!

MikeHolman · 2017-10-10T04:39:55Z

What does perf look like?

dilijev · 2017-10-10T04:41:44Z

@MikeHolman This is a debug-only visualization tool. No changes to perf. This refactoring introduced a slight memory regression which will disappear when I move the boolean fields into template params.

MikeHolman · 2017-10-10T04:45:26Z

Ah, reading is hard. Thought the optimizations you mention were part of this.

rajatd · 2017-10-10T06:56:59Z

The comments in RegexOpcodes.h - are those left over or intended to serve as pointers for future work?

dilijev · 2017-10-10T20:23:39Z

@rajatd Thanks, I meant to remove most of those comments. I'll do so. Any other TODOs left over in this patch will be indicators for future work.

dilijev · 2017-10-10T20:47:08Z

I will rebase this PR soon. Original commits can be found at dilijev:re-bc-0 in case that aids in reviewing.

dilijev · 2017-10-12T19:40:01Z

@dotnet-bot test Windows 7 ci_dev12_x64_debug please

dilijev · 2017-10-12T19:42:34Z

^ ah, we removed the legacy checks so the dev12 check doesn't exist anymore.

/cc @Penguinwizzard I updated the set of required checks for master. (FYI @MSLaguana)

Penguinwizzard

LGTM

dilijev · 2017-10-12T21:36:53Z

    PRINT_BYTES(NopInst);

SuccInst

Not important enough to hold back this PR since this results in printing no additional bytes (all of NopInst, SuccInst and FailInst behave the same in this regard).

I'll fix in a subsequent change.

a la

        L0110: (0x010 bytes) Succ()
            0x000001C19403C238[+0x008](0x008) [Inst]: 02000000 00000000
            0x000001C19403C230[+0x000](0x010) [NopInst]: (no unique data -- skipping)

Refers to: lib/Parser/RegexRuntime.cpp:1028 in 27a2cb8. [](commit_id = 27a2cb8, deletion_comment = False)

dilijev · 2017-10-12T21:36:59Z

    PRINT_BYTES(NopInst);

FailInst

Not important enough to hold back this PR since this results in printing no additional bytes (all of NopInst, SuccInst and FailInst behave the same in this regard).

I'll fix in a subsequent change.

Refers to: lib/Parser/RegexRuntime.cpp:1006 in 27a2cb8. [](commit_id = 27a2cb8, deletion_comment = False)

dilijev · 2017-10-12T21:43:46Z

// actual opcode name, even though the logic is basically the same in each definition. See notes below.

Could avoid the duplication with a macro.

This is not important enough to hold back this PR. If I decide to make this change I'll fix it in a subsequent PR.

#WontFix

Refers to: lib/Parser/RegexRuntime.cpp:2113 in 27a2cb8. [](commit_id = 27a2cb8, deletion_comment = False)

@Penguinwizzard

… RegExp bytecode refactoring Merge pull request #3915 from dilijev:re-bc (Note: I'll rebase this to a single commit shortly.) `-RegexDebug` displays various views of RegExp compilation in ChakraCore. In an effort to improve the memory efficiency of the RegExp bytecode that makes up our compiled RegExp programs, this change adds a new view under the additional flag -RegexBytecodeDebug to view the bytecode binary layout. For example, for this test case: ```js let re = /[ab]c+(d|e)?/i; console.log(re.test('abcce')); // true ``` Running `ch.exe -RegexDebug` will display some output including the AST which is used to generate the bytecode. The bytecode is displayed last (in a human-readable disassembly), and looks like this: ``` REGEX PROGRAM /[ab]c+(d|e)?/ Program { source: [ab]c+(d|e)? flags: ignorecase numGroups: 2 numLoops: 0 instructions: { L0000: SyncToSetAndBackup(set: [Cc], backup: [1-inf]) L0040: SyncToSetAndConsume(set: [ABab]) L0078: ChompSet<Plus>(set: [Cc]) L00b0: TryMatchSet(set: [DEde], failLabel: L0110) L00f0: DefineGroupFixed(groupId: 1, length: 1, noNeedToSave: true) L0110: Succ() } } ``` Now you can run `ch.exe -RegexDebug -RegexBytecodeDebug` and see the following more detailed view of the disassembly which includes the actual bytecode layout. ``` REGEX PROGRAM /[ab]c+(d|e)?/ Program { source: [ab]c+(d|e)? flags: ignorecase numGroups: 2 numLoops: 0 instructions: { L0000: (0x040 bytes) SyncToNegatedSetAndBackup(set: [Cc], backup: [1-inf]) 0x000001C19403C128[+0x008](0x008) [Inst]: 2f000000 00000000 0x000001C19403C130[+0x010](0x028) [SetMixin<false>]: 00000000 00000000 00000000 00000000 08000000 08000000 00000000 00000000 00000000 00000000 0x000001C19403C158[+0x038](0x008) [BackupMixin]: 01000000 ffffffff L0040: (0x038 bytes) SyncToNegatedSetAndConsume(set: [ABab]) 0x000001C19403C168[+0x008](0x008) [Inst]: 27000000 00000000 0x000001C19403C170[+0x010](0x028) [SetMixin<false>]: 00000000 00000000 00000000 00000000 06000000 06000000 00000000 00000000 00000000 00000000 L0078: (0x038 bytes) ChompSet<Plus>(set: [Cc]) 0x000001C19403C1A0[+0x008](0x008) [Inst]: 4c000000 00000000 0x000001C19403C1A8[+0x010](0x028) [SetMixin]: 00000000 00000000 00000000 00000000 08000000 08000000 00000000 00000000 00000000 00000000 L00b0: (0x040 bytes) TryMatchSet(set: [DEde], failLabel: L0110) 0x000001C19403C1D8[+0x008](0x008) [Inst]: 58000000 00000000 0x000001C19403C1E0[+0x010](0x028) [SetMixin<false>]: 00000000 00000000 00000000 00000000 30000000 30000000 00000000 00000000 00000000 00000000 0x000001C19403C208[+0x038](0x004) [TryMixin]: 10010000 L00f0: (0x020 bytes) DefineGroupFixed(groupId: 1, length: 1, noNeedToSave: true) 0x000001C19403C218[+0x008](0x008) [Inst]: 3a000000 00000000 0x000001C19403C220[+0x010](0x004) [GroupMixin]: 01000000 0x000001C19403C224[+0x014](0x004) [FixedLengthMixin]: 01000000 0x000001C19403C228[+0x018](0x001) [NoNeedToSaveMixin]:01 L0110: (0x010 bytes) Succ() 0x000001C19403C238[+0x008](0x008) [Inst]: 02000000 00000000 0x000001C19403C230[+0x000](0x010) [NopInst]: (no unique data -- skipping) } } ``` # Background and Future Opportunities The following is some background on what you're seeing above and possible optimizations this exercise has illuminated. The bytecode is constructed by using placement-new to allocate structs representing bytecode operations into a memory buffer. Since every opcode (aka instruction "tag") value fits into a single byte (at present), the above view makes it clear how much internal fragmentation results from adding padding for small quantities like the opcode (aka instruction tag), and small quantities. For many opcodes for small regexes, many quantities also fit into a single byte or a short, instead of an int32, so there is a lot of support for the idea of adding variable-length encoding to save space. Some portions of the opcode data like SetMixin are wasteful by always including space for a pointer (when we might know from the AST that this is guaranteed to be `nullptr` before compiling the bytecode), and we could split that opcode into two separate operations to save the 8 bytes used for the pointer when it is not needed. This would also remove a branch from execution and save on runtime cost. Additionally, the bit vector representing the first 256 characters may be wasteful because 128-255 are not ASCII and characters in that range may be uncommon enough that it would be more efficient overall to encode that information in the same data structure used for other unicode code points (the data structure stored by the aforementioned sometimes-null pointer). Many opcodes encode one or more booleans which get padded out to 4 or 8 bytes (depending) for a single bit of information. Encoding that information in a template <bool> param would follow the practice of other places in the bytecode and significantly save on the size of those bytecode layouts. All of the bytecodes start at offset [+0x008] because I'm skipping over the Debug-only vtable pointer when displaying the layout. I'm planning on refactoring out the virtual `Print` method (in favor of a static alternative) because this code already uses static dispatch based on the bytecode's opcode in many places, and this vtable pointer distorts how the bytecode would be laid-out in production. Throughout this exercise, I've been bouncing ideas off of @Penguinwizzard. Thanks to him for listening and reviewing my work in progress!

…BoundaryTest's isNegation, converting bools to template parameters. Merge pull request #4066 from dilijev:rebc-rm-bool-hardfail Follow up to #3915 as part of the "remove booleans" opportunity. Other work on removing the rest of the booleans is in progress but more invasive. This PR also: * removes virtual Print from `Inst` so that Debug build `Inst`s don't have a vtable pointer to disrupt the bytecode layout. * includes other changes to -RegexBytecodeDebug output to help view important aspects of bytecode layout. I don't yet have data about runtime and memory improvement with this change, but since we are removing some branches through the use of a boolean template instead of a boolean member variable and decreasing memory by removing the boolean and all of the associated padding (4 or 8 bytes, depending) any time these bytecodes are used, there should be moderate gains in both runtime and memory. BOITest / BOIHardFailTest / EOITest / EOIHardFailTest / WordBoundaryTest / NegatedWordBoundaryTest go from 0x8 to 0x4 bytes. (Opcode is 32 bit and the bool gets padded out 4 bytes.) Before/after snippet for BOITest, for `/(^token|foo)/` (running as `ch.exe -RegexDebug -RegexBytecodeDebug test.js`): Before: ``` L0010: (0x008 bytes) BOITest(hardFail: false) 0x000001F4591AC0B0[+0x000](0x004) [Inst]: 0c000000 0x000001F4591AC0B4[+0x004](0x001) [HardFailMixin]:00 L0018: (0x00c bytes) MatchLiteral(literal: "token") ``` After: ``` L0010: (0x004 bytes) BOITest(<hardFail>: false) 0x0000020E2BECC0B0[+0x000](0x004)(size:0x04)(align:0x04) [Inst]: 0d000000 L0014: (0x00c bytes) MatchLiteral(literal: "token") ``` Again for `/\b\B/`: Before: ``` L0000: (0x008 bytes) WordBoundaryTest() 0x0000023973F34100[+0x000](0x004) [Inst]: 10000000 0x0000023973F34100[+0x000](0x008) [WordBoundaryTestInst]: 10000000 00000000 L0008: (0x008 bytes) WordBoundaryTest() 0x0000023973F34108[+0x000](0x004) [Inst]: 10000000 0x0000023973F34108[+0x000](0x008) [WordBoundaryTestInst]: 10000000 01000000 L0010: (0x004 bytes) Succ() ``` After: ``` L0000: (0x004 bytes) WordBoundaryTest() 0x0000019F284AC190[+0x000](0x004)(size:0x04)(align:0x04) [Inst]: 13000000 L0004: (0x004 bytes) NegatedWordBoundaryTest() 0x0000019F284AC194[+0x000](0x004)(size:0x04)(align:0x04) [Inst]: 12000000 L0008: (0x004 bytes) Succ() ```

dilijev self-assigned this Oct 10, 2017

dilijev requested review from Penguinwizzard, akroshg, rajatd and leirocks October 10, 2017 04:37

rajatd approved these changes Oct 10, 2017

View reviewed changes

dilijev added 2 commits October 11, 2017 18:47

Add missing braces and formatting.

9e68676

Add -RegexBytecodeDebug Flag to display RegExp bytecode's binary layout

10f8086

dilijev force-pushed the re-bc branch from c2ef897 to 10f8086 Compare October 12, 2017 01:47

Fix xplat compile errors.

27a2cb8

Penguinwizzard approved these changes Oct 12, 2017

View reviewed changes

chakrabot merged commit 27a2cb8 into chakra-core:master Oct 13, 2017

dilijev deleted the re-bc branch October 13, 2017 16:44

dilijev mentioned this pull request Oct 25, 2017

RegExp Bytecode: Remove HardFailMixin and WordBoundaryTest's isNegation, converting bools to template parameters. #4066

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add debug flag -RegexBytecodeDebug and initial RegExp bytecode refactoring #3915

Add debug flag -RegexBytecodeDebug and initial RegExp bytecode refactoring #3915

dilijev commented Oct 10, 2017 •

edited

Loading

MikeHolman commented Oct 10, 2017

dilijev commented Oct 10, 2017

MikeHolman commented Oct 10, 2017

rajatd commented Oct 10, 2017

dilijev commented Oct 10, 2017

dilijev commented Oct 10, 2017 •

edited

Loading

dilijev commented Oct 12, 2017

dilijev commented Oct 12, 2017

Penguinwizzard left a comment

dilijev commented Oct 12, 2017 •

edited

Loading

dilijev commented Oct 12, 2017 •

edited

Loading

dilijev commented Oct 12, 2017 •

edited

Loading

Add debug flag -RegexBytecodeDebug and initial RegExp bytecode refactoring #3915

Add debug flag -RegexBytecodeDebug and initial RegExp bytecode refactoring #3915

Conversation

dilijev commented Oct 10, 2017 • edited Loading

Background and Future Opportunities

MikeHolman commented Oct 10, 2017

dilijev commented Oct 10, 2017

MikeHolman commented Oct 10, 2017

rajatd commented Oct 10, 2017

dilijev commented Oct 10, 2017

dilijev commented Oct 10, 2017 • edited Loading

dilijev commented Oct 12, 2017

dilijev commented Oct 12, 2017

Penguinwizzard left a comment

Choose a reason for hiding this comment

dilijev commented Oct 12, 2017 • edited Loading

dilijev commented Oct 12, 2017 • edited Loading

dilijev commented Oct 12, 2017 • edited Loading

dilijev commented Oct 10, 2017 •

edited

Loading

dilijev commented Oct 10, 2017 •

edited

Loading

dilijev commented Oct 12, 2017 •

edited

Loading

dilijev commented Oct 12, 2017 •

edited

Loading

dilijev commented Oct 12, 2017 •

edited

Loading