-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add debug flag -RegexBytecodeDebug and initial RegExp bytecode refactoring #3915
Conversation
What does perf look like? |
@MikeHolman This is a debug-only visualization tool. No changes to perf. This refactoring introduced a slight memory regression which will disappear when I move the boolean fields into template params. |
Ah, reading is hard. Thought the optimizations you mention were part of this. |
The comments in RegexOpcodes.h - are those left over or intended to serve as pointers for future work? |
@rajatd Thanks, I meant to remove most of those comments. I'll do so. Any other TODOs left over in this patch will be indicators for future work. |
I will rebase this PR soon. Original commits can be found at dilijev:re-bc-0 in case that aids in reviewing. |
@dotnet-bot test Windows 7 ci_dev12_x64_debug please |
^ ah, we removed the legacy checks so the dev12 check doesn't exist anymore. /cc @Penguinwizzard I updated the set of required checks for master. (FYI @MSLaguana) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
SuccInst Not important enough to hold back this PR since this results in printing no additional bytes (all of NopInst, SuccInst and FailInst behave the same in this regard). I'll fix in a subsequent change. a la
Refers to: lib/Parser/RegexRuntime.cpp:1028 in 27a2cb8. [](commit_id = 27a2cb8, deletion_comment = False) |
FailInst Not important enough to hold back this PR since this results in printing no additional bytes (all of NopInst, SuccInst and FailInst behave the same in this regard). I'll fix in a subsequent change. Refers to: lib/Parser/RegexRuntime.cpp:1006 in 27a2cb8. [](commit_id = 27a2cb8, deletion_comment = False) |
Could avoid the duplication with a macro. This is not important enough to hold back this PR. If I decide to make this change I'll fix it in a subsequent PR. #WontFix Refers to: lib/Parser/RegexRuntime.cpp:2113 in 27a2cb8. [](commit_id = 27a2cb8, deletion_comment = False) |
… RegExp bytecode refactoring Merge pull request #3915 from dilijev:re-bc (Note: I'll rebase this to a single commit shortly.) `-RegexDebug` displays various views of RegExp compilation in ChakraCore. In an effort to improve the memory efficiency of the RegExp bytecode that makes up our compiled RegExp programs, this change adds a new view under the additional flag -RegexBytecodeDebug to view the bytecode binary layout. For example, for this test case: ```js let re = /[ab]c+(d|e)?/i; console.log(re.test('abcce')); // true ``` Running `ch.exe -RegexDebug` will display some output including the AST which is used to generate the bytecode. The bytecode is displayed last (in a human-readable disassembly), and looks like this: ``` REGEX PROGRAM /[ab]c+(d|e)?/ Program { source: [ab]c+(d|e)? flags: ignorecase numGroups: 2 numLoops: 0 instructions: { L0000: SyncToSetAndBackup(set: [Cc], backup: [1-inf]) L0040: SyncToSetAndConsume(set: [ABab]) L0078: ChompSet<Plus>(set: [Cc]) L00b0: TryMatchSet(set: [DEde], failLabel: L0110) L00f0: DefineGroupFixed(groupId: 1, length: 1, noNeedToSave: true) L0110: Succ() } } ``` Now you can run `ch.exe -RegexDebug -RegexBytecodeDebug` and see the following more detailed view of the disassembly which includes the actual bytecode layout. ``` REGEX PROGRAM /[ab]c+(d|e)?/ Program { source: [ab]c+(d|e)? flags: ignorecase numGroups: 2 numLoops: 0 instructions: { L0000: (0x040 bytes) SyncToNegatedSetAndBackup(set: [Cc], backup: [1-inf]) 0x000001C19403C128[+0x008](0x008) [Inst]: 2f000000 00000000 0x000001C19403C130[+0x010](0x028) [SetMixin<false>]: 00000000 00000000 00000000 00000000 08000000 08000000 00000000 00000000 00000000 00000000 0x000001C19403C158[+0x038](0x008) [BackupMixin]: 01000000 ffffffff L0040: (0x038 bytes) SyncToNegatedSetAndConsume(set: [ABab]) 0x000001C19403C168[+0x008](0x008) [Inst]: 27000000 00000000 0x000001C19403C170[+0x010](0x028) [SetMixin<false>]: 00000000 00000000 00000000 00000000 06000000 06000000 00000000 00000000 00000000 00000000 L0078: (0x038 bytes) ChompSet<Plus>(set: [Cc]) 0x000001C19403C1A0[+0x008](0x008) [Inst]: 4c000000 00000000 0x000001C19403C1A8[+0x010](0x028) [SetMixin]: 00000000 00000000 00000000 00000000 08000000 08000000 00000000 00000000 00000000 00000000 L00b0: (0x040 bytes) TryMatchSet(set: [DEde], failLabel: L0110) 0x000001C19403C1D8[+0x008](0x008) [Inst]: 58000000 00000000 0x000001C19403C1E0[+0x010](0x028) [SetMixin<false>]: 00000000 00000000 00000000 00000000 30000000 30000000 00000000 00000000 00000000 00000000 0x000001C19403C208[+0x038](0x004) [TryMixin]: 10010000 L00f0: (0x020 bytes) DefineGroupFixed(groupId: 1, length: 1, noNeedToSave: true) 0x000001C19403C218[+0x008](0x008) [Inst]: 3a000000 00000000 0x000001C19403C220[+0x010](0x004) [GroupMixin]: 01000000 0x000001C19403C224[+0x014](0x004) [FixedLengthMixin]: 01000000 0x000001C19403C228[+0x018](0x001) [NoNeedToSaveMixin]:01 L0110: (0x010 bytes) Succ() 0x000001C19403C238[+0x008](0x008) [Inst]: 02000000 00000000 0x000001C19403C230[+0x000](0x010) [NopInst]: (no unique data -- skipping) } } ``` # Background and Future Opportunities The following is some background on what you're seeing above and possible optimizations this exercise has illuminated. The bytecode is constructed by using placement-new to allocate structs representing bytecode operations into a memory buffer. Since every opcode (aka instruction "tag") value fits into a single byte (at present), the above view makes it clear how much internal fragmentation results from adding padding for small quantities like the opcode (aka instruction tag), and small quantities. For many opcodes for small regexes, many quantities also fit into a single byte or a short, instead of an int32, so there is a lot of support for the idea of adding variable-length encoding to save space. Some portions of the opcode data like SetMixin are wasteful by always including space for a pointer (when we might know from the AST that this is guaranteed to be `nullptr` before compiling the bytecode), and we could split that opcode into two separate operations to save the 8 bytes used for the pointer when it is not needed. This would also remove a branch from execution and save on runtime cost. Additionally, the bit vector representing the first 256 characters may be wasteful because 128-255 are not ASCII and characters in that range may be uncommon enough that it would be more efficient overall to encode that information in the same data structure used for other unicode code points (the data structure stored by the aforementioned sometimes-null pointer). Many opcodes encode one or more booleans which get padded out to 4 or 8 bytes (depending) for a single bit of information. Encoding that information in a template <bool> param would follow the practice of other places in the bytecode and significantly save on the size of those bytecode layouts. All of the bytecodes start at offset [+0x008] because I'm skipping over the Debug-only vtable pointer when displaying the layout. I'm planning on refactoring out the virtual `Print` method (in favor of a static alternative) because this code already uses static dispatch based on the bytecode's opcode in many places, and this vtable pointer distorts how the bytecode would be laid-out in production. Throughout this exercise, I've been bouncing ideas off of @Penguinwizzard. Thanks to him for listening and reviewing my work in progress!
…BoundaryTest's isNegation, converting bools to template parameters. Merge pull request #4066 from dilijev:rebc-rm-bool-hardfail Follow up to #3915 as part of the "remove booleans" opportunity. Other work on removing the rest of the booleans is in progress but more invasive. This PR also: * removes virtual Print from `Inst` so that Debug build `Inst`s don't have a vtable pointer to disrupt the bytecode layout. * includes other changes to -RegexBytecodeDebug output to help view important aspects of bytecode layout. I don't yet have data about runtime and memory improvement with this change, but since we are removing some branches through the use of a boolean template instead of a boolean member variable and decreasing memory by removing the boolean and all of the associated padding (4 or 8 bytes, depending) any time these bytecodes are used, there should be moderate gains in both runtime and memory. BOITest / BOIHardFailTest / EOITest / EOIHardFailTest / WordBoundaryTest / NegatedWordBoundaryTest go from 0x8 to 0x4 bytes. (Opcode is 32 bit and the bool gets padded out 4 bytes.) Before/after snippet for BOITest, for `/(^token|foo)/` (running as `ch.exe -RegexDebug -RegexBytecodeDebug test.js`): Before: ``` L0010: (0x008 bytes) BOITest(hardFail: false) 0x000001F4591AC0B0[+0x000](0x004) [Inst]: 0c000000 0x000001F4591AC0B4[+0x004](0x001) [HardFailMixin]:00 L0018: (0x00c bytes) MatchLiteral(literal: "token") ``` After: ``` L0010: (0x004 bytes) BOITest(<hardFail>: false) 0x0000020E2BECC0B0[+0x000](0x004)(size:0x04)(align:0x04) [Inst]: 0d000000 L0014: (0x00c bytes) MatchLiteral(literal: "token") ``` Again for `/\b\B/`: Before: ``` L0000: (0x008 bytes) WordBoundaryTest() 0x0000023973F34100[+0x000](0x004) [Inst]: 10000000 0x0000023973F34100[+0x000](0x008) [WordBoundaryTestInst]: 10000000 00000000 L0008: (0x008 bytes) WordBoundaryTest() 0x0000023973F34108[+0x000](0x004) [Inst]: 10000000 0x0000023973F34108[+0x000](0x008) [WordBoundaryTestInst]: 10000000 01000000 L0010: (0x004 bytes) Succ() ``` After: ``` L0000: (0x004 bytes) WordBoundaryTest() 0x0000019F284AC190[+0x000](0x004)(size:0x04)(align:0x04) [Inst]: 13000000 L0004: (0x004 bytes) NegatedWordBoundaryTest() 0x0000019F284AC194[+0x000](0x004)(size:0x04)(align:0x04) [Inst]: 12000000 L0008: (0x004 bytes) Succ() ```
(Note: I'll rebase this to a single commit shortly.)
-RegexDebug
displays various views of RegExp compilation in ChakraCore. In an effort to improve the memory efficiency of the RegExp bytecode that makes up our compiled RegExp programs, this change adds a new view under the additional flag -RegexBytecodeDebug to view the bytecode binary layout.For example, for this test case:
Running
ch.exe -RegexDebug
will display some output including the AST which is used to generate the bytecode. The bytecode is displayed last (in a human-readable disassembly), and looks like this:Now you can run
ch.exe -RegexDebug -RegexBytecodeDebug
and see the following more detailed view of the disassembly which includes the actual bytecode layout.Background and Future Opportunities
The following is some background on what you're seeing above and possible optimizations this exercise has illuminated.
The bytecode is constructed by using placement-new to allocate structs representing bytecode operations into a memory buffer. Since every opcode (aka instruction "tag") value fits into a single byte (at present), the above view makes it clear how much internal fragmentation results from adding padding for small quantities like the opcode (aka instruction tag), and small quantities.
For many opcodes for small regexes, many quantities also fit into a single byte or a short, instead of an int32, so there is a lot of support for the idea of adding variable-length encoding to save space.
Some portions of the opcode data like SetMixin are wasteful by always including space for a pointer (when we might know from the AST that this is guaranteed to be
nullptr
before compiling the bytecode), and we could split that opcode into two separate operations to save the 8 bytes used for the pointer when it is not needed. This would also remove a branch from execution and save on runtime cost.Additionally, the bit vector representing the first 256 characters may be wasteful because 128-255 are not ASCII and characters in that range may be uncommon enough that it would be more efficient overall to encode that information in the same data structure used for other unicode code points (the data structure stored by the aforementioned sometimes-null pointer).
Many opcodes encode one or more booleans which get padded out to 4 or 8 bytes (depending) for a single bit of information. Encoding that information in a template param would follow the practice of other places in the bytecode and significantly save on the size of those bytecode layouts.
All of the bytecodes start at offset [+0x008] because I'm skipping over the Debug-only vtable pointer when displaying the layout. I'm planning on refactoring out the virtual
Print
method (in favor of a static alternative) because this code already uses static dispatch based on the bytecode's opcode in many places, and this vtable pointer distorts how the bytecode would be laid-out in production.Throughout this exercise, I've been bouncing ideas off of @Penguinwizzard. Thanks to him for listening and reviewing my work in progress!