Possible JIT optimizations #7588

Open
4 of 22 tasks
headius opened this issue Jan 20, 2023 · 0 comments


headius commented Jan 20, 2023

This is a list of optimizations that I think could be made in the JIT but that require work beyond the code the JIT emits (i.e. specialized invokedynamic call sites or helper code).

  • Calls to block_given? currently cause the method to deoptimize, since we need the method's frame to be able to retrieve the block from RubyKernel#block_given_p. This is unnecessary when we are in a normal method scope; we can just check the passed-in block directly, avoiding the deoptimization. This requires work in IR (use BlockGivenInstr rather than a call when in a method scope, possibly with a guard for a user-defined block_given?). (Implemented for bare block_given? calls in "Implement block_given? call as optimized instruction", #8170.)
  • Simplify BNEInstr forms to eliminate non-identity comparisons #8189
  • BuildCompoundStringInstr emits a lot of code for each element being appended. This could be a single indy call with all inputs (if the order of evaluation and appending is not important) or N indy calls that do all of the coercion and appending in one shot.
    • Part of "JIT size and perf improvements" (#7589) reduces allocation and bytecode by pushing frozen strings for components, but it still uses encCrStrBufCat, which has a lot of complex logic for encoding and CR negotiation that is not really needed here.
    • The bulk of this work will land with "Optimizations for dynamic string building" (#8180), which pushes most of the string-building work into an invokedynamic call site, eliminating all static string and append code in the jitted code.
  • Keyword arguments "setCallInfo" could be rolled into the call operation itself; with indy that would eliminate the extra bytecode altogether, and without indy it could still eliminate the flags push by having specialized "setCallInfo" for different flags. This will eventually be moot once we push kwarg descriptors through all call paths. (Completed in "More indy call optz", #7720.)
  • Interpolated strings could profile their final length, allocating that length for future interpolations. This would eliminate all but the first allocation of the resulting string data. It should be designed with some safety tolerances in place, e.g. not allocating gigantic strings forever because one case produced a gigantic string.
  • BuildRangeInstr could have specialized versions for embeddable literals as either begin or end, avoiding the bytecode needed to emit such values only to consume them in the Range. (Implemented for fixnum and string ranges in "Simplify fixnum and string ranges", #8176. Additional tweaks also handled endless and beginless fixnum ranges. We will wait and see if any other forms are useful.)
  • Class variables are currently uncached and not structured in a way that would lend itself to caching. More needs to be done than in just bytecode, but these could potentially be cached forever since they rarely refer to multiple values from a given call site.
  • Global variables are likewise largely uncached, due to races and design issues with the current structure used to store them. True global variables could be cached nearly forever, and local global variables can be compiled to less intrusive state accesses.
  • method_missing may be poorly optimized in the indy JIT, and has only basic optimizations (caching) in non-indy JIT. Ideally it should inline any trivial Ruby method_missing target.
  • Specialized return values to reduce bytecode: for example, a method with no result could be called as void to avoid popping, or a method immediately used in a conditional could be called with a boolean return value and avoid calling isTrue. Calls guaranteed to return specific types could return those types and avoid a checkcast.
  • Core methods with multiple arities currently only publish one of those arities as a directly-callable path, with other arities forced through DynamicMethod.call. All direct-callable paths should be available to indy call sites for direct invocation.
  • Splitting of block-receiving methods and polymorphic methods, similar to TruffleRuby.
  • Java methods are no longer optimized in invokedynamic call sites. Further, they never handled more than one arity when they were optimized. (Basic support restored in "Java call optimizations", #7789.)
  • Restore direct indy binding of user-defined method_missing. This was removed temporarily in "Fix recent regressions on master" (#7797) because it broke the argument list aggregated by a core method_missing error (which showed up during "Add more testing for invokedynamic modes", #7732).
  • Leaf closure scopes should never need to push a new DynamicScope. Currently this works unless the closure body contains an instruction that must access the parent dynamic scope itself (not its variables). For example, adding a non-local return to an otherwise leafy scope will force it to allocate and use its own DynamicScope (see "early return from a block is slow", #5933).
  • Proper shape caching and per-object shapes. Some work has been done toward this end in "Improvements to instance variable shaping" (#7516), with an old bug describing a need for shared shape caching in "Instance vars on dup'ed classes should cache the same" (#156).
  • String shaping optimizations. Frozen strings already have some specialized shapes, but we could do more to cache hashcode etc inside those different shapes. We also could implement "embedded" strings that put small strings into the "header" of the object as in CRuby. See https://bugs.ruby-lang.org/issues/20415 for an example. An attempt that does not appear to work is here: https://gist.github.com/headius/b4a8967b7e3bfbc9dc7aab7d5fa491ec
  • Optimized argument forwarding with ... as described in https://bugs.ruby-lang.org/issues/20425.
  • Literal collections with literal elements could use reduced bytecode by embedding the literals into the indy call site that constructs the collection (e.g. [1, 2, 3] could be a single indy instruction with embedded longs). Similar ideas were implemented for CRuby in "Optimize compilation of large literal arrays" (ruby/ruby#9721).
  • Super methods, refined methods and sends do not inline. Super will usually be monomorphic, or low-morphic. Refined methods will usually be monomorphic, since once bound to a scope they will remain bound to that scope. Sends have a potential to become megamorphic but will frequently be used for only a few targets; even megamorphic cases could be optimized better via a dispatch chain or balanced search tree.
  • The "normal" compilers in the JIT need more testing and could be optimized better; cached lazy values could be static final and the script could construct them on initialization so they would fold.
  • Methods that access other frame data need specialized call sites. A few examples: __dir__ and attrs need access to the file ("Inconsistency between MRI and JRuby source location.", #8079), refined methods need access to the scope. These currently force either a frame or a dynamic scope or need a backtrace (not provided by JIT). None of them should need to trigger deopt just to pass readily-available values.
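
To make the block_given? item at the top of the list concrete, here is a small Ruby-level sketch of the two forms it compares. It only illustrates the semantics; the actual change lives in JRuby's IR and JIT, and the method and class names here (each_twice_frame, each_twice_direct, Overrider) are made up for illustration.

```ruby
# Form 1: relies on block_given?, which per the list above currently
# needs the method's frame to find the block.
def each_twice_frame(value)
  return :no_block unless block_given?
  2.times { yield value }
  :yielded
end

# Form 2: the same check made directly against the passed-in block,
# which is roughly what a BlockGivenInstr could do in a method scope.
def each_twice_direct(value, &block)
  return :no_block if block.nil?
  2.times { block.call(value) }
  :yielded
end

# Why a guard may be needed: a class can define its own block_given?,
# and a bare call inside that class must still reach the user's version.
class Overrider
  def block_given?
    false
  end

  def run
    block_given? ? :block : :no_block
  end
end
```

Both forms behave the same for callers; Overrider#run shows the case where replacing the call with a direct block check would change behavior unless guarded.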
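
The interpolated-string item above (profile the final length, with a safety tolerance) could look something like the following sketch. This is not JRuby code: InterpolationSite, MAX_HINT, and the cap value are all assumptions chosen for illustration, and a real implementation would live in the call site rather than in a Ruby object.

```ruby
# Hypothetical stand-in for one interpolation site. It remembers the largest
# result size seen so far and preallocates that capacity on later builds.
class InterpolationSite
  # Assumed safety cap: one huge result must not inflate every later allocation.
  MAX_HINT = 4096

  attr_reader :capacity_hint

  def initialize
    @capacity_hint = 0
  end

  def build(*parts)
    # Preallocate using the profiled size so appends rarely need to grow the buffer.
    buffer = String.new("", capacity: @capacity_hint, encoding: Encoding::UTF_8)
    parts.each { |part| buffer << part.to_s }

    # Profile the observed size, clamped to the cap.
    observed = [buffer.bytesize, MAX_HINT].min
    @capacity_hint = observed if observed > @capacity_hint
    buffer
  end
end
```

A usage example: after `site.build("Hello, ", :world, "!")` the hint is the 13 bytes of the result, and a later 10,000-byte result only raises the hint to the cap. A fuller design might also let the hint decay over time.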
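
For the item above on sends, the "dispatch chain" idea can be sketched in plain Ruby as a small per-site cache keyed on receiver class and method name. SendSite and MAX_CHAIN are hypothetical names, and this sketch deliberately ignores cache invalidation on method redefinition, singleton methods, and method_missing, all of which a real call site would have to handle.

```ruby
# Hypothetical per-call-site cache for `send`-style dispatch.
class SendSite
  # Assumed chain length limit; beyond it, lookups simply are not cached.
  MAX_CHAIN = 4

  attr_reader :hits, :misses

  def initialize
    @chain = [] # entries of [receiver class, method name, UnboundMethod]
    @hits = 0
    @misses = 0
  end

  def call(receiver, name, *args)
    klass = receiver.class

    # Walk the chain looking for a previously seen (class, name) pair.
    entry = @chain.find { |k, n, _m| k.equal?(klass) && n == name }
    if entry
      @hits += 1
      return entry[2].bind_call(receiver, *args)
    end

    # Slow path: full lookup, then remember the target if there is room.
    @misses += 1
    target = klass.instance_method(name)
    @chain << [klass, name, target] if @chain.size < MAX_CHAIN
    target.bind_call(receiver, *args)
  end
end
```

With this sketch, two calls to `:upcase` on different strings share one cached entry, while a call to `:+` on an Integer adds a second entry to the chain. A balanced search tree, as the item mentions, would replace the linear `find` for sites with many targets.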
@headius headius added this to the JRuby 9.4.1.0 milestone Jan 20, 2023
@enebo enebo modified the milestones: JRuby 9.4.1.0, JRuby 9.4.2.0 Jan 31, 2023
@headius headius modified the milestones: JRuby 9.4.2.0, JRuby 9.4.3.0 Feb 28, 2023
headius added a commit to headius/jruby that referenced this issue May 12, 2023
Still only supports one arity with no varargs or block conversion.
This code seems to have broken at some point as it used the wrong
signature to check for a block argument and did exactly the
wrong thing when comparing arity of a static or instance method.

This implements the first part of Java method call optimization
as mentioned in jruby#7588.
headius added a commit to headius/jruby that referenced this issue May 23, 2023
Still only supports one arity with no varargs or block conversion.
This code seems to have broken at some point as it used the wrong
signature to check for a block argument and did exactly the
wrong thing when comparing arity of a static or instance method.

This implements the first part of Java method call optimization
as mentioned in jruby#7588.
@headius headius modified the milestones: JRuby 9.4.3.0, JRuby 9.4.4.0 May 24, 2023
@headius headius modified the milestones: JRuby 9.4.4.0, JRuby 9.5.0.0 Oct 9, 2023
headius added a commit to headius/jruby that referenced this issue Mar 30, 2024
Fixnum and String ranges can be reduced in complexity by doing
all of the construction in one go rather than each piece
individually. A Range of fixnum..fixnum now uses indy to pass the
long values to the bootstrap, avoiding bytecode to construct the
fixnum objects. A Range of string..string embeds the bytelist and
CR into the bootstrap parameters, for the same result. Both also
have simplified forms in the non-indy JIT mode.

From ideas list in jruby#7588.