Better uop coverage in the JIT optimizer #131798

Open
brandtbucher opened this issue Mar 27, 2025 · 1 comment
Labels: 3.14, interpreter-core, performance, topic-JIT

brandtbucher commented Mar 27, 2025

Of the 263 total uops, 155 are ignored by the tier two optimizer. Together, these account for over half of all uops by dynamic execution count.

This issue will serve as a checklist for auditing these missing uops and adding them where they make sense. At first glance, there's quite a bit of potential here... especially around the ability to narrow known output types (like _CONTAINS_OP_SET), and the ability to narrow and remove guards on input types (like _BINARY_OP_SUBSCR_LIST_INT). As I go through, I'll cross out anything that doesn't seem to make sense to add.
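To make the two ideas above concrete, here is a rough sketch of how knowing an output type, or having already passed a guard, lets later guards be dropped. This is a toy model in Python, not the actual tier two optimizer: the `PUSH` and `GUARD_TYPE` ops and the `optimize` function are hypothetical stand-ins invented for illustration, and only `_CONTAINS_OP_SET` is a real uop name, modeled here as "pop two values, push a bool".

```python
# Toy model only. The op names "PUSH" and "GUARD_TYPE" and the function
# optimize() are hypothetical; they are not part of CPython.
UNKNOWN = None  # abstract value meaning "type not known"


def optimize(trace):
    """Return a copy of the trace with provably redundant type guards removed."""
    stack = []      # abstract stack: the known type of each slot, or UNKNOWN
    optimized = []
    for op, *args in trace:
        if op == "PUSH":
            # Hypothetical: push a value whose type may or may not be known.
            stack.append(args[0])
        elif op == "GUARD_TYPE":
            # Hypothetical stand-in for guards such as _GUARD_NOS_INT.
            depth, expected = args  # depth 1 = top of stack, 2 = next on stack
            if stack[-depth] is expected:
                continue  # type already proven: the guard can be dropped
            # The guard stays, and everything after it may assume the type.
            stack[-depth] = expected
        elif op == "_CONTAINS_OP_SET":
            # Modeled as: pop two inputs, push a result known to be a bool.
            stack.pop()
            stack.pop()
            stack.append(bool)
        else:
            raise ValueError(f"unmodeled op: {op}")
        optimized.append((op, *args))
    return optimized


trace = [
    ("PUSH", UNKNOWN),
    ("PUSH", UNKNOWN),
    ("GUARD_TYPE", 2, int),   # kept: narrows the second slot to int
    ("GUARD_TYPE", 2, int),   # dropped: the slot is already known to be int
    ("_CONTAINS_OP_SET",),    # narrows the output to bool
    ("GUARD_TYPE", 1, bool),  # dropped: the output type is already known
]
optimized = optimize(trace)
for entry in optimized:
    print(entry)
```

In this sketch, a uop that the optimizer ignores would have to reset everything it touches to `UNKNOWN`, which is why covering more uops can let more guards be removed downstream.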

First, here are the 53 missing uops that each represent at least 0.1% of all uops executed:

  • _SET_IP (12.1%)
  • _CHECK_VALIDITY (10.1%)
  • _CHECK_VALIDITY_AND_SET_IP (6.5%)
  • _CHECK_PERIODIC (3.1%)
  • _MAKE_WARM (2.8%)
  • _START_EXECUTOR (1.7%)
  • _GUARD_NOS_INT (1.5%)
  • _BINARY_OP_SUBSCR_LIST_INT (1.0%)
  • _CHECK_FUNCTION (1.0%)
  • _CHECK_MANAGED_OBJECT_HAS_VALUES (0.7%)
  • _ITER_CHECK_LIST (0.7%)
  • _CONTAINS_OP_SET (0.6%)
  • _FOR_ITER_TIER_TWO (0.6%)
  • _GUARD_NOT_EXHAUSTED_LIST (0.6%)
  • _ITER_NEXT_LIST_TIER_TWO (0.6%)
  • _SAVE_RETURN_OFFSET (0.6%)
  • _CALL_LEN (0.5%)
  • _CALL_LIST_APPEND (0.5%)
  • _POP_TOP (0.5%)
  • _RESUME_CHECK (0.5%)
  • _BINARY_OP_SUBSCR_STR_INT (0.4%)
  • _GUARD_DORV_VALUES_INST_ATTR_FROM_DICT (0.4%)
  • _GUARD_KEYS_VERSION (0.4%)
  • _BINARY_OP_SUBSCR_DICT (0.3%)
  • _CALL_BUILTIN_FAST (0.3%)
  • _CHECK_STACK_SPACE_OPERAND (0.3%)
  • _GET_ITER (0.3%)
  • _STORE_SUBSCR (0.3%)
  • _GUARD_NOT_EXHAUSTED_RANGE (0.2%)
  • _BINARY_SLICE (0.2%)
  • _BUILD_LIST (0.2%)
  • _CALL_BUILTIN_O (0.2%)
  • _CALL_NON_PY_GENERAL (0.2%)
  • _CHECK_IS_NOT_PY_CALLABLE (0.2%)
  • _GUARD_NOS_FLOAT (0.2%)
  • _ITER_CHECK_RANGE (0.2%)
  • _ITER_CHECK_TUPLE (0.2%)
  • _LOAD_DEREF (0.2%)
  • _STORE_SUBSCR_LIST_INT (0.2%)
  • _BINARY_OP_EXTEND (0.1%)
  • _CALL_ISINSTANCE (0.1%)
  • _CALL_METHOD_DESCRIPTOR_FAST (0.1%)
  • _CALL_METHOD_DESCRIPTOR_FAST_WITH_KEYWORDS (0.1%)
  • _CALL_METHOD_DESCRIPTOR_NOARGS (0.1%)
  • _CALL_TYPE_1 (0.1%)
  • _CHECK_ATTR_CLASS (0.1%)
  • _CONTAINS_OP_DICT (0.1%)
  • _GUARD_BINARY_OP_EXTEND (0.1%)
  • _GUARD_NOT_EXHAUSTED_TUPLE (0.1%)
  • _ITER_NEXT_TUPLE (0.1%)
  • _LIST_APPEND (0.1%)
  • _STORE_ATTR_SLOT (0.1%)
  • _STORE_SUBSCR_DICT (0.1%)
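
As a quick sanity check on the "over half" figure, the shares in the list above can be totalled. The snippet below is only a sketch: the values are copied from the list (grouped by share, in tenths of a percent to keep the arithmetic exact), and since the listed percentages are rounded, the total is approximate.

```python
# Each entry is (share in tenths of a percent, number of uops with that share),
# copied from the list above.
SHARES = [
    (121, 1),  # _SET_IP
    (101, 1),  # _CHECK_VALIDITY
    (65, 1),   # _CHECK_VALIDITY_AND_SET_IP
    (31, 1),   # _CHECK_PERIODIC
    (28, 1),   # _MAKE_WARM
    (17, 1),   # _START_EXECUTOR
    (15, 1),   # _GUARD_NOS_INT
    (10, 2),   # uops at 1.0%
    (7, 2),    # uops at 0.7%
    (6, 5),    # uops at 0.6%
    (5, 4),    # uops at 0.5%
    (4, 3),    # uops at 0.4%
    (3, 5),    # uops at 0.3%
    (2, 11),   # uops at 0.2%
    (1, 14),   # uops at 0.1%
]

count = sum(n for _, n in SHARES)
total_tenths = sum(share * n for share, n in SHARES)

print(count)              # 53
print(total_tenths / 10)  # 52.5
```

So these 53 uops alone add up to roughly 52.5% of all uops executed, before counting the smaller ones.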

And here are the 102 missing uops that each represent less than 0.1%. These are less important, but may still net us some wins on individual benchmarks:

  • _BINARY_OP_SUBSCR_CHECK_FUNC
  • _BINARY_OP_SUBSCR_TUPLE_INT
  • _BUILD_MAP
  • _BUILD_SET
  • _BUILD_SLICE
  • _BUILD_STRING
  • _CALL_BUILTIN_CLASS
  • _CALL_BUILTIN_FAST_WITH_KEYWORDS
  • _CALL_INTRINSIC_1
  • _CALL_INTRINSIC_2
  • _CALL_KW_NON_PY
  • _CALL_METHOD_DESCRIPTOR_O
  • _CALL_STR_1
  • _CALL_TUPLE_1
  • _CHECK_ATTR_METHOD_LAZY_DICT
  • _CHECK_EG_MATCH
  • _CHECK_EXC_MATCH
  • _CHECK_FUNCTION_VERSION_INLINE
  • _CHECK_FUNCTION_VERSION_KW
  • _CHECK_IS_NOT_PY_CALLABLE_KW
  • _CHECK_METHOD_VERSION
  • _CHECK_METHOD_VERSION_KW
  • _CHECK_PERIODIC_IF_NOT_YIELD_FROM
  • _CONVERT_VALUE
  • _COPY_FREE_VARS
  • _DELETE_ATTR
  • _DELETE_DEREF
  • _DELETE_FAST
  • _DELETE_GLOBAL
  • _DELETE_NAME
  • _DELETE_SUBSCR
  • _DEOPT
  • _DICT_MERGE
  • _DICT_UPDATE
  • _END_FOR
  • _END_SEND
  • _ERROR_POP_N
  • _EXIT_INIT_CHECK
  • _EXPAND_METHOD
  • _EXPAND_METHOD_KW
  • _FATAL_ERROR
  • _FORMAT_SIMPLE
  • _FORMAT_WITH_SPEC
  • _GET_AITER
  • _GET_ANEXT
  • _GET_AWAITABLE
  • _GET_LEN
  • _GET_YIELD_FROM_ITER
  • _GUARD_DORV_NO_DICT
  • _GUARD_GLOBALS_VERSION
  • _GUARD_TOS_FLOAT
  • _GUARD_TOS_INT
  • _GUARD_TYPE_VERSION_AND_LOCK
  • _IMPORT_FROM
  • _IMPORT_NAME
  • _IS_NONE
  • _LIST_EXTEND
  • _LOAD_ATTR_NONDESCRIPTOR_NO_DICT
  • _LOAD_ATTR_NONDESCRIPTOR_WITH_VALUES
  • _LOAD_BUILD_CLASS
  • _LOAD_COMMON_CONSTANT
  • _LOAD_FAST_LOAD_FAST
  • _LOAD_FROM_DICT_OR_DEREF
  • _LOAD_GLOBAL
  • _LOAD_GLOBAL_BUILTINS
  • _LOAD_GLOBAL_MODULE
  • _LOAD_LOCALS
  • _LOAD_NAME
  • _LOAD_SUPER_ATTR_ATTR
  • _LOAD_SUPER_ATTR_METHOD
  • _MAKE_CALLARGS_A_TUPLE
  • _MAKE_CELL
  • _MAKE_FUNCTION
  • _MAP_ADD
  • _MATCH_CLASS
  • _MATCH_KEYS
  • _MATCH_MAPPING
  • _MATCH_SEQUENCE
  • _MAYBE_EXPAND_METHOD_KW
  • _NOP
  • _POP_EXCEPT
  • _POP_TWO_LOAD_CONST_INLINE_BORROW
  • _PUSH_EXC_INFO
  • _PUSH_NULL_CONDITIONAL
  • _SETUP_ANNOTATIONS
  • _SET_ADD
  • _SET_FUNCTION_ATTRIBUTE
  • _SET_UPDATE
  • _STORE_ATTR
  • _STORE_ATTR_INSTANCE_VALUE
  • _STORE_ATTR_WITH_HINT
  • _STORE_DEREF
  • _STORE_FAST_LOAD_FAST
  • _STORE_FAST_STORE_FAST
  • _STORE_GLOBAL
  • _STORE_NAME
  • _STORE_SLICE
  • _TIER2_RESUME_CHECK
  • _UNARY_INVERT
  • _UNARY_NEGATIVE
  • _UNPACK_SEQUENCE_LIST
  • _WITH_EXCEPT_START

brandtbucher commented

@diegorusso is going to take _CALL_LEN.
