What happened?
There are several spots in the critical (streaming) bundle execution path that serialize execution of all threads. Importantly are:
ExecutionStateSampler.add/removeTracker
ProxyInvocationHandler.as
DoFnSignatures.getSignature
ByteBuddyDoFnInvokerFactory.getByteBuddyInvokerConstructor
A JFR recording of a typical streaming load shows this:

These functions should be optimized to not require a coarse-grained lock when executing.
Issue Priority
Priority: 2
Issue Component
Component: runner-core