Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always return unaltered VM dispatch (PHP 7) #762

Merged
merged 10 commits into from
Apr 14, 2020

Conversation

SammyK
Copy link
Contributor

@SammyK SammyK commented Feb 19, 2020

Description

This PR addresses part two of a two-part refactor of the opcode handlers. The the first part was merged in #754. This PR changes the way the tracer hooks into the engine for instrumented calls for PHP 7. PHP 5 will be addressed in a separate PR.

Before this PR, instrumented calls would be forwarded from within the custom handlers for ZEND_DO_FCALL and friends. This required the handlers to return ZEND_USER_OPCODE_LEAVE to the VM which would prevent any downstream neighboring extensions from reliably hooking the function call opcodes.

This PR will start the span and run any prehook tracking closures in the function-call handlers and immediately return ZEND_USER_OPCODE_DISPATCH to the VM to resume normal execution. The span is stopped by hooking return opcodes. Exceptions are tracked via the ZEND_HANDLE_EXCEPTION pseudo opcode. Since all the opcode handlers allow the VM to continue its expected path unaltered, downstream neighboring extensions can reliably hook the function call opcodes for every call.

Nice-Neighbor Framework Results

This change was tested on the nice-neighbor framework to ensure that the new design allowed neighboring extensions to function without adverse effects from the tracer.

dd_fcall_dump

The fake dd_fcall_dump extension prints all function-call related opcodes. This is a nice neighbor that runs any upstream neighboring handlers before running its own. If it detects a change in the VM state (if the upstream handlers return anything other than ZEND_USER_OPCODE_DISPATCH) it will not run its own handler.

Before this change, when dd_fcall_dump is loaded after ddtrace, the following output is produced when running drop_spans.phpt:

OPCODE (ZEND_DO_ICALL -> date_default_timezone_set)
OPCODE (ZEND_DO_ICALL -> dd_trace_function)
OPCODE (ZEND_DO_ICALL -> dd_trace_method)
Traced array_sum
OPCODE (ZEND_RETURN)
VM state change detected. Cannot handle OPCODE (ZEND_INIT_FCALL)
Traced array_sum
OPCODE (ZEND_RETURN)
VM state change detected. Cannot handle OPCODE (ZEND_NEW)
Traced DateTime
OPCODE (ZEND_DO_FCALL -> DateTime::format)
OPCODE (ZEND_RETURN)
VM state change detected. Cannot handle OPCODE (ZEND_FREE)
Traced DateTime
OPCODE (ZEND_DO_FCALL -> DateTime::format)
OPCODE (ZEND_RETURN)
VM state change detected. Cannot handle OPCODE (ZEND_FREE)
OPCODE (ZEND_DO_ICALL -> dd_trace_serialize_closed_spans)
OPCODE (ZEND_DO_ICALL -> array_map)
DateTime: 2000-01-01
OPCODE (ZEND_RETURN)
ArraySum: 6
OPCODE (ZEND_RETURN)

The "VM state change detected" messages are from dd_fcall_dump. This reveals two fundamental issues.

  1. For instrumented calls, ddtrace is altering the VM state (returning ZEND_USER_OPCODE_LEAVE from the opcode handler), and dd_fcall_dump is unable to reliably hook those opcodes.
  2. By design, extensions that alter the VM state and return ZEND_USER_OPCODE_LEAVE must manually increment the opline EX(opline)++. However, neighboring extension's opcode handlers are still run for the original opcode with the execute_data mutated. For example, in the message above "VM state change detected. Cannot handle OPCODE (ZEND_INIT_FCALL)", that message was generated by the ZEND_DO_ICALL opcode handler. The dd_fcall_dump extension does not hook ZEND_INIT_FCALL, but since ddtrace mutated the VM state, the ZEND_INIT_FCALL was being passed to its ZEND_DO_ICALL opcode handler.

After this change, the output reflects ddtrace being a nice neighbor by not altering the VM state and allowing dd_fcall_dump to successfully hook the intended opcodes with the proper handlers.

OPCODE (ZEND_DO_FCALL -> date_default_timezone_set)
OPCODE (ZEND_DO_FCALL -> dd_trace_function)
OPCODE (ZEND_DO_FCALL -> dd_trace_method)
OPCODE (ZEND_DO_FCALL -> array_sum)
Traced array_sum
OPCODE (ZEND_RETURN)
OPCODE (ZEND_DO_FCALL -> array_sum)
Traced array_sum
OPCODE (ZEND_RETURN)
OPCODE (ZEND_DO_FCALL -> DateTime::__construct)
Traced DateTime
OPCODE (ZEND_DO_FCALL -> DateTime::format)
OPCODE (ZEND_RETURN)
OPCODE (ZEND_DO_FCALL -> DateTime::__construct)
Traced DateTime
OPCODE (ZEND_DO_FCALL -> DateTime::format)
OPCODE (ZEND_RETURN)
OPCODE (ZEND_DO_FCALL -> dd_trace_serialize_closed_spans)
OPCODE (ZEND_DO_FCALL -> array_map)
DateTime: 2000-01-01
OPCODE (ZEND_RETURN)
ArraySum: 6
OPCODE (ZEND_RETURN)

You might also have noticed that all the ZEND_DO_ICALL opcodes are now being emitted as ZEND_DO_FCALL. This is a direct result of the new design overriding zend_execute_internal which changes the opcode that the compiler emits.

This design does not hook zend_execute_ex in order to avoid the stack limitations that that override introduces. This means userland functions will continue to emit ZEND_DO_UCALL as illustrated by running exit_and_drop_span.phpt.

OPCODE (ZEND_DO_FCALL -> dd_trace_function)
OPCODE (ZEND_DO_UCALL -> foo)
foo()
Dropping span
OPCODE (ZEND_RETURN)
OPCODE (ZEND_EXIT)

Side Effects

Argument Mutation

The primary side effect of this new design relates to the mutability of arguments within an instrumented call. With this change, the tracing closure has access to the same arguments passed to the original call. If the original call mutates an argument, the posthook tracing closure will receive the mutated argument. This is the expected behavior of arguments in PHP 7.

The best way to illustrate this is with an example:

function foo($a) {
    var_dump(func_get_args());
    $a = 'Dogs';
    var_dump(func_get_args());
}

foo('Cats');

On PHP 7, this outputs:

array(1) {
  [0]=>
  string(4) "Cats"
}
array(1) {
  [0]=>
  string(4) "Dogs"
}

This illustrates that arguments in PHP 7 are mutable in a sense. If a posthook tracing closure were to access argument $a, it would be set to "Dogs". Before this change, the tracing closure would receive $a in its state before mutation and would be set to "Cats". If an argument needs to be accessed before mutation, the tracing closure can be run as a prehook to access the arguments before the original call.

So given:

function foo($a) {
    $a = 'Dogs';
}

Posthook example:

dd_trace_function('foo', function ($span, array $args) {
    var_dump($args[0]);
});

foo('Cats');

// string(4) "Dogs"

Prehook example:

dd_trace_function('foo', [
    'prehook' => function ($span, array $args) {
        var_dump($args[0]);
    }
]);

foo('Cats');

// string(4) "Cats"

A few integrations have been changed to prehooks on PHP 7 to circumvent argument mutation. The Symfony integration was changed preceding this PR: #786.

Generators

Another side effect of the new design impacts function calls that return a generator with yield and yield from. A current limitation of the existing design is that generators are only traced until the first yield. Subsequent yield's in an iteration are not instrumented. The following code illustrates this limitation:

function getResults() {
    yield 1;
    yield 2;
    yield 3;
}

dd_trace_function('getResults', function() {});

$generator = getResults();
foreach ($generator as $value) {
    echo $value . PHP_EOL;
}

// Instruments getResults() one time, not three times

The same limitation exists in this new design, but now the return value is passed to the tracing closure differently. Before this change, the return value is provided to the tracing closure as an instance of Generator. With the new design, the return values are passed to the tracing closure directly from the opline (before the ZEND_YIELD opcode has run.) Therefore the return value provided to the tracing closure is now the actual value provided by the yield statement, not an instance of Generator. The following code demonstrates this behavior change:

function getResults() {
    yield 1;
    yield 2;
    yield 3;
}

dd_trace_function('getResults', function($s, $a, $retval) {
    var_dump($retval);
});

$generator = getResults();
foreach ($generator as $value) {
    echo $value . PHP_EOL;
}

Existing behavior output:

object(Generator)#3 (0) {
}
1
2
3

New behavior output:

int(1)
1
2
3

Readiness checklist

  • (only for Members) Changelog has been added to the appropriate release draft. Create one if necessary.
  • Tests added for this feature/bug.

Reviewer checklist

  • Appropriate labels assigned.
  • Milestone is set.
  • Changelog has been added to the appropriate release draft. For community contributors the reviewer is in charge of this task.

@SammyK SammyK added 🍏 core Changes to the core tracing functionality c-extension Apply this label to issues and prs related to the C-extension 📦 Sandbox API labels Feb 19, 2020
@SammyK SammyK force-pushed the sammyk/build/unaltered-vm-dispatch branch from 9ab5f7f to ff0e81b Compare February 19, 2020 18:36
@SammyK SammyK force-pushed the sammyk/build/unaltered-vm-dispatch branch from ff0e81b to 75b7660 Compare February 19, 2020 22:31
@SammyK SammyK force-pushed the sammyk/build/unaltered-vm-dispatch branch 3 times, most recently from 69fb1b1 to 64ea72a Compare February 20, 2020 19:38
Comment on lines 93 to 118
if (EXPECTED(Z_TYPE_INFO_P(q) != IS_UNDEF)) {
ZVAL_DEREF(q);
if (Z_OPT_REFCOUNTED_P(q)) {
Z_ADDREF_P(q);
}
} else {
q = &EG(uninitialized_zval);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably EG(uninitialized_zval) is IS_UNDEF; is that right? If so, I think this can be reduced.

Suggested change
if (EXPECTED(Z_TYPE_INFO_P(q) != IS_UNDEF)) {
ZVAL_DEREF(q);
if (Z_OPT_REFCOUNTED_P(q)) {
Z_ADDREF_P(q);
}
} else {
q = &EG(uninitialized_zval);
}
ZVAL_DEREF(q);
Z_TRY_ADDREF_P(q);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strangely enough, EG(uninitialized_zval) is initialized to null. I think that's for BC.

src/ext/php7/engine_hooks.c Show resolved Hide resolved
@SammyK SammyK mentioned this pull request Mar 6, 2020
5 tasks
@SammyK SammyK force-pushed the sammyk/build/unaltered-vm-dispatch branch from 95cb9d1 to b22d6a1 Compare March 9, 2020 15:41
@SammyK SammyK changed the base branch from master to sammyk/symfony-refactor March 9, 2020 15:41
@SammyK SammyK force-pushed the sammyk/build/unaltered-vm-dispatch branch 3 times, most recently from 301112c to 2dc9f84 Compare March 12, 2020 17:13
@SammyK SammyK changed the base branch from sammyk/symfony-refactor to master March 12, 2020 17:14
@SammyK SammyK force-pushed the sammyk/build/unaltered-vm-dispatch branch 7 times, most recently from 4054424 to 8b03b61 Compare March 25, 2020 21:25
@SammyK SammyK force-pushed the sammyk/build/unaltered-vm-dispatch branch 3 times, most recently from f4b254f to e9cae80 Compare March 27, 2020 21:31
@labbati labbati modified the milestones: 0.43.0, 0.44.0 Apr 7, 2020
@SammyK SammyK force-pushed the sammyk/build/unaltered-vm-dispatch branch 4 times, most recently from fd771bc to 6c795fe Compare April 8, 2020 16:02
@SammyK SammyK marked this pull request as ready for review April 9, 2020 13:57
@morrisonlevi
Copy link
Collaborator

It's going to take a while to comb over. In the meantime, can you think of a way to instrument ZEND_DO_ICALL instead of using zend_instrument_internal without the downsides of a noisy neighbor?

@SammyK
Copy link
Contributor Author

SammyK commented Apr 9, 2020

Unfortunately not. Internal function's don't emit a ZEND_RETURN so there is no way to hook the end of the function call to close the span. In a future PR, internal functions will have their own custom handlers but for now we have to hook zend_execute_internal to know when the span ends.

Copy link
Collaborator

@morrisonlevi morrisonlevi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good! I've recommended few small style changes so that delegating to previous handler doesn't get repeated quite so much.

src/ext/php7/engine_hooks.c Outdated Show resolved Hide resolved
src/ext/php7/engine_hooks.c Outdated Show resolved Hide resolved
src/ext/php7/engine_hooks.c Outdated Show resolved Hide resolved
src/ext/php7/engine_hooks.c Outdated Show resolved Hide resolved
src/ext/php7/engine_hooks.c Show resolved Hide resolved
@SammyK SammyK force-pushed the sammyk/build/unaltered-vm-dispatch branch 2 times, most recently from db8325e to c6ff13d Compare April 14, 2020 17:37
@SammyK SammyK force-pushed the sammyk/build/unaltered-vm-dispatch branch from c6ff13d to 8b4b8c2 Compare April 14, 2020 18:18
@SammyK SammyK merged commit b46b9b6 into master Apr 14, 2020
@SammyK SammyK deleted the sammyk/build/unaltered-vm-dispatch branch April 14, 2020 19:37
@SammyK SammyK modified the milestones: 0.44.0, 0.43.0 Apr 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c-extension Apply this label to issues and prs related to the C-extension 🍏 core Changes to the core tracing functionality 🏆 enhancement A new feature or improvement 📦 Sandbox API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants