Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No active span in long running process #1065

Closed
sFreezer opened this issue Oct 26, 2020 · 14 comments
Closed

No active span in long running process #1065

sFreezer opened this issue Oct 26, 2020 · 14 comments
Milestone

Comments

@sFreezer
Copy link

Bug description

we have a long running php process with this config:

DD_TRACE_CLI_ENABLED: true
DD_TRACE_GENERATE_ROOT_SPAN: false
DD_TRACE_AUTO_FLUSH_ENABLED: true

in the formatter for the logging we like to add 'trace_id' and 'span_id':

$span = GlobalTracer::get()->getActiveSpan();
if (null !== $span) {
    $normalizedRecord["dd.trace_id"] = $span->getTraceId();
    $normalizedRecord["dd.span_id"] = $span->getSpanId();
}

but the $span is always null.

In Datadog, we can see that the spans and traces are there. I think the problem here is a bug in the communication with the Php Extension C layer and the Php process. The spans can be managed once through the extension or on PHP layer itself.
With $span = GlobalTracer::get()->getActiveSpan(); I mean only the spans are retrieved on PHP layer

PHP version

PHP 7.4.11 (cli) (built: Oct 22 2020 06:53:01) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
with ddtrace v0.48.3, Copyright Datadog, by Datadog

Installed extensions

[PHP Modules]
bcmath
Core
ctype
curl
date
ddtrace
dom
fileinfo
filter
ftp
gd
hash
iconv
json
libxml
mbstring
mongodb
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
pdo_sqlite
Phar
posix
rdkafka
readline
Reflection
session
SimpleXML
soap
sodium
SPL
sqlite3
standard
tokenizer
xml
xmlreader
xmlwriter
zip
zlib

[Zend Modules]
ddtrace

Diagnostics and configuration

Output of php --ri=ddtrace

DATADOG TRACER CONFIGURATION => {"date":"2020-10-26T16:12:33Z","os_name":"Linux consumer-787cc897dc-btmmk 4.19.112+ #1 SMP Fri Sep 4 12:00:04 PDT 2020 x86_64","os_version":"4.19.112+","version":"0.48.3","lang":"php","lang_version":"7.4.2","env":null,"enabled":true,"service":"pia","enabled_cli":true,"agent_url":"http://10.110.84.74:8126","debug":false,"analytics_enabled":false,"sample_rate":1.000000,"sampling_rules":null,"tags":null,"service_mapping":null,"distributed_tracing_enabled":true,"priority_sampling_enabled":true,"dd_version":null,"architecture":"x86_64","sapi":"cli","ddtrace.request_init_hook":"/opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php","open_basedir_configured":false,"uri_fragment_regex":null,"uri_mapping_incoming":null,"uri_mapping_outgoing":null,"auto_flush_enabled":true,"generate_root_span":false,"http_client_split_by_domain":false,"measure_compile_time":true,"report_hostname_on_root_span":false,"traced_internal_functions":null,"auto_prepend_file_configured":false,"integrations_disabled":null,"enabled_from_env":true,"opcache.file_cache":null}

                           Diagnostics

Diagnostic checks => passed

Directive => Local Value => Master Value
ddtrace.disable => Off => Off
ddtrace.request_init_hook => /opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php => /opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php

@sFreezer sFreezer added the 🐛 bug Something isn't working label Oct 26, 2020
@morrisonlevi
Copy link
Collaborator

Our documentation on connecting logs in PHP has this example using dd_trace_peek_span_id() instead of $span->getSpanId():

$span = \DDTrace\GlobalTracer::get()->getActiveSpan();
$append = sprintf(
    ' [dd.trace_id=%d dd.span_id=%d]',
    $span->getTraceId(),
    \dd_trace_peek_span_id()
);
my_error_logger('Error message.' . $append);

I realize it's inconvenient and unexpected, but it's an officially documented work-around that we'll support until the PHP <-> Extension gap is better handled, which we are working on this quarter and the next.

@labbati
Copy link
Member

labbati commented Oct 26, 2020

If I read the issue correctly, the problem is that the value returned by getActiveSpan() is null, hence we cannot call $span->getTraceId() on it.

The reason for it is that in case of long running scripts the root span is internal, while typically, for web requests and short leaved scripts, the root span is a user-land span, hence available from getActiveSpan(). Beside the internal vs userland (we should go too much into details) this issue clearly shows that we are not compatible with log injection for long running processes when the user follows our recommendation.

We can have a workaround and then I can provide mid and long term solutions.

Workaround/short-term

Instead of using automatic root span creation like I assume you (correctly) do today, e.g.

DDTrace\trace_function('function_repeted_many_times', function() {
    ....
});

You can create the root span manually, unfortunately in a more invasive way. Instead of doing what's above, you do

function function_repeted_many_times()
{
     // Add this at the begin of your function that represents the root span.
    $tracer = GlobalTracer::get();
    $scope = $tracer->startActiveSpan('operation_name');
    $span = $scope->getSpan();
    // ... set resource, name, type, etc...

    // here is your existing code

    // Then manually close the scope and flush.
   $scope->close();
   $tracer->flush();
   $tracer->reset();
}

You would do this ONLY for the root span, if you have other functions traced, you can leave them as they are, with the 'correct' automatic approach.

You might add checks to only execute our code if the tracer is installed, just to be sure, but you got the idea.

If you do this, then your root span will be populated and you can run the snippet as suggested by @morrisonlevi.

mid-term
We can store the trace id when we generate a new internal trace and expose a function \dd_trace_peek_trace_id();. Then once released this change you could remove altogether the code I suggested and simply use the snippet suggested by @morrisonlevi but with \dd_trace_peek_trace_id(); in place of $span->getTraceId();. Basically this would be the same code as you run today.

Unfortunately, I cannot commit to release it with the next release, as we have to prioritize other work.

long-term

In any case we are moving away from userland spans altogether to rely only on internal spans, so this will require no work on your side if you already enrolled in the mid-term solution.

@sFreezer
Copy link
Author

sFreezer commented Oct 27, 2020

in the docs for monolog you check for the activeSpan

$span = \DDTrace\GlobalTracer::get()->getActiveSpan();
if (null === $span) {
    return $record;
}

but i still tried it:

$activeSpan = GlobalTracer::get()->getActiveSpan();
if (null !== $activeSpan) {
    $normalizedRecord["dd.trace_id"] = $activeSpan->getTraceId();
}

$spanId = \dd_trace_peek_span_id();
if (null !== $spanId) {
    $normalizedRecord["dd.span_id"] = $spanId;
}

$activeSpan is null
$spanId is "0"

unfortunately that doesn't help

@labbati
Copy link
Member

labbati commented Oct 28, 2020

Hi @sFreezer, did you have a chance to test the manual approach?

@sFreezer
Copy link
Author

I am unfortunately not yet, I will try to test the next week and then contact you guys again

@sFreezer
Copy link
Author

sFreezer commented Nov 26, 2020

we are using dd_trace_method() to define the methods we like to trace in our long running processes.
so we can't change a thing here, the problem should be in the root span generation of dd_trace_method().

@ben-challis
Copy link

ben-challis commented Dec 2, 2020

We're also hitting this issue and is a major frustration. We're tracing most code via dd_trace_method() but require decoration in places to be able to handle cases where we need to mutate args, or conditionally start the span. F.e. a method signature of subscribe(\Closure $handler) in a context where that method never returns but will call the handler every time it receives a message from a queue, we want to wrap that closure. We'd use GlobalTracer::get()->startActiveSpan() for the handling boundary. This simply does not work at all at present without starting a user-land span as suggested (though that seems to trigger a Reporting of spans failed, status code 400: unexpected-eof in the agent, but traces report). We may be doing something wrong there, but we've copied your example line for line. The error goes away if we simply don't do any user-land tracing. SpanData::$parent_id and ignore_active_span not being available in the C API further adds frustration here, as this functionality doesn't appear to be replicable in the C - distributed tracing is one of your key selling points to us, and we've had it disabled for months at this point due to these issues.

We really need this to be in a situation where the two interop properly, though I can't really see how this'll work in a post-hook C world, and a inner-hook style user-land world? Hopefully we're misunderstanding here as we're paying a decent chunk each month here for constant headaches upgrading, amends to become re-compatible and working around different breaks either intentionally or unintentionally in functionality each time, which is increasingly costly to us.

@morrisonlevi
Copy link
Collaborator

I want to make sure I understand the issues completely:

  • The GlobalTracer API doesn't provide access to the latest trace and span ids if internal spans are used. This has been exasperated over time as we removed more and more of the legacy API.
  • Building off the previous point: when doing long-running processes, there may not be any spans available in the GlobalTracer API and so there isn't any way to fetch the current trace ID.
  • It seems like there is a bug somewhere which is causing "Reporting of spans failed, status code 400: unexpected-eof" in the agent, and this bug seems related to using GlobalTracer. Ben, can you open a separate issue for this with details on how to recreate it?

We can mitigate the first two issues. Luca proposed adding a function like dd_trace peek_trace_id to go along with dd_trace_peek_span_id. Internally created traces and spans will Just Work, and GlobalTracer made ones will work as long as any new traces call dd_trace_set_trace_id($trace_id) , which we can possibly enforce through the current API.

Did I understand everything here @sFreezer and @ben-challis?

@ben-challis
Copy link

Hi @morrisonlevi - you've understood - thanks for explaining how this solution could work, great news.

It seems like there is a bug somewhere which is causing "Reporting of spans failed, status code 400: unexpected-eof" in the agent, and this bug seems related to using GlobalTracer. Ben, can you open a separate issue for this with details on how to recreate it?

Sure thing, will raise a separate issue for this. I took some time yesterday with more trial and error and I believe this is Tracer::flush() when the tracer has no traces to flush + auto flush is enabled. If I remove the flush call it behaves. I'll get a code sample that replicates the issue.

@sFreezer
Copy link
Author

sFreezer commented Dec 4, 2020

@morrisonlevi yes you got it right,

the thing is that dd_trace_peek_span_id it not working -> getting "0" all the time link
So to only add dd_trace_peek_trace_id it not solving the problem completely.

@morrisonlevi
Copy link
Collaborator

Thanks for the response, @sFreezer. In this specific case, I think the behavior seems consistent because there wasn't an active span. Does it seem right to you that there wasn't a span there? If there should be a span, can you create a sample for reproducing it?

@sFreezer
Copy link
Author

sFreezer commented Dec 9, 2020

@morrisonlevi i don't quite understand what else you need from me, i have written all information in this ticket.

@labbati
Copy link
Member

labbati commented Jan 5, 2021

Hi @sFreezer we released 0.53.0 (and updated the docs) to expose a function that correctly returns the trace id in all the various scenarios, please follow this guide: https://docs.datadoghq.com/tracing/connect_logs_and_traces/php/#manually-inject-trace-and-span-ids

The function is \DDTrace\trace_id().

Please confirm is this worked for you.

@labbati labbati added this to the 0.53.0 milestone Jan 5, 2021
@sFreezer
Copy link
Author

sFreezer commented Jan 6, 2021

Hi, @labbati,
I checked it today and it works now.
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants