-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: handle text types in _set_str_tag #2218
Conversation
tests/tracer/test_span.py
Outdated
def test_span_bad_unicode_set_tag(): | ||
span = Span(None, None) | ||
span.set_tag("key", "🤔") | ||
span._set_str_tag("key", "🤔") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be able to handle this case in Python 2. Right now stringify
is just unicode
in Python 2 which defaults to interpreting the input string as ascii encoded. If we did unicode(s, "utf-8")
we'd still be able to encode strings like this.
➕ for still doing the error handling as the data might not necessarily be utf-8 or ascii (as seen in the flask case recently).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean that we'll push invalid data without having a way for the users to be noticed unless they dig into their logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right @jd this would require someone to notice it in their logs. I don't think we have yet a better behavior to offer yet since this would be an error that we'd cause while tracing rather than at startup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I understand it is that they are 2 cases:
- The user triggers this problem: they did a programming error and need to fix it because they're sending the wrong data type to ddtrace.
- We have a bug in an integration that needs to be fixed.
You have two choices:
a. Ignore (catch the exception and ignore/log it)
b. Fix it
This is picking a
for 1 and 2 IIUC.
For 1. I don't think choosing for the user is a good experience. Once you pick a. Ignore
there is no way for them to decide anything else. As a user, I dislike libraries hiding my bugs.
For 2. As a ddtrace maintainer, I'm not sure hiding integration issues in user's log is going to help us improve the user experience and quality of our integration
Thoughts?
ddtrace/span.py
Outdated
try: | ||
self.meta[key] = stringify(value) | ||
except UnicodeDecodeError: | ||
log.warning("error setting tag %s, ignoring it", key, exc_info=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm this is again hiding error under the carpet, I'm not sure it's a good idea.
Could you provide more context about the motivation behind this change?
tests/tracer/test_span.py
Outdated
def test_span_bad_unicode_set_tag(): | ||
span = Span(None, None) | ||
span.set_tag("key", "🤔") | ||
span._set_str_tag("key", "🤔") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean that we'll push invalid data without having a way for the users to be noticed unless they dig into their logs?
a8b2110
to
03aa7d6
Compare
Codecov Report
@@ Coverage Diff @@
## master #2218 +/- ##
==========================================
+ Coverage 91.53% 91.58% +0.05%
==========================================
Files 570 570
Lines 40827 40834 +7
==========================================
+ Hits 37371 37398 +27
+ Misses 3456 3436 -20
Continue to review full report at Codecov.
|
03ecbf6
to
2b68a35
Compare
2b68a35
to
21b51ad
Compare
3929d75
to
1c03804
Compare
span = Span(None, None) | ||
span.set_tag("key", "🤔") | ||
span._set_str_tag("key_str", "🤔") | ||
# only span.set_tag() will fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any reason not to use ensure_text
also in set_tag
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could though in that case we'd need to first get the string representation of the value object if its not a six.binary_type
or six.text_type
. I wanted to keep this PR to just _set_str_tag
but if this works for this case (which is broken) I'd do the same for set_tag
(which is less broken but still doesn't properly handle these binary and byte string cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually confused why set_tag
would fail and _set_str_tag
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "fail" I mean that an exception will be raised and handled in the case of set_tag
:
>>> six.text_type("🤔")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
>>> six.ensure_text("🤔")
u'\U0001f914'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah. I guess we really should be using ensure_text
in set_tag
. Using text_type
sounds really like a bad idea. 😢
ddtrace/span.py
Outdated
@@ -298,7 +299,10 @@ def set_tag(self, key, value=None): | |||
|
|||
def _set_str_tag(self, key, value): | |||
# type: (str, str) -> None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update type annotation
Also worth replacing |
ddtrace/span.py
Outdated
@@ -298,7 +299,10 @@ def set_tag(self, key, value=None): | |||
|
|||
def _set_str_tag(self, key, value): | |||
# type: (str, str) -> None | |||
self.meta[key] = stringify(value) | |||
try: | |||
self.meta[key] = ensure_text(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use errors="replace"
so we don't fail for other non-unicode codecs eg #2155
1c03804
to
48a7943
Compare
ddtrace/span.py
Outdated
# type: (Text, AnyStr) -> None | ||
try: | ||
self.meta[key] = ensure_text(value, errors="replace") | ||
except TypeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should never get a TypeError
there right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test for the case where this is called with something other than AnyStr
, e.g. dict(a=1)
. But I admit this might be more defensive than we ought to be in an internal function which we are type annotating.
48a7943
to
0ea5111
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a reminder to wrap retries
in str
in the urllib3
integration when the retries
is passed to trace_utils
0ea5111
to
cb2b4c8
Compare
…race-py into tag-unicode-decode-error
@@ -310,7 +312,7 @@ def set_tag(self, key, value=None): | |||
|
|||
def _set_str_tag(self, key, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add to the docstring that errors in decoding will be replaced with ? as per https://docs.python.org/3/library/codecs.html#codecs.replace_errors
tests/tracer/test_span.py
Outdated
assert span.meta["key_str"] == u"🤔" | ||
|
||
|
||
@pytest.mark.skipif(sys.version_info.major >= 2, reason="This test does not apply to Python 2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case will never run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either the reason statement is wrong or the condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect it was meant to be <= 2
e31138e
to
66e6d41
Compare
Looks like a bunch of grpc tests are failing due to an encoding issue 🤔 |
c7f19ef
…race-py into tag-unicode-decode-error
c7f19ef
to
4d44299
Compare
@@ -102,7 +103,7 @@ def _handle_error(span, response_error, status_code): | |||
exc_val = to_unicode(response_error.details()) | |||
span._set_str_tag(errors.ERROR_MSG, exc_val) | |||
span._set_str_tag(errors.ERROR_TYPE, status_code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we stringify
here too? Could the status code be an int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also we should check all the usages of _set_str_tag to be sure that there aren't any other possibilities like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status_code
has already been coerced into unicode in this integration. Other integrations using _set_str_tag
were already updated in this PR.
@Mergifyio backport 0.48 |
* chore: correct typing hints for span.meta * fix: handle text types in _set_str_tag * chore: correct typing hints for span.meta * fix: handle text types in _set_str_tag * fix: handle text types in _set_str_tag * flake8 * add doc string, fix test * fix grpc traceback tag Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com> Co-authored-by: Gabriele N. Tornetta <P403n1x87@users.noreply.github.com> (cherry picked from commit c19c334) # Conflicts: # ddtrace/contrib/trace_utils.py # ddtrace/span.py
Command
|
* fix typo in filter debug line (#2213) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * feat: support space separated DD_TAGS value (#2196) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add sync mode to AgentWriter (#2206) * feat(agent): support aws lambda extension * chore: fix flake8 issue with test * chore: add release note for datadog lambda extension * chore: fix import order * chore: fix flake8 issue * [serverless] fix broken test * remove unused import * separate out sync writer changes * add integration tests for sync writer Co-authored-by: DarcyRaynerDD <darcy.rayner@datadoghq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * ci: run django test suite with ddtrace enabled (#2203) * Run django test suite with ddtrace enabled * Added pylibmc install to fix build error * Added django/ddtrace install commands to separate blocks * Fixed annotation * Installed generic version of ddtrace instead of checking out * Added documentation * Changed working directory, removed inherently failing test * Reverted working directory change * Reverted skipping specific test, added pythonpath * Added more docs Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * ci: run flask test suite with ddtrace enabled (#2190) * ci: run flask test suite with ddtrace enabled * Update .github/workflows/flask.yml Co-authored-by: Brett Langdon <me@brett.is> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com> Co-authored-by: Brett Langdon <me@brett.is> Co-authored-by: Kyle Verhoog <kyle@verhoog.ca> * docs: add note about pip version requirement (#2220) * handle logwriter configure (#2219) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * test: move __init__ to utils (#2217) Having code in tests/__init__ makes ddtrace being imported before we have a chance to do anything, such as calling gevent.monkey.patch_all. Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * refactor: distributed_tracing_enabled usages to trace_utils helper (#2216) * Modified trace util helper to account for inconsistencies in distributed_tracing config key, refactored usages * Refactored traceutils helper * Made int_config optional in the trace_util helper, updated affected integration files * Removed erroneous code addition to pyramid/trace.py * Make tracer tests less noisy (#2221) monkeypatch.setenv warns if the value given is not an instance of the base string type. Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * refactor(worker): replace regular thread by PeriodicThread (#2205) Co-authored-by: Tahir H. Butt <tahir.butt@datadoghq.com> * Disable known failing Django tests (#2224) * Disable known failing Django tests * fix yaml multiline string Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * test(botocore): add test case for firehose args (#2227) This change adds a test case for #2204 to ensure that the params.Records argument is excluded from the span tags. Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * refactor: move ddtrace.profiling._attr to ddtrace.utils (#2230) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adding urllib3 integration (#1148) * draft of urllib3 integration * black formatting * set request tags on the span before the call is made in case of errors * rename span name and support retry tags based on CR feedback * add tests and fix a few small bugs * fix flake8 * add tests and circleci config * flake8 fixes * Update ddtrace/contrib/urllib3/__init__.py Co-Authored-By: Tahir H. Butt <tahir.butt@datadoghq.com> * use dd compat for urlparse and unparse * use SpanType constants for type of span use SpanTypes.HTTP * fix tests * fix flake8 from merge conflict resolution * adapt to latest changes on master * adopt latest standards for integrations Use Pin to add patch information to urllib3 requests. Also use riot for configuring the test environment * apply checklist points * wip: make tests pass * set tags on exception also fixed the tests to override HTTP configuration where required * docs(urllib3): add urllib3 integration docs * fix service name handling * fix copy-pasta errors and improve documentation * Update ddtrace/contrib/urllib3/__init__.py Co-authored-by: Kyle Verhoog <kyle@verhoog.ca> * Update riotfile.py Co-authored-by: Kyle Verhoog <kyle@verhoog.ca> * remove unused imports from test sources * use the new get_argument_value utility * Update tests/contrib/urllib3/test_urllib3.py Co-authored-by: Kyle Verhoog <kyle@verhoog.ca> * Update ddtrace/contrib/urllib3/patch.py Co-authored-by: Kyle Verhoog <kyle@verhoog.ca> * commit Tahir's suggestions * fix TracerTestCase import Co-authored-by: Tahir H. Butt <tahir.butt@datadoghq.com> Co-authored-by: Gabriele N. Tornetta <gabriele.tornetta@datadoghq.com> Co-authored-by: Gabriele N. Tornetta <P403n1x87@users.noreply.github.com> Co-authored-by: Kyle Verhoog <kyle@verhoog.ca> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com> * Revert writer.write argument change (#2232) This is a breaking change which we're going to make at a later date. * Make processor private on Tracer (#2233) This isn't something we want to make public yet. Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix(profiling): make sure PeriodicCollector start withing service lock (#2214) This makes sure the periodic collectors are started while the service lock is held to avoid race conditions while starting the services. This has no impact in practice since the Profiler class was made thread safe with #2179. Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * refactor(forksafe): simplify code (#2222) This simplifies the code to use a banal decorator approach rather than a weird lambda-passed-as-argument-based 😉 It also adds typing and an `unregister` function to allow for… unregistering a function! Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * chore: correct typing hints for span.meta (#2237) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lambda extension support (#2092) * feat(agent): support aws lambda extension * chore: fix flake8 issue with test * chore: add release note for datadog lambda extension * chore: fix import order * chore: fix flake8 issue * [serverless] fix broken test * remove unused import * Add docs, type hints Co-authored-by: Kyle Verhoog <kyle.verhoog@datadoghq.com> Co-authored-by: Kyle Verhoog <kyle@verhoog.ca> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com> * fix(span): ensure ID args have the correct type (#2243) Among all the arguments of the Span constructor, the ones bearing IDs are the most important type-wise. This change adds checks to ensure that span_id, trace_id and parent_id all have the expected type. The rationale for adding the check at construction time is so that this error could be caught as soon as possible, rather than defer it to later processing stages, thus avoiding invalid entries in the processing queue. Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix: only warn if using uwsgi with ddtrace-run (#2202) * fix: only warn if using uwsgi with ddtrace-run * remove DeprecationWarning prefix * update message Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com> * Update ISSUE_TEMPLATE.md (#2248) * Use multiproject pipeline to deploy to reliability env (#2249) * Use multiproject pipeline for reliability env deploy * Use default branch Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * style(pymongo): format pymongo (#2250) * fix: handle text types in _set_str_tag (#2218) * chore: correct typing hints for span.meta * fix: handle text types in _set_str_tag * chore: correct typing hints for span.meta * fix: handle text types in _set_str_tag * fix: handle text types in _set_str_tag * flake8 * add doc string, fix test * fix grpc traceback tag Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com> Co-authored-by: Gabriele N. Tornetta <P403n1x87@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Tahir H. Butt <tahir.butt@datadoghq.com> Co-authored-by: DarcyRaynerDD <darcy.rayner@datadoghq.com> Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com> Co-authored-by: Julien Danjou <julien.danjou@datadoghq.com> Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com> Co-authored-by: Brett Langdon <me@brett.is> Co-authored-by: Gabriele N. Tornetta <P403n1x87@users.noreply.github.com> Co-authored-by: Justin <justin.james.iso@gmail.com> Co-authored-by: Gabriele N. Tornetta <gabriele.tornetta@datadoghq.com> Co-authored-by: Darcy Rayner <50632605+DarcyRaynerDD@users.noreply.github.com> Co-authored-by: Laplie Anderson <randomanderson@users.noreply.github.com>
The addition of
_set_str_tag
in #1804 exposed cases in Python 2 where tag values were being set that were not of typeunicode
which was resulting in raising aUnicodeDecodingError
exception. We handle this better by making use ofsix.ensure_text
in_set_str_tag
as well as still catching the exception raised from it if the value is of a type that is not coercible. But this requires we ensure that the function is not called with types other than those that can be coerced withensure_text
, which means changing some of its uses in our integrations where we were assuming that we could passstringify-able
arguments.