Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: handle text types in _set_str_tag #2218

Merged
merged 16 commits into from
Mar 30, 2021

Conversation

majorgreys
Copy link
Contributor

@majorgreys majorgreys commented Mar 23, 2021

The addition of _set_str_tag in #1804 exposed cases in Python 2 where tag values were being set that were not of type unicode which was resulting in raising a UnicodeDecodingError exception. We handle this better by making use of six.ensure_text in _set_str_tag as well as still catching the exception raised from it if the value is of a type that is not coercible. But this requires we ensure that the function is not called with types other than those that can be coerced with ensure_text, which means changing some of its uses in our integrations where we were assuming that we could pass stringify-able arguments.

@majorgreys majorgreys requested a review from a team as a code owner March 23, 2021 21:59
def test_span_bad_unicode_set_tag():
span = Span(None, None)
span.set_tag("key", "🤔")
span._set_str_tag("key", "🤔")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be able to handle this case in Python 2. Right now stringify is just unicode in Python 2 which defaults to interpreting the input string as ascii encoded. If we did unicode(s, "utf-8") we'd still be able to encode strings like this.

➕ for still doing the error handling as the data might not necessarily be utf-8 or ascii (as seen in the flask case recently).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that we'll push invalid data without having a way for the users to be noticed unless they dig into their logs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right @jd this would require someone to notice it in their logs. I don't think we have yet a better behavior to offer yet since this would be an error that we'd cause while tracing rather than at startup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I understand it is that they are 2 cases:

  1. The user triggers this problem: they did a programming error and need to fix it because they're sending the wrong data type to ddtrace.
  2. We have a bug in an integration that needs to be fixed.

You have two choices:
a. Ignore (catch the exception and ignore/log it)
b. Fix it

This is picking a for 1 and 2 IIUC.

For 1. I don't think choosing for the user is a good experience. Once you pick a. Ignore there is no way for them to decide anything else. As a user, I dislike libraries hiding my bugs.

For 2. As a ddtrace maintainer, I'm not sure hiding integration issues in user's log is going to help us improve the user experience and quality of our integration

Thoughts?

ddtrace/span.py Outdated
Comment on lines 301 to 307
try:
self.meta[key] = stringify(value)
except UnicodeDecodeError:
log.warning("error setting tag %s, ignoring it", key, exc_info=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm this is again hiding error under the carpet, I'm not sure it's a good idea.

Could you provide more context about the motivation behind this change?

def test_span_bad_unicode_set_tag():
span = Span(None, None)
span.set_tag("key", "🤔")
span._set_str_tag("key", "🤔")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that we'll push invalid data without having a way for the users to be noticed unless they dig into their logs?

@codecov-io
Copy link

Codecov Report

Merging #2218 (03aa7d6) into master (081c25b) will increase coverage by 0.05%.
The diff coverage is 75.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2218      +/-   ##
==========================================
+ Coverage   91.53%   91.58%   +0.05%     
==========================================
  Files         570      570              
  Lines       40827    40834       +7     
==========================================
+ Hits        37371    37398      +27     
+ Misses       3456     3436      -20     
Impacted Files Coverage Δ
ddtrace/compat.py 93.47% <ø> (ø)
ddtrace/span.py 90.80% <50.00%> (-0.70%) ⬇️
tests/tracer/test_span.py 98.54% <100.00%> (+0.01%) ⬆️
ddtrace/profiling/exporter/http.py 98.91% <0.00%> (-1.09%) ⬇️
tests/commands/test_runner.py 10.84% <0.00%> (+10.84%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 081c25b...03aa7d6. Read the comment docs.

@majorgreys majorgreys force-pushed the tag-unicode-decode-error branch 2 times, most recently from 03ecbf6 to 2b68a35 Compare March 25, 2021 00:53
@majorgreys majorgreys changed the title fix: ignore if UnicodeDecodeError with internal _set_str_tag fix: handle PY2 text types in _set_str_tag Mar 25, 2021
@majorgreys majorgreys changed the title fix: handle PY2 text types in _set_str_tag fix: handle text types in _set_str_tag Mar 25, 2021
@majorgreys majorgreys added the changelog/no-changelog A changelog entry is not required for this PR. label Mar 25, 2021
@majorgreys majorgreys force-pushed the tag-unicode-decode-error branch 4 times, most recently from 3929d75 to 1c03804 Compare March 25, 2021 02:48
span = Span(None, None)
span.set_tag("key", "🤔")
span._set_str_tag("key_str", "🤔")
# only span.set_tag() will fail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any reason not to use ensure_text also in set_tag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could though in that case we'd need to first get the string representation of the value object if its not a six.binary_type or six.text_type. I wanted to keep this PR to just _set_str_tag but if this works for this case (which is broken) I'd do the same for set_tag (which is less broken but still doesn't properly handle these binary and byte string cases).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually confused why set_tag would fail and _set_str_tag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "fail" I mean that an exception will be raised and handled in the case of set_tag:

>>> six.text_type("🤔")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
>>> six.ensure_text("🤔")
u'\U0001f914'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah. I guess we really should be using ensure_text in set_tag. Using text_type sounds really like a bad idea. 😢

ddtrace/span.py Outdated
@@ -298,7 +299,10 @@ def set_tag(self, key, value=None):

def _set_str_tag(self, key, value):
# type: (str, str) -> None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update type annotation

@majorgreys
Copy link
Contributor Author

Also worth replacing ddtrace.compat.to_unicode with ensure_text.

ddtrace/span.py Outdated
@@ -298,7 +299,10 @@ def set_tag(self, key, value=None):

def _set_str_tag(self, key, value):
# type: (str, str) -> None
self.meta[key] = stringify(value)
try:
self.meta[key] = ensure_text(value)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use errors="replace" so we don't fail for other non-unicode codecs eg #2155

@majorgreys majorgreys added this to the 0.48.0 milestone Mar 25, 2021
ddtrace/span.py Outdated
# type: (Text, AnyStr) -> None
try:
self.meta[key] = ensure_text(value, errors="replace")
except TypeError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should never get a TypeError there right?

Copy link
Contributor Author

@majorgreys majorgreys Mar 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test for the case where this is called with something other than AnyStr, e.g. dict(a=1). But I admit this might be more defensive than we ought to be in an internal function which we are type annotating.

Copy link
Contributor

@P403n1x87 P403n1x87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a reminder to wrap retries in str in the urllib3 integration when the retries is passed to trace_utils

P403n1x87
P403n1x87 previously approved these changes Mar 25, 2021
@@ -310,7 +312,7 @@ def set_tag(self, key, value=None):

def _set_str_tag(self, key, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add to the docstring that errors in decoding will be replaced with ? as per https://docs.python.org/3/library/codecs.html#codecs.replace_errors

assert span.meta["key_str"] == u"🤔"


@pytest.mark.skipif(sys.version_info.major >= 2, reason="This test does not apply to Python 2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case will never run?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either the reason statement is wrong or the condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it was meant to be <= 2

P403n1x87
P403n1x87 previously approved these changes Mar 29, 2021
Kyle-Verhoog
Kyle-Verhoog previously approved these changes Mar 29, 2021
jd
jd previously approved these changes Mar 30, 2021
@Kyle-Verhoog
Copy link
Member

Looks like a bunch of grpc tests are failing due to an encoding issue 🤔

@majorgreys majorgreys dismissed stale reviews from jd, Kyle-Verhoog, and P403n1x87 via c7f19ef March 30, 2021 16:30
@@ -102,7 +103,7 @@ def _handle_error(span, response_error, status_code):
exc_val = to_unicode(response_error.details())
span._set_str_tag(errors.ERROR_MSG, exc_val)
span._set_str_tag(errors.ERROR_TYPE, status_code)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we stringify here too? Could the status code be an int?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also we should check all the usages of _set_str_tag to be sure that there aren't any other possibilities like this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status_code has already been coerced into unicode in this integration. Other integrations using _set_str_tag were already updated in this PR.

@mergify mergify bot merged commit c19c334 into DataDog:master Mar 30, 2021
@majorgreys majorgreys deleted the tag-unicode-decode-error branch March 30, 2021 17:50
@Kyle-Verhoog
Copy link
Member

@Mergifyio backport 0.48

mergify bot pushed a commit that referenced this pull request Apr 1, 2021
* chore: correct typing hints for span.meta

* fix: handle text types in _set_str_tag

* chore: correct typing hints for span.meta

* fix: handle text types in _set_str_tag

* fix: handle text types in _set_str_tag

* flake8

* add doc string, fix test

* fix grpc traceback tag

Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
Co-authored-by: Gabriele N. Tornetta <P403n1x87@users.noreply.github.com>
(cherry picked from commit c19c334)

# Conflicts:
#	ddtrace/contrib/trace_utils.py
#	ddtrace/span.py
@mergify
Copy link
Contributor

mergify bot commented Apr 1, 2021

Command backport 0.48: success

Backports have been created

Kyle-Verhoog added a commit that referenced this pull request Apr 1, 2021
* fix typo in filter debug line (#2213)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* feat: support space separated DD_TAGS value (#2196)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Add sync mode to AgentWriter (#2206)

* feat(agent): support aws lambda extension

* chore: fix flake8 issue with test

* chore: add release note for datadog lambda extension

* chore: fix import order

* chore: fix flake8 issue

* [serverless] fix broken test

* remove unused import

* separate out sync writer changes

* add integration tests for sync writer

Co-authored-by: DarcyRaynerDD <darcy.rayner@datadoghq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* ci: run django test suite with ddtrace enabled (#2203)

* Run django test suite with ddtrace enabled

* Added pylibmc install to fix build error

* Added django/ddtrace install commands to separate blocks

* Fixed annotation

* Installed generic version of ddtrace instead of checking out

* Added documentation

* Changed working directory, removed inherently failing test

* Reverted working directory change

* Reverted skipping specific test, added pythonpath

* Added more docs

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* ci: run flask test suite with ddtrace enabled (#2190)

* ci: run flask test suite with ddtrace enabled

* Update .github/workflows/flask.yml

Co-authored-by: Brett Langdon <me@brett.is>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
Co-authored-by: Brett Langdon <me@brett.is>
Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* docs: add note about pip version requirement (#2220)

* handle logwriter configure (#2219)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* test: move __init__ to utils (#2217)

Having code in tests/__init__ makes ddtrace being imported before we have a
chance to do anything, such as calling gevent.monkey.patch_all.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* refactor: distributed_tracing_enabled usages to trace_utils helper (#2216)

* Modified trace util helper to account for inconsistencies in distributed_tracing config key, refactored usages

* Refactored traceutils helper

* Made int_config optional in the trace_util helper, updated affected integration files

* Removed erroneous code addition to pyramid/trace.py

* Make tracer tests less noisy (#2221)

monkeypatch.setenv warns if the value given is not an instance of the
base string type.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* refactor(worker): replace regular thread by PeriodicThread (#2205)

Co-authored-by: Tahir H. Butt <tahir.butt@datadoghq.com>

* Disable known failing Django tests (#2224)

* Disable known failing Django tests

* fix yaml multiline string

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* test(botocore): add test case for firehose args (#2227)

This change adds a test case for #2204 to ensure that the params.Records
argument is excluded from the span tags.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* refactor: move ddtrace.profiling._attr to ddtrace.utils (#2230)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Adding urllib3 integration (#1148)

* draft of urllib3 integration

* black formatting

* set request tags on the span before the call is made in case of errors

* rename span name and support retry tags based on CR feedback

* add tests and fix a few small bugs

* fix flake8

* add tests and circleci config

* flake8 fixes

* Update ddtrace/contrib/urllib3/__init__.py

Co-Authored-By: Tahir H. Butt <tahir.butt@datadoghq.com>

* use dd compat for urlparse and unparse

* use SpanType constants for type of span

use SpanTypes.HTTP

* fix tests

* fix flake8 from merge conflict resolution

* adapt to latest changes on master

* adopt latest standards for integrations

Use Pin to add patch information to urllib3 requests. Also use riot for configuring the test environment

* apply checklist points

* wip: make tests pass

* set tags on exception

also fixed the tests to override HTTP configuration where required

* docs(urllib3): add urllib3 integration docs

* fix service name handling

* fix copy-pasta errors and improve documentation

* Update ddtrace/contrib/urllib3/__init__.py

Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* Update riotfile.py

Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* remove unused imports from test sources

* use the new get_argument_value utility

* Update tests/contrib/urllib3/test_urllib3.py

Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* Update ddtrace/contrib/urllib3/patch.py

Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>

* commit Tahir's suggestions

* fix TracerTestCase import

Co-authored-by: Tahir H. Butt <tahir.butt@datadoghq.com>
Co-authored-by: Gabriele N. Tornetta <gabriele.tornetta@datadoghq.com>
Co-authored-by: Gabriele N. Tornetta <P403n1x87@users.noreply.github.com>
Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* Revert writer.write argument change (#2232)

This is a breaking change which we're going to make at a later date.

* Make processor private on Tracer (#2233)

This isn't something we want to make public yet.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix(profiling): make sure PeriodicCollector start withing service lock (#2214)

This makes sure the periodic collectors are started while the service lock is
held to avoid race conditions while starting the services.

This has no impact in practice since the Profiler class was made thread safe
with #2179.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* refactor(forksafe): simplify code (#2222)

This simplifies the code to use a banal decorator approach rather than a weird
lambda-passed-as-argument-based 😉

It also adds typing and an `unregister` function to allow for… unregistering a function!

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* chore: correct typing hints for span.meta (#2237)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Lambda extension support (#2092)

* feat(agent): support aws lambda extension

* chore: fix flake8 issue with test

* chore: add release note for datadog lambda extension

* chore: fix import order

* chore: fix flake8 issue

* [serverless] fix broken test

* remove unused import

* Add docs, type hints

Co-authored-by: Kyle Verhoog <kyle.verhoog@datadoghq.com>
Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>

* fix(span): ensure ID args have the correct type (#2243)

Among all the arguments of the Span constructor, the ones bearing IDs
are the most important type-wise. This change adds checks to ensure that
span_id, trace_id and parent_id all have the expected type.

The rationale for adding the check at construction time is so that this
error could be caught as soon as possible, rather than defer it to later
processing stages, thus avoiding invalid entries in the processing
queue.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix: only warn if using uwsgi with ddtrace-run (#2202)

* fix: only warn if using uwsgi with ddtrace-run

* remove DeprecationWarning prefix

* update message

Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>

* Update ISSUE_TEMPLATE.md (#2248)

* Use multiproject pipeline to deploy to reliability env (#2249)

* Use multiproject pipeline for reliability env deploy

* Use default branch

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* style(pymongo): format pymongo (#2250)

* fix: handle text types in _set_str_tag (#2218)

* chore: correct typing hints for span.meta

* fix: handle text types in _set_str_tag

* chore: correct typing hints for span.meta

* fix: handle text types in _set_str_tag

* fix: handle text types in _set_str_tag

* flake8

* add doc string, fix test

* fix grpc traceback tag

Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
Co-authored-by: Gabriele N. Tornetta <P403n1x87@users.noreply.github.com>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Tahir H. Butt <tahir.butt@datadoghq.com>
Co-authored-by: DarcyRaynerDD <darcy.rayner@datadoghq.com>
Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
Co-authored-by: Julien Danjou <julien.danjou@datadoghq.com>
Co-authored-by: Brett Langdon <brett.langdon@datadoghq.com>
Co-authored-by: Brett Langdon <me@brett.is>
Co-authored-by: Gabriele N. Tornetta <P403n1x87@users.noreply.github.com>
Co-authored-by: Justin <justin.james.iso@gmail.com>
Co-authored-by: Gabriele N. Tornetta <gabriele.tornetta@datadoghq.com>
Co-authored-by: Darcy Rayner <50632605+DarcyRaynerDD@users.noreply.github.com>
Co-authored-by: Laplie Anderson <randomanderson@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog/no-changelog A changelog entry is not required for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants