Skip to content

Conversation

@palazzem
Copy link

@palazzem palazzem commented Mar 26, 2018

Overview

Currently, the Tracer doesn't have a configuration system for each single integration. This means that each integration has a different way to set a specific flag/option: sometimes we use application settings (i.e. Django settings) and some others we use attributes (i.e. client.distributed_tracing for requests).

This patch aims to add to the Pin class a configuration object that can be used and retrieved with a simple API. The goal of the patch, is:

  • provide a way to set global configurations (i.e. for the whole requests library)
  • provide a way to set a configuration for a specific client (i.e. the specific requests Session)
  • hide the Pin class, so that a nicer API can be used; the Pin will stays as is because it will be used as underlying mechanism

@palazzem palazzem added this to the 0.12.0 milestone Mar 26, 2018
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main and only API, since it's a plain dictionary that makes everything easy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should log.debug() something here? the reason why we return a new dictionary is to avoid crashing users application. On the other hand, we fail silently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes worth a log.debug

ddtrace/pin.py Outdated
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't set anything, made exception of _target that is an internal attribute and not used by users.

ddtrace/pin.py Outdated
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a caveat. The current implementation allows to:

  • set a pin in a global variable (i.e. requests library to set global settings)
  • when an instance is created from a class that has a Pin attached, the pin reference is copied to the instance, meaning they both point to the same object
  • because the config object is a mutable value, it means that changing the configuration for an "instance" pin, changes also the global configuration for all other pins
  • this results in an error

All of this is done to keep the API simple, while reusing the Pin class. Another approach is to continue to expose and suggest the Pin API, that I would rather to hide / remove from the public facing API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we review our integrations to be sure this mechanism wasn't used in any of them?

Copy link
Author

@palazzem palazzem Mar 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's part of the task and it seems we don't do that. Mostly because the caveat is present only for the config object and in the current implementation only the tags attribute is affected (so it's a bugfix in that sense). Indeed, if we change a service name for an instance of the main class, we must use the override() method that clones the Pin.

This behavior protects only the tags and config object from that change, so a bugfix and a new feature. We should be good here.

ddtrace/pin.py Outdated
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everytime a Pin is attached onto an object, we set the pin's owner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any change it can badly affect the garbage collector?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'm investigating if it's better to just use the memory address instead of the object pointer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we use id(obj) we avoid any kind of reference counting for the GC. This introduces a little overhead:

In [53]: timeit.timeit("id({'a':5})", number=10000)
Out[53]: 0.0017991065979003906

In [34]: timeit.timeit("{'a':5}", number=10000)
Out[34]: 0.0015370845794677734

that I think it's not relevant. The only caveat is that we have these results for CPython, and it could be different for other interpreters, but at the moment we officially support only CPython.

ddtrace/pin.py Outdated
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very fast operation, considering our settings have a limited cardinality (<10)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be a deepcopy? Don't we risk to have one day some nested config?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something we have to consider, but here some benchmarks:

# from Django config, just to simulate a small set of configurations with an empty dict
DEFAULTS = {'AGENT_HOSTNAME': 'localhost',
 'AGENT_PORT': 8126,
 'AUTO_INSTRUMENT': True,
 'DEFAULT_DATABASE_PREFIX': '',
 'DEFAULT_SERVICE': 'django',
 'DISTRIBUTED_TRACING': False,
 'ENABLED': True,
 'INSTRUMENT_CACHE': True,
 'INSTRUMENT_DATABASE': True,
 'INSTRUMENT_TEMPLATE': True,
 'TAGS': {},
 'TRACER': 'ddtrace.tracer'
}
# test 1
DEFAULTS.copy()
# test 2
copy.deepcopy(DEFAULTS)

And here the results:

test 1: 0.00654911994934082
test 2: 0.2787208557128906

The huge overhead is because of the string copy, so if nested dictionaries are used and we go with a deepcopy, we should take in consideration such overhead.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a big difference. If for now copy is enough for all configuration options we want to provide, no reason to do more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I will add a comment to explain why we use a copy for our use case

ddtrace/pin.py Outdated
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never expose config to users from the Pin class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is putting this underscore _config hiding it from the public API?
Technically it is still part of the __init__ which is publicly exposed ; I'm afraid it mislead people / allow bad things.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it's a convention to hide attributes and elements that are prefixed with an underscore. Of course anyone can use it, but if it has an underscore it's considered extremely dangerous. All frameworks do that, though because of the nature of Python, anyone can access to any attribute that is prefixed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the patch for the caveats described above, this test fails.

ddtrace/pin.py Outdated
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LotharSee probably I may proxy this value to the self._config['service_name'] so that we don't lose the only setting available in the Pin.

@palazzem palazzem force-pushed the palazzem/config-system branch from 70eb0ae to 7a667c9 Compare March 27, 2018 12:08
@palazzem palazzem changed the base branch from 0.12-dev to palazzem/global-config March 27, 2018 12:08
from .span import Span
from .tracer import Tracer
from .configuration import Config
from .settings import Config
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed the name to avoid long name and obfuscation of config

self._target = None
# keep the configuration attribute internal because the
# public API to access it is not the Pin class
self._config = _config or {}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to copy it here, since any operation to use the pin, calls the clone() method once. Indeed, it's possible that we:

  • initialize the global config with our defaults
  • first set the Pin in one of our integrations
  • users update the configuration
  • when the Pin is used in the integration for the first time (via get_from) it gets cloned

If we copy the dictionary, we'll have always our defaults.

cfg['service_name'] = 'metrics'
eq_(global_config['service_name'], 'service')

def test_configuration_copy_upside_down(self):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test before was ok_(got is pin). We now clone the pin once, when get_from is used to safe-protect mutable structure like config and tags. Not doing that means introducing a bug where an instance can update global configurations (and we don't want that).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed using another approach.

@palazzem palazzem force-pushed the palazzem/config-system branch from 17a2ea8 to cce600a Compare March 27, 2018 13:38
@palazzem palazzem force-pushed the palazzem/global-config branch from 9fd89a2 to f34941b Compare April 9, 2018 16:00
@palazzem palazzem force-pushed the palazzem/config-system branch from cce600a to 51aa8c3 Compare April 9, 2018 17:20
@palazzem palazzem changed the base branch from palazzem/global-config to 0.12-dev April 9, 2018 17:20
@palazzem palazzem force-pushed the palazzem/config-system branch from 51aa8c3 to 885f4f9 Compare April 19, 2018 16:59
@palazzem palazzem merged commit 80748f4 into 0.12-dev Apr 19, 2018
@palazzem palazzem deleted the palazzem/config-system branch April 19, 2018 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants