Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Vertica Integration #634

Merged
merged 38 commits into from
Oct 11, 2018
Merged

Add Vertica Integration #634

merged 38 commits into from
Oct 11, 2018

Conversation

Kyle-Verhoog
Copy link
Member

@Kyle-Verhoog Kyle-Verhoog commented Oct 8, 2018

Overview

This PR adds support for the vertica-python library.

The traced methods include:

vertica_python.vertica.cursor.Cursor

  • execute()
  • copy()
  • fetchone()
  • fetchall()
  • nextset()

along with relevant metadata for each method.

Examples/Usage

Usage and examples can be found in the traces examples repo here: https://github.com/DataDog/trace-examples.

Development Note

This PR also introduces a proof of concept of a new instrumentation mechanism that aims to make writing the majority of integrations easy, configurable and consistent. For instance, this is the only code needed to instrument the execute method in the Vertica library:

def execute_span_start(instance, span, conf, *args, **kwargs):
    span.set_tag("query", args[0])

def execute_span_end(result, instance, span, conf, *args, **kwargs):
    span.set_metric(dbx.ROWCOUNT, instance.rowcount)

# tracing configuration
config._add(
    "vertica",
    {
        "service_name": "vertica",
        "app": "vertica",
        "app_type": "db",
        "patch": {
            "vertica_python.vertica.cursor.Cursor": {
                "routines": {
                    "execute": {
                        "operation_name": "vertica.query",
                        "span_type": "vertica",
                        "span_start": execute_span_start,
                        "span_end": execute_span_end,
                    },
                },
            },
        },
    },
)

For now this API is made private and contained within the Vertica integration. It is to be tested solely with the Vertica integration.

This API is subject to change and should not be used outside the Vertica integration.

@Kyle-Verhoog Kyle-Verhoog added this to the 0.15.0 milestone Oct 8, 2018
@Kyle-Verhoog Kyle-Verhoog self-assigned this Oct 8, 2018
"execute": {
"operation_name": "vertica.query",
"span_type": "vertica",
"on_before": execute_before,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this naming, but I'm unsure of what to call these. Maybe pre_trace, post_trace?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trace_start trace_end?

span_start span_end/span_finish, would be inline with span_type, and operation_name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/operation_name/resource_name/?

_config=config["patch"][_patch_item],
).onto(instance)

def _find_config(instance, routine_name):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary/a placeholder until a better solution is thought up. This code is not run in the vertica integration since the situation in which it is needed (described in the long comment below) does not occur in the tracing configuration for the vertica library.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can probably be defined at module level rather than in the for loop ?

@Kyle-Verhoog Kyle-Verhoog changed the base branch from master to 0.15-dev October 8, 2018 20:44
Copy link
Member

@brettlangdon brettlangdon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did a first pass, a lot of comments here are mostly focused on style.

I want to give it a review through again focusing more on the overall strategy here.

This is a big one to review.

@@ -0,0 +1,4 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so many spaces

ddtrace/contrib/vertica/patch.py Outdated Show resolved Hide resolved

def _uninstall(config):
for patch_class_path in config["patch"]:
patch_mod = '.'.join(patch_class_path.split('.')[0:-1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patch_mod, _, patch_class = patch_class_path.rpartition('.')


def _install(config):
for patch_class_path in config["patch"]:
patch_mod = '.'.join(patch_class_path.split('.')[0:-1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patch_mod, _, patch_class = patch_class_path.rpartition('.')


# patch the __init__ of the class with a Pin instance containing the defaults
@wrapt.patch_function_wrapper(patch_mod, "{}.{}".format(patch_class, "__init__"))
def init_wrapper(wrapped, instance, args, kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this patching live outside of the loop on L133?

Seems like this only needs to be called once per class, not once per-routine?

ddtrace/contrib/vertica/patch.py Outdated Show resolved Hide resolved
for patch_routine in config["patch"][patch_class_path]["routines"]:
def _wrap():
# _wrap is needed to provide data to the wrappers which can
# only be provided from a function closure.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow why this is needed?

This is a big function, very easily I could have read over the part that makes this necessary.

@@ -2,3 +2,4 @@
# tags
NAME = "db.name" # the database name (eg: dbname for pgsql)
USER = "db.user" # the user connecting to the db
ROWCOUNT = "db.rowcount" # the rowcount of a query
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be two spaces after the line before #

also, might want to add spaces to the lines above to keep them inline (assuming that was the intention there)

@@ -42,6 +42,7 @@
'aiopg': True,
'aiobotocore': False,
'httplib': False,
"vertica": True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/"/'/g

def _uninstall(config):
for patch_class_path in config["patch"]:
patch_mod = '.'.join(patch_class_path.split('.')[0:-1])
patch_class = patch_class_path.split('.')[-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/'/"/g

It seems like the codebase isn't the most consistent with quotes right now, but since the rest of this file uses " we should do so here, and everywhere else in this file, as well (but I think I get why you did double quotes everywhere, but single quotes for single character strings)

.circleci/config.yml Show resolved Hide resolved
"app_type": "db",
"patch": {
"vertica_python.vertica.connection.Connection": {
"routines": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already discussed this in PM, an alternative approach could be to have:

"vertica_python.vertica.connection.Connection": {
  "cursor": { ... }
  "__other_metadata": ...
}

The pattern that I mean is not adding the routines level now, and use a keyword __xxxx later on if we need to configure at the Connection level. Not saying it is required, just triggering a conversation about it.


_PATCHED = False

def copy_before(instance, span, conf, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud. What if we replace def copy_before(): and def execute_before() with a function def tag_query() and then in the configuration:

...
  "execute": {
    "on_before": tag_query
  },
  "copy": {
    "on_before": tag_query
  }
...

In the future (now seems to be not necessary), "on_before" etc... could also be passed a list of operations, instead of only one.

This adds a few benefits IMO:

  • greater direction between the function name and what it does
  • we do not lose the relation between the "on_before" and the fact that a tag is set
  • less code repetition

This also apply to execute_after and fetch_after.



# tracing configuration
config._add(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kyle-Verhoog did I already mention that I love the idea? 😄 I think so 😄


def wrap_init():
_patch_item = patch_class_path
patch_class_routine = "{}.{}".format(patch_class, "__init__")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we need to patch a module function rather than a class method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kyle-Verhoog (idk if that works) note to self: we still need to address this later on.

tox.ini Show resolved Hide resolved
ddtrace/contrib/vertica/patch.py Show resolved Hide resolved
patch_mod, _, patch_class = patch_class_path.rpartition(".")

for patch_routine in config["patch"][patch_class_path]["routines"]:
mod = importlib.import_module(patch_mod)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved outside of the loop.

ddtrace/contrib/vertica/patch.py Outdated Show resolved Hide resolved

config_routines = config["patch"][full_name]["routines"]

if full_name in config["patch"] and routine_name in config_routines:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/full_name in config["patch"] and //

We are already doing that assertion above.

_config=config["patch"][_patch_item],
).onto(instance)

wrap_init()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be a function, can we not just execute the code immediately?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning behind writing these as inline functions is that it allowed me to quickly change around the logic without having to constantly update the signatures. That said, if we're happier with this approach we can clean these up into helper functions 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

from .utils import override_config


TEST_TABLE = "test_table"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could just import from .fixtures import TEST_TABLE

tests/contrib/vertica/test_vertica.py Show resolved Hide resolved
tests/contrib/vertica/test_vertica.py Show resolved Hide resolved
tests/contrib/vertica/test_vertica.py Show resolved Hide resolved
tox.ini Show resolved Hide resolved
Copy link
Member

@brettlangdon brettlangdon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other small comments, but mostly looking good.

span.set_metric(dbx.ROWCOUNT, instance.rowcount)


def cursor_after(instance, cursor, span, conf, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be cursor_span_end to match the others?

"vertica_python.vertica.connection.Connection": {
"routines": {
"cursor": {"trace_enabled": False, "span_end": cursor_after}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trailing comma here and line above.

# TODO?: remove Pin dependence
pin = Pin.get_from(instance)

# TODO: inheritance problem with pins:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can likely remove this TODO, but keep some form of this comment so we know the problem/what we are doing.

ddtrace/contrib/vertica/patch.py Show resolved Hide resolved
ddtrace/contrib/vertica/patch.py Show resolved Hide resolved


# tracing configuration
config._add(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am currently working on the settings module and one thing came to my mind. How this approach fit into the user workflow. I mean if a user provides some settings through the config api would we override them completely?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will have to figure out how exactly we want to expose the config api to the user for them to modify. This is something that is still to be determined. In the documentation for Vertica I have only mentioned a custom service_name. This is all that I think we want to expose for now.

I'll look into adding more traditional customization via the Pin api to ensure we maintain that compatibility.

Good catch!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

ddtrace/contrib/vertica/patch.py Show resolved Hide resolved
Copy link
Member

@labbati labbati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I erroneously instead of commenting, please see at my only comment I added

span.set_metric(dbx.ROWCOUNT, instance.rowcount)


def cursor_span_end(instance, cursor, _, conf, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_ 👍

Copy link
Member

@brettlangdon brettlangdon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm



# tracing configuration
config._add(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@brettlangdon brettlangdon changed the base branch from 0.15-dev to master October 11, 2018 16:00
@Kyle-Verhoog Kyle-Verhoog merged commit 6c1bbf6 into master Oct 11, 2018
@Kyle-Verhoog Kyle-Verhoog deleted the vertica branch October 11, 2018 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants