Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[elastic] improve hostname resolution #2696

Merged
merged 1 commit into from
Jul 22, 2016

Conversation

yannmh
Copy link
Member

@yannmh yannmh commented Jul 21, 2016

When the cluster_stats parameter is enabled in the user configuration,
the Datadog Agent's Elasticsearch check attempts to resolve and
attach with the corresponding metrics, the hostname of the different
nodes.

On Amazon Elasticsearch v1.5.2, the node objects
returned by the _cluster/nodes/stats?all=true endpoint returns the
following keys:

root@xxxx:~# cat node_stats.json| jq '.nodes[] | select(.name ==
"${hostname}") | keys
[
  "breakers",
  "fs",
  "indices",
  "jvm",
  "name",
  "os",
  "process",
  "thread_pool",
  "timestamp"
]

where name contains the hostname.

@yannmh yannmh added this to the 5.8.5 milestone Jul 21, 2016
@yannmh yannmh force-pushed the yann/elasticsearch-cluster-stats-name branch from be21655 to 80d3fdd Compare July 21, 2016 00:01
@olivielpeau olivielpeau self-assigned this Jul 21, 2016
@olivielpeau
Copy link
Member

LGTM!

I'm not familiar with Amazon Elasticsearch, do you know if they set the node name to a "sensible" default (since by default on a vanilla ES the default node name is a random Marvel character name).

For reference the node name can be changed in ES' configuration with the node.name config option (see https://www.elastic.co/guide/en/elasticsearch/reference/1.5/setup-configuration.html#node-name)

@yannmh
Copy link
Member Author

yannmh commented Jul 21, 2016

do you know if they set the node name to a "sensible" default (since by default on a vanilla ES the default node name is a random Marvel character name).

What do you mean by "sensible" ? Indeed, from what I have seen, the default name is also a random Marvel character name.

@olivielpeau
Copy link
Member

By "sensible" I meant "that reflects the hostname or something close/similar"

Basically the check will use that name as the hostname of the metrics that are sent, so my concern is that if the names don't make sense as hostnames we'll have host: tags and hosts in the infrastructure list that use these "weird" names.

It's not a blocker to merge this PR, but if by default the name is still the random Marvel character, I think we should advise users of Amazon Elasticsearch to set the node.name to something that makes sense to them.

@yannmh yannmh force-pushed the yann/elasticsearch-cluster-stats-name branch from 80d3fdd to 467a1f1 Compare July 21, 2016 18:06
@yannmh
Copy link
Member Author

yannmh commented Jul 21, 2016

Following our discussion earlier,
name is a unique piece of information for the node. It generally matches the hostname but not always. Thus, let's use it as a tag with the namespace node_name instead of a hostname.

I made the change and will likely have to update the tests too (will see what Travis reports). @olivielpeau can you take a last 👀 on it please ?

@olivielpeau
Copy link
Member

Changes look good!

To make the tests pass we should probably set an explicit node.name in conf/elasticsearch.yml (see https://www.elastic.co/guide/en/elasticsearch/reference/1.4/setup-dir-layout.html and https://www.elastic.co/guide/en/elasticsearch/reference/1.4/setup-configuration.html#node-name) and test that the node_name tag is set with that name.

@yannmh yannmh force-pushed the yann/elasticsearch-cluster-stats-name branch 4 times, most recently from b22fee8 to 7b8b64d Compare July 22, 2016 20:11
When the `cluster_stats` parameter is enabled in the user configuration,
the Datadog Agent's Elasticsearch check attempts to resolve and
attach with the corresponding metrics, the hostname of the different
nodes.

On Amazon Elasticsearch v1.5.2, the node objects
returned by the `_cluster/nodes/stats?all=true` endpoint does not return
this information. Instead the following keys are available:
```
root@xxxx:~# at node_stats.json| jq '.nodes[] | select(.name ==
"${hostname}") | keys
[
  "breakers",
  "fs",
  "indices",
  "jvm",
  "name",
  "os",
  "process",
  "thread_pool",
  "timestamp"
]
```

`name` is a unique piece of identity. It generally match the hostname.
Use it an additional tag with the namespace `node_name`.
@yannmh yannmh force-pushed the yann/elasticsearch-cluster-stats-name branch from 7b8b64d to 7ba4a36 Compare July 22, 2016 20:50
@degemer degemer merged commit c3783ee into master Jul 22, 2016
@degemer degemer deleted the yann/elasticsearch-cluster-stats-name branch July 22, 2016 22:14
degemer pushed a commit that referenced this pull request Jul 22, 2016
When the `cluster_stats` parameter is enabled in the user configuration,
the Datadog Agent's Elasticsearch check attempts to resolve and
attach with the corresponding metrics, the hostname of the different
nodes.

On Amazon Elasticsearch v1.5.2, the node objects
returned by the `_cluster/nodes/stats?all=true` endpoint does not return
this information. Instead the following keys are available:
```
root@xxxx:~# at node_stats.json| jq '.nodes[] | select(.name ==
"${hostname}") | keys
[
  "breakers",
  "fs",
  "indices",
  "jvm",
  "name",
  "os",
  "process",
  "thread_pool",
  "timestamp"
]
```

`name` is a unique piece of identity. It generally match the hostname.
Use it an additional tag with the namespace `node_name`.
degemer added a commit that referenced this pull request Jul 28, 2016
* master: (119 commits)
  [core] remove noisy logs (#2715)
  run multiple instances of pylint (#2716)
  [packaging] Release 5.8.5 (#2712)
  [changelog][5.8.5] Add notes on packaging changes (#2710)
  [changelog] Update 5.8.5 with python upgrade
  be more specific when logging ssh errors (#2708)
  [marathon] allow base_url with path (#2620)
  [changelog] Update 5.8.5
  add issue and pr templates
  [sdk] minor tweaks - sdk env detection, check location (#2694)
  [http_check] log exceptions 🔊
  use 0.0.0.0 as server address when non_local_traffic is passed (#2691)
  [elastic] tag stats metric with the node name 🏷 (#2696)
  [openstack] moving proxy logic to AgentCheck, for maintainability. Fixing typos.
  [changelog] 5.8.5 draft
  [tests] lower flakiness of test_no_parallelism (#2690)
  [core] don't use docker hostname if it's a EC2 one (#2661)
  [haproxy] Fix `KeyError` when an unknown status is found (#2681)
  [changelog] Fix md links
  [jenkins] Deprecate check (#2688)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants