0000-tsdb-dimensions.md

0000: TSDB Dimensions

  • Stage: 0 (strawperson)
  • Date: TBD

Fields

This RFC proposes annotating certain ECS fields as dimensions. The aim is to let users take advantage of the time series data stream (TSDB) support offered by Elasticsearch without impacting data ingestion.

Annotating fields as dimensions is an important step in TSDB adoption. If too few fields are annotated as dimensions when TSDB is enabled, data loss may result. A large majority of the fields that must be annotated as dimensions are ECS fields. At present, integration developers (Service Integrations, Cloud Native, etc.) are expected to annotate ECS fields as dimensions in each integration's configuration. This RFC is proposed to avoid that duplicated configuration and to reduce the likelihood of data loss. The dimension attribute takes one of two values: true or false.
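The data loss risk can be illustrated with a small sketch. It assumes that TSDB identifies a document by its dimension values plus its timestamp, so two documents that agree on both are treated as duplicates. The field total_waits and all values are hypothetical. The scenario assumes host.hostname is the only field annotated as a dimension.

```yaml
# Hypothetical metric documents, for illustration only.
# Assumption: host.hostname is the ONLY field annotated with dimension: true.
- "@timestamp": "2023-03-01T10:00:00.000Z"
  host.hostname: db-01       # dimension
  wait_class: Concurrency    # not a dimension in this scenario
  total_waits: 12            # hypothetical metric field
- "@timestamp": "2023-03-01T10:00:00.000Z"
  host.hostname: db-01       # same dimension value and same timestamp
  wait_class: Commit         # differs, but does not contribute to identity
  total_waits: 3
# Under the stated assumption both documents resolve to the same time series
# and timestamp, so the second one is liable to be dropped as a duplicate.
# Annotating wait_class as a dimension as well would keep them distinct.
```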

Changes to service mapping

---
- name: service
  title: Service
  group: 2
  short: Fields describing the service for or from which the data was collected.
  description: >
    The service fields describe the service for or from which the data was collected.

    These fields help you find and correlate logs for a specific
    service and version.
  footnote: >
    The service fields may be self-nested under service.origin.* and service.target.*
    to describe origin or target services in the context of incoming or outgoing requests,
    respectively.
    However, the fieldsets service.origin.* and service.target.* must not be confused with
    the root service fieldset that is used to describe the actual service under observation.
    The fieldset service.origin.* may only be used in the context of incoming requests or
    events to describe the originating service of the request. The fieldset service.target.*
    may only be used in the context of outgoing requests or events to describe the target
    service of the request.
  reusable:
    top_level: true
    expected:
      - at: service
        as: origin
        beta: Reusing the `service` fields in this location is currently considered beta.
        short_override: Describes the origin service in case of an incoming request or event.
      - at: service
        as: target
        beta: Reusing the `service` fields in this location is currently considered beta.
        short_override: Describes the target service in case of an outgoing request or event.
  type: group
  fields:

    - name: address
      level: extended
      type: keyword
      dimension: true
      short: Address of this service.
      description: >
        Address where data about this service was collected from.

        This should be a URI, network address (ipv4:port or [ipv6]:port) or a resource path (sockets).
      example: 172.26.0.2:5432

Changes to host mapping

---
- name: host
  title: Host
  group: 2
  short: Fields describing the relevant computing instance.
  description: >
    A host is defined as a general computing instance.

    ECS host.* fields should be populated with details about the host on which
    the event happened, or from which the measurement was taken.
    Host types include hardware, virtual machines, Docker containers, and Kubernetes nodes.
  type: group
  fields:
    - name: hostname
      level: core
      type: keyword
      short: Hostname of the host.
      dimension: true
      description: >
        Hostname of the host.

        It normally contains what the `hostname` command returns on the host machine.

Changes to agent mapping

---
- name: agent
  title: Agent
  group: 2
  short: Fields about the monitoring agent.
  description: >
    The agent fields contain the data about the software entity, if any, that collects, detects, or observes events on a host, or takes measurements on a host.

    Examples include Beats. Agents may also run on observers. ECS agent.* fields shall be populated with details of the agent running on the host or observer where the event happened or the measurement was taken.
  footnote: >
    Examples: In the case of Beats for logs, the agent.name is filebeat. For APM, it is the
    agent running in the app/service. The agent information does not change if
    data is sent through queuing systems like Kafka, Redis, or processing systems
    such as Logstash or APM Server.
  type: group
  fields:

    - name: id
      level: core
      type: keyword
      short: Unique identifier of this agent.
      description: >
        Unique identifier of this agent (if one exists).

        Example: For Beats this would be beat.id.
      example: 8a4f500d
      dimension: true
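As a rough sketch of the intended effect, the three annotations above might surface in generated Elasticsearch mappings as shown below. This assumes the ECS tooling translates dimension: true into the Elasticsearch time_series_dimension mapping parameter. The RFC does not specify that translation, and the mapping is written as YAML only for consistency with the rest of this document.

```yaml
# Assumed rendering of the annotated fields; not produced by existing tooling.
mappings:
  properties:
    service:
      properties:
        address:
          type: keyword
          time_series_dimension: true   # from dimension: true on service.address
    host:
      properties:
        hostname:
          type: keyword
          time_series_dimension: true   # from dimension: true on host.hostname
    agent:
      properties:
        id:
          type: keyword
          time_series_dimension: true   # from dimension: true on agent.id
```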

Usage

Integration package development is the key beneficiary of this change. Each field of a document received from an integration is given a field mapping. To benefit from TSDB, in addition to mapping metric fields with a metric type, at least one field must be annotated with dimension: true.

Example of a field mapping in an integration, with the field enabled as a dimension:

---
- name: wait_class
  type: keyword
  description: Every wait event belongs to a class of wait event.
  dimension: true
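A slightly fuller sketch pairs the dimension with a metric field, since TSDB needs both. The total_waits field and its type are hypothetical. The metric_type key is assumed to be the annotation the integration's field definitions use for metric fields.

```yaml
# Sketch of an integration field definition file; total_waits is hypothetical.
- name: wait_class
  type: keyword
  dimension: true          # identifies the time series
  description: Every wait event belongs to a class of wait event.
- name: total_waits
  type: long
  metric_type: counter     # assumed metric annotation for this field
  description: Hypothetical number of waits recorded for the wait class.
```

With ECS fields such as host.hostname already annotated upstream, an integration would only need to add dimensions that are specific to it, such as wait_class here.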

Source data

The source data comes from monitoring a host such as a Linux machine, a laptop, or a k8s node. The data can be delivered through different shippers, such as Elastic Agent system metrics inputs, APM agents, Prometheus node exporter, and other host metric collectors.

Scope of impact

Concerns

No concerns are known at this time. The presence of dimension: true does not affect existing functionality. Elastic Stack version 8.7 is required for this.
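One way an integration might express the version requirement and opt in to TSDB is sketched below. The file layout and key names (conditions, kibana.version, elasticsearch.index_mode) are assumptions about the integration package manifests and are not defined by this RFC.

```yaml
# Package-level manifest (assumed layout): require Elastic Stack 8.7 or newer.
conditions:
  kibana.version: "^8.7.0"
---
# Data stream manifest (assumed layout): opt the data stream in to TSDB.
elasticsearch:
  index_mode: "time_series"
```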

People

The following are the people who were consulted on the contents of this RFC.

  • @agithomas | author
  • @ruflin | subject matter expert
  • @lalit-satapathy | reviewer
  • @martijnvg | reviewer

References

RFC Pull Requests