Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support AWS Embedded Metric Format Version 0 #1

Merged
merged 2 commits into from
Apr 4, 2023
Merged

Add support AWS Embedded Metric Format Version 0 #1

merged 2 commits into from
Apr 4, 2023

Conversation

khanhntd
Copy link

@khanhntd khanhntd commented Apr 3, 2023

Same as this PR open-telemetry#20314

Description:
Currently, awsemfexporter will group all the metrics, translate to fields and setting the fields with exported metrics as EMF version 1 -

"_aws": {
					"CloudWatchMetrics": [
					{
						"Namespace": "ECS",
						"Dimensions": [ ["ClusterName"] ],
						"Metrics": [{"Name": "memcached_commands_total"}]
					}
					],
					"Timestamp": 1668387032641
			  	}

Therefore, adding support for EMF version v0 and a flag for customers to make a difference between Version 0 and Version 1 (e.g EMF v0)

				{
					"Version": 0,
					"CloudWatchMetrics": [
					{
						"Namespace": "ECS",
						"Dimensions": [ ["ClusterName"] ],
						"Metrics": [{"Name": "memcached_commands_total"}]
					}
					],
					"Timestamp": 1668387032641
			  	}

This can be validated by send metrics with awsemfexporter

image

Link to tracking Issue:
N/A

Testing:
There are two tests I have done:

  • By running go test with awsemfexporter, the result is
khanhntd@88665a0eaf54 awsemfexporter % go test ./...
ok      github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter       2.409s

@khanhntd khanhntd changed the title Emf Add support AWS Embedded Metric Format Version 0 Apr 3, 2023
@khanhntd khanhntd marked this pull request as ready for review April 3, 2023 14:07
@@ -47,18 +49,32 @@ func createDefaultConfig() component.Config {
LogStreamName: "",
Namespace: "",
DimensionRollupOption: "ZeroAndSingleDimensionRollup",
EnableEMFVersion1: true,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default EMF version is v1 and the public document mentions V1 more than V0. Therefore, enable EMF version with default is true

@@ -186,8 +186,8 @@ func (lm *LabelMatcher) init() (err error) {
if len(lm.Separator) == 0 {
lm.Separator = ";"
}
lm.compiledRegex = regexp.MustCompile(lm.Regex)
return
lm.compiledRegex, err = regexp.Compile(lm.Regex)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MustCompile will panic if the regexp is not compilable. Therefore, using Compile which are the same with MustCompile but return readable error instead of panic

@@ -102,6 +108,8 @@ type MetricDescriptor struct {
Overwrite bool `mapstructure:"overwrite"`
}

var _ component.Config = (*Config)(nil)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will validate if the Config declared by awsemfexporter Config has all the declared function by the interface

Copy link

@chadpatel chadpatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few small comments. I am mostly wondering why we want to handle the uuid error that is extremely unlikely


if err != nil {
return nil, err
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we bothering to handle this error?

Copy link

@williazz williazz Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we ignore unhandled errors just because they are unlikely?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am more trying to get at what changed. This is one of those type of methods that will never fail unless something is completely broken

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its failed, it would fail at unit test. As billy said, why would we ignore unhandled errors, if it is in test environment ,yes I agree it should be more less robust but I don't think we want to enforce it with prod env.


// create AWS session
awsConfig, session, err := awsutil.GetAWSConfigSession(logger, &awsutil.Conn{}, &expConfig.AWSSessionSettings)
awsConfig, session, err := awsutil.GetAWSConfigSession(set.Logger, &awsutil.Conn{}, &config.AWSSessionSettings)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this changing? It seems like you are refactoring things that are unrelated to the specific change necessary for EMFv0

There are a few examples of that in this PR

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i agree. if you must include unrelated changes in a single PR, then i'd appreciate comments saying that they are unrelated

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO - better to refactor in a different PR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with creating in different PR. However, its only simple line change and there are hardly and complication in it. If you are saying refactoring more than 5 files with more than 50 lines then yes, I would agree with you.

| `detailed_metrics` | Retain detailed datapoint values in exported metrics (e.g instead of exporting a quantile as a statistical value, preserve the quantile's population) | `false` |
| `output_destination` | Specify the EMFExporter output. Currently, two options are available. "cloudwatch" or "stdout" | `cloudwatch` |
| `detailed_metrics` | Retain detailed datapoint values in exported metrics (e.g instead of exporting a quantile as a statistical value, preserve the quantile's population) | `false` |
| `emf_version` | Send metrics to CloudWatchLogs with Embedded Metric Format in selected version [(e.g version 1 with _aws)](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Embedded_Metric_Format_Specification.html#CloudWatch_Embedded_Metric_Format_Specification_structure), version 0 without _aws) | `1` |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the critical thing to communicate to our customers is what the default is (i.e. 1) and that it is optional. It is also redundant to call it emf_version as we are in the emf exporter. Maybe output_version? I don't care as much about the naming as the lack of documentation on how this should be used (i.e., don't use it, the default is fine :))

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree output_version would make confusing with the output_destination. Would output_version=2 means we are exporting emf version v2 to cloudwatch or we are exporting emf to cloudwatch version 2?

Agree with you on this

the critical thing to communicate to our customers is what the default is (i.e. 1) and that it is optional

That's why default exists right? For optional configuration instead of force customers to have expertise in it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to version for less redundant

@williazz
Copy link

williazz commented Apr 4, 2023

Did you do any testing?

@khanhntd
Copy link
Author

khanhntd commented Apr 4, 2023

Did you do any testing?

Its in PR description.

@khanhntd khanhntd merged commit 52ade39 into main Apr 4, 2023
@khanhntd khanhntd deleted the emf branch April 4, 2023 18:13
sky333999 pushed a commit that referenced this pull request Aug 24, 2023
…emetry#24676)

**Description:** The metadata.yml for the SSH check receiver currently
documents a resource attribute containing the SSH endpoint but this is
not emitted. This PR updates the receiver to include this resource
attribute.

**Link to tracking Issue:** open-telemetry#24441 

**Testing:**

Example collector config:
```yaml
receivers:
  sshcheck:
    endpoint: 13.245.150.131:22
    username: ec2-user
    key_file: /Users/dewald.dejager/.ssh/sandbox.pem
    collection_interval: 15s
    known_hosts: /Users/dewald.dejager/.ssh/known_hosts
    ignore_host_key: false
    resource_attributes:
      "ssh.endpoint":
        enabled: true

exporters:
  logging:
    verbosity: detailed
  prometheus:
    endpoint: 0.0.0.0:8081
    resource_to_telemetry_conversion:
      enabled: true

service:
  pipelines:
    metrics:
      receivers: [sshcheck]
      exporters: [logging, prometheus]
```

The log output looks like this:
```
2023-07-30T16:52:38.724+0200    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "logging", "resource metrics": 1, "metrics": 2, "data points": 2}
2023-07-30T16:52:38.724+0200    info    ResourceMetrics #0
Resource SchemaURL: 
Resource attributes:
     -> ssh.endpoint: Str(13.245.150.131:22)
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope otelcol/sshcheckreceiver 0.82.0-dev
Metric #0
Descriptor:
     -> Name: sshcheck.duration
     -> Description: Measures the duration of SSH connection.
     -> Unit: ms
     -> DataType: Gauge
NumberDataPoints #0
StartTimestamp: 2023-07-30 14:52:22.381672 +0000 UTC
Timestamp: 2023-07-30 14:52:38.404003 +0000 UTC
Value: 319
Metric #1
Descriptor:
     -> Name: sshcheck.status
     -> Description: 1 if the SSH client successfully connected, otherwise 0.
     -> Unit: 1
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
StartTimestamp: 2023-07-30 14:52:22.381672 +0000 UTC
Timestamp: 2023-07-30 14:52:38.404003 +0000 UTC
Value: 1
```

And the Prometheus metrics look like this:
```
# HELP sshcheck_duration Measures the duration of SSH connection.
# TYPE sshcheck_duration gauge
sshcheck_duration{ssh_endpoint="13.245.150.131:22"} 311
# HELP sshcheck_status 1 if the SSH client successfully connected, otherwise 0.
# TYPE sshcheck_status gauge
sshcheck_status{ssh_endpoint="13.245.150.131:22"} 1
```
sky333999 pushed a commit that referenced this pull request Aug 24, 2023
)

**Description:** 

Adding command line argument `--status-code` to `telemetrygen traces`,
which accepts `(Unset,Error,Ok)` (case sensitive) or the enum equivalent
of `(0,1,2)`.

Running 

```shell
telemetrygen traces --otlp-insecure --traces 1 --status-code 1
```

against a minimal local collector yields

```txt
2023-07-29T21:27:57.862+0100	info	ResourceSpans #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0
Resource attributes:
     -> service.name: Str(telemetrygen)
ScopeSpans #0
ScopeSpans SchemaURL:
InstrumentationScope telemetrygen
Span #0
    Trace ID       : f6dc4be32c78b9999c69d504a79e68c1
    Parent ID      : 4e2cd6e0e90cf2ea
    ID             : 20835413e32d26a5
    Name           : okey-dokey
    Kind           : Server
    Start time     : 2023-07-29 20:27:57.861602 +0000 UTC
    End time       : 2023-07-29 20:27:57.861726 +0000 UTC
    Status code    : Error
    Status message :
Attributes:
     -> net.peer.ip: Str(1.2.3.4)
     -> peer.service: Str(telemetrygen-client)
Span #1
    Trace ID       : f6dc4be32c78b9999c69d504a79e68c1
    Parent ID      :
    ID             : 4e2cd6e0e90cf2ea
    Name           : lets-go
    Kind           : Client
    Start time     : 2023-07-29 20:27:57.861584 +0000 UTC
    End time       : 2023-07-29 20:27:57.861726 +0000 UTC
    Status code    : Error
    Status message :
Attributes:
     -> net.peer.ip: Str(1.2.3.4)
     -> peer.service: Str(telemetrygen-server)
```

and similarly (the string version)

```shell
telemetrygen traces --otlp-insecure --traces 1 --status-code '"Ok"'
```

produces 

```txt
Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0
Resource attributes:
     -> service.name: Str(telemetrygen)
ScopeSpans #0
ScopeSpans SchemaURL:
InstrumentationScope telemetrygen
Span #0
    Trace ID       : dfd830da170acfe567b12f87685d7917
    Parent ID      : 8e15b390dc6a1ccc
    ID             : 165c300130532072
    Name           : okey-dokey
    Kind           : Server
    Start time     : 2023-07-29 20:29:16.026965 +0000 UTC
    End time       : 2023-07-29 20:29:16.027089 +0000 UTC
    Status code    : Ok
    Status message :
Attributes:
     -> net.peer.ip: Str(1.2.3.4)
     -> peer.service: Str(telemetrygen-client)
Span #1
    Trace ID       : dfd830da170acfe567b12f87685d7917
    Parent ID      :
    ID             : 8e15b390dc6a1ccc
    Name           : lets-go
    Kind           : Client
    Start time     : 2023-07-29 20:29:16.026956 +0000 UTC
    End time       : 2023-07-29 20:29:16.027089 +0000 UTC
    Status code    : Ok
    Status message :
Attributes:
     -> net.peer.ip: Str(1.2.3.4)
     -> peer.service: Str(telemetrygen-server)
```

The default is `Unset` which is the current behaviour.

**Link to tracking Issue:**

24286

**Testing:**

Added unit tests which covers both valid and invalid inputs.

**Documentation:**

Command line arguments are self documenting via the usage info in the
flag.

Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
jefchien pushed a commit that referenced this pull request Nov 27, 2023
open-telemetry#29116)

**Description:** 

As originally proposed in open-telemetry#26991 before I got distracted

Exposes the duration of generated spans as a command line parameter. It
uses a `DurationVar` flag so units can be easily provided and are
automatically applied.

Example usage:

```bash
telemetrygen traces --traces 100 --otlp-insecure --span-duration 10ns # nanoseconds
telemetrygen traces --traces 100 --otlp-insecure --span-duration 10us # microseconds
telemetrygen traces --traces 100 --otlp-insecure --span-duration 10ms # milliseconds
telemetrygen traces --traces 100 --otlp-insecure --span-duration 10s # seconds
```

**Testing:** 

Ran without the argument provided `telemetrygen traces --traces 1
--otlp-insecure` and seen spans publishing with the default value.

Ran again with the argument provided: `telemetrygen traces --traces 1
--otlp-insecure --span-duration 1s`

And observed the expected output:

```
Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0
Resource attributes:
     -> service.name: Str(telemetrygen)
ScopeSpans #0
ScopeSpans SchemaURL: 
InstrumentationScope telemetrygen 
Span #0
    Trace ID       : 8b441587ffa5820688b87a6b511d634c
    Parent ID      : 39faad428638791b
    ID             : 88f0886894bd4ee2
    Name           : okey-dokey
    Kind           : Server
    Start time     : 2023-11-12 02:05:07.97443 +0000 UTC
    End time       : 2023-11-12 02:05:08.97443 +0000 UTC
    Status code    : Unset
    Status message : 
Attributes:
     -> net.peer.ip: Str(1.2.3.4)
     -> peer.service: Str(telemetrygen-client)
Span #1
    Trace ID       : 8b441587ffa5820688b87a6b511d634c
    Parent ID      : 
    ID             : 39faad428638791b
    Name           : lets-go
    Kind           : Client
    Start time     : 2023-11-12 02:05:07.97443 +0000 UTC
    End time       : 2023-11-12 02:05:08.97443 +0000 UTC
    Status code    : Unset
    Status message : 
Attributes:
     -> net.peer.ip: Str(1.2.3.4)
     -> peer.service: Str(telemetrygen-server)
	{"kind": "exporter", "data_type": "traces", "name": "debug"}
```

**Documentation:** No documentation added.

---------

Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants