
Opened on Mar 15, 2024

Is your feature request related to a problem? Please describe.

To make writing otel-collector-config.yml easier, a JSON Schema would be very useful for validating the configuration before the collector itself loads it.

Describe the solution you'd like

A JSON Schema for validating the configuration while writing it, similar to what exists for Docker Compose files or Kubernetes manifests, ...

Describe alternatives you've considered

Reading the docs multiple times, and checking the config by running it on a local cluster with Docker Compose or Kubernetes.


Member

@TylerHelmuth If I understand JSON Schema correctly, you could write a schema file and use it to validate the config externally (not inside the collector), correct?

Author

Yes, the .yml configuration file would be statically checked against the JSON Schema, which makes it easier to configure every aspect of the OTel Collector.

You can try this with a compose.yml file for a Docker Compose stack, for example: you get automatic IntelliSense while defining the important parts of the configuration.

In the schema, you can also define whether a property is required and what its type is.
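To make the "required" and "type" idea concrete, here is a minimal Go sketch of what such a static check does. It is not a real JSON Schema validator: `Schema`, `Validate`, and the three required top-level keys are assumptions for illustration, and JSON stands in for YAML because Go's standard library has no YAML decoder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Schema is a tiny, hypothetical subset of JSON Schema: only "required"
// and a per-property "type". Real validators support far more keywords.
type Schema struct {
	Required   []string          // keys that must be present
	Properties map[string]string // key -> expected JSON type name
}

// jsonType maps a value produced by encoding/json to a JSON Schema type name.
func jsonType(v any) string {
	switch v.(type) {
	case string:
		return "string"
	case float64:
		return "number"
	case bool:
		return "boolean"
	case map[string]any:
		return "object"
	case []any:
		return "array"
	case nil:
		return "null"
	}
	return "unknown"
}

// Validate returns one message per violation, in a stable order.
func Validate(s Schema, doc map[string]any) []string {
	var errs []string
	for _, key := range s.Required {
		if _, ok := doc[key]; !ok {
			errs = append(errs, fmt.Sprintf("missing required property %q", key))
		}
	}
	keys := make([]string, 0, len(doc))
	for k := range doc {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if want, known := s.Properties[k]; known && jsonType(doc[k]) != want {
			errs = append(errs, fmt.Sprintf("property %q: want %s, got %s", k, want, jsonType(doc[k])))
		}
	}
	return errs
}

func main() {
	// Hypothetical schema for the top level of a collector config.
	s := Schema{
		Required:   []string{"receivers", "exporters", "service"},
		Properties: map[string]string{"receivers": "object", "exporters": "object", "service": "object"},
	}
	var doc map[string]any
	if err := json.Unmarshal([]byte(`{"receivers": {"otlp": {}}, "exporters": "debug"}`), &doc); err != nil {
		panic(err)
	}
	for _, e := range Validate(s, doc) {
		fmt.Println(e)
	}
}
```

An editor plugin does the same thing continuously as you type, which is where the IntelliSense experience comes from.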

TylerHelmuth

Member

I recommend you write and maintain your own file specific to your collector build.

Since the collector is composed of components from anywhere, and those components can use any configuration they want, it isn't possible for us to provide a single, holistic schema file for any collector build.

We could maybe provide one with each distro we provide, but even that would be hard to maintain.

Member

Related, but closed, issue: open-telemetry/opentelemetry-collector-contrib#27003

Member

@TylerHelmuth I disagree with your DIY position. The advantage of OTEL collector for custom distros is in providing a reusable baseline. From my perspective the discoverability of configuration of different components is very bad today, 2 out of 5. A DIY approach cannot solve for the lack of schema / documentation of existing components, it needs to be solved centrally in the main collector repo. JSON schema is one option, I personally would prefer protobuf schema (either would need a prototype to iron out kinks), as I mentioned in open-telemetry/opentelemetry-collector-contrib#27003 (comment)

TylerHelmuth

Member

@yurishkuro I agree that we could improve the consistency and quality of component documentation - being able to auto-generate standard docs from component config would be awesome.

What I'm arguing is that the Collector Core repo cannot supply a single schema file that could be used to statically validate any collector build.

Each distribution at https://github.com/open-telemetry/opentelemetry-collector-releases could maybe supply a schema that knows about each component in its manifest. Maybe the collector could add an extra command to generate a schema file based on the supplied config.

Member

What I'm arguing is that the Collector Core repo cannot supply a single schema file that could be used to statically validate any collector build.

agreed, but if there was a standard mechanism defined & used in Core it would be much more likely that components and contrib plugins would start using it to declare their configuration, improving the overall ecosystem.

cforce

Contributor

Issue #9707 could be built on top of schema support.

cmgriffing

I think this would improve the experience quite a bit.

As further info, if the JSON Schema were uploaded to SchemaStore.org, various tools and editors would get automatic support through their plugins.
https://www.schemastore.org/json/#editors

As an example, this VSCode YAML extension would be able to consume it:
https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml

While I understand the concerns about extensibility, a JSON Schema can allow arbitrary fields in addition to the official ones. So nothing is lost if a different tool wants to add custom config.
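A small Go sketch of the point about arbitrary fields: in JSON Schema, the `additionalProperties` keyword decides whether keys outside `properties` are accepted, and when it is omitted they are allowed. `ObjectSchema` and `UnknownKeys` are hypothetical names that model only those two keywords.

```go
package main

import (
	"fmt"
	"sort"
)

// ObjectSchema models two JSON Schema keywords: "properties" (the documented
// keys) and "additionalProperties". Note: in JSON Schema an omitted
// additionalProperties means "allow"; here the Go zero value (false) is the
// strict case, purely to keep the sketch short.
type ObjectSchema struct {
	Properties           map[string]bool // set of documented keys
	AdditionalProperties bool            // true: unknown keys are accepted
}

// UnknownKeys returns the keys the schema rejects, sorted for stable output.
func UnknownKeys(s ObjectSchema, doc map[string]any) []string {
	if s.AdditionalProperties {
		return nil
	}
	var bad []string
	for k := range doc {
		if !s.Properties[k] {
			bad = append(bad, k)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	doc := map[string]any{"endpoint": "localhost:4317", "x_vendor_option": true}
	open := ObjectSchema{Properties: map[string]bool{"endpoint": true}, AdditionalProperties: true}
	strict := ObjectSchema{Properties: map[string]bool{"endpoint": true}}
	fmt.Println(UnknownKeys(open, doc))   // []
	fmt.Println(UnknownKeys(strict, doc)) // [x_vendor_option]
}
```

With the open form, editors still autocomplete and type-check the documented keys while leaving vendor-specific keys alone.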

Author

Hi everyone,

After exploring more related issues and work on this topic, I will try to summarize my findings in this comment.

Auto Generation of Component Configuration from metadata.yaml opentelemetry-collector-contrib#27003
- This looks like a first attempt at automating the generation of a YAML schema for the collector config file.
- Could it solve this issue if merged?
- A related proposal defines the steps and the configuration schema to generate: https://docs.google.com/document/d/15SMsJAdE1BHAmwf8cLXBvsHDrezXOHniaRZafjO-5e4/edit
Automate reference documentation as YAML or JSON opentelemetry-collector-contrib#24189
- This related issue was created in connection with the item above.
- Would this be a suitable way to add a JSON Schema/YAML schema for the Collector config file?
https://github.com/open-telemetry/opentelemetry-configuration
- There seems to be a repo for OpenTelemetry configuration, but since I'm unfamiliar with all those components, I don't know whether it relates to the collector config or to other components.
- I include it for reference.
[Proposal] Automate reference documentation as YAML files community#1610
- This issue also seems to be on the same topic.
- I include it for reference too.
https://github.com/splunk/collector-config-tools/tree/main/cfg-metadata
- It seems Splunk has already done some work on this and generates metadata files for community exporters, extensions, processors and receivers.
- One of their motivations is to upstream this work to make it available to the whole community.

Moreover, here are some Go libs for generating jsonschema files based on Go types:

Thus, what are the next steps to implement this ?
I'm willing to contribute if necessary.

Author

Hi,
A temporary alternative would be to use https://github.com/dash0hq/otelbin

This tool lets you validate the OTel Collector config against different schemas.
It also provides an easy way to visualize the collector pipeline flow.

The live tool is available at: https://www.otelbin.io/

jpkrohling

Member

This could be useful for the Operator as well, perhaps for its auto-upgrade mechanism.

cc @yuriolisa

Member

@jpkrohling @TylerHelmuth I've noticed that OTel often doesn't participate in LFX mentorship programs. Wouldn't this be a good project for an intern?

NB: the deadline for summer term is today 5pm PT.


I don't know if having default values for fields is actually a requirement

I believe it is, because the docs need to include it.

as long as there is a generic mechanism to add proto annotations that will inject struct tags the solution can be built using those tags

The problem is that mapstructure doesn't have a way to set a default value through a tag.

lots of validation solutions out there already, maybe they do support default values too

I hope so! I'll have a look to see if I can find any.

Mentioned this on Jun 10, 2024: [WIP] Autogenerate config.go files from Pkl schema #10376

@yurishkuro Unfortunately, I found it difficult to find a clean solution using protobuf. Pkl seems more promising to me, because it provides a lot of the features we need out of the box. I opened #10376 to show what using Pkl in the batch processor could look like. Please feel free to take a look :)

Member

Unfortunately, I found it difficult to find a clean solution using protobuf.

Can you summarize your findings and difficulties?

The schema could be useful for generating code and docs both in the main Collector repo, and in its distributions. For example:

Creating an alternative config.go in an OTel distribution where the mapstructure tags are not sufficient.
Generating Markdown that lists all config arguments, whether they are optional, what their default values are, and whether there are any constraints on them.

Protobuf v3 has a few notable limitations:

No optional (nullable) values.
No default values.
No "constraints" on arguments.

There are various solutions that mitigate these issues individually to some extent. However, because they are all independent of each other, I don't think they would work well together. For example, one could use protovalidate for validation, protoc-gen-doc for generating docs, and gnostic for generating a schema, but since these are independent tools, the validation rules won't appear in either the docs or the schema. In addition, I don't know how to work around the lack of optional arguments and default values.

On the other hand, Pkl seems designed for exactly this use case, and it is a single tool with almost everything we need right now. It was open-sourced by Apple a few months ago and is gaining popularity. Using the right tool for the job would look cleaner, be easier to maintain, and cost less in the long term, since the upstream project will hopefully maintain all those features as one cohesive whole, rather than us relying on a set of independent products as in the Protobuf ecosystem.

theletterf

Member

Adding myself to the party here, after opening open-telemetry/opentelemetry-collector-contrib#24189 a year ago. +CC @atoulme.

What's at stake for Splunk, and for the OpenTelemetry.io docs, is having reliable information in YAML format about which metrics and configs are available for each component.

https://github.com/splunk/collector-config-tools/blob/main/cfg-metadata/exporter/otlp.yaml

Hi, @theletterf! In order for you to generate docs, I suppose you need information such as:

Whether a given config argument is optional or mandatory.
What the default value is.
Whether there are constraints on the argument. E.g. a log_level string argument which can only be either "info" or "debug", or a send_batch_max_size_bytes argument which has to be greater than 0.

Is this a correct assumption? Do you need any other information in the schema?
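For comparison, constraints like the two examples in that list can be written today as hand-written Go checks, sketched below in the style of a `Validate() error` method on a config struct. `Config` is hypothetical. Because rules written this way live only in Go code, a generated schema or generated docs cannot see them, which is the gap being discussed.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is hypothetical and combines the two example arguments.
type Config struct {
	LogLevel              string `mapstructure:"log_level"`
	SendBatchMaxSizeBytes int    `mapstructure:"send_batch_max_size_bytes"`
}

// Validate enforces both example constraints and reports every violation.
func (c Config) Validate() error {
	var errs []error
	if c.LogLevel != "info" && c.LogLevel != "debug" {
		errs = append(errs, fmt.Errorf("log_level must be \"info\" or \"debug\", got %q", c.LogLevel))
	}
	if c.SendBatchMaxSizeBytes <= 0 {
		errs = append(errs, fmt.Errorf("send_batch_max_size_bytes must be greater than 0, got %d", c.SendBatchMaxSizeBytes))
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	fmt.Println(Config{LogLevel: "info", SendBatchMaxSizeBytes: 1024}.Validate()) // <nil>
	fmt.Println(Config{LogLevel: "warn"}.Validate())
}
```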

theletterf

Member

For settings, this is the info we gather using cfgschema:

For metrics:

https://github.com/splunk/collector-config-tools/blob/main/metric-metadata/elasticsearchreceiver.yaml

You might want to take a look at the contraption we built for this:

https://github.com/splunk/collector-config-tools/tree/main/cfgschema

Thanks, @theletterf! I will keep your requirements in mind.

I know the Pkl PR is a bit disruptive, mostly due to the Pkl Duration type and the Pkl-specific validation rules. As a next step, I will try to propose something that doesn't disrupt the existing code as much. I may update the Pkl PR and/or revisit a solution such as the one that uses go-jsonschema.

I can only spend about 1 day a week on this project, so it might take me a few weeks to get back to you with updated proposals.

theletterf

Member

Considering how long we've been chasing this, all activity is welcome. :)

Mentioned this on Jul 22, 2024: Generate config Go code from schema #10694

tsloughter

Member

agreed, but if there was a standard mechanism defined & used in Core it would be much more likely that components and contrib plugins would start using it to declare their configuration, improving the overall ecosystem.

Right. The SDK configuration file schema is in the same situation, but it is still seen as beneficial to validate what it can and hope that others will extend it.

atoulme

mentioned this on Oct 2, 2024: What's the replacement of configschema ? open-telemetry/opentelemetry-collector-contrib#33778

TylerHelmuth

mentioned this on Dec 2, 2024: add document defining an OpenTelemetry Collector open-telemetry/opentelemetry-specification#4313