Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

otelcol.Collector.DryRun() now instantiates all components #10031

Closed
SeanPMiller opened this issue Apr 24, 2024 · 13 comments · Fixed by #10078
Closed

otelcol.Collector.DryRun() now instantiates all components #10031

SeanPMiller opened this issue Apr 24, 2024 · 13 comments · Fixed by #10078
Labels
bug Something isn't working

Comments

@SeanPMiller
Copy link

Describe the bug
This behavior might be intended. However, if you write your own entrypoint into the collector, as I have, and you call otelcol.Collector.DryRun() with a goal of determining whether the configuration "looks valid" before calling Run(), then service.New() is now called as part of the dry run, and its return value is discarded. This behavior is a consequence of this commit.

Due to the above behavior, calling DryRun() now, for example, calls open() and bind() for the self-monitoring telemetry-service port and the health-check extension port, among other ports. When Run() is subsequently called, the second instance of each server component fails with a bind() error because its port has already been opened and bound.

This might be a misuse of the DryRun() function. Let me know, and thanks in advance.

Steps to reproduce
Write the above code.

What did you expect to see?
The old behavior.

What did you see instead?

Error: failed to start extensions: failed to bind to address 0.0.0.0:13133: listen tcp 0.0.0.0:13133: bind: address already in use; failed to shutdown pipelines: no existing monitoring routine is running
2024/04/24 22:03:22 collector server run finished with error: failed to start extensions: failed to bind to address 0.0.0.0:13133: listen tcp 0.0.0.0:13133: bind: address already in use; failed to shutdown pipelines: no existing monitoring routine is running

What version did you use?
v0.98.0

Environment
go version go1.22.0 linux/amd64

@SeanPMiller SeanPMiller added the bug Something isn't working label Apr 24, 2024
@SeanPMiller
Copy link
Author

As a workaround, I have switched to using the following function:

func temporaryDryRunPatch(settings otelcol.CollectorSettings, ctxt context.Context) (err error) {
    factories, err := settings.Factories()
    if err != nil {
        return fmt.Errorf("failed to initialize factories: %w", err)
    }   

    confProv, err := otelcol.NewConfigProvider(settings.ConfigProviderSettings)
    if err != nil {
        return fmt.Errorf("failed to instantiate configuration provider: %w", err)
    }   
    defer func() {
        if shutdownErr := confProv.Shutdown(ctxt); shutdownErr != nil {
            if err != nil {
                err = fmt.Errorf("configuration-provider shutdown failed after encountering error: shutdown_err=%w, original_err=%w", shutdownErr, err)
            } else {
                err = fmt.Errorf("configuration-provider shutdown failed: %w", shutdownErr)
            }   
        }   
    }() 

    cfg, err := confProv.Get(ctxt, factories)
    if err != nil {
        return fmt.Errorf("failed to get config: %w", err)
    }   

    return cfg.Validate()
}

This gives me the behavior I want, so this is not an especially high-priority issue.

@mx-psi
Copy link
Member

mx-psi commented Apr 25, 2024

Thanks for reporting! Seems like this was caused by #9257 cc @djaglowski @ycombinator

I don't quite understand why it happens though, my expectation would be that until we call Service.Start no extension actually registers anything. @SeanPMiller, two questions for you:

  1. Does this only happen with the healthcheck extension? Or can you see this happening with other components?
  2. Can you confirm that this does not happen in v0.97.0?

@SeanPMiller
Copy link
Author

I'll work on a demonstration program. Having a day job is a drag.

@SeanPMiller
Copy link
Author

https://github.com/SeanPMiller/otelcol-issue-10031

This is essentially condensed from otelcorecol, except for the custom root command.

@SeanPMiller
Copy link
Author

Specific answers:

  1. I see this in production with both the selfmon Prometheus telemetry service and also with healthcheck extension. The log shows, for example, the following:
    2024-04-25T18:19:24.987Z	info	service@v0.98.0/telemetry.go:55	Setting up own telemetry...
    2024-04-25T18:19:24.987Z	info	service@v0.98.0/telemetry.go:97	Serving metrics	{"address": ":8888", "level": "Basic"}
    2024-04-25T18:19:24.990Z	info	service@v0.98.0/telemetry.go:55	Setting up own telemetry...
    2024-04-25T18:19:24.990Z	info	service@v0.98.0/telemetry.go:97	Serving metrics	{"address": ":8888", "level": "Basic"}
    
  2. I can confirm that this does not happen in v0.97.0, as demonstrated by the reproduction above.

@mx-psi
Copy link
Member

mx-psi commented Apr 26, 2024

Thanks for the repro! I see how this can fail with the Prometheus endpoint, I am still a bit puzzled with the healthcheck case.

@SeanPMiller
Copy link
Author

I am, too. I'll see if I can reproduce, but it only seems to happen with a sufficiently complex configuration.

@TylerHelmuth
Copy link
Member

I think this issue in contrib around validate is related. The kubeletstatreceiver's createMetricsReceiver function is being invoked when validate is used.

@ycombinator
Copy link
Contributor

Perhaps we should revert #9257 and reopen #8007 for now, so we can take another stab at implementing validation in a different way?

@mx-psi
Copy link
Member

mx-psi commented May 3, 2024

The kubeletstatreceiver's createMetricsReceiver function is being invoked when validate is used.

That's to be expected, isn't it? The problem there IMO is that a component should not be opening any files during creation, only during startup.

Perhaps we should revert #9257 and reopen #8007 for now, so we can take another stab at implementing validation in a different way?

Let's do this. It also seems like we have not set clear expectations on what components should do at creation time vs startup, so we should work on this as well.

@TylerHelmuth
Copy link
Member

That's to be expected, isn't it? The problem there IMO is that a component should not be opening any files during creation, only during startup.

That may be true, and it is possible the k8s components have a bug, but the reason the user in that issue starting seeing an issue when using validate is because in #9257 instead of only calling cfg.Validate we also now instantiate the components via validatePipelineCfg.

@TylerHelmuth
Copy link
Member

I agree with reverting the change for now

@ycombinator
Copy link
Contributor

ycombinator commented May 3, 2024

I'm unable to simply click the "Revert" button on #9257 so instead I've created #10078 manually reverting the relevant bits of that PR.

bogdandrutu pushed a commit that referenced this issue May 3, 2024
#### Description

This PR reverts the change made in
#9257 due
to problems reported in
#10031.

<!-- Issue number if applicable -->
#### Link to tracking issue
Fixes #10031.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants