[scraperhelper] Can't run scrapers in parallel #13113

@dehaansa

Component(s)

scraper/scraperhelper

Describe the issue you're reporting

As reported in this issue for the sqlqueryreceiver, the scraperhelper's controller always runs scrapers in series. In the case of the sqlqueryreceiver this means that each query is run sequentially rather than leveraging the connection pool options and running in parallel.

I think there are a few options for how to improve the behavior.

  1. No change to scraperhelper. To get the benefits of a connection pool, for example, that logic would need to be embedded inside a single scraper instead of the current sqlqueryreceiver pattern of one scraper per query.

  2. Always parallelize scrapers. This could cause problems in cases where scrapers conflict with each other. However, it is the behavior I would expect as a user, especially if the scraperhelper package continues to evolve (some examples in "Scraper feedback, and what to do next?" #11238).

  3. Configurable parameter in the scraper controller to run scrapers in parallel. This gives the benefits of parallel execution without potentially breaking some existing uses of the package. I think parallel should be the default, but if we don't want to change existing behavior it could be opt-in.

  4. Configurable parameter(s) in the scraper definition to define whether an individual scraper should be run in parallel. This feels excessive to me, but it allows for the case where some scrapers must run exclusive of each other. We could get deep in the weeds here with marking dependencies/conflicts, dividing sets of scrapers that can run in parallel, etc., if that's something we want to support.

I'm in favor of changing the behavior to always parallelize (2); however, existing uses of the scraper packages will need to be evaluated to be sure this is safe.
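To make the difference between the sequential and parallel options concrete, here is a minimal sketch using only the Go standard library. `scrapeFunc`, `runSequential`, `runParallel`, and `sleepy` are hypothetical names for illustration; none of this is the scraperhelper API or its actual controller loop.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// scrapeFunc is a hypothetical stand-in for one scraper; it is NOT the
// scraperhelper API.
type scrapeFunc func() error

// runSequential calls each scraper in turn, so the total time is the sum of
// the individual scrape times.
func runSequential(scrapers []scrapeFunc) []error {
	errs := make([]error, len(scrapers))
	for i, s := range scrapers {
		errs[i] = s()
	}
	return errs
}

// runParallel starts every scraper in its own goroutine and waits for all of
// them, so the total time is roughly that of the slowest scraper.
func runParallel(scrapers []scrapeFunc) []error {
	errs := make([]error, len(scrapers))
	var wg sync.WaitGroup
	for i, s := range scrapers {
		wg.Add(1)
		go func(i int, s scrapeFunc) {
			defer wg.Done()
			errs[i] = s() // each goroutine writes only its own slot
		}(i, s)
	}
	wg.Wait()
	return errs
}

// sleepy returns a scraper that only sleeps for d (assumed workload).
func sleepy(d time.Duration) scrapeFunc {
	return func() error {
		time.Sleep(d)
		return nil
	}
}

func main() {
	scrapers := []scrapeFunc{
		sleepy(30 * time.Millisecond),
		sleepy(30 * time.Millisecond),
		sleepy(30 * time.Millisecond),
	}

	start := time.Now()
	runSequential(scrapers)
	fmt.Println("sequential:", time.Since(start))

	start = time.Now()
	runParallel(scrapers)
	fmt.Println("parallel:  ", time.Since(start))
}
```

The parallel variant is only safe if the scrapers share no unsynchronized state, which is the conflict risk mentioned for option 2.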

Activity

dehaansa (Author) commented on May 29, 2025

CC @bogdandrutu as you appear to have done recent work on the scraper packages

josepcorrea commented on May 30, 2025

One of the main reasons we believe this is a bug is that the current sequential execution of scrapers can break the expected collection_interval behavior.

For example, if the collection_interval is set to 3 minutes and one SQL query takes 5 minutes to complete, the following queries won’t start until that one finishes. As a result, even lightweight queries that should run every 3 minutes might actually be delayed significantly, leading to inaccurate or outdated metrics.

This behavior defeats the purpose of defining a consistent scrape interval and can be especially problematic in environments where some queries are much heavier than others.
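The arithmetic behind this example can be sketched as follows. The 3-minute interval and 5-minute query come from the comment above; the two lightweight durations (10s and 20s) and all function names are assumptions made for illustration, and the sketch does not model how the controller actually schedules ticks.

```go
package main

import (
	"fmt"
	"time"
)

// demoQueries holds illustrative durations: one heavy query followed by two
// light ones. The light durations are assumed values.
var demoQueries = []time.Duration{5 * time.Minute, 10 * time.Second, 20 * time.Second}

// sequentialStartDelays returns how long each query waits before it starts
// when the queries run one after another in the given order.
func sequentialStartDelays(durations []time.Duration) []time.Duration {
	delays := make([]time.Duration, len(durations))
	var elapsed time.Duration
	for i, d := range durations {
		delays[i] = elapsed
		elapsed += d
	}
	return delays
}

// sequentialCycle returns the duration of one full sequential pass: the sum
// of all query durations.
func sequentialCycle(durations []time.Duration) time.Duration {
	var total time.Duration
	for _, d := range durations {
		total += d
	}
	return total
}

func main() {
	interval := 3 * time.Minute
	for i, delay := range sequentialStartDelays(demoQueries) {
		fmt.Printf("query %d starts after %v\n", i, delay)
	}
	cycle := sequentialCycle(demoQueries)
	fmt.Printf("cycle takes %v, interval is %v, overrun: %v\n", cycle, interval, cycle > interval)
}
```

With these numbers the two light queries wait 5m0s and 5m10s before starting, and one pass takes 5m30s against a 3m0s interval. If the queries ran in parallel, every start delay would be zero and only the heavy query would overrun.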

josepcorrea commented on Jun 2, 2025

Additionally, the max_open_conn property seems somewhat meaningless in this context, since with sequential execution, there is never more than one connection used at a time. This defeats the purpose of tuning connection pool limits for performance.

It's also worth noting that max_open_conn was mentioned as part of a fix in a related issue: open-telemetry/opentelemetry-collector-contrib#39270 — however, it appears that the underlying sequential execution behavior still limits its effectiveness.
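The point about connection limits can be illustrated with a toy pool. This is a sketch only: `pool` is a hypothetical stand-in built on a buffered channel, not `database/sql` and not the sqlqueryreceiver code. It records the highest number of connections held at once, which is always 1 when queries run sequentially, whatever the limit is.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pool is a toy connection pool with an upper bound on open connections.
// It is a hypothetical illustration, not database/sql.
type pool struct {
	slots chan struct{}
	mu    sync.Mutex
	inUse int
	peak  int
}

func newPool(maxOpen int) *pool {
	return &pool{slots: make(chan struct{}, maxOpen)}
}

// query holds one connection for d and tracks the peak number in use.
func (p *pool) query(d time.Duration) {
	p.slots <- struct{}{} // blocks while maxOpen connections are in use
	p.mu.Lock()
	p.inUse++
	if p.inUse > p.peak {
		p.peak = p.inUse
	}
	p.mu.Unlock()

	time.Sleep(d) // simulated query time

	p.mu.Lock()
	p.inUse--
	p.mu.Unlock()
	<-p.slots
}

// peakSequential runs the queries one at a time and returns the peak usage.
func peakSequential(maxOpen, queries int, d time.Duration) int {
	p := newPool(maxOpen)
	for i := 0; i < queries; i++ {
		p.query(d)
	}
	return p.peak
}

// peakParallel runs all queries concurrently and returns the peak usage,
// which can never exceed maxOpen.
func peakParallel(maxOpen, queries int, d time.Duration) int {
	p := newPool(maxOpen)
	var wg sync.WaitGroup
	for i := 0; i < queries; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.query(d)
		}()
	}
	wg.Wait()
	return p.peak
}

func main() {
	fmt.Println("sequential peak:", peakSequential(4, 8, 20*time.Millisecond))
	fmt.Println("parallel peak:  ", peakParallel(4, 8, 20*time.Millisecond))
}
```

Go's `database/sql` exposes a comparable limit through `DB.SetMaxOpenConns`; whether `max_open_conn` maps onto it in the receiver is an assumption here, not something this thread confirms.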

andrzej-stencel (Member) commented on Jun 3, 2025

That's correct, the scrapers are currently run sequentially for both logs and metrics. I agree parallel behavior would be desirable in certain circumstances, though not necessarily in all of them.

I'm in favor of option 3: make the controller programmatically configurable to run scrapers in parallel or sequentially. This way we can keep the current sequential behavior of existing scrapers, switch the scrapers we want to run in parallel, and/or create new scrapers with the behavior that best fits the scenario.

On the programmatic level, I'd rather either have no default and make it mandatory to choose parallel or sequential behavior, or have sequential as the default. Perhaps this could be a new ControllerOption like WithParallel?
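A functional option along these lines might look like the sketch below, with sequential as the default. Everything here is hypothetical: the `controller` type, `newController`, `scrapeOnce`, and the shape of `ControllerOption` are assumptions for illustration, and the real scraperhelper types may be defined differently.

```go
package main

import (
	"fmt"
	"sync"
)

// scrapeFunc is a hypothetical stand-in for a single scraper.
type scrapeFunc func()

// controller is a hypothetical, heavily simplified scraper controller.
type controller struct {
	scrapers []scrapeFunc
	parallel bool
}

// ControllerOption configures a controller. The name mirrors the suggestion
// in the thread; this definition is an assumption.
type ControllerOption func(*controller)

// WithParallel opts a controller in to running its scrapers concurrently.
func WithParallel() ControllerOption {
	return func(c *controller) { c.parallel = true }
}

// newController builds a controller that is sequential unless an option
// says otherwise, so existing callers keep their current behavior.
func newController(scrapers []scrapeFunc, opts ...ControllerOption) *controller {
	c := &controller{scrapers: scrapers}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// scrapeOnce runs every scraper once, sequentially or in parallel.
func (c *controller) scrapeOnce() {
	if !c.parallel {
		for _, s := range c.scrapers {
			s()
		}
		return
	}
	var wg sync.WaitGroup
	for _, s := range c.scrapers {
		wg.Add(1)
		go func(s scrapeFunc) {
			defer wg.Done()
			s()
		}(s)
	}
	wg.Wait()
}

func main() {
	scrapers := []scrapeFunc{
		func() { fmt.Println("scrape A") },
		func() { fmt.Println("scrape B") },
	}
	newController(scrapers).scrapeOnce()                 // sequential default
	newController(scrapers, WithParallel()).scrapeOnce() // opt-in parallel
}
```

Making the choice mandatory instead would mean dropping the default and requiring callers to pass either a parallel or a sequential option explicitly.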

dehaansa (Author) commented on Jun 6, 2025

I put together a POC here if anyone would like to review an implementation of option 3 with serialized execution as the default. I'm going to evaluate it in contrib tomorrow.

josepcorrea commented on Jun 9, 2025

@dehaansa: Additionally, beyond enabling parallel execution of the scrapers, it would be highly beneficial to allow individual configuration of collection_interval per query in receivers like sqlqueryreceiver.

Right now, all queries share the same collection_interval, which forces a compromise between fast and slow queries. Being able to define different intervals per query would significantly improve efficiency and allow for more accurate and timely metrics, especially in setups where some queries are lightweight and others are heavy and long-running.

This enhancement would complement parallel execution nicely and provide greater flexibility for complex monitoring scenarios.
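One way to picture a per-query interval is an optional override that falls back to the receiver-wide value. The sketch below is hypothetical: `queryConfig`, its `CollectionInterval` field, and both helper functions are assumptions for illustration and do not reflect the sqlqueryreceiver configuration schema.

```go
package main

import (
	"fmt"
	"time"
)

// queryConfig is a hypothetical per-query configuration. A zero
// CollectionInterval means "use the receiver-wide default".
type queryConfig struct {
	SQL                string
	CollectionInterval time.Duration
}

// effectiveInterval returns the per-query interval when one is set and the
// receiver default otherwise.
func effectiveInterval(q queryConfig, def time.Duration) time.Duration {
	if q.CollectionInterval > 0 {
		return q.CollectionInterval
	}
	return def
}

// groupByInterval buckets queries by their effective interval, so that each
// bucket could be driven by its own ticker.
func groupByInterval(queries []queryConfig, def time.Duration) map[time.Duration][]queryConfig {
	groups := make(map[time.Duration][]queryConfig)
	for _, q := range queries {
		iv := effectiveInterval(q, def)
		groups[iv] = append(groups[iv], q)
	}
	return groups
}

func main() {
	// Illustrative queries; the SQL text and intervals are made up.
	queries := []queryConfig{
		{SQL: "SELECT count(*) FROM sessions"},
		{SQL: "SELECT count(*) FROM orders", CollectionInterval: 10 * time.Second},
		{SQL: "SELECT sum(total) FROM invoices", CollectionInterval: 10 * time.Minute},
	}
	for iv, qs := range groupByInterval(queries, 3*time.Minute) {
		fmt.Printf("every %v: %d query(ies)\n", iv, len(qs))
	}
}
```

Grouping by interval keeps fast queries on a short schedule without forcing heavy queries to run just as often.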

andrzej-stencel (Member) commented on Jun 9, 2025

> @dehaansa: Additionally, beyond enabling parallel execution of the scrapers, it would be highly beneficial to allow individual configuration of collection_interval per query in receivers like sqlqueryreceiver.

@josepcorrea Would it make sense to create a separate issue to track this?

josepcorrea commented on Jun 11, 2025

Thanks @andrzej-stencel — I’ve gone ahead and created a separate issue to track this enhancement: #13190.
This change would help optimize resource use and metric accuracy.

josepcorrea commented on Jun 11, 2025

Note: I initially created the issue in the wrong repository. The correct one is now here: [otelcol-contrib#40630]. Sorry for the confusion.

        [scraperhelper] Can't run scrapers in parallel · Issue #13113 · open-telemetry/opentelemetry-collector