Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MetricsModule: inject DataSourceTaskIdHolder early #16140

Merged
merged 6 commits into from
Mar 21, 2024

Conversation

arunramani
Copy link
Contributor

Description

This fixes an issue where MSQ ingestion failed if the first monitor in the list is ServiceStatusMonitor. Before this change, setting the ServiceStatusMonitor as the first monitor will fail with errors in CliPeon like below

1) Error in custom provider, java.lang.RuntimeException: com.fasterxml.jackson.databind.exc.ValueInstantiationException: Cannot construct instance of `org.apache.druid.segment.virtual.ExpressionVirtualColumn`, problem: Unable to provision, see the following errors:

1) Problem parsing object at prefix[druid.lookup]: Cannot construct instance of `org.apache.druid.query.lookup.LookupListeningAnnouncerConfig`, problem: Unable to provision, see the following errors:

1) Error in custom provider, java.lang.IllegalStateException: This is a proxy used to support circular references. The object we're proxying is not constructed yet. Please wait until after injection has completed to use this object.
  at org.apache.druid.cli.CliPeon$1.getDataSourceFromTask(CliPeon.java:344) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> org.apache.druid.cli.CliPeon$1)
  at org.apache.druid.cli.CliPeon$1.getDataSourceFromTask(CliPeon.java:344) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> org.apache.druid.cli.CliPeon$1)
  while locating java.lang.String annotated with @com.google.inject.name.Named(value="druidDataSource")
    for field at org.apache.druid.server.metrics.DataSourceTaskIdHolder.dataSource(DataSourceTaskIdHolder.java:29)
  at org.apache.druid.server.metrics.MetricsModule.configure(MetricsModule.java:92) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> org.apache.druid.server.metrics.MetricsModule)
  while locating org.apache.druid.server.metrics.DataSourceTaskIdHolder
Caused by: java.lang.IllegalStateException: This is a proxy used to support circular references. The object we're proxying is not constructed yet. Please wait until after injection has completed to use this object.
	at com.google.common.base.Preconditions.checkState(Preconditions.java:512)
	at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:36)
	at com.sun.proxy.$Proxy103.getDataSource(Unknown Source)
	at org.apache.druid.cli.CliPeon$1.getDataSourceFromTask(CliPeon.java:344)
	at org.apache.druid.cli.CliPeon$1$$FastClassByGuice$$1ae344b1.invoke(<generated>)
	at com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:264)
	at com.google.inject.internal.ProviderMethod$Factory.provision(ProviderMethod.java:401)
	at com.google.inject.internal.ProviderMethod$Factory.get(ProviderMethod.java:376)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
	at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)

The problem is happening when the task file is deserialized in CliPeon.readTask. This is the sequence of events

  • CliPeon.readTask uses Jackson to deserialize the task
  • Jackson uses guice to find some injectable values. One of these is DataSourceTaskIdHolder
  • To inject that value, guice decides to use the provider annotated by CliPeon.getDataSourceFromTask and CliPeon.getTaskIDFromTask.

This creates a circular ref since readTask is the provider for Task and CliPeon.getDataSourceFromTask and CliPeon.getTaskIDFromTask use Task as their inputs.

However if you set any other monitor first such as JvmMonitor, then the injector doesn’t use the providers from CliPeon. Jackson is able to bind both datasource and task ID from the task file itself.

For now triggering the binding of DataSourceTaskIdHolder before any of the monitors are loaded resolves the issue but this does not seem like the right fix. I have not been able to figure out what the correct fix is but I suspect it's a lot deeper than this one-liner.

Key changed/added classes in this PR
  • MetricsModule

This PR has:

  • been self-reviewed.
  • a release note entry in the PR description.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.

@adithyachakilam
Copy link
Contributor

@arunramani Is it possible to add a test case of any sort for this ?

@arunramani
Copy link
Contributor Author

@arunramani Is it possible to add a test case of any sort for this ?

I don't think there's any easy way to unit test this specifically. However the existing unit tests cover the injection working correctly.

Copy link
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to fix a real problem and I can't think of a better way to fix it.

Thank you for the comment about this being a hack. Hopefully someone will come across it and suggest a better way to untangle this.

@suneet-s suneet-s merged commit c72e69a into apache:master Mar 21, 2024
85 checks passed
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants