Skip to content

Tidy up construction of the Guice Injectors#12816

Merged
gianm merged 4 commits intoapache:masterfrom
paul-rogers:220723-guice
Aug 4, 2022
Merged

Tidy up construction of the Guice Injectors#12816
gianm merged 4 commits intoapache:masterfrom
paul-rogers:220723-guice

Conversation

@paul-rogers
Copy link
Contributor

@paul-rogers paul-rogers commented Jul 24, 2022

See Issue #12815 for an overview of the goal of this PR. This PR provides the basic refactoring to allow a later PR to restructure the Calcite test setup which will in turn simply the proposed Druid planner test framework. This PR also lays the groundwork to simplify the complex, ad-hoc code which the "new ITs" use to set up tests.

Refactors Guice Injector Construction

See the above issue for an overview.

ExtensionsLoader

The Initialization class currently performa a kitchen-sink set of tasks, one of which is to load extensions. That code was pulled out into a separate class: ExtensionsLoader. The previous code used global maps to hold extension information, and cobbled together a way to pass a config object into static Initialization methods.

The new approach is that ExtensionsLoader is Guice-enabled: it is created by the startup injector and passed to the primary injector. Guice, then, manages the singleton for this class. Clients of the code now obtain an ExtensionsLoader from Guice.

InitializationTest cases related to extensions moved into a new ExtensionsLoaderTest class.

Hadoop Initialization

The Initialization class also holds a method, getHadoopDependencyFilesToLoad(), used only by Hadoop indexing tasks. This code moved to a new Initialization class in the indexing module. The test cases moved to a new test class.

Initialization Deprecation

The Initialization class is now virtually empty:

  • Hadoop-related code moved as described above.
  • Extension-related code moved as described above.
  • Module management moved into injector builders as described in Issue #12815.

The only functionality left in Initialization is makeInjectorWithModules(), now marked as deprecated, to build an injector. However, this method adds all the server dependencies, which is not what is wanted for tests. Over time, tests will be refactored to build an injector with just the modules needed, and without server baggage. At that time, we'll drop the makeInjectorWithModules() method and thus the Initialization class itself.

GuiceInjectors Deprecation

Similarly, the GuiceInjectors class builds the startup injector. As tests shift to use the builders, GuiceInjectors can disappear.

CalciteTestsInjectorBuilder

The "Calcite tests" has an elaborate set of ad-hoc code in CalciteTests to hand-wire a set of mock objects. The planner test PR struggled to refactor the code to allow more flexibility. A key motivation of this clean-up is to provide a framework that uses Guice to construct the Calcite test environment.

In this PR, we define a CalciteTestsInjectorBuilder class to build the injector that was previously built in a static initializer. This class is just a proof-of-concept at present: it is useful by itself. The real goal is to gradually move the ad-hoc code that creates planner-related object in Calcite tests to use this new builder (or a variation.)


This PR has:

  • been self-reviewed.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@paul-rogers paul-rogers marked this pull request as draft July 24, 2022 02:19
@lgtm-com
Copy link

lgtm-com bot commented Jul 24, 2022

This pull request introduces 2 alerts when merging 7992d8a into 5772dfd - view on LGTM.com

new alerts:

  • 2 for Spurious Javadoc @param tags

Builders for various module collections
Revise the extensions loader
Injector builders for server startup
Move Hadoop init to indexer
Clean up server node role filtering
Calcite test injector builder
@paul-rogers paul-rogers marked this pull request as ready for review July 30, 2022 19:32
Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a nicer way of doing things.

static {
EXTENSIONS_CONFIG = INJECTOR.getInstance(ExtensionsConfig.class);
}
// Note: static variables mean that this task cannot run in a shared JVM,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because ExtensionsLoader isn't concurrency-safe?

As far as I can tell, the prior code was OK to use in a shared JVM: it was an Injector (which I think is concurrency-safe?) and and ExtensionsConfig (immutable; definitely fine).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is more basic: static variables. If task A and task B both run in the Indexer, then they share a single copy of the static variable (unless the Indexer loads them in isolated class loaders.) By contrast, the ExtensionsLoader was converted from static variables to a Guice-injected singleton to avoid this issue. For ExtensionsLoader the only time two instances will appear in the same JVM is in tests, which used to have special code to undo the static variables between tests.

The HadoopTask approach seems more like a bug than a feature. A good enhancement would be to modify the HadoopTask to use Guice (and the ExtensionsLoader, perhaps modified for the Hadoop dependencies) to avoid the static variable issue.

Copy link
Contributor

@gianm gianm Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the HadoopTask is done in a weird way. But I was asking if anything bad happens due to sharing the statics between different tasks. The comment makes it sound like something will break, but I can't tell what it is. If nothing will break, it'd be better to get rid of (or reword) the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking more closely, you may be right that this particular use is OK. There will be a single injector and set of extensions used by multiple tasks, which should be OK. Removed the comment.

}

@Test
@Ignore("assumes record rates, to be fixed in separate PR")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the issue here? I don't understand the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, cryptic. This test failed repeatedly in this PR. Discussions with other engineers revealed that the test itself is flawed: it assumes the test duration. Someone else is fixing that. This temporary hack exists just so that this PR will build. I'll check if the required fix has been submitted yet, and if so, will remove this annotation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Is there a GitHub issue for the flaky test? It'd be good to link that here so the comment makes more sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Sounds like #12852 will remove this test, so noted that here.

@paul-rogers
Copy link
Contributor Author

@gianm, thanks much for the review! Address your comments. You pointed out a couple of tender spots in areas I avoided changing. Please take a look at the responses so we can figure out if we need to do anything more.

The suggestion for the default implementation of getJacksonModules() was good. So good, in fact, I went wild and removed all the now-redundant stub implementations. Let's see what IntelliJ inspections thinks of my handiwork before doing another review.

@gianm
Copy link
Contributor

gianm commented Aug 2, 2022

@gianm, thanks much for the review! Address your comments. You pointed out a couple of tender spots in areas I avoided changing. Please take a look at the responses so we can figure out if we need to do anything more.

I just wrote a few replies. When you have a chance to look at these I think this patch is good to go for me.

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@gianm gianm merged commit a618458 into apache:master Aug 4, 2022
@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants