Skip to content

[GOBBLIN-2013] update guice initialization for 'DagProcEngine enabled' and related classes #3892

Merged
arjun4084346 merged 4 commits intoapache:masterfrom
arjun4084346:testdag
Mar 12, 2024
Merged

[GOBBLIN-2013] update guice initialization for 'DagProcEngine enabled' and related classes #3892
arjun4084346 merged 4 commits intoapache:masterfrom
arjun4084346:testdag

Conversation

@arjun4084346
Copy link
Copy Markdown
Contributor

@arjun4084346 arjun4084346 commented Mar 9, 2024

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    updated tests

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@arjun4084346 arjun4084346 force-pushed the testdag branch 3 times, most recently from c51c82d to 5d51351 Compare March 10, 2024 01:31
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 10, 2024

Codecov Report

Attention: Patch coverage is 85.36585% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 46.55%. Comparing base (02eca4f) to head (6fc087e).

Files Patch % Lines
...ervice/modules/core/GobblinServiceGuiceModule.java 69.23% 4 Missing ⚠️
...modules/utils/FlowCompilationValidationHelper.java 86.66% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3892      +/-   ##
============================================
+ Coverage     46.53%   46.55%   +0.02%     
- Complexity    11149    11158       +9     
============================================
  Files          2231     2231              
  Lines         87881    87911      +30     
  Branches       9632     9633       +1     
============================================
+ Hits          40893    40929      +36     
+ Misses        43288    43287       -1     
+ Partials       3700     3695       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@arjun4084346 arjun4084346 force-pushed the testdag branch 2 times, most recently from 8ce2784 to 2505b62 Compare March 11, 2024 05:23
@arjun4084346 arjun4084346 changed the title update guice module [GOBBLIN-2013] update guice module Mar 11, 2024
metricContext.register(this.totalAddSpecTimeNanos);
metricContext.register(this.numJobsScheduledDuringStartup);
}
this.dagProcessingEngineEnabled = ConfigUtils.getBoolean(config, ServiceConfigKeys.DAG_PROCESSING_ENGINE_ENABLED, false);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u can also add this as an injected param. up to u (look at line 183)

Comment on lines +520 to +530
if (this.dagProcessingEngineEnabled) {
Config flowConfig = flowSpec.getConfig();
String flowGroup = flowConfig.getString(ConfigurationKeys.FLOW_GROUP_KEY);
String flowName = flowConfig.getString(ConfigurationKeys.FLOW_NAME_KEY);
String flowExecutionId = String.valueOf(FlowUtils.getOrCreateFlowExecutionId(flowSpec));
DagActionStore.DagAction dagAction =
new DagActionStore.DagAction(flowGroup, flowName, flowExecutionId, DagActionStore.FlowActionType.LAUNCH);
this.dagManagement.addDagAction(dagAction);
} else {
this.orchestrator.orchestrate(flowSpec, jobProps, Long.parseLong(triggerTimestampMillis), isReminderEvent);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the correct place to add dag action to the stream. Orchestrate is called in response to scheduler and is where scheduler lease arbitration will happen. As a result of it, the launch dagAction will be committed to dagActionStore -> read by changeMonitor -> then forwarded to the dagManager. It's at the last step that we should add the launch task to the stream (via dagManagement) instead of pass to old dagManager. This change should be in DagProcEnabledDagActionStoreChangeMonitor

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wherever your enable this let's add a test

Orchestrator orchestrator, SchedulerService schedulerService, Optional<UserQuotaManager> quotaManager, Optional<Logger> log,
@Named(InjectionNames.WARM_STANDBY_ENABLED) boolean isWarmStandbyEnabled,
Optional<FlowTriggerHandler> flowTriggerHandler) throws Exception {
Optional<FlowTriggerHandler> flowTriggerHandler, DagManagement dagManagement,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's also clearer into to have an optional DagManagement object that is present only when the engine is enabled but it's open to debate

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I actually wanted to create those objects regardless of the enabled flag so at least the object creation can be tested in prod.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd whole-heartedly urge us to ONLY create required instances. otherwise a not-yet-ready-for-prod instance that fails creation can bring down the entire application. best practice is to guard significant new/reworked functionality behind config, and this includes initialization

Copy link
Copy Markdown
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused: how does this PR relate to #3893 (review) ?

there seems to be some, but not all overlap. if two PRs relate to one another, and have any overlap at all, please have at least one mention the other in the PR description to clarify their relationship.

separately, but also critical: this PR title really brings very little clarity and seems insufficient (eventually) for a commit message

Orchestrator orchestrator, SchedulerService schedulerService, Optional<UserQuotaManager> quotaManager, Optional<Logger> log,
@Named(InjectionNames.WARM_STANDBY_ENABLED) boolean isWarmStandbyEnabled,
Optional<FlowTriggerHandler> flowTriggerHandler) throws Exception {
Optional<FlowTriggerHandler> flowTriggerHandler, DagManagement dagManagement,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd whole-heartedly urge us to ONLY create required instances. otherwise a not-yet-ready-for-prod instance that fails creation can bring down the entire application. best practice is to guard significant new/reworked functionality behind config, and this includes initialization

protected final Optional<FlowTriggerHandler> flowTriggerHandler;
@Getter
protected final Map<String, Spec> scheduledFlowSpecs;
protected final Map<String, FlowSpec> scheduledFlowSpecs;
Copy link
Copy Markdown
Contributor

@phet phet Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NBD, this may be for the best... but I can't seem to figure out: where is the initial motivation coming in? e.g. I was looking for a changed method signature, but didn't notice one. is it entirely a preference to change the type of these various members?

(again, I'm not calling into question the decision to change, more wanting to understand motivation for the decision.)

Copy link
Copy Markdown
Contributor Author

@arjun4084346 arjun4084346 Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the initial commit was too big and some improvements were missed out

Copy link
Copy Markdown
Contributor

@phet phet Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah, ok, I see - so essentially just following more widely the specific-derived-class (FlowSpec) typing instituted therein.

(I like the increased clarity)

@arjun4084346 arjun4084346 changed the title [GOBBLIN-2013] update guice module [GOBBLIN-2013] create instances through guice for new classes created in PR #3858 Mar 12, 2024
Copy link
Copy Markdown
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a much better PR title... but the context behind the change still may not be readily apparent for anyone reading such a commit message in 2-3 months.

how about: "update guice initialization for 'DagProcEngine enabled'"?

@Getter
private final SharedFlowMetricsSingleton sharedFlowMetricsSingleton;

private final ClassAliasResolver<SpecCompiler> aliasResolver;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call to rescope as method level!

Comment on lines +48 to +50
ContextAwareGauge<Long> orchestrationDelayMetric = metricContext.newContextAwareGauge
(ServiceMetricNames.FLOW_ORCHESTRATION_DELAY, orchestrationDelayCounter::get);
this.metricContext.register(orchestrationDelayMetric);
metricContext.register(orchestrationDelayMetric);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change how it's referenced... isn't metricContext still an instance member?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it is not, it is static

Comment on lines +76 to +78
String specCompilerClassName = ServiceConfigKeys.DEFAULT_GOBBLIN_SERVICE_FLOWCOMPILER_CLASS;
if (config.hasPath(ServiceConfigKeys.GOBBLIN_SERVICE_FLOWCOMPILER_CLASS_KEY)) {
specCompilerClassName = config.getString(ServiceConfigKeys.GOBBLIN_SERVICE_FLOWCOMPILER_CLASS_KEY);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there a method in ConfigUtils (or similar utility) that we could leverage to provide such fallback to a default?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think so.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about this one:

String ConfigUtils::getString(Config config, String path, String def)

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i misread your ask.

protected final Optional<FlowTriggerHandler> flowTriggerHandler;
@Getter
protected final Map<String, Spec> scheduledFlowSpecs;
protected final Map<String, FlowSpec> scheduledFlowSpecs;
Copy link
Copy Markdown
Contributor

@phet phet Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah, ok, I see - so essentially just following more widely the specific-derived-class (FlowSpec) typing instituted therein.

(I like the increased clarity)

@arjun4084346 arjun4084346 changed the title [GOBBLIN-2013] create instances through guice for new classes created in PR #3858 [GOBBLIN-2013] update guice initialization for 'DagProcEngine enabled' and related classes Mar 12, 2024
Copy link
Copy Markdown
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work turning this one around in such short order!

@arjun4084346 arjun4084346 merged commit 2217205 into apache:master Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants