STORM-3335 allow timing out when scheduling a topology #2957

agresch · 2019-02-08T21:57:57Z

This is based on a couple of internal PRs @govind-menon committed to our older version of Storm.

govind-menon

+1

srdo

Looks pretty good. Left a few comments.

srdo · 2019-02-09T08:18:15Z

storm-server/src/main/java/org/apache/storm/scheduler/resource/ResourceAwareScheduler.java

@@ -78,6 +85,9 @@ public void prepare(Map<String, Object> conf) {
        configLoader = ConfigLoaderFactoryService.createConfigLoader(conf);
        maxSchedulingAttempts = ObjectReader.getInt(
            conf.get(DaemonConfig.RESOURCE_AWARE_SCHEDULER_MAX_TOPOLOGY_SCHEDULING_ATTEMPTS), 5);
+        schedulingTimeoutSeconds = ObjectReader.getInt(
+                conf.get(DaemonConfig.SCHEDULING_TIMEOUT_SECONDS_PER_TOPOLOGY), 60);
+        backgroundScheduling = Executors.newFixedThreadPool(1);


I think we need to add a method that allows cleanup of IScheduler, similar to the cleanup method we have on IBolt. This executor needs to be shut down, otherwise the thread will leak if this class gets used in tests.
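A minimal sketch of the lifecycle srdo describes. It assumes a simplified, hypothetical stand-in class (`SchedulerWithBackgroundThread`, not Storm's code) that creates the single-thread executor in `prepare()`, as the diff does, and releases it in a `cleanup()` hook. The 5-second termination wait is an arbitrary assumption.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for ResourceAwareScheduler; only the thread lifecycle is modeled.
class SchedulerWithBackgroundThread {
    private ExecutorService backgroundScheduling;

    public void prepare(Map<String, Object> conf) {
        // Same shape as the diff: one background thread for scheduling work.
        backgroundScheduling = Executors.newFixedThreadPool(1);
    }

    // Counterpart to IBolt.cleanup(): release the executor's worker thread.
    public void cleanup() {
        if (backgroundScheduling == null) {
            return;
        }
        // Interrupts any scheduling task that is still running.
        backgroundScheduling.shutdownNow();
        try {
            // Assumed bound: wait briefly for the worker thread to exit.
            backgroundScheduling.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isTerminated() {
        return backgroundScheduling != null && backgroundScheduling.isTerminated();
    }
}
```

Calling `cleanup()` from tests (or from Nimbus on shutdown) is what keeps the pool's thread from outliving the scheduler instance.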

srdo · 2019-02-09T08:28:05Z

storm-server/src/main/java/org/apache/storm/scheduler/resource/ResourceAwareScheduler.java

+                            + td.getId() + " using strategy " + rasStrategy.getClass().getName() + " timeout after "
+                            + schedulingTimeoutSeconds + " seconds.");
+                    schedulingFuture.cancel(true);
+                    rasStrategy.stop();


Consider dropping the stop method and making the scheduling strategy check for interrupts instead. You're already sending it an interrupt via the cancel call above.
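A sketch of the interrupt-based alternative, assuming a hypothetical `InterruptibleSearch` that stands in for a strategy's search loop. The loop polls `Thread.currentThread().isInterrupted()`, so `Future.cancel(true)` after a `TimeoutException` is enough to end it and no separate `stop()` method is needed. `TimeoutDemo` and the millisecond timeout are illustrative, not Storm's API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a scheduling strategy's search loop.
class InterruptibleSearch implements Callable<String> {
    private volatile boolean sawInterrupt = false;

    @Override
    public String call() {
        long steps = 0;
        while (true) { // simulates a search that never finds a solution
            // Polling the interrupt flag replaces a separate stop() method.
            if (Thread.currentThread().isInterrupted()) {
                sawInterrupt = true;
                return "interrupted after " + steps + " steps";
            }
            steps++;
        }
    }

    boolean sawInterrupt() {
        return sawInterrupt;
    }
}

class TimeoutDemo {
    // Returns true if the search timed out and was cancelled.
    static boolean runWithTimeout(InterruptibleSearch search, long timeoutMs) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        try {
            Future<String> future = executor.submit(search);
            try {
                future.get(timeoutMs, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException te) {
                // cancel(true) interrupts the worker thread; the loop above notices it.
                future.cancel(true);
                return true;
            } catch (ExecutionException ee) {
                throw new IllegalStateException(ee.getCause());
            }
        } finally {
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```

The design trade-off: the strategy has to check the flag often enough (and must not swallow `InterruptedException` from blocking calls), but cancellation then flows through the standard `Future` contract.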

srdo · 2019-02-09T10:37:28Z

storm-server/src/main/java/org/apache/storm/scheduler/resource/ResourceAwareScheduler.java

+                try {
+                    result = schedulingFuture.get(schedulingTimeoutSeconds, TimeUnit.SECONDS);
+                } catch (TimeoutException te) {
+                    markFailedTopology(topologySubmitter, cluster, td, "Scheduling took too long for "


Nit: Consider referencing the config parameter users should change to raise the timeout, so it is obvious what they can do when they get this message.
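One way to follow this nit, as a sketch: build the failure message so it names the setting to raise. The key string below is an assumption about the value of `DaemonConfig.SCHEDULING_TIMEOUT_SECONDS_PER_TOPOLOGY`, and the helper class is hypothetical.

```java
// Hypothetical helper for the timeout failure message.
class SchedulingTimeoutMessage {
    // Assumed to match DaemonConfig.SCHEDULING_TIMEOUT_SECONDS_PER_TOPOLOGY.
    static final String TIMEOUT_CONFIG_KEY = "scheduling.timeout.seconds.per.topology";

    static String timeoutMessage(String topologyId, String strategyClass, int timeoutSeconds) {
        // Names the config so users know which knob raises the limit.
        return String.format(
            "Scheduling took too long for %s using strategy %s: timed out after %d seconds. "
                + "Increase %s to allow more time.",
            topologyId, strategyClass, timeoutSeconds, TIMEOUT_CONFIG_KEY);
    }
}
```

In the real code the constant would be referenced directly from `DaemonConfig` rather than duplicated as a string.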

agresch · 2019-02-11T21:35:44Z

@srdo - I made the changes you suggested.

danny0405 · 2019-02-14T06:12:25Z

storm-server/src/main/java/org/apache/storm/scheduler/DefaultScheduler.java

@@ -101,6 +101,10 @@ public void prepare(Map<String, Object> conf) {
        //noop
    }

+    @Override
+    public void cleanup() {
+    }


Most of the cleanup() implementations seem to be no-ops; can we define a default implementation of it?
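A sketch of the suggestion using a Java 8 default method. The interface here is a simplified, hypothetical stand-in for `IScheduler` that shows only the lifecycle hooks; implementations with nothing to release inherit the no-op, and only those that own resources override it.

```java
import java.util.Map;

// Simplified, hypothetical version of the scheduler interface (lifecycle hooks only).
interface SchedulerLifecycle {
    void prepare(Map<String, Object> conf);

    // Default no-op: implementations with nothing to release need not override it.
    default void cleanup() {
    }
}

// No cleanup() override, unlike the empty method in the DefaultScheduler diff.
class NoopScheduler implements SchedulerLifecycle {
    @Override
    public void prepare(Map<String, Object> conf) {
        // noop
    }
}

// An implementation that owns resources overrides the default.
class ResourceOwningScheduler implements SchedulerLifecycle {
    boolean cleanedUp = false;

    @Override
    public void prepare(Map<String, Object> conf) {
    }

    @Override
    public void cleanup() {
        cleanedUp = true; // a real scheduler would shut down its executor here
    }
}
```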

@danny0405 - Updated. Please take a look.

srdo · 2019-02-14T14:55:57Z

@agresch Thanks, it looks great. I'm wondering if we can put the setup and cleanup in @Before/@After for many of the tests? Ideally we'd avoid leaking threads even when the tests fail, so cleanup should be either in an @After or in try-finally.
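A sketch of the try-finally variant, assuming a hypothetical `FakeScheduler` that owns a thread the way `ResourceAwareScheduler` does. The test body throws to simulate a failing assertion, and `cleanup()` still runs. With JUnit, moving the `cleanup()` call into an `@After` method gives the same guarantee.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical scheduler that owns a thread, standing in for ResourceAwareScheduler.
class FakeScheduler {
    final ExecutorService executor = Executors.newFixedThreadPool(1);

    void schedule() {
        executor.submit(() -> { });
        // Simulates a test assertion failing partway through.
        throw new IllegalStateException("simulated test failure");
    }

    void cleanup() {
        executor.shutdownNow();
    }
}

class CleanupInFinally {
    // Runs a "test body" and guarantees cleanup even if the body throws.
    static void runTest(FakeScheduler scheduler) {
        try {
            scheduler.schedule();
        } finally {
            scheduler.cleanup(); // always runs, so the thread is not leaked
        }
    }
}
```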

kishorvpatil

👍

agresch · 2019-02-18T22:19:04Z

Somehow missed some test files; more changes to come...

agresch · 2019-02-19T22:08:29Z

@srdo - please look at the latest couple commits for the test cleanup.

srdo · 2019-02-20T08:05:31Z

+1, thanks for addressing comments.

danny0405

+1, LGTM.

Ethanlm · 2019-02-20T14:50:55Z

storm-server/src/test/java/org/apache/storm/scheduler/blacklist/TestBlacklistScheduler.java

@@ -74,6 +85,7 @@ public void TestBadSupervisor() {
        Cluster cluster = new Cluster(iNimbus, resourceMetrics, supMap, new HashMap<>(), topologies, config);
        BlacklistScheduler bs = new BlacklistScheduler(new DefaultScheduler(), metricsRegistry);
        bs.prepare(config);
+        scheduler = bs;


I think it doesn't matter, but we could just use scheduler = new BlacklistScheduler(new DefaultScheduler(), metricsRegistry); directly.

Ethanlm · 2019-02-20T14:52:20Z

storm-server/src/test/java/org/apache/storm/scheduler/resource/TestResourceAwareScheduler.java

@@ -252,6 +270,7 @@ public void testTopologySetCpuAndMemLoad() {
        TopologyDetails topology1 = new TopologyDetails("topology1", config, stormTopology1, 0, executorMap1, 0, "user");

        ResourceAwareScheduler rs = new ResourceAwareScheduler();
+        scheduler = rs;


Same as above. Could we use scheduler = new ResourceAwareScheduler(); directly?

Ethanlm · 2019-02-20T14:55:06Z

storm-server/src/test/java/org/apache/storm/scheduler/resource/TestResourceAwareScheduler.java

@@ -723,7 +769,7 @@ public void testSubmitUsersWithNoGuarantees() {
        ResourceAwareScheduler rs = new ResourceAwareScheduler();
        rs.prepare(config);
        rs.schedule(topologies, cluster);
-
+        scheduler = rs;


Same as above. Could we use scheduler = new ResourceAwareScheduler(); directly?

...java/org/apache/storm/scheduler/resource/strategies/scheduling/ConstraintSolverStrategy.java

agresch · 2019-02-20T16:44:29Z

@Ethanlm - please check again. Thanks.

srdo · 2019-02-20T19:16:15Z

@agresch Thank you for your patience. Please squash to one commit, and I'll merge.

agresch · 2019-02-20T20:23:07Z

@srdo - squashed

govind-menon approved these changes Feb 8, 2019

srdo reviewed Feb 9, 2019

danny0405 reviewed Feb 14, 2019

kishorvpatil approved these changes Feb 14, 2019

danny0405 approved these changes Feb 20, 2019

Ethanlm reviewed Feb 20, 2019

Ethanlm approved these changes Feb 20, 2019

STORM-3335 allow timing out when scheduling a topology

435e57b

agresch force-pushed the agresch_STORM-3335 branch from b59d009 to 435e57b February 20, 2019 20:22

asfgit merged commit 435e57b into apache:master Feb 21, 2019


agresch commented Feb 8, 2019

govind-menon left a comment

srdo left a comment

srdo Feb 9, 2019

srdo Feb 9, 2019

srdo Feb 9, 2019

agresch commented Feb 11, 2019

danny0405 Feb 14, 2019

agresch Feb 19, 2019

srdo commented Feb 14, 2019

kishorvpatil left a comment

agresch commented Feb 18, 2019

agresch commented Feb 19, 2019

srdo commented Feb 20, 2019

danny0405 left a comment

Ethanlm Feb 20, 2019

Ethanlm Feb 20, 2019

Ethanlm Feb 20, 2019 (edited)

agresch commented Feb 20, 2019

srdo commented Feb 20, 2019

agresch commented Feb 20, 2019
