
Add CliIndexer process type and initial task runner implementation #8107

Merged: 29 commits merged into apache:master on Jul 30, 2019

Conversation

@jon-wei (Contributor) commented Jul 18, 2019

First part of implementation for #7900

Description

This PR adds the initial foundation for the work in #7900:

  • A new Indexer process type is added, running on port 8091 like MiddleManagers.
  • This process type is like the MM, but uses a new ThreadingTaskRunner, which runs Tasks using a thread pool instead of forking processes.
  • A new interface, AppenderatorsManager, which creates and manages Appenderators for tasks, has been added. The AppenderatorsManager for the new Indexer process uses a shared query execution pool and per-datasource segment walkers/timelines for the Appenderators it creates.
  • The TaskReportWriter has been adjusted to support managing multiple tasks.
  • The new Indexer process serves requests on a single port, but divides the HTTP connection pool into 3 parts: one pool of connections for chat handler requests, one pool of connections for non-chat-handler requests, and 2 reserved connections for lookup listening. Each of the first two pools is sized to druid.server.http.numThreads. The distinct connection pooling is handled through Jetty QoS filters (see the sketch after this list).
  • Lookups are loaded once for the entire Indexer process.
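
For illustration, here is roughly what the QoS-filter approach looks like with Jetty's built-in QoSFilter. This is a minimal sketch, not the PR's actual Guice wiring; the method name, the pool-size parameter, and the exact servlet path pattern are assumptions.

  import java.util.EnumSet;
  import javax.servlet.DispatcherType;
  import org.eclipse.jetty.servlet.FilterHolder;
  import org.eclipse.jetty.servlet.ServletContextHandler;
  import org.eclipse.jetty.servlets.QoSFilter;

  // Caps the number of concurrent requests allowed on the chat handler path,
  // effectively reserving the rest of the connection pool for other traffic.
  void addChatHandlerQosFilter(ServletContextHandler context, int maxChatRequests)
  {
    FilterHolder holder = new FilterHolder(QoSFilter.class);
    holder.setInitParameter("maxRequests", String.valueOf(maxChatRequests));
    context.addFilter(holder, "/druid/worker/v1/chat/*", EnumSet.of(DispatcherType.REQUEST));
  }

Requests beyond the cap are suspended by the filter rather than rejected, which is what makes it usable for partitioning a shared pool.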

No additional memory management for the tasks is present in this PR; they will use whatever has been configured in their task specs, with no additional external control.

The new process is not currently exposed in the docs, as this is an intermediate PR and the process is not ready for use yet.

This PR has:

  • been self-reviewed.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • been tested in a test Druid cluster.

@clintropolis (Member) left a comment:

Had a first pass, also pulled the branch and fired one of these things up, and (after setting druid.service) it worked for me! 🤘

@@ -158,6 +158,8 @@ public void stop() throws Exception

synchronized (lock) {
try {
// When stopping, the task status should not be communicated to the overlord, so the listener and exec
Member

thanks for adding the comment 👍

@Override
public void removeAppenderatorForTask(String taskId)
{

Member

can you add a // nothing to remove or something so this doesn't look so sad?

Contributor Author

Added a comment

binder.bind(DruidNode.class).annotatedWith(RemoteChatHandler.class).to(Key.get(DruidNode.class, Self.class));
if (useSeparatePort) {
// bind a modified DruidNode that will be used by the Jetty server installed below
binder.bind(DruidNode.class)
Member

This ends up being unfortunate in that it requires that druid.service be set, since the 'default' value is not available on the binding at the point that this runs; however, I'm not sure how to better fix it 😢

Member

Would it be possible to internally use a QoS filter with Jetty to leave some room for chat handler traffic, instead of relying on a separate server with a separate pool? I don't think this would require bubbling up to the user configs, just something we compute based on the top-level configs (though it might require some minimum).

I think it would simplify things both conceptually and for the implementation, and would make this no longer an issue.

Contributor Author

I redid this to use QoS filters. The configured number of connections is doubled (one set for chat handler requests, the other for non-chat-handler requests), with 2 reserved connections for lookups.

*/
@Command(
name = "indexer",
description = "Runs an Indexer. Description TBD."
Member

Don't forget to add a description

Contributor Author

Added description

.addBinding("local")
.to(LocalTaskActionClientFactory.class)
.in(LazySingleton.class);
// all of these bindings are so that we can run the peon in local mode
Member

This comment isn't quite accurate; it should probably refer to the indexer. Additionally, there are a lot of bindings getting wired up that are shared between this and the peon, with this entire block looking like a straight copy and paste for one. Is it possible to restructure stuff so this can be shared between them?

Contributor Author (@jon-wei, Jul 23, 2019)

Moved some of the shared binding blocks into methods on CliPeon/CliMiddleManager for now

{
private Appenderator realtimeAppenderator;

public TestAppenderatorsManager()
Member

Does this constructor need to exist?

Contributor Author

Removed unnecessary constructor

@@ -386,12 +392,14 @@ public TaskStatus run(final TaskToolbox toolbox)

toolbox.getMonitorScheduler().removeMonitor(metricsMonitor);

toolbox.getDataSegmentServerAnnouncer().unannounce();
toolbox.getDruidNodeAnnouncer().unannounce(discoveryDruidNode);
if (appenderatorsManager.shouldTaskMakeNodeAnnouncements()) {
Member

Should we just make the AppenderatorsManager handle task announcements entirely, and instead here just call methods on it signaling whether to do so or not? I guess it's sort of awkward with the way things currently are, since the announcers are fetched from the toolbox.

Contributor Author

I had done that initially, but decided to just use a boolean since the CliIndexer implementation would've done nothing for those methods

*/
public class UnifiedIndexerAppenderatorsManager implements AppenderatorsManager
{
private static final Logger log = new Logger(UnifiedIndexerAppenderatorsManager.class);
Member

This log isn't used; also, I think we are supposed to make static variables all uppercase.

Contributor Author

Removed unused logger

private static final Logger log = new Logger(UnifiedIndexerAppenderatorsManager.class);

private final ConcurrentHashMap<String, SinkQuerySegmentWalker> datasourceSegmentWalkers = new ConcurrentHashMap<>();
private final ConcurrentHashMap<String, Appenderator> taskAppenderatorMap = new ConcurrentHashMap<>();
Member

Stuff is only put and removed from here, but never read?

Contributor Author

It's something that would've been expanded in follow-on PRs, but I removed it for now.

@ManageLifecycle
public class UnifiedIndexerLifecycleHandler
{
private static final EmittingLogger log = new EmittingLogger(UnifiedIndexerLifecycleHandler.class);
Member

I think static variables are supposed to be all caps

Contributor Author

Changed to all caps

throw new IllegalStateException("Task has been shut down!");
}

if (taskWorkItem == null) {
Contributor

This can't be null here, since taskWorkItem.shutdown is already checked above. I guess this null check should come before checking taskWorkItem.shutdown?

Contributor Author

Nice catch, I fixed this in ForkingTaskRunner as well
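
A minimal sketch of the corrected ordering (the work-item field and the exception messages here are assumed from the surrounding diff, not copied from the patch):

  // Check that the work item exists before reading its shutdown flag;
  // the original code dereferenced taskWorkItem.shutdown first.
  if (taskWorkItem == null) {
    throw new IllegalStateException("TaskInfo disappeared for task: " + task.getId());
  }
  if (taskWorkItem.shutdown) {
    throw new IllegalStateException("Task has been shut down!");
  }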

import java.util.concurrent.Executor;

/**
* Base class for {@link ForkingTaskRunner} and {@link ThreadingTaskRunner} which support task restoration.
Member

👍

}

@Override
public Collection<TaskRunnerWorkItem> getRunningTasks()
Member

Any reason getRunningTasks, getPendingTasks, and getRunnerTaskState are not abstract instead of throwing by default? BaseRestorableTaskRunner doesn't currently declare any abstract methods; seems like

  @Override
  public abstract Collection<TaskRunnerWorkItem> getRunningTasks();

  @Override
  public abstract Collection<TaskRunnerWorkItem> getPendingTasks();

  @Nullable
  @Override
  public abstract RunnerTaskState getRunnerTaskState(String taskId);

would be appropriate.

Contributor Author

Good point, fixed

* Only one shutdown per-task can be running at a given time, and there is one control thread per task,
* thus the pool has 2 * worker slots.
*
* Note that separate task logs are not supported, all task log entries will be written to the Indexer process log
Member

It does seem like it would be very tricky to retain individual task logs that get pushed to deep storage, involving some sort of log4j wizardry, unless I'm missing something clever.

import java.util.concurrent.ExecutorService;

/**
* This interface defines entities that create and manage potentially multiple Appenderators.
Member

nit: could you add javadoc links to things like Appenderator and other types? I find them handy for easily navigating to things in IntelliJ when I'm reading about how stuff works.


import java.util.concurrent.ExecutorService;

public class PeonAppenderatorsManager implements AppenderatorsManager
Member

Could you add javadoc for this guy?

Contributor Author

Added javadocs (also added javadoc links in AppenderatorsManager)

public static final int DEFAULT_GZIP_INFLATE_BUFFER_SIZE = 4096;

public ServerConfig(
Member

Should this be done as @JsonCreator to get rid of the need for the empty constructor or is there something in this that doesn't come from json?

Contributor Author

The empty constructor is still used by some tests; I added a comment about how the new non-empty constructor is used.

Contributor

I'm surprised that serde works fine with two constructors and neither marked @JsonCreator ... how does ObjectMapper decide which constructor to use for deserialization?

Contributor Author

I'm not sure of the exact details of what ObjectMapper does to deserialize with those constructors, but I added a new ServerConfigSerdeTest that checks this behavior.
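
For reference: absent a @JsonCreator, Jackson instantiates a class through its zero-argument constructor and then sets properties individually, so the non-empty constructor is simply never used during deserialization. A minimal self-contained sketch of that behavior (a hypothetical class, not the actual ServerConfig):

  import com.fasterxml.jackson.annotation.JsonProperty;
  import com.fasterxml.jackson.databind.ObjectMapper;

  public class TwoConstructorConfig
  {
    @JsonProperty
    private int numThreads = 8;

    public TwoConstructorConfig()
    {
      // used by Jackson as the default creator
    }

    public TwoConstructorConfig(int numThreads)
    {
      // never invoked by Jackson, since it is not annotated with @JsonCreator
      this.numThreads = numThreads;
    }

    public int getNumThreads()
    {
      return numThreads;
    }

    public static void main(String[] args) throws Exception
    {
      ObjectMapper mapper = new ObjectMapper();
      TwoConstructorConfig config = mapper.readValue("{\"numThreads\":4}", TwoConstructorConfig.class);
      System.out.println(config.getNumThreads()); // prints 4
    }
  }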

@@ -80,7 +81,7 @@ public void configure(Binder binder)
JettyBindings.addQosFilter(
binder,
ListenerResource.BASE_PATH + "/" + LookupCoordinatorManager.LOOKUP_LISTEN_ANNOUNCE_KEY,
2 // 1 for "normal" operation and 1 for "emergency" or other
LOOKUP_LISTENER_QOS_MAX_REQUESTS // 1 for "normal" operation and 1 for "emergency" or other
Member

Heh neat, is this documented anywhere I wonder?

Contributor Author

It wasn't; I added a brief mention of it to the lookup docs.

@clintropolis (Member) left a comment:

lgtm 👍

@@ -132,6 +132,8 @@ The configuration is propagated to the query serving processes (Broker / Router
The query serving processes have an internal API for managing lookups on the process and those are used by the Coordinator.
The Coordinator periodically checks if any of the processes need to load/drop lookups and updates them appropriately.

Please note that only 2 simultaneous lookup configuration propagation requests can be concurrently handled by a single query serving process, to prevent lookup handling from consuming too many server HTTP connections.
Member

optional: I think this is a very dense sentence, it might be friendlier to break up at the comma, maybe:

Please note that only 2 simultaneous lookup configuration propagation requests can be concurrently handled by a single query serving process. This is done to prevent lookup handling from consuming too many server HTTP connections.

Thanks for adding it to the docs though 👍

}
}

// Save running tasks to a file, so they can potentially be restored on next startup. Suppresses exceptions that
Member

convert to javadocs?

@clintropolis (Member)

I did some more testing with this on my laptop, with a setup of 1 each of broker, router, coordinator, and overlord, and 2 each of indexers and historicals.

[screenshot]

Doing some small-scale Kafka indexing testing to make sure realtime queries and handoff were functioning:

[screenshot]

Overall things are working nicely. I did run into an issue when trying to stop an indexer node (SIGTERM). I believe the issue lies with the order of lifecycle shutdown, in that the tasks are gracefully stopped after Jetty is stopped. This causes the lifecycle stop on the indexer to hang during graceful task stop, because the task is waiting for a message from the overlord that it can never receive without a running Jetty.

The supervisor on the overlord is then forever stuck in a loop, performing an action it can never complete because the indexer has stopped listening.

2019-07-24T01:58:21,268 INFO [KafkaSupervisor-wikipedia] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id='wikipedia', generationTime=2019-07-24T01:58:21.268Z, payload=KafkaSupervisorReportPayload{dataSource='wikipedia', topic='wikipedia', partitions=1, replicas=2, durationSeconds=600, active=[{id='index_kafka_wikipedia_ed020815fc3c3f4_bebmfiod', startTime=2019-07-24T01:55:50.422Z, remainingSeconds=449}, {id='index_kafka_wikipedia_ed020815fc3c3f4_cfcibngo', startTime=2019-07-24T01:55:50.525Z, remainingSeconds=449}], publishing=[], suspended=false, healthy=true, state=RUNNING, detailedState=RUNNING, recentErrors=[]}}
2019-07-24T01:58:36,268 INFO [IndexTaskClient-wikipedia-0] org.apache.druid.indexing.common.IndexTaskClient - submitRequest failed for [http://localhost:8092/druid/worker/v1/chat/index_kafka_wikipedia_ed020815fc3c3f4_bebmfiod/offsets/current], with message [Connection refused (Connection refused)]

The indexer eventually gives up after a 5 minute timeout and ungracefully stops, but the supervisor/overlord appears to remain stuck until either the indexer comes back on the same host/port or the overlord is restarted. This also jams up the task status displayed in the web UI, where the task of the stuck indexer remains in the 'running' state until one of those same conditions is met.

This issue aside, I'm still +1 on this if you'd rather fix this in a follow-up PR, since this is currently an undocumented feature anyway.

@jon-wei (Contributor Author) commented Jul 24, 2019

@clintropolis

I observed the Jetty server shutting down before task shutdown in a Peon too; that may be worth looking into as a separate issue.

The hanging you observed was due to SeekableStreamIndexTask not doing any shutdown actions if druid.indexer.task.restoreTasksOnRestart=false, so the indexer just waited for the graceful shutdown timeout before interrupting the task thread.

I've updated the patch to have the stopGracefully() methods on the tasks interrupt their run() method.
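
The pattern being described is roughly the following (an illustrative sketch with generic types, not the actual SeekableStreamIndexTask change):

  // The task captures its run thread so an external stopGracefully() call can
  // interrupt a run() that is blocked waiting, e.g. on a response from the overlord.
  public class InterruptibleTask
  {
    private volatile Thread runThread = null;

    public void run() throws InterruptedException
    {
      runThread = Thread.currentThread();
      while (!Thread.currentThread().isInterrupted()) {
        Thread.sleep(1000); // stand-in for blocking work
      }
    }

    public void stopGracefully()
    {
      Thread thread = runThread;
      if (thread != null) {
        thread.interrupt();
      }
    }
  }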

@@ -463,6 +463,32 @@ public void nodeViewInitialized()
}
);

DruidNodeDiscovery druidNodeDiscoveryIndexer = druidNodeDiscoveryProvider.getForNodeType(NodeType.INDEXER);
Contributor

Instead of repeating code, you can change line 441 to

DruidNodeDiscovery druidNodeDiscovery = druidNodeDiscoveryProvider.getForService(WorkerNodeService.DISCOVERY_SERVICE_KEY);

Contributor Author

I changed this to use the suggested code.

It looks like WorkerNodeService.DISCOVERY_SERVICE_KEY was incorrectly mapped to NodeType.PEON in DruidNodeDiscoveryProvider previously; I also changed that to the MM and indexer types.

// Synchronizes start/stop of this object.
private final Object startStopLock = new Object();

private volatile boolean started = false;
Contributor

There is a LifecycleLock utility class for managing lifecycle that can be used here instead of the startStopLock and started flag.

Contributor Author

Adjusted this to use LifecycleLock
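
The usual LifecycleLock pattern looks roughly like this (a sketch; the startup/shutdown bodies are elided and the exception messages are assumptions):

  import org.apache.druid.concurrent.LifecycleLock;

  private final LifecycleLock lifecycleLock = new LifecycleLock();

  public void start()
  {
    if (!lifecycleLock.canStart()) {
      throw new IllegalStateException("Already started");
    }
    try {
      // ... startup work ...
      lifecycleLock.started(); // marks startup as successful
    }
    finally {
      lifecycleLock.exitStart();
    }
  }

  public void stop()
  {
    if (!lifecycleLock.canStop()) {
      throw new IllegalStateException("Not started");
    }
    // ... shutdown work ...
  }

This replaces both the startStopLock and the started flag, and also gives waiters awaitStarted() for free.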

* It is used by UnifiedIndexerAppenderatorsManager which allows queries on data associated with multiple
* Appenderators.
*/
AppenderatorImpl(
Contributor

Can the other constructor use this constructor instead of repeating everything?

Contributor Author

Adjusted the constructors so one calls the other; the sink timeline handling in the constructor I added was also changed to accommodate that.
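
The fix is standard constructor chaining; a generic illustration (not the actual AppenderatorImpl code):

  public class Widget
  {
    private final String name;
    private final int capacity;

    public Widget(String name)
    {
      this(name, 16); // delegate to the full constructor with a default
    }

    public Widget(String name, int capacity)
    {
      this.name = name;
      this.capacity = capacity;
    }
  }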


/**
* Base class for {@link ForkingTaskRunner} and {@link ThreadingTaskRunner} which support task rest
* oration.
Contributor

nit: can we keep "restoration" as a single word?

Contributor Author

Whoops, fixed

public abstract Collection<TaskRunnerWorkItem> getPendingTasks();

/**
* The concrete implementation should override this, as the implemention depends on the specifics of the
Contributor

nit: the comments on these 3 methods have no value, as the methods are already declared abstract. If we keep them, the comments should describe what is expected from implementors; otherwise, drop them.

Contributor Author

Removed the comments

}

@Override
public void write(String taskId, Map<String, TaskReport> reports)
Contributor

It is weird that an instance of this class will retain and manage a map of taskId -> file (because it inherits TaskReportFileWriter), but those would be ignored all the time.
TaskReportWriter could probably be trimmed down a lot more: declare it abstract or as an interface, and have a separate MultiFileTaskReportFileWriter wherever TaskReportWriter is currently used.

Contributor Author (@jon-wei, Jul 27, 2019)

I redid these classes; there's now a TaskReportFileWriter interface with three implementations:

  • SingleFileTaskReportFileWriter
  • MultipleFileTaskReportFileWriter
  • NoopTestTaskReportFileWriter
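
The resulting shape is roughly the following; the write signature matches the diff above, while the rest (the imports, the no-op body) is assumed:

  import java.util.Map;
  import org.apache.druid.indexing.common.TaskReport;

  // Extracted interface: implementations decide where (or whether) each
  // task's reports are written.
  public interface TaskReportFileWriter
  {
    void write(String taskId, Map<String, TaskReport> reports);
  }

  // Test implementation that simply discards reports.
  class NoopTestTaskReportFileWriter implements TaskReportFileWriter
  {
    @Override
    public void write(String taskId, Map<String, TaskReport> reports)
    {
      // no-op
    }
  }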

* Only one shutdown per-task can be running at a given time, and there is one control thread per task,
* thus the pool has 2 * worker slots.
*
* Note that separate task logs are not supported, all task log entries will be written to the Indexer process log
Contributor

Is this a choice we are making intentionally to keep logs in the same place, or is there no separation just because we haven't figured it out yet?
If we do end up keeping all logs in the same place, it would be great if the taskId were included in each log line, so you could get some sense of a task just by grepping for its taskId.
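
One common way to get that grep-ability in a shared log is an MDC entry plus %X{taskId} in the log4j2 pattern layout. A sketch of the idea (not something this PR implements):

  import org.apache.logging.log4j.LogManager;
  import org.apache.logging.log4j.Logger;
  import org.apache.logging.log4j.ThreadContext;

  public class TaskLogTagging
  {
    private static final Logger LOGGER = LogManager.getLogger(TaskLogTagging.class);

    // Runs a unit of work with the task id in the MDC; a layout containing
    // %X{taskId} then renders the id on every line that work logs.
    public static void runWithTaskId(String taskId, Runnable work)
    {
      ThreadContext.put("taskId", taskId);
      try {
        work.run();
      }
      finally {
        ThreadContext.remove("taskId");
      }
    }

    public static void main(String[] args)
    {
      runWithTaskId("index_kafka_wikipedia_abc123", () -> LOGGER.info("processing"));
    }
  }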

task.getId(),
TaskStatus.running(task.getId())
);
taskStatus = taskStatusFuture.get();
Contributor

Not sure why we are running the task in a separate thread while this thread just waits for the task to finish anyway?

If this is done for making the notifyXXX() calls, they could be made right before calling task.run(toolbox) if the task were run in this thread itself.

Contributor Author (@jon-wei, Jul 27, 2019)

It was partly for making the notify calls after the run started, and partly because I wanted to have a similar structure to ForkingTaskRunner where the control context and the task code were separated by thread/process, but it's not really necessary to do that.

I changed the task setup and run() to execute using a thread in taskExecutor; the control thread pool is now only used to run externally triggered task shutdowns (and there is only 1 control thread per task slot now).
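
Schematically, the restructured flow is something like this (a sketch with stand-in types; the notify hook and executor are assumed names):

  import java.util.concurrent.Callable;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Future;

  public class RunOnTaskThread
  {
    // Setup, notification, and the task's run() all execute on the task
    // executor thread; the control pool (not shown) only handles shutdowns.
    public static Future<String> submitTask(ExecutorService taskExecutor, Callable<String> taskRun)
    {
      return taskExecutor.submit(() -> {
        System.out.println("notify listeners: task starting"); // stand-in for notifyXXX()
        return taskRun.call();
      });
    }

    public static void main(String[] args) throws Exception
    {
      ExecutorService pool = Executors.newFixedThreadPool(2);
      Future<String> status = submitTask(pool, () -> "SUCCESS");
      System.out.println(status.get());
      pool.shutdown();
    }
  }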

);

try {
final Closer closer = Closer.create();
Contributor

Couldn't understand why we have this Closer object; nothing appears to be accumulated in it for closing.

Contributor Author

Thanks, removed this; it was a relic from ForkingTaskRunner.


try {
final Closer closer = Closer.create();
try {
Contributor

Do we really need two try blocks (lines 164 and 166)? Could they be folded into one?

Contributor Author

Collapsed into a single try


if (status == null) {
if (taskInfo.thread != null) {
taskInfo.thread.interrupt();
Contributor

Is it possible that the interrupt status of the thread is not cleared by this task, if it wasn't blocked waiting on something? In that case, the same thread, with its interrupt status still set, would pick up a new task.

Contributor

Also, I see that you have updated some of the tasks to capture their own threads on run and interrupt them inside stopGracefully(..) ... one of these appears redundant, no?

Contributor Author

For the first point, I added a call to Thread.interrupted() at the very end of the outermost finally in the task callable.

For the second point, the interrupt here is meant as a "last resort" if the graceful shutdown didn't complete for some reason within the configured timeout; the interrupt within the tasks should normally end the shutdown before this is called.
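
The fix for the first point amounts to this kind of wrapper (a sketch; Thread.interrupted() both reads and clears the flag):

  import java.util.concurrent.Callable;

  public class ClearInterruptOnExit
  {
    // Wraps a task body so the worker thread's interrupt status is always
    // cleared before the thread goes back to the pool to pick up a new task.
    public static <T> Callable<T> clearingInterrupt(Callable<T> body)
    {
      return () -> {
        try {
          return body.call();
        }
        finally {
          Thread.interrupted();
        }
      };
    }
  }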

Contributor (@himanshug, Jul 29, 2019)

I might be wrong, but it appears to me that the whole business with the thread is unnecessary; you can achieve the same by calling ThreadingTaskRunnerWorkItem.shutdownFuture.cancel(true), and you wouldn't have to deal manually with interrupts. That would also remove the task from the executor if it hasn't started running already.

Sidenote: I see that you are trying to simulate a force stop of a task, which is impossible to do in this context. If the task is not cooperative, nothing is going to work, unfortunately. In the long run, this task runner will require adding another method to the Task interface to tell a task to force stop itself.

Contributor

Also, since we are interrupting the thread from outside, doing the same in the task implementations doesn't achieve anything extra and should probably be removed; it just introduces confusion with no benefit... or maybe I am still missing something :)

return null;
} else {
if (workItem.getThread() == null) {
return RunnerTaskState.PENDING;
Contributor

Can't it mean that the task is finished, since we set the thread to null when the task finishes?

Contributor Author

I forget if I had the thread set to null after the task finishes in earlier revisions, but currently the thread doesn't get nulled out after task completion.

taskStatus = task.run(toolbox);
}
catch (Throwable t) {
LOGGER.info(t, "Exception caught while running the task.");
Contributor

Should this be error level?

Contributor Author

Thanks, fixed

return taskStatus;
}
catch (Throwable t) {
LOGGER.info(t, "Exception caught during execution");
Contributor

Should this be error level?

Contributor Author

Fixed


@jon-wei (Contributor Author) commented Jul 29, 2019

@himanshug

I changed the interrupt to use future.cancel(true), and added a comment about how it's not possible to truly force a hard shutdown there (my intent in putting it there was for hypothetical cases where a task's stopGracefully() takes too long or has a bug and doesn't result in a shutdown, but the task is still interruptible).

// Note that we can't truly force a hard termination of the task thread externally.
// In the future we may want to add a forceful shutdown method to the Task interface.
if (taskInfo.shutdownFuture != null) {
taskInfo.shutdownFuture.cancel(true);
Contributor (@himanshug, Jul 29, 2019)

We need to cancel the status future, not the shutdownFuture.
Also, Future.get(..) throws a TimeoutException instead of returning null, so this will not be called if the task didn't finish within the timeout.
Also, we can remove keeping the thread as "state".

Anyway, I thought it would be easier to write the code :) Can you merge jon-wei#4, which will add a commit making the above changes?
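
The corrected shutdown path is approximately the following (a sketch with assumed names; see jon-wei#4 for the real change):

  import java.util.concurrent.ExecutionException;
  import java.util.concurrent.Future;
  import java.util.concurrent.TimeUnit;
  import java.util.concurrent.TimeoutException;

  public class ShutdownWithTimeout
  {
    // Wait on the task's status future up to the graceful-shutdown timeout;
    // on timeout, cancel it with interruption as the last resort.
    public static <T> T awaitOrCancel(Future<T> statusFuture, long timeoutMs)
        throws InterruptedException, ExecutionException
    {
      try {
        return statusFuture.get(timeoutMs, TimeUnit.MILLISECONDS);
      }
      catch (TimeoutException e) {
        statusFuture.cancel(true); // interrupts the task thread if it is running
        return null;
      }
    }
  }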

Contributor Author

Ah, my bad, cancelled the wrong future.

Thanks for the patch!

ThreadingTaskRunner: cancel task run future not shutdownFuture and remove thread from workitem
@himanshug himanshug merged commit 640b7af into apache:master Jul 30, 2019
@jon-wei (Contributor Author) commented Jul 30, 2019

@clintropolis @jihoonson @himanshug Thanks a lot for the review!

@clintropolis clintropolis added this to the 0.16.0 milestone Aug 8, 2019