Fetch active task payloads from memory #15377

Merged

Contributor

@AmatyaAvadhanula AmatyaAvadhanula commented Nov 15, 2023

The TaskQueue maintains a map of active task ids to tasks, which can be utilized to get active task payloads, before falling back to the metadata store.
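A minimal sketch of the lookup order described above: consult an in-memory map of active tasks first, and read the metadata store only on a miss. Every name here (SketchTask, SketchMetadataStore, SketchTaskLookup) is a hypothetical stand-in rather than one of the project's classes, and java.util.Optional is assumed only to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a task and its payload; not the project's Task interface.
final class SketchTask {
  final String id;
  final String payload;

  SketchTask(String id, String payload) {
    this.id = id;
    this.payload = payload;
  }
}

// Hypothetical stand-in for the metadata store. It counts reads so that
// the effect of the in-memory path can be observed.
final class SketchMetadataStore {
  private final Map<String, SketchTask> rows = new HashMap<>();
  int reads = 0;

  void insert(SketchTask task) {
    rows.put(task.id, task);
  }

  Optional<SketchTask> getTask(String id) {
    reads++;
    return Optional.ofNullable(rows.get(id));
  }
}

// Looks up a task in memory first, then falls back to the store.
final class SketchTaskLookup {
  private final Map<String, SketchTask> activeTasks; // active task id -> task
  private final SketchMetadataStore store;

  SketchTaskLookup(Map<String, SketchTask> activeTasks, SketchMetadataStore store) {
    this.activeTasks = activeTasks;
    this.store = store;
  }

  Optional<SketchTask> getTask(String id) {
    final SketchTask active = activeTasks.get(id);
    if (active != null) {
      return Optional.of(active); // served from memory, no store read
    }
    return store.getTask(id); // not active: fall back to the store
  }
}
```

In this sketch, a lookup for an active task leaves the store's read counter unchanged, while a lookup for a completed or unknown task increments it.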

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@AmatyaAvadhanula AmatyaAvadhanula changed the title Fetch active task paylaods from memory Fetch active task payloads from memory Nov 15, 2023
}

@Nullable
Task getActiveTask(String taskId)
Contributor

Instead of Nullable, this can return an Optional.
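To illustrate the suggestion, here is a small sketch that contrasts the two return styles. ActiveTaskRegistrySketch and its methods are hypothetical, task payloads are reduced to strings, and java.util.Optional is an assumption made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry of active tasks, with payloads simplified to strings.
final class ActiveTaskRegistrySketch {
  private final Map<String, String> activeTasks = new HashMap<>();

  void add(String taskId, String payload) {
    activeTasks.put(taskId, payload);
  }

  // Nullable style: absence is signalled by null, so every caller
  // has to remember the null check.
  String getActiveTaskOrNull(String taskId) {
    return activeTasks.get(taskId);
  }

  // Optional style: absence is part of the return type, so the caller
  // must unwrap the value explicitly.
  Optional<String> getActiveTask(String taskId) {
    return Optional.ofNullable(activeTasks.get(taskId));
  }
}
```

With the Optional variant, a caller can write `registry.getActiveTask(id).orElse(fallback)` instead of branching on null.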

@@ -104,7 +104,14 @@ public List<TaskStatusPlus> getTaskStatusPlusList(

   public Optional<Task> getTask(final String taskid)
   {
-    return storage.getTask(taskid);
+    // Try to fetch active task from memory
+    final Task activeTask = taskLockbox.getActiveTask(taskid);
Contributor

I thought this should always be called under the giant lock, no?

Contributor Author

Thanks, I've ensured that the call happens within a lock in the new TaskQueue method.
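As a rough sketch of what reading the active-task map within a lock can look like, the example below guards a plain HashMap with a single ReentrantLock. The class name, the field names, and the use of one lock for all map access are assumptions for illustration; the actual TaskQueue code is not shown in this conversation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical queue that guards its active-task map with one lock.
final class LockedTaskQueueSketch {
  private final ReentrantLock giant = new ReentrantLock();
  private final Map<String, String> activeTasks = new HashMap<>(); // id -> payload

  void add(String taskId, String payload) {
    giant.lock();
    try {
      activeTasks.put(taskId, payload);
    } finally {
      giant.unlock();
    }
  }

  void remove(String taskId) {
    giant.lock();
    try {
      activeTasks.remove(taskId);
    } finally {
      giant.unlock();
    }
  }

  // Reads the map only while holding the lock, so a concurrent add or
  // remove cannot be observed halfway through.
  Optional<String> getActiveTask(String taskId) {
    giant.lock();
    try {
      return Optional.ofNullable(activeTasks.get(taskId));
    } finally {
      giant.unlock(); // always released, even if the lookup throws
    }
  }

  // Exposed only so the sketch can show that the lock is released again.
  boolean isLockHeldByCurrentThread() {
    return giant.isHeldByCurrentThread();
  }
}
```

The try/finally shape matters more than the specific lock type: the lock is released on every path out of the method.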

Contributor

@kfaraz kfaraz left a comment

This would be nice to have, but I am not sure it is okay to keep the payload of all active tasks in memory. This could significantly increase the memory usage of the Overlord, e.g. in the case of many concurrent streaming tasks.

@AmatyaAvadhanula
Contributor Author

@kfaraz, thank you for your feedback. Instead of maintaining a separate map, which could increase the memory usage of the Overlord, this patch now uses the map of active tasks that already exists in the TaskQueue.

@AmatyaAvadhanula AmatyaAvadhanula marked this pull request as ready for review November 15, 2023 08:39
@@ -947,6 +947,25 @@ public CoordinatorRunStats getQueueStats()
     return stats;
   }

+  public Optional<Task> getTask(String id)
Contributor

I think it would be better for this method to return only active tasks, i.e. it shouldn't have to query the underlying storage on behalf of the caller. As such, this method should be renamed to getActiveTask.

Contributor Author

Done

@abhishekagarwal87 abhishekagarwal87 merged commit 77828be into apache:master Nov 17, 2023
83 checks passed
writer-jill pushed a commit to writer-jill/druid that referenced this pull request Nov 20, 2023
@kfaraz kfaraz deleted the fetch_active_task_from_memory branch November 24, 2023 07:54
yashdeep97 pushed a commit to yashdeep97/druid that referenced this pull request Dec 1, 2023
@LakshSingla LakshSingla added this to the 29.0.0 milestone Jan 29, 2024