(WIP) task search efficiency #932

Merged
merged 16 commits into from Mar 4, 2016

Conversation

Projects
None yet
4 participants
@ssalinas
Member

ssalinas commented Mar 2, 2016

Working on a few improvements:

  • Speed up getInactiveTaskIds by finding them directly instead of finding active tasks then removing from an array of all tasks. We make the same number of zk calls either way, might as well have the results we want from them.
  • batch together calls for getting SingularityTaskHistoryUpdates
  • Store SingularityTaskIdHistorys in the zk history path so we can access them more easily
    /cc @tpetr @Calvinp

@ssalinas ssalinas changed the title from (WIP) task search efficiency to (WIP) task search efficiency Mar 2, 2016

+ public void processResult(CuratorFramework client, CuratorEvent event) throws Exception {
+ if (event.getStat() == null) {
+ objects.add(pathsMap.get(event.getPath()));
+ latch.countDown();

This comment has been minimized.

@wsorenson

wsorenson Mar 3, 2016

Member

what's the point of these 2 statements? latch.countDown(); return; --> latch.countDown()

@wsorenson

wsorenson Mar 3, 2016

Member

what's the point of these 2 statements? latch.countDown(); return; --> latch.countDown()

This comment has been minimized.

@ssalinas

ssalinas Mar 3, 2016

Member

ah, yeah that could just be outside the if and without the return, will fix

@ssalinas

ssalinas Mar 3, 2016

Member

ah, yeah that could just be outside the if and without the return, will fix

This comment has been minimized.

@jhaber

jhaber Mar 3, 2016

Member

I may be paranoid but I usually do try { x; } finally { latch.countDown(); } just in case an exception is thrown the latch still gets decremented

@jhaber

jhaber Mar 3, 2016

Member

I may be paranoid but I usually do try { x; } finally { latch.countDown(); } just in case an exception is thrown the latch still gets decremented

This comment has been minimized.

@wsorenson

wsorenson Mar 3, 2016

Member

good point

@wsorenson

wsorenson Mar 3, 2016

Member

good point

@ssalinas

This comment has been minimized.

Show comment
Hide comment
@ssalinas

ssalinas Mar 3, 2016

Member

@wsorenson - Consolidated some of the methods in CuratorAsync so we aren't repeating as much
@jhaber - put all the latch count downs in finally blocks as well

Member

ssalinas commented Mar 3, 2016

@wsorenson - Consolidated some of the methods in CuratorAsync so we aren't repeating as much
@jhaber - put all the latch count downs in finally blocks as well

@@ -241,5 +282,60 @@ public void processResult(CuratorFramework client, CuratorEvent event) throws Ex
}
}
-}
+ protected <T, Q> Map<T, List<Q>> getAsyncAsMapThrows(final String pathNameForLogs, final Map<String, T> parentPathsMap, final String subpath, final Transcoder<Q> transcoder) throws Exception {

This comment has been minimized.

@tpetr

tpetr Mar 3, 2016

Member

method name seems kinda vauge

@tpetr

tpetr Mar 3, 2016

Member

method name seems kinda vauge

@ssalinas

This comment has been minimized.

Show comment
Hide comment
@ssalinas

ssalinas Mar 3, 2016

Member

Summary of changes:

  • implement notExists in curator async to get inactive task ids without having to get active task ids then filter
  • batch together any calls for task history updates using new getAsyncAsMap in curator async
  • begin storing current SingularityTaskIdHistory for a task in the /tasks/history/(requestId)/(taskId) node, and write a migration to backfill. This will update on each status update for the task
Member

ssalinas commented Mar 3, 2016

Summary of changes:

  • implement notExists in curator async to get inactive task ids without having to get active task ids then filter
  • batch together any calls for task history updates using new getAsyncAsMap in curator async
  • begin storing current SingularityTaskIdHistory for a task in the /tasks/history/(requestId)/(taskId) node, and write a migration to backfill. This will update on each status update for the task
+ String updatePath = getUpdatePath(taskHistoryUpdate.getTaskId(), taskHistoryUpdate.getTaskState());
+ String updatesPath = getUpdatesPath(taskHistoryUpdate.getTaskId());
+ String historyPath = getHistoryPath(taskHistoryUpdate.getTaskId());
+ String reqeustPath = getRequestPath(taskHistoryUpdate.getTaskId().getRequestId());

This comment has been minimized.

@tpetr

tpetr Mar 3, 2016

Member

typo

@tpetr

tpetr Mar 3, 2016

Member

typo

@ssalinas ssalinas added the hs_staging label Mar 3, 2016

@ssalinas

This comment has been minimized.

Show comment
Hide comment
@ssalinas

ssalinas Mar 4, 2016

Member

@wsorenson had to undo some of the consolidation of those async methods, turns out having the callbacks com from other methods, I was getting ConcurrentModification and related exceptions when trying to run things.

Also, implemented what I talked about with @tpetr and started storing the json of the SingularityTaskIdHistory object in the /tasks/history/(requestId)/(taskId) node. The benefit of this ebing that we can now just use one getAsync call to gather up many id histories instead of having to build each from multiple calls. At the moment, it will update this json on each status update so that the task is history has the most recent task state. Thoughts on that strategy?

Member

ssalinas commented Mar 4, 2016

@wsorenson had to undo some of the consolidation of those async methods, turns out having the callbacks com from other methods, I was getting ConcurrentModification and related exceptions when trying to run things.

Also, implemented what I talked about with @tpetr and started storing the json of the SingularityTaskIdHistory object in the /tasks/history/(requestId)/(taskId) node. The benefit of this ebing that we can now just use one getAsync call to gather up many id histories instead of having to build each from multiple calls. At the moment, it will update this json on each status update so that the task is history has the most recent task state. Thoughts on that strategy?

@wsorenson

This comment has been minimized.

Show comment
Hide comment
@wsorenson

wsorenson Mar 4, 2016

Member

I'm not sure it's worth all that effort to duplicate data and re-save data on updates just to make admin task search more efficient.

Member

wsorenson commented Mar 4, 2016

I'm not sure it's worth all that effort to duplicate data and re-save data on updates just to make admin task search more efficient.

@ssalinas

This comment has been minimized.

Show comment
Hide comment
@ssalinas

ssalinas Mar 4, 2016

Member

@wsorenson did some refactoring, zk migration is taken out, and methods in curator async each build their own callback then call a shared method to actually execute the query, watch the latch, and return results

Member

ssalinas commented Mar 4, 2016

@wsorenson did some refactoring, zk migration is taken out, and methods in curator async each build their own callback then call a shared method to actually execute the query, watch the latch, and return results

@ssalinas ssalinas added the hs_qa label Mar 4, 2016

ssalinas added a commit that referenced this pull request Mar 4, 2016

Merge pull request #932 from HubSpot/task_search_speed
(WIP) task search efficiency

@ssalinas ssalinas merged commit 8ba0273 into task_search Mar 4, 2016

0 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build is in progress
Details

@ssalinas ssalinas deleted the task_search_speed branch Mar 4, 2016

@ssalinas ssalinas modified the milestone: 0.4.12 Apr 1, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment