Project dir cache enhancement (#2017)
Race conditions can occur when multiple Azkaban executor processes are running. Two executor processes can perform deletions in the same Azkaban project dir, even when one of the executors is inactive (e.g. a flow can be run on any executor by using the useExecutor label). When that happens, the in-memory list of project dirs kept by each executor process (installedProjects) goes out of sync with what is on disk. Another case is a race between one executor process deleting a project dir while another executor process is creating an execution dir based on that project dir.

This PR removes installedProjects from the executor. Every time a project needs to be downloaded, every project dir is scanned and the total disk usage is summed to decide whether purging is needed. This can take tens of seconds when there are 5000 or more project dirs, but only a few seconds when the inodes are cached.

Project dir cleanup (deleteProjectDirsIfNecessary) is now synchronized, because the method is a check-then-act process that is vulnerable to race conditions when multiple threads are doing deletions. An alternative is to synchronize on an interned string of project id + project version (https://stackoverflow.com/questions/133988/synchronizing-on-string-objects-in-java); however, as the linked post points out, that approach is not very elegant. Synchronizing at the object level makes sense given that flow setup is a low-frequency operation in most cases (<= 5 ops/min in our production environment).

When a project dir is created, an additional metadata file recording its file count is created as well. Its purpose is to address the race condition between one executor process deleting a project dir while another executor process is creating an execution dir based on it. A sanity check will compare the created execution dir's file count against it: if the execution dir's file count differs from the base project dir's, the flow setup fails and the Azkaban web server dispatches the flow again.
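The scan-then-purge flow and the synchronized cleanup described above can be sketched in Java as follows. This is a hypothetical illustration, not the code from this PR: the class ProjectDirCleanerSketch, the size limit maxTotalBytes, the layout (each project dir is a direct child of one cache dir) and the least-recently-modified eviction order are all assumptions. It shows the two ideas that matter here: nothing is cached in memory between calls, and the whole check-then-act sequence runs under one object-level lock.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: rescans the project cache dir on every call instead of trusting an in-memory list. */
final class ProjectDirCleanerSketch {
  private static final class ProjectDir {
    final Path path;
    final long sizeBytes;
    final long lastModifiedMillis;

    ProjectDir(Path path, long sizeBytes, long lastModifiedMillis) {
      this.path = path;
      this.sizeBytes = sizeBytes;
      this.lastModifiedMillis = lastModifiedMillis;
    }
  }

  private final Path projectCacheDir;
  private final long maxTotalBytes; // assumed disk quota for all project dirs

  ProjectDirCleanerSketch(Path projectCacheDir, long maxTotalBytes) {
    this.projectCacheDir = projectCacheDir;
    this.maxTotalBytes = maxTotalBytes;
  }

  /** Sums the sizes of all regular files under dir; this walk is the slow part with thousands of dirs. */
  static long dirSize(Path dir) throws IOException {
    try (Stream<Path> paths = Files.walk(dir)) {
      return paths.filter(Files::isRegularFile).mapToLong(p -> {
        try {
          return Files.size(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }).sum();
    }
  }

  static void deleteRecursively(Path dir) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse order puts every child before its parent directory.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  /**
   * Check-then-act: measure what is on disk now, then evict the oldest project dirs until the
   * new project fits. synchronized so two threads in this JVM cannot interleave the check and
   * the deletion; it does not protect against another executor process.
   */
  synchronized List<Path> deleteProjectDirsIfNecessary(long newProjectSizeBytes) throws IOException {
    List<ProjectDir> dirs = new ArrayList<>();
    long totalBytes = 0;
    try (Stream<Path> children = Files.list(projectCacheDir)) {
      for (Path dir : children.filter(Files::isDirectory).collect(Collectors.toList())) {
        long size = dirSize(dir);
        dirs.add(new ProjectDir(dir, size, Files.getLastModifiedTime(dir).toMillis()));
        totalBytes += size;
      }
    }
    // Assumed eviction policy: least recently modified first.
    dirs.sort(Comparator.comparingLong((ProjectDir d) -> d.lastModifiedMillis));

    List<Path> deleted = new ArrayList<>();
    for (ProjectDir d : dirs) {
      if (totalBytes + newProjectSizeBytes <= maxTotalBytes) {
        break;
      }
      deleteRecursively(d.path);
      totalBytes -= d.sizeBytes;
      deleted.add(d.path);
    }
    return deleted;
  }
}
```

Because the lock is the cleaner instance itself, it only serializes threads inside one executor JVM; it does not coordinate with a second executor process that shares the same directory.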
Note that even with this fix, race conditions are still possible. For example, when two executor processes are both calling ProjectCacheDirCleaner#deleteProjectDirsIfNecessary, one might delete a dir while the other is loading the same dir. A potential long-term fix: #2020. Follow-up: add the file count sanity check mentioned above.
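The file-count metadata and the follow-up sanity check can be sketched as follows. This is hypothetical: the metadata file name .file_count, the helper names, and the choice to count only regular files (excluding the metadata file itself) are assumptions for illustration, not details taken from the PR. A mismatch raises an exception so that flow setup fails and the flow can be dispatched again.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical sketch of recording and verifying a project dir's file count. */
final class FileCountSanityCheckSketch {
  // Assumed name of the metadata file written into each project dir.
  static final String FILE_COUNT_FILE = ".file_count";

  /** Counts regular files under dir, ignoring the metadata file itself. */
  static long countFiles(Path dir) throws IOException {
    try (Stream<Path> paths = Files.walk(dir)) {
      return paths.filter(Files::isRegularFile)
          .filter(p -> !p.getFileName().toString().equals(FILE_COUNT_FILE))
          .count();
    }
  }

  /** Called once the project dir has been fully created. */
  static void writeFileCount(Path projectDir) throws IOException {
    long count = countFiles(projectDir);
    Files.write(projectDir.resolve(FILE_COUNT_FILE),
        Long.toString(count).getBytes(StandardCharsets.UTF_8));
  }

  /** Fails flow setup if the execution dir does not match the recorded project file count. */
  static void checkExecutionDir(Path projectDir, Path executionDir) throws IOException {
    String stored = new String(
        Files.readAllBytes(projectDir.resolve(FILE_COUNT_FILE)), StandardCharsets.UTF_8).trim();
    long expected = Long.parseLong(stored);
    long actual = countFiles(executionDir);
    if (actual != expected) {
      throw new IllegalStateException(
          "execution dir has " + actual + " files, project dir recorded " + expected);
    }
  }
}
```

A missing metadata file (for example, because another process deleted the project dir mid-setup) surfaces as an IOException from Files.readAllBytes, which also fails the setup.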
1 parent 5044552, commit 1d251bc
Showing 6 changed files with 185 additions and 162 deletions.