I recently came across Modin, which appears to be a drop-in replacement for pandas that parallelizes most DataFrame operations. Given how heavily this codebase relies on pandas DataFrames and on easily parallelizable operations, I think it is worthwhile to try it out and evaluate the trade-offs.
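
To make the trial low-risk, the swap could sit behind a small import helper. Below is a minimal sketch, not a tested integration: `import_first_available` and `load_dataframe_backend` are hypothetical names invented for illustration. It assumes Modin is installed along with one of its execution engines (such as Ray or Dask), and falls back to plain pandas otherwise.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that can be imported.

    Hypothetical helper: tries each dotted module name in order and
    raises ImportError if none of them is importable.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module (or one of its dependencies) is missing; try the next one.
            continue
    raise ImportError(f"none of {candidates!r} could be imported")


def load_dataframe_backend():
    """Prefer Modin's pandas-compatible API, fall back to plain pandas."""
    return import_first_available(["modin.pandas", "pandas"])
```

Call sites would then use `pd = load_dataframe_backend()` in place of `import pandas as pd`, which keeps the rest of the code unchanged and makes it easy to benchmark both backends on the same workload.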