Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel Diff-ing and other Tweaks #66

Merged
merged 5 commits into from
May 23, 2021
Merged

Parallel Diff-ing and other Tweaks #66

merged 5 commits into from
May 23, 2021

Conversation

Myoldmopar
Copy link
Member

This has some changes in place to get parallel diff calculations working. I haven't tested it exhaustively, but it does look like it is working, and all unit tests continue to pass. Need to verify CI will be OK with it though, by pointing a decent CI config file to this branch and making sure some diffs appear in that branch.

This will continue to run serially on Windows or in frozen apps or whatever the limitation was previously. Now that E+ is thread safe, we need to make sure that table-diff and math-diff are also thread safe, and then we can get all this running using threading rather than multiprocessing, and it will be parallel on all platforms.

@Myoldmopar Myoldmopar requested a review from mitchute May 11, 2021 17:44
@mitchute
Copy link
Collaborator

Parallel diff-ing!! 🤩

runner.test_output_dir,
runner.thresh_dict_file,
ci_mode=True
) # returns an updated entry
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process_diffs_for_one_case is now a @staticmethod to make it easier to use with the process pool. This required some extra parameters to be passed into it.

self.id_like_to_stop_now = False
self.completed_structure = None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed structure instance is now a member variable on the suite runner class so that the process pool can access it when processes complete.

@@ -372,7 +375,8 @@ def run_build(self, build_tree):
# `apply_async` approach I am using. Blech. Once again, on Windows, this means it will partially not be
# multithreaded.
if self.number_of_threads == 1 or frozen and system() in ['Windows', 'Darwin']: # pragma: no cover
self.my_print("Ignoring num_threads on frozen Windows/Mac instance, just running with one thread.")
if self.number_of_threads > 1:
self.my_print("Ignoring num_threads on frozen Windows/Mac instance, just running with one thread.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only print this warning message if you are actually trying to run multithreaded. Previously it would show up even if you intentionally ran a single thread.

@@ -697,8 +704,7 @@ def process_diffs_for_one_case(self, this_entry, ci_mode=False):
EndErrSummary.STATUS_SUCCESS,
runtime_case2
))
self.my_print("TestMathAndKill Fatal-ed as expected, continuing with no diff checking on it")
return this_entry
return this_entry, "TestMathAndKill Fatal-ed as expected, continuing with no diff checking on it"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.my_print is no longer available from this static method, so this function simply returns a string message that can be printed once back in the process pool callback function.

@@ -786,14 +786,14 @@ def process_diffs_for_one_case(self, this_entry, ci_mode=False):
path_to_math_diff_log)), MathDifferences.SSZ)

# Do sorta-math-diff JSON diff
if self.both_files_exist(case_result_dir_1, case_result_dir_2, 'eplusout_hourly.json'):
this_entry.add_math_differences(MathDifferences(self.diff_json_time_series(
if SuiteRunner.both_files_exist(case_result_dir_1, case_result_dir_2, 'eplusout_hourly.json'):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several functions were already static methods but still referenced by self, so these were cleaned up.

self.build_tree_a['source_dir'], self.build_tree_a['build_dir'],
self.build_tree_b['source_dir'], self.build_tree_b['build_dir'],
os.path.join(self.build_tree_a['build_dir'], self.test_output_dir),
os.path.join(self.build_tree_b['build_dir'], self.test_output_dir)
)
diff_runs = []
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New functionality to allow for diffs to run multithreaded.

if diff_results.entries_by_file[0].basename == 'my_file':
results_for_file = diff_results.entries_by_file[0]
elif diff_results.entries_by_file[1].basename == 'my_file':
results_for_file = diff_results.entries_by_file[1]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an interesting one, previously the two files would always come out in the serial order, but now sometimes they would come out backwards. I had to simply check which entry was the one I'm inspecting before calling assertions on it.

@@ -638,6 +643,24 @@ def idf_select_all(self):
self.active_idf_listbox.insert(END, idf)
self.idf_refresh_count_status()

def idf_select_all_except_long_runs(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add in a new way to select all but the longest running files. This can take tons of time off the run just by skipping these few files.

@Myoldmopar
Copy link
Member Author

Parallel diff-ing!!

Take a look at this branch at your leisure and let me know if you run into any issues.

@Myoldmopar
Copy link
Member Author

I just ran this on the rmPlantHack branch and it worked flawlessly, showing all the diffs. The best part...I pulled the branch and it took just over 7 minutes to do a full rebuild including unit tests...then the full regression suite took just over 4 minutes including both sets of runs. Doing 723 files. Things are looking up!

@mitchute
Copy link
Collaborator

Is this run on DecentCI?

@mitchute
Copy link
Collaborator

Looks great to me! We're definitely parallel-processing the diffs now. Nice job, @Myoldmopar 🥇

@Myoldmopar
Copy link
Member Author

Is this run on DecentCI?

It is run on Decent CI, but not in a way that will make a difference with this multi-processing. The call out to the regression script is made as part of a CTest execution. So if you run ctest -j N, each of those tests will independently be calling the regression script in different processes. So it's all good over there already.

@Myoldmopar
Copy link
Member Author

I just pushed a commit based on the rmPlantHack branch that points to this branch in the regression repo. If it looks all clean I'll go ahead and merge this down.

@Myoldmopar Myoldmopar linked an issue May 12, 2021 that may be closed by this pull request
@Myoldmopar
Copy link
Member Author

OK, I think I'm going to merge this...if we notice anything later I'll revert or address it.

@Myoldmopar Myoldmopar merged commit 237b8a3 into master May 23, 2021
@Myoldmopar Myoldmopar deleted the SmallTweaks branch May 23, 2021 02:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make it more parallel
2 participants