
wrapper performance optimizations #413

Merged
merged 19 commits into from
Oct 18, 2021

Conversation

@carns (Contributor) commented on Jul 8, 2021

This pull request adds multiple performance enhancements to the Darshan runtime wrappers. The optimizations target the Theta system at the ALCF, but will likely benefit other platforms as well.

These changes reduce function call overhead, reduce locking overhead, use atomics in place of locks in some locations, and add optional functionality to use the Intel-specific rdtscp timer instruction on platforms where it is more performant than standard timer calls.
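For reference, an rdtscp-based timestamp read might look roughly like the sketch below. This is an illustrative sketch, not Darshan's actual implementation; the function names and the assumption that the caller supplies the TSC frequency are mine.

```c
#include <stdint.h>
#include <x86intrin.h>

/* Hypothetical sketch of an rdtscp-based timer read (names are
 * illustrative, not Darshan's API). rdtscp reads the CPU time-stamp
 * counter directly, avoiding the vDSO/syscall path of clock_gettime(). */
static inline uint64_t sketch_rdtscp_ticks(void)
{
    unsigned int aux; /* receives the IA32_TSC_AUX value; unused here */
    return __rdtscp(&aux);
}

/* Converting ticks to seconds requires the nominal TSC frequency,
 * which would have to be determined per platform (assumed given here). */
static inline double sketch_ticks_to_sec(uint64_t ticks, double tsc_hz)
{
    return (double)ticks / tsc_hz;
}
```

This only makes sense on x86 parts with an invariant TSC, which is presumably why the PR makes it optional rather than the default.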

The following example shows the impact on a degenerate test case on Theta, with 50 million I/O operations of each type to tmpfs on a single process:

## NO DARSHAN static:
50000000 fprintf()s of size 1 each in 12.465365 seconds (4011113.939493 ops/s)
50000000 fscanf()s of size 1 each in 29.004269 seconds (1723884.156082 ops/s)
50000000 fread()s of size 1 each in 3.130983 seconds (15969424.993880 ops/s)

## darshan@main, enabled, static:
50000000 fprintf()s of size 1 each in 215.936336 seconds (231549.728577 ops/s)
50000000 fscanf()s of size 1 each in 263.471802 seconds (189773.628985 ops/s)
50000000 fread()s of size 1 each in 200.895207 seconds (248885.977447 ops/s)

## darshan@perf-opt, enabled, static:
50000000 fprintf()s of size 1 each in 27.554754 seconds (1814568.911264 ops/s)
50000000 fscanf()s of size 1 each in 52.301563 seconds (955994.373946 ops/s)
50000000 fread()s of size 1 each in 18.619044 seconds (2685422.507416 ops/s)

This PR should be rebased after the autotools changes are merged. It also needs additional testing, in particular to confirm that the rdtscp instruction produces counter values consistent with the default timer functions.

@carns force-pushed the carns/perf-opt branch 2 times, most recently from 7d950a4 to 18babb9 on July 9, 2021 at 14:52
@carns (Contributor, Author) commented on Jul 9, 2021

Rebased this morning to a) consolidate commits more sensibly and eliminate the reverted auxiliary timer library experiment, and b) sync up with current origin/main.

Will still need to rebase again after #381.

carns added 10 commits July 20, 2021 14:05
- will be used to guide some wrapper performance optimizations
- eliminate darshan-core.h and fold into darshan.h
- make darshan_core_wtime() and it's subroutine static inlines
- switch to clock_gettime() to retrieve time
- no malloc/free or system calls in critical regions
@carns (Contributor, Author) commented on Jul 20, 2021

Rebased to current origin/main (i.e., to reflect automake changes)

```diff
 /*********************************************************
  *       Wrappers for H5F functions of interest          *
  *********************************************************/

 #define H5F_PRE_RECORD() do { \
     HDF5_LOCK(); \
-    if(!darshan_core_disabled_instrumentation()) { \
+    if(!__darshan_disabled) { \
```
Contributor:
Note that this __darshan_disabled value is cached by the MAP_OR_FAIL macro, which runs before the I/O routine; we obviously aren't holding locks across that window. It's possible the following code paths get triggered while Darshan is in the process of shutting down, which probably requires some more care.

Some potential issues stand out:

  1. It's possible internal module data structures (e.g., the hdf5_file_runtime structure here) are still around, opening the opportunity for a wrapper to update them while Darshan is shutting down. This would happen if a Darshan I/O wrapper routine executed between the module's output() and cleanup() routines called by darshan-core (the former sets which module records to output and the latter cleans up module data structures).
  2. It's also possible that, if Darshan has called cleanup() on a module and an I/O wrapper then triggers for that module, it may re-allocate module data structures.

These are for sure corner cases, because each would require Darshan to have started shutdown while a wrapper is between MAP_OR_FAIL (where __darshan_disabled is set) and PRE_RECORD (where __darshan_disabled is checked).

darshan-core itself handles this gracefully; these issues are contained to the modules themselves. Issue 1 could lead to modification of records while Darshan is writing them, while Issue 2 likely just results in a memory leak.
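The window being discussed can be sketched as follows. The names and structure are illustrative only, not Darshan code, and the module lock around the record update is elided:

```c
#include <stdatomic.h>

/* The disabled flag is sampled once early (as MAP_OR_FAIL does), so the
 * later PRE_RECORD-style check can act on a stale value if shutdown
 * begins in the unlocked window in between. */
static atomic_int sketch_disabled;

/* Returns 1 if the wrapper would update module records, 0 otherwise. */
int sketch_wrapper(void)
{
    int disabled = atomic_load(&sketch_disabled); /* cached early */

    /* ... the real I/O call runs here, unlocked; another thread may
     * start Darshan shutdown during this window ... */

    if(!disabled) {
        /* PRE_RECORD analogue: with a stale 'disabled', this record
         * update can run after shutdown has started */
        return 1;
    }
    return 0;
}
```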

Contributor (Author):

Good catch. I don't think 1) could happen, because the hdf5_file_runtime structure is separately protected by HDF5_LOCK(). 2) looks possible; there isn't anything preventing a wrapper from re-registering the module and possibly getting into risky code paths while the core is shutting down.

I think we can modify hdf5_file_runtime_initialize() (and its equivalent in the other modules) to disallow re-initialization, at least until we have a use case for that to happen intentionally. That _initialize() call is likewise already protected by HDF5_LOCK(), so it is OK if it needs to check a protected flag.

There isn't anything to prevent the module from potentially calling the *_record_ref* functions at inappropriate times, but it looks like those are already appropriately serialized internally.

I'll adjust the PR to keep the modules from re-initializing, and also add some comments clarifying who's responsible for protecting which data structures.

Contributor (Author):

Actually, maybe a slightly cleaner variation of the above would be to modify darshan_core_register_module() so that it can report failure, and have all of the modules check for success before allocating a module runtime structure.

The safety check to prevent re-initialization after shutdown has started could then be put in darshan_core_register_module() so it's not duplicated in N modules.
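That variation might look roughly like the following sketch, with hypothetical names standing in for darshan_core_register_module() and the core's shutdown state:

```c
/* Hypothetical sketch of the proposed variation: registration reports
 * failure once shutdown has begun, so modules can skip allocating
 * their runtime structures. Names are illustrative, not Darshan's. */
static int sketch_core_active = 1; /* cleared when core shutdown starts */

int sketch_register_module(int module_id)
{
    (void)module_id;
    if(!sketch_core_active)
        return -1; /* caller must not allocate module state */
    /* ... normal registration path ... */
    return 0;
}
```

Each module would then check the return value before allocating its runtime structure, keeping the safety check in one place instead of duplicated across N modules.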

Contributor (Author):

After further discussion: the modules are already protected against re-registration, because the core code sets the __darshan_core variable to NULL (preventing further registrations) before invoking any module output routines.

There was, however, still a possibility of modules updating counters in between the module _output() and _cleanup() routines, which could possibly cause inconsistent results in the log. This is fixed now in 9c86c8d.

@shanedsnyder (Contributor) commented:
I found an existing bug related to the timer code in darshan_core_init(): we should be using the _absolute() variant of wtime before Darshan is initialized. That wasn't introduced by anything you did; I just noticed it while reviewing and confirmed it was leading to bogus values when enabling Darshan's internal timing (the DARSHAN_INTERNAL_TIMING env var).

That's fixed via 1ce9145.
