-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: modify PyDarshan plotting routines to accept records as input #944
base: main
Are you sure you want to change the base?
Changes from 9 commits
6150dd6
aedaa9b
f63839c
721c982
c7c8a1e
0097625
4cf1aae
75cc536
f1917f3
0f626ac
5abaeb0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,12 +18,12 @@ def autolabel(ax, rects): | |
rotation=45, | ||
) | ||
|
||
def plot_access_histogram(report, mod, ax=None): | ||
def plot_access_histogram(record, mod, ax=None): | ||
""" | ||
Plots a histogram of access sizes for specified module. | ||
|
||
Args: | ||
report (darshan.DarshanReport): report to generate plot from | ||
record: record to generate plot from | ||
mod (str): mod-string for which to generate access_histogram | ||
|
||
""" | ||
|
@@ -33,44 +33,64 @@ def plot_access_histogram(report, mod, ax=None): | |
else: | ||
fig = None | ||
|
||
# TODO: change to report.summary | ||
if 'mod_agg_iohist' in dir(report): | ||
report.mod_agg_iohist(mod) | ||
else: | ||
print("Cannot create summary, mod_agg_iohist aggregator is not registered with the report class.") | ||
|
||
# defaults | ||
labels = ['0-100', '101-1K', '1K-10K', '10K-100K', '100K-1M', '1M-4M', '4M-10M', '10M-100M', '100M-1G', '1G+'] | ||
|
||
agg = report.summary['agg_iohist'][mod] | ||
# TODO: can simplify the read/write vals below after | ||
# support for python 3.6 is dropped | ||
read_vals = [ | ||
agg['READ_0_100'], | ||
agg['READ_100_1K'], | ||
agg['READ_1K_10K'], | ||
agg['READ_10K_100K'], | ||
agg['READ_100K_1M'], | ||
agg['READ_1M_4M'], | ||
agg['READ_4M_10M'], | ||
agg['READ_10M_100M'], | ||
agg['READ_100M_1G'], | ||
agg['READ_1G_PLUS'] | ||
] | ||
|
||
write_vals = [ | ||
agg['WRITE_0_100'], | ||
agg['WRITE_100_1K'], | ||
agg['WRITE_1K_10K'], | ||
agg['WRITE_10K_100K'], | ||
agg['WRITE_100K_1M'], | ||
agg['WRITE_1M_4M'], | ||
agg['WRITE_4M_10M'], | ||
agg['WRITE_10M_100M'], | ||
agg['WRITE_100M_1G'], | ||
agg['WRITE_1G_PLUS'] | ||
] | ||
|
||
agg=record['counters'] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would not call this variable This variable was originally called There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will change it! |
||
if mod == 'POSIX': | ||
read_vals = [ | ||
agg['POSIX_SIZE_READ_0_100'][0], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Notice how similar the counter names are here for POSIX and MPI-IO modules. They only differ in the prefix module name that is provided as input to this routine. I would get rid of this module-specific if-else logic and use string manipulation to get the desired module-specific counter names instead. Will be visually much cleaner since it's half as much code. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, it will be cleaner. |
||
agg['POSIX_SIZE_READ_100_1K'][0], | ||
agg['POSIX_SIZE_READ_1K_10K'][0], | ||
agg['POSIX_SIZE_READ_10K_100K'][0], | ||
agg['POSIX_SIZE_READ_100K_1M'][0], | ||
agg['POSIX_SIZE_READ_1M_4M'][0], | ||
agg['POSIX_SIZE_READ_4M_10M'][0], | ||
agg['POSIX_SIZE_READ_10M_100M'][0], | ||
agg['POSIX_SIZE_READ_100M_1G'][0], | ||
agg['POSIX_SIZE_READ_1G_PLUS'][0] | ||
] | ||
|
||
write_vals = [ | ||
agg['POSIX_SIZE_WRITE_0_100'][0], | ||
agg['POSIX_SIZE_WRITE_100_1K'][0], | ||
agg['POSIX_SIZE_WRITE_1K_10K'][0], | ||
agg['POSIX_SIZE_WRITE_10K_100K'][0], | ||
agg['POSIX_SIZE_WRITE_100K_1M'][0], | ||
agg['POSIX_SIZE_WRITE_1M_4M'][0], | ||
agg['POSIX_SIZE_WRITE_4M_10M'][0], | ||
agg['POSIX_SIZE_WRITE_10M_100M'][0], | ||
agg['POSIX_SIZE_WRITE_100M_1G'][0], | ||
agg['POSIX_SIZE_WRITE_1G_PLUS'][0] | ||
] | ||
elif mod == 'MPI-IO': | ||
read_vals = [ | ||
agg['MPIIO_SIZE_READ_AGG_0_100'][0], | ||
agg['MPIIO_SIZE_READ_AGG_100_1K'][0], | ||
agg['MPIIO_SIZE_READ_AGG_1K_10K'][0], | ||
agg['MPIIO_SIZE_READ_AGG_10K_100K'][0], | ||
agg['MPIIO_SIZE_READ_AGG_100K_1M'][0], | ||
agg['MPIIO_SIZE_READ_AGG_1M_4M'][0], | ||
agg['MPIIO_SIZE_READ_AGG_4M_10M'][0], | ||
agg['MPIIO_SIZE_READ_AGG_10M_100M'][0], | ||
agg['MPIIO_SIZE_READ_AGG_100M_1G'][0], | ||
agg['MPIIO_SIZE_READ_AGG_1G_PLUS'][0] | ||
] | ||
|
||
write_vals = [ | ||
agg['MPIIO_SIZE_WRITE_AGG_0_100'][0], | ||
agg['MPIIO_SIZE_WRITE_AGG_100_1K'][0], | ||
agg['MPIIO_SIZE_WRITE_AGG_1K_10K'][0], | ||
agg['MPIIO_SIZE_WRITE_AGG_10K_100K'][0], | ||
agg['MPIIO_SIZE_WRITE_AGG_100K_1M'][0], | ||
agg['MPIIO_SIZE_WRITE_AGG_1M_4M'][0], | ||
agg['MPIIO_SIZE_WRITE_AGG_4M_10M'][0], | ||
agg['MPIIO_SIZE_WRITE_AGG_10M_100M'][0], | ||
agg['MPIIO_SIZE_WRITE_AGG_100M_1G'][0], | ||
agg['MPIIO_SIZE_WRITE_AGG_1G_PLUS'][0] | ||
] | ||
#TODO: add support for HDF5/PnetCDF modules | ||
x = np.arange(len(labels)) # the label locations | ||
width = 0.35 # the width of the bars | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -83,7 +83,7 @@ def get_access_count_df(mod_df: Any, mod: str) -> Any: | |
Parameters | ||
---------- | ||
mod_df: "counters" dataframe for the input | ||
module `mod` from a ``darshan.DarshanReport``. | ||
module `mod` from a record. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not just a singular |
||
|
||
mod: the module to obtain the common accesses | ||
table for (i.e "POSIX", "MPI-IO", "H5D"). | ||
|
@@ -102,7 +102,6 @@ def get_access_count_df(mod_df: Any, mod: str) -> Any: | |
df = mod_df.filter(filter_keys) | ||
df = collapse_access_cols(df=df, col_name=col_name) | ||
df_list.append(df) | ||
|
||
return pd.concat(df_list, axis=1) | ||
|
||
|
||
|
@@ -122,14 +121,14 @@ def __init__(self, df: Any, **kwargs): | |
self.html = self.df.to_html(**kwargs) | ||
|
||
|
||
def plot_common_access_table(report: darshan.DarshanReport, mod: str, n_rows: int = 4) -> DarshanReportTable: | ||
def plot_common_access_table(record: dict, mod: str, n_rows: int = 4) -> DarshanReportTable: | ||
""" | ||
Creates a table containing the most | ||
common access sizes and their counts. | ||
|
||
Parameters | ||
---------- | ||
report: a ``darshan.DarshanReport``. | ||
record: a dict. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we can be a bit more specific than "just some dictionary" here, it should have counter and fcounter dataframes nested within I suspect. To be honest, we may want to use our own custom class name for this just for clarity (it could still be interchangeable with the original plain There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In cffi_backend.py, rec_dict is used for the accumulate_records function. maybe we could use a name like acc_rec_dict? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, maybe? I was just thinking we kind of have a custom object that subclasses class acc_rec_dict(dict):
# stores darshan counter and fcounter dataframes
pass Or maybe there's a better idea? I also wonder if it might make sense to provide some way to i.e., automatically produce such an object from multiple logs in a convenience function like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm on board with coming up with a more specific class for a "record" type, even if it's just a dict subclass. I don't know if the DarshanRecordCollection thing we added is somehow useful for this or if that's meant to be hidden from users -- probably a better question for @jakobluettgau . I think we're on the same page, but I'll clarify just to double check:
Maybe? At the very least, it might make sense to make the existing API ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah, we're on the same page; for my other comment below I imagine you'll just tag on to the C accum infrastructure for now, but I felt compelled to whine :) |
||
|
||
mod: the module to obtain the common access size | ||
table for (i.e "POSIX", "MPI-IO", "H5D"). | ||
|
@@ -145,8 +144,7 @@ def plot_common_access_table(report: darshan.DarshanReport, mod: str, n_rows: in | |
the `df` or `html` attributes, respectively. | ||
|
||
""" | ||
mod_df = report.records[mod].to_df(attach=None)["counters"] | ||
|
||
mod_df=record['counters'] | ||
if mod == "MPI-IO": | ||
mod = "MPIIO" | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need to be a bit more specific here (and in the other similarly-modified plotting functions)--I believe we'd want to support not just a singular record, and not just many records, but rather a dictionary that contains both the counter and fcounter record data structures, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it's a dictionary, with 2 separate DataFrames: 'counters' (for integer record data) and 'fcounters' (for floating point record data)