Skip to content

RIKEN-RCCS/xls-parser

Repository files navigation

xls-parser

The xls_parser.py “transpiles” the XLS file used by the FAPP profiler into a python script (xls_parser.out.py). xls_parser.out.py mimics the calculations of the original XLS file (without the need to access to the XLS file or Excel) and outputs the results into JSON format.

Generating xls_parse.out.py

Run xls_parse.py /path/to/cpu_pa_report.xlsm which will generate the xls_parse.out.py file.

Requirements: openpyxl, (e.g. do pip install openpyxl).

Important:

The cpu_pa_report.xlsm file must already be “loaded”: i.e. you must use Excel once, as described in the fapp manual to load data from the CSV files and save it! The xls_parse.py doesn’t work with the “empty” cpu_pa_report.xlsm as provided by fapp.

Known issues:

Currently, the openpyxl package gives the following 3 warnings when reading the XLSM file. These can be ignored.

/usr/lib/python3.10/site-packages/openpyxl/worksheet/_reader.py:312: UserWarning: Unknown extension is not supported and will be removed
  warn(msg)
/usr/lib/python3.10/site-packages/openpyxl/worksheet/_reader.py:296: UserWarning: Failed to load a conditional formatting rule. It will be discarded. Cause: expected <class 'float'>
  warn(msg)
/usr/lib/python3.10/site-packages/openpyxl/worksheet/_reader.py:312: UserWarning: Conditional Formatting extension is not supported and will be removed
  warn(msg)

Rebuilding xls_parser.out.py

The xls_parser.out.py should not need to be rebuilt for runs which have fewer threads than the XLSM file which was used to generate it. For example: if the XLSM file was generated with CSV data obtained by profiling an 8 thread application, the xls_parser.out.py generated should be good for single thread applications, but might generate wrong results (or fail) for 12 thread applications.

Generally, reusing the same xls_parser.out.py to profile an application with modifications to the code should yield correct results. Any errors, incorrect results should be reported as an issue.

Using xls_parse.out.py

# XMLPATH: the directory with paN.xml files
python xls_parse.out.py $XMLPATH
# see
python xls_parse.out.py --help
# for more options

The generated xls_parse.out.py can be used to print (in JSON format) all the derived counters calculated in the Excel file (without the need for the Excel).

Parameters

The xls_parse.out.py has a one mandatory input: the path to a directory with the paX.xml files (where X=1, ..., 17) generated using the fapp utility:

for i in `seq 1 17`; do
    fapp -C -d ./tmp${i} -Icpupa,nompi -Hevent=pa${i} ${BINARY}
    fapp -A -d ./tmp${i} -Icpupa -txml -o ./pa${i}.xml
done

Note that unlike Excel which reads CSV file, xls_parse.out.py reads XLS files.

Output

The output is printed to stdout in JSON format as (nested) dictionaries.

Motivation

It is inconvenient to download 17 files (plus the XLS) to a local PC to get the desired measurements, especially since profiling is rarely done for a single application. Usually, one does multiple measurements to compare different runs/applications, or one is attempting to improve an application based on the measurements. Using the Excel approach for the first situation is tedious and error-prone, and for the second situation extremely frustrating. It is also common for the programmers to be using Linux (and not Windows), which makes running Excel a further burden.

Fujitsu’s approval to open-source xls_parser.py

The ‘xls-parser’ tool can be OSSed with following conditions:

Distribution of the generated file (xls_parse.out.py) is not permitted.

No support for the values derived from the tool.


‘xls-parser’ツールは、次の条件でOSSにすることができます。

ツールからの生成されたファイル(xls_parse.out.py)の配布は許可されません。

ツールから派生した値は保証されません。

Citing this software

If you find the software useful, please cite it:

@software{vatai2022xlsparser,
  author = {Vatai, Emil},
  month = {March},
  year = {2022},
  title = {{XLS parser}},
  howpublished = {\url{https://github.com/RIKEN-RCCS/xls-parser}},
  version = {v1.0.3}
}