From ce19f67c6e232eb87a4ffd2ba74ae60f63416110 Mon Sep 17 00:00:00 2001 From: Lukas Rothenberger Date: Mon, 31 Oct 2022 15:27:40 +0100 Subject: [PATCH] test: added "update wiki" action --- .github/workflows/update_wiki.yml | 17 +++ wiki/DiscoPoP-Explorer.md | 82 +++++++++++++ wiki/DiscoPoP-Profiler.md | 125 ++++++++++++++++++++ wiki/Guide.md | 40 +++++++ wiki/Home.md | 70 ++++++++++++ wiki/Setup.md | 20 ++++ wiki/Troubleshooting.md | 13 +++ wiki/Tutorial.md | 184 ++++++++++++++++++++++++++++++ wiki/_Footer.md | 3 + wiki/_Sidebar.md | 16 +++ 10 files changed, 570 insertions(+) create mode 100644 .github/workflows/update_wiki.yml create mode 100644 wiki/DiscoPoP-Explorer.md create mode 100644 wiki/DiscoPoP-Profiler.md create mode 100644 wiki/Guide.md create mode 100644 wiki/Home.md create mode 100644 wiki/Setup.md create mode 100644 wiki/Troubleshooting.md create mode 100644 wiki/Tutorial.md create mode 100644 wiki/_Footer.md create mode 100644 wiki/_Sidebar.md diff --git a/.github/workflows/update_wiki.yml b/.github/workflows/update_wiki.yml new file mode 100644 index 000000000..fd7cc2d50 --- /dev/null +++ b/.github/workflows/update_wiki.yml @@ -0,0 +1,17 @@ +name: Update Wiki + +on: + push: + paths: + - 'wiki/**' + branches: + - develop +jobs: + update-wiki: + runs-on: ubuntu-latest + name: Update wiki + steps: + - uses: OrlovM/Wiki-Action@v1 + with: + path: 'wiki' + token: ${{ secrets.GITHUB_TOKEN }} \ No newline at end of file diff --git a/wiki/DiscoPoP-Explorer.md b/wiki/DiscoPoP-Explorer.md new file mode 100644 index 000000000..476370c27 --- /dev/null +++ b/wiki/DiscoPoP-Explorer.md @@ -0,0 +1,82 @@ +# DiscoPoP graph analyzer +DiscoPoP profiler is accompanied by a Python framework, specifically designed to analyze the profiler output files, generate a CU graph, detect potential parallel patterns, and suggest OpenMP parallelizations. 
+Currently, the following five patterns can be detected: +* Reduction +* Do-All +* Pipeline +* Geometric Decomposition +* Task Parallelism + +## Getting started +We assume that you have already run the DiscoPoP profiler on the target sequential application, and the following files are created in the current working directory: +* `Data.xml` (CU information in XML format created by *CUGeneration* pass) +* `_dep.txt` (Data dependences created by *DPInstrumentation* pass) +* `reduction.txt` and `loop_counter_output.txt` (Reduction operations and loop iteration data identified by *DPReduction* pass) + +In case any of the files mentioned above are missing, please follow the [DiscoPoP manual](../README.md) to generate them. + +In addition to the already mentioned files, a file named `_CUInstResult.txt` is required for the task parallelism detection. +In order to generate it, the following sequence of commands can be used: +``` +python3 -m discopop_explorer --path= --cu-xml= --dep-file= --loop-counter= --reduction= --generate-data-cu-inst= +clang++ -S -emit-llvm -c -std=c++11 -g /CUInstantiation/RT/CUInstantiation_iFunctions.cpp -o iFunctions_CUInst.ll +clang++ -g -O0 -emit-llvm -fno-discard-value-names -c -o tmp_target_app.ll +/opt -S -load=/libi/LLVMCUInstantiation.so -CUInstantiation -input=Data_CUInst.txt tmp_target_app.ll -fm-path=FileMapping.txt -o tmp_target_app_instrumented.ll +clang++ tmp_target_app_instrumented.ll iFunctions_CUInst.ll -o _cui -L$PATH_TO_DISCOPOP_BUILD_DIR/rtlib -lDiscoPoP_RT -lpthread -o _cui +rm tmp_target_app.ll tmp_target_app_instrumented.ll iFunctions_CUInst.ll +./_cui +``` + + +### Pre-requisites +To use the graph analyzer tool, you need to have Python 3.6+ installed on your system. 
Further Python dependencies can be installed using the following command: +`pip install -r requirements.txt` + +### Usage +To run the graph analyzer, you can use the following command: + +`python3 -m discopop_explorer --path ` + +You can specify the path to DiscoPoP output files. Then, the Python script searches within this path to find the required files. Nevertheless, if you are interested in passing a specific location to each file, here is the detailed usage: + + `discopop_explorer [--path ] [--cu-xml ] [--dep-file ] [--plugins ] [--loop-counter ] [--reduction ] [--json ] [--fmap ] [--cu-inst-res ] [--llvm-cxxfilt-path ] [--generate-data-cu-inst ]` + +Options: +``` + --path= Directory with input data [default: ./] + --cu-xml= CU node xml file [default: Data.xml]. + --dep-file= Dependencies text file [default: dep.txt]. + --loop-counter= Loop counter data [default: loop_counter_output.txt]. + --reduction= Reduction variables file [default: reduction.txt]. + --cu-inst-res= CU instantiation result file. Task Pattern Detector is executed if this option is set. + --llvm-cxxfilt-path= Path to llvm-cxxfilt executable. Required for Task Pattern Detector + if non-standard path should be used. + --plugins= Plugins to execute + --fmap= File mapping [default: FileMapping.txt] + --json Output result as a json file to specified path + --generate-data-cu-inst= Generates Data_CUInst.txt file and stores it in the given directory. + Stops the regular execution of the discopop_explorer. + Requires --cu-xml, --dep-file, --loop-counter, --reduction. + -h --help Show this screen. + --version Show version. +``` + +By default, running the graph analyzer will print out the list of patterns along with OpenMP parallelization suggestions to the standard output. You can also obtain the results in JSON format by passing `--json` argument to the Python script. 
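The invocation described above can be assembled programmatically. The following is a minimal Python sketch, not part of DiscoPoP itself: the helper names `missing_inputs` and `build_explorer_command` are hypothetical, and the file names are the defaults from the option list. The dependence file is deliberately not checked, since its name depends on the profiled executable.

```python
from pathlib import Path

# Default input names taken from the option list above. The dependence file is
# left out because its name depends on the profiled executable.
DEFAULT_INPUTS = ("Data.xml", "loop_counter_output.txt", "reduction.txt", "FileMapping.txt")


def missing_inputs(data_dir):
    """Return the default input files that are absent from data_dir."""
    base = Path(data_dir)
    return [name for name in DEFAULT_INPUTS if not (base / name).is_file()]


def build_explorer_command(data_dir, json_out=None):
    """Assemble the argument list for `python3 -m discopop_explorer`."""
    cmd = ["python3", "-m", "discopop_explorer", "--path", str(data_dir)]
    if json_out is not None:
        # --json takes the output path, as in the usage line above.
        cmd += ["--json", str(json_out)]
    return cmd
```

The returned list can be passed to `subprocess.run` once `missing_inputs` reports nothing missing. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.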
+ +### Walkthrough example +The **test/** folder contains a number of precomputed inputs for testing the tool, e.g., *atax* from Polybench benchmark suite. +You can try out this example workflow. + +**test/reduction/** contains source code and precomputed DiscoPoP output for a simple reduction loop. +The loop itself sums up all numbers from 1 to n. + +You can run DiscoPoP on **main.c** or just use included output. + +After that, you can run **discopop_explorer**. The **--path** argument should point to the output of the DiscoPoP. + +In this example, the output for reduction will point to the lines 6-9, and it will suggest **pragma omp parallel for** OpenMP directive for parallelizing the loop. +You will also find **i** classified as a private variable and **sum** as a reduction variable. Thus, the parallelization directive would be suggested as follows: + +```#pragma omp parallel for private(i) reduction(+:sum)``` + +The suggested pattern is demonstrated in **mainp.c** diff --git a/wiki/DiscoPoP-Profiler.md b/wiki/DiscoPoP-Profiler.md new file mode 100644 index 000000000..fe17d9315 --- /dev/null +++ b/wiki/DiscoPoP-Profiler.md @@ -0,0 +1,125 @@ +*Call clang++ with DiscoPoP LLVM passes.* + +The DiscoPoP profiler consists of multiple LLVM libraries (CUGeneration, +DPInstrumentation and DPReduction) to be passed to clang++ at compilation, as +well as a runtime library for the instrumented code. + +`discopop_profiler` wraps clang++ invocations to include the necessary compiler +and linker flags to make use of the DiscoPoP features. To use one of +the DiscoPoP passes, `discopop_profiler` is used *instead* of `clang++`, with +one of the option flags `--CUGeneration`, `--DPInstrumentation` or +`--DPReduction` to select the LLVM pass. + +## Pre-requisites + +`discopop_profiler` is included in the +[PyPI package `discopop`](https://pypi.org/project/discopop/). 
As such, it is +installed with + +``` +pip install discopop +``` + +It is required that the DiscoPoP profiler is installed (see +[DiscoPoP profiler installation](../README.md#discopop-profiler-installation)) +and the environment variable `DISCOPOP_INSTALL` is set to the path where +DiscoPoP is installed. + +``` +export DISCOPOP_INSTALL= +``` + +## Usage + +``` +usage: discopop_profiler [--verbose] [--clang CLANG] + (--CUGeneration | --DPInstrumentation | --DPReduction) + + +Call clang++ with DiscoPoP LLVM passes. + +optional arguments: + -h, --help Show this help message and exit. + -V, --version Show version number and exit. + -v, --verbose Show additional information such as clang++ + invocations. + --clang CLANG Path to clang++ executable. + --CUGeneration, --cugeneration + Obtain the computational unit (CU) graph of the target + application. + --DPInstrumentation, --dpinstrumentation + Instrument the target application to obtain data + dependences. + --DPReduction, --dpreduction + Instrument the target application to obtain the list + of reduction operations. +``` + +### CU generation + +To obtain the computational unit (CU) graph of the target application, please +run the following command. + +``` +discopop_profiler --CUGeneration -c +``` + +### Dependence profiling + +To obtain data dependences, we need to instrument the target application. +Running the instrumented application will result in a text file containing all +the dependences that are located in the present working directory. + +``` +discopop_profiler --DPInstrumentation -c -o out.o +discopop_profiler --DPInstrumentation out.o -o +./ +``` + +### Identifying reduction operations + +To obtain the list of reduction operations in the target application, we need to +instrument the target application. Running the instrumented application will +result in a text file containing all the reductions that are located in the +present working directory. 
+ +``` +discopop_profiler --DPReduction -c -o out.o +discopop_profiler --DPReduction out.o -o +./ +``` + +### Usage with projects that use the CMake build system + +Since `discopop_profiler` is invoked like a regular compiler, it is easy to run +DiscoPoP instrumentation on projects that use a build system such as CMake. + +1. Configure CMake to use `discopop_profiler` as `CMAKE_CXX_COMPILER`: + ``` + cmake -DCMAKE_CXX_COMPILER="discopop_profiler" -DCMAKE_CXX_FLAGS="--DPInstrumentation" . + ``` +1. Build the project with DiscoPoP instrumentation applied on the code: + ``` + make + ``` + +## Troubleshooting + +### clang++ executable not found in PATH + +`discopop_profiler` expects to find `clang++-8` or `clang++` in your system's `PATH`. +If clang is installed elsewhere, either add the installation location to your +`PATH`, or set the location to the `clang++` binary to be invoked with + +``` +discopop_profiler --clang= ... +``` + +### Compiler invocation + +`discopop_profiler` prints the exact flags passed to `clang++` if the `--verbose` +flag is set. + +``` +discopop_profiler --verbose ... +``` diff --git a/wiki/Guide.md b/wiki/Guide.md new file mode 100644 index 000000000..2a7cae6c1 --- /dev/null +++ b/wiki/Guide.md @@ -0,0 +1,40 @@ +# Walk-through example +The following walk-through example demonstrates how to use DiscoPoP to analyze a sequential sample application and identify its parallelization opportunities. In this example, we use the program `SimplePipeline`. As its name suggests, this program involves a pipeline pattern. We assume that you have successfully installed DiscoPoP. The following diagram depicts the whole workflow of obtaining the parallelization suggestions. + +![DiscoPoP workflow diagram](/docs/img/DPWorkflow.svg) + +First, switch to the `/test/simple_pipeline` folder that contains the program `SimplePipeline.c`. Then, please run the following commands step-by-step to obtain the desired results. 
+ +1) Run the `dp-fmap` script to obtain the list of files. The output will be written in a file named FileMapping.txt. + + `/scripts/dp-fmap` + +2) To obtain the computational units (CU), please run the following command. + + `clang++ -g -O0 -fno-discard-value-names -Xclang -load -Xclang /libi/LLVMCUGeneration.so -mllvm -fm-path -mllvm ./FileMapping.txt -c SimplePipeline.c` + +The output is an XML file that contains all the CU nodes and their connections. You should be able to obtain an XML file as in [`Data.xml`](/test/simple_pipeline/data/Data.xml). Using the information in this file, we can generate a CU graph. + +3) To obtain data dependences, we need to instrument the application and run it. +``` + clang++ -g -O0 -fno-discard-value-names -Xclang -load -Xclang /libi/LLVMDPInstrumentation.so -mllvm -fm-path -mllvm ./FileMapping.txt -c SimplePipeline.c -o out.o + clang++ out.o -L/rtlib -lDiscoPoP_RT -lpthread -o out + ./out +``` +The output is a text file that contains all the dependences. You should be able to obtain a dependence file as in [`dp_run_dep.txt`](/test/simple_pipeline/data/dp_run_dep.txt). + +A data dependence is represented as a triple ``. `type` denotes the dependence type and can be any of `RAW`, `WAR` or `WAW`. Note that a special type `INIT` represents the first write operation to a memory address. `source` and `sink` are the source code locations of the former and the latter memory access, respectively. `sink` is further represented as a pair ``, while `source` is represented as a triple ``. The keyword `NOM` (short for "NORMAL") indicates that the source line specified by aggregated `sink` has no control-flow information. Otherwise, `BGN` and `END` represent the entry and exit points of a control region. + +4) Although there is no reduction pattern in SimplePipeline, we strongly suggest that you run the reduction analysis to avoid missing any pattern and obtain necessary loop information.
This pass instruments the target application and analyzes its loops to identify their iteration counts and obtain the list of potential reduction operations. Running the instrumented application will result in a text file that contains all the reductions located in the working directory. +``` + clang++ -g -O0 -fno-discard-value-names -Xclang -load -Xclang /libi/LLVMDPReduction.so -mllvm -fm-path -mllvm ./FileMapping.txt -c SimplePipeline.c -o out.o + clang++ out.o -L/rtlib -lDiscoPoP_RT -lpthread -o out + ./out +``` +Besides the list of reduction loops, this step generates two important files named `loop_counter_output.txt` and `loop_meta.txt`. The pattern analysis in the next step requires these files along with the CU graph and dependences. + +5) To obtain the list of patterns and OpenMP parallelization suggestions, run the Python application `discopop_explorer`: + + `python3 -m discopop_explorer --cu-xml=Data.xml --dep-file=dp_run_dep.txt` + +You should now be able to see the pipeline pattern that was found in the target application along with its stages plus suitable OpenMP constructs for parallelization. You can access a sample output in [simple_pipeline.json](/test/simple_pipeline.json). Using these hints, you can start parallelizing the target application. diff --git a/wiki/Home.md b/wiki/Home.md new file mode 100644 index 000000000..566305501 --- /dev/null +++ b/wiki/Home.md @@ -0,0 +1,70 @@ +# DiscoPoP - Discovery of Potential Parallelism +Hello World! +DiscoPoP is an open-source tool that helps software developers parallelize their programs with threads. It is a joint project of Technical University of Darmstadt and Iowa State University.
+ +In a nutshell, DiscoPoP performs the following steps: +* detect parts of the code (computational units or CUs) with little to no internal parallelization potential, +* find data dependences among them, +* identify parallel patterns that can be used to parallelize a code region, +* and finally suggest corresponding OpenMP parallelization constructs and clauses to programmers. + +DiscoPoP is built on top of LLVM. Therefore, DiscoPoP can perform the above-mentioned steps on any source code which can be transferred into the LLVM IR. + +A more comprehensive overview of DiscoPoP can be found on our [project website](https://www.discopop.tu-darmstadt.de/). + +## Getting started +### Pre-requisites +Before doing anything, you need a basic development setup. We have tested DiscoPoP on Ubuntu, and the prerequisite packages should be installed using the following command: + + sudo apt-get install git build-essential cmake + +Additionally, you need to install LLVM on your system. Currently, DiscoPoP only supports LLVM versions between 8.0 and 11.1. Due to API changes, which lead to compilation failures, it does not support lower and higher versions. Please follow the [installation tutorial](https://llvm.org/docs/GettingStarted.html), or install LLVM 11 via a package manager as shown in the following snippet, if you have not installed LLVM yet. + + apt-get install libclang-11-dev clang-11 llvm-11 + +### DiscoPoP profiler installation +First, clone the source code into the designated folder. Then, create a build directory: + + mkdir build; cd build; + +Next, configure the project using CMake. The preferred LLVM installation path for DiscoPoP can be set using the -DLLVM_DIST_PATH= CMake variable. + + cmake -DLLVM_DIST_PATH= .. + +Once the configuration process is successfully finished, run `make` to compile and obtain the DiscoPoP libraries. All the shared objects will be stored in the build directory under a folder named as `libi/`. 
+ + +### Running DiscoPoP +DiscoPoP contains different tools for analyzing the target sequential application, namely CUGeneration, DPInstrumentation, and DPReduction. In the following, we will explain how to run each of them. However, before executing anything, please run the `dp-fmap` script in the root folder of the target application to obtain the list of files. The output will be written in a file named `FileMapping.txt`. + + /scripts/dp-fmap + +#### CU generation +To obtain the computational unit (CU) graph of the target application, please run the following command. + + clang++ -g -O0 -fno-discard-value-names -Xclang -load -Xclang /libi/LLVMCUGeneration.so -mllvm -fm-path -mllvm ./FileMapping.txt -c + +#### Dependence profiling +To obtain data dependences, we need to instrument the target application. Running the instrumented application will result in a text file containing all the dependences that are located in the present working directory. + + clang++ -g -O0 -fno-discard-value-names -Xclang -load -Xclang /libi/LLVMDPInstrumentation.so -mllvm -fm-path -mllvm ./FileMapping.txt -c -o out.o + clang++ out.o -L/rtlib -lDiscoPoP_RT -lpthread + ./ + +#### Identifying reduction operations +To obtain the list of reduction operations in the target application, we need to instrument the target application. Running the instrumented application will result in a text file containing all the reductions that are located in the present working directory. + + clang++ -g -O0 -fno-discard-value-names -Xclang -load -Xclang /libi/LLVMDPReduction.so -mllvm -fm-path -mllvm ./FileMapping.txt -c -o out.o + clang++ out.o -L/rtlib -lDiscoPoP_RT -lpthread + ./ + +*NOTE:* Please use the exact compiler flags that we used. Otherwise, you might not get the correct results, or the analysis might fail. + +DiscoPoP also provides a wrapper [discopop_profiler](discopop_profiler/README.md) to +easily invoke clang with the DiscoPoP LLVM passes. 
+ +#### Pattern identification +Once you have all the results generated by DiscoPoP passes, you can use them to identify possible parallel design patterns. To learn more, please read the pattern detection [README](/discopop_explorer/README.md), which explains how to run pattern identification in detail. + +## Walk-through example +In the `test/` folder, we have provided sample programs to help you start using DiscoPoP. You can find the walk-through example [here](https://github.com/discopop-project/discopop/wiki/Tutorial). diff --git a/wiki/Setup.md b/wiki/Setup.md new file mode 100644 index 000000000..98f9b7037 --- /dev/null +++ b/wiki/Setup.md @@ -0,0 +1,20 @@ +## Getting started +### Pre-requisites +Before doing anything, you need a basic development setup. We have tested DiscoPoP on Ubuntu, and the prerequisite packages should be installed using the following command: + + sudo apt-get install git build-essential cmake + +Additionally, you need to install LLVM on your system. Currently, DiscoPoP only supports LLVM versions between 8.0 and 11.1. Due to API changes, which lead to compilation failures, it does not support lower and higher versions. Please follow the [installation tutorial](https://llvm.org/docs/GettingStarted.html), or install LLVM 11 via a package manager as shown in the following snippet, if you have not installed LLVM yet. + + apt-get install libclang-11-dev clang-11 llvm-11 + +### DiscoPoP profiler installation +First, clone the source code into the designated folder. Then, create a build directory: + + mkdir build; cd build; + +Next, configure the project using CMake. The preferred LLVM installation path for DiscoPoP can be set using the -DLLVM_DIST_PATH= CMake variable. + + cmake -DLLVM_DIST_PATH= .. + +Once the configuration process is successfully finished, run `make` to compile and obtain the DiscoPoP libraries. All the shared objects will be stored in the build directory under a folder named as `libi/`.
\ No newline at end of file diff --git a/wiki/Troubleshooting.md b/wiki/Troubleshooting.md new file mode 100644 index 000000000..b13c523e3 --- /dev/null +++ b/wiki/Troubleshooting.md @@ -0,0 +1,13 @@ +### How to use DiscoPoP with projects which use CMake build system? +To run DiscoPoP instrumentation on projects which use CMake, you need to use the following commands instead of the normal CMake. +1. You first need to run CMake to just configure the project for compilation: +```bash +cmake -DCMAKE_CXX_COMPILER= -DCMAKE_CXX_FLAGS="-c -g -O0 -fno-discard-value-names -Xclang -load -Xclang /libi/LLVMDPInstrumentation.so -mllvm -fm-path -mllvm " +``` +2. Then, configure the project for linking: +```bash +cmake -DCMAKE_CXX_COMPILER= -DCMAKE_CXX_FLAGS="-g -O0 -fno-discard-value-names -Xclang -load -Xclang /libi/LLVMDPInstrumentation.so -mllvm -fm-path -mllvm " -DCMAKE_CXX_STANDARD_LIBRARIES="-L/rtlib -lDiscoPoP_RT -lpthread" . +``` +3. Running `make` will build the project with DiscoPoP instrumentation applied on the code. + +You may use Github issues to report potential bugs or ask your questions. In case you need individual support, please contact us using discopop[at]lists.parallel.informatik.tu-darmstadt.de. \ No newline at end of file diff --git a/wiki/Tutorial.md b/wiki/Tutorial.md new file mode 100644 index 000000000..26a11bd1e --- /dev/null +++ b/wiki/Tutorial.md @@ -0,0 +1,184 @@ +In this example, we demonstrate how to use DiscoPoP to extract data dependencies, computational units, parallel patterns, and finally the parallelization suggestions. + +## Setting environment variables + +This document assumes that you have already installed llvm, clang, and DiscoPoP. Please set the following environment variables to make the commands simpler. + +``` +CLANG +CLANG++ +DISCOPOP_BUILD +src_file +``` + +This code sample contains two functions: initialize and compute. In function “initialize”, there are two loops which initialize the arrays (i.e., v and a). 
+ +## Assigning IDs to different files in the program + +DiscoPoP can analyze projects containing multiple files scattered in different directories. We have developed a script (i.e., `dp-fmap`) that assigns a unique ID to each file in the project. You can find the script in the scripts directory under the DiscoPoP root directory. Currently, we support the following file types: + +``` +c|cc|cpp|h|hpp +``` + +However, it might be the case that you have c/c++ files which have a different file extension (e.g., “.C”). In this case, you can add the desired extension by changing the content of the `dp-fmap` file. + +Running `dp-fmap` in the tutorial directory, a `FileMapping.txt` file with the following content is generated: + +``` +1 /$DiscoPoP_root/test/tutorial/src-parallel-cpu/tutorial.c +``` + +## Extracting Computational Units (CUs) + +To analyze a program with DiscoPoP, we first need to extract its computational units. This is a static process and we get the CUs by running Command 1 on each source code listed in the `FileMapping.txt` file: + +```sh +$CLANG -g -O0 -S -emit-llvm -fno-discard-value-names \ + -Xclang -load -Xclang ${DISCOPOP_BUILD}/libi/LLVMCUGeneration.so \ + -mllvm -fm-path -mllvm ./FileMapping.txt \ + -c -I $include_dir -o ${src_file}.ll $src_file +``` +*(Command 1: Extracting computational units of a single file)* + +Here is a brief description about the flags that we pass to the compiler: + +- `-g`: enables obtaining debug information from the source code, e.g., the line numbers. +- `-O0`: makes sure that we analyze the whole source code. +- `-S -emit-llvm`: outputs the LLVM IR version of the input file. This option might be interesting especially with the instrumentation libraries which we discuss in the next section. 
+- `-fno-discard-value-names`: keeps names (e.g., variables, functions) as they are in the source code +- `-Xclang -load -Xclang`: instructs the clang compiler to load the pass that follows the flag +- `-mllvm -fm-path -mllvm`: instructs the clang compiler to read the FileMapping from the path that is specified after the flag in the command line +- `-c`: compiles a single input file. We need it because DiscoPoP analyzes files in the program one by one. +- `-I`: needed if the source file includes headers from additional directories +- `-o`: You may need this flag if you want to save the output file. + +When you run Command 1, some files will be generated including: + +- `Data.xml`: It contains the computational units in the specified input file (i.e., $src_file). +- `DP_CUIDCounter.txt`: It contains the ID of the last CU in the input file. When analyzing the next file in the program, DiscoPoP reads this file to assign IDs starting from the number saved in this file. + +The XML file contains a lot of information about the program. There are four types of nodes in the XML file: functions, loops, CUs, and dummies. Each node has an ID, a type, a name (some nodes have empty names), and the start and end line of the node in the source code. + +Function nodes, which are represented by type 1, contain information about functions in each file of the source code. Function nodes contain children nodes which can be CUs, loop nodes, and dummies. Also, you can find the list of function arguments there. + +Nodes with type 0 are CUs. They follow a read-after-write pattern. They are the atoms of parallelization, meaning that we do not look inside a CU for parallelization opportunities. We report the following information for CUs: + +- `BasicBlockID`: The ID of the basic block that the CU happens in. A basic block is a block of code with single entry and exit points. A basic block may contain multiple CUs but a CU may not span over multiple basic blocks.
+- `readDataSize`: number of bytes which is read in this CU. We consider LLVM-IR load instructions to compute this value. +- `writeDataSize`: Number of bytes written in this CU. It is computed like `readDataSize`. +- `instructionsCount`: Number of LLVM-IR instructions in the CU. +- `instructionLines`: The line numbers in which the CU appears. +- `readPhaseLines`: LLVM-IR load instructions which happen within the CU boundaries. +- `writePhaseLines`: LLVM-IR store instructions in the CU. +- `returnInstructions`: It indicates the line number of return instructions if the CU contains return instructions. +- `Successors`: The succeeding CU when analyzing the source code top-down in the source code. +- `localVariables`: the variables which appear within the CU. We also report the line number where the variable is defined, its name and its type. +- `globalVariables`: variables which break the read-after-write rule for the CU. They cause the creation of a new CU which will succeed the CU. +- `callsNode`: It indicates the line number of a called function if the CU contains a call instruction. + +Loop nodes have type 2. They contain children nodes which can be CUs, other loops, or dummy nodes. + +Dummy nodes are usually library functions whose source code is not available. We cannot profile them and thus do not provide parallelization suggestions for them. + +Please note that DiscoPoP appends CUs to an existing `Data.xml` file and thus if you need to extract computational units of the program again, you need to remove the existing `Data.xml` file. You also need to remove the `DP_CUIDCounter.txt` file to generate CUs starting from 0. + +## Identifying data dependencies + +DiscoPoP uses a signature to store data dependences. You can configure the settings of this signature by creating a dp.conf file in the root directory of your program. 
The config file usually contains the following parameters: + +- `DP_DEBUG`: If `DP_DEBUG` is set to one, DiscoPoP prints out debug information. +- `SIG_ELEM_BIT`: Size of each element in the signature in bits. +- `SIG_NUM_ELEM`: Size of the signature. The bigger it is, the fewer false positives/negatives are reported. +- `SIG_NUM_HASH`: Number of signatures. A value of two indicates that one signature is used for read accesses and one signature for write accesses. +- `USE_PERFECT`: When it is set to one, DiscoPoP uses a perfect signature. The default value is one. + +To find parallelization opportunities, we need to extract data dependencies inside the program. For that, we need to instrument the memory accesses, link the program with DiscoPoP run-time libraries, and finally execute the program with several representative inputs. For the instrumentation, you need to apply Command 2 to each file of the program. However, you can also apply Command 2 to specific files if you are interested in finding parallelization opportunities in those files and you are sure that there are no (data or control) dependencies with the uninstrumented files. We always recommend profiling all the files in a program to find all the available parallelization opportunities.
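Applying the instrumentation command to every file of a project can be scripted. The sketch below is an illustration only, under stated assumptions: the helper names are hypothetical, `FileMapping.txt` is assumed to hold one `<id> <path>` pair per line as in the sample content shown earlier, header files are assumed not to be compiled on their own, and the optional `-I $include_dir` part of Command 2 is omitted.

```python
from pathlib import Path

# Assumption: only these suffixes are compiled directly; headers are skipped.
SOURCE_SUFFIXES = {".c", ".cc", ".cpp"}


def parse_file_mapping(text):
    """Return {file_id: path} from FileMapping.txt content ("<id> <path>" per line)."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        file_id, path = line.split(None, 1)
        mapping[int(file_id)] = path.strip()
    return mapping


def instrumentation_command(clang, discopop_build, src_file):
    """Mirror the flags of Command 2 for one source file, as an argument list."""
    return [
        clang, "-g", "-O0", "-S", "-emit-llvm", "-fno-discard-value-names",
        "-Xclang", "-load", "-Xclang",
        f"{discopop_build}/libi/LLVMDPInstrumentation.so",
        "-mllvm", "-fm-path", "-mllvm", "./FileMapping.txt",
        "-o", f"{src_file}_dp.ll", src_file,
    ]


def commands_for_project(mapping_text, clang, discopop_build):
    """Build one instrumentation command per source file, ordered by file ID."""
    mapping = parse_file_mapping(mapping_text)
    return [
        instrumentation_command(clang, discopop_build, path)
        for _, path in sorted(mapping.items())
        if Path(path).suffix in SOURCE_SUFFIXES
    ]
```

Each returned list can be handed to `subprocess.run`. Restricting the mapping text to a subset of files corresponds to the selective instrumentation described above, with the same caveat about dependences on uninstrumented files.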
+ +```sh +$CLANG -g -O0 -S -emit-llvm -fno-discard-value-names \ + -Xclang -load -Xclang ${DISCOPOP_BUILD}/libi/LLVMDPInstrumentation.so \ + -mllvm -fm-path -mllvm ./FileMapping.txt \ + -I $include_dir -o ${src_file}_dp.ll $src_file +``` +*(Command 2: Instrumenting memory access instructions in an input file)* + +To link the instrumented program with DiscoPoP libraries, you need to execute Command 3: + +```sh +$CLANG++ ${src_file}_dp.ll -o dp_run -L${DISCOPOP_BUILD}/rtlib -lDiscoPoP_RT -lpthread +``` +*(Command 3: Linking instrumented code with DiscoPoP runtime libraries)* + +When you have instrumented and linked the program with DiscoPoP runtime libraries, you need to execute it with several representative inputs. We need this constraint to make sure that we minimize the chance of missing data dependences in code sections which might not be covered with a specific input. We explain how to find out which parts of the program were not executed with the given input in Section 5. + +To execute the program, we use Command 4. + +```sh +./dp_run +``` +*(Command 4: Executing the program to obtain data dependences)* + +After executing the program, you will find a text file whose name ends with “{ExecutableName}_dep.txt” and which contains the data dependences identified with the provided input. For this example, the ExecutableName is dp_run and thus the dependence file is dp_run_dep.txt. + +## Finding reductions in loops + +Some loops may have inter-iteration dependencies that can be resolved using the OpenMP reduction clause. To identify such data dependences, we need to instrument the loops. DiscoPoP automatically detects these loops. We use Command 5 for the instrumentation.
+ +```sh +$CLANG -g -O0 -S -emit-llvm -fno-discard-value-names \ + -Xclang -load -Xclang ${DISCOPOP_BUILD}/libi/LLVMDPReduction.so \ + -mllvm -fm-path -mllvm ./FileMapping.txt \ + -I $include_dir -o ${src_file}_red.bc $src_file +``` +*(Command 5: Instrumenting loops with the LLVM pass which detects reduction pattern )* + +Then, we need to link the files to generate the executable. Command 6 performs this. + +```sh +$CLANG $bin_dir/${src_file}_red.bc -o dp_run_red -L${DISCOPOP_BUILD}/rtlib -lDiscoPoP_RT -lpthread +``` +*(Command 6: Linking the instrumented loops with DiscoPoP runtime libraries for the reduction detection)* + +Finally, we should execute the program to obtain the reduction opportunities. Command 7 executes our test program. + +```sh +./dp_run_red +``` +*(Command 7: executing the program which is instrumented to detect reduction pattern)* + +After execution, you will find a file named `reduction.txt` in your root directory. This file contains information about the loops which contain reduction operation. + +Please note that the overhead for reduction detection is less than that of profiling the whole program because we merely profile specific loops in the program. + +## Detecting parallel patterns and parallelization suggestions + +To find the patterns and the parallelization suggestions, you need to call the discopop python module with the appropriate input, i.e., `Data.xml`, `dp_run_dep.txt`, and `reduction.txt` file. + +Figure 1 demonstrates a simplified view of the computational units and the relevant data dependencies in the loops of function “initialize” in our test program. Based on the information, the pattern detector component identifies that the loops in the function are doall loops. Further, the pattern implementor suggests to wrap the loops with OpenMP parallel for constructs and the related data sharing clauses. Figure 1 also contains the parallelization suggestions. 
+
+![A simplified view of the CUs, data dependences, parallel patterns and parallelization suggestions which are identified in function “initialize” of our test program.](https://github.com/discopop-project/discopop/raw/master/docs/img/init1.svg)
+*(Figure 1: A simplified view of the CUs, data dependences, parallel patterns and parallelization suggestions which are identified in function “initialize” of our test program.)*
+
+Figure 2 shows the analysis information for function “compute” in the test program. Unlike function “initialize”, it has an inter-iteration dependence, which can be resolved with the OpenMP reduction clause.
+
+![A simplified view of the CUs, data dependences, parallel patterns and parallelization suggestions which are identified in function “compute” of our test program.](https://github.com/discopop-project/discopop/raw/master/docs/img/reduction1.svg)
+*(Figure 2: A simplified view of the CUs, data dependences, parallel patterns and parallelization suggestions which are identified in function “compute” of our test program.)*
+
+Please note that, as in many scientific applications that operate on arrays or matrices, the suggestions for this program do not change when the input size changes. It is therefore possible to analyze the program with small inputs, obtain the parallelization suggestions, and execute the parallelized version with larger inputs. However, this is merely a recommendation; whether it applies depends heavily on the code.
+
+Furthermore, DiscoPoP takes an optimistic approach towards parallelization, so programmers need to validate the final suggestions. In the example above, looking at the loops of function “initialize” easily confirms that there are no inter-iteration dependences.
+
+## Running serial and parallel codes
+
+To run the parallel version, insert the parallelization suggestions into the source code. You need to compile the parallelized program with `-fopenmp`.
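Building and running the parallelized version can be sketched as below. The file names and the helper `build_parallel` are hypothetical placeholders; `$CLANG` is assumed to hold the path to clang as in the commands above, and `OMP_NUM_THREADS` is the standard OpenMP environment variable for the number of threads.

```sh
#!/bin/sh
# Hypothetical helper: compile source file $1 into executable $2 with OpenMP
# enabled. Falls back to plain "clang" when $CLANG is not set.
build_parallel() {
    src="$1"
    out="$2"
    "${CLANG:-clang}++" -fopenmp "$src" -o "$out"
}

# Usage (file names are placeholders):
#   build_parallel parallel.cpp parallel_run
#   OMP_NUM_THREADS=4 ./parallel_run    # run with four threads
```

Running the same binary with different `OMP_NUM_THREADS` values is a simple way to see how the parallel version scales on a given machine.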
The speedup gained by parallelizing the code depends heavily on the hardware platform on which you execute the serial and parallel codes.
+
+## Common errors
+
+| Error | Solution |
+| - | - |
+| ModuleNotFoundError: No module named | Install the required Python dependencies |
+| DiscoPoP finishes the analysis without errors but does not provide any suggestions | Delete the FileMapping file and regenerate it |
+| Node IDs in the `Data.xml` file start with 0 (e.g., `id="0:1"`) | The FileMapping file was not generated correctly. Remove it and regenerate it. |
+| -g: command not found | Path to clang is not set (e.g., `$CLANG` is empty) |
+| /libi/LLVMCUGeneration.so: cannot open shared object file: | Path to DiscoPoP build directory is not set correctly |
diff --git a/wiki/_Footer.md b/wiki/_Footer.md
new file mode 100644
index 000000000..0c77aa985
--- /dev/null
+++ b/wiki/_Footer.md
@@ -0,0 +1,3 @@
+License
+
+© DiscoPoP is available under the terms of the BSD-3-Clause license, as specified in the LICENSE file.
\ No newline at end of file
diff --git a/wiki/_Sidebar.md b/wiki/_Sidebar.md
new file mode 100644
index 000000000..e9cc99324
--- /dev/null
+++ b/wiki/_Sidebar.md
@@ -0,0 +1,16 @@
+### DiscoPoP Wiki
+* [Home](https://github.com/discopop-project/discopop/wiki)
+* [DiscoPoP Profiler](https://github.com/discopop-project/discopop/wiki/DiscoPoP-Profiler)
+* [DiscoPoP Explorer](https://github.com/discopop-project/discopop/wiki/DiscoPoP-Explorer)
+
+### Setup
+* [Setup](https://github.com/discopop-project/discopop/wiki/Setup)
+* [Guide](https://github.com/discopop-project/discopop/wiki/Guide)
+* [Tutorial](https://github.com/discopop-project/discopop/wiki/Tutorial)
+
+### Troubleshooting
+* [Troubleshooting](https://github.com/discopop-project/discopop/wiki/Troubleshooting)
+
+
+
+Version: 1.4.0
\ No newline at end of file