This is the release of the GPUJoule energy estimation framework. If you have any questions regarding GPUJoule, please email aarunkum@asu.edu.

If you find this work useful, please consider citing the following papers:

* A. Arunkumar, E. Bolotin, D. Nellans, and C.-J. Wu, "Understanding the
Future of Energy Efficiency in Multi-Module GPUs," in Proceedings of the
IEEE International Symposium on High-Performance Computer Architecture
(HPCA), 2019.

* A. Arunkumar, E. Bolotin, B. Cho, U. Milic, E. Ebrahimi, O. Villa,
A. Jaleel, C.-J. Wu, and D. Nellans, "MCM-GPU: Multi-Chip Module GPUs
for Continued Performance Scalability," in Proceedings of the 44th Annual
International Symposium on Computer Architecture (ISCA), 2017.

* U. Milic, O. Villa, E. Bolotin, A. Arunkumar, E. Ebrahimi, A. Jaleel,
A. Ramirez, and D. Nellans, "Beyond the Socket: NUMA-Aware GPUs," in
Proceedings of the 50th Annual IEEE/ACM International Symposium on
Microarchitecture (MICRO), 2017.

GPUJoule Description


GPUJoule is an instruction-level energy model for GPUs. It estimates the total energy of a GPU application's execution as the sum of the energy consumed by the individual instructions executed on the GPU. To build the GPUJoule energy model for the GPU of interest, we first estimate the energy cost of executing each type of compute instruction (EPI), each type of memory transaction (EPT), and an SM lane stall (EPStall). These per-event energy costs are then weighted by the counts of the corresponding instructions, transactions, and stalls to estimate the GPU energy consumption.

If the GPU supports native instructions of types 1, 2, ..., n and memory transactions of types 1, 2, ..., m, then with ICi the dynamic count of instruction type i, TCj the count of transaction type j, and Nstall the number of SM lane stalls, the GPU energy is estimated as:

GPU-Energy = (IC1 * EPI-1) + (IC2 * EPI-2) + ... + (ICn * EPI-n) +
             (TC1 * EPT-1) + (TC2 * EPT-2) + ... + (TCm * EPT-m) + 
             (Nstall * EPStall) +
             (GPUIdlePower * ExecutionTime)
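
As a concrete illustration, the model above can be evaluated with a few lines of C. The function below is a minimal sketch; the names and array layout are ours, not part of the release.

#include <stddef.h>

/* Sum per-event energies weighted by their dynamic counts, then add idle energy. */
double gpujoule_energy(const double *inst_counts, const double *epi, size_t n_inst_types,
                       const double *txn_counts,  const double *ept, size_t n_txn_types,
                       double n_stalls, double ep_stall,
                       double idle_power_watts, double exec_time_sec)
{
    double energy_joules = 0.0;
    for (size_t i = 0; i < n_inst_types; i++)
        energy_joules += inst_counts[i] * epi[i];        /* compute instructions */
    for (size_t j = 0; j < n_txn_types; j++)
        energy_joules += txn_counts[j] * ept[j];         /* memory transactions  */
    energy_joules += n_stalls * ep_stall;                /* SM lane stalls       */
    energy_joules += idle_power_watts * exec_time_sec;   /* idle (static) energy */
    return energy_joules;
}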

Release Directory Structure


This release follows the directory structure below:

|-> energy_model_ubench // Directory containing the microbenchmarks used to construct the GPU energy model
    |-> compute_epi // Compute instruction microbenchmarks
    |-> data_movement_ept // Data movement instruction microbenchmarks
    |-> stall_energy // Stall energy related microbenchmarks
    |-> energy_model_data // Placeholder directory for power and execution time data generated by running the microbenchmarks
    |-> run_compute_ubench.sh // Script to execute the compute instruction microbenchmarks
    |-> run_datamovement_ubench.sh // Script to execute the data movement instruction microbenchmarks
    |-> run_stall_ubench.sh // Script to execute the stall energy microbenchmarks
|-> validation_ubench // Directory containing microbenchmarks that can be used for energy model validation
|-> nvml_power_monitor/example // Directory containing a sample program that uses NVML to periodically measure the frequency, power consumption, and temperature of the GPU card (a simplified sketch of this polling pattern appears after this listing)
|-> README // This README file
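
The release ships its own NVML-based monitor under nvml_power_monitor/example; the sketch below is an independent, simplified illustration of the same polling pattern (device index 0, a 10 ms sampling period, and a fixed sample count are assumptions, and the output format is ours). It can be built with something like: gcc power_poll.c -o power_poll -lnvidia-ml

#include <stdio.h>
#include <unistd.h>
#include <nvml.h>

int main(void)
{
    nvmlDevice_t dev;
    unsigned int power_mw, temp_c, sm_clock_mhz;

    if (nvmlInit() != NVML_SUCCESS)
        return 1;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) {
        nvmlShutdown();
        return 1;
    }
    for (int i = 0; i < 1000; i++) {                              /* ~10 s of samples */
        nvmlDeviceGetPowerUsage(dev, &power_mw);                  /* milliwatts       */
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp_c);
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &sm_clock_mhz);
        printf("%u mW, %u C, %u MHz\n", power_mw, temp_c, sm_clock_mhz);
        usleep(10000);                                            /* 10 ms period     */
    }
    nvmlShutdown();
    return 0;
}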

Please Update


Please update the GPUJOULE_DIR macro in the microbenchmark files and scripts, the GPU_NAME macro in the power_monitor.c file, and the NumCTA macro in the run scripts. For NumCTA, we suggest 2 x (number of SMs on the GPU).
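
For example, the SM count (and hence a suggested NumCTA value) can be queried with the CUDA runtime; the helper below is illustrative only and not part of the release.

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)        /* device 0 assumed */
        return 1;
    int num_cta = 2 * prop.multiProcessorCount;                  /* suggested NumCTA */
    printf("SMs: %d, suggested NumCTA: %d\n", prop.multiProcessorCount, num_cta);
    return 0;
}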


Compiling Microbenchmarks


The microbenchmarks can be compiled using nvcc. To ensure that the compiler does not optimize away the intended functionality of the microbenchmarks, disable the nvcc frontend and backend compiler optimizations. This can be done with a command similar to the one below.

If the microbenchmark being compiled is located at "$bench":

nvcc -O0 -Xcompiler -O0 -Xptxas -O0 $bench -o "$bench".out
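
For illustration, a compute-instruction microbenchmark typically runs a long, unrolled sequence of the instruction under test so that its energy dominates, while keeping the result live so that the unoptimized build retains every operation. The kernel below is a hypothetical sketch in that spirit, not one of the kernels shipped under compute_epi; the launch configuration and iteration counts are placeholders.

__global__ void fadd_epi_kernel(float *out, float seed, int iters)
{
    float a = seed + threadIdx.x;
    float b = 1.0f;
    for (int i = 0; i < iters; i++) {
        #pragma unroll 64
        for (int j = 0; j < 64; j++)
            a = a + b;                                /* instruction under test: FADD */
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;   /* keep the result live */
}

int main(void)
{
    const int num_cta = 160, threads = 256;           /* e.g., NumCTA = 2 x SMs */
    float *d_out;
    cudaMalloc(&d_out, num_cta * threads * sizeof(float));
    fadd_epi_kernel<<<num_cta, threads>>>(d_out, 1.0f, 1 << 16);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}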

To Generate EPI / EPT Values


After compiling the microbenchmarks, each one can be executed using the sample scripts provided in the energy_model_ubench directory. The scripts generate a power trace and an execution time file for each microbenchmark; from these, we can obtain the energy consumption of each microbenchmark.
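
How the trace is turned into energy depends on its format; as a sketch, a trace of power samples taken at a fixed period can be integrated with the trapezoidal rule. The function below is illustrative, not code from the release.

/* Integrate a power trace (watts, sampled every period_s seconds) into Joules. */
double trace_energy_joules(const double *power_w, int n_samples, double period_s)
{
    double energy = 0.0;
    for (int i = 1; i < n_samples; i++)
        energy += 0.5 * (power_w[i - 1] + power_w[i]) * period_s;
    return energy;
}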

In a second run of the same benchmarks, collect performance counter data for the instruction types that can be measured. The count reported by the most relevant performance counter should typically match the expected number of operations in the microbenchmark. Depending on the microbenchmark, you might also see significant counts for "misc insts", "bit convert insts", and stalls; these are overheads that occur in many microbenchmarks.

To find the EPI of the instruction of interest precisely, subtract the idle energy and the energy consumed by these overheads from the measured microbenchmark energy.
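
In code form, this amounts to something like the following; the argument names are ours, and overhead_energy_j would be computed from the overhead counts and their per-event energy costs.

/* EPI of the instruction under test, after removing idle and overhead energy. */
double epi_of_interest(double total_energy_j, double idle_power_w, double exec_time_s,
                       double overhead_energy_j, double inst_count)
{
    double dynamic_energy = total_energy_j - idle_power_w * exec_time_s - overhead_energy_j;
    return dynamic_energy / inst_count;               /* Joules per instruction */
}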

The energy consumption of "misc insts" can be found using the mov instruction microbenchmark; in our experience, the misc insts are generally mov instructions.

The stall energy consumption can be measured using the microbenchmarks under the stall_energy directory. These benchmarks combine FADD64 and L1D cache access instructions in different ratios. We observe that the FADD64 instruction does not introduce any stalls. Therefore, to measure the stall energy, we use the total energy consumption of these microbenchmarks as the dependent variable and the number of stalls and the number of L1D accesses as the independent variables, and perform linear regression to find EPStall.
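
A minimal sketch of that regression, assuming one (stall count, L1D access count, energy) triple per stall microbenchmark and no intercept term, is shown below; it solves the 2x2 normal equations directly and is not code from the release.

/* Fit energy ~= ep_stall * stalls + ept_l1d * l1d across n microbenchmarks. */
int fit_stall_energy(const double *stalls, const double *l1d, const double *energy,
                     int n, double *ep_stall, double *ept_l1d)
{
    double sxx = 0, sxy = 0, syy = 0, sxe = 0, sye = 0;
    for (int k = 0; k < n; k++) {
        sxx += stalls[k] * stalls[k];
        sxy += stalls[k] * l1d[k];
        syy += l1d[k] * l1d[k];
        sxe += stalls[k] * energy[k];
        sye += l1d[k] * energy[k];
    }
    double det = sxx * syy - sxy * sxy;
    if (det == 0.0)
        return -1;                                    /* degenerate design matrix  */
    *ep_stall = (sxe * syy - sye * sxy) / det;        /* energy per stall (EPStall) */
    *ept_l1d  = (sye * sxx - sxe * sxy) / det;        /* energy per L1D access      */
    return 0;
}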

The EPI, EPT, and EPStall values are used together to form the GPUJoule energy model.


Energy Model Validation


We release a set of microbenchmarks to validate the GPUJoule energy model. These microbenchmarks combine instructions of different types. By measuring the energy consumption of these microbenchmarks in hardware and comparing the measurements to the estimates given by GPUJoule, we can validate the energy model and identify any coverage issues.

We can perform similar validation using real GPU applications.
