AGMS VTune Hotspot Analysis #19

Closed
beepy0 opened this issue Apr 18, 2019 · 1 comment

beepy0 commented Apr 18, 2019

No description provided.

@beepy0 beepy0 self-assigned this Apr 18, 2019
@beepy0 beepy0 mentioned this issue Apr 18, 2019
7 tasks
@beepy0 beepy0 changed the title VTune Hotspot Analysis AGMS VTune Hotspot Analysis Apr 18, 2019
@beepy0 beepy0 added this to the Benchmark/Optimization milestone Jun 18, 2019

beepy0 commented Jun 27, 2019

For this test we consider data samples drawn from Zipfian, Uniform, and Discrete Normal distributions. Each distribution was sampled at two sizes: one hundred thousand and one million integers. The two sizes serve two purposes: to check whether the algorithm's behavior changes when it is left to run longer, and to give VTune enough runtime for more precise analytics, since in some cases the 100k sample did not run long enough for the profiler to gather sufficient data.

For the update vector size we used 16384 tuples, which corresponds to 128 rows and 128 buckets. It is important to note that while AGMS' throughput depends heavily on the size of its update vector, it scales linearly with the bucket count alone; for Fast-AGMS, throughput depends on whether the bucket count is a power of 2, because a truncation function is used for its bucket hashing. That function serves as a substitute for the modulo operation, which in our testing proved more compute-heavy [REF].

Looking at the generated profiles, it is apparent that each run yielded broadly similar results. The CPU utilization graph in VTune points toward a CPU bottleneck for AGMS: the whole graph was brown, indicating that the CPU was busy computing for the entire duration of the run.

[PIC]

For a more detailed analysis, we first take a quick look at the Summary tab. For the Zipfian distribution with 100k samples, total execution time was 6.413s, and the Top Hotspots section listed the functions that consumed the most compute time: AGMS_Sketch::Update_Sketch, seq_xor, and EH3.

[PIC]

Xi_EH3::element also takes some time, but in this case it is negligible, as the function is mostly used to call EH3, as shown in [REF: SNIPPET].

[SNIPPET]

As noted above, the other distributions showed very similar results, with execution times depending on the sample size. For this reason we show diagrams only for the Zipfian distribution in this analysis; if any other distribution deviates significantly, that will be noted and shown separately.

Above the tabs there is a dropdown menu with additional overview information. Selecting the HPC Performance Characterization category reveals a vector-performance analysis. According to this view, the compiler saw the loops containing floating-point variables and decided not to vectorize them, so all three of the most time-consuming functions were left as scalar operations. This suggests a potential speed-up from SIMD optimization.

[PIC]

Switching back to the Hotspots overview from the dropdown menu, the Bottom-up and Top-down Tree tabs help build a better picture of what each function is doing. The focus here is the Microarchitecture Usage column, which hints at performance issues such as branch mispredictions or CPU stalls due to memory latency. The values are percentages, and the general rule of thumb is to avoid any cells colored red, which is the case here. This suggests that the algorithm is mainly compute-bound on this platform for the given parameters.

@beepy0 beepy0 closed this as completed Oct 15, 2019