Skip to content

Latest commit

 

History

History
113 lines (84 loc) · 8 KB

barcelona.org

File metadata and controls

113 lines (84 loc) · 8 KB

Barcelona : visiting the BSC

Here’s a summary of the different activities I did with Arnaud and Co. during three days in Barcelona (2014/10/29-31)

Day 1

The morning, I present Ocelotl and the aggregation techniques that are implemented. Slides are present here. The principle is to provide overviews of large traces (we reach 30 GB and 3 billion events) by using both data and visual aggregation to render an easily interpretable representation of the application behavior. Data aggregation is based on information theory concepts like Shannon entropy and Kullback Leibler divergence. User provide a parameter which helps to determine a partition of the system. This partition aggregates in priority the parts of the representation where the behavior is homogeneous. This behavior can be described through statistics like the duration of states (functions) that occur during the application execution, the number of events, the average value of a variable… To give meaning to the aggregation, it is essential to structure the trace and constrain the aggregation. This structure helps to find which can be responsible of the observed behavior : for instance, in the case of a spatiotemporal aggregation, the hierarchical structure of the hardware components can highlight behaviors that can be explained by the difference between the machines (computing capabilities, memory, I/O, network, etc.).

During the presentation, I present mainly two cases:

  • A multimedia application that shows the capabilities of the temporal overview by highlighting easily pertubations due to an overloaded buffer. This is the main use-case of the SoC-Trace project, within Ocelotl is developed.

images/ts_record.png

In the image above, we can see the temporal overview provided by Ocelotl. It is a stacked bar chart, representing the quantity of time passed in each different type of states for each temporal aggregate. These events are shown using different colors. Hovering the visualization enables to get more details. Spatial dimension (the resources) is aggregated after the partitionning. On the bottom right, information curves (green: complexity gain, red: information gain) help to find aggregations that are relevant. We are also working on detecting automatically the best aggregation by using the shapes of these curves.

  • A MPI application LU, executed on 700 processes (3 clusters, almost 100 multicore nodes). This is a very interesting case to show what we gain by using the spatiotemporal overview instead of the temporal one: we are indeed able to detect that a cluster behaves heterogeneously during all the computation phase, while the temporal view is only able to show a temporal perturbation that occurs on another cluster.

images/lu_t.png

images/lu_st.png

The first image shows the temporal overview, the second the spatiotemporal overview. In this last one, the time is the x axis, and the resources are on the y axis. These resources are organized as a hierarchy that is not explicitely shown (and has to be deduced by the aggregation pattern). We plan to add an explicit representation of this y axis. Colors represent here the mode, i.e. the most present state in the aggregate. We also use alpha (transparency) to vary the color intensity, which is a function of the time duration amplitude. Diagonal bars represent visual aggregates, i.e. part of the trace that are spatially not aggregated according the aggregation algorithm, but that are too small to be represented correctly separately. Simple diagonals (falling from left to right) indicate that all the nodes that are visually aggregated share the same temporal pattern. Thicker diagonals (rising from left to right) indicate that visually aggregated nodes do not share their temporal pattern. We are thinking about improving these textures to distinguish both aggregation types better.

The night, we discuss with Arnaud about the possibility to convert Pajé traces (more precisely, pjdump traces, which are Pajé traces processed by pj_dump in order to rebuild some events like the states and the links) to Paraver format, which is the format used by the Paraver tool. There is some similarities between both formats (same kind of concepts about the event semantics) and a Perl script will be perfect to perform the conversion. While Arnaud modifies the script he already wrote to include some aspects that were not taken into account, I develop a wrapper to pipeline this script with the pjdump importer already present in Framesoc. By the way, Framesoc is the trace and tool management infrastructure whose Ocelotl is a plug-in. This infrastructure stores traces as data base, which is efficient to read them quickly and perform some filtering or complex queries. Framesoc also contains a Gantt chart and some statistic visualizations.

Day 2

I also apply the same principle to import Pajé traces to Framesoc. Thanks to Lucas who have added a way to compile pjdump in a static way, I can integrate it into a Pajé importer linked to the pjdump importer. It’s not the most efficient technique, but it enables to import easily Pajé trace without having to convert them in pjdump format first. For Paraver, there is some problems with the hierarchy that does not fit well. However, I succeed in importing a trace, despite some errors. Augustin gives me a Pajé trace (SMPI BigDFT, 40 threads) that we will be use as a comparison with Paraver trace of the same application. I let Arnaud explain the details.

In this trace, we easily spot that one MPI process has a behavior different from the others. It is also visible (but less easily) in the Framesoc’s Gantt chart.

images/smpi_bigdft_40_ocelotl.png

images/smpi_bigdft_40_framesoc.png

I also get a big Paraver trace from the Barcelonians (16 GB). A problem: I don’t have enough room on my disk and I have to upload it on a distant server to process it.

The night, we work very late with Arnaud : we first solve the problem of the nodes hierarchy and the thread associated with them. We have to understand in details the header of the Paraver trace using the documentation. We will not have converted the big Paraver trace when I will go to sleep, because of some issues on the network…

We also want to convert a Pajé trace into Paraver, and in particular the big 12 GB and 700 processes Nancy LU trace. The objective is to give this trace to Juan and Harald and see how they would analyze a trace of such size. Arnaud succeed in writing a script, despite some errors that don’t provide the right state names.

Day 3

After correcting some bugs the morning, I succeed in opening the big Paraver Trace. Here is an example of what I get.

images/paraver_bigdft_unbalanced.png

This is a pretty good result, because with Paraver, it seems cumbersome to open the full trace that barely fits in RAM. It has to be cut and filtered before if I understood well. In my case, I take advantage of the database and the fact that the readen events are not all stored in the memory. The importing and indexing times are a bit long (a dozen of minute), but this step is just necessary once, because the data base is kept for the further analysis sessions. Ocelotl reads and prints a result in few minutes.

We also show the result of a small paraver trace of 19 MB

images/paraver_bigdft_mpitrace.png

However, for both cases, there is some issues that hinder us when we try to analyze these traces: in particular, states have wrong names and we cannot relate them to the right function call.

The afternoon, I provide the 12GB LU 700 processes Pajé trace to Juan and Harald. Harald succeed in opening it, however, the conversion was not correct: a MPI state was mapped on the state with the ID 0, which corresponds to the Idle state, according to the convention used by Paraver. We should reconvert this trace properly by correcting the perl script. We hope meet the Barcelonians again in one month and this time, we will provide them a correct trace.