## **Contents**

## Part I Parallel Programming Models and Methodologies

| No. | Title | Page |
|-----|-------|------|
| 1 | Parallel Programming Models | 3 |
| | Vassilios V. Dimakopoulos | |
| 1.1 | Introduction | 3 |
| 1.2 | Classification of Parallel Programming Models | 4 |
| 1.3 | Shared-Memory Models | 7 |
| 1.3.1 | POSIX Threads | 7 |
| 1.3.2 | OpenMP | 8 |
| 1.4 | Distributed-Memory Models | 9 |
| 1.4.1 | Message-Passing (MPI) | 9 |
| 1.5 | Heterogeneity: GPGPU and Accelerator Models | 10 |
| 1.5.1 | CUDA | 11 |
| 1.5.2 | OpenCL | 11 |
| 1.5.3 | Directive-Based Models | 12 |
| 1.6 | Hybrid Models | 13 |
| 1.6.1 | Pthreads + MPI | 13 |
| 1.6.2 | OpenMP + MPI | 14 |
| 1.6.3 | PGAS | 14 |
| 1.7 | Other Parallel Programming Models | 15 |
| 1.7.1 | Languages and Language Extensions | 15 |
| 1.7.2 | Skeletal Programming | 16 |
| 1.8 | Conclusion | 17 |
| | References | 17 |
| 2 | Compilation Tool Chains and Intermediate Representations | 21 |
| | Julien Mottin, François Pacull, Ronan Keryell, and Pascal Schleuniger | |
| 2.1 | Introduction | 21 |
| 2.1.1 | Application Domains | 21 |
| 2.1.2 | Target Platforms | 22 |

| No. | Title | Page |
|-----|-------|------|
| 2.2 | The Tool Chains | 23 |
| 2.3 | Intermediate Representations | 24 |
| 2.3.1 | High-Level Intermediate Representation: SME-C | 24 |
| 2.3.2 | Low-Level Intermediate Representation: IR2 | 25 |
| | | 26 |
| 2.4 | | 28 |
| | | 28 |
| | | 30 |
| | References | 32 |

## Part II HW/SW Architectures Concepts

| No. | Title | Page |
|-----|-------|------|
| 3 | | 35 |
| | Julien Mottin, Mickael Cartron, and Giulio Urlini | |
| 3.1 | | 35 |
| 3.2 | | 35 |
| 3.3 | | 38 |
| | | 38 |
| | | 39 |
| 3.4 | | 40 |
| 3.4.1 | IP Level Cosimulation Platform | 40 |
| 3.4.2 | HWPE Level Cosimulation Platform | 40 |
| 3.5 | Fast Simulation Platform | 41 |
| 3.6 | Simulation Platform Usage | 42 |
| | References | 43 |
| 4 | The Architecture and the Technology Characterization … an FPGA-Based Customizable Application-Specific … | 45 |
| | Roman Bartosiński, Martin Daněk, Leoš Kafka, Lukáš Kohout, and Jaroslav Sýkora | |
| 4.1 | | 45 |
| 4.2 | | 47 |
| 4.3 | | 48 |
| | | 48 |
| | | 50 |
| | | 51 |
| | | 51 |
| | | 52 |
| | | 52 |
| 4.4 | | 53 |
| | | 57 |
| 4.5 | | 59 |
| 4.5 | | 59 |
| | | 63 |
| | | |

| No. | Title | Page |
|-----|-------|------|
| 4.6 | Technology Characterization | 64 |
| 4.6.1 | Analysis of Scaling Limits | 64 |
| 4.6.2 | Synthesis Experimental Results | 66 |
| 4.7 | Applications | 68 |
| 4.7.1 | Methodology | 68 |
| 4.7.2 | Finite Impulse Response (FIR) Filter | 70 |
| 4.7.3 | Matrix Multiplication (MATMUL) | 71 |
| 4.7.4 | Mandelbrot Set (MANDEL) | 71 |
| 4.7.5 | Image Segmentation (IMGSEG) | 71 |
| 4.8 | Analysis of Weaknesses | 73 |
| 4.8.1 | Operation Frequency | 73 |
| 4.8.2 | FPGA Area | 73 |
| 4.8.3 | Full-Reduction Windup Latencies | 74 |
| 4.8.4 | MCU Performance | 74 |
| 4.9 | Summary | 76 |
| | References | 76 |

## Part III Run-Time and Faults Management

| No. | Title | Page |
|-----|-------|------|
| 5 | Fault Tolerance | 81 |
| | …gosta, Mickael Cartron, and Antonio Miele | |
| 5.1 | Introduction | 81 |
| 5.2 | Programming-Model Support Level | 83 |
| 5.2.1 | … Reconfiguration of Fault-Tolerant Applications | 84 |
| 5.3 | Off-Line Analysis and Optimization Level | 86 |
| 5.3.1 | Static Scheduling in the Presence of Real-Time Constraints and Uncertainty due to Recovery Actions | 86 |
| 5.4 | Runtime/OS Level | 90 |
| 5.4.1 | Lightweight Detection Based on Thread Duplication | 90 |
| 5.4.2 | Flexible Scrubbing Service for P2012 | 94 |
| 5.4.3 | OS Support for Fault Tolerance | 94 |
| 5.4.4 | ReDAS, Fault-Management Layer Based on Thread Level Replication | 95 |
| 5.4.5 | Run-Time Aging Detection and Management | 97 |
| 5.4.6 | Fault Tolerance for Multi-Core Platforms Using Redundant Deterministic Multithreaded Execution | 99 |
| 5.5 | Conclusion | 100 |
| | References | 101 |
| 6 | Introduction to Dynamic Code Generation: An Experiment on Matrix Multiplication for the STHORM Platform | 103 |
| | …uroussé, Victor Lomüller, and Henri-Pierre Charles | |
| 6.1 | Introduction | 103 |
| 6.2 | Overview of deGoal | 106 |
| 6.2.1 | Kernels and Compilettes | 106 |

| No. | Title | Page |
|-----|-------|------|
| 6.2.2 | Workflow of Code Generation | 107 |
| 6.2.3 | A Tutorial Example | 108 |
| 6.3 | An Experiment on Matrix Multiplication | 113 |
| 6.3.1 | Implementation of Matrix Multiplication | 113 |
| 6.3.2 | Experimental Results | 115 |
| 6.4 | Related Work | 118 |
| 6.5 | Conclusion | 120 |
| | References | 121 |

## Part IV Case Studies

| No. | Title | Page |
|-----|-------|------|
| 7 | Signal Processing: Radar | 125 |
| | Micl…reteau and Claudia Cantini | |
| 7.1 | Brief Description of the RT-STAP Algorithm | 125 |
| 7.1.1 | Detailed Description of the Computational Phases | 127 |
| 7.1.2 | Data-Parallel Cholesky Factorization | 129 |
| 7.2 | Related Tool-Chain | 131 |
| 7.2.1 | Application and Execution Platform Modeling | 132 |
| 7.2.2 | Parallelisation on the STHORM Platform | 134 |
| 7.2.3 | IR Code Generation | 135 |
| 7.3 | Conclusion | 138 |
| | References | 138 |
| 8 | … Processing: Object Recognition | 139 |
| | …ga, George Chasapis, Vassilios V. Dimakopoulos, and …s Aggelis | |
| 8.1 | The HMAX Algorithm | 139 |
| 8.2 | DOL/BIP-Based Parallelization | 141 |
| 8.2.1 | HMAX System Level Modeling in BIP | 142 |
| 8.2.2 | Performance Analysis on the System Model | 145 |
| 8.2.3 | Implementation and Experimental Results | 146 |
| 8.3 | OpenCL-Based Parallelization | 146 |
| 8.3.1 | Basics of OpenCL | 147 |
| 8.3.2 | Parallelizing HMAX Using OpenCL | 148 |
| 8.3.3 | First Version: Using Global L3 Memory | 149 |
| 8.3.4 | Second Version: Liberal Approach | 150 |
| 8.3.5 | Third Version: Collaborative Approach | 151 |
| 8.3.6 | Experimental Results | 152 |
| 8.4 | OpenMP-Based Parallelization | 153 |
| 8.4.1 | OpenMP on STHORM | 153 |
| 8.4.2 | Parallelizing HMAX Using OpenMP | 154 |
| 8.4.3 | Experimental Results | 156 |
| | References | 157 |

| No. | Title | Page |
|-----|-------|------|
| 9 | Video Processing: Foreground Recognition in the ASVP Platform | 159 |
| | Petr Honzík, Roman Bartosiński, Martin Daněk, Leoš Kafka, Lukáš Kohout, and Jaroslav Sýkora | |
| 9.1 | Introduction | 159 |
| 9.2 | Platform | 161 |
| 9.3 | Application | 163 |
| 9.3.1 | Requirements | 163 |
| 9.3.2 | Motion Detection | 163 |
| 9.3.3 | MoG Implementation | 166 |
| 9.3.4 | Morphological Opening | 166 |
| 9.3.5 | Object Labelling | 166 |
| 9.4 | Implementation and Results | 167 |
| 9.4.1 | Foreground/Background Segmentation | 168 |
| 9.4.2 | Morphological Opening | 171 |
| 9.4.3 | Results | 172 |
| 9.5 | Summary | 174 |
| | References | 175 |



http://www.springer.com/978-1-4614-8799-9

Smart Multicore Embedded Systems. M. Torquati, K. Bertels, S. Karlsson, F. Pacull (Eds.). 2014, XXVI, 175 p., 77 illus., 54 illus. in color. Hardcover. ISBN: 978-1-4614-8799-9