# Array results on i7-10875H

The global script `arrays-all.sh` runs all the benchmarks.

In each table, the worst and the best results are in bold.

## Presumably with g++ 12.2

Options:
- Compilation: `-std=c++17 -march=native -fopt-info-vec-all`
- Runtime: `1024 100000`

### AoS

| Array \ Option         | -O0   | -O2   | -O3   |
| :--------------------- | ----: | ----: | ----: |
| Classic C array        | 0.974 | 0.068 | 0.231 |
| std::array             | 0.959 | 0.069 | 0.215 |
| std::valarray          | 0.955 | 0.082 | 0.236 |
| std::vector            | 1.585 | **0.051** | 0.229 |
| std::list              | **2.141** | 0.278 | 0.290 |

### SoA

| Array \ Option         | -O0   | -O2   | -O3   |
| :--------------------- | ----: | ----: | ----: |
| Classic C array        | 0.654 | 0.073 | 0.040 |
| std::array             | 1.834 | 0.044 | **0.015** |
| std::valarray          | 2.420 | 0.090 | 0.038 |
| std::vector 1          | 1.999 | 0.096 | 0.050 |
| std::vector 2          | 1.193 | 0.079 | 0.087 |
| std::list              | **4.308** | 0.225 | 0.230 |


## With g++ 13.2

The compiler made some visible improvements, so we had to raise the power, the second runtime argument, from 100000 to 200000.

Options:
- Compilation: `-std=c++17 -march=native -fopt-info-vec-all`
- Runtime: `1024 200000`

A few other attempts gave similar results: `-std=c++20`, and adding `-mtune=native`.


### AoS

Execution time in seconds (vectorization):
| Array \ Option         | -O0      | -O2   | -O3   |
| :--------------------- | -------: | ----: | ----: |
| Classic C array        | 1.26     | 0.06 () | **0.04** (32) |
| std::array             | 1.27     | 0.06 () | **0.04** (32) |
| std::valarray          | 1.26     | 0.06 () | **0.04** (32) |
| std::vector            | 2.13     | 0.10 () | **0.04** (32) |
| std::list              | **2.82** | 0.35 () | 0.35     (  ) |



### SoA

Execution time in seconds (vectorization):
| Array \ Option         | -O0      | -O2     | -O3           |
| :--------------------- | -------: | ------: | ------------: |
| Classic C array        | 0.85     | 0.10 () | 0.02     (32) |
| std::array             | 1.26     | 0.06 () | **0.01** (32) |
| std::valarray          | 3.21     | 0.10 () | 0.03     (32) |
| std::vector 1          | 2.69     | 0.10 () | 0.02     (32) |
| std::vector 2          | 1.59     | 0.10 () | 0.10     ()   |
| std::list              | **5.88** | 0.28 () | 0.28     ()   |


As expected:
- `std::list` is the bad guy,
- the `-O2` option removes the interface boilerplate and makes all the *contiguous memory* collections equivalent,
- the `-O3` option presumably adds some more vectorization and mostly benefits the SoA layout.
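
To make the AoS/SoA distinction concrete, here is a minimal sketch of the two layouts. The `Point` and `Points` types and the `update_*` functions are hypothetical illustrations, not the benchmark's actual code, and `std::vector` stands in for any of the contiguous collections.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type, for illustration only.
struct Point {
    double x;
    double y;
};

// AoS (array of structures): one collection of Point objects.
// In memory, consecutive x values are separated by a y value.
void update_aos(std::vector<Point>& points, double a) {
    for (Point& p : points) {
        p.x += a * p.y;
    }
}

// SoA (structure of arrays): one collection per field.
// In memory, all the x values are adjacent, and so are all the y values.
struct Points {
    std::vector<double> x;
    std::vector<double> y;
};

void update_soa(Points& points, double a) {
    assert(points.x.size() == points.y.size());
    const std::size_t n = points.x.size();
    for (std::size_t i = 0; i < n; ++i) {
        points.x[i] += a * points.y[i];
    }
}
```

Both functions compute the same update. In the SoA version each loop reads and writes unit-stride data, which is the access pattern that vectorizers usually handle most easily.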

Unexpected and not understood:
- the second implementation of `std::vector` does not benefit from `-O3`?!

So, might there be some difference between access through iterators and access through `operator[]`?
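
One way to investigate this question is to isolate the two access styles in otherwise identical loops and compare the vectorization reports. The sketch below is hypothetical: the function names are made up, and nothing here says which style the "std::vector 1" and "std::vector 2" rows actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Variant A: access through operator[] with an explicit index.
void axpy_index(std::vector<double>& x, const std::vector<double>& y, double a) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += a * y[i];
    }
}

// Variant B: access through iterators, advancing both in lockstep.
void axpy_iter(std::vector<double>& x, const std::vector<double>& y, double a) {
    assert(x.size() == y.size());
    auto yi = y.begin();
    for (auto xi = x.begin(); xi != x.end(); ++xi, ++yi) {
        *xi += a * (*yi);
    }
}
```

Compiling a file containing both functions with `-O3 -fopt-info-vec-all` (the reporting option already used above) should show, loop by loop, whether g++ vectorizes each variant. The outcome may well differ from the full benchmark, where inlining and surrounding code can change what the optimizer sees.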

## With g++ 14.2

To widen the differences, we raised the second runtime argument again, to 300000.

Options:
- Compilation: `-std=c++17 -march=native -fopt-info-vec-all`
- Runtime: `1024 300000`


### AoS

Execution time in seconds (vectorization):
| Array \ Option         | -O0      | -O2   | -O3   |
| :--------------------- | -------: | ----: | ----: |
| Classic C array        | 1.90     | 0.09 () | **0.06** (32) |
| std::array             | 1.91     | 0.09 () | **0.06** (32) |
| std::valarray          | 1.90     | 0.09 () | **0.06** (32) |
| std::vector            | 3.18     | 0.09 () | **0.06** (32) |
| std::list              | **4.34** | 0.53 () | 0.53     (  ) |



### SoA

Execution time in seconds (vectorization):
| Array \ Option         | -O0      | -O2     | -O3           |
| :--------------------- | -------: | ------: | ------------: |
| Classic C array        | 1.29     | 0.17 () | 0.03     (32) |
| std::array             | 1.89     | 0.09 () | **0.01** (32) |
| std::valarray          | 4.38     | 0.15 () | 0.02     (32) |
| std::vector 1          | 3.99     | 0.15 () | 0.02     (32) |
| std::vector 2          | 2.41     | 0.14 () | 0.15     ()   |
| std::list              | **8.86** | 0.41 () | 0.41     ()   |


## With clang++ 19.1.15

Run with the Docker image `silkeh/clang:19`, with the second runtime argument again raised to 300000 to widen the differences.

Options:
- Compilation: `-std=c++17 -march=native -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize`
- Runtime: `1024 300000`


### AoS

Execution time in seconds (vectorization width/interleave):
| Array \ Option         | -O0      | -O2   | -O3   |
| :--------------------- | -------: | ----: | ----: |
| Classic C array        | 1.70     | 0.09 () | **0.09** () |
| std::array             | 1.70     | 0.09 () | **0.09** () |
| std::valarray          | 1.70     | 0.09 () | **0.09** () |
| std::vector            | 2.94     | 0.09 () | **0.09** () |
| std::list              | **3.52** | 0.52 () | 0.52     () |



### SoA

Execution time in seconds (vectorization width/interleave):
| Array \ Option         | -O0      | -O2     | -O3           |
| :--------------------- | -------: | ------: | ------------: |
| Classic C array        | 1.15     | **0.03** (4/1) | **0.03** (4/1) |
| std::array             | 2.83     | **0.03** (4/4) | **0.03** (4/4) |
| std::valarray          | 3.43     | **0.03** (4/4) | **0.03** (4/4) |
| std::vector 1          | 3.57     | **0.03** (4/4) | **0.03** (4/4) |
| std::vector 2          | 2.39     | **0.04** (4/4) | **0.03** (4/4)   |
| std::list              | **6.74** | 0.48 () | 0.49     ()   |


As the timings and the compilation output show, clang++ already vectorizes at `-O2`. The results are very uniform across the different flavors of contiguous memory collections, but they are not as good as those of g++ for `std::array`.

# To try

- check these intuitions against QuickBench?


© *CNRS 2024*
*Assembled and written in French by David Chamont, this work is made available according to the terms of the [Creative Commons License - Attribution - NonCommercial - ShareAlike 4.0 International](http://creativecommons.org/licenses/by-nc-sa/4.0/)*