# Arrays results on i7-10875H

The global script `arrays-all.sh` can be used to run them all.

In each table, the worst and the best timings are in bold.

## Presumably with g++ 12.2

Options:
- Compilation: `-std=c++17 -march=native -fopt-info-vec-all`
- Runtime: `1024 100000`

### AoS

| Array \ Option         | -O0   | -O2   | -O3   |
| :--------------------- | ----: | ----: | ----: |
| Classic C array        | 0.974 | 0.068 | 0.231 |
| std::array             | 0.959 | 0.069 | 0.215 |
| std::valarray          | 0.955 | 0.082 | 0.236 |
| std::vector            | 1.585 | **0.051** | 0.229 |
| std::list              | **2.141** | 0.278 | 0.290 |

### SoA

| Array \ Option         | -O0   | -O2   | -O3   |
| :--------------------- | ----: | ----: | ----: |
| Classic C array        | 0.654 | 0.073 | 0.040 |
| std::array             | 1.834 | 0.044 | **0.015** |
| std::valarray          | 2.420 | 0.090 | 0.038 |
| std::vector 1          | 1.999 | 0.096 | 0.050 |
| std::vector 2          | 1.193 | 0.079 | 0.087 |
| std::list              | **4.308** | 0.225 | 0.230 |


## With g++ 13.2

The compiler made some visible improvements, so we had to raise the second runtime argument from 100000 to 200000.

Options:
- Compilation: `-std=c++17 -march=native -fopt-info-vec-all`
- Runtime: `1024 200000`

A few other attempts gave similar results: `-std=c++20`, and adding `-mtune=native`.


### AoS

Execution time in seconds (vectorization):

| Array \ Option         | -O0      | -O2   | -O3   |
| :--------------------- | -------: | ----: | ----: |
| Classic C array        | 1.26     | 0.06 () | **0.04** (32) |
| std::array             | 1.27     | 0.06 () | **0.04** (32) |
| std::valarray          | 1.26     | 0.06 () | **0.04** (32) |
| std::vector            | 2.13     | 0.10 () | **0.04** (32) |
| std::list              | **2.82** | 0.35 () | 0.35     (  ) |



### SoA

Execution time in seconds (vectorization):

| Array \ Option         | -O0      | -O2     | -O3           |
| :--------------------- | -------: | ------: | ------------: |
| Classic C array        | 0.85     | 0.10 () | 0.02     (32) |
| std::array             | 1.26     | 0.06 () | **0.01** (32) |
| std::valarray          | 3.21     | 0.10 () | 0.03     (32) |
| std::vector 1          | 2.69     | 0.10 () | 0.02     (32) |
| std::vector 2          | 1.59     | 0.10 () | 0.10     ()   |
| std::list              | **5.88** | 0.28 () | 0.28     ()   |


As expected:
- `std::list` is the bad guy,
- the `-O2` option removes the interface boilerplate and makes all the *contiguous memory* collections equivalent,
- the `-O3` option presumably adds some more vectorization, which mostly benefits the SoA layout.
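To make the two layouts concrete, here is a minimal sketch of the same update written for AoS and for SoA. The `Point` struct, the `add_scaled_y` kernel and all other names are hypothetical; the benchmark's actual kernels are not shown in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3-field element, for illustration only.
struct Point { double x, y, z; };

// AoS: a single collection of structs; the x, y, z of one point are adjacent.
using PointsAoS = std::vector<Point>;

// SoA: one collection per field; all the x values are adjacent in memory.
struct PointsSoA {
    std::vector<double> x, y, z;
};

// x += a * y on the AoS layout: each iteration steps over a whole struct,
// so consecutive x values are sizeof(Point) bytes apart.
void add_scaled_y(PointsAoS& p, double a) {
    for (std::size_t i = 0; i < p.size(); ++i) p[i].x += a * p[i].y;
}

// Same update on the SoA layout: x and y are each read with unit stride.
void add_scaled_y(PointsSoA& p, double a) {
    assert(p.x.size() == p.y.size());
    for (std::size_t i = 0; i < p.x.size(); ++i) p.x[i] += a * p.y[i];
}
```

With unit-stride accesses, the SoA loop is the kind of loop a vectorizer can usually handle with plain contiguous vector loads and stores, which may be part of why SoA gains more from `-O3` in the tables above.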

Unexpected and not understood:
- the second implementation of `std::vector` does not benefit from `-O3` ?!?

So, might there be some difference between access through iterators and access through `operator[]`?
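One way to investigate this question is to isolate the two access styles in otherwise identical kernels and compare the compiler's vectorization reports. The sketch below is an assumption about what such a test could look like: the functions `axpy_index` and `axpy_iter` are hypothetical, and this document does not state how the two `std::vector` variants actually differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel y[i] += a * x[i], using operator[] with an index.
void axpy_index(const std::vector<double>& x, std::vector<double>& y, double a) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += a * x[i];
}

// Same kernel, walking both vectors with iterators.
void axpy_iter(const std::vector<double>& x, std::vector<double>& y, double a) {
    assert(x.size() == y.size());
    auto xi = x.begin();
    for (auto yi = y.begin(); yi != y.end(); ++yi, ++xi) *yi += a * *xi;
}
```

Compiling such a file with `-O3 -fopt-info-vec-all` should show, loop by loop, whether g++ vectorizes one form and not the other.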

Actually, a total time of a few tens of milliseconds is too small to be significant, so we made additional tests with `-O2` and `-O3` and 2000000 elements.
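Besides enlarging the workload, a complementary way to reduce timing noise is to repeat the measurement and keep the shortest run. The helper below is only a sketch of that idea; `best_time_seconds` is a hypothetical name and is not part of the scripts used for the tables in this document.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Run `work` `repeats` times and return the shortest wall-clock time in seconds.
// Keeping the minimum filters out runs disturbed by other activity on the machine.
template <typename F>
double best_time_seconds(F&& work, int repeats) {
    assert(repeats > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(stop - start).count());
    }
    return best;
}
```

A call such as `best_time_seconds([&] { kernel(data); }, 10)` would time a hypothetical `kernel` ten times; the result of the kernel must still be used afterwards, otherwise the optimizer may remove the work entirely.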

## With g++ 14.2 and 10x more computing

Because they take too long, we drop the `-O0` option and the `std::list` collection here.

Options:
- Compilation: `-std=c++17 -march=native -fopt-info-vec-all`
- Runtime: `1024 2000000`

### AoS

Execution time in seconds (vectorization):

| Array \ Option         | -O2   | -O3   |
| :--------------------- | ----: | ----: |
| Classic C array        | 0.657 () | **0.457** (32) |
| std::array             | 0.629 () | **0.443** (32) |
| std::valarray          | 0.630 () | **0.457** (32) |
| std::vector            | 0.629 () | **0.462** (32) |



### SoA

Execution time in seconds (vectorization):

| Array \ Option         | -O2     | -O3           |
| :--------------------- | ------: | ------------: |
| Classic C array        | 1.007 () | **0.200** (32) |
| std::array             | 0.638 () | **0.123** (32) |
| std::valarray          | 1.001 () | **0.193** (32) |
| std::vector 1          | 1.016 () | **0.190** (32) |
| std::vector 2          | 1.011 () | 1.0003     ()   |


## With g++ 14.2 and -funroll-loops

Options:
- Compilation: `-std=c++17 -march=native -funroll-loops`
- Runtime: `1024 2000000`


### AoS

Execution time in seconds (vectorization):

| Array \ Option         | -O2   | -O3   |
| :--------------------- | ----: | ----: |
| Classic C array        | 0.591 () | **0.622** (32) |
| std::array             | 0.594 () | **0.618** (32) |
| std::valarray          | 0.591 () | **0.621** (32) |
| std::vector            | 0.590 () | **0.627** (32) |



### SoA

Execution time in seconds (vectorization):

| Array \ Option         | -O2     | -O3           |
| :--------------------- | ------: | ------------: |
| Classic C array        | 0.747 () | **0.210** (32) |
| std::array             | 0.591 () | **0.247** (32) |
| std::valarray          | 0.740 () | **0.207** (32) |
| std::vector            | 0.750 () | **0.213** (32) |


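The additional `-funroll-loops` mostly benefits SoA: compared with the previous section, the `-O2` timings drop from about 1.0 s to about 0.75 s for most collections, while the `-O3` timings stay around 0.2 s.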

## With clang++ 19.1.15

Using the Docker image `silkeh/clang:19`.

Options:
- Compilation: `-std=c++17 -march=native -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize`
- Runtime: `1024 2000000`


### AoS

Execution time in seconds (vectorization width/interleave):

| Array \ Option         | -O2   | -O3   |
| :--------------------- | ----: | ----: |
| Classic C array        | 0.616 () | 0.616 () |
| std::array             | 0.585 () | 0.590 () |
| std::valarray          | 0.616 () | 0.619 () |
| std::vector            | 0.615 () | 0.587 () |



### SoA

Execution time in seconds (vectorization width/interleave):

| Array \ Option         | -O2     | -O3           |
| :--------------------- | ------: | ------------: |
| Classic C array        | 0.241 (4/4) | 0.240 (4/4) |
| std::array             | 0.202 (4/4) | 0.250 (4/4) |
| std::valarray          | 0.240 (4/4) | 0.241 (4/4) |
| std::vector 1          | 0.243 (4/4) | 0.240 (4/4) |
| std::vector 2          | 0.240 (4/4) | 0.240 (4/4)   |


The results are very uniform across the different flavors of contiguous memory collections, including the second `std::vector` flavor, but they are not as good as with g++ at `-O3`. Also:
- clang++ already vectorizes at `-O2`.
- The magic option `-funroll-loops` does not work miracles with `clang++`.
- To get the interleave of 4 with the classic C array, we had to declare the pointers `__restrict__`.
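As an illustration of the last point, a raw-pointer kernel with `__restrict__` qualifiers could look like the sketch below. The function `axpy_restrict` is hypothetical and is not the benchmark's code; `__restrict__` itself is a compiler extension accepted by both g++ and clang++.

```cpp
#include <cassert>
#include <cstddef>

// y[i] += a * x[i] on raw pointers. `__restrict__` promises the compiler that
// x and y never overlap, so it does not have to assume that a store to y[i]
// could change a later x[j].
void axpy_restrict(const double* __restrict__ x, double* __restrict__ y,
                   std::size_t n, double a) {
    assert(n == 0 || (x != nullptr && y != nullptr));
    for (std::size_t i = 0; i < n; ++i) y[i] += a * x[i];
}
```

Calling such a function with overlapping arrays is undefined behavior, so the qualifier is only safe when the caller can guarantee distinct buffers. The `-Rpass=loop-vectorize` remarks listed in the options above can be used to compare the reported width and interleave with and without the qualifier.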

# To try

- Check these intuitions against Quick Bench?


© *CNRS 2024*
*Assembled and written in French by David Chamont, this work is made available according to the terms of the [Creative Commons License - Attribution - NonCommercial - ShareAlike 4.0 International](http://creativecommons.org/licenses/by-nc-sa/4.0/)*