# Choose your data structure

## AoS (Array of Structs)

In the code example below, a "SAXPY" calculation (`y = a*x + y`) is applied to every element of a collection of `XY` objects.

In [None]:
%%file tmp.xy.h

struct XY
 {
  double x, y {0.} ; // only y has a default initializer ; x must be set before use
  void saxpy( double a )
   { y = a*x + y ; }
 } ;

In [None]:
%%file tmp.aos-functions.h

#include <cstdlib> // for rand

template< typename Itr >
void randomize_x( Itr begin, Itr end )
 {
  for ( Itr itr = begin ; itr!=end ; ++itr )
   { itr->x = std::rand()/(RAND_MAX+1.)-0.5 ; }
 }

template< typename Itr >
void saxpy( Itr begin, Itr end, double a )
 {
  for ( Itr itr = begin ; itr!=end ; ++itr )
   { itr->saxpy(a) ; }
 }

template< typename Itr >
double accumulate_y( Itr begin, Itr end )
 {
  double res {0.} ;
  for ( Itr itr = begin ; itr!=end ; ++itr )
   { res += itr->y ; }
  return res ;
 }

In [None]:
%%file tmp.aos.cpp

#include "tmp.xy.h"
#include "tmp.aos-functions.h"
#include <cassert> // for assert
#include <cstdlib> // for strtoull
#include <iostream>

int main( int argc, char * argv[] )
 {
  assert(argc==3) ;
  std::size_t size = std::strtoull(argv[1],nullptr,10) ;
  std::size_t repeat = std::strtoull(argv[2],nullptr,10) ;
  std::cout.precision(18) ;

  XY * collection {new XY[size]} ;
  auto begin {collection} ;
  auto end {begin+size} ;

  randomize_x(begin,end) ;
  double volatile a {0.1} ;
  while (repeat--)
    saxpy(begin,end,a) ;
  double res {accumulate_y(begin,end)/size} ;
  std::cout<<res<<std::endl ;

  delete [] collection ;
 }

In [None]:
%%file tmp.aos.bash
echo

rm -f tmp.aos.exe tmp.aos.py
g++ -std=c++17 -march=native tmp.aos.cpp -o tmp.aos.exe
./tmp.aos.exe $*

echo "s = 0" >> tmp.aos.py
for i in 0 1 2 3 4 5 6 7 8 9
do \time -f "s += %U" -a -o ./tmp.aos.py ./tmp.aos.exe $* >> /dev/null
done
echo "print('(~ {:.3f} s)'.format(s/10.))" >> tmp.aos.py
python3 tmp.aos.py

echo

In [None]:
!bash -l tmp.aos.bash 1024 100000

The `main` function currently uses an old-fashioned C array, and the script does not explicitly set a GCC optimization option, so the compiler falls back to its default, `-O0` (no optimization).

Try this code, then investigate the alternative containers `std::array`, `std::valarray`, `std::vector` and `std::list`, as well as the GCC options `-O2` (usual optimizations) and `-O3` (aggressive optimizations, including automatic vectorization). Fill in the table below with the times reported by the script (average user time, in seconds), and try to explain the differences.

| Array \ Option         | -O0  | -O2  | -O3  |
| :--------------------- | ---: | ---: | ---: |
| Classic C array        | 0.   | 0.   | 0.   |
| std::array             | 0.   | 0.   | 0.   |
| std::valarray          | 0.   | 0.   | 0.   |
| std::vector            | 0.   | 0.   | 0.   |
| std::list              | 0.   | 0.   | 0.   |
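
Because `randomize_x`, `saxpy` and `accumulate_y` only rely on iterators, switching containers mostly amounts to changing how `collection`, `begin` and `end` are obtained. The sketch below illustrates this under a few assumptions: `XY` and two of the loops are repeated so that the block compiles on its own, and `run` is a hypothetical helper, not part of the notebook files.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Copy of the notebook's XY (x is also initialized here, to keep the sketch self-contained)
struct XY
 {
  double x {0.}, y {0.} ;
  void saxpy( double a )
   { y = a*x + y ; }
 } ;

// Same iterator-based loops as in tmp.aos-functions.h
template< typename Itr >
void saxpy( Itr begin, Itr end, double a )
 {
  for ( Itr itr = begin ; itr!=end ; ++itr )
   { itr->saxpy(a) ; }
 }

template< typename Itr >
double accumulate_y( Itr begin, Itr end )
 {
  double res {0.} ;
  for ( Itr itr = begin ; itr!=end ; ++itr )
   { res += itr->y ; }
  return res ;
 }

// Hypothetical helper: the same benchmark body, for any container
// that can be built from a size and offers begin()/end()
template< typename Container >
double run( std::size_t size, std::size_t repeat, double a )
 {
  Container collection(size) ;   // elements are value-initialized
  for ( auto & element : collection )
   { element.x = 1. ; }          // fixed x instead of rand(), for reproducibility
  while (repeat--)
    saxpy(collection.begin(),collection.end(),a) ;
  return accumulate_y(collection.begin(),collection.end()) ;
 }
```

For instance, `run<std::vector<XY>>(1024,100000,0.1)` and `run<std::list<XY>>(1024,100000,0.1)` execute the same loops on two different containers. `std::array` needs its size at compile time, and `std::valarray` has no `begin()`/`end()` members (use `std::begin` and `std::end`), so both need small adaptations.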


Note: the coefficient `a` is stored in a `volatile` variable to prevent the compiler from assuming that every repetition performs the same calculation and optimizing the loop away (as `g++ -O3` has been observed to do).


## SoA (Struct of Arrays)

Now let's try another approach: instead of creating a structure that groups `x` and `y` together and storing it in an array (the natural choice in an object-oriented design), let's build a single structure that holds one array of `x` values and a separate array of `y` values.
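
The practical difference between the two layouts is the distance in memory between two consecutive `x` values. The sketch below measures it, assuming an `XY` made of two `double` members as in the AoS section; the helper names `aos_stride` and `soa_stride` are made up for the illustration.

```cpp
#include <cstddef>

struct XY
 {
  double x {0.}, y {0.} ;
 } ;

// Distance in bytes between the x members of two consecutive AoS elements
std::size_t aos_stride()
 {
  XY aos[2] ;
  auto first = reinterpret_cast<char *>(&aos[0].x) ;
  auto second = reinterpret_cast<char *>(&aos[1].x) ;
  return static_cast<std::size_t>(second-first) ;
 }

// Distance in bytes between two consecutive x values in a SoA
std::size_t soa_stride()
 {
  double xs[2] {0.,0.} ;
  auto first = reinterpret_cast<char *>(&xs[0]) ;
  auto second = reinterpret_cast<char *>(&xs[1]) ;
  return static_cast<std::size_t>(second-first) ;
 }
```

On a typical 64-bit platform, `aos_stride()` returns 16 and `soa_stride()` returns 8: in the SoA layout the `x` values are contiguous, with no `y` interleaved between them.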

This is what the code skeleton below offers, again using C arrays and the default `-O0`. As before, try the alternative containers and compilation options, fill in the results table, and explain what you observe.

In [None]:
%%file tmp.soa.h

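#include "tmp.xy.h"
#include <cstddef> // for std::size_t

class SoA
 {
  public :
    // "new double[size]()" zero-initializes the arrays, so that ys start at 0. like XY::y
    SoA( std::size_t size ) : m_size(size), m_xs(new double[size]()), m_ys(new double[size]()) {}
    ~SoA() { delete [] m_xs ; delete [] m_ys ; }
    std::size_t size() const { return m_size ; }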
    XY operator()( std::size_t indice ) const
     { return { m_xs[indice], m_ys[indice] } ; }
    auto & xs() { return m_xs ; }
    auto & ys() { return m_ys ; }
    void saxpy( double a )
     {
      for ( std::size_t i=0 ; i<m_size ; ++i )
        m_ys[i] = a*m_xs[i] + m_ys[i] ;
     }
  private :
    std::size_t m_size ;
    double * m_xs ;
    double * m_ys ;
 } ;

In [None]:
%%file tmp.soa-functions.h

#include "tmp.soa.h"
#include <cstdlib> // for rand

void randomize_x( SoA & collection )
 {
  for ( std::size_t i=0 ; i<collection.size() ; ++i )
   { collection.xs()[i] = std::rand()/(RAND_MAX+1.)-0.5 ; }
 }

double accumulate_y( SoA & collection )
 {
  double res {0.} ;
  for ( std::size_t i=0 ; i<collection.size() ; ++i )
   { res += collection.ys()[i] ; }
  return res ;
 }

In [None]:
%%file tmp.soa.cpp

#include "tmp.soa-functions.h"
#include <iostream>
#include <cassert> // for assert
#include <cstdlib> // for strtoull

int main( int argc, char * argv[] )
 {
  assert(argc==3) ;
  std::size_t size = std::strtoull(argv[1],nullptr,10) ;
  std::size_t repeat = std::strtoull(argv[2],nullptr,10) ;

  SoA collection(size) ;
  randomize_x(collection) ;
  double volatile a {0.1} ;
  while (repeat--)
    collection.saxpy(a) ;
  double res = accumulate_y(collection)/size ;

  std::cout.precision(18) ;
  std::cout<<res<<std::endl ;
 }

In [None]:
%%file tmp.soa.bash
echo

rm -f tmp.soa.exe tmp.soa.py
g++ -std=c++17 -march=native tmp.soa.cpp -o tmp.soa.exe
./tmp.soa.exe $*

echo "s = 0" >> tmp.soa.py
for i in 0 1 2 3 4 5 6 7 8 9
do \time -f "s += %U" -a -o ./tmp.soa.py ./tmp.soa.exe $* >> /dev/null
done
echo "print('({:.3f} s)'.format(s/10.))" >> tmp.soa.py
python3 tmp.soa.py

echo

In [None]:
!bash -l tmp.soa.bash 1024 100000

To help with the analysis, [Compiler Explorer (godbolt.org)](https://godbolt.org/) lets you inspect the generated assembly: you can see how much inlining took place and look for packed vector instructions such as `addpd` (Add Packed Double) or `mulpd` (Multiply Packed Double), as opposed to their scalar counterparts `addsd` and `mulsd`. With `-march=native`, you may instead see the AVX forms `vaddpd` and `vmulpd`, or a fused multiply-add such as `vfmadd231pd`. You can also run `g++` with the option `-fopt-info-vec-all` and try to decipher its output to find out whether the code was vectorized.
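
A convenient way to use these tools is to isolate the loop in a small free function and look only at its assembly. The function below is a sketch written for that purpose; `saxpy_kernel` and the file name `kernel.cpp` are hypothetical, not part of the notebook files.

```cpp
#include <cstddef>

// Minimal kernel to paste into Compiler Explorer, or to compile locally with
//   g++ -std=c++17 -O3 -march=native -fopt-info-vec-all -c kernel.cpp
// Packed instructions in the loop body mean the loop was vectorized ;
// only scalar ones (addsd, mulsd) mean it was not.
void saxpy_kernel( double * ys, double const * xs, double a, std::size_t size )
 {
  for ( std::size_t i=0 ; i<size ; ++i )
    ys[i] = a*xs[i] + ys[i] ;
 }
```

Changing `-O3` to `-O2` or `-O0` in the compiler options and comparing the loop bodies gives a direct view of what each optimization level does to the same source.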

| Array \ Option         | -O0  | -O2  | -O3  |
| :--------------------- | ---: | ---: | ---: |
| Classic C array        | 0.   | 0.   | 0.   |
| std::array             | 0.   | 0.   | 0.   |
| std::valarray          | 0.   | 0.   | 0.   |
| std::vector            | 0.   | 0.   | 0.   |
| std::list              | 0.   | 0.   | 0.   |
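
As one possible variant for the `std::vector` row, the two C arrays of the `SoA` class can be replaced with `std::vector<double>` members. This is a sketch, not the notebook's reference solution; the class name `SoAVector` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

class SoAVector
 {
  public :
    // both vectors are zero-filled ; no destructor is needed
    SoAVector( std::size_t size ) : m_xs(size,0.), m_ys(size,0.) {}
    std::size_t size() const { return m_xs.size() ; }
    auto & xs() { return m_xs ; }
    auto & ys() { return m_ys ; }
    void saxpy( double a )
     {
      for ( std::size_t i=0 ; i<m_xs.size() ; ++i )
        m_ys[i] = a*m_xs[i] + m_ys[i] ;
     }
  private :
    std::vector<double> m_xs ;
    std::vector<double> m_ys ;
 } ;
```

Since `xs()` and `ys()` still support `operator[]`, the `randomize_x` and `accumulate_y` functions of `tmp.soa-functions.h` should work unchanged once they take a `SoAVector &`.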


Try replacing `std::size_t` with `unsigned` as the type of the index in the `saxpy` loop. What happens? Any idea why?
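
To make this comparison easier, the loop can be written once with the index type as a template parameter. This is only a sketch: `saxpy_indexed` is a hypothetical name, and the hint in the comments is one possible explanation to check against the compiler output, not a measured result.

```cpp
#include <cstddef>

// Same loop as SoA::saxpy, with the index type left as a template parameter
template< typename Index >
void saxpy_indexed( double * ys, double const * xs, double a, std::size_t size )
 {
  // Hint: unsigned arithmetic wraps around by definition, so when Index is
  // narrower than std::size_t the compiler cannot assume that ++i never wraps
  for ( Index i=0 ; i<size ; ++i )
    ys[i] = a*xs[i] + ys[i] ;
 }

// Explicit instantiations, so that both versions show up in the assembly
template void saxpy_indexed<std::size_t>( double *, double const *, double, std::size_t ) ;
template void saxpy_indexed<unsigned>( double *, double const *, double, std::size_t ) ;
```

Both instantiations compute the same values; only the generated code may differ, which is what the assembly or the `-fopt-info-vec-all` report can reveal.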

© *CNRS 2021*
*Assembled and written in French by David Chamont, translated by Karim Hasnaoui, this work is made available according to the terms of the [Creative Commons License - Attribution - NonCommercial - ShareAlike 4.0 International](http://creativecommons.org/licenses/by-nc-sa/4.0/)*