The LvArray::ArrayOfArrays
provides functionality similar to a std::vector< std::vector< T > >
but with just three allocations. The primary purpose is to reduce the number of memory transfers when moving between memory spaces but it also comes with the added benefit of reduced memory fragmentation.
The LvArray::ArrayOfArrays
requires three template arguments.
T
: The type of values stored in the inner arrays.INDEX_TYPE
: An integral type used in index calculations, the suggested type isstd::ptrdiff_t
.BUFFER_TYPE
: A template template parameter specifying the buffer type used for allocation and de-allocation, theLvArray::ArrayOfArrays
contains aBUFFER_TYPE< T >
along with twoBUFFER_TYPE< INDEX_TYPE >
.
LvArray::ArrayOfArrays
has a single constructor which takes two optional parameters; the number of inner arrays and the capacity to allocate for each inner array. Both values default to zero.
Given an ArrayOfArrays< T, ... > array
, an equivalent std::vector< std::vector< T > > vector
, a non negative integer n
, integer i
such that 0 <= i < vector.size()
, integer j
such that 0 <= j < vector[ i ].size()
, two iterators first
and last
and a variadic pack of parameters ... args
then following are equivalent:
Getting information about the outer array or a specific inner array |
---|
LvArray::ArrayOfArrays< T, ... > std::vector< std::vector< T > > |
==================================== =================================== |
array.size() vector.size() |
array.capacity() vector.capacity() |
array.sizeOfArray( i ) vector[ i ].size() |
array.capacityOfArray( i ) vector[ i ].capacity() |
Modifying the outer array |
---|
LvArray::ArrayOfArrays< T, ... > std::vector< std::vector< T > > |
======================================= ======================================================================== |
array.reserve( n ) vector.reserve( n ) |
array.resize( n ) vector.resize( n ) |
array.appendArray( n ) vector.push_back( std::vector< T >( n ) ) |
array.appendArray( first, last ) vector.push_back( std::vector< T >( first, last ) ) |
array.insertArray( i, first, last ) vector.insert( vector.begin() + i, std::vector< T >( first, last ) ) |
array.eraseArray( i ) vector.erase( vector.first() + i ) |
Modifying an inner array |
---|
LvArray::ArrayOfArrays< T, ... > std::vector< std::vector< T > > |
=============================================== ============================================================================== |
array.resizeArray( i, n, args ... ) vector[ i ].resize( n, T( args ... ) ) |
array.clearArray( i ) vector[ i ].clear() |
array.emplaceBack( i, args ... ) vector[ i ].emplace_back( args ... ) |
array.appendToArray( i, first, last ) vector[ i ].insert( vector[ i ].end(), first, last ) |
array.emplace( i, j , args ... ) vector[ i ].emplace( j, args ... ) ` |
array.insertIntoArray( i, j , first, last ) vector[ i ].insert( vector[ i ].begin() + j, first, last ) |
eraseFromArray( i, j, n ) vector[ i ].erase( vector[ i ].begin() + j, vector[ i ].begin() + j + n ) . |
Accessing the data |
---|
LvArray::ArrayOfArrays< T, ... > std::vector< std::vector< T > > |
==================================== =================================== |
array( i, j ) vector[ i ][ j ] |
array[ i ][ j ] vector[ i ][ j ] |
It is worth noting that operator[]
returns a one dimensional LvArray::ArraySlice
.
../../examples/exampleArrayOfArrays.cpp
[Source: examples/exampleArrayOfArrays.cpp]
LvArray::ArrayOfArraysView
is the view counterpart to LvArray::ArrayOfArrays
. The LvArray::ArrayOfArraysView
shares the three template parameters of LvArray::ArrayOfArraysView
but it has an additional boolean parameter CONST_SIZES
which specifies if the view can change the sizes of the inner arrays. If CONST_SIZES
is true the sizes
buffer has type BUFFER_TYPE< INDEX_TYPE const >
otherwise it has type BUFFER_TYPE< std::remove_const_t< INDEX_TYPE > >
.
The LvArray::ArrayOfArraysView
doesn't have a default constructor, it must always be created from an existing LvArray::ArrayOfArrays
. From a LvArray::ArrayOfArrays< T, INDEX_TYPE, BUFFER_TYPE >
you can get three different view types.
toView()
returns aLvArray::ArrayOfArraysView< T, INDEX_TYPE const, false, BUFFER_TYPE >
which can modify existing values and change the size of the inner arrays as long as their size doesn't exceed their capacity. -toViewConstSizes()
returns aLvArray::ArrayOfArraysView< T, INDEX_TYPE const, true, BUFFER_TYPE >
which can modify existing values but cannot change the size of the inner arrays.toViewConst()
returns aLvArray::ArrayOfArraysView< T const, INDEX_TYPE const, true, BUFFER_TYPE >
which provides read only access.
../../examples/exampleArrayOfArrays.cpp
[Source: examples/exampleArrayOfArrays.cpp]
LvArray::ArrayView
also has a method emplaceBackAtomic
which is allows multiple threads to append to a single inner array in a thread safe manner.
../../examples/exampleArrayOfArrays.cpp
[Source: examples/exampleArrayOfArrays.cpp]
In order to use the LvArray::ArrayOfArrays
efficiently it is important to understand the underlying data structure. A LvArray::ArrayOfArrays< T, INDEX_TYPE, BUFFER_TYPE > array
contains three buffers:
BUFFER_TYPE< T > values
: Contains the values of each inner array.BUFFER_TYPE< INDEX_TYPE > sizes
: Of lengtharray.size()
,sizes[ i ]
contains the size of inner arrayi
.BUFFER_TYPE< INDEX_TYPE > offsets
: Of lengtharray.size() + 1
, inner arrayi
begins atvalues[ offsets[ i ] ]
and has capacityoffsets[ i + 1 ] - offsets[ i ]
.
Given M = offsets[ array.size() ]
which is the sum of the capacities of the inner arrays then
array.appendArray( n )
is O( n ) because it entails appending a new entry tosizes
andoffsets
and then appending then
new values tovalues
.array.insertArray( i, first, last )
isO( array.size() + M + std::distance( first, last )
. It involves inserting an entry intosizes
andoffsets
which isO( array.size() )
, making room invalues
for the new array which isO( M )
and finally copying over the new values which isO( std::distance( first, last ) )
.array.eraseArray( i )
isO( array.size() + M )
. It involves removing an entry fromsizes
andoffsets
which isO( array.size() )
and then it involves shifting the entries invalues
down which isO( M )
.- The methods which modify an inner array have the same complexity as their
std::vector
counterparts if the capacity of the inner array won't be exceeded by the operation. Otherwise they have an added cost ofO( M )
because new space will have to be made invalues
. So for examplearray.emplaceBack( i, args ... )
isO( 1 )
ifarray.sizeOfArray( i ) < array.capacityOfArray( i )
andO( M )
otherwise.
LvArray::ArrayOfArrays
also supports two methods that don't have an std::vector< std::vector >
counterpart. The first is compress
which shrinks the capacity of each inner array to match it's size. This ensures that the inner arrays are contiguous in memory with no extra space between them.
../../examples/exampleArrayOfArrays.cpp
[Source: examples/exampleArrayOfArrays.cpp]
The second is resizeFromCapacities
which takes in a number of inner arrays and the capacity of each inner array. It clears the ArrayOfArrays
and reconstructs it with the given number of empty inner arrays each with the provided capacity. It also takes a RAJA execution policy as a template parameter which specifies how to compute the offsets array from the capacities. Currently only host execution policies are supported.
../../examples/exampleArrayOfArrays.cpp
[Source: examples/exampleArrayOfArrays.cpp]
The three types of LvArray::ArrayOfArrayView
obtainable from an LvArray::ArrayOfArrays
all act differently when moved to a new memory space.
LvArray::ArrayOfArraysView< T, INDEX_TYPE const, false, LvArray::ChaiBuffer >
, obtained by callingtoView()
. When it is moved to a new space the values are touched as well as the sizes. The offsets are not touched.LvArray::ArrayOfArraysView< T, INDEX_TYPE const, true, LvArray::ChaiBuffer >
, obtained by callingtoViewConstSizes()
. When it is moved to a new space the values are touched but the sizes and offsets aren't.LvArray::ArrayOfArraysView< T const, INDEX_TYPE const, true, LvArray::ChaiBuffer >
, obtained by callingtoViewConst()
. None of the buffers are touched in the new space.
Calling the explicit move
method with the touch parameter set to true
on a view type has the behavior described above. However calling move( MemorySpace::host )
on an LvArray::ArrayOfArrays
will also touch the offsets (if moving to the GPU the offsets aren't touched). This is the only way to touch the offsets so if an LvArray::ArrayOfArrays
was previously on the device then it must be explicitly moved and touched on the host before any modification to the offsets can safely take place.
../../examples/exampleArrayOfArrays.cpp
[Source: examples/exampleArrayOfArrays.cpp]
When LVARRAY_BOUNDS_CHECK
is defined access via operator[]
and operator()
is checked. If an invalid access is detected the program is aborted. Methods such as sizeOfArray
, insertArray
and emplace
are also checked.
../../examples/exampleArrayOfArrays.cpp
[Source: examples/exampleArrayOfArrays.cpp]
Like with the LvArray::Array
it is a good idea to pass around the most restrictive LvArray::ArrayOfArrays
object possible. If a function only reads the values of an array it should accept an LvArray::ArrayOfArraysView< T const, INDEX_TYPE const, true, BUFFER_TYPE >
. If it only reads and writes to the values of the inner arrays it should accept an LvArray::ArrayOfArraysView< T, INDEX_TYPE const, true, BUFFER_TYPE >
. If it will modify the size of the inner arrays and can guarantee that their capacity won't be exceeded it should accept a LvArray::ArrayOfArraysView< T, INDEX_TYPE const, false, BUFFER_TYPE >
. Only when the function needs to modify the outer array or can't guarantee that the capacity of an inner array won't be exceeded should it accept an LvArray::ArrayOfArrays
.
Examining the computational complexities listed above resizing the inner arrays of an LvArray::ArrayOfArrays
can be as fast as the equivalent operations on a std::vector< std::vector< T > >
only if the capacity of the inner arrays is not exceeded. Whenever possible preallocate space for the inner arrays. The following examples demonstrate the impact this can have.
One use case for the LvArray::ArrayOfArrays
is to represent the node-to-element map of a computational mesh. Usually this is constructed from a provided element-to-node map which is a map from an element index to a list of node indices that make up the element. For a structured mesh or an unstructured mesh made up of a single element type this map can be represented as a two dimensional LvArray::Array
since every element has the same number of nodes. However the inverse node-to-element map cannot be easily represented this way since in general not all nodes are a part of the same number of elements. In a two dimensional structured mesh an interior node is part of four elements while the four corner nodes are only part of a single element.
This is an example of how to construct the node-to-element map represented as a std::vector< std::vector >
from the element-to-node map.
../../benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp
[Source: benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp]
For a mesh with N
nodes this construction has complexity O( N )
since there are N
calls to std::vector::emplace_back
each of which is O( 1 )
. The same approach works when the node-to-element map is a LvArray::ArrayOfArrays
however the complexity is much higher.
../../benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp
[Source: benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp]
Since nothing is done to preallocate space for each inner array every call to LvArray::ArrayOfArrays::emplaceBack
has complexity O( M )
where M
is the sum of the capacities of the inner arrays. M
is proportional to the number of nodes in the mesh N
so the entire algorithm runs in O( N * N )
.
This can be sped up considerably with some preallocation.
../../benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp
[Source: benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp]
Since this method guarantees that the capacity of each inner arrays won't be exceeded the complexity is reduced back down to O( N )
. In addition the loop appending to the inner arrays can be parallelized.
One problem with this approach is that it may allocate significantly more memory than is necessary to store the map since most of the inner arrays may not have the maximal length. Another potential issue is that the array is not compressed since some inner arrays will have a capacity that exceeds their size. Precomputing the size of each inner array (the number of elements each node is a part of) and using resizeFromCapacities
solves both of these problems and is almost as fast as over allocating.
../../benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp
[Source: benchmarks/benchmarkArrayOfArraysNodeToElementMapConstructionKernels.cpp]
The following timings are from a clang 10 release build on LLNL's Quartz system run with a 200 x 200 x 200
element structured mesh. The reported time is the best of ten iterations.
Function | RAJA Policy | Time |
---|---|---|
vector |
N/A | 0.99s |
overAllocation |
seq_exec 0 |
.49s |
resizeFromCapacities |
seq_exec 0 |
.58s |
overAllocation |
omp_parallel_for_exec |
0.11s |
resizeFromCapacities |
omp_parallel_for_exec |
0.17s |
The naive
method is much to slow to run on this size mesh. However on a 30 x 30 x 30
mesh it takes 1.28 seconds.