![DLI Header](../images/DLI_Header.png)

# From Raw Initialization Loop to Standard Library Algorithms

In this notebook you will refactor the raw loops in the `initialize` function to use standard library algorithms.

## Learning Objectives

By the time you complete this notebook you should:

- Understand the potential performance gains from creating and/or keeping data on the device where it is used, avoiding unnecessary migrations between devices
- Be familiar with the `std::views::iota` range factory and the `std::for_each_n` and `std::fill_n` algorithms
- Be able to refactor the `initialize` function so that it can later generate data directly on the GPU

## Reducing Communication Between Devices

In the next notebook we will finally parallelize `daxpy` so that it can run on accelerator devices like GPUs. When doing so, it is important to avoid unnecessary memory migrations across devices, including from the CPU to a GPU device. With this in mind, we would like to prepare the `initialize` function so that it can also perform its work in parallel, on the GPU device.

The amount of work required to perform the initialization may not be sufficient to saturate the available GPU, and were it the only computation in an application, it might be best left on the CPU. However, because the time required to transfer memory between devices, including from the CPU to the GPU, can often be quite high, it is often a good idea to create and/or keep data on the GPU in order to reduce communication overhead. While this single function may not benefit from acceleration, the overall performance of the application can be improved.

## Initializing `x` with the `std::for_each_n` Algorithm and the `std::views::iota` Range Factory

If you recall, in the `initialize` function we initialize the values of the `x` vector with a raw loop, assigning each value its index.

```c++
  for (std::size_t i = 0; i < x.size(); ++i) {
    x[i] = (double)i;
   ...
  }
```

In the following exercise you will create a `std::views::iota` range starting at `0`, then pass the beginning of that range and the size of the `x` vector as the input iterator and size arguments, respectively, of the `std::for_each_n` algorithm. Each element of `x` can then be assigned the unique, incrementing "index" value generated by the iota range factory.

Please take a moment to review [the `std::for_each_n` algorithm](https://en.cppreference.com/w/cpp/algorithm/for_each_n) and [the `std::views::iota` range factory](https://en.cppreference.com/w/cpp/ranges/iota_view).

## Initializing `y` with the `std::fill_n` Algorithm

If you recall, in the `initialize` function we initialize all the values of the `y` vector to be `2.`.

```c++
  for (std::size_t i = 0; i < x.size(); ++i) {
    ...
    y[i] = 2.;
  }
```

In the following exercise you will use the `std::fill_n` algorithm to set all values in the `y` vector to `2.`.

Please take a moment to review [the `std::fill_n` algorithm](https://en.cppreference.com/w/cpp/algorithm/fill_n).
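
To illustrate the `(first, count, value)` argument order, here is a small sketch that fills only a prefix of a vector. The helper name `fill_prefix` and its return value are assumptions made for this example and are not part of `exercise2.cpp`.

```c++
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of exercise2.cpp): set the first `count`
// elements of `v` to `value` and return how many elements were left untouched.
// The caller must ensure that count <= v.size().
std::size_t fill_prefix(std::vector<double> &v, std::size_t count, double value) {
  // fill_n assigns `value` to `count` consecutive elements starting at
  // v.begin() and returns an iterator one past the last element assigned.
  auto next = std::fill_n(v.begin(), count, value);
  return static_cast<std::size_t>(v.end() - next);
}
```

Passing the full size of the container as the count fills every element, which is the form the exercise calls for.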

## Exercise 2: Refactor `initialize`

For this exercise you will work with [exercise2.cpp](exercise2.cpp), which starts with the solution from exercise 1. The `TODO`s indicate the parts of the source code that need adding or refactoring. Below are the parts of the file containing `TODO`s.

```c++
#include <algorithm>
// TODO: add C++ standard library includes as necessary

...

/// Initialize vectors `x` and `y`: raw loop sequential version
void initialize(std::vector<double> &x, std::vector<double> &y) {
  assert(x.size() == y.size());
  // TODO: Initialize `x` using SEQUENTIAL std::for_each_n algorithm with std::views::iota
  // TODO: Initialize `y` using SEQUENTIAL std::fill_n algorithm
}

...
```

The example compiles and runs as provided, but it produces incorrect results due to the incomplete `initialize` implementation.
In the compilation commands below, the C++ standard version is now C++20, which is required to use `std::views::iota`.

### Compile and Run

Once you fix the `initialize` implementation, the following blocks should compile and run correctly:

In [None]:
!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise2.cpp
!./daxpy 1000000

In [None]:
!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -isystem/usr/local/range-v3/include -o daxpy exercise2.cpp
!./daxpy 1000000

In [None]:
!nvc++ -std=c++20 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy exercise2.cpp
!./daxpy 1000000

### Solution to Exercise 2

The [solution for this exercise is `solutions/exercise2.cpp`](solutions/exercise2.cpp), which you can view if you get stuck or want to check your work.

The following cells compile and run the solution for exercise 2 using different compilers.

In [None]:
# Using iota range for initialize 
!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise2.cpp
!./daxpy 1000000

In [None]:
!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy solutions/exercise2.cpp
!./daxpy 1000000

In [None]:
!nvc++ -std=c++20 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy solutions/exercise2.cpp
!./daxpy 1000000

## Next

Please proceed to [the next notebook](../05-Parallel/Parallel.ipynb).
