# [Session 1] Toolkit

Throughout the first session we shall introduce a few useful libraries and show how to interact with them. Treat the following as a collection of utilities which can be combined with all the techniques presented later in the course.

## Data handling

For all I/O operations in this course we will include the `<iostream>` standard library header.
Also, we will use syntax constructions from the _C++17_ standard, thus `--std=c++17` is added as a compilation flag in all commands below.

For very simple programs with one input and one output file, the fastest approach is to provide the file paths with the `<` and `>` shell redirection operators when executing the binary. Input is then read with `std::cin` and output is printed with `std::cout`. The easiest (but clean!) solution is usually a good start, especially during the prototyping phase; such operations may later be polished during the final code refactoring.

This is a general remark: developers should be mindful and avoid overcommitting to a specific technology or solution in the initial phases of a project. This is especially important for scientists, for whom data exploration, cleaning and testing of various approaches may take up a significant part of a project's timeline.

Remember: _**Premature optimisation is the root of all evil** ~[Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth)_

### Manual IO through streams

When a program uses multiple input or output files, passing them all with shell operators becomes cumbersome; instead, the files can be opened explicitly from within the program.

Let us take a look at the code of `example1.cpp`:

```cpp

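#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <filesystem>
```

Notice the inclusion of the standard headers for file (`<fstream>`) and string (`<sstream>`) stream operations, as well as `<filesystem>` (new in _C++17_) for path manipulation.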

The construction below builds the path to a fixed input file relative to the location of the executable (`argv[0]`), so it works regardless of the current working directory (`$PWD`):

```cpp

// construct PWD-independent path to the input file
std::filesystem::path CWD = std::filesystem::current_path();
std::filesystem::path EXECPATH = (CWD / argv[0]).parent_path();
std::ifstream file(EXECPATH / "data/vector.txt");
```

What follows is the main logic: reading a text file (a vector of floating-point numbers specified in one column) line by line into a `std::vector<float>`. Notice that the file handle opened above is closed explicitly; `std::ifstream` would also close it automatically when the object goes out of scope. Finally, we iterate over the data vector and print the values to the standard output stream.

```cpp

std::vector<float> data;
std::string line;
float temp;

while (std::getline(file, line)){
    std::istringstream ss(line);
    if (ss >> temp) data.push_back(temp);
}
file.close();

for (const float& value : data){
    std::cout << value << std::endl;
}
```

Please compile and execute this test program with:

```
g++ example1.cpp -o example1 --std=c++17  && ./example1
```
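A minimal sketch, assuming one number per line as in `data/vector.txt`, of how the snippets above could be wrapped into reusable functions. `dataPath` reproduces the path construction, and `readColumn` additionally checks that the file was opened, since otherwise a wrong path fails silently and the program simply prints nothing. Both function names are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Same construction as in example1.cpp: resolve `relative` against the
// directory containing the executable (argv[0]) instead of $PWD.
std::filesystem::path dataPath(const char* argv0, const std::filesystem::path& relative){
    std::filesystem::path cwd = std::filesystem::current_path();
    return (cwd / argv0).parent_path() / relative;
}

// Read one float per line; return std::nullopt if the file cannot be opened,
// so that a wrong path is reported instead of silently yielding no data.
std::optional<std::vector<float>> readColumn(const std::filesystem::path& path){
    std::ifstream file(path);
    if (!file) return std::nullopt;

    std::vector<float> data;
    std::string line;
    float temp;
    while (std::getline(file, line)){
        std::istringstream ss(line);
        if (ss >> temp) data.push_back(temp);  // lines without a number are skipped
    }
    return data;  // `file` is closed automatically when it goes out of scope
}
```

Inside `main` this would be called as `readColumn(dataPath(argv[0], "data/vector.txt"))`. One caveat of the `argv[0]`-based construction: when the program is started through a `$PATH` lookup, `argv[0]` usually holds only the program name, so the computed directory falls back to `$PWD`.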

Reading a CSV-formatted table is quite similar: see the code of `example2.cpp`:

```cpp

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <filesystem>

int main(int argc, char* argv[]){

    std::vector<std::vector<float>> data;
    std::string line;
    float temp;

    // construct PWD-independent path to the input file
    std::filesystem::path CWD = std::filesystem::current_path();
    std::filesystem::path EXECPATH = (CWD / argv[0]).parent_path();
    std::ifstream file(EXECPATH / "data/matrix.csv");

    while (std::getline(file, line)){
        std::istringstream ss(line);
        std::vector<float> row;
        while (ss >> temp){
            row.push_back(temp);
            if (ss.peek() == ',') ss.ignore();
        }
        data.push_back(row);
    }
    file.close();

    for (const std::vector<float>& row : data){
        for (std::size_t i=0; i<row.size(); ++i){
            std::cout << row[i];
            if (i!=row.size()-1) std::cout << ",";
        }
        std::cout << std::endl;
    }

    return 0;
}

```

In the case of a CSV file with a matrix of numbers:
* we store the data in a `vector` of `vector`s of `float`s
* each line is split on the comma character (the separator following a number is skipped with `ss.ignore()`)
* printing the data is slightly more involved, as the comma is placed between the elements of each "row" but not after the last one

Please compile and execute this test program with:

```
g++ example2.cpp -o example2 --std=c++17  && ./example2
```
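As a sketch of an alternative to the `peek()`/`ignore()` approach, assuming purely numeric fields, one line can also be split with `std::getline` using the comma as delimiter. The helper names `parseRow` and `joinRow` are hypothetical. Unlike `ss >> temp`, this tolerates spaces around the separators.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split one CSV line on `separator`. std::stof skips leading whitespace and
// ignores trailing characters, so "1, 2.5 ,3" is accepted; it throws
// std::invalid_argument on an empty or non-numeric field.
std::vector<float> parseRow(const std::string& line, char separator = ','){
    std::vector<float> row;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, separator)) row.push_back(std::stof(field));
    return row;
}

// Join values with the separator placed only between elements.
std::string joinRow(const std::vector<float>& row, char separator = ','){
    std::ostringstream out;
    for (std::size_t i = 0; i < row.size(); ++i){
        if (i > 0) out << separator;
        out << row[i];
    }
    return out.str();
}
```

Writing the separator *before* every element except the first sidesteps the `row.size()-1` comparison used in `example2.cpp`.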

### rapidcsv library

It is generally good practice to use existing code libraries whenever possible, especially for tedious tasks. Luckily, reading and writing text files is one such case. Take note of [rapidcsv](https://github.com/d99kris/rapidcsv), a publicly available, single-file, header-only _C++_ library. We shall try it out to interact with text files through its own interfaces.

Remember to always check the license of any open source code you use. In this case the library is released under the [BSD 3-Clause license](https://choosealicense.com/licenses/bsd-3-clause/), which requires the copyright notice to be retained when the code is redistributed.

Following the repository's README:

> Simply copy src/rapidcsv.h to your project/include directory and include it.

We will thus download the latest release of the project (at the time of writing, version `8.77`) into our current working directory, uncompress it, and later point the compiler to the directory containing its header file:

```
wget https://github.com/d99kris/rapidcsv/archive/refs/tags/v8.77.zip

unzip v8.77.zip
```


Let us break down the sections of `example3.cpp`.

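In the first section (below) we construct `doc`, an instance of the library's main class `Document`. It serves as a data handle, as the class defines various functions to retrieve the file's contents. `LabelParams(-1, -1)` specifies that the file contains neither a header row with column names nor a column with row names, and `SeparatorParams(',')` sets the field separator.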

```cpp

// construct PWD-independent path to the input file
std::filesystem::path CWD = std::filesystem::current_path();
std::filesystem::path EXECPATH = (CWD / argv[0]).parent_path();
rapidcsv::Document doc(
    EXECPATH / "data/matrix.csv",
    rapidcsv::LabelParams(-1, -1),
    rapidcsv::SeparatorParams(',')
);

int numRows = doc.GetRowCount();
int numColumns = doc.GetColumnCount();
```

Next comes the actual data parsing, this time into a two-dimensional `float` array. As we do not use the `std::vector` container from the standard library here, the memory has to be allocated and released manually.

```cpp

    float** data = new float*[numRows];
    for (int row = 0; row < numRows; ++row) data[row] = new float[numColumns];

    for (int row=0; row<numRows; ++row)
        for (int col=0; col<numColumns; ++col)
            data[row][col] = doc.GetCell<float>(col, row);

    for (int row = 0; row < numRows; ++row) delete[] data[row];
    delete[] data;
```

The final section shows how to iterate over the columns of the `doc` object, construct a `std::vector<float>` out of each of them and print each of its elements.

```cpp

    for (int i=0; i<numColumns; ++i){
        std::vector<float> col = doc.GetColumn<float>(i);
        for (const float& value : col) std::cout << value << std::endl;
        std::cout << "===" << std::endl;

    }
```

Please compile and execute this test program with:

```
g++ example3.cpp -Irapidcsv-8.77/src -o example3 --std=c++17  && ./example3
```

Note the `-I` flag, which adds the library's `src` directory (containing `rapidcsv.h`) to the compiler's include search path.
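The manual `new[]`/`delete[]` bookkeeping of the two-dimensional array can be avoided while keeping an array-like interface. A common pattern, sketched below with a hypothetical `Matrix` class, stores all elements row-major in one contiguous `std::vector`, so element `(row, col)` lives at index `row * cols + col` and the memory is released automatically.

```cpp
#include <cstddef>
#include <vector>

// Row-major matrix backed by a single contiguous std::vector<float>.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0f) {}

    // Element access: (row, col) maps to index row * cols_ + col.
    float& operator()(std::size_t row, std::size_t col){ return data_[row * cols_ + col]; }
    float operator()(std::size_t row, std::size_t col) const { return data_[row * cols_ + col]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<float> data_;  // freed automatically, no delete[] needed
};
```

In `example3.cpp` this would replace the allocation loop with `Matrix data(numRows, numColumns);`, the assignment with `data(row, col) = doc.GetCell<float>(col, row);`, and make both `delete[]` lines unnecessary.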

More [concrete examples](https://github.com/d99kris/rapidcsv/blob/master/README.md#example-usage) as well as the [API documentation](https://github.com/d99kris/rapidcsv/blob/master/doc/README.md) are available on _GitHub_.

## Unit testing

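For unit testing we will use [Catch2](https://github.com/catchorg/Catch2), a _C++_ testing framework which also supports micro-benchmarking. Useful pages of its documentation:
* [documentation index](https://github.com/catchorg/Catch2/blob/devel/docs/Readme.md)
* [benchmarks](https://github.com/catchorg/Catch2/blob/devel/docs/benchmarks.md)
* [command line: number of benchmark samples](https://github.com/catchorg/Catch2/blob/devel/docs/command-line.md#specify-the-number-of-benchmark-samples-to-collect)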

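Whenever running unit tests with _Catch2_, your code should be divided into three parts:
1. _Catch2_ itself (`catch_amalgamated.hpp` and `catch_amalgamated.cpp`, shipped in the `extras/` directory of the _Catch2_ v3 repository)
2. your source code (`example4src.cpp`)
3. the tests (`example4test.cpp`)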

Let us first inspect `example4src.cpp`:

```cpp

#include <cmath>
#include <cstdlib>
#include <ctime>

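// F(x) = (2x + 2^x) / x; evaluates to inf for x = 0 (division by zero)
float F(float x){
    return (2*x + std::pow(2, x)) / x;
}

// pseudo-random integer in [1, 100]; pass a fixed seed for reproducible tests
int G(int seed=std::time(nullptr)){
    std::srand(seed);
    return 1 + std::rand() % 100;
}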
```

These are two simple functions we would like to design tests for.

The first one implements the mathematical function $F(x) = \frac{2x + 2^x}{x}$.
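Evaluating $F$ by hand at the points used in the tests below shows where the expected values come from (values rounded to two decimals):

$$
F(2) = \frac{2 \cdot 2 + 2^2}{2} = \frac{8}{2} = 4, \qquad
F(5.86) = \frac{11.72 + 2^{5.86}}{5.86} \approx \frac{11.72 + 58.08}{5.86} \approx 11.91, \qquad
F(0) = \frac{2 \cdot 0 + 2^0}{0} = \frac{1}{0}.
$$

$F(0)$ is undefined mathematically; in IEEE floating-point arithmetic the division $1/0$ yields $+\infty$, which is what `std::isinf` detects. $F(5.86)$ lies within $10\%$ of $12$, but not exactly at it, hence the approximate comparison.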

The second one returns a pseudo-random integer between 1 and 100. Notice that the RNG seed is taken from the function argument, whose default value is based on the current system time; passing an explicit seed lets us fix the pseudo-random number generation in unit tests.
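The values produced by `std::rand` are not specified by the standard, so a seeded test of `G` is only reproducible on one standard library. A sketch of a portable variant, using a hypothetical `GPortable` function built on `std::mt19937`, whose raw output sequence for a given seed *is* fixed by the _C++_ standard:

```cpp
#include <cstdint>
#include <random>

// Hypothetical portable variant of G(): std::mt19937 yields the same
// sequence for a given seed on every conforming standard library.
int GPortable(std::uint32_t seed){
    std::mt19937 engine(seed);
    // Same modulo mapping as G(): slightly biased, but fully portable
    // (the algorithm of std::uniform_int_distribution is not specified).
    return 1 + static_cast<int>(engine() % 100);
}
```

Because the sequence is portable, an expected value recorded once for, say, `GPortable(0)` holds on every platform. The standard even fixes a checkpoint: the 10000th call of a default-constructed `std::mt19937` returns `4123659995`.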

The final ingredient is the test file containing all test cases and assertions; see the content of `example4test.cpp`. The first section specifies the files we include: our source code and the header file (`.hpp`) of the _Catch2_ framework are mandatory, while `<cmath>` provides `std::isinf()`, which we use to detect a division by zero.

```cpp

#include <cmath>
#include "example4src.cpp"
#include "catch_amalgamated.hpp"
```

We will not go over the whole documentation of the framework here.

Simply put, each test case consists of a name, one or more tags, and one or more assertions inside its body:

```cpp

TEST_CASE(
    "F: Execution",
    "[F]"
){
    REQUIRE_NOTHROW ( F(1) );
    REQUIRE ( std::isinf(F(0)) );

}
```

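Since floating-point results rarely match a literal value exactly, specifying approximate targets with `Catch::Approx` is especially useful when testing data-processing code; `epsilon(0.1)` below accepts a relative difference of up to 10% of the target value: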

```cpp

TEST_CASE(
    "F: Result",
    "[F]"
){
    REQUIRE ( F(2) == 4.0 );
    Catch::Approx target = Catch::Approx(12.0).epsilon(0.1);
    REQUIRE ( F(5.86) == target );
}
```
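To make the meaning of `epsilon` concrete, here is a simplified sketch of a relative-tolerance comparison in the spirit of `Catch::Approx(target).epsilon(eps)`. The function `approxEqual` is hypothetical; _Catch2_'s actual implementation additionally supports an absolute `margin` and a `scale` term.

```cpp
#include <cmath>

// Accept `actual` if it lies within eps * |target| of `target`
// (relative tolerance only; simplified compared to Catch::Approx).
bool approxEqual(double actual, double target, double eps){
    return std::fabs(actual - target) <= eps * std::fabs(target);
}
```

With `target = 12.0` and `eps = 0.1` the accepted interval is $[10.8, 13.2]$, so $F(5.86) \approx 11.91$ passes while a value such as `10.5` would not.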

In order to benchmark the execution time of a given function we use the `BENCHMARK` macro; more information on benchmarking is available in the [_Catch2_ documentation](https://github.com/catchorg/Catch2/blob/devel/docs/benchmarks.md).

```cpp

TEST_CASE(
    "F: Simple Benchmarking",
    "[F]"
){
    BENCHMARK( "0.1" ) { return F(0.1); };
    BENCHMARK( "10" ) { return F(10); };
    BENCHMARK( "-7" ) { return F(-7); };
}
```
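For comparison, a naive hand-rolled timing loop with `std::chrono` is sketched below (the function `meanNanoseconds` is hypothetical). It only reports a mean, whereas `BENCHMARK` also performs warm-up, collects multiple samples and estimates their statistics via bootstrap resampling, which is why the framework is preferable for real measurements.

```cpp
#include <chrono>
#include <cstddef>

// Call `fn` `samples` times (samples must be > 0) and return the mean
// wall-clock time per call in nanoseconds. No warm-up, no statistics.
template <typename Fn>
double meanNanoseconds(Fn&& fn, std::size_t samples){
    using clock = std::chrono::steady_clock;
    volatile double sink = 0.0;  // keep results alive so the calls are not optimised out
    auto start = clock::now();
    for (std::size_t i = 0; i < samples; ++i) sink = sink + fn();
    auto stop = clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(samples);
}
```

With the function under test it would be called as `meanNanoseconds([]{ return F(10); }, 100000)`.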

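Lastly, we test the RNG-related function with a custom seed. Keep in mind that the sequence generated by `std::rand` is not specified by the standard, so the expected value `31` is specific to the standard library the test was written with and may differ on other platforms: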

```cpp

TEST_CASE(
    "G: Fix-RNG",
    "[G]"
){
    REQUIRE ( G(0) == 31 );
}
```

When it comes to compilation, the `catch_amalgamated.cpp` file provides the program's entry point: it contains the `main()` function, which automatically runs all the `TEST_CASE`s defined in your test code. Both files have to be passed to the compiler:

```
g++ example4test.cpp catch_amalgamated.cpp  -o example4 --std=c++17
```

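When running the test binary, it is a good idea to specify some of the _Catch2_ command-line flags: `--durations yes` reports the time taken by each test case, while `--benchmark-samples` and `--benchmark-resamples` set the number of samples collected per benchmark and the number of bootstrap resamples used for their statistical analysis.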

```
./example4 --durations yes --benchmark-samples 100 --benchmark-resamples 100000
```

## Real Analysis

### Boost

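[Boost](https://www.boost.org) provides, among many other libraries, forward-mode automatic differentiation in `boost/math/differentiation/autodiff.hpp` (available since Boost 1.71; see also the [autodiff](https://github.com/pulver/autodiff) repository) and [Brent's method for locating function minima](https://www.boost.org/doc/libs/1_65_0/libs/math/doc/html/math_toolkit/roots/brent_minima.html). The example below (`example5.cpp`) uses both: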

```cpp
#include <iostream>
#include <limits>
#include <utility>
#include <boost/math/differentiation/autodiff.hpp>
#include <boost/math/tools/minima.hpp>

template <typename T>
T x4(T const& x){
  T x4 = x * x;
  x4 *= x4;
  return x4;
}

struct F{
  double operator()(double const& x){
    return (x + 3) * (x - 1) * (x - 1);
  }
};

int main(){

  // automatic differentiation
  constexpr unsigned ord = 5;
  double arg = 0.1;
  auto const x = boost::math::differentiation::make_fvar<double, ord>(arg);
  auto const y = x4(x);
  for (unsigned i=0; i<=ord; ++i)
    std::cout <<
      "d(" << i << ")/dx [x^4] | x=" << arg << " | = " << y.derivative(i)
      << std::endl;

  std::cout << "=====" << std::endl;

  // minimum search
  int bits = std::numeric_limits<double>::digits;
  double lower = -4.0;
  double upper = 4.0 / 3;
  std::pair<double, double> r = boost::math::tools::brent_find_minima(
    F(), lower, upper, bits
  );
  std::cout.precision(std::numeric_limits<double>::digits10);
  std::cout << "   min[F(x)] = " << r.second << std::endl;
  std::cout << "minarg[F(x)] = " << r.first << std::endl;

  return 0;
}

```

```
g++ example5.cpp -o example5 --std=c++17  && ./example5
```
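To show what the two _Boost_ calls compute, here is a standard-library-only sketch of both ideas. The `Dual` type implements first-order forward-mode automatic differentiation (Boost's `fvar` generalises this to higher orders), and `goldenSectionMinimum` is a golden-section search, used here *in place of* Brent's method: simpler, but slower since it takes no parabolic steps. All names are hypothetical. Note that on $[-4, 4/3]$ the function $(x+3)(x-1)^2$ is not unimodal: $F(-4) = -25$ is lower than $F(1) = 0$, so a bracketing search like this one returns a local minimum.

```cpp
#include <cmath>
#include <utility>

// --- Forward-mode automatic differentiation with dual numbers ---
// A dual number carries a value and its first derivative; arithmetic
// propagates both via the sum and product rules.
struct Dual {
    double value;
    double deriv;
};
Dual operator+(Dual a, Dual b){ return {a.value + b.value, a.deriv + b.deriv}; }
Dual operator*(Dual a, Dual b){ return {a.value * b.value, a.deriv * b.value + a.value * b.deriv}; }
Dual& operator*=(Dual& a, Dual b){ a = a * b; return a; }

// Same template as x4() in example5.cpp.
template <typename T>
T x4(T const& x){
    T y = x * x;
    y *= y;
    return y;
}

// --- Golden-section search for a local minimum on [lower, upper] ---
// Returns {argmin, min}, the same order as brent_find_minima.
// For brevity both interior points are re-evaluated in every iteration.
template <typename Fn>
std::pair<double, double> goldenSectionMinimum(Fn f, double lower, double upper, double tol = 1e-8){
    const double invPhi = (std::sqrt(5.0) - 1.0) / 2.0;  // 1/phi, about 0.618
    double c = upper - invPhi * (upper - lower);
    double d = lower + invPhi * (upper - lower);
    while (upper - lower > tol){
        if (f(c) < f(d)) upper = d;  // minimum bracketed by [lower, d]
        else             lower = c;  // minimum bracketed by [c, upper]
        c = upper - invPhi * (upper - lower);
        d = lower + invPhi * (upper - lower);
    }
    double x = 0.5 * (lower + upper);
    return {x, f(x)};
}
```

Seeding the dual number with derivative `1` (i.e. `x4(Dual{0.1, 1.0})`) marks it as the independent variable and yields $4 \cdot 0.1^3 = 0.004$ in `deriv`, matching the first derivative printed by `example5.cpp`.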

### GSL

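The [GNU Scientific Library](https://www.gnu.org/software/gsl/) (GSL) is a numerical library written in _C_ which can be called directly from _C++_. Relevant documentation:
* [root finding](https://www.gnu.org/software/gsl/doc/html/roots.html)
* [numerical differentiation](https://www.gnu.org/software/gsl/doc/html/diff.html)
* [numerical differentiation example](http://gnu.ist.utl.pt/software/gsl/manual/html_node/Numerical-Differentiation-Examples.html)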

```cpp
#include <iostream>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_roots.h>
#include <gsl/gsl_deriv.h>

// f(x) = x^2 - 4
double function(double x, void* params){
    return x * x - 4.0;
}

int main(){

    // root search

    gsl_function F;
    F.function = &function;
    F.params = nullptr;
    double xLower = 1.0;
    double xUpper = 3.0;

    const gsl_root_fsolver_type* solverType = gsl_root_fsolver_brent;
    gsl_root_fsolver* solver = gsl_root_fsolver_alloc(solverType);
    gsl_root_fsolver_set(solver, &F, xLower, xUpper);

    int status;
    int iter = 0;
    int maxIter = 100;
    double root;
    double epsabs = 0.0;
    double epsrel = 1e-6;

    do{
        iter++;
        status = gsl_root_fsolver_iterate(solver);
        root = gsl_root_fsolver_root(solver);
        xLower = gsl_root_fsolver_x_lower(solver);
        xUpper = gsl_root_fsolver_x_upper(solver);
        status = gsl_root_test_interval(xLower, xUpper, epsabs, epsrel);
        if (status == GSL_SUCCESS){
            std::cout << "Function root found at x = " << root << std::endl;
            break;
        }
    } while (status == GSL_CONTINUE && iter < maxIter);

    gsl_root_fsolver_free(solver);

    std::cout << "=====" << std::endl;

    // derivative
    double result, abserr;
    double arg = 2.1;
    double h = 1e-8;
    gsl_deriv_central(&F, arg, h, &result, &abserr);
    std::cout << "f'(x) = " << result << " +/- " << abserr << std::endl;

    return 0;
}

```

```
g++ example6.cpp -o example6 --std=c++17  -lgsl -lgslcblas && ./example6
```
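The two GSL routines are refined versions of simple textbook methods. GSL's `brent` solver combines bisection with interpolation, and `gsl_deriv_central` uses a 5-point rule with an error estimate. The sketch below shows the plain versions, plain bisection and the 3-point central difference, applied to the same $f(x) = x^2 - 4$; the function names are hypothetical.

```cpp
#include <cmath>
#include <functional>

// Bisection: repeatedly halve [lower, upper], keeping the half with the sign
// change. Assumes f(lower) and f(upper) have opposite signs.
double bisectRoot(const std::function<double(double)>& f, double lower, double upper,
                  double tol = 1e-10, int maxIter = 200){
    double fLower = f(lower);
    for (int iter = 0; iter < maxIter && upper - lower > tol; ++iter){
        double mid = 0.5 * (lower + upper);
        double fMid = f(mid);
        if ((fMid < 0) == (fLower < 0)){ lower = mid; fLower = fMid; }  // root in [mid, upper]
        else upper = mid;                                                // root in [lower, mid]
    }
    return 0.5 * (lower + upper);
}

// Central difference: f'(x) ~ (f(x + h) - f(x - h)) / (2h), error O(h^2).
double centralDerivative(const std::function<double(double)>& f, double x, double h = 1e-5){
    return (f(x + h) - f(x - h)) / (2.0 * h);
}
```

For $f(x) = x^2 - 4$ on $[1, 3]$ the bisection converges to the root $x = 2$, and the central difference at $x = 2.1$ reproduces $f'(2.1) = 4.2$ (for a quadratic the formula is exact up to rounding).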

## Linear Algebra

### Armadillo

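[Armadillo](https://arma.sourceforge.net/docs.html) is a _C++_ linear algebra library with a syntax deliberately similar to _Matlab_. Useful resources:
* [API documentation](https://arma.sourceforge.net/docs.html)
* [introduction to Armadillo (FYS3150 course)](https://anderkve.github.io/FYS3150/book/introduction_to_cpp/intro_to_armadillo.html)
* [example program](https://github.com/adevress/armadillo/blob/master/examples/example1.cpp)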

```cpp
//
// Efficient statistical computing with C++ Armadillo
//
// Maciek Bak
// Swiss Institute of Bioinformatics
// 15.07.2019
//

#include <iostream>
#include <fstream>
#include <armadillo>

//=============================================================================

void armadillo_toy_examples(){
  
  // Generate a vector from the standard normal distribution
  arma::vec v_normal = arma::randn(5);
  v_normal.print("v_normal:");

  // Create a 4x4 random matrix and print it on the screen
  arma::Mat<double> A = arma::randu(4,4);
  std::cout << "A:\n" << A << "\n";

  // Create a new diagonal matrix using the main diagonal of A:
  arma::Mat<double> B = arma::diagmat(A);
  std::cout << "B:\n" << B << "\n";

  // New matrix: directly specify the matrix size (elements are uninitialised)
  arma::mat C(2,3);  // typedef mat  =  Mat<double>
  std::cout << "C.n_rows: " << C.n_rows << std::endl;
  std::cout << "C.n_cols: " << C.n_cols << std::endl;

  // Directly access an element (indexing starts at 0)
  C(1,2) = 456.0;
  std::cout << "C[1][2]:\n" << C(1,2) << "\n";
  C = 5.5; // scalars are treated as a 1x1 matrix
  C.print("C:");

  // Inverse
  std::cout << "inv(C): " << std::endl << inv(C) << std::endl;
  
  // Rotate a point (0,1) by -Pi/2 ---> (1,0)
  arma::vec Position = {0,1};
  Position.print("Current coordinates of a point:");
  double Pi = 3.14159265359;
  double phi = -Pi/2;
  arma::mat RotationMatrix = {
    {+cos(phi), -sin(phi)},
    {+sin(phi), +cos(phi)}
  };
  Position = RotationMatrix * Position;
  Position.print("New coordinates of the point:");
}

//=============================================================================

void armadillo_read_write_objects(){

  // Create a 3x5 matrix with random values from uniform distribution on [0;1]
  // Save a double matrix to a csv format, then load it.
  arma::Mat<double> uniform_matrix = arma::randu(3,5);
  uniform_matrix.save("data/arma_uniform_matrix.csv", arma::csv_ascii);
  arma::Mat<double> load_matrix;
  load_matrix.load("data/arma_uniform_matrix.csv", arma::csv_ascii);

  // Armadillo can save directly to files or write to pre-opened streams.
  // In order to add column names to output tables we have to write the
  // header manually to a file stream and then save the matrix to the stream.
  std::ofstream file("data/arma_uniform_matrix_with_headers.csv");
  file << "A,B,C,D,E" << std::endl;
  uniform_matrix.save(file, arma::csv_ascii);
  file.close();
  //
  // As Armadillo objects are numerical structures the input shall not contain
  // row/column names. Armadillo should be used only for heavy computations.
}

//=============================================================================

int main(int argc, const char **argv){

  std::cout << "Armadillo version: " << arma::arma_version::as_string()
    << std::endl;

  // set RNG seed:
  arma::arma_rng::set_seed(0);
  //arma::arma_rng::set_seed_random();

  armadillo_toy_examples();

  armadillo_read_write_objects();

  return 0;
}

```

```
g++ example7.cpp -o example7 --std=c++17 -larmadillo  && ./example7
```
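The rotation in `armadillo_toy_examples()` (repeated with _Eigen_ below) can be checked without any library. The sketch below, with hypothetical helper names, applies the same rotation matrix to $(0, 1)$. The result is $(1, 0)$ only up to rounding: $\cos(-\pi/2)$ evaluates to a tiny non-zero number, so the printed coordinates may contain a value close to, but not exactly, zero. `std::acos(-1.0)` gives $\pi$ to full `double` precision instead of a hard-coded literal.

```cpp
#include <array>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

// Rotation matrix for angle phi (counter-clockwise for phi > 0).
Mat2 rotation(double phi){
    return {{{std::cos(phi), -std::sin(phi)},
             {std::sin(phi),  std::cos(phi)}}};
}

// Matrix-vector product y = R * v.
Vec2 apply(const Mat2& R, const Vec2& v){
    return {R[0][0] * v[0] + R[0][1] * v[1],
            R[1][0] * v[0] + R[1][1] * v[1]};
}
```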

### Eigen

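[Eigen](https://eigen.tuxfamily.org/index.php?title=Main_Page#Documentation) is a header-only _C++_ template library for linear algebra. No linking is required; the compiler only needs to find its headers (if they are not in the project directory or a default include path, add an `-I` flag pointing to them, as for _rapidcsv_).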

```cpp
//
// Efficient statistical computing with C++ Eigen
//
// Maciek Bak
// Swiss Institute of Bioinformatics
// 29.07.2019
//

#include <iostream>
#include <cmath>
#include <cstdlib>
#include "Eigen/Dense"

using namespace Eigen;

//=============================================================================

void eigen_toy_examples(){

  // Generate a vector of values drawn uniformly from [-1, 1]
  VectorXd v_uniform = VectorXd::Random(5);
  std::cout << "v_uniform:" << std::endl;
  std::cout << v_uniform << std::endl;

  // Create a 4x4 random matrix and print it on the screen
  MatrixXd A = MatrixXd::Random(4, 4);
  std::cout << "A:" << std::endl;
  std::cout << A << std::endl;

  // Rotate a point (0,1) by -Pi/2 ---> (1,0)
  Vector2d Position(0, 1);
  std::cout << "Current coordinates of a point:" << std::endl;
  std::cout << Position << std::endl;
  double Pi = 3.14159265359;
  double phi = -Pi/2;
  // Rotation Matrix:
  Matrix2d RotationMatrix;
  RotationMatrix << +cos(phi), -sin(phi), +sin(phi), +cos(phi);
  Position = RotationMatrix * Position;
  std::cout << "New coordinates of the point:" << std::endl;
  std::cout << Position << std::endl;

}

//=============================================================================

int main(int argc, const char **argv){

  std::cout 
    << "Eigen version: "
    << EIGEN_WORLD_VERSION
    << "."
    << EIGEN_MAJOR_VERSION
    << "."
    << EIGEN_MINOR_VERSION
    << std::endl;

  // set RNG seed:
  std::srand(0);

  eigen_toy_examples();

  return 0;
}

```

```
g++ example8.cpp -o example8 --std=c++17  && ./example8
```
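Unlike `arma::randn`, _Eigen_'s `Random()` draws from a uniform distribution. A sketch of generating standard normal samples with the standard library instead (the function name `standardNormal` is hypothetical). The resulting buffer could then be viewed as an _Eigen_ vector, for example via `Eigen::Map<Eigen::VectorXd>(samples.data(), samples.size())`. Note that the algorithm of `std::normal_distribution` is implementation-specific, so the exact values are reproducible only with the same standard library.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draw n samples from the standard normal distribution N(0, 1)
// using a seeded Mersenne Twister engine.
std::vector<double> standardNormal(std::size_t n, unsigned seed){
    std::mt19937 engine(seed);
    std::normal_distribution<double> dist(0.0, 1.0);
    std::vector<double> samples(n);
    for (double& s : samples) s = dist(engine);
    return samples;
}
```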

## Conclusion

The code above provides simple use cases of libraries which are particularly useful for scientific computing. The point of this course is not to thoroughly inspect each of the presented tools, but rather to provide minimal working examples and point participants in the right direction (the official documentation is usually a good place to start).

---