# HDFql

If you are used to managing data in a (No)SQL database, you will find HDFql very familiar. It is a high-level, declarative, SQL-like language that lets you manage HDF5 files with only a few lines of code in C, C++, Java, Python, Fortran, C#, or R. The HDFql implementation depends on the HDF5 library, but all of that is hidden from users. HDFql also supports threading and parallel I/O, but we will not cover that here. See the __[HDFql website](https://www.hdfql.com/)__ and the __[HDFql documentation](https://www.hdfql.com/#documentation/)__ for more information.

The C++ version of our model problem is our starting point, but we will use the HDFql bindings instead of the HDF5 library C-API.

## Installation

The installation is pretty simple. You can download binaries from the __[HDFql website](https://www.hdfql.com/)__ for Windows, Linux, and macOS. Extract the zip file and add the `bin` directory to your `PATH` environment variable, and you are ready to go. Without writing any code, you can use the `HDFqlCLI` command-line tool to create, read, and write HDF5 files.

In [None]:
%%bash
wget -q -nc -P build https://www.hdfql.com/releases/2.5.0/HDFql-2.5.0_Linux64.zip
unzip -oq build/HDFql-2.5.0_Linux64.zip -d build

## Reusing the C++ example we already have

Instead of introducing a new language, we'll stick with C++. The sample path generation is the same as before, but we will use the HDFql C++ bindings to write the data to an HDF5 file. Our workhorse will be the `HDFql::execute` function, which takes a string as an argument and executes it as an HDFql statement. The HDFql statements are very similar to SQL statements, but they are not identical. The HDFql website has a __[quick start](https://www.hdfql.com/quickstart)__ where you can look up the syntax.

To make this example a little more interesting, we have added a few details that might trip up unsuspecting users.

1. We show how to pass values from the C++ host language to HDFql statements by using a C++ `ostringstream` object, which you can think of as a C++ `StringBuilder`. (See lines 33 and 39-42 for examples.)
2. `DATASET` is a reserved keyword in HDFql, so we must escape it by using quotation marks. (See line 33 for an example.)
3. We show how to register the array variable `ou_process` with HDFql so that we can use it in the HDFql `CREATE DATASET` statement. (See line 33 for an example.)
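
The first two points can be tried in isolation, without HDFql installed. The following is a minimal sketch that only builds the statement text with an `ostringstream`; the helper names `quoted` and `create_dataset_statement` are hypothetical and are not part of HDFql, and the sketch assumes that the object name itself contains no double quotes.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: wrap an object name in double quotes so that a name
// that collides with a keyword (such as "dataset") is read as a name.
// Assumption: the name itself contains no double quotes.
std::string quoted(const std::string& name)
{
    return "\"" + name + "\"";
}

// Hypothetical helper: build the text of a CREATE DATASET statement for a
// two-dimensional array of doubles. Nothing is executed here.
std::string create_dataset_statement(const std::string& name,
                                     std::size_t rows, std::size_t cols)
{
    std::ostringstream query;
    query << "CREATE DATASET " << quoted(name)
          << " AS DOUBLE(" << rows << ", " << cols << ")";
    return query.str();
}
```

For example, `create_dataset_statement("dataset", 100, 1000)` yields the text `CREATE DATASET "dataset" AS DOUBLE(100, 1000)`, which could then be handed to `HDFql::execute` via `.c_str()`, in the same way the `sstr` macro does in the program below.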

In [None]:
%%writefile src/ou_hdfql.cpp
#include "ou_sampler.hpp"

#include "HDFql.hpp"
#include <iostream>
#include <sstream>
#include <vector>

using namespace std;

#define sstr(x) (query.str(""),query.clear(),query << x,query.str().c_str())

int main()
{
    const size_t path_count = 100, step_count = 1000;
    const double dt = 0.01, theta = 1.0, mu = 0.0, sigma = 0.1;

    cout << "Running with parameters:"
         << " paths=" << path_count << " steps=" << step_count
         << " dt=" << dt << " theta=" << theta << " mu=" << mu << " sigma=" << sigma << endl;

    vector<double> ou_process;
    ou_sampler(ou_process, path_count, step_count, dt, theta, mu, sigma);
    
    //
    // Write the sample paths to an HDF5 file using the HDFql C++ bindings!
    //

    HDFql::execute("CREATE TRUNCATE AND USE FILE ou_hdfql.h5");
    HDFql::execute("CREATE ATTRIBUTE source AS VARCHAR VALUES(\"https://github.com/HDFGroup/hdf5-tutorial\")");

    ostringstream query;
    HDFql::execute(sstr("CREATE DATASET \"dataset\" AS DOUBLE(" << path_count << ", " << step_count << ") VALUES FROM MEMORY " << HDFql::variableTransientRegister(ou_process)));

    HDFql::execute("CREATE ATTRIBUTE dataset/comment AS VARCHAR VALUES(\"This dataset contains sample paths of an Ornstein-Uhlenbeck process.\")");
    HDFql::execute("CREATE ATTRIBUTE dataset/Wikipedia AS VARCHAR VALUES(\"https://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process\")");
    HDFql::execute("CREATE ATTRIBUTE dataset/rows AS VARCHAR VALUES(\"path\")");
    HDFql::execute("CREATE ATTRIBUTE dataset/columns AS VARCHAR VALUES(\"time\")");
    HDFql::execute(sstr("CREATE ATTRIBUTE dataset/dt AS DOUBLE VALUES(" << dt << ")"));
    HDFql::execute(sstr("CREATE ATTRIBUTE dataset/θ AS DOUBLE VALUES(" << theta << ")"));
    HDFql::execute(sstr("CREATE ATTRIBUTE dataset/μ AS DOUBLE VALUES(" << mu << ")"));
    HDFql::execute(sstr("CREATE ATTRIBUTE dataset/σ AS DOUBLE VALUES(" << sigma << ")"));

    HDFql::execute("CLOSE FILE");

    return 0;
}

The beauty of HDFql is that you can reuse the same statements in any of the supported host languages (C, C++, Java, Python, Fortran, C#, and R). What differs between host languages is the syntax for passing values from the host language to HDFql statements.
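
In C++, passing a value as text means formatting it first, and a stream formats a `double` with six significant digits by default. That is harmless for the parameters used here, but it would silently round a value such as `1.0 / 3.0`. The following is a minimal sketch of one way to keep every digit; the helper name `double_attribute_statement` is hypothetical and not part of HDFql, and the sketch only builds the statement text.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: build the text of a CREATE ATTRIBUTE statement for a
// single double. max_digits10 is the number of significant decimal digits
// needed so that the text converts back to exactly the same double.
std::string double_attribute_statement(const std::string& path, double value)
{
    std::ostringstream query;
    query << std::setprecision(std::numeric_limits<double>::max_digits10)
          << "CREATE ATTRIBUTE " << path
          << " AS DOUBLE VALUES(" << value << ")";
    return query.str();
}
```

The precision setting applies only to the local stream inside the helper, so it does not affect how `cout` prints the parameters elsewhere in the program.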

In [None]:
%%bash
g++ -std=c++17 -Wall -pedantic -I./build/hdfql-2.5.0/include -L./build/hdfql-2.5.0/wrapper/cpp -I./include  ./src/ou_hdfql.cpp ./src/ou_sampler.cpp -o ./build/ou_hdfql -lHDFql
export LD_LIBRARY_PATH=/workspaces/hdf5-tutorial/build/hdfql-2.5.0/wrapper/cpp/:$LD_LIBRARY_PATH
./build/ou_hdfql
ls -iks ou_hdfql.h5