In this section, we consider the following variable:
using coordinate_type = xf::xcoordinate<xf::fstring>;
using dimension_type = xf::xdimension<xf::fstring>;
using variable_type = xf::xvariable<double, coordinate_type>;
using data_type = variable_type::data_type;

data_type d = xt::eval(xt::random::rand({6, 3}, 15., 25.));
// d is copied rather than moved because it is reused later in this section
variable_type v(d, {
    {"group", xf::axis({"a", "b", "d", "e", "g", "h"})},
    {"city", xf::axis({"London", "Paris", "Brussels"})}
});
Printing this variable in a Jupyter Notebook gives:
| | London | Paris | Brussels |
---|---|---|---|
a | 16.3548 | 23.3501 | 24.6887 |
b | 17.2103 | 18.0817 | 20.4722 |
d | 16.8838 | 24.9288 | 24.9646 |
e | 24.6769 | 22.2584 | 24.8111 |
g | 16.0986 | 22.9811 | 17.9703 |
h | 15.0478 | 16.1246 | 21.3976 |
xframe provides flexible indexing methods for data selection, similar to the ones of xarray. These methods are summarized in the following table:
Dimension lookup | Index lookup | xvariable syntax |
---|---|---|
Positional | By integer | v(2, 1) |
Positional | By label | v.locate("d", "Paris") |
By name | By integer | v.iselect({{"group", 2}, {"city", 1}}) |
By name | By label | v.select({{"group", "d"}, {"city", "Paris"}}) |
The most basic way to access elements of an xvariable is to use operator(), as you would with an xtensor:
std::cout << v(2, 1) << std::endl;
Unlike in Python, a method in C++ cannot return different types, so operator() only returns single elements. Multi selection is done with free functions that return views on the variable:
#include "xvariable_view.hpp"

auto view1 = xf::ilocate(v, xf::irange(0, 5, 2), xf::irange(1, 3));
std::cout << view1 << std::endl;
| | Paris | Brussels |
---|---|---|
a | 23.3501 | 24.6887 |
d | 24.9288 | 24.9646 |
g | 22.9811 | 17.9703 |
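A view is a proxy on the variable and holds no copy of the data, so a change in the view is reflected in the underlying variable:
view1(0, 1) = 0.;
// Row 0, column 1 of the view is row 0, column 2 of v
std::cout << v(0, 2) << std::endl;
// Outputs 0.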
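To picture why writing through a view changes the variable, here is a minimal standard-library sketch. It is not xframe's implementation: the grid and grid_view types and the write_through_view helper are hypothetical names introduced for illustration. The view stores only the selected positions and a pointer to the data, and translates its own indices into indices of the underlying container.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D container storing its values in row-major order.
struct grid
{
    std::size_t rows;
    std::size_t cols;
    std::vector<double> values;

    double& operator()(std::size_t i, std::size_t j)
    {
        assert(i < rows && j < cols);
        return values[i * cols + j];
    }
};

// Hypothetical view: it keeps the selected row and column positions
// and a pointer to the grid, never a copy of the values.
struct grid_view
{
    grid* base;
    std::vector<std::size_t> row_positions;
    std::vector<std::size_t> col_positions;

    double& operator()(std::size_t i, std::size_t j)
    {
        // Translate view indices into indices of the underlying grid.
        return (*base)(row_positions.at(i), col_positions.at(j));
    }
};

// Builds a 6x3 grid filled with 1, views rows {0, 2, 4} and columns {1, 2},
// writes 0 through the view and returns element (row, col) of the grid.
inline double write_through_view(std::size_t row, std::size_t col)
{
    grid g{6, 3, std::vector<double>(18, 1.0)};
    grid_view view{&g, {0, 2, 4}, {1, 2}};
    view(0, 1) = 0.0;  // element (0, 1) of the view is element (0, 2) of g
    return g(row, col);
}
```

Under these assumptions, write_through_view(0, 2) returns 0, while a position the write did not touch, such as (2, 2), still holds its initial value 1.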
In the code creating the view, irange returns a range slice from xtensor, so any multi selection in xtensor is also supported in xframe.
xvariable also supports label-based indexing, with the locate method for single point selection and the locate free function for multi selection:
std::cout << v.locate("d", "Paris") << std::endl;
auto view2 = xf::locate(v, xf::range("a", "h", 2), xf::range("Paris", "Brussels"));
std::cout << view2 << std::endl;
// Same output as previous code
Be aware of the difference between the range and irange parameters: for the former, which accepts labels, the last value is included, while for the latter, which accepts integral indices, the last value is excluded.
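The difference can be sketched with plain position arithmetic. The helpers irange_positions and range_positions below are hypothetical and only mimic the exclusive and inclusive conventions described above; they are not part of xframe or xtensor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Positions selected by an integer range: stop is excluded.
inline std::vector<std::size_t> irange_positions(std::size_t start,
                                                 std::size_t stop,
                                                 std::size_t step = 1)
{
    assert(step > 0);  // a zero step would never reach stop
    std::vector<std::size_t> positions;
    for (std::size_t i = start; i < stop; i += step)
    {
        positions.push_back(i);
    }
    return positions;
}

// Positions selected by a label range: the position of last is included.
inline std::vector<std::size_t> range_positions(const std::vector<std::string>& labels,
                                                const std::string& first,
                                                const std::string& last,
                                                std::size_t step = 1)
{
    auto position_of = [&labels](const std::string& label) {
        auto it = std::find(labels.begin(), labels.end(), label);
        if (it == labels.end())
        {
            throw std::out_of_range("unknown label: " + label);
        }
        return static_cast<std::size_t>(it - labels.begin());
    };
    // Turning the inclusive end into an exclusive one reuses irange_positions.
    return irange_positions(position_of(first), position_of(last) + 1, step);
}
```

With the group labels of this section, irange_positions(0, 5, 2) and range_positions(labels, "a", "h", 2) both select positions 0, 2 and 4: the integer range stops before 5, whereas the label range would include "h" if the step reached it.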
xframe provides label-based slices similar to those of xtensor, so label-based multi selection is really similar to positional multi selection.
With the dimension names, we do not have to rely on the dimension order. We can use them explicitly to select data. As with positional indexing, xframe provides methods and free functions depending on the kind of selection you want to do:
// Dimension by name, index by position
std::cout << v.iselect({{"city", 1}, {"group", 2}}) << std::endl;
auto view3 = xf::iselect(v, {{"city", xf::irange(1, 3)}, {"group", xf::irange(0, 5, 2)}});

// Dimension by name, index by label
std::cout << v.select({{"city", "Paris"}, {"group", "d"}}) << std::endl;
auto view4 = xf::select(v, {{"city", xf::range("Paris", "Brussels")}, {"group", xf::range("a", "h", 2)}});
// view3 and view4 give the same output as view2 and view1
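One way to think about name-based lookup is as a reordering step performed before a positional access. The sketch below assumes a hypothetical to_positional helper (not an xframe function) that takes the dimension names in storage order and a name-to-index selection, and returns the equivalent positional index.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Reorders a {dimension name -> index} selection into positional order.
// dimension_names lists the dimensions in storage order.
inline std::vector<std::size_t> to_positional(const std::vector<std::string>& dimension_names,
                                              const std::map<std::string, std::size_t>& selection)
{
    // This sketch expects exactly one index per dimension.
    assert(selection.size() == dimension_names.size());
    std::vector<std::size_t> index;
    index.reserve(dimension_names.size());
    for (const auto& name : dimension_names)
    {
        // at() throws std::out_of_range if a dimension is missing from the selection.
        index.push_back(selection.at(name));
    }
    return index;
}
```

For dimensions stored as {"group", "city"}, the selection {{"city", 1}, {"group", 2}} becomes the positional index (2, 1), which is why the order of the pairs in the selection does not matter.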
Contrary to xarray, xframe does not provide a selection operator accepting a map argument.
The drop and keep functions return slices that can be used to create a view with the listed labels along the specified dimensions dropped or kept:
auto view5 = xf::select(v, {{"city", xf::drop("London")}, {"group", xf::keep("a", "d", "g")}});
// view5 is equivalent to view4
This is different from xarray, where the xarray.DataArray.drop method returns a new object. To achieve the same with xframe, simply assign the view to a new xvariable object:
variable_type v2 = view5;
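The effect of the drop and keep slices on an axis can be sketched as a filter on label positions. The functions filter_positions, keep_positions and drop_positions below are hypothetical illustrations written with the standard library only; they say nothing about how xframe implements these slices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Positions of the labels of axis that are listed (keep_listed == true)
// or that are not listed (keep_listed == false).
inline std::vector<std::size_t> filter_positions(const std::vector<std::string>& axis,
                                                 const std::vector<std::string>& listed,
                                                 bool keep_listed)
{
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < axis.size(); ++i)
    {
        const bool is_listed =
            std::find(listed.begin(), listed.end(), axis[i]) != listed.end();
        if (is_listed == keep_listed)
        {
            positions.push_back(i);
        }
    }
    // A filter can only select existing positions of the axis.
    assert(positions.size() <= axis.size());
    return positions;
}

// Positions that remain when only the listed labels are kept.
inline std::vector<std::size_t> keep_positions(const std::vector<std::string>& axis,
                                               const std::vector<std::string>& listed)
{
    return filter_positions(axis, listed, true);
}

// Positions that remain when the listed labels are dropped.
inline std::vector<std::size_t> drop_positions(const std::vector<std::string>& axis,
                                               const std::vector<std::string>& listed)
{
    return filter_positions(axis, listed, false);
}
```

Keeping "a", "d" and "g" on the group axis yields positions 0, 2 and 4, and dropping "London" on the city axis yields positions 1 and 2, the same selection as the range-based views of the previous examples.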
Masking views select data points based on conditions expressed on labels. These conditions can be arbitrarily complicated boolean expressions. Contrary to other views, which are generally a subset of the original data, a masking view has the same shape as its underlying xvariable.
Masking views are created with the where function:
data_type d2 = {{ 1., 2., 3. },
                { 4., 5., 6. },
                { 7., 8., 9. }};
auto v3 = variable_type(
    d2,
    {
        {"x", xf::axis(3)},
        {"y", xf::axis(3)},
    }
);
auto view6 = xf::where(
    v3,
    not_equal(v3.axis<int>("x"), 2) && v3.axis<int>("y") < 2
);
std::cout << view6 << std::endl;
In a Jupyter Notebook, this outputs the following (rows are the x labels, columns the y labels):
| | 0 | 1 | 2 |
---|---|---|---|
0 | 1 | 2 | masked |
1 | 4 | 5 | masked |
2 | masked | masked | masked |
When assigning to a masking view, masked values are not changed. Like other views, a masking view is a proxy on its underlying variable: no copy is made, so changing an unmasked value actually changes the corresponding value in the underlying variable.
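The assignment rule can be illustrated with a mask that has the same shape as the data. This is only a sketch under simplifying assumptions: the data is a flat row-major std::vector, and masked_assign and example_mask are hypothetical helpers, not xframe functions. The mask reproduces the condition of the example above, x != 2 and y < 2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assigns value to every element whose mask entry is true;
// elements whose mask entry is false (masked) are left untouched.
inline void masked_assign(std::vector<double>& data,
                          const std::vector<bool>& mask,
                          double value)
{
    assert(data.size() == mask.size());  // the mask has the shape of the data
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        if (mask[i])
        {
            data[i] = value;
        }
    }
}

// Builds the mask "x != 2 && y < 2" for a rows x cols grid stored in
// row-major order, where x is the row index and y the column index.
inline std::vector<bool> example_mask(std::size_t rows, std::size_t cols)
{
    std::vector<bool> mask(rows * cols, false);
    for (std::size_t x = 0; x < rows; ++x)
    {
        for (std::size_t y = 0; y < cols; ++y)
        {
            mask[x * cols + y] = (x != 2) && (y < 2);
        }
    }
    return mask;
}
```

Applied to the values 1 to 9 of the example, masked_assign(data, example_mask(3, 3), 0.) zeroes the four unmasked entries (1, 2, 4 and 5) and leaves 3, 6, 7, 8 and 9 unchanged.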
Data selection in variables returns either references or views; therefore, contrary to xarray, it is possible to assign values to a subset of a variable with any of the indexing methods:
// The next four lines are equivalent, they change a single value of v:
v(2, 1) = 2.5;
v.locate("d", "Paris") = 2.5;
v.iselect({{"city", 1}, {"group", 2}}) = 2.5;
v.select({{"city", "Paris"}, {"group", "d"}}) = 2.5;

data_type d3 = {{0., 1.}, {2., 3.}, {4., 5.}};
auto v4 = variable_type(
    d3,
    {
        {"group", xf::axis({"a", "d", "g"})},
        {"city", xf::axis({"Paris", "Brussels"})}
    }
);

// The next four lines are equivalent, they change a subset of v
xf::ilocate(v, xf::irange(0, 5, 2), xf::irange(1, 3)) = v4;
xf::locate(v, xf::range("a", "h", 2), xf::range("Paris", "Brussels")) = v4;
xf::iselect(v, {{"city", xf::irange(1, 3)}, {"group", xf::irange(0, 5, 2)}}) = v4;
xf::select(v, {{"city", xf::range("Paris", "Brussels")}, {"group", xf::range("a", "h", 2)}}) = v4;
Printing v after the assignment gives:
| | London | Paris | Brussels |
---|---|---|---|
a | 16.3548 | 0 | 1 |
b | 17.2103 | 18.0817 | 20.4722 |
d | 16.8838 | 2 | 3 |
e | 24.6769 | 22.2584 | 24.8111 |
g | 16.0986 | 4 | 5 |
h | 15.0478 | 16.1246 | 21.3976 |
Reindexing views give a variable a new set of coordinates for the corresponding dimensions. Like other views, no copy is involved. Asking for values corresponding to new labels not found in the original set of coordinates returns missing values. In the next example, we reindex the city dimension:
auto view7 = xf::reindex(v, {{"city", xf::axis({"London", "New York", "Brussels"})}});
| | London | New York | Brussels |
---|---|---|---|
a | 16.3548 | N/A | 24.6887 |
b | 17.2103 | N/A | 20.4722 |
d | 16.8838 | N/A | 24.9646 |
e | 24.6769 | N/A | 24.8111 |
g | 16.0986 | N/A | 17.9703 |
h | 15.0478 | N/A | 21.3976 |
Like xarray, xframe provides the useful reindex_like shortcut, which reindexes a variable given the set of coordinates of another variable:
auto v5 = variable_type(
    d,
    {
        {"group", xf::axis({"a", "b", "d", "e", "g", "h"})},
        {"city", xf::axis({"London", "New York", "Brussels"})}
    }
);
auto view8 = xf::reindex_like(v, v5);
// view8 is equivalent to view7
A reindexing view is read-only: it is not possible to change its values through indexing. This allows a memory optimization: the view does not have to store the missing values, since it can return a proxy to a statically allocated missing value.
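The idea of a shared missing value can be sketched in one dimension with the standard library. Everything below is hypothetical and much simpler than xframe: optional_double stands in for an optional value, labeled_values for a variable, and reindex_view for a read-only reindexing view that returns a reference to a single static missing entry for every label absent from the source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical optional value: a value plus a flag telling whether it is set.
struct optional_double
{
    double value;
    bool has_value;
};

// Hypothetical 1-D labeled data.
struct labeled_values
{
    std::vector<std::string> labels;
    std::vector<optional_double> values;
};

// Hypothetical read-only view that exposes source under new labels.
class reindex_view
{
public:
    reindex_view(const labeled_values& source, std::vector<std::string> new_labels)
        : m_source(&source), m_labels(std::move(new_labels))
    {
        assert(source.labels.size() == source.values.size());
    }

    std::size_t size() const { return m_labels.size(); }

    // Returns a const reference, so the view cannot be used to modify data.
    const optional_double& operator[](std::size_t i) const
    {
        // One statically allocated entry serves every missing label.
        static const optional_double missing{0.0, false};
        const auto& src = m_source->labels;
        const auto it = std::find(src.begin(), src.end(), m_labels.at(i));
        return it == src.end()
            ? missing
            : m_source->values[static_cast<std::size_t>(it - src.begin())];
    }

private:
    const labeled_values* m_source;
    std::vector<std::string> m_labels;
};
```

In this sketch, reindexing the labels London, Paris, Brussels to London, New York, Brussels gives a missing entry for New York without storing anything for it, and two different missing labels return references to the very same object.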
The align function reindexes several variables at once, with more flexible options:
auto t1 = xf::align<xf::join::inner>(v, v5);
std::cout << std::get<0>(t1) << std::endl;
std::cout << std::get<1>(t1) << std::endl;
The last two lines print the same output:
| | London | Brussels |
---|---|---|
a | 16.3548 | 24.6887 |
b | 17.2103 | 20.4722 |
d | 16.8838 | 24.9646 |
e | 24.6769 | 24.8111 |
g | 16.0986 | 17.9703 |
h | 15.0478 | 21.3976 |
In the following, the variables are aligned with respect to the union of the coordinates instead of their intersection:
auto t2 = xf::align<xf::join::outer>(v, v5);
std::cout << std::get<0>(t2) << std::endl;
std::cout << std::get<1>(t2) << std::endl;
The first output is:
| | London | Paris | Brussels | New York |
---|---|---|---|---|
a | 16.3548 | 23.3501 | 24.6887 | N/A |
b | 17.2103 | 18.0817 | 20.4722 | N/A |
d | 16.8838 | 24.9288 | 24.9646 | N/A |
e | 24.6769 | 22.2584 | 24.8111 | N/A |
g | 16.0986 | 22.9811 | 17.9703 | N/A |
h | 15.0478 | 16.1246 | 21.3976 | N/A |
The second output has N/A in the Paris column instead.
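The two join policies can be pictured as set operations on the labels of an axis. The sketch below uses hypothetical inner_join and outer_join helpers on plain label lists. The ordering it produces (labels of the first axis, then labels found only in the second) is an assumption chosen to match the column order printed above, not a statement about xframe's internals.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using label_list = std::vector<std::string>;

inline bool contains(const label_list& axis, const std::string& label)
{
    return std::find(axis.begin(), axis.end(), label) != axis.end();
}

// Inner join: labels present in both axes, in the order of lhs.
inline label_list inner_join(const label_list& lhs, const label_list& rhs)
{
    label_list result;
    for (const auto& label : lhs)
    {
        if (contains(rhs, label))
        {
            result.push_back(label);
        }
    }
    assert(result.size() <= lhs.size());  // an intersection never grows
    return result;
}

// Outer join: labels of lhs, followed by the labels found only in rhs.
inline label_list outer_join(const label_list& lhs, const label_list& rhs)
{
    label_list result = lhs;
    for (const auto& label : rhs)
    {
        if (!contains(result, label))
        {
            result.push_back(label);
        }
    }
    assert(result.size() >= lhs.size());  // a union never shrinks
    return result;
}
```

For the city axes of v and v5, inner_join gives London and Brussels, and outer_join gives London, Paris, Brussels and New York. Each variable is then reindexed on the joined axis, which is where the N/A columns of the outer join come from.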