# Expressions containing NDArray objects

Python-Blosc2 provides a powerful way to operate on NDArray objects (and other array flavors).  In this section, we will see how to do computations with NDArrays in a simple way.


In [None]:
import numpy as np

import blosc2

## A simple example
First, let's create a couple of NDArrays.

In [None]:
shape = (500, 1000)
a = blosc2.linspace(0, 1, np.prod(shape), dtype=np.float32, shape=shape, urlpath="a.b2nd", mode="w")
b = blosc2.linspace(1, 2, np.prod(shape), dtype=np.float64, shape=shape, urlpath="b.b2nd", mode="w")

Now, let's create an expression that involves `a` and `b`.

In [None]:
c = a**2 + b**2 + 2 * blosc2.sin(a * b) + 1
print(c.info)  # at this stage, the expression has not been computed yet

We see that the outcome of the expression is a `LazyExpr` object.  This object is a placeholder for the actual computation that will be done when we compute it.  This is a powerful feature because it allows us to build complex expressions without actually computing them until we really need the result.

Also, note that you can throw [many math functions](https://www.blosc.org/python-blosc2/reference/array_operations.html) at your expressions. These are mainly the ones supported by [numexpr](https://github.com/pydata/numexpr), plus different reduction operations.

Now, let's compute it. `LazyExpr` objects follow the [LazyArray interface](https://www.blosc.org/python-blosc2/reference/lazyarray.html), which provides several ways to perform the computation, depending on the kind of object we want as output.

First, let's use the `compute` method.  The result will be another NDArray array:

In [None]:
d = c.compute()  # compute the expression
print(f"Type: {type(d)}")
print(f"Compression ratio: {d.schunk.cratio:.2f}x")

Or, we can store the result in a file:

In [None]:
d = c.compute(urlpath="result.b2nd", mode="w")
!ls -lh result.b2nd

Note that the output is written to the file as the computation proceeds; this is an efficient way to store large results on disk.  Incidentally, both operands and results are stored on disk here, so you can operate on very large arrays with a very small memory footprint.

Now, let's compute the expression and store the result in a NumPy array.  For this, we will use the `__getitem__` method:

In [None]:
npd = c[:]  # evaluate the lazy expression into a NumPy array
print(f"Type: {type(npd)}")

As you can see, the result is a NumPy array now.

Depending on your needs, you can choose to get the result as an NDArray or as a NumPy array.  The former is more storage efficient, but the latter is more flexible when interacting with other libraries that do not support NDArrays.

You can also compute just *part* of the expression by passing an item argument to the lazy array:

In [None]:
c[0, :20]  # computes only row 0, columns 0 to 19

## Saving expressions to disk

You can save literal expressions to disk.  For this, use the `save` method of `LazyArray` objects.  For example, let's save the expression `c` to disk:

In [None]:
c = a**2 + b**2 + 2 * blosc2.sin(a * b) + 1
c.save(urlpath="expr.b2nd")

And you can load it back with the `open` function:

In [None]:
c2 = blosc2.open("expr.b2nd")
print(c2.info)

Now, you can compute it as before:

In [None]:
d2 = c2.compute()
print(f"Compression ratio: {d2.schunk.cratio:.2f}x")

## Reductions

We can also perform reductions as part of expressions.  Let's see an example:

In [None]:
c = (a + b).sum()
c

As we can see, the result is a scalar. That means that reductions in expressions always perform the computation immediately.

We can also specify the axis for the reduction:

In [None]:
c = (a + b).sum(axis=1)
print(f"Shape of c: {c.shape}")
# Show the first 4 elements of the result
c[:4]

Reductions can also be part of more complex expressions:

In [None]:
c = (a + b).sum(axis=0) + 2 * a + 1
print(f"Shape of c: {c.shape}")
# Show the first 4 elements of the result
c[0, 0:4]

In particular, note that the result of the reduction above has a different shape than `a`, but the expression is still computed correctly.  This is because the shape of the reduction, `(1000,)`, is *compatible* with (that is, can be broadcast against) the `(500, 1000)` shape of `a`.
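The shape rule at work here can be tried out with plain NumPy. This is only a sketch of the broadcasting semantics, under the assumption that Blosc2 expressions follow the NumPy rules in this case; it uses a tiny `(5, 10)` shape and hypothetical names (`col_sums`, `row_sums`, `axis1_fails`) so the shapes are easy to follow.

```python
import numpy as np

shape = (5, 10)
na = np.linspace(0, 1, np.prod(shape), dtype=np.float32).reshape(shape)
nb = np.linspace(1, 2, np.prod(shape), dtype=np.float64).reshape(shape)

# Reducing over axis 0 leaves one value per column: shape (10,)
col_sums = (na + nb).sum(axis=0)

# (10,) lines up with the last dimension of (5, 10), so it broadcasts
result = col_sums + 2 * na + 1
print(col_sums.shape, result.shape)

# Reducing over axis 1 gives shape (5,), which does NOT line up with (5, 10)
row_sums = (na + nb).sum(axis=1)
try:
    row_sums + 2 * na + 1
    axis1_fails = False
except ValueError:
    axis1_fails = True
print("axis=1 result is incompatible:", axis1_fails)
```

To combine a per-row reduction with the full array, NumPy needs the reduced axis kept, for example `(na + nb).sum(axis=1, keepdims=True)`, which has shape `(5, 1)`.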

## Querying NDArray arrays

A powerful feature of the Blosc2 compute engine is its ability to run queries on NDArrays with structured types.  Let's see an example.

In [None]:
N = 1_000_000
rng = np.random.default_rng(seed=1)
it = ((-x + 1, x - 2, rng.normal()) for x in range(N))
sa = blosc2.fromiter(
    it, dtype=[("A", "i4"), ("B", "f4"), ("C", "f8")], shape=(N,),
    urlpath="sa-1M_tutorial2.b2nd", mode="w"
)
print("First 3 rows:\n", sa[:3])

Now, we can select rows depending on the value of different fields:

In [None]:
A = sa["A"]
B = sa["B"]
C = sa["C"]
expr = sa[A > B]
expr[:]

We can do the same in a more compact way by using an expression in string form inside the brackets:

In [None]:
sa["A > B"][:]

The expression can also be a complex one:

In [None]:
sa["(A > B) & (sin(C) > .5)"][:]
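To see what these queries select, here is a rough NumPy equivalent on a tiny structured array. It is a sketch of the semantics only: Blosc2 evaluates the string form itself, whereas here the masks are spelled out by hand, and `sa_np`, `mask_simple` and `mask_complex` are names made up for this example.

```python
import numpy as np

N = 10
x = np.arange(N)
rng = np.random.default_rng(seed=1)

# Same fields as `sa`, filled with the same formulas (A = -x + 1, B = x - 2)
sa_np = np.empty(N, dtype=[("A", "i4"), ("B", "f4"), ("C", "f8")])
sa_np["A"] = -x + 1
sa_np["B"] = x - 2
sa_np["C"] = rng.normal(size=N)

# Equivalent of sa["A > B"]: a boolean mask built from the fields
mask_simple = sa_np["A"] > sa_np["B"]
selected = sa_np[mask_simple]

# Equivalent of sa["(A > B) & (sin(C) > .5)"]
mask_complex = mask_simple & (np.sin(sa_np["C"]) > 0.5)
selected_complex = sa_np[mask_complex]

print(selected)
print(len(selected_complex))
```

With these formulas, `A > B` means `-x + 1 > x - 2`, which only holds for `x = 0` and `x = 1`, so the first query is very selective no matter how large `N` is.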

We can also query and extract a single field:

In [None]:
sa["C"]["A > B"][:]

And perform reductions on queries on a single field:

In [None]:
sa["C"]["(A < B) & (C > 0)"].mean()

Combining all this weaponry lets you query your data in a simple and efficient way. As the computation is lazy, all the operations are grouped and executed together for maximum performance. The only exception is that, when a reduction is found, it is computed eagerly; but it can still be part of more general expressions, and those expressions can still be saved to and loaded from disk.

## Summary

In this section, we have seen how to perform computations with NDArrays and, in particular, how to create expressions, compute them, and save them to disk. We have also looked at reductions, broadcasting, selections and combinations of these. Lazy expressions allow you to build and compute complex computations from operands that can be in-memory, on-disk or remote (`C2Array`) in a simple and effective way.