# NDArray: An N-Dimensional, Compressed Data Container

NDArray objects let you work with compressed, multidimensional data through familiar operations such as setting, copying or slicing. In this section, we will see how to create and manipulate an NDArray in a simple way.


In [1]:
import numpy as np

import blosc2

## Creating an array
First, we create an array, with zeros being used as the default value for uninitialized portions of the array.


In [2]:
array = blosc2.zeros((10000, 10000), dtype=np.int32)
print(array.info)

type    : NDArray
shape   : (10000, 10000)
chunks  : (80, 10000)
blocks  : (1, 10000)
dtype   : int32
cratio  : 100000.00
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=4,
        : nthreads=7, blocksize=40000, splitmode=<SplitMode.ALWAYS_SPLIT: 1>,
        : filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
        : <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
        : 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=7)



Note that all the compression and decompression parameters, as well as the chunk and block shapes, are set to their defaults.
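The figures in the report can be cross-checked by hand. The sketch below uses only values printed by `array.info`; it assumes that `blocksize` is the uncompressed size of one block in bytes (which agrees with every report in this tutorial) and that `cratio` is the uncompressed size divided by the compressed size.

```python
import math

# Values copied from the report above.
shape, chunks, blocks = (10000, 10000), (80, 10000), (1, 10000)
typesize = 4  # bytes per int32 item

# Uncompressed size of one block; matches blocksize=40000 in the report.
blocksize = math.prod(blocks) * typesize

# Number of chunks along each dimension (ceiling division covers partial chunks).
chunk_grid = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
nchunks = math.prod(chunk_grid)

# Total uncompressed size, and the compressed size implied by cratio=100000.
nbytes = math.prod(shape) * typesize
approx_cbytes = nbytes / 100000.0

print(blocksize, chunk_grid, nchunks, nbytes, approx_cbytes)
```

In other words, about 400 MB of zeros is held in 125 chunks that together take only a few kilobytes.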

## Reading and writing data
We can write, read and modify NDArray arrays using NumPy arrays as the data source.

In [3]:
array[0, :] = np.arange(10000, dtype=array.dtype)
array[:, 0] = np.arange(10000, dtype=array.dtype)

In [4]:
array[0, 0]

array(0, dtype=int32)

In [5]:
array[0, :]

array([   0,    1,    2, ..., 9997, 9998, 9999], dtype=int32)

In [6]:
array[:, 0]

array([   0,    1,    2, ..., 9997, 9998, 9999], dtype=int32)

## Persistent data
When we create an NDArray, we can specify where it will be stored, along with all the compression, decompression and other storage parameters.
To store an array on disk, we just have to pass a `urlpath` telling where to save the new array. For NDArrays, it is usual to use the `.b2nd` extension.

In [7]:
array = blosc2.full(
    (1000, 1000),
    fill_value=b"pepe",
    chunks=(100, 100),
    blocks=(50, 50),
    urlpath="ndarray_tutorial.b2nd",
    mode="w",
)
print(array.info)

type    : NDArray
shape   : (1000, 1000)
chunks  : (100, 100)
blocks  : (50, 50)
dtype   : |S4
cratio  : 1111.11
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=4,
        : nthreads=7, blocksize=10000, splitmode=<SplitMode.ALWAYS_SPLIT: 1>,
        : filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
        : <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
        : 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=7)



This time we have set the chunk and block shapes explicitly. Since the array is stored on disk, you can open and access it whenever you like.

In [8]:
array2 = blosc2.open("ndarray_tutorial.b2nd")
print(array2.info)

type    : NDArray
shape   : (1000, 1000)
chunks  : (100, 100)
blocks  : (50, 50)
dtype   : |S4
cratio  : 1111.11
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=4,
        : nthreads=1, blocksize=10000, splitmode=<SplitMode.ALWAYS_SPLIT: 1>,
        : filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
        : <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
        : 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=1)



## Compression params
When we make a copy of an NDArray, we can easily change its compression parameters.

In [9]:
b = np.arange(1000000).tobytes()
array1 = blosc2.frombuffer(b, shape=(1000, 1000), dtype=np.int64, chunks=(500, 10), blocks=(50, 10))
print(array1.info)

type    : NDArray
shape   : (1000, 1000)
chunks  : (500, 10)
blocks  : (50, 10)
dtype   : int64
cratio  : 7.45
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=8,
        : nthreads=7, blocksize=4000, splitmode=<SplitMode.ALWAYS_SPLIT: 1>,
        : filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
        : <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
        : 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=7)



In [10]:
cparams = blosc2.CParams(
    codec=blosc2.Codec.ZSTD,
    clevel=9,
    filters=[blosc2.Filter.BITSHUFFLE],
    filters_meta=[0],
)

array2 = array1.copy(chunks=(500, 10), blocks=(50, 10), cparams=cparams)
print(array2.info)

type    : NDArray
shape   : (1000, 1000)
chunks  : (500, 10)
blocks  : (50, 10)
dtype   : int64
cratio  : 7.49
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=8,
        : nthreads=7, blocksize=4000, splitmode=<SplitMode.ALWAYS_SPLIT: 1>,
        : filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
        : <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
        : 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=7)



## Metalayers and variable length metalayers

We have seen that you can pass any compression or decompression parameters when creating an NDArray. Now we will add metalayers. Metalayers are small pieces of metadata that describe the data stored in a container. As explained in [the SChunk tutorial](07.schunk-basics.html), there are two kinds of metalayers. The first one (`meta`) is the system one: it must be added at construction time, cannot be deleted, and can only be updated with values that have the same size in bytes as the old value. These metalayers are easy for users to access and edit:

In [11]:
meta = {"dtype": "i8", "coords": [5.14, 23.0]}
array = blosc2.zeros((1000, 1000), dtype=np.int16, chunks=(100, 100), blocks=(50, 50), meta=meta)

You can work with them as if you were working with a dictionary. To access this dictionary, use the `schunk` attribute of the NDArray.

In [12]:
array.schunk.meta

<blosc2.schunk.Meta at 0x106fa7c10>

In [13]:
array.schunk.meta.keys()

['b2nd', 'dtype', 'coords']

As you can see, Blosc2 internally uses such metalayers to store the shape, ndim, dtype, etc., and retrieves this data when needed. For example, the `b2nd` metalayer holds this info.

In [14]:
array.schunk.meta["b2nd"]

[0, 2, [1000, 1000], [100, 100], [50, 50], 0, '<i2']
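The list is easier to read with names attached. The sketch below assumes the entries are, in order, a format version, the number of dimensions, the shape, the chunk shape, the block shape, a dtype format code and the dtype string; the field names are labels chosen here for readability, not identifiers from the library.

```python
# The list printed above, copied verbatim.
b2nd = [0, 2, [1000, 1000], [100, 100], [50, 50], 0, "<i2"]

# Field names are an interpretation for readability, not blosc2 identifiers.
field_names = ("version", "ndim", "shape", "chunks", "blocks", "dtype_format", "dtype")
layout = dict(zip(field_names, b2nd))

# Sanity checks that tie the metalayer back to the array created above.
assert layout["ndim"] == len(layout["shape"])
assert layout["dtype"] == "<i2"  # little-endian 2-byte integer, i.e. np.int16

for name, value in layout.items():
    print(f"{name:>12}: {value}")
```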

And we can look at our own user meta:

In [15]:
array.schunk.meta["coords"]

[5.14, 23.0]

To add a metalayer after creation, or to store a variable length metalayer, you can use the `vlmeta` accessor of the SChunk. Like `meta`, it works as a dictionary.

In [16]:
print(array.schunk.vlmeta.getall())
array.schunk.vlmeta["info1"] = "This is an example"
array.schunk.vlmeta["info2"] = "of user meta handling"
array.schunk.vlmeta.getall()

{}


{b'info1': 'This is an example', b'info2': 'of user meta handling'}

You can update them with a value larger than the original one:

In [17]:
array.schunk.vlmeta["info1"] = "This is a larger example"
array.schunk.vlmeta.getall()

{b'info1': 'This is a larger example', b'info2': 'of user meta handling'}

You can store any kind of data in the `vlmeta` metalayer, as long as it is serializable with msgpack.  This is a very flexible way to store metadata in a Blosc2 container.

In [18]:
array.schunk.vlmeta["info3"] = {"a": 1, "b": 2}
array.schunk.vlmeta.getall()

{b'info1': 'This is a larger example',
 b'info2': 'of user meta handling',
 b'info3': {'a': 1, 'b': 2}}

Variable length metadata can be deleted:

In [19]:
del array.schunk.vlmeta["info1"]
array.schunk.vlmeta.getall()

{b'info2': 'of user meta handling', b'info3': {'a': 1, 'b': 2}}

This is very useful to store metadata that is not known at the time of creation of the container, or that can be updated or deleted at any time.

## Creating an NDArray from a NumPy array

Let's create an NDArray from a NumPy array using the `asarray` constructor:

In [20]:
shape = (100, 100, 100)
dtype = np.float64
nparray = np.linspace(0, 100, np.prod(shape), dtype=dtype).reshape(shape)
b2array = blosc2.asarray(nparray)
print(b2array.info)
b2array[0, 0, :4]

type    : NDArray
shape   : (100, 100, 100)
chunks  : (50, 100, 100)
blocks  : (1, 100, 100)
dtype   : float64
cratio  : 13.73
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=8,
        : nthreads=7, blocksize=80000, splitmode=<SplitMode.ALWAYS_SPLIT: 1>,
        : filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
        : <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
        : 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=7)



array([0.    , 0.0001, 0.0002, 0.0003])

## Building an NDArray from an iterator

Finally, let's see how you can create an NDArray filled with data from an iterator, store it in a file, and reopen it. Let's create a structured array with 3 fields and 1 million elements.

In [21]:
N = 1_000_000
rng = np.random.default_rng()
it = ((-x + 1, x - 2, rng.normal()) for x in range(N))
%time sa = blosc2.fromiter(it, dtype='i4,f4,f8', shape=(N,), urlpath="sa-1M.b2nd", mode="w")
!ls -lh sa-1M.b2nd
sa2 = blosc2.open("sa-1M.b2nd")
sa2.info

CPU times: user 514 ms, sys: 29.8 ms, total: 544 ms
Wall time: 523 ms
Detected ARM ...
-rw-r--r--@ 1 francesc  staff   7.0M Nov 28 04:50 sa-1M.b2nd


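type    : NDArray
shape   : (1000000,)
chunks  : (125000,)
blocks  : (4000,)
dtype   : [('f0', '<i4'), ('f1', '<f4'), ('f2', '<f8')]
cratio  : 2.24
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=16,
        : nthreads=1, blocksize=64000, splitmode=<SplitMode.ALWAYS_SPLIT: 1>,
        : filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
        : <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
        : 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=1)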


That's all for now.  There are more examples in the [examples directory of the git repository](https://github.com/Blosc/python-blosc2/tree/main/examples/ndarray) for you to explore.  Enjoy!