Earlier we have seen, how ulab
’s functions and methods can be accessed in micropython
. This last section of the book explains, how these functions are implemented. By the end of this chapter, not only would you be able to extend ulab
, and write your own numpy
-compatible functions, but through a deeper understanding of the inner workings of the functions, you would also be able to see what the trade-offs are at the python
level.
As mentioned earlier, the python
functions are organised into sub-modules at the C level. The C sub-modules can be found in ./ulab/code/
.
ndarrays
are efficient containers of numerical data of the same type (i.e., signed/unsigned chars, signed/unsigned integers or mp_float_t
s, which, depending on the platform, are either C float
s, or C double
s). Beyond storing the actual data in the void pointer *array
, the type definition has eight additional members (on top of the base
type). Namely, the dtype
, which tells us, how the bytes are to be interpreted. Moreover, the itemsize
, which stores the size of a single entry in the array, boolean
, an unsigned integer, which determines, whether the arrays is to be treated as a set of Booleans, or as numerical data, ndim
, the number of dimensions (uint8_t
), len
, the length of the array (the number of entries), the shape (*size_t
), the strides (*int32_t
). The length is simply the product of the numbers in shape
.
The type definition is as follows:
typedef struct _ndarray_obj_t {
mp_obj_base_t base;
uint8_t dtype;
uint8_t itemsize;
uint8_t boolean;
uint8_t ndim;
size_t len;
size_t shape[ULAB_MAX_DIMS];
int32_t strides[ULAB_MAX_DIMS];
void *array;
} ndarray_obj_t;
The values of an ndarray
are stored in a contiguous segment in the RAM. The ndarray
can be dense, meaning that all numbers in the linear memory segment belong to a linar combination of coordinates, and it can also be sparse, i.e., some elements of the linear storage space will be skipped, when the elements of the tensor are traversed.
In the RAM, the position of the item M(n1, n2, ..., nk − 1, nk) in a dense tensor of rank k is given by the linear combination
\begin{equation}
P(n_1, n_2, ..., n_{k-1}, n_k) = n_1 s_1 + n_2 s_2 + ... + n_{k-1}s_{k-1} + n_ks_k = \sum_{i=1}^{k}n_is_i
\end{equation}
where si are the strides of the tensor, defined as
\begin{equation}
s_i = \prod_{j=i+1}^k l_j
\end{equation}
where lj is length of the tensor along the jth axis. When the tensor is sparse (e.g., when the tensor is sliced), the strides along a particular axis will be multiplied by a non-zero integer. If this integer is different to ± 1, the linear combination above cannot access all elements in the RAM, i.e., some numbers will be skipped. Note that |s1| > |s2| > ... > |sk − 1| > |sk|, even if the tensor is sparse. The statement is trivial for dense tensors, and it follows from the definition of si. For sparse tensors, a slice cannot have a step larger than the shape along that axis. But for dense tensors, si/si + 1 = li.
When creating a view, we simply re-calculate the strides
, and re-set the *array
pointer.
The shape
and strides
members of the array tell us how we have to move our pointer, when we want to read out the numbers. For technical reasons that will become clear later, the numbers in shape
and in strides
are aligned to the right, and begin on the right hand side, i.e., if the number of possible dimensions is ULAB_MAX_DIMS
, then shape[ULAB_MAX_DIMS-1]
is the length of the last axis, shape[ULAB_MAX_DIMS-2]
is the length of the last but one axis, and so on. If the number of actual dimensions, ndim < ULAB_MAX_DIMS
, the first ULAB_MAX_DIMS - ndim
entries in shape
and strides
will be equal to zero, but they could, in fact, be assigned any value, because these will never be accessed in an operation.
With this definition of the strides, the linear combination in P(n1, n2, ..., nk − 1, nk) is a one-to-one mapping from the space of tensor coordinates, (n1, n2, ..., nk − 1, nk), and the coordinate in the linear array, n1s1 + n2s2 + ... + nk − 1sk − 1 + nksk, i.e., no two distinct sets of coordinates will result in the same position in the linear array.
Since the strides
are given in terms of bytes, when we iterate over an array, the void data pointer is usually cast to uint8_t
, and the values are converted using the proper data type stored in ndarray->dtype
. However, there might be cases, when it makes perfect sense to cast *array
to a different type, in which case the strides
have to be re-scaled by the value of ndarray->itemsize
.
The following macro definition is taken from vector.h, and demonstrates, how we can iterate over a single array in four dimensions.
#define ITERATE_VECTOR(type, array, source, sarray) do {
size_t i=0;
do {
size_t j = 0;
do {
size_t k = 0;
do {
size_t l = 0;
do {
*(array)++ = f(*((type *)(sarray)));
(sarray) += (source)->strides[ULAB_MAX_DIMS - 1];
l++;
} while(l < (source)->shape[ULAB_MAX_DIMS-1]);
(sarray) -= (source)->strides[ULAB_MAX_DIMS - 1] * (source)->shape[ULAB_MAX_DIMS-1];
(sarray) += (source)->strides[ULAB_MAX_DIMS - 2];
k++;
} while(k < (source)->shape[ULAB_MAX_DIMS-2]);
(sarray) -= (source)->strides[ULAB_MAX_DIMS - 2] * (source)->shape[ULAB_MAX_DIMS-2];
(sarray) += (source)->strides[ULAB_MAX_DIMS - 3];
j++;
} while(j < (source)->shape[ULAB_MAX_DIMS-3]);
(sarray) -= (source)->strides[ULAB_MAX_DIMS - 3] * (source)->shape[ULAB_MAX_DIMS-3];
(sarray) += (source)->strides[ULAB_MAX_DIMS - 4];
i++;
} while(i < (source)->shape[ULAB_MAX_DIMS-4]);
} while(0)
We start with the innermost loop, the one recursing l
. array
is already of type mp_float_t
, while the source array, sarray
, has been cast to uint8_t
in the calling function. The numbers contained in sarray
have to be read out in the proper type dictated by ndarray->dtype
. This is what happens in the statement *((type *)(sarray))
, and this number is then fed into the function f
. Vectorised mathematical functions produce dense arrays, and for this reason, we can simply advance the array
pointer.
The advancing of the sarray
pointer is a bit more involving: first, in the innermost loop, we simply move forward by the amount given by the last stride, which is (source)->strides[ULAB_MAX_DIMS - 1]
, because the shape
and the strides
are aligned to the right. We move the pointer as many times as given by (source)->shape[ULAB_MAX_DIMS-1]
, which is the length of the very last axis. Hence the the structure of the loop
size_t l = 0;
do {
...
l++;
} while(l < (source)->shape[ULAB_MAX_DIMS-1]);
Once we have exhausted the last axis, we have to re-wind the pointer, and advance it by an amount given by the last but one stride. Keep in mind that in the the innermost loop we moved our pointer (source)->shape[ULAB_MAX_DIMS-1]
times by (source)->strides[ULAB_MAX_DIMS - 1]
, i.e., we re-wind it by moving it backwards by (source)->strides[ULAB_MAX_DIMS - 1] * (source)->shape[ULAB_MAX_DIMS-1]
. In the next step, we move forward by (source)->strides[ULAB_MAX_DIMS - 2]
, which is the last but one stride.
(sarray) -= (source)->strides[ULAB_MAX_DIMS - 1] * (source)->shape[ULAB_MAX_DIMS-1];
(sarray) += (source)->strides[ULAB_MAX_DIMS - 2];
This pattern must be repeated for each axis of the array, and this is how we arrive at the four nested loops listed above.
In addition to un-wrapping the iteration loops by means of macros, there is another way of traversing all elements of a tensor: we note that, since |s1| > |s2| > ... > |sk − 1| > |sk|, P(n1, n2, ..., nk − 1, nk) changes most slowly in the last coordinate. Hence, if we start from the very beginning, (ni = 0 for all i), and walk along the linear RAM segment, we increment the value of nk as long as nk < lk. Once nk = lk, we have to reset nk to 0, and increment nk − 1 by one. After each such round, nk − 1 will be incremented by one, as long as nk − 1 < lk − 1. Once nk − 1 = lk − 1, we reset both nk, and nk − 1 to 0, and increment nk − 2 by one.
Rewinding the arrays in this way is implemented in the function ndarray_rewind_array
in ndarray.c.
void ndarray_rewind_array(uint8_t ndim, uint8_t *array, size_t *shape, int32_t *strides, size_t *coords) {
// resets the data pointer of a single array, whenever an axis is full
// since we always iterate over the very last axis, we have to keep track of
// the last ndim-2 axes only
array -= shape[ULAB_MAX_DIMS - 1] * strides[ULAB_MAX_DIMS - 1];
array += strides[ULAB_MAX_DIMS - 2];
for(uint8_t i=1; i < ndim-1; i++) {
coords[ULAB_MAX_DIMS - 1 - i] += 1;
if(coords[ULAB_MAX_DIMS - 1 - i] == shape[ULAB_MAX_DIMS - 1 - i]) { // we are at a dimension boundary
array -= shape[ULAB_MAX_DIMS - 1 - i] * strides[ULAB_MAX_DIMS - 1 - i];
array += strides[ULAB_MAX_DIMS - 2 - i];
coords[ULAB_MAX_DIMS - 1 - i] = 0;
coords[ULAB_MAX_DIMS - 2 - i] += 1;
} else { // coordinates can change only, if the last coordinate changes
return;
}
}
}
and the function would be called as in the snippet below. Note that the innermost loop is factored out, so that we can save the if(...)
statement for the last axis.
size_t *coords = ndarray_new_coords(results->ndim);
for(size_t i=0; i < results->len/results->shape[ULAB_MAX_DIMS -1]; i++) {
size_t l = 0;
do {
...
l++;
} while(l < results->shape[ULAB_MAX_DIMS - 1]);
ndarray_rewind_array(results->ndim, array, results->shape, strides, coords);
} while(0)
The advantage of this method is that the implementation is independent of the number of dimensions: the iteration requires more or less the same flash space for 2 dimensions as for 22. However, the price we have to pay for this convenience is the extra function call.
Whenever we invoke a binary operator, call a function with two arguments of ndarray
type, or assign something to an ndarray
, we have to iterate over two views at the same time. The task is trivial, if the two ndarray
s in question have the same shape (but not necessarily the same set of strides), because in this case, we can still iterate in the same loop. All that happens is that we move two data pointers in sync.
The problem becomes a bit more involving, when the shapes of the two ndarray
s are not identical. For such cases, numpy
defines so-called broadcasting, which boils down to two rules.
- The shapes in the tensor with lower rank has to be prepended with axes of size 1 till the two ranks become equal.
- Along all axes the two tensors should have the same size, or one of the sizes must be 1.
If, after applying the first rule the second is not satisfied, the two ndarray
s cannot be broadcast together.
Now, let us suppose that we have two compatible ndarray
s, i.e., after applying the first rule, the second is satisfied. How do we iterate over the elements in the tensors?
We should recall, what exactly we do, when iterating over a single array: normally, we move the data pointer by the last stride, except, when we arrive at a dimension boundary (when the last axis is exhausted). At that point, we move the pointer by an amount dictated by the strides. And this is the key: dictated by the strides. Now, if we have two arrays that are originally not compatible, we define new strides for them, and use these in the iteration. With that, we are back to the case, where we had two compatible arrays.
Now, let us look at the second broadcasting rule: if the two arrays have the same size, we take both ndarray
s’ strides along that axis. If, on the other hand, one of the ndarray
s is of length 1 along one of its axes, we set the corresponding strides to 0. This will ensure that that data pointer is not moved, when we iterate over both ndarray
s at the same time.
Thus, in order to implement broadcasting, we first have to check, whether the two above-mentioned rules can be satisfied, and if so, we have to find the two new sets strides.
The ndarray_can_broadcast
function from ndarray.c takes two ndarray
s, and returns true
, if the two arrays can be broadcast together. At the same time, it also calculates new strides for the two arrays, so that they can be iterated over at the same time.
bool ndarray_can_broadcast(ndarray_obj_t *lhs, ndarray_obj_t *rhs, uint8_t *ndim, size_t *shape, int32_t *lstrides, int32_t *rstrides) {
// returns True or False, depending on, whether the two arrays can be broadcast together
// numpy's broadcasting rules are as follows:
//
// 1. the two shapes are either equal
// 2. one of the shapes is 1
memset(lstrides, 0, sizeof(size_t)*ULAB_MAX_DIMS);
memset(rstrides, 0, sizeof(size_t)*ULAB_MAX_DIMS);
lstrides[ULAB_MAX_DIMS - 1] = lhs->strides[ULAB_MAX_DIMS - 1];
rstrides[ULAB_MAX_DIMS - 1] = rhs->strides[ULAB_MAX_DIMS - 1];
for(uint8_t i=ULAB_MAX_DIMS; i > 0; i--) {
if((lhs->shape[i-1] == rhs->shape[i-1]) || (lhs->shape[i-1] == 0) || (lhs->shape[i-1] == 1) ||
(rhs->shape[i-1] == 0) || (rhs->shape[i-1] == 1)) {
shape[i-1] = MAX(lhs->shape[i-1], rhs->shape[i-1]);
if(shape[i-1] > 0) (*ndim)++;
if(lhs->shape[i-1] < 2) {
lstrides[i-1] = 0;
} else {
lstrides[i-1] = lhs->strides[i-1];
}
if(rhs->shape[i-1] < 2) {
rstrides[i-1] = 0;
} else {
rstrides[i-1] = rhs->strides[i-1];
}
} else {
return false;
}
}
return true;
}
A good example of how the function would be called can be found in vector.c, in the vector_arctan2
function:
mp_obj_t vectorise_arctan2(mp_obj_t y, mp_obj_t x) {
...
uint8_t ndim = 0;
size_t *shape = m_new(size_t, ULAB_MAX_DIMS);
int32_t *xstrides = m_new(int32_t, ULAB_MAX_DIMS);
int32_t *ystrides = m_new(int32_t, ULAB_MAX_DIMS);
if(!ndarray_can_broadcast(ndarray_x, ndarray_y, &ndim, shape, xstrides, ystrides)) {
mp_raise_ValueError(translate("operands could not be broadcast together"));
m_del(size_t, shape, ULAB_MAX_DIMS);
m_del(int32_t, xstrides, ULAB_MAX_DIMS);
m_del(int32_t, ystrides, ULAB_MAX_DIMS);
}
uint8_t *xarray = (uint8_t *)ndarray_x->array;
uint8_t *yarray = (uint8_t *)ndarray_y->array;
ndarray_obj_t *results = ndarray_new_dense_ndarray(ndim, shape, NDARRAY_FLOAT);
mp_float_t *rarray = (mp_float_t *)results->array;
...
After the new strides have been calculated, the iteration loop is identical to what we discussed in the previous section.
There are many operations that reduce the number of dimensions of an ndarray
by 1, i.e., that remove an axis from the tensor. The drill is the same as before, with the exception that first we have to remove the strides
and shape
that corresponds to the axis along which we intend to contract. The numerical_reduce_axes
function from numerical.c does that.
static void numerical_reduce_axes(ndarray_obj_t *ndarray, int8_t axis, size_t *shape, int32_t *strides) {
// removes the values corresponding to a single axis from the shape and strides array
uint8_t index = ULAB_MAX_DIMS - ndarray->ndim + axis;
if((ndarray->ndim == 1) && (axis == 0)) {
index = 0;
shape[ULAB_MAX_DIMS - 1] = 0;
return;
}
for(uint8_t i = ULAB_MAX_DIMS - 1; i > 0; i--) {
if(i > index) {
shape[i] = ndarray->shape[i];
strides[i] = ndarray->strides[i];
} else {
shape[i] = ndarray->shape[i-1];
strides[i] = ndarray->strides[i-1];
}
}
}
Once the reduced strides
and shape
are known, we place the axis in question in the innermost loop, and wrap it with the loops, whose coordinates are in the strides
, and shape
arrays. The RUN_STD
macro from numerical.h is a good example. The macro is expanded in the numerical_sum_mean_std_ndarray
function.
static mp_obj_t numerical_sum_mean_std_ndarray(ndarray_obj_t *ndarray, mp_obj_t axis, uint8_t optype, size_t ddof) {
uint8_t *array = (uint8_t *)ndarray->array;
size_t *shape = m_new(size_t, ULAB_MAX_DIMS);
memset(shape, 0, sizeof(size_t)*ULAB_MAX_DIMS);
int32_t *strides = m_new(int32_t, ULAB_MAX_DIMS);
memset(strides, 0, sizeof(uint32_t)*ULAB_MAX_DIMS);
int8_t ax = mp_obj_get_int(axis);
if(ax < 0) ax += ndarray->ndim;
if((ax < 0) || (ax > ndarray->ndim - 1)) {
mp_raise_ValueError(translate("index out of range"));
}
numerical_reduce_axes(ndarray, ax, shape, strides);
uint8_t index = ULAB_MAX_DIMS - ndarray->ndim + ax;
ndarray_obj_t *results = NULL;
uint8_t *rarray = NULL;
...
Here is the macro for the three-dimensional case:
#define RUN_STD(ndarray, type, array, results, r, shape, strides, index, div) do {
size_t k = 0;
do {
size_t l = 0;
do {
RUN_STD1((ndarray), type, (array), (results), (r), (index), (div));
(array) -= (ndarray)->strides[(index)] * (ndarray)->shape[(index)];
(array) += (strides)[ULAB_MAX_DIMS - 1];
l++;
} while(l < (shape)[ULAB_MAX_DIMS - 1]);
(array) -= (strides)[ULAB_MAX_DIMS - 2] * (shape)[ULAB_MAX_DIMS-2];
(array) += (strides)[ULAB_MAX_DIMS - 3];
k++;
} while(k < (shape)[ULAB_MAX_DIMS - 2]);
} while(0)
In RUN_STD
, we simply move our pointers; the calculation itself happens in the RUN_STD1
macro below. (Note that this is the implementation of the numerically stable Welford algorithm.)
#define RUN_STD1(ndarray, type, array, results, r, index, div)
({
mp_float_t M, m, S = 0.0, s = 0.0;
M = m = *(mp_float_t *)((type *)(array));
for(size_t i=1; i < (ndarray)->shape[(index)]; i++) {
(array) += (ndarray)->strides[(index)];
mp_float_t value = *(mp_float_t *)((type *)(array));
m = M + (value - M) / (mp_float_t)i;
s = S + (value - M) * (value - m);
M = m;
S = s;
}
(array) += (ndarray)->strides[(index)];
*(r)++ = MICROPY_FLOAT_C_FUN(sqrt)((ndarray)->shape[(index)] * s / (div));
})
When in an operation the dtype
s of two arrays are different, the result’s dtype
will be decided by the following upcasting rules:
- Operations with two
ndarray
s of the samedtype
preserve theirdtype
, even when the results overflow. - if either of the operands is a float, the result automatically becomes a float
- otherwise
uint8
+int8
=>int16
,uint8
+int16
=>int16
uint8
+uint16
=>uint16
int8
+int16
=>int16
int8
+uint16
=>uint16
(in numpy, the result is aint32
)uint16
+int16
=>float
(in numpy, the result is aint32
)
- When one operand of a binary operation is a generic scalar
micropython
variable, i.e.,mp_obj_int
, ormp_obj_float
, it will be converted to a linear array of length 1, and with the smallestdtype
that can accommodate the variable in question. After that the broadcasting rules apply, as described in the section Iterating over two ndarrays simultaneously: broadcasting
Upcasting is resolved in place, wherever it is required. Notable examples can be found in ndarray_operators.c
An ndarray
can be indexed with three types of objects: integer scalars, slices, and another ndarray
, whose elements are either integer scalars, or Booleans. Since slice and integer indices can be thought of as modifications of the strides
, these indices return a view of the ndarray
. This statement does not hold for ndarray
indices, and therefore, the return a copy of the array.
The user
module is disabled by default, as can be seen from the last couple of lines of ulab.h
// user-defined module
#ifndef ULAB_USER_MODULE
#define ULAB_USER_MODULE (0)
#endif
The module contains a very simple function, user_dummy
, and this function is bound to the module itself. In other words, even if the module is enabled, one has to import
:
import ulab
from ulab import user
user.dummy_function(2.5)
which should just return 5.0. Even if numpy
-compatibility is required (i.e., if most functions are bound at the top level to ulab
directly), having to import
the module has a great advantage. Namely, only the user.h and user.c files have to be modified, thus it should be relatively straightforward to update your local copy from github.
Now, let us see, how we can add a more meaningful function.
In the General comments sections we have seen the type definition of an ndarray
. This structure can be generated by means of a couple of functions listed in ndarray.c.
The ndarray_new_ndarray
functions is called by all other array-generating functions. It takes the number of dimensions, ndim
, a uint8_t
, the shape
, a pointer to size_t
, the strides
, a pointer to int32_t
, and dtype
, another uint8_t
as its arguments, and returns a new array with all entries initialised to 0.
Assuming that ULAB_MAX_DIMS > 2
, a new dense array of dimension 3, of shape
(3, 4, 5), of strides
(1000, 200, 10), and dtype
uint16_t
can be generated by the following instructions
size_t *shape = m_new(size_t, ULAB_MAX_DIMS);
shape[ULAB_MAX_DIMS - 1] = 5;
shape[ULAB_MAX_DIMS - 2] = 4;
shape[ULAB_MAX_DIMS - 3] = 3;
int32_t *strides = m_new(int32_t, ULAB_MAX_DIMS);
strides[ULAB_MAX_DIMS - 1] = 10;
strides[ULAB_MAX_DIMS - 2] = 200;
strides[ULAB_MAX_DIMS - 3] = 1000;
ndarray_obj_t *new_ndarray = ndarray_new_ndarray(3, shape, strides, NDARRAY_UINT16);
The functions simply calculates the strides
from the shape
, and calls ndarray_new_ndarray
. Assuming that ULAB_MAX_DIMS > 2
, a new dense array of dimension 3, of shape
(3, 4, 5), and dtype
mp_float_t
can be generated by the following instructions
size_t *shape = m_new(size_t, ULAB_MAX_DIMS);
shape[ULAB_MAX_DIMS - 1] = 5;
shape[ULAB_MAX_DIMS - 2] = 4;
shape[ULAB_MAX_DIMS - 3] = 3;
ndarray_obj_t *new_ndarray = ndarray_new_dense_ndarray(3, shape, NDARRAY_FLOAT);
Since the dimensions of a linear array are known (1), the ndarray_new_linear_array
takes the length
, a size_t
, and the dtype
, an uint8_t
. Internally, ndarray_new_linear_array
generates the shape
array, and calls ndarray_new_dense_array
with ndim = 1
.
A linear array of length 100, and dtype
uint8
could be created by the function call
ndarray_obj_t *new_ndarray = ndarray_new_linear_array(100, NDARRAY_UINT8)
This function takes a tuple
, which should hold the lengths of the axes (in other words, the shape
), and the dtype
, and calls internally ndarray_new_dense_array
. A new ndarray
can be generated by calling
ndarray_obj_t *new_ndarray = ndarray_new_ndarray_from_tuple(shape, NDARRAY_FLOAT);
where shape
is a tuple.
This function crates a view, and takes the source, an ndarray
, the number of dimensions, an uint8_t
, the shape
, a pointer to size_t
, the strides
, a pointer to int32_t
, and the offset, an int32_t
as arguments. The offset is the number of bytes by which the void array
pointer is shifted. E.g., the python
statement
a = np.array([0, 1, 2, 3, 4, 5], dtype=uint8)
b = a[1::2]
produces the array
array([1, 3, 5], dtype=uint8)
which holds its data at position x0 + 1
, if a
’s pointer is at x0
. In this particular case, the offset is 1.
The array b
from the example above could be generated as
size_t *shape = m_new(size_t, ULAB_MAX_DIMS);
shape[ULAB_MAX_DIMS - 1] = 3;
int32_t *strides = m_new(int32_t, ULAB_MAX_DIMS);
strides[ULAB_MAX_DIMS - 1] = 2;
int32_t offset = 1;
uint8_t ndim = 1;
ndarray_obj_t *new_ndarray = ndarray_new_view(ndarray_a, ndim, shape, strides, offset);
The ndarray_copy_array
function can be used for copying the contents of an array. Note that the target array has to be created beforehand. E.g., a one-to-one copy can be gotten by
ndarray_obj_t *new_ndarray = ndarray_new_ndarray(source->ndim, source->shape, source->strides, source->dtype);
ndarray_copy_array(source, new_ndarray);
Note that the function cannot be used for forcing type conversion, i.e., the input and output types must be identical, because the function simply calls the memcpy
function. On the other hand, the input and output strides
do not necessarily have to be equal.
The ndarray_obj_t *new_ndarray = ...
instruction can be saved by calling the ndarray_copy_view
function with the single source
argument.
Having seen, how arrays can be generated and copied, it is time to look at how the data in an ndarray
can be accessed and modified.
For starters, let us suppose that the object in question comes from the user (i.e., via the micropython
interface), First, we have to acquire a pointer to the ndarray
by calling
ndarray_obj_t *ndarray = MP_OBJ_TO_PTR(object_in);
If it is not clear, whether the object is an ndarray
(e.g., if we want to write a function that can take ndarray
s, and other iterables as its argument), we find this out by evaluating
mp_obj_is_type(object_in, &ulab_ndarray_type)
which should return true
. Once the pointer is at our disposal, we can get a pointer to the underlying numerical array as discussed earlier, i.e.,
uint8_t *array = (uint8_t *)ndarray->array;
If you need to find out the dtype
of the array, you can get it by accessing the dtype
member of the ndarray
, i.e.,
ndarray->dtype
should be equal to B
, b
, H
, h
, or f
. The size of a single item is stored in the itemsize
member. This number should be equal to 1, if the dtype
is B
, or b
, 2, if the dtype
is H
, or h
, 4, if the dtype
is f
, and 8 for d
.
In the next section, we will construct a function that generates the element-wise square of a dense array, otherwise, raises a TypeError
exception. Dense arrays can easily be iterated over, since we do not have to care about the shape
and the strides
. If the array is sparse, the section Iterating over elements of a tensor should contain hints as to how the iteration can be implemented.
The function is listed under user.c. The user
module is bound to ulab
in ulab.c in the lines
#if ULAB_USER_MODULE
{ MP_ROM_QSTR(MP_QSTR_user), MP_ROM_PTR(&ulab_user_module) },
#endif
which assumes that at the very end of ulab.h the
// user-defined module
#ifndef ULAB_USER_MODULE
#define ULAB_USER_MODULE (1)
#endif
constant has been set to 1. After compilation, you can call a particular user
function in python
by importing the module first, i.e.,
from ulab import numpy as np
from ulab import user
user.some_function(...)
This separation of user-defined functions from the rest of the code ensures that the integrity of the main module and all its functions are always preserved. Even in case of a catastrophic failure, you can exclude the user
module, and start over.
And now the function:
static mp_obj_t user_square(mp_obj_t arg) {
// the function takes a single dense ndarray, and calculates the
// element-wise square of its entries
// raise a TypeError exception, if the input is not an ndarray
if(!mp_obj_is_type(arg, &ulab_ndarray_type)) {
mp_raise_TypeError(translate("input must be an ndarray"));
}
ndarray_obj_t *ndarray = MP_OBJ_TO_PTR(arg);
// make sure that the input is a dense array
if(!ndarray_is_dense(ndarray)) {
mp_raise_TypeError(translate("input must be a dense ndarray"));
}
// if the input is a dense array, create `results` with the same number of
// dimensions, shape, and dtype
ndarray_obj_t *results = ndarray_new_dense_ndarray(ndarray->ndim, ndarray->shape, ndarray->dtype);
// since in a dense array the iteration over the elements is trivial, we
// can cast the data arrays ndarray->array and results->array to the actual type
if(ndarray->dtype == NDARRAY_UINT8) {
uint8_t *array = (uint8_t *)ndarray->array;
uint8_t *rarray = (uint8_t *)results->array;
for(size_t i=0; i < ndarray->len; i++, array++) {
*rarray++ = (*array) * (*array);
}
} else if(ndarray->dtype == NDARRAY_INT8) {
int8_t *array = (int8_t *)ndarray->array;
int8_t *rarray = (int8_t *)results->array;
for(size_t i=0; i < ndarray->len; i++, array++) {
*rarray++ = (*array) * (*array);
}
} else if(ndarray->dtype == NDARRAY_UINT16) {
uint16_t *array = (uint16_t *)ndarray->array;
uint16_t *rarray = (uint16_t *)results->array;
for(size_t i=0; i < ndarray->len; i++, array++) {
*rarray++ = (*array) * (*array);
}
} else if(ndarray->dtype == NDARRAY_INT16) {
int16_t *array = (int16_t *)ndarray->array;
int16_t *rarray = (int16_t *)results->array;
for(size_t i=0; i < ndarray->len; i++, array++) {
*rarray++ = (*array) * (*array);
}
} else { // if we end up here, the dtype is NDARRAY_FLOAT
mp_float_t *array = (mp_float_t *)ndarray->array;
mp_float_t *rarray = (mp_float_t *)results->array;
for(size_t i=0; i < ndarray->len; i++, array++) {
*rarray++ = (*array) * (*array);
}
}
// at the end, return a micropython object
return MP_OBJ_FROM_PTR(results);
}
To summarise, the steps for implementing a function are
- If necessary, inspect the type of the input object, which is always a
mp_obj_t
object - If the input is an
ndarray_obj_t
, acquire a pointer to it by callingndarray_obj_t *ndarray = MP_OBJ_TO_PTR(arg);
- Create a new array, or modify the existing one; get a pointer to the data by calling
uint8_t *array = (uint8_t *)ndarray->array;
, or something equivalent - Once the new data have been calculated, return a
micropython
object by callingMP_OBJ_FROM_PTR(...)
.
The listing above contains the implementation of the function, but as such, it cannot be called from python
: it still has to be bound to the name space. This we do by first defining a function object in
MP_DEFINE_CONST_FUN_OBJ_1(user_square_obj, user_square);
micropython
defines a number of MP_DEFINE_CONST_FUN_OBJ_N
macros in obj.h. N
is always the number of arguments the function takes. We had a function definition static mp_obj_t user_square(mp_obj_t arg)
, i.e., we dealt with a single argument.
Finally, we have to bind this function object in the globals table of the user
module:
STATIC const mp_rom_map_elem_t ulab_user_globals_table[] = {
{ MP_OBJ_NEW_QSTR(MP_QSTR___name__), MP_OBJ_NEW_QSTR(MP_QSTR_user) },
{ MP_OBJ_NEW_QSTR(MP_QSTR_square), (mp_obj_t)&user_square_obj },
};
Thus, the three steps required for the definition of a user-defined function are
- The low-level implementation of the function itself
- The definition of a function object by calling MP_DEFINE_CONST_FUN_OBJ_N()
- Binding this function object to the namespace in the
ulab_user_globals_table[]