
ndslice.algorithm #4652

Closed
wants to merge 15 commits into from

Conversation

9il
Member

@9il 9il commented Jul 25, 2016

PR is available in Mir >=v0.16.0-alpha7
Benchmarks: https://github.com/libmir/mir/tree/master/benchmarks/ndslice
Docs: http://docs.mir.dlang.io/latest/mir_ndslice_algorithm.html

Difference with std.algorithm

  1. It supports multidimensional iteration and map (ndMap) out of the box
  2. It supports a multidimensional zip analog, assumeSameStructure, which is faster
  3. It allows efficient iteration over multiple tensors without zip/assumeSameStructure
  4. It allows vectorization when the last stride == 1 (two versions of each algorithm, selected by a runtime check)
  5. It allows selecting a triangular part of the data, or half of the data
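To make point 1 concrete, here is a rough analogy in NumPy (used only as a familiar stand-in for illustration; the PR itself is D code). A plain 1D map over the outer range sees whole rows, while a multidimensional map touches every element and preserves the N-dimensional structure:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # a 2x3 "slice": [[0, 1, 2], [3, 4, 5]]

# A 1D map over the outer range sees whole rows (elements of dimension N-1):
row_sums = [int(s) for s in map(np.sum, a)]   # [3, 12]

# A multidimensional map (what ndMap provides) acts element-wise and
# preserves the 2x3 structure:
doubled = a + a                               # [[0, 2, 4], [6, 8, 10]]

print(row_sums)
print(doubled.tolist())
```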

ndslice.algorithm

  • ndMap (multidimensional map; the result is of type Slice(N, xxx))
  • ndFold (single tensor, multiple functions and seeds)
  • ndReduce (single seed, multiple tensors)
    • Add vectorization and fastmath flags
  • ndEach (no seed, multiple tensors)
    • Add vectorization and fastmath flags
  • ndAll (multiple tensors)
  • ndAny (multiple tensors)
  • ndFind (multiple tensors) [reworked]
  • ndCmp (multidimensional propagation)
  • ndEqual (multiple tensors)

ndCanFindNeedle and ndFindNeedle were removed.

ndslice.slice

  • Optimize memory allocations using ndFold
  • Add opCmp using ndCmp

ndslice.package

  • Update example

See Also

ldc-developers/ldc#1669

Asked Questions

What if I want to implement my own algorithm that is not part of Phobos? It seems like I would want to be able to use LikePtr in such a situation to interface seamlessly with the rest of ndslice.

You can use the random access range interface and the sliced function. @LikePtr exists for optimisation reasons. It may be exposed in the future, but first I need to write proper documentation of how it should be used. The number of use cases for @LikePtr is very small, and all of them seem like they should be part of the package.

To come back to the bigger picture: Do we really need nd versions of all "scalar" functions like each, all, any and so on?

Yes, but not all "scalar" functions should be adapted. The current list of functions is enough.
For example:

  1. We do not need special sorting algorithms, because we need a flattened slice anyway. If a slice is not flattened, the user may use byElement to get a random access range of all elements, or make a copy/index.
  2. We don't need ndSwap(a, b), because we can just do ndEach!swap(a, b).
  3. We don't need ndCount, because it can be easily implemented with ndReduce.
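The ndCount-via-ndReduce point can be sketched in Python pseudocode (names invented for illustration; the real API is D):

```python
from functools import reduce

# Counting elements that satisfy a predicate is just a reduction with an
# integer seed, so a dedicated ndCount is unnecessary when ndReduce exists.
def count_if(pred, flat_elements, seed=0):
    return reduce(lambda acc, x: acc + (1 if pred(x) else 0), flat_elements, seed)

tensor = [[0, 1, 2], [3, 4, 5]]                # a 2x3 "slice"
flat = (x for row in tensor for x in row)      # plays the role of byElement
evens = count_if(lambda x: x % 2 == 0, flat)
print(evens)  # 3
```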

I understand why map might be a bit non-trivial – it needs to preserve the input structure in the output. But wouldn't it be much more powerful and DRYer to offer a "flat" range interface for the operations for which structure is irrelevant? For example, in pseudocode ndAll!fun(a, b) == all!fun(zip(a.flattened, b.flattened)) (discounting for tuple expansion of the fun parameters). I suppose something like that probably even exists in ndslice already.

Yes, for performance reasons. flattened is called byElement; it is a selection operator.
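The pseudocode equivalence above can be checked with a NumPy analogy (ravel plays the role of byElement/flattened; this is an illustration, not the D API):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = a + 1

# "ndAll!fun(a, b)" over the full 2D structure...
structured = bool(np.all(b - a == 1))
# ...gives the same answer as the predicate over flattened, zipped elements:
flat = all(y - x == 1 for x, y in zip(a.ravel(), b.ravel()))

print(structured, flat)  # True True
```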

I'm sure this has crossed your mind while designing the functions – why did you opt to go for the special ndslice variants instead? Potentially using the extra structural information for extra performance? (If it only comes down to auto-vectorization, that shouldn't really matter, though.) You could also think about adding annotation forwarding support of some kind (extra members, UDAs, …) to the length-preserving std.algorithm functions to propagate along the structure information.

I added this module for performance reasons. N is the dimension count:

  1. byElement requires additional computations:
    • One more add operation for forward access (empty, front, popFront).
    • N-1 div/mod pairs plus 2*N-1 more adds for random and backward access; div/mod are expensive.
  2. byElement requires N+1 additional general-purpose registers. This is bad for iterating several matrices in lockstep.
  3. The ndX implementations use a multidimensional do-while loop optimisation.
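The div/mod cost in point 1 comes from converting a flat element index back into N coordinates. A small Python sketch of that conversion (illustrative only) shows why it takes N-1 div/mod pairs for a row-major shape:

```python
def flat_to_coords(i, shape):
    """Convert a flat index into N-dimensional row-major coordinates.

    For an N-dimensional shape this performs N-1 divmod operations,
    which is why random access through a flat view is expensive.
    """
    coords = []
    for extent in reversed(shape[1:]):   # N-1 iterations, one divmod each
        i, r = divmod(i, extent)
        coords.append(r)
    coords.append(i)
    return tuple(reversed(coords))

# Element 7 of a 2x3x4 tensor lives at coordinates (0, 1, 3).
print(flat_to_coords(7, (2, 3, 4)))
```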

ndMap could also be implemented by reshape ∘ map ∘ reshape, I suppose.

This is true only for Slices that have a contiguous in-memory representation. NumPy just copies the whole ndarray if it cannot be reshaped in place.
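The NumPy behavior referred to here can be observed directly: reshape returns a zero-copy view only when the data is contiguous enough, and silently copies otherwise, so a reshape ∘ map ∘ reshape pipeline would not stay lazy in general:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

# A contiguous array reshapes as a zero-copy view...
view = x.reshape(-1)
# ...but a non-contiguous one (here, a transpose) forces a silent copy.
copied = x.T.reshape(-1)

print(np.shares_memory(x, view))    # True
print(np.shares_memory(x, copied))  # False
```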

The result of ndMap is a Slice. The result of map is a range, and if we want to use the ndslice module with map, we need to call sliced again. The real problem is that byElement has a slow random access primitive on which to build a slice:

auto lazyMap = sl.byElement.map!fun.sliced(sl.shape);
auto lazyMap = sl.ndMap!fun;

At the same time, #4647 provides vectorised operations like +=, and this PR also has vectorised algorithms. In addition, Mir will have BLAS soon; see the current benchmark. Mir has the same performance as OpenBLAS for large matrices, which means that Mir's generic gemm SIMD kernels are optimal. For small matrices I need to write generic SIMD transposition kernels for packing, but that is a small effort compared with the gemm SIMD kernels.

Being very much a bystander only—so this might be entirely unfounded—it seems like there is some danger of ndslice bit by bit evolving into its own little parallel universe. To put it in a slightly more cynical way, if we can't use standard ranges to implement any on a multidimensional array ourselves, we might need to revise our marketing materials.

I don't think so. ndslice extends our universe. We have different types of ranges; ndslice is just the next step after random access ranges:

  1. Any slice is a random access range composed of elements of dimension N-1.
  2. byElement is a random access range with fast forward iteration (as fast as possible with the Range API).
  3. A Slice can be created on top of any random access range with length. For example, the DOK format for sparse tensors was implemented in Mir just by creating a tiny wrapper over standard associative arrays.
  4. The ndX functions provide an optimal multidimensional iteration pattern. We don't need ndSwap, for example, because we can just write ndEach!swap. So please do not think that I am rewriting std.algorithm: the current functions only provide optimal loop construction with different stop criteria.

@9il
Member Author

9il commented Jul 25, 2016

ping @John-Colvin @wilzbach: multidimensional map is here :-)

@9il 9il changed the title add ndmap [WIP] ndslice.computation Jul 26, 2016
@9il 9il changed the title [WIP] ndslice.computation [WIP] ndslice.computation & ndslice.searching Jul 26, 2016
@@ -2039,7 +2030,8 @@ struct Slice(size_t _N, _Range)
}

static if (doUnittest)
pure nothrow unittest
//pure nothrow
Member

What prevents this test from being pure and nothrow?

Member Author

Thanks, I forgot to uncomment :-)

@9il 9il changed the title [WIP] ndslice.computation & ndslice.searching [WIP] ndslice.computation Jul 26, 2016
@@ -2934,7 +2941,7 @@ unittest

private enum isSlicePointer(T) = isPointer!T || is(T : PtrShell!R, R);

private struct LikePtr {}
package struct LikePtr {}
Member

Why is this attribute needed? Doesn't capability introspection suffice?

Member Author

This is a clear way, and it is for internal use only. User-defined ranges may break introspection for pointer-like behavior.

Member

What if I want to implement my own algorithm that is not part of Phobos? It seems like I would want to be able to use LikePtr in such a situation to interface seamlessly with the rest of ndslice.

Member Author

@klickverbot You can use the random access range interface and the sliced function. @LikePtr exists for optimisation reasons. It may be exposed in the future, but first I need to write proper documentation of how it should be used. The number of use cases for @LikePtr is very small, and all of them seem like they should be part of the package.

@JackStouffer
Member

First, is it really necessary for all of the planned stuff to be in this one PR? You could build up the module PR by PR in order to make this easier to review. Even though it might seem odd to have a module with only one function in it, that would only be temporary, and that's fine for std.experimental.

Secondly, I have a slight problem with the names of the planned functions. One of the main benefits of ndslice that I laid out in my article is that it feels like it's a part of the rest of the language, because you don't have to use a whole bunch of specialized functions in order to get the functionality you want. This is one of my main gripes with numpy: in your code you have numpy code, and then you have Python code, and they don't mix well.

With these planned functions, instead of doing the natural thing, someSlice.map!func, you have to remember to do someSlice.ndmap!func. This is easily remedied: just use the same names as the respective std.algorithm functions, so it feels like you're still using the normal range functions; no special casing required.

@dnadlinger
Member

dnadlinger commented Jul 26, 2016

I would also be curious about the rationale behind the ndXXX naming scheme, which, without further context, looks a lot like a design mistake.

@9il
Member Author

9il commented Jul 26, 2016

First, is it really necessary for all of the planned stuff to be in this one PR?

No, only ndslice.computation. The list is just a plan for a set of PRs.

Secondly, I have a slight problem with the names of the planned functions. One of the main benefits of ndslice that I laid out in my article is that it feels like it's a part of the rest of the language, because you don't have to use a whole bunch of specialized functions in order to get the functionality you want. This is one of my main gripes with numpy: in your code you have numpy code, and then you have Python code, and they don't mix well.

With these planned functions, instead of doing the natural thing, someSlice.map!func, you have to remember to do someSlice.ndmap!func. This is easily remedied: just use the same names as the respective std.algorithm functions, so it feels like you're still using the normal range functions; no special casing required.

Both map and ndmap are valid for ndslices, but they do different things. ndmap is not an optimised map for Slice, but a multidimensional map. You can still use map for slices if you import the whole ndslice package. std.algorithm provides only single-dimension logic; with the new module, ndslice will be consistent with the rest of Phobos anyway. In NumPy you cannot write fast Python code if that code requires index access. In ndslice, you can do everything in a single language. The module provides efficient multidimensional functional-style subroutines.

@9il
Member Author

9il commented Jul 26, 2016

I would also be curious about the rationale behind the ndXXX naming scheme, which has the smell of a design mistake.

@klickverbot As I wrote to @JackStouffer, both map and ndmap are valid:

Original matrix

0 1 2
3 4 5
6 7 8

map for 2D case

The result is random access range.

fun(0 1 2)
fun(3 4 5)
fun(6 7 8)

ndmap for 2D case

The result is 2D slice, all operations on slices can be used.

fun(0) fun(1) fun(2)
fun(3) fun(4) fun(5)
fun(6) fun(7) fun(8)
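The two result shapes above have a direct Python analogue (illustrative only; the actual API is D): map's fun receives whole rows and produces a flat range, while an nd map's fun receives single elements and the 2D structure survives:

```python
matrix = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# map for the 2D case: fun receives whole rows, the result is a flat
# range of per-row values.
per_row = [sum(row) for row in matrix]            # [3, 12, 21]

# ndmap for the 2D case: fun receives single elements, the 3x3
# structure is preserved.
per_element = [[x * x for x in row] for row in matrix]

print(per_row)
print(per_element[0])  # [0, 1, 4]
```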

ndmap is the most convenient name for me, and I would like to avoid camelCase names for basic subroutines if possible. It is much simpler to keep all the logic in mind if a name is like a Chinese ideograph; this is common in math. For example, findRoot is better than fr, but spr2 is better than symmetricPackedRank2Operation. The nd prefix is already used in the package name and in the ndarray function (the same name exists in NumPy).

@9il
Member Author

9il commented Jul 26, 2016

The naming logic is:

  • verb + ed - names that change lengths and strides for a slice (iteration module)
  • noun - an alternative type of Slice or iterator for the same data (selection module)
  • nd + noun - lazy computation (computation module)
  • nd + verb - computation (computation module)

Macros:
SUBREF = $(REF_ALTTEXT $(TT $2), $2, std,experimental, ndslice, $1)$(NBSP)
T2=$(TR $(TDNW $(LREF $1)) $(TD $+))
T4=$(TR $(TDNW $(LREF $1)) $(TD $2) $(TD $3) $(TD $4))
Member

nitpick: this is an unused macro

@dnadlinger
Member

dnadlinger commented Jul 27, 2016

To put the following opinions into context, I'm a physicist by trade and rather more familiar with NumPy and Julia than I'd like, having also taught numerical programming classes before. I haven't had a chance to really familiarise myself with your ndslice work yet, though.

First of all, I get that iteration over multidimensional slices is difficult to reconcile with simple ranges, especially for "structure-preserving" operations like map. I was hoping for some clever design that marries slices more closely to the standard range algorithms, but maybe there is just none to be found. A couple of points, though:


How do I map over just some dimensions (e.g. rows or columns in a 2D matrix)?


For example, […] spr2 is better than symmetricPackedRank2Operation.

I'll have to strongly disagree on that. spr2 is a horrible name. It only works because it is part of a certain set of operations that have been around long enough that the people couldn't do better when naming them, and which some people have memorised. I'd much rather see descriptive names for the raw kernels (although your alternative is of course exaggerated), and then a high-level interface that removes the details easily inferred from argument types/count from the names – maybe even converting it into operators altogether.


ndmap is the most convenient name for me, and I would like to avoid Camel case names for basic subroutines if possible.

I'm missing the argument for breaking with Phobos conventions, which provide uniformity for users and hence decrease the cognitive load, as they can guess the name (CamelCase).

@wilzbach
Member

ping @John-Colvin @wilzbach: multidimensional map is here :-)

Looks great - I started to play with it & it feels exactly how I imagined it. Great job!

Both map and ndmap are valid for ndslices, but they are doing different things. ndmap is not an optimised map for Slice, but multidimensional map.

👍 from my side. Maybe you should add that further abstractions can be done with pack and other high-level selection routines.

ndfold (multiple functions and seeds)
ndreduce (multiple arguments, single seed)

This could be very confusing. Can't you have two overloads?

How do I map over just some dimensions (e.g. rows or columns in a 2D matrix)?

You can use packed slices, see my comment to the unittest ;-)

@9il
Member Author

9il commented Jul 27, 2016

How do I map over just some dimensions (e.g. rows or columns in a 2D matrix)?

ndslice is much more powerful than NumPy and Julia when it comes to dimension abstraction. It has zero-cost abstraction for tensors composed of tensors (composed of tensors, etc.). See the last example for diagonal (3D, diagonal plane). You can use pack to pack dimensions (tensor of tensors), and transposed, swapped, and evertPack to swap dimensions.
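As a NumPy analogue of the pack-then-map pattern for per-row mapping (illustration only; a D spelling of this idea appears later in the thread):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Packing the last dimension and then mapping over the packed (row)
# tensors corresponds to applying a function along the last axis;
# here, the index of each row's maximum element.
row_argmax = np.apply_along_axis(np.argmax, 1, a)
print(row_argmax.tolist())  # [2, 2]
```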

@9il
Member Author

9il commented Jul 27, 2016

I am open to alternative function names.
Please suggest your variants!

@9il
Member Author

9il commented Jul 27, 2016

ndfold (multiple functions and seeds)
ndreduce (multiple arguments, single seed)
This could be very confusing. Can't you have two overloads?

The reason is that they have different argument orders:
ndfold: the first argument is a slice
ndreduce: the first argument is a seed
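A hypothetical sketch of the two shapes in Python pseudocode (all names invented for illustration; the real functions take the callables as template arguments in D):

```python
# ndfold: the slice comes first, followed by (function, seed) pairs;
# all folds run in one pass over the tensor.
def ndfold(slice_, *fun_seed_pairs):
    accs = [seed for _, seed in fun_seed_pairs]
    for x in slice_:
        for k, (fun, _) in enumerate(fun_seed_pairs):
            accs[k] = fun(accs[k], x)
    return tuple(accs)

# ndreduce: the single seed comes first, followed by one or more
# tensors iterated in lockstep.
def ndreduce(fun, seed, *tensors):
    for xs in zip(*tensors):
        seed = fun(seed, *xs)
    return seed

sums = ndfold([1, 2, 3], (lambda a, x: a + x, 0), (lambda a, x: a * x, 1))
dot = ndreduce(lambda a, x, y: a + x * y, 0, [1, 2], [3, 4])
print(sums)  # (6, 6): sum and product in a single pass
print(dot)   # 11: 1*3 + 2*4
```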

@9il 9il changed the title [WIP] ndslice.computation ndslice.computation Jul 27, 2016
@9il 9il added the @andralex Approval from Andrei is required label Jul 28, 2016
@9il 9il changed the title ndslice.computation [WIP] ndslice.algorithm Jul 28, 2016
@9il 9il removed @andralex Approval from Andrei is required Needs Review labels Jul 28, 2016
@9il
Member Author

9il commented Aug 3, 2016

You need a random access range, not a RoR.

@9il
Member Author

9il commented Aug 3, 2016

Your range is not flat and it is not random access.

@JackStouffer
Member

RoR is not supported, of course.
Your range is not flat and it is not random access.

Ok, seems like I need to allocate an array. Thanks.

@9il 9il added WIP Work In Progress - not ready for review or pulling and removed WIP Work In Progress - not ready for review or pulling labels Aug 4, 2016
@9il 9il removed the WIP Work In Progress - not ready for review or pulling label Aug 9, 2016
@9il
Member Author

9il commented Aug 9, 2016

@klickverbot I have added a Select enumeration, which can be used to reverse or transpose data, and which can be used for many matrix subroutines, for example to check whether a matrix is symmetric. So now it has 5 significant differences compared with std.algorithm:

  1. It supports multidimensional iteration and map out of the box
  2. It supports a zip analog, assumeSameStructure, which is faster
  3. It allows efficient iteration over multiple tensors without zip
  4. It allows vectorization when the last stride == 1 (two versions of each algorithm, selected by a runtime check)
  5. It allows selecting a triangular part of the data, or half of the data

@wilzbach
Member

On the subject of naming ndMap vs. map: I fleshed out my point about map with a template constraint a little more in the link below. The way I did it allows both map and ndMap to be called.

I recall there was quite a discussion on the name issue and I think @9il made a strong point for nd, but there was no consensus on it. Could the advocates of map please rise again and make their point?

@jmh530
Contributor

jmh530 commented Aug 23, 2016

@wilzbach I think @JackStouffer above made the point originally and most succinctly.

Secondly, I have a slight problem with the names of the planned functions. One of the main benefits of ndslice that I laid out in my article is that it feels like it's a part of the rest of the language because you don't have to use a whole bunch specialized functions in order to get the functionality you want. This is one of my main gripes with numpy: in your code you have numpy code, and then you have Python code, and they don't mix well.

Think about the impact on code refactoring. If you want to change from slices to ndslices, you now have to change every place where you use a std.algorithm function.

Also, suppose you're writing some generic code that should work on any input range. This naming approach requires template specialization in order to get it to work with ndslices.

Personally, I would prefer what is in ndslice.algorithm to eventually be a part of std.algorithm. That way I can just import std.algorithm : map and go to town.

Nevertheless, I think those favoring names like ndMap, etc., made a good point with respect to the fact that the behavior of ndMap differs from that of map. Thus, I think there should be some kind of way to toggle between them (perhaps there's a better way than what I suggested above).

@wilzbach
Member

Personally, I would prefer what is in ndslice.algorithm to eventually be a part of std.algorithm. That way I can just import std.algorithm : map and go to town.

I agree, that would be quite sweet, but I think there is still the conflict between the multidimensional mapping of ndslice's map and the one-dimensional std.algorithm.iteration.map. However, if, as @jmh530 suggested, we can assume that a user always wants the multidimensional ndslice map for slices, this could work well. For example, with pack a user could choose the dimensions he wants to use for n-mapping. In practice this could then look like this:

#!/usr/bin/env dub
/+ dub.sdl:
name "mir_test"
dependency "mir" version="~>0.16.0-beta2"
+/

void main() {
    import mir.ndslice;
    import std.stdio;
    import std.algorithm;

    auto a = iotaSlice(2, 3).ndMap!("a + a");
    writeln(a); // [[0, 2, 4], [6, 8, 10]]

    auto b = iotaSlice(2, 3).pack!1.ndMap!(a => a.maxPos.front); 
    writeln(b); // [2, 5]
}

@9il what do you think about this?

I think I can remember a discussion on a different PR where it was agreed that internally using std.experimental is absolutely okay.

@9il
Member Author

9il commented Aug 24, 2016

@9il what do you think about this?

This would not work for all of the other functions. The API can be found here: http://docs.mir.dlang.io/latest/mir_ndslice_algorithm.html

@9il
Member Author

9il commented Sep 14, 2016

#4781 contains ndMap renamed to mapSlice and located in ndslice.selection. This PR will be closed until DMD has the new pragmas. Please help move forward with the new ndslice PRs!

@9il 9il closed this Sep 14, 2016