
ndslice.algorithm #4652

Closed
wants to merge 15 commits into from

Conversation

9il
Member

@9il 9il commented Jul 25, 2016

PR is available in Mir >=v0.16.0-alpha7
Benchmarks: https://github.com/libmir/mir/tree/master/benchmarks/ndslice
Docs: http://docs.mir.dlang.io/latest/mir_ndslice_algorithm.html

Difference with std.algorithm

  1. It supports multidimensional iteration and map (ndMap) out of the box
  2. It supports a multidimensional zip analog, assumeSameStructure, which is faster
  3. It allows efficient iteration over multiple tensors without zip/assumeSameStructure
  4. It allows vectorization when the last stride == 1 (two versions of each algorithm, selected by a runtime check)
  5. It allows selecting a triangular part of the data, or half of the data
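To make point 1 concrete, here is a rough analogy in NumPy (used only as a familiar stand-in for illustration; the PR itself is D code). A plain 1D map over the outer range sees whole rows, while a multidimensional map touches every element and preserves the N-dimensional structure:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # a 2x3 "slice": [[0, 1, 2], [3, 4, 5]]

# A 1D map over the outer range sees whole rows (elements of dimension N-1):
row_sums = [int(s) for s in map(np.sum, a)]   # [3, 12]

# A multidimensional map (what ndMap provides) acts element-wise and
# preserves the 2x3 structure:
doubled = a + a                               # [[0, 2, 4], [6, 8, 10]]

print(row_sums)
print(doubled.tolist())
```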

ndslice.algorithm

  • ndMap (multidimensional map; the result is of type Slice(N, xxx))
  • ndFold (single tensor, multiple functions and seeds)
  • ndReduce (single seed, multiple tensors)
    • Add vectorization and fastmath flags
  • ndEach (no seed, multiple tensors)
    • Add vectorization and fastmath flags
  • ndAll (multiple tensors)
  • ndAny (multiple tensors)
  • ndFind (multiple tensors) [reworked]
  • ndCmp (multidimensional propagation)
  • ndEqual (multiple tensors)

ndCanFindNeedle and ndFindNeedle were removed.

ndslice.slice

  • Optimize memory allocations using ndFold
  • Add opCmp using ndCmp

ndslice.package

  • Update example

See Also

ldc-developers/ldc#1669

Asked Questions

What if I want to implement my own algorithm that is not part of Phobos? It seems like I would want to be able to use LikePtr in such a situation to interface seamlessly with the rest of ndslice.

You can use the random access range interface and the sliced function. @LikePtr exists for optimisation reasons. It may be exposed in the future, but first I need to write proper documentation of how it should be used. The number of use cases for @LikePtr is very small, and all of them seem like they should be part of the package.

To come back to the bigger picture: Do we really need nd versions of all "scalar" functions like each, all, any and so on?

Yes, but not all "scalar" functions should be adapted. The current list of functions is enough.
For example:

  1. We do not need special sorting algorithms, because we need a flattened slice anyway. If a slice is not flattened, the user may use byElement to get a random access range of all elements, or make a copy/index.
  2. We don't need ndSwap(a, b), because we can just do ndEach!swap(a, b).
  3. We don't need ndCount, because it can be easily implemented with ndReduce.
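The ndCount-via-ndReduce point can be sketched in Python pseudocode (names invented for illustration; the real API is D):

```python
from functools import reduce

# Counting elements that satisfy a predicate is just a reduction with an
# integer seed, so a dedicated ndCount is unnecessary when ndReduce exists.
def count_if(pred, flat_elements, seed=0):
    return reduce(lambda acc, x: acc + (1 if pred(x) else 0), flat_elements, seed)

tensor = [[0, 1, 2], [3, 4, 5]]                # a 2x3 "slice"
flat = (x for row in tensor for x in row)      # plays the role of byElement
evens = count_if(lambda x: x % 2 == 0, flat)
print(evens)  # 3
```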

I understand why map might be a bit non-trivial – it needs to preserve the input structure in the output. But wouldn't it be much more powerful and DRYer to offer a "flat" range interface for the operations for which structure is irrelevant? For example, in pseudocode ndAll!fun(a, b) == all!fun(zip(a.flattened, b.flattened)) (discounting for tuple expansion of the fun parameters). I suppose something like that probably even exists in ndslice already.

Yes, for performance reasons. flattened is called byElement; it is a selection operator.
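The pseudocode equivalence above can be checked with a NumPy analogy (ravel plays the role of byElement/flattened; this is an illustration, not the D API):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = a + 1

# "ndAll!fun(a, b)" over the full 2D structure...
structured = bool(np.all(b - a == 1))
# ...gives the same answer as the predicate over flattened, zipped elements:
flat = all(y - x == 1 for x, y in zip(a.ravel(), b.ravel()))

print(structured, flat)  # True True
```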

I'm sure this has crossed your mind while designing the functions – why did you opt to go for the special ndslice variants instead? Potentially using the extra structural information for extra performance? (If it only comes down to auto-vectorization, that shouldn't really matter, though.) You could also think about adding annotation forwarding support of some kind (extra members, UDAs, …) to the length-preserving std.algorithm functions to propagate along the structure information.

I added this module for performance reasons. N is the dimension count:

  1. byElement requires additional computations:
    • One more add operation for forward access (empty, front, popFront).
    • N-1 div/mod pairs plus 2*N-1 more adds for random and backward access; div/mod are expensive.
  2. byElement requires N+1 additional general-purpose registers. This is bad for iterating several matrices in lockstep.
  3. The ndX implementations use a multidimensional do-while loop optimisation.
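The div/mod cost in point 1 comes from converting a flat element index back into N coordinates. A small Python sketch of that conversion (illustrative only) shows why it takes N-1 div/mod pairs for a row-major shape:

```python
def flat_to_coords(i, shape):
    """Convert a flat index into N-dimensional row-major coordinates.

    For an N-dimensional shape this performs N-1 divmod operations,
    which is why random access through a flat view is expensive.
    """
    coords = []
    for extent in reversed(shape[1:]):   # N-1 iterations, one divmod each
        i, r = divmod(i, extent)
        coords.append(r)
    coords.append(i)
    return tuple(reversed(coords))

# Element 7 of a 2x3x4 tensor lives at coordinates (0, 1, 3).
print(flat_to_coords(7, (2, 3, 4)))
```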

ndMap could also be implemented by reshape ∘ map ∘ reshape, I suppose.

This is true only for Slices that have a contiguous in-memory representation. NumPy just copies the whole ndarray if it cannot be reshaped in place.
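The NumPy behavior referred to here can be observed directly: reshape returns a zero-copy view only when the data is contiguous enough, and silently copies otherwise, so a reshape ∘ map ∘ reshape pipeline would not stay lazy in general:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

# A contiguous array reshapes as a zero-copy view...
view = x.reshape(-1)
# ...but a non-contiguous one (here, a transpose) forces a silent copy.
copied = x.T.reshape(-1)

print(np.shares_memory(x, view))    # True
print(np.shares_memory(x, copied))  # False
```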

The result of ndMap is a Slice. The result of map is a range, and if we want to use the ndslice module with map, we need to call sliced again. The real problem is that byElement has a slow random access primitive on which to build a slice:

auto lazyMap = sl.byElement.map!fun.sliced(sl.shape);
auto lazyMap = sl.ndMap!fun;

At the same time, #4647 provides vectorised operations like +=, and this PR also has vectorised algorithms. In addition, Mir will have BLAS soon; see the current benchmark. Mir has the same performance as OpenBLAS for large matrices, which means that Mir's generic gemm SIMD kernels are optimal. For small matrices I need to write generic SIMD transposition kernels for packing, but that is a small effort compared with the gemm SIMD kernels.

Being very much a bystander only—so this might be entirely unfounded—it seems like there is some danger of ndslice bit by bit evolving into its own little parallel universe. To put it in a slightly more cynical way, if we can't use standard ranges to implement any on a multidimensional array ourselves, we might need to revise our marketing materials.

I don't think so. ndslice extends our universe. We have different types of ranges; ndslice is just the next step after random access ranges:

  1. Any slice is a random access range composed of elements of dimension N-1.
  2. byElement is a random access range with fast forward iteration (as fast as possible with the Range API).
  3. A Slice can be created on top of any random access range with length. For example, the DOK format for sparse tensors was implemented in Mir just by creating a tiny wrapper over standard associative arrays.
  4. The ndX functions provide an optimal multidimensional iteration pattern. We don't need ndSwap, for example, because we can just write ndEach!swap. So please do not think that I am rewriting std.algorithm: the current functions only provide optimal loop construction with different stop criteria.

@9il
Member Author

9il commented Jul 25, 2016

ping @John-Colvin @wilzbach: multidimensional map is here :-)

@9il 9il changed the title add ndmap [WIP] ndslice.computation Jul 26, 2016
@9il 9il changed the title [WIP] ndslice.computation [WIP] ndslice.computation & ndslice.searching Jul 26, 2016
@@ -2039,7 +2030,8 @@ struct Slice(size_t _N, _Range)
}

static if (doUnittest)
pure nothrow unittest
//pure nothrow
Member

What prevents this test from being pure and nothrow?

Member Author

Thanks, I forgot to uncomment :-)

@9il 9il changed the title [WIP] ndslice.computation & ndslice.searching [WIP] ndslice.computation Jul 26, 2016
@@ -2934,7 +2941,7 @@ unittest

private enum isSlicePointer(T) = isPointer!T || is(T : PtrShell!R, R);

private struct LikePtr {}
package struct LikePtr {}
Member

Why is this attribute needed? Doesn't capability introspection suffice?

Member Author

This is a clear way, and it is for internal use only. User-defined ranges may break introspection for pointer-like behavior.

Member

What if I want to implement my own algorithm that is not part of Phobos? It seems like I would want to be able to use LikePtr in such a situation to interface seamlessly with the rest of ndslice.

Member Author

@klickverbot You can use the random access range interface and the sliced function. @LikePtr exists for optimisation reasons. It may be exposed in the future, but first I need to write proper documentation of how it should be used. The number of use cases for @LikePtr is very small, and all of them seem like they should be part of the package.

@JackStouffer
Member

First, is it really necessary for all of the planned stuff to be in this one PR? You could build up the module PR by PR in order to make this easier to review. Even though it might seem odd to have a module with only one function in it, that would only be temporary, and that's fine for std.experimental.

Secondly, I have a slight problem with the names of the planned functions. One of the main benefits of ndslice that I laid out in my article is that it feels like it's a part of the rest of the language, because you don't have to use a whole bunch of specialized functions in order to get the functionality you want. This is one of my main gripes with numpy: in your code you have numpy code, and then you have Python code, and they don't mix well.

With these planned functions, instead of doing the natural thing, someSlice.map!func, you have to remember to do someSlice.ndmap!func. This is easily remedied: just use the same names as the respective std.algorithm functions, so it feels like you're still using the normal range functions; no special casing required.

@dnadlinger
Member

dnadlinger commented Jul 26, 2016

I would also be curious about the rationale behind the ndXXX naming scheme, which, without further context, looks a lot like a design mistake.

@9il
Member Author

9il commented Jul 26, 2016

First, is it really necessary for all of the planned stuff to be in this one PR?

No, only ndslice.computation. The list is just a plan for a set of PRs.

Secondly, I have a slight problem with the names of the planned functions. One of the main benefits of ndslice that I laid out in my article is that it feels like it's a part of the rest of the language, because you don't have to use a whole bunch of specialized functions in order to get the functionality you want. This is one of my main gripes with numpy: in your code you have numpy code, and then you have Python code, and they don't mix well.

With these planned functions, instead of doing the natural thing, someSlice.map!func, you have to remember to do someSlice.ndmap!func. This is easily remedied: just use the same names as the respective std.algorithm functions, so it feels like you're still using the normal range functions; no special casing required.

Both map and ndmap are valid for ndslices, but they do different things. ndmap is not an optimised map for Slice, but a multidimensional map. You can still use map for slices if you import the whole ndslice package. std.algorithm provides only single-dimension logic; with the new module, ndslice will be consistent with the rest of Phobos anyway. In NumPy you cannot write fast Python code if that code requires index access. In ndslice, you can do everything in a single language. The module provides efficient multidimensional functional-style subroutines.

@9il
Member Author

9il commented Jul 26, 2016

I would also be curious about the rationale behind the ndXXX naming scheme, which has the smell of a design mistake.

@klickverbot As I wrote to @JackStouffer, both map and ndmap are valid:

Original matrix

0 1 2
3 4 5
6 7 8

map for 2D case

The result is random access range.

fun(0 1 2)
fun(3 4 5)
fun(6 7 8)

ndmap for 2D case

The result is 2D slice, all operations on slices can be used.

fun(0) fun(1) fun(2)
fun(3) fun(4) fun(5)
fun(6) fun(7) fun(8)
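The two result shapes above have a direct Python analogue (illustrative only; the actual API is D): map's fun receives whole rows and produces a flat range, while an nd map's fun receives single elements and the 2D structure survives:

```python
matrix = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# map for the 2D case: fun receives whole rows, the result is a flat
# range of per-row values.
per_row = [sum(row) for row in matrix]            # [3, 12, 21]

# ndmap for the 2D case: fun receives single elements, the 3x3
# structure is preserved.
per_element = [[x * x for x in row] for row in matrix]

print(per_row)
print(per_element[0])  # [0, 1, 4]
```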

ndmap is the most convenient name for me, and I would like to avoid camelCase names for basic subroutines if possible. It is much simpler to keep all the logic in mind if a name is like a Chinese ideograph; this is common in math. For example, findRoot is better than fr, but spr2 is better than symmetricPackedRank2Operation. The nd prefix is already used in the package name and in the ndarray function (the same name exists in NumPy).

@9il
Member Author

9il commented Jul 26, 2016

The naming logic is:

  • verb + ed - names that change lengths and strides for a slice (iteration module)
  • noun - an alternative type of Slice or iterator for the same data (selection module)
  • nd + noun - lazy computation (computation module)
  • nd + verb - computation (computation module)

Macros:
SUBREF = $(REF_ALTTEXT $(TT $2), $2, std,experimental, ndslice, $1)$(NBSP)
T2=$(TR $(TDNW $(LREF $1)) $(TD $+))
T4=$(TR $(TDNW $(LREF $1)) $(TD $2) $(TD $3) $(TD $4))
Member

nitpick: this is an unused macro

@dnadlinger
Member

dnadlinger commented Jul 27, 2016

To put the following opinions into context, I'm a physicist by trade and rather more familiar with NumPy and Julia than I'd like, having also taught numerical programming classes before. I haven't had a chance to really familiarise myself with your ndslice work yet, though.

First of all, I get that iteration over multidimensional slices is difficult to reconcile with simple ranges, especially for "structure-preserving" operations like map. I was hoping for some clever design that marries slices more closely to the standard range algorithms, but maybe there is just none to be found. A couple of points, though:


How do I map over just some dimensions (e.g. rows or columns in a 2D matrix)?


For example, […] spr2 is better than symmetricPackedRank2Operation.

I'll have to strongly disagree on that. spr2 is a horrible name. It only works because it is part of a certain set of operations that have been around long enough that the people couldn't do better when naming them, and which some people have memorised. I'd much rather see descriptive names for the raw kernels (although your alternative is of course exaggerated), and then a high-level interface that removes the details easily inferred from argument types/count from the names – maybe even converting it into operators altogether.


ndmap is the most convenient name for me, and I would like to avoid Camel case names for basic subroutines if possible.

I'm missing the argument for breaking with Phobos conventions, which provide uniformity for users and hence decrease the cognitive load, as they can guess the name (CamelCase).

@wilzbach
Member

ping @John-Colvin @wilzbach: multidimensional map is here :-)

Looks great - I started to play with it & it feels exactly how I imagined it. Great job!

Both map and ndmap are valid for ndslices, but they are doing different things. ndmap is not an optimised map for Slice, but multidimensional map.

👍 from my side. Maybe you should add that further abstractions can be done with pack and other high-level selection routines.

ndfold (multiple functions and seeds)
ndreduce (multiple arguments, single seed)

This could be very confusing. Can't you have two overloads?

How do I map over just some dimensions (e.g. rows or columns in a 2D matrix)?

You can use packed slices, see my comment to the unittest ;-)

@9il
Member Author

9il commented Jul 27, 2016

How do I map over just some dimensions (e.g. rows or columns in a 2D matrix)?

ndslice is much more powerful than NumPy and Julia when it comes to dimension abstraction. It has zero-cost abstraction for tensors composed of tensors (composed of tensors, etc.). See the last example for diagonal (3D, diagonal plane). You can use pack to pack dimensions (tensor of tensors), and transposed, swapped, and evertPack to swap dimensions.
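As a NumPy analogue of the pack-then-map pattern for per-row mapping (illustration only; a D spelling of this idea appears later in the thread):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Packing the last dimension and then mapping over the packed (row)
# tensors corresponds to applying a function along the last axis;
# here, the index of each row's maximum element.
row_argmax = np.apply_along_axis(np.argmax, 1, a)
print(row_argmax.tolist())  # [2, 2]
```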

@9il
Member Author

9il commented Jul 27, 2016

I am open to alternative function names.
Please suggest your variants!

@9il
Member Author

9il commented Jul 27, 2016

ndfold (multiple functions and seeds)
ndreduce (multiple arguments, single seed)
This could be very confusing. Can't you have two overloads?

The reason is that they have different argument orders:
ndfold: the first argument is a slice
ndreduce: the first argument is a seed
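A hypothetical sketch of the two shapes in Python pseudocode (all names invented for illustration; the real functions take the callables as template arguments in D):

```python
# ndfold: the slice comes first, followed by (function, seed) pairs;
# all folds run in one pass over the tensor.
def ndfold(slice_, *fun_seed_pairs):
    accs = [seed for _, seed in fun_seed_pairs]
    for x in slice_:
        for k, (fun, _) in enumerate(fun_seed_pairs):
            accs[k] = fun(accs[k], x)
    return tuple(accs)

# ndreduce: the single seed comes first, followed by one or more
# tensors iterated in lockstep.
def ndreduce(fun, seed, *tensors):
    for xs in zip(*tensors):
        seed = fun(seed, *xs)
    return seed

sums = ndfold([1, 2, 3], (lambda a, x: a + x, 0), (lambda a, x: a * x, 1))
dot = ndreduce(lambda a, x, y: a + x * y, 0, [1, 2], [3, 4])
print(sums)  # (6, 6): sum and product in a single pass
print(dot)   # 11: 1*3 + 2*4
```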

@9il 9il changed the title [WIP] ndslice.computation ndslice.computation Jul 27, 2016
@9il 9il added the @andralex Approval from Andrei is required label Jul 28, 2016
@9il 9il changed the title ndslice.computation [WIP] ndslice.algorithm Jul 28, 2016
@9il 9il removed @andralex Approval from Andrei is required Needs Review labels Jul 28, 2016
@9il
Member Author

9il commented Aug 3, 2016

You need a random access range, not a RoR.

@9il
Member Author

9il commented Aug 3, 2016

Your range is not flat and it is not random access.

@JackStouffer
Member

RoR is not supported, of course.
Your range is not flat and it is not random access.

Ok, seems like I need to allocate an array. Thanks.

@9il 9il added WIP Work In Progress - not ready for review or pulling and removed WIP Work In Progress - not ready for review or pulling labels Aug 4, 2016
@9il 9il removed the WIP Work In Progress - not ready for review or pulling label Aug 9, 2016
@9il
Member Author

9il commented Aug 9, 2016

@klickverbot I have added a Select enumeration, which can be used to reverse or transpose data, and which can be used for many matrix subroutines, for example to check whether a matrix is symmetric. So now it has 5 significant differences compared with std.algorithm:

  1. It supports multidimensional iteration and map out of the box
  2. It supports a zip analog, assumeSameStructure, which is faster
  3. It allows efficient iteration over multiple tensors without zip
  4. It allows vectorization when the last stride == 1 (two versions of each algorithm, selected by a runtime check)
  5. It allows selecting a triangular part of the data, or half of the data

@wilzbach
Member

On the subject of naming ndMap vs. map: I fleshed out my point about map with a template constraint a little more in the link below. The way I did it allows both map and ndMap to be called.

I recall there was quite a discussion on the name issue and I think @9il made a strong point for nd, but there was no consensus on it. Could the advocates of map please rise again and make their point?

@jmh530
Contributor

jmh530 commented Aug 23, 2016

@wilzbach I think @JackStouffer above made the point originally and most succinctly.

Secondly, I have a slight problem with the names of the planned functions. One of the main benefits of ndslice that I laid out in my article is that it feels like it's a part of the rest of the language because you don't have to use a whole bunch specialized functions in order to get the functionality you want. This is one of my main gripes with numpy: in your code you have numpy code, and then you have Python code, and they don't mix well.

Think about the impact on code refactoring. If you want to change from slices to ndslices, you now have to change every place where you use a std.algorithm function.

Also, suppose you're writing some generic code that should work on any input range. This naming approach requires template specialization in order to get it to work with ndslices.

Personally, I would prefer what is in ndslice.algorithm to eventually be a part of std.algorithm. That way I can just import std.algorithm : map and go to town.

Nevertheless, I think those favoring names like ndMap, etc., made a good point with respect to the fact that the behavior of ndMap differs from that of map. Thus, I think there should be some kind of way to toggle between them (perhaps there's a better way than what I suggested above).

@wilzbach
Member

Personally, I would prefer what is in ndslice.algorithm to eventually be a part of std.algorithm. That way I can just import std.algorithm : map and go to town.

I agree, that would be quite sweet, but I think there is still the conflict between the multidimensional mapping of ndslice's map and the one-dimensional std.algorithm.iteration.map. However, if, as @jmh530 suggested, we can assume that a user always wants the multidimensional ndslice map for slices, this could work well. For example, with pack a user could choose the dimensions he wants to use for n-mapping. In practice this could then look like this:

#!/usr/bin/env dub
/+ dub.sdl:
name "mir_test"
dependency "mir" version="~>0.16.0-beta2"
+/

void main() {
    import mir.ndslice;
    import std.stdio;
    import std.algorithm;

    auto a = iotaSlice(2, 3).ndMap!("a + a");
    writeln(a); // [[0, 2, 4], [6, 8, 10]]

    auto b = iotaSlice(2, 3).pack!1.ndMap!(a => a.maxPos.front); 
    writeln(b); // [2, 5]
}

@9il what do you think about this?

I think I can remember a discussion on a different PR where it was agreed that internally using std.experimental is absolutely okay.

@9il
Member Author

9il commented Aug 24, 2016

@9il what do you think about this?

This would not work for all of the other functions. The API can be found here: http://docs.mir.dlang.io/latest/mir_ndslice_algorithm.html

@9il
Member Author

9il commented Sep 14, 2016

#4781 contains ndMap renamed to mapSlice and located in ndslice.selection. This PR will be closed until DMD has the new pragmas. Please help move forward with the new ndslice PRs!

@9il 9il closed this Sep 14, 2016