Conversation

@ukoethe ukoethe (Contributor) commented Dec 14, 2017

Finally, I was able to finish the PR for tiny arrays. It doesn't yet contain any of the arithmetic or algebraic functions, just the class xtiny and its base classes. Five types of tiny arrays are currently implemented:

  • fixed size, own memory
  • dynamic size, own memory (I integrated most of @wolfv's small_vector here)
  • fixed size, view to external memory (typically via a stored iterator)
  • dynamic size, view to external memory (likewise)
  • dynamic size, derived from xbuffer_adaptor

All types of xtiny support the STL interface, are fully interoperable with each other and can be constructed from std::vector and std::array. I hope you like it.
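
For illustration, a small usage sketch of those claims. The header path and the xt namespace are assumptions taken from later posts in this thread; the constructors from std::array and std::vector are the ones advertised above:

#include <array>
#include <vector>

#include "xtensor/xtiny.hpp"  // header name as used later in this thread

int main()
{
    std::vector<int> v{1, 2, 3};
    std::array<int, 3> a{{1, 2, 3}};

    xt::xtiny<int, 3> fixed(a);   // fixed size, own memory
    xt::xtiny<int> dynamic(v);    // dynamic size, own memory

    // STL interface: range-for, size(), indexing, ...
    int sum = 0;
    for (auto x : fixed)
        sum += x;

    return (sum == 6 && dynamic.size() == 3) ? 0 : 1;
}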

@wolfv wolfv (Member) commented Dec 15, 2017

Cool! I'll check it out later, and on Monday.

using base_type::assign;

template <class T>
void assign(std::initializer_list<T> v);

Member:

The initializer list should take value_type only, not a template parameter T; otherwise the implicit conversion costs ~2 ns :) I spent a day puzzling over a benchmark, and that turned out to be the cause, hehe.

Contributor Author:

Having it templated on T is important to avoid type conversion warnings in code like this:

xtiny<unsigned, 3> vi{1, 2, 3};    // T = int
xtiny<float, 3> vf{1.0, 2.0, 3.0}; // T = double

However, we can provide an overload with initializer_list<value_type>. Or does the compiler automatically convert the initializer list arguments?
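
For illustration, a standalone sketch of the overload pair under discussion (demo is a stand-in type, not the PR's code): with both overloads present, an exact value_type list binds to the non-templated version, while mixed-type lists still compile via the templated one.

#include <initializer_list>
#include <vector>

template <class V>
struct demo
{
    using value_type = V;
    std::vector<V> data;

    // Exact overload: chosen when the list elements already have value_type.
    void assign(std::initializer_list<value_type> v) { data.assign(v); }

    // Templated fallback: keeps lists like {1, 2, 3} compiling when
    // value_type is unsigned or float, converting explicitly.
    template <class T>
    void assign(std::initializer_list<T> v)
    {
        data.clear();
        for (auto const& x : v)
            data.push_back(static_cast<value_type>(x));
    }
};

int main()
{
    demo<unsigned> d;
    d.assign({1, 2, 3});     // templated overload (T = int), no warnings
    d.assign({1u, 2u, 3u});  // exact overload, no conversion at all
}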

Member:

I'm pretty sure that std::vector etc. do not have a templated initializer_list constructor, so I don't think we should have one either. The conversion warnings in this case would be legitimate, so I think it's fine to just remove it.

In my test, the compiler basically used the templated initializer_list overload and then did the type conversion upon assignment (instead of statically at compile time), and that costs time in the benchmark.

Contributor Author:

Dropping the templated variant would be very unfortunate:

template <class REAL>
void foo(xtiny<REAL, 2> & v)
{
    v.assign({-0.5, 0.5});  // doesn't compile when REAL=float

    v.assign({static_cast<REAL>(-0.5), static_cast<REAL>(0.5)}); // UGLY!!
}

Member:

That's really weird. E.g., this compiles fine with clang:

std::vector<float> a({1.,2.,3.});

Is it a compiler setting or MSVC?

Contributor Author:

Ah, I made a mistake. Never mind.

void deallocate();

allocator_type m_allocator;
size_type m_size;

Member:

Did you have a chance to benchmark this against storing begin/end pointers? At least LLVM's SmallVector stores begin and end pointers, and it might be faster in certain cases.

Contributor Author:

I didn't benchmark this specific code. But I've consistently found that loops over integers are as fast as (or even faster than) loops over iterators. For example, one would think that

auto iter = v.begin();
for(size_t k=0; k < v.size(); ++k, ++iter)

is slower than

for(auto iter = v.begin(); iter != v.end(); ++iter)

because it increments two variables. But in fact the two versions are equally fast, or the first is even faster, because compilers seem to be better at optimizing integer loops. However, please confirm these statements with independent measurements.
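
In that spirit, a minimal Google Benchmark sketch of the two loop styles (hypothetical benchmark code, not from this PR):

#include <benchmark/benchmark.h>

#include <numeric>
#include <vector>

static void bench_index_loop(benchmark::State& state)
{
    std::vector<int> v(static_cast<std::size_t>(state.range(0)));
    std::iota(v.begin(), v.end(), 0);
    for (auto _ : state)
    {
        long sum = 0;
        auto iter = v.begin();
        for (std::size_t k = 0; k < v.size(); ++k, ++iter)  // two loop variables
            sum += *iter;
        benchmark::DoNotOptimize(sum);
    }
}

static void bench_iterator_loop(benchmark::State& state)
{
    std::vector<int> v(static_cast<std::size_t>(state.range(0)));
    std::iota(v.begin(), v.end(), 0);
    for (auto _ : state)
    {
        long sum = 0;
        for (auto iter = v.begin(); iter != v.end(); ++iter)  // one loop variable
            sum += *iter;
        benchmark::DoNotOptimize(sum);
    }
}

BENCHMARK(bench_index_loop)->Arg(4)->Arg(64)->Arg(512);
BENCHMARK(bench_iterator_loop)->Arg(4)->Arg(64)->Arg(512);
BENCHMARK_MAIN();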

@ukoethe ukoethe (Contributor Author) commented Dec 18, 2017

BTW, I think the design can also accommodate a fixed vector:

template <index_t ... M>
class xtiny_impl<index_t, (index_t)sizeof...(M), fixed_impl<M...>>
: public fixed_impl<M...>
, tags::xtiny_tag
{
};

template <index_t ... M>
using xshape_fixed = xtiny<index_t, (index_t)sizeof...(M), fixed_impl<M...>>;


using index_t = std::ptrdiff_t;

static const index_t runtime_size = -1;

Member:

I think replacing static const with constexpr static would be nice.

Contributor Author:

changed.

@wolfv wolfv (Member) left a comment:

I've looked through the xtiny.hpp file and added my comments!


template <class V1, index_t N1, class R1, class V2, index_t N2, class R2>
inline bool
operator==(xtiny<V1, N1, R1> const & l,

Member:

I think the following functions should be modified so that the loops are replaced with std::equal, and instead of re-implementing almost the same functionality, the other comparisons should be expressed in terms of ==, e.g. != as !operator==(a, b), etc.

Contributor Author:

Have you ever benchmarked whether !operator==(a, b) is as fast as the explicit implementation? I haven't, but the negation is clearly an extra operation that the compiler may or may not optimize away.

The same applies to the call std::equal(l.begin(), l.end(), r.begin(), r.end()): checking whether both sequences have equal length requires extra work when std::equal is not explicitly specialized for random-access iterators. Moreover, my version of the code optimizes this check away entirely if the array sizes are known at compile time.
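
To make this concrete, a sketch of the kind of explicit loop being defended here, written for plain arrays rather than xtiny: with N fixed at compile time, there is no runtime length check left to perform.

#include <cstddef>

// With N a compile-time constant, the compiler can unroll this loop;
// no length comparison survives to runtime.
template <class V, std::size_t N>
inline bool equal_arrays(V const (&l)[N], V const (&r)[N])
{
    for (std::size_t k = 0; k < N; ++k)
        if (l[k] != r[k])
            return false;
    return true;
}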

Contributor Author:

I'll wait for your benchmarks before taking action.

Contributor Author:

Another reason for the explicit loop in operator==(xtiny, xtiny) is symmetry with operator==(xtiny, scalar). AFAIK, there is no variant of std::equal that compares an array with a scalar, and I wanted the two functions to look similar.
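
A sketch of that symmetry, again for plain arrays (illustrative only): the scalar variant has no std::equal counterpart, but as explicit loops the two read the same.

#include <cstddef>

template <class V, std::size_t N>
inline bool equal_to_scalar(V const (&l)[N], V const& r)
{
    for (std::size_t k = 0; k < N; ++k)
        if (l[k] != r)    // same shape as the array/array loop above
            return false;
    return true;
}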

Member:

I've uploaded the benchmark here: https://gist.github.com/wolfv/a64526ceb94dd37b4a5f2f5dbcf76779 maybe you have time to check the results on MSVC?

I've modified the benchmark so that it changes a random index on each iteration in otherwise equal arrays, to create some variance in the position where the arrays first differ. This incurs a heavy cost, though ...

These are the latest results:

--------------------------------------------------------------
Benchmark                       Time           CPU Iterations
--------------------------------------------------------------
bench_compare_eq1/4            77 ns         77 ns    8853946
bench_compare_eq1/8            78 ns         78 ns    8857423
bench_compare_eq1/64           91 ns         91 ns    7479546
bench_compare_eq1/512         183 ns        183 ns    3750081
bench_compare_eq1/1000        273 ns        273 ns    2567153
bench_compare_eq2/4            77 ns         77 ns    8866009
bench_compare_eq2/8            79 ns         79 ns    8777499
bench_compare_eq2/64          112 ns        112 ns    6227843
bench_compare_eq2/512         350 ns        350 ns    2027685
bench_compare_eq2/1000        588 ns        588 ns    1163340

Contributor Author:

Great! I'll run your benchmarks later today.

Member:

Just FYI, I found this implementation

auto ita = a.begin();
auto itb = b.begin();
auto resit = result.begin();
auto n = std::distance(ita, a.end());
for (std::ptrdiff_t i = 0; i < n; ++i)
{
    *resit = (*ita + *itb);
    ++resit;
    ++ita;
    ++itb;
}

on std::arrays to be faster than the integer-access-based version (GCC 7.2). The speed difference was 3 ns vs. 1 ns.

Contributor Author:

Here are the benchmark results on Windows:

Run on (8 X 1993 MHz CPU s), Visual Studio 2015
SIMD SIZE: 2
----------------------------------------------------------------
Benchmark                         Time           CPU Iterations
----------------------------------------------------------------
bench_compare_eq1/4             507 ns        479 ns    1725827
bench_compare_eq1/8             390 ns        389 ns    1725827
bench_compare_eq1/64            415 ns        413 ns    1547293
bench_compare_eq1/512           829 ns        834 ns     897430
bench_compare_eq1/1000         1506 ns       1502 ns     498572
bench_compare_eq2/4             347 ns        348 ns    1794860
bench_compare_eq2/8             399 ns        392 ns    1869646
bench_compare_eq2/64            544 ns        534 ns    1547293
bench_compare_eq2/512          1204 ns       1189 ns     498572
bench_compare_eq2/1000         1844 ns       1877 ns     373929
bench_compare_fill/4              7 ns          7 ns  112178768
bench_compare_fill/8             10 ns         10 ns   64102153
bench_compare_fill/64            58 ns         57 ns   11217877
bench_compare_fill/512          307 ns        307 ns    2136738
bench_compare_fill/1000         579 ns        577 ns    1000000
bench_compare_fill2/4            14 ns         15 ns   44871507
bench_compare_fill2/8            15 ns         15 ns   44871507
bench_compare_fill2/64           41 ns         41 ns   18696461
bench_compare_fill2/512         273 ns        273 ns    2804469
bench_compare_fill2/1000        508 ns        515 ns    1000000
----------------------------------------------------------------
SIMD SIZE: 4
----------------------------------------------------------------
Benchmark                         Time           CPU Iterations
----------------------------------------------------------------
bench_compare_eq1/4             416 ns        365 ns    1794860
bench_compare_eq1/8             371 ns        368 ns    1950935
bench_compare_eq1/64            407 ns        409 ns    1869646
bench_compare_eq1/512           791 ns        782 ns     897430
bench_compare_eq1/1000         1382 ns       1391 ns     560894
bench_compare_eq2/4             357 ns        352 ns    2039614
bench_compare_eq2/8             366 ns        368 ns    1950935
bench_compare_eq2/64            449 ns        441 ns    1661908
bench_compare_eq2/512          1063 ns       1046 ns     641022
bench_compare_eq2/1000         1638 ns       1606 ns     407923
bench_compare_fill/4              7 ns          7 ns  112178768
bench_compare_fill/8             10 ns         10 ns   74785845
bench_compare_fill/64            56 ns         57 ns   11217877
bench_compare_fill/512          268 ns        269 ns    2492862
bench_compare_fill/1000         511 ns        515 ns    1000000
bench_compare_fill2/4            15 ns         15 ns   44871507
bench_compare_fill2/8            14 ns         14 ns   49857230
bench_compare_fill2/64           37 ns         38 ns   18696461
bench_compare_fill2/512         242 ns        245 ns    2991434
bench_compare_fill2/1000        533 ns        528 ns    1447468

Contributor Author:

changed

template <class T, std::size_t M>
inline
xtiny<V, N, R>::xtiny(std::array<T, M> const & v)
: base_type(v.begin(), v.end())

Member:

maybe use cbegin / cend everywhere?

Contributor Author:

What's the difference?

Member:

It is more expressive. We use cbegin, cend wherever possible across the code base. When we find an instance where it is not the case, we change it.

Contributor Author:

changed

? static_size-1
: runtime_size;
xtiny<value_type, res_size> res(size()-1, dont_init);
for(size_type k=0; k<m; ++k)

Member:

Could probably be replaced with a combination of std::copy and std::move_backward
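
For reference, a sketch of what that could look like for the erase-like case shown above, using std::array and two std::copy calls (m is the index being dropped; std::move_backward would be the in-place analogue). This is an illustration, not the PR's code:

#include <algorithm>
#include <array>
#include <cstddef>

template <class T, std::size_t N>
std::array<T, N - 1> erased(std::array<T, N> const& src, std::size_t m)
{
    std::array<T, N - 1> res{};
    std::copy(src.begin(), src.begin() + m, res.begin());        // prefix [0, m)
    std::copy(src.begin() + m + 1, src.end(), res.begin() + m);  // suffix (m, N)
    return res;
}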

@ukoethe ukoethe (Contributor Author) Dec 18, 2017:

I generally find one-line loops more readable than calls to standard functions. I also have more trust in the compiler's optimization abilities when size() is known at compile time (but I haven't measured recently whether there is any difference).

Contributor Author:

changed

inline void
xtiny_impl<V, N, R>::assign(IT begin, IT end)
{
std::ignore = end;

Member:

Nice trick! This is to avoid unused-parameter warnings, right?

Contributor Author:

Yes.
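
A standalone illustration of the idiom (sketch only, names invented): assigning the parameter to std::ignore counts as a use, so no unused-parameter warning fires in release builds where the assertion that consumed end compiles away.

#include <cstddef>
#include <tuple>

template <class IT>
void assign_first_n(int* dst, std::size_t n, IT begin, IT end)
{
    std::ignore = end;  // 'end' is only needed by a debug-mode size check
    for (std::size_t k = 0; k < n; ++k, ++begin)
        dst[k] = static_cast<int>(*begin);
}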

xtiny_impl<V, runtime_size, R>::assign(size_type n, const value_type& v)
{
XTENSOR_ASSERT_MSG(n == size(), "xtiny_impl::assign(n, v): size mismatch.");
for(size_type k=0; k<size(); ++k)

Member:

use std::fill?

@wolfv wolfv (Member) Dec 19, 2017:

My benchmark shows little difference between iterator-based std::fill and a plain for loop (number 1 is std::fill, number 2 is the for loop):

----------------------------------------------------------------
Benchmark                         Time           CPU Iterations
----------------------------------------------------------------
bench_compare_fill/4              2 ns          2 ns  402986315
bench_compare_fill/8              3 ns          3 ns  215037132
bench_compare_fill/64            19 ns         18 ns   36827484
bench_compare_fill/512          141 ns        141 ns    4734883
bench_compare_fill/1000         270 ns        270 ns    2587655
bench_compare_fill2/4             3 ns          3 ns  243319874
bench_compare_fill2/8             3 ns          3 ns  232418915
bench_compare_fill2/64           28 ns         28 ns   24789837
bench_compare_fill2/512         141 ns        141 ns    4891461
bench_compare_fill2/1000        272 ns        271 ns    2550597

@ukoethe ukoethe (Contributor Author) Dec 19, 2017:

changed


template <class V, class R>
constexpr inline auto
xtiny_impl<V, runtime_size, R>::shape() const -> index_type

Member:

I think the shape function is confusing and unnecessary (at the current iteration of xtiny).

Contributor Author:

removed

@ukoethe ukoethe (Contributor Author) commented Dec 19, 2017

I implemented your suggestions and uploaded a revised version.

{
if(!may_use_uninitialized_memory)
{
for(size_type k=0; k<m_size; ++k)

Member:

maybe we could reuse detail::safe_destroy_deallocate from xstorage here?

@ukoethe ukoethe (Contributor Author) Dec 20, 2017:

I considered this, but then found that I'd first like to change safe_init_allocate():

// rename current version
// (*_init_* is a misnomer, because the function doesn't init trivial types)
safe_allocate_no_init(A& alloc, typename A::size_type size)

// add new version with custom initial value
safe_allocate_init(A& alloc, typename A::size_type size,
                   typename A::value_type const & v = typename A::value_type())

Then I could use safe_allocate_* and safe_destroy_deallocate symmetrically.

Contributor Author:

I looked up where safe_init_allocate is called in the current code base, and found this:

pointer tmp = safe_init_allocate(al, rhs.m_size);
if (xtrivially_default_constructible<value_type>::value)
{
    std::uninitialized_copy(rhs.m_data.get(), rhs.m_data.get() + rhs.m_size, tmp);
}
else
{
    std::copy(rhs.m_data.get(), rhs.m_data.get() + rhs.m_size, tmp);
}

This code is awkward, because std::uninitialized_copy works for arbitrary types, not just trivial ones. The above could simply be replaced with:

pointer tmp = al.allocate(rhs.m_size);
std::uninitialized_copy(rhs.m_data.get(), rhs.m_data.get() + rhs.m_size, tmp);

Member:

I didn't write that code, but maybe @JohanMabille has some insights here?

@JohanMabille JohanMabille (Member) Dec 20, 2017:

uninitialized_copy calls the constructor, which has already been called by safe_init_allocate when the type is not trivially default constructible (hence the safe in the name). So using uninitialized_copy everywhere may end up with a double call to the constructor, with all the bad consequences that can have.

@ukoethe ukoethe (Contributor Author) Dec 20, 2017:

I don't suggest using uninitialized_copy everywhere. But in the particular code snippet above, non-trivial objects are needlessly initialized twice: first with placement new, then with copy assignment. In contrast, alloc.allocate(size) followed by uninitialized_copy() always calls placement new exactly once, as it should.
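
A sketch of that point (simplified, hypothetical helper): allocate raw memory, then placement-new each element exactly once via uninitialized_copy, regardless of whether the type is trivial.

#include <cstddef>
#include <memory>

template <class T, class A>
T* copy_into_fresh_memory(A& alloc, T const* first, T const* last)
{
    T* tmp = alloc.allocate(static_cast<std::size_t>(last - first));
    std::uninitialized_copy(first, last, tmp);  // one constructor call per element
    return tmp;
}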

@JohanMabille JohanMabille (Member) Dec 20, 2017:

Sorry for the word "everywhere"; I meant in both branches of the "if" in the previous code snippet. I agree that the code can be simplified. We lose the "symmetric" appearance of the code (with the use of safe_destroy_deallocate), but the code is cleaner.

However, I don't understand why you would like to have two functions; you would have to test whether the type is trivially default constructible to know which method to call, while the idea of safe_init_allocate is to handle that automatically and default-construct the values only when needed. The "safe" here refers to the initialization, not the allocation, and it is safe to not initialize trivial types.

Contributor Author:

it is safe to not initialize trivial types.

Exactly; that's why I thought that a name containing "no" (safe_no_init_allocate, safe_allocate_no_init, or allocate_safe_no_init) would more explicitly express that trivial types are safely left uninitialized 😄 .

why you would like to have two functions

When I free memory with safe_destroy_deallocate (which I would like to do), I also want to allocate it with a corresponding allocate function (general rule: always allocate and free memory with matching function pairs). Since this must apply to all supported xtiny constructors, I actually need three allocate variants: one for safe (non-)initialization, one for value initialization, and one for copy initialization. In the hypothetical case that someone modifies the allocation/deallocation mechanism in xstorage, all functions in that file will hopefully be changed together, whereas a corresponding change in a seemingly unrelated file like xtiny is easily overlooked. Alternatively, xstorage could explicitly document a guarantee that incompatible changes to the allocation/deallocation mechanism will never occur.

@wolfv wolfv (Member) left a comment:

I've found the following statement on stackoverflow regarding storing pointers vs. a pointer + size/capacity:

https://stackoverflow.com/questions/30422205/why-the-libc-stdvector-internally-keeps-three-pointers-instead-of-one-pointe

In essence, storing pointers means optimizing for iterators. As we're trying to follow STL conventions I would say we should go for the same. What do you think?
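
For reference, the two layouts side by side (names made up for the sketch): each derives the other's primary quantity with one extra arithmetic operation.

#include <cstddef>

template <class T>
struct ptr_size_layout    // what xtiny stores: size()/indexing are free
{
    T* data;
    std::size_t size;
    T* end() const { return data + size; }  // end() costs an add
};

template <class T>
struct ptr_ptr_layout     // LLVM SmallVector style: iteration is free
{
    T* first;
    T* last;
    std::size_t size() const { return static_cast<std::size_t>(last - first); }  // size() costs a subtract
};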

"xtiny_impl::assign(begin, end): size mismatch.");
for(size_type k=0; k<N; ++k, ++begin)
{
(*this)[k] = static_cast<value_type>(*begin);

Member:

I think we should use iterators here, too.
And I am unsure about the static_cast. Is that the correct behavior compared with e.g. std::vector?

Contributor Author:

I think we should use iterators here, too.

I don't mind as long as the compiler manages to unroll these loops (or applies equally powerful optimization) when N is known at compile time.

And I am unsure about the static_cast. Is that the correct behavior compared with e.g. std::vector?

In principle, std::copy could be called here, but it issues warnings when decltype(*begin) doesn't match value_type. Since xtiny is meant to be usable beyond shapes (e.g. for RGB pixels), mixed-type expressions are not rare.

The warning policies of current compilers w.r.t. automatic type conversion amount to the implicit rule "automatic type conversion is deprecated". From the image processing point of view, I don't agree with this rule, and add explicit casts where I think automatic conversion should be allowed.

xtiny_impl<V, runtime_size, xbuffer_adaptor<CP, O, A>>::assign(size_type n, const value_type& v)
{
std::ignore = n;
XTENSOR_ASSERT_MSG(n == size(), "xtiny_impl::assign(n, v): size mismatch.");

Member:

I think assign is supposed to take care of resizing the container (this is the behavior of std::vector).
The following works:
The following works:

std::vector<double> test;
test.assign({1,2,3,4,5});
assert(test.size() == 5);

Contributor Author:

Yes, but recall that xtiny exists in order to unify the APIs of static and dynamic arrays. Since a static array's assign cannot resize, I had to stick to this behavior for dynamic arrays as well.

Contributor Author:

For the same reason, insert() and erase() don't work in-place, but return a new xtiny.

@ukoethe ukoethe (Contributor Author) commented Dec 20, 2017

In essence, storing pointers means optimizing for iterators. As we're trying to follow STL conventions I would say we should go for the same. What do you think?

VIGRA was initially built on an iterator-based design. After several years, I realized that all data structures people care about support random access, and that users strongly prefer indexing. Here are some reasons:

  • Algorithms in the image processing/analysis literature always use indexing, and having the same convention in the code is a big debugging help.
  • If one works in multiple dimensions, indexing is a lot more readable than iterators. To some extent, this is even true in 1-D.
  • Many image processing algorithms need access patterns that are hard to map onto iterators. Scan-order access is just one of many possibilities. Simple counter-examples include flood filling, curve drawing, and edgel linking.
  • Compilers can produce much faster code when array dimensions are known at compile time. Coding conventions should encourage these optimizations.

This doesn't mean that iterators are not needed. But I think the desire to follow STL conventions is not enough to justify their primacy.

using base_type::rbegin;
constexpr const_reverse_iterator crbegin() const;
reverse_iterator rend();
constexpr const_reverse_iterator rend() const;

Member:

I tried this, but I am quite confident it doesn't compile with C++14 (constexpr reverse iterators are a C++17 feature).

Contributor Author:

It worked on all platforms covered in your test suite. (The current red light seems to be due to failing setup code in the gcc 6 environment -- the test doesn't even get to the point of compilation.)

Member:

Yep, because constexpr variables are probably not covered by the test.

constexpr xtiny<...> test({1,2,3});
constexpr auto it = test.rbegin();

will likely not compile.

Member:

Anyway, for the constexpr to have any effect, we probably need to make some constructors constexpr as well ...

I accidentally found an interesting example of a hybrid constexpr class today. It might be a source of inspiration for potential future improvements :)

https://github.com/akrzemi1/Optional/blob/master/optional.hpp#L319-L346

@ukoethe ukoethe (Contributor Author) Dec 20, 2017:

This already fails on the constructor line. I guess that once the constructor goes through, rbegin() will compile as well.

Contributor Author:

I mainly defined constexpr ... rbegin() to potentially accommodate your fixed_vector.

Member:

I'll run some experiments today to see how constexpr and base classes can be combined. Maybe I'll have a proposal ready tonight.

Member:

So far I can't get anything nice to work. I'm unsure whether it's possible to get a constexpr std::array-like type to work ... the problem is that aggregate initialization with an initializer list doesn't work when inheriting (the type is no longer an aggregate).

Contributor Author:

I suggest trying the fixed_vector adapter I described here: #591 (comment). Then you only need a default constructor.

@SylvainCorlay (Member):

Hi Ulrich,

Sorry for the radio silence.

We are still 100 percent committed to making xtensor the perfect library for VIGRA. We had another look at the PR today, and we came to the conclusion that we should probably not move forward with integrating tiny_array in xtensor.

It boils down to the design decisions that were necessary to reconcile dynamic and static sizes in a consistent fashion. For example, push_back and pop_front (as well as other methods) return a copy of the container. We understand the reasons for that choice, but we think that it breaks the principle of least surprise. While we agree that some decisions in the STL are questionable (such as size_t), we should stick to the container interface.

Also, it is really unfortunate that you spent so much time on a PR that is ultimately not getting merged. We apologize for that. On the other hand, we have not had many third-party contributions proposing changes with so much depth and sophistication. This discussion has been very helpful in uncovering performance problems in the library. The idea of stack-allocated dynamic containers for small sizes, and the enabling of arithmetic on shapes, also came out of these discussions.

Would you be ok to collaborate on the two other PRs regarding fixed shape and small_vector?

@ukoethe ukoethe (Contributor Author) commented Jan 2, 2018

Hi Sylvain,

If this is your final decision, I'll probably drop out.

I spent an enormous amount of time attempting to find out what you want via trial and error, apparently without success. I'm still willing to implement concrete and constructive suggestions, but your feedback so far has been mainly negative.

It is an acceptable decision that you don't want to break the STL interface for push_back etc. However, you should make a constructive suggestion, e.g. how to rename the copying function variants. As I've repeatedly said, having separate APIs for dynamic and static arrays is exactly the mistake I want to correct in future Vigra versions. If xtensor perpetuates this mistake, it becomes a lot less attractive as a basis for Vigra.

I had many ideas about contributions to xtensor. I partly considered xtiny as a test of whether and how collaboration with you was possible. It's sad that it hasn't worked out so far.

@ukoethe ukoethe (Contributor Author) commented Jan 3, 2018

How about renaming

insert => inserted
erase => erased
push_back => appended
push_front => prepended
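
For concreteness, hypothetical signatures under this renaming (the return types follow the copying semantics discussed above; this is a sketch, not the PR's actual interface):

#include <cstddef>

using index_t = std::ptrdiff_t;

template <class V, index_t N>
class xtiny_sketch
{
public:
    xtiny_sketch<V, N + 1> inserted(index_t pos, V const& v) const;  // was: insert
    xtiny_sketch<V, N - 1> erased(index_t pos) const;                // was: erase
    xtiny_sketch<V, N + 1> appended(V const& v) const;               // was: push_back
    xtiny_sketch<V, N + 1> prepended(V const& v) const;              // was: push_front
};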

@DerThorsten (Contributor):

I agree that a push_back that returns a copy is surprising, but I cannot see any problem with the renaming.
I am also a fan of the unified API for dynamic/non-dynamic!

@wolfv wolfv (Member) commented Jan 8, 2018

How about I spend the day tomorrow using your PR and making sure it works well as a template parameter for the shape type?
So for the case of VIGRA, you could just define DEFAULT_SHAPE_CONTAINER as tiny_array and have everything ready to go?

@stuarteberg (Contributor):

Sorry to "butt in" to this thread, but FWIW: I think signed shape types are very nice (perhaps even a necessity) for ND image processing, for the same reasons that Ullrich has mentioned elsewhere. Also, a unified API for dynamic and non-dynamic shapes would also make xtensor more approachable, I think.

Disclaimer: Perhaps I'm biased -- I'm a VIGRA user and occasional collaborator with both Ullrich and Thorsten.

That said, I'm cheering from the sidelines for the possibility that xtensor can become the basis for future VIGRA development, even if it requires some compromises.

@ukoethe ukoethe (Contributor Author) commented Jan 9, 2018

@wolfv sounds good. I'm confident that it will work because I already tried this in my very first PR #374, and xtiny has only improved in the meantime 😄

@wolfv wolfv (Member) commented Jan 9, 2018

Hi @ukoethe,

so I did some experiments (very basic ones), and all that was necessary was adding some free function overloads for xtiny types:

#include "xtensor/xio.hpp"
#include "xtensor/xarray.hpp"
#include "xtensor/xtiny.hpp"

using namespace xt;

template <class T, layout_type L = layout_type::row_major>
using xspecial = xarray_container<xt::uvector<T>, L, xtiny<std::size_t>>;

namespace xt
{
    template <class T, std::ptrdiff_t I>
    inline bool resize_container(xtiny<T, I>& v, typename xtiny<T, I>::size_type s)
    {
        return static_cast<typename xtiny<T, I>::size_type>(I) == s;  // fixed size: succeeds only if sizes already match
    }

    template <class T>
    inline bool resize_container(xtiny<T, -1>& v, typename xtiny<T, -1>::size_type s)
    {
        v.resize(s);
        return true;
    }

    template <class I, std::ptrdiff_t L, class... S>
    struct xview_shape_type<xtiny<I, L>, S...>
    {
        using type = xtiny<I, L - integral_count<S...>() + newaxis_count<S...>()>;
    };

    template <class I, class... S>
    struct xview_shape_type<xtiny<I, -1>, S...>
    {
        using type = xtiny<I, -1>;
    };
}

template <class T, std::size_t I, layout_type L = layout_type::row_major>
using xspecialfixed = xarray_container<xt::uvector<T>, L, xtiny<std::size_t, I>>;

int main()
{
    xspecial<double> a = {{1,2,3}, {4,5,6}};
    std::cout << a << std::endl;

    a.reshape({6});
    std::cout << a << std::endl;

    xspecialfixed<double, 2> b = {{1,2,3}, {4,5,6}};
    std::cout << b << std::endl;

    b.reshape({3, 2});
    std::cout << b << std::endl;

    std::cout << "\n\nPrinting shapes: " << std::endl;
    std::cout << b.shape() << std::endl;
    std::cout << a.shape() << std::endl;
}

And it seems to work beautifully! :)

@wolfv wolfv (Member) commented Jan 9, 2018

Functions also work; however, I think promote_shape_type needs to be overloaded for xtiny to register positive values as std::array replacements.

@SylvainCorlay (Member):

Quick note on promote_shape. Xtiny-based shapes will probably need several specializations for all the cases to be covered.

@ukoethe ukoethe (Contributor Author) commented Jan 9, 2018

Xtiny-based shapes will probably need several specializations of promote_shape for all the cases to be covered.

Are these specializations supposed to go into xtiny.hpp?

@wolfv wolfv (Member) commented Jan 9, 2018

I think so. Or, if you want to separate xtiny from the xtensor <-> xtiny specific stuff, maybe adding another file xtensor_xtiny_compat.hpp would be good?

There is a chance we could also do some simplifications regarding the interface. Using decltype and member detection, we could for example add an overload that detects a constexpr static std::size_t static_size member of a statically dimensioned data structure, and then reuse that across all the static arrays (that we "control"), with only one other overload for std::array. Did that make sense? We use the void_t trick to detect aligned allocators at compile time in xsimd here:

https://github.com/QuantStack/xsimd/blob/d62b356a219f7ec6eafefb80acb0f716ae045eca/include/xsimd/memory/xsimd_alignment.hpp#L58-L80
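
A minimal, C++14-compatible sketch of that detection trick applied to the proposed static_size member (the trait and test types here are made up):

#include <cstddef>
#include <type_traits>

template <class... Ts> struct make_void { using type = void; };
template <class... Ts> using void_t = typename make_void<Ts...>::type;

template <class T, class = void>
struct has_static_size : std::false_type {};

template <class T>
struct has_static_size<T, void_t<decltype(T::static_size)>> : std::true_type {};

struct fixed3 { constexpr static std::size_t static_size = 3; };

static_assert(has_static_size<fixed3>::value, "fixed3 exposes static_size");
static_assert(!has_static_size<int>::value, "int does not");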

@SylvainCorlay SylvainCorlay mentioned this pull request Feb 16, 2018

@SylvainCorlay (Member):

Now we do have

  • statically-shaped tensors,
  • a stack-allocated sequence (svector) used instead of std::vector for shape types.

svector should eventually be used instead of uvector as the storage type of most tensor containers.

@ukoethe ukoethe (Contributor Author) commented Feb 16, 2018

Thanks for considering the PR. I guess our discussions did have some influence on your design of statically-shaped tensors and svector, so it was not all in vain. I'll look at what I can do with off-the-shelf xtensor and your suggested xspecial typedef.

@wolfv wolfv (Member) commented Feb 19, 2018

Hi Ullrich, of course it influenced us a lot and got us thinking about this optimization, and, as you know, we looked at your PR a lot for inspiration. I also mentioned that in this blog post: https://medium.com/@wolfv/faster-xtensor-87fcbe7f5293

We still need to add the operations on shapes using xadaptors.
I hope we can keep you on board!
