Conversation

@ukoethe ukoethe (Contributor) commented Dec 14, 2017

Finally, I was able to finish the PR for tiny arrays. It doesn't yet contain any of the arithmetic or algebraic functions, just the class xtiny and its base classes. Five types of tiny arrays are currently implemented:

  • fixed size, own memory
  • dynamic size, own memory (I integrated most of @wolfv's small_vector here)
  • fixed size, view to external memory (typically via a stored iterator)
  • dynamic size, view to external memory (likewise)
  • dynamic size, derived from xbuffer_adaptor

All types of xtiny support the STL interface, are fully interoperable with each other and can be constructed from std::vector and std::array. I hope you like it.
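
For illustration, a small usage sketch of those claims. The header path and the xt namespace are assumptions taken from later posts in this thread; the constructors from std::array and std::vector are the ones advertised above:

#include <array>
#include <vector>

#include "xtensor/xtiny.hpp"  // header name as used later in this thread

int main()
{
    std::vector<int> v{1, 2, 3};
    std::array<int, 3> a{{1, 2, 3}};

    xt::xtiny<int, 3> fixed(a);   // fixed size, own memory
    xt::xtiny<int> dynamic(v);    // dynamic size, own memory

    // STL interface: range-for, size(), indexing, ...
    int sum = 0;
    for (auto x : fixed)
        sum += x;

    return (sum == 6 && dynamic.size() == 3) ? 0 : 1;
}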

@wolfv wolfv (Member) commented Dec 15, 2017

Cool! I'll check it out later, and on Monday.

using base_type::assign;

template <class T>
void assign(std::initializer_list<T> v);

Member:

The initializer list should take value_type only, not a template parameter T; otherwise the implicit conversion costs ~2 ns :) I spent a day puzzling over a benchmark, and that turned out to be the cause, hehe.

Contributor Author:

Having it templated on T is important to avoid type conversion warnings in code like this:

xtiny<unsigned, 3> vi{1, 2, 3};    // T = int
xtiny<float, 3> vf{1.0, 2.0, 3.0}; // T = double

However, we can provide an overload with initializer_list<value_type>. Or does the compiler automatically convert the initializer list arguments?
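
For illustration, a standalone sketch of the overload pair under discussion (demo is a stand-in type, not the PR's code): with both overloads present, an exact value_type list binds to the non-templated version, while mixed-type lists still compile via the templated one.

#include <initializer_list>
#include <vector>

template <class V>
struct demo
{
    using value_type = V;
    std::vector<V> data;

    // Exact overload: chosen when the list elements already have value_type.
    void assign(std::initializer_list<value_type> v) { data.assign(v); }

    // Templated fallback: keeps lists like {1, 2, 3} compiling when
    // value_type is unsigned or float, converting explicitly.
    template <class T>
    void assign(std::initializer_list<T> v)
    {
        data.clear();
        for (auto const& x : v)
            data.push_back(static_cast<value_type>(x));
    }
};

int main()
{
    demo<unsigned> d;
    d.assign({1, 2, 3});     // templated overload (T = int), no warnings
    d.assign({1u, 2u, 3u});  // exact overload, no conversion at all
}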

Member:

I'm pretty sure that std::vector etc. do not have a templated initializer_list constructor, so I don't think we should have one either. The conversion warnings in this case would be legitimate, so I think it's fine to just remove it.

In my test, the compiler basically used the templated initializer_list overload and then did the type conversion upon assignment (instead of statically at compile time), and that costs time in the benchmark.

Contributor Author:

Dropping the templated variant would be very unfortunate:

template <class REAL>
void foo(xtiny<REAL, 2> & v)
{
    v.assign({-0.5, 0.5});  // doesn't compile when REAL=float

    v.assign({static_cast<REAL>(-0.5), static_cast<REAL>(0.5)}); // UGLY!!
}

Member:

That's really weird. E.g., this compiles fine with clang:

std::vector<float> a({1.,2.,3.});

Is it a compiler setting or MSVC?

Contributor Author:

Ah, I made a mistake. Never mind.

void deallocate();

allocator_type m_allocator;
size_type m_size;

Member:

Did you have a chance to benchmark this against storing begin/end pointers? At least LLVM's SmallVector stores begin and end pointers, and it might be faster in certain cases.

Contributor Author:

I didn't benchmark this specific code. But I've consistently found that loops over integers are as fast as (or even faster than) loops over iterators. For example, one would think that

auto iter = v.begin();
for(size_t k=0; k < v.size(); ++k, ++iter)

is slower than

for(auto iter = v.begin(); iter != v.end(); ++iter)

because it increments two variables. But in fact the two versions are equally fast, or the first is even faster, because compilers seem to be better at optimizing integer loops. However, please confirm these statements with independent measurements.
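
In that spirit, a minimal Google Benchmark sketch of the two loop styles (hypothetical benchmark code, not from this PR):

#include <benchmark/benchmark.h>

#include <numeric>
#include <vector>

static void bench_index_loop(benchmark::State& state)
{
    std::vector<int> v(static_cast<std::size_t>(state.range(0)));
    std::iota(v.begin(), v.end(), 0);
    for (auto _ : state)
    {
        long sum = 0;
        auto iter = v.begin();
        for (std::size_t k = 0; k < v.size(); ++k, ++iter)  // two loop variables
            sum += *iter;
        benchmark::DoNotOptimize(sum);
    }
}

static void bench_iterator_loop(benchmark::State& state)
{
    std::vector<int> v(static_cast<std::size_t>(state.range(0)));
    std::iota(v.begin(), v.end(), 0);
    for (auto _ : state)
    {
        long sum = 0;
        for (auto iter = v.begin(); iter != v.end(); ++iter)  // one loop variable
            sum += *iter;
        benchmark::DoNotOptimize(sum);
    }
}

BENCHMARK(bench_index_loop)->Arg(4)->Arg(64)->Arg(512);
BENCHMARK(bench_iterator_loop)->Arg(4)->Arg(64)->Arg(512);
BENCHMARK_MAIN();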

@ukoethe ukoethe (Contributor Author) commented Dec 18, 2017

BTW, I think the design can also accommodate a fixed vector:

template <index_t ... M>
class xtiny_impl<index_t, (index_t)sizeof...(M), fixed_impl<M...>>
: public fixed_impl<M...>
, tags::xtiny_tag
{
};

template <index_t ... M>
using xshape_fixed = xtiny<index_t, (index_t)sizeof...(M), fixed_impl<M...>>;


using index_t = std::ptrdiff_t;

static const index_t runtime_size = -1;

Member:

I think replacing static const with constexpr static would be nice.

Contributor Author:

changed.

@wolfv wolfv (Member) left a comment:

I've looked through the xtiny.hpp file and added my comments!


template <class V1, index_t N1, class R1, class V2, index_t N2, class R2>
inline bool
operator==(xtiny<V1, N1, R1> const & l,

Member:

I think the following functions should be modified so that the loops are replaced with std::equal, and instead of re-implementing almost the same functionality, the other comparisons should be expressed in terms of ==, e.g. != as !operator==(a, b), etc.

Contributor Author:

Have you ever benchmarked whether !operator==(a, b) is as fast as the explicit implementation? I haven't, but the negation is clearly an extra operation that the compiler may or may not optimize away.

The same applies to the call std::equal(l.begin(), l.end(), r.begin(), r.end()): checking whether both sequences have equal length requires extra work when std::equal is not explicitly specialized for random-access iterators. Moreover, my version of the code optimizes this check away entirely if the array sizes are known at compile time.
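
To make this concrete, a sketch of the kind of explicit loop being defended here, written for plain arrays rather than xtiny: with N fixed at compile time, there is no runtime length check left to perform.

#include <cstddef>

// With N a compile-time constant, the compiler can unroll this loop;
// no length comparison survives to runtime.
template <class V, std::size_t N>
inline bool equal_arrays(V const (&l)[N], V const (&r)[N])
{
    for (std::size_t k = 0; k < N; ++k)
        if (l[k] != r[k])
            return false;
    return true;
}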

Contributor Author:

I'll wait for your benchmarks before taking action.

Contributor Author:

Another reason for the explicit loop in operator==(xtiny, xtiny) is symmetry with operator==(xtiny, scalar). AFAIK, there is no variant of std::equal that compares an array with a scalar, and I wanted the two functions to look similar.
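
A sketch of that symmetry, again for plain arrays (illustrative only): the scalar variant has no std::equal counterpart, but as explicit loops the two read the same.

#include <cstddef>

template <class V, std::size_t N>
inline bool equal_to_scalar(V const (&l)[N], V const& r)
{
    for (std::size_t k = 0; k < N; ++k)
        if (l[k] != r)    // same shape as the array/array loop above
            return false;
    return true;
}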

Member:

I've uploaded the benchmark here: https://gist.github.com/wolfv/a64526ceb94dd37b4a5f2f5dbcf76779 maybe you have time to check the results on MSVC?

I've modified the benchmark so that it changes a random index on each iteration in otherwise equal arrays, to create some variance in the position where the arrays first differ. This incurs a heavy cost, though ...

These are the latest results:

--------------------------------------------------------------
Benchmark                       Time           CPU Iterations
--------------------------------------------------------------
bench_compare_eq1/4            77 ns         77 ns    8853946
bench_compare_eq1/8            78 ns         78 ns    8857423
bench_compare_eq1/64           91 ns         91 ns    7479546
bench_compare_eq1/512         183 ns        183 ns    3750081
bench_compare_eq1/1000        273 ns        273 ns    2567153
bench_compare_eq2/4            77 ns         77 ns    8866009
bench_compare_eq2/8            79 ns         79 ns    8777499
bench_compare_eq2/64          112 ns        112 ns    6227843
bench_compare_eq2/512         350 ns        350 ns    2027685
bench_compare_eq2/1000        588 ns        588 ns    1163340

Contributor Author:

Great! I'll run your benchmarks later today.

Member:

Just FYI, I found this implementation

auto ita = a.begin();
auto itb = b.begin();
auto resit = result.begin();
auto n = std::distance(ita, a.end());
for (std::ptrdiff_t i = 0; i < n; ++i)
{
    *resit = (*ita + *itb);
    ++resit;
    ++ita;
    ++itb;
}

on std::arrays to be faster than the integer-access-based version (GCC 7.2). The speed difference was 3 ns vs. 1 ns.

Contributor Author:

Here are the benchmark results on Windows:

Run on (8 X 1993 MHz CPU s), Visual Studio 2015
SIMD SIZE: 2
----------------------------------------------------------------
Benchmark                         Time           CPU Iterations
----------------------------------------------------------------
bench_compare_eq1/4             507 ns        479 ns    1725827
bench_compare_eq1/8             390 ns        389 ns    1725827
bench_compare_eq1/64            415 ns        413 ns    1547293
bench_compare_eq1/512           829 ns        834 ns     897430
bench_compare_eq1/1000         1506 ns       1502 ns     498572
bench_compare_eq2/4             347 ns        348 ns    1794860
bench_compare_eq2/8             399 ns        392 ns    1869646
bench_compare_eq2/64            544 ns        534 ns    1547293
bench_compare_eq2/512          1204 ns       1189 ns     498572
bench_compare_eq2/1000         1844 ns       1877 ns     373929
bench_compare_fill/4              7 ns          7 ns  112178768
bench_compare_fill/8             10 ns         10 ns   64102153
bench_compare_fill/64            58 ns         57 ns   11217877
bench_compare_fill/512          307 ns        307 ns    2136738
bench_compare_fill/1000         579 ns        577 ns    1000000
bench_compare_fill2/4            14 ns         15 ns   44871507
bench_compare_fill2/8            15 ns         15 ns   44871507
bench_compare_fill2/64           41 ns         41 ns   18696461
bench_compare_fill2/512         273 ns        273 ns    2804469
bench_compare_fill2/1000        508 ns        515 ns    1000000
----------------------------------------------------------------
SIMD SIZE: 4
----------------------------------------------------------------
Benchmark                         Time           CPU Iterations
----------------------------------------------------------------
bench_compare_eq1/4             416 ns        365 ns    1794860
bench_compare_eq1/8             371 ns        368 ns    1950935
bench_compare_eq1/64            407 ns        409 ns    1869646
bench_compare_eq1/512           791 ns        782 ns     897430
bench_compare_eq1/1000         1382 ns       1391 ns     560894
bench_compare_eq2/4             357 ns        352 ns    2039614
bench_compare_eq2/8             366 ns        368 ns    1950935
bench_compare_eq2/64            449 ns        441 ns    1661908
bench_compare_eq2/512          1063 ns       1046 ns     641022
bench_compare_eq2/1000         1638 ns       1606 ns     407923
bench_compare_fill/4              7 ns          7 ns  112178768
bench_compare_fill/8             10 ns         10 ns   74785845
bench_compare_fill/64            56 ns         57 ns   11217877
bench_compare_fill/512          268 ns        269 ns    2492862
bench_compare_fill/1000         511 ns        515 ns    1000000
bench_compare_fill2/4            15 ns         15 ns   44871507
bench_compare_fill2/8            14 ns         14 ns   49857230
bench_compare_fill2/64           37 ns         38 ns   18696461
bench_compare_fill2/512         242 ns        245 ns    2991434
bench_compare_fill2/1000        533 ns        528 ns    1447468

Contributor Author:

changed

template <class T, std::size_t M>
inline
xtiny<V, N, R>::xtiny(std::array<T, M> const & v)
: base_type(v.begin(), v.end())

Member:

maybe use cbegin / cend everywhere?

Contributor Author:

What's the difference?

Member:

It is more expressive. We use cbegin, cend wherever possible across the code base. When we find an instance where it is not the case, we change it.

Contributor Author:

changed

? static_size-1
: runtime_size;
xtiny<value_type, res_size> res(size()-1, dont_init);
for(size_type k=0; k<m; ++k)

Member:

Could probably be replaced with a combination of std::copy and std::move_backward
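
For reference, a sketch of what that could look like for the erase-like case shown above, using std::array and two std::copy calls (m is the index being dropped; std::move_backward would be the in-place analogue). This is an illustration, not the PR's code:

#include <algorithm>
#include <array>
#include <cstddef>

template <class T, std::size_t N>
std::array<T, N - 1> erased(std::array<T, N> const& src, std::size_t m)
{
    std::array<T, N - 1> res{};
    std::copy(src.begin(), src.begin() + m, res.begin());        // prefix [0, m)
    std::copy(src.begin() + m + 1, src.end(), res.begin() + m);  // suffix (m, N)
    return res;
}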

@ukoethe ukoethe (Contributor Author) Dec 18, 2017:

I generally find one-line loops more readable than calls to standard functions. I also have more trust in the compiler's optimization abilities when size() is known at compile time (but I haven't measured recently whether there is any difference).

Contributor Author:

changed

inline void
xtiny_impl<V, N, R>::assign(IT begin, IT end)
{
std::ignore = end;

Member:

Nice trick! This is to avoid unused-parameter warnings, right?

Contributor Author:

Yes.
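
A standalone illustration of the idiom (sketch only, names invented): assigning the parameter to std::ignore counts as a use, so no unused-parameter warning fires in release builds where the assertion that consumed end compiles away.

#include <cstddef>
#include <tuple>

template <class IT>
void assign_first_n(int* dst, std::size_t n, IT begin, IT end)
{
    std::ignore = end;  // 'end' is only needed by a debug-mode size check
    for (std::size_t k = 0; k < n; ++k, ++begin)
        dst[k] = static_cast<int>(*begin);
}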

xtiny_impl<V, runtime_size, R>::assign(size_type n, const value_type& v)
{
XTENSOR_ASSERT_MSG(n == size(), "xtiny_impl::assign(n, v): size mismatch.");
for(size_type k=0; k<size(); ++k)

Member:

use std::fill?

@wolfv wolfv (Member) Dec 19, 2017:

My benchmark shows little difference between iterator-based std::fill and a plain for loop (number 1 is std::fill, number 2 is the for loop):

----------------------------------------------------------------
Benchmark                         Time           CPU Iterations
----------------------------------------------------------------
bench_compare_fill/4              2 ns          2 ns  402986315
bench_compare_fill/8              3 ns          3 ns  215037132
bench_compare_fill/64            19 ns         18 ns   36827484
bench_compare_fill/512          141 ns        141 ns    4734883
bench_compare_fill/1000         270 ns        270 ns    2587655
bench_compare_fill2/4             3 ns          3 ns  243319874
bench_compare_fill2/8             3 ns          3 ns  232418915
bench_compare_fill2/64           28 ns         28 ns   24789837
bench_compare_fill2/512         141 ns        141 ns    4891461
bench_compare_fill2/1000        272 ns        271 ns    2550597

@ukoethe ukoethe (Contributor Author) Dec 19, 2017:

changed


template <class V, class R>
constexpr inline auto
xtiny_impl<V, runtime_size, R>::shape() const -> index_type

Member:

I think the shape function is confusing and unnecessary (at the current iteration of xtiny).

Contributor Author:

removed

@ukoethe ukoethe (Contributor Author) commented Dec 19, 2017

I implemented your suggestions and uploaded a revised version.

{
if(!may_use_uninitialized_memory)
{
for(size_type k=0; k<m_size; ++k)

Member:

maybe we could reuse detail::safe_destroy_deallocate from xstorage here?

@ukoethe ukoethe (Contributor Author) Dec 20, 2017:

I considered this, but then found that I'd first like to change safe_init_allocate():

// rename current version
// (*_init_* is a misnomer, because the function doesn't init trivial types)
safe_allocate_no_init(A& alloc, typename A::size_type size)

// add new version with custom initial value
safe_allocate_init(A& alloc, typename A::size_type size,
                   typename A::value_type const & v = typename A::value_type())

Then I could use safe_allocate_* and safe_destroy_deallocate symmetrically.

Contributor Author:

I looked up where safe_init_allocate is called in the current code base, and found this:

pointer tmp = safe_init_allocate(al, rhs.m_size);
if (xtrivially_default_constructible<value_type>::value)
{
    std::uninitialized_copy(rhs.m_data.get(), rhs.m_data.get() + rhs.m_size, tmp);
}
else
{
    std::copy(rhs.m_data.get(), rhs.m_data.get() + rhs.m_size, tmp);
}

This code is awkward, because std::uninitialized_copy works for arbitrary types, not just trivial ones. The above could simply be replaced with:

pointer tmp = al.allocate(rhs.m_size);
std::uninitialized_copy(rhs.m_data.get(), rhs.m_data.get() + rhs.m_size, tmp);

Member:

I didn't write that code, but maybe @JohanMabille has some insights here?

@JohanMabille JohanMabille (Member) Dec 20, 2017:

uninitialized_copy calls the constructor, which has already been called by safe_init_allocate when the type is not trivially default constructible (hence the safe in the name). So using uninitialized_copy everywhere may end up with a double call to the constructor, with all the bad consequences that can have.

@ukoethe ukoethe (Contributor Author) Dec 20, 2017:

I don't suggest using uninitialized_copy everywhere. But in the particular code snippet above, non-trivial objects are needlessly initialized twice: first with placement new, then with copy assignment. In contrast, alloc.allocate(size) followed by uninitialized_copy() always calls placement new exactly once, as it should.
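
A sketch of that point (simplified, hypothetical helper): allocate raw memory, then placement-new each element exactly once via uninitialized_copy, regardless of whether the type is trivial.

#include <cstddef>
#include <memory>

template <class T, class A>
T* copy_into_fresh_memory(A& alloc, T const* first, T const* last)
{
    T* tmp = alloc.allocate(static_cast<std::size_t>(last - first));
    std::uninitialized_copy(first, last, tmp);  // one constructor call per element
    return tmp;
}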

@JohanMabille JohanMabille (Member) Dec 20, 2017:

Sorry for the word "everywhere"; I meant in both branches of the "if" in the previous code snippet. I agree that the code can be simplified. We lose the "symmetric" appearance of the code (with the use of safe_destroy_deallocate), but the code is cleaner.

However, I don't understand why you would like to have two functions; you would have to test whether the type is trivially default constructible to know which method to call, while the idea of safe_init_allocate is to handle that automatically and default-construct the values only when needed. The "safe" here refers to the initialization, not the allocation, and it is safe to not initialize trivial types.

Contributor Author:

it is safe to not initialize trivial types.

Exactly; that's why I thought that a name containing "no" (safe_no_init_allocate, safe_allocate_no_init, or allocate_safe_no_init) would more explicitly express that trivial types are safely left uninitialized 😄 .

why you would like to have two functions

When I free memory with safe_destroy_deallocate (which I would like to do), I also want to allocate it with a corresponding allocate function (general rule: always allocate and free memory with matching function pairs). Since this must apply to all supported xtiny constructors, I actually need three allocate variants: one for safe (non-)initialization, one for value initialization, and one for copy initialization. In the hypothetical case that someone modifies the allocation/deallocation mechanism in xstorage, all functions in that file will hopefully be changed together, whereas a corresponding change in a seemingly unrelated file like xtiny is easily overlooked. Alternatively, xstorage could explicitly document a guarantee that incompatible changes to the allocation/deallocation mechanism will never occur.

@wolfv wolfv (Member) left a comment:

I've found the following statement on stackoverflow regarding storing pointers vs. a pointer + size/capacity:

https://stackoverflow.com/questions/30422205/why-the-libc-stdvector-internally-keeps-three-pointers-instead-of-one-pointe

In essence, storing pointers means optimizing for iterators. As we're trying to follow STL conventions I would say we should go for the same. What do you think?
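
For reference, the two layouts side by side (names made up for the sketch): each derives the other's primary quantity with one extra arithmetic operation.

#include <cstddef>

template <class T>
struct ptr_size_layout    // what xtiny stores: size()/indexing are free
{
    T* data;
    std::size_t size;
    T* end() const { return data + size; }  // end() costs an add
};

template <class T>
struct ptr_ptr_layout     // LLVM SmallVector style: iteration is free
{
    T* first;
    T* last;
    std::size_t size() const { return static_cast<std::size_t>(last - first); }  // size() costs a subtract
};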

"xtiny_impl::assign(begin, end): size mismatch.");
for(size_type k=0; k<N; ++k, ++begin)
{
(*this)[k] = static_cast<value_type>(*begin);

Member:

I think we should use iterators here, too.
And I am unsure about the static_cast. Is that the correct behavior compared with e.g. std::vector?

Contributor Author:

I think we should use iterators here, too.

I don't mind as long as the compiler manages to unroll these loops (or applies equally powerful optimization) when N is known at compile time.

And I am unsure about the static_cast. Is that the correct behavior compared with e.g. std::vector?

In principle, std::copy could be called here, but it issues warnings when decltype(*begin) doesn't match value_type. Since xtiny is meant to be usable beyond shapes (e.g. for RGB pixels), mixed-type expressions are not rare.

The warning policies of current compilers w.r.t. automatic type conversion amount to the implicit rule "automatic type conversion is deprecated". From the image processing point of view, I don't agree with this rule, and add explicit casts where I think automatic conversion should be allowed.

xtiny_impl<V, runtime_size, xbuffer_adaptor<CP, O, A>>::assign(size_type n, const value_type& v)
{
std::ignore = n;
XTENSOR_ASSERT_MSG(n == size(), "xtiny_impl::assign(n, v): size mismatch.");

Member:

I think assign is supposed to take care of resizing the container (this is the behavior of std::vector).
The following works:
The following works:

std::vector<double> test;
test.assign({1,2,3,4,5});
assert(test.size() == 5);

Contributor Author:

Yes, but recall that xtiny exists in order to unify the APIs of static and dynamic arrays. Since a static array's assign cannot resize, I had to stick to this behavior for dynamic arrays as well.

Contributor Author:

For the same reason, insert() and erase() don't work in-place, but return a new xtiny.

@ukoethe ukoethe (Contributor Author) commented Dec 20, 2017

In essence, storing pointers means optimizing for iterators. As we're trying to follow STL conventions I would say we should go for the same. What do you think?

VIGRA was initially built on an iterator-based design. After several years, I realized that all data structures people care about support random access, and that users strongly prefer indexing. Here are some reasons:

  • Algorithms in the image processing/analysis literature always use indexing, and having the same convention in the code is a big debugging help.
  • If one works in multiple dimensions, indexing is a lot more readable than iterators. To some extent, this is even true in 1-D.
  • Many image processing algorithms need access patterns that are hard to map onto iterators. Scan-order access is just one of many possibilities. Simple counter-examples include flood filling, curve drawing, and edgel linking.
  • Compilers can produce much faster code when array dimensions are known at compile time. Coding conventions should encourage these optimizations.

This doesn't mean that iterators are not needed. But I think the desire to follow STL conventions is not enough to justify their primacy.

using base_type::rbegin;
constexpr const_reverse_iterator crbegin() const;
reverse_iterator rend();
constexpr const_reverse_iterator rend() const;

Member:

I tried this, but I am quite confident it doesn't compile with C++14 (constexpr reverse iterators are a C++17 feature).

Contributor Author:

It worked on all platforms covered in your test suite. (The current red light seems to be due to failing setup code in the gcc 6 environment -- the test doesn't even get to the point of compilation.)

Member:

Yep, because constexpr variables are probably not covered by the test.

constexpr xtiny<...> test({1,2,3});
constexpr auto it = test.rbegin();

will likely not compile.

Member:

Anyway, for the constexpr to have any effect, we probably need to make some constructors constexpr as well ...

I accidentally found an interesting example of a hybrid constexpr class today. It might be a source of inspiration for potential future improvements :)

https://github.com/akrzemi1/Optional/blob/master/optional.hpp#L319-L346

@ukoethe ukoethe (Contributor Author) Dec 20, 2017:

This already fails on the constructor line. I guess that once the constructor goes through, rbegin() will compile as well.

Contributor Author:

I mainly defined constexpr ... rbegin() to potentially accommodate your fixed_vector.

Member:

I'll run some experiments today to see how constexpr and base classes can be combined. Maybe I'll have a proposal ready tonight.

Member:

So far I can't get anything nice to work. I'm unsure whether it's possible to get a constexpr std::array-like type to work ... the problem is that aggregate initialization with an initializer list doesn't work when inheriting (the type is no longer an aggregate).

Contributor Author:

I suggest trying the fixed_vector adapter I described here: #591 (comment). Then you only need a default constructor.

@SylvainCorlay (Member):

Hi Ulrich,

Sorry for the radio silence.

We are still 100 percent committed to making xtensor the perfect library for VIGRA. We had another look at the PR today, and we came to the conclusion that we should probably not move forward with integrating tiny_array in xtensor.

It boils down to the design decisions that were necessary to reconcile dynamic and static sizes in a consistent fashion. For example, push_back and pop_front (as well as other methods) return a copy of the container. We understand the reasons for that choice, but we think that it breaks the principle of least surprise. While we agree that some decisions in the STL are questionable (such as size_t), we should stick to the container interface.

Also, it is really unfortunate that you spent so much time on a PR that is ultimately not getting merged. We apologize for that. On the other hand, we have not had many third-party contributions proposing changes with so much depth and sophistication. This discussion has been very helpful in uncovering performance problems in the library. The idea of stack-allocated dynamic containers for small sizes, and the enabling of arithmetic on shapes, also came out of these discussions.

Would you be ok to collaborate on the two other PRs regarding fixed shape and small_vector?

@ukoethe ukoethe (Contributor Author) commented Jan 2, 2018

Hi Sylvain,

If this is your final decision, I'll probably drop out.

I spent an enormous amount of time attempting to find out what you want via trial and error, apparently without success. I'm still willing to implement concrete and constructive suggestions, but your feedback so far has been mainly negative.

It is an acceptable decision that you don't want to break the STL interface for push_back etc. However, you should make a constructive suggestion, e.g. how to rename the copying function variants. As I've repeatedly said, having separate APIs for dynamic and static arrays is exactly the mistake I want to correct in future Vigra versions. If xtensor perpetuates this mistake, it becomes a lot less attractive as a basis for Vigra.

I had many ideas about contributions to xtensor. I partly considered xtiny as a test of whether and how collaboration with you was possible. It's sad that it hasn't worked out so far.

@ukoethe ukoethe (Contributor Author) commented Jan 3, 2018

How about renaming

insert => inserted
erase => erased
push_back => appended
push_front => prepended
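
For concreteness, hypothetical signatures under this renaming (the return types follow the copying semantics discussed above; this is a sketch, not the PR's actual interface):

#include <cstddef>

using index_t = std::ptrdiff_t;

template <class V, index_t N>
class xtiny_sketch
{
public:
    xtiny_sketch<V, N + 1> inserted(index_t pos, V const& v) const;  // was: insert
    xtiny_sketch<V, N - 1> erased(index_t pos) const;                // was: erase
    xtiny_sketch<V, N + 1> appended(V const& v) const;               // was: push_back
    xtiny_sketch<V, N + 1> prepended(V const& v) const;              // was: push_front
};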

@DerThorsten (Contributor):

I agree that a push_back that returns a copy is surprising, but I cannot see any problem with the renaming.
I am also a fan of the unified API for dynamic/non-dynamic!

@wolfv wolfv (Member) commented Jan 8, 2018

How about I spend the day tomorrow using your PR and making sure it works well as a template parameter for the shape type?
So for the case of VIGRA, you could just define DEFAULT_SHAPE_CONTAINER as tiny_array and have everything ready to go?

@stuarteberg (Contributor):

Sorry to "butt in" to this thread, but FWIW: I think signed shape types are very nice (perhaps even a necessity) for ND image processing, for the same reasons that Ullrich has mentioned elsewhere. Also, a unified API for dynamic and non-dynamic shapes would also make xtensor more approachable, I think.

Disclaimer: Perhaps I'm biased -- I'm a VIGRA user and occasional collaborator with both Ullrich and Thorsten.

That said, I'm cheering from the sidelines for the possibility that xtensor can become the basis for future VIGRA development, even if it requires some compromises.

@ukoethe ukoethe (Contributor Author) commented Jan 9, 2018

@wolfv sounds good. I'm confident that it will work because I already tried this in my very first PR #374, and xtiny has only improved in the meantime 😄

@wolfv wolfv (Member) commented Jan 9, 2018

Hi @ukoethe,

so I did some experiments (very basic ones), and all that was necessary was adding some free function overloads for xtiny types:

#include "xtensor/xio.hpp"
#include "xtensor/xarray.hpp"
#include "xtensor/xtiny.hpp"

using namespace xt;

template <class T, layout_type L = layout_type::row_major>
using xspecial = xarray_container<xt::uvector<T>, L, xtiny<std::size_t>>;

namespace xt
{
    template <class T, std::ptrdiff_t I>
    inline bool resize_container(xtiny<T, I>& v, typename xtiny<T, I>::size_type s)
    {
        return static_cast<typename xtiny<T, I>::size_type>(I) == s;  // fixed size: succeeds only if sizes already match
    }

    template <class T>
    inline bool resize_container(xtiny<T, -1>& v, typename xtiny<T, -1>::size_type s)
    {
        v.resize(s);
        return true;
    }

    template <class I, std::ptrdiff_t L, class... S>
    struct xview_shape_type<xtiny<I, L>, S...>
    {
        using type = xtiny<I, L - integral_count<S...>() + newaxis_count<S...>()>;
    };

    template <class I, class... S>
    struct xview_shape_type<xtiny<I, -1>, S...>
    {
        using type = xtiny<I, -1>;
    };
}

template <class T, std::size_t I, layout_type L = layout_type::row_major>
using xspecialfixed = xarray_container<xt::uvector<T>, L, xtiny<std::size_t, I>>;

int main()
{
    xspecial<double> a = {{1,2,3}, {4,5,6}};
    std::cout << a << std::endl;

    a.reshape({6});
    std::cout << a << std::endl;

    xspecialfixed<double, 2> b = {{1,2,3}, {4,5,6}};
    std::cout << b << std::endl;

    b.reshape({3, 2});
    std::cout << b << std::endl;

    std::cout << "\n\nPrinting shapes: " << std::endl;
    std::cout << b.shape() << std::endl;
    std::cout << a.shape() << std::endl;
}

And it seems to work beautifully! :)

@wolfv wolfv (Member) commented Jan 9, 2018

Functions also work; however, I think promote_shape_type needs to be overloaded for xtiny to register positive values as std::array replacements.

@SylvainCorlay (Member):

Quick note on promote_shape. Xtiny-based shapes will probably need several specializations for all the cases to be covered.

@ukoethe ukoethe (Contributor Author) commented Jan 9, 2018

Xtiny-based shapes will probably need several specializations of promote_shape for all the cases to be covered.

Are these specializations supposed to go into xtiny.hpp?

@wolfv wolfv (Member) commented Jan 9, 2018

I think so. Or, if you want to separate xtiny from the xtensor <-> xtiny specific stuff, maybe adding another file xtensor_xtiny_compat.hpp would be good?

There is a chance we could also do some simplifications regarding the interface. Using decltype and member detection, we could for example add an overload that detects a constexpr static std::size_t static_size member of a statically dimensioned data structure, and then reuse that across all the static arrays (that we "control"), with only one other overload for std::array. Did that make sense? We use the void_t trick to detect aligned allocators at compile time in xsimd here:

https://github.com/QuantStack/xsimd/blob/d62b356a219f7ec6eafefb80acb0f716ae045eca/include/xsimd/memory/xsimd_alignment.hpp#L58-L80
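
A minimal, C++14-compatible sketch of that detection trick applied to the proposed static_size member (the trait and test types here are made up):

#include <cstddef>
#include <type_traits>

template <class... Ts> struct make_void { using type = void; };
template <class... Ts> using void_t = typename make_void<Ts...>::type;

template <class T, class = void>
struct has_static_size : std::false_type {};

template <class T>
struct has_static_size<T, void_t<decltype(T::static_size)>> : std::true_type {};

struct fixed3 { constexpr static std::size_t static_size = 3; };

static_assert(has_static_size<fixed3>::value, "fixed3 exposes static_size");
static_assert(!has_static_size<int>::value, "int does not");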

@SylvainCorlay SylvainCorlay mentioned this pull request Feb 16, 2018

@SylvainCorlay (Member):

Now we do have

  • statically-shaped tensors,
  • a stack-allocated sequence (svector) used instead of std::vector for shape types.

svector should eventually be used instead of uvector as the storage type of most tensor containers.

@ukoethe ukoethe (Contributor Author) commented Feb 16, 2018

Thanks for considering the PR. I guess our discussions did have some influence on your design of statically-shaped tensors and svector, so it was not all in vain. I'll look at what I can do with off-the-shelf xtensor and your suggested xspecial typedef.

@wolfv wolfv (Member) commented Feb 19, 2018

Hi Ullrich, of course it influenced us a lot and got us thinking about this optimization, and, as you know, we looked at your PR a lot for inspiration. I also mentioned that in this blog post: https://medium.com/@wolfv/faster-xtensor-87fcbe7f5293

We still need to add the operations on shapes using xadaptors.
I hope we can keep you on board!
