Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deep_copy of View of non-trivially copyable type leads to byte-wise copy #6986

Open
maartenarnst opened this issue May 3, 2024 · 2 comments
Labels
Question For Kokkos internal and external contributors and users

Comments

@maartenarnst
Copy link
Contributor

Kokkos currently implements the deep_copy of aView as a byte-wise copy under conditions summarized in this comment:

These conditions are made rigorous in the code that follows in Kokkos_CopyViews.hpp by using type traits.

It is notable that these conditions do not guard for the value type of the View to be trivially copyable. This current behavior of deep_copy for non-trivially copyable value types may be surprising. Especially between assignable memory spaces, it may be expected that the copy constructor/copy assignment operator is called.

The behavior of std::copy is different. It carries out the copy as a byte-wise copy only for trivially copyable value types.

If this current behavior is not intended, a solution may consist of leading the case of non-trivially copyable value types to an implementation that uses the ViewCopy functor:

  • template <class ViewTypeA, class ViewTypeB, class Layout, class ExecSpace,
    typename iType>
    struct ViewCopy<ViewTypeA, ViewTypeB, Layout, ExecSpace, 1, iType> {
    ViewTypeA a;
    ViewTypeB b;
    using policy_type = Kokkos::RangePolicy<ExecSpace, Kokkos::IndexType<iType>>;
    using value_type = typename ViewTypeA::value_type;
    ViewCopy(const ViewTypeA& a_, const ViewTypeB& b_,
    const ExecSpace space = ExecSpace())
    : a(a_), b(b_) {
    Kokkos::parallel_for("Kokkos::ViewCopy-1D",
    policy_type(space, 0, a.extent(0)), *this);
    }
    KOKKOS_INLINE_FUNCTION
    void operator()(const iType& i0) const {
    a(i0) = static_cast<value_type>(b(i0));
    };
    };

We have tested this behavior by using a helper class that we can make non trivially copyable and that can count the number of times its special functions are called:

TEST(Copy, vector)
{
    constexpr size_t size = 10;

    //! Constructor
    tester_t::reset();

    const std::vector<tester_t> vector(size);

    EXPECT_EQ(tester_t::getConstructorCount(), size);
   
    //! Copy constructor
    tester_t::reset();

    std::vector<tester_t> vector_copy(vector);

    EXPECT_EQ(tester_t::getConstructorCount()    , 0);
    EXPECT_EQ(tester_t::getCopyConstructorCount(), size);
    EXPECT_EQ(tester_t::getCopyAssignmentCount(),  0);
    
    //! @c std::copy
    tester_t::reset();

    std::copy(vector.cbegin(), vector.cend(), vector_copy.begin());

    EXPECT_EQ(tester_t::getConstructorCount(),     0);
    EXPECT_EQ(tester_t::getCopyConstructorCount(), 0);
    EXPECT_EQ(tester_t::getCopyAssignmentCount(), size);
 }
 
template <typename T>
class CopyViewsTest : public ::testing::Test {};

using CopyViewsTestTypes = ::testing::Types<
    std::pair<Kokkos::DefaultExecutionSpace, Kokkos::DefaultExecutionSpace>
#if defined(KOKKOS_ENABLE_CUDA) || defined(KOKKOS_ENABLE_HIP)
    , std::pair<Kokkos::DefaultHostExecutionSpace, Kokkos::DefaultExecutionSpace>
    , std::pair<Kokkos::DefaultExecutionSpace, Kokkos::DefaultHostExecutionSpace>
#endif
>;

TYPED_TEST_SUITE(CopyViewsTest, CopyViewsTestTypes);

TYPED_TEST(CopyViewsTest, view)
{
    using src_space = typename TypeParam::first_type;
    using dst_space = typename TypeParam::second_type;

    constexpr size_t size = 10;

    //! Constructor
    tester_t::reset();

    const Kokkos::View<tester_t*, src_space> view("view", size);

    EXPECT_EQ(tester_t::getConstructorCount(), size);
    
    tester_t::reset();

    const Kokkos::View<tester_t*, dst_space> view_copy(Kokkos::view_alloc(Kokkos::WithoutInitializing, "view"), size);

    //! @c Kokkos::deep_copy
    Kokkos::deep_copy(view_copy, view);

    EXPECT_EQ(tester_t::getConstructorCount(),     0);
    EXPECT_EQ(tester_t::getCopyConstructorCount(), 0);
    EXPECT_EQ(tester_t::getCopyAssignmentCount(),  0);
}

The code of the helper class, called tester_t in the snippet, is in attachment:

Joint work with @romintomasetti. Issue created after a brief discussion with @dalg24.

@masterleinad
Copy link
Contributor

See #2021.

@crtrott
Copy link
Member

crtrott commented May 3, 2024

Basically we are in somewhat of a bind here. In order to move across memory-space boundaries we may have no option other than to men-cpy but trivially-copyable is too harsh a condition. This is the same issue for which SYCL introduced its device copyable trait you can opt in too. Maybe we should consider doing this for Kokkos 5, i.e. require it. For now I don't want to guarantee different semantic behavior depending on the memory spaces you copy between. So effectively mem-cpy it is. We should document this properly though. There are other similar thorny issues. For example the functors we use must be "device copyable" because we effectively have to move them to some place on the GPU. That move is NOT done via a copy constructor. Furthermore that object is not destructed either.

And while I get the desire to treat this rigorously: practically no one has complained about it in 12 years of Kokkos deployment - and fixing it properly means a bunch of extra boilerplate everyone has to write or significant performance costs.

Here is a sketch for a fix:

  • Introduce Kokkos::mem_copyable which users can specialize or which looks for Foo::kokkos_mem_copyable
  • deep_copy would use assignment unless that trait is true, using a dual copy through device accessible buffer if necessary (i.e. we keep some HostPinnedSpace buffer around or so)
  • Unless the trait is set for a functor (or maybe an opt in as a exec policy trait) we also stage the functor through a host pinned buffer, and keep it alive as necessary (similar to the life time extension of our constant cache copied functors).

All in all I would expect 10%-30% percent slow down for apps which don't do the proper opt-in based on what some of these mechanism cost (e.g. the large functor launch mechanism) and the extra cost in latency.

@ajpowelsnl ajpowelsnl added the Question For Kokkos internal and external contributors and users label May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Question For Kokkos internal and external contributors and users
Projects
None yet
Development

No branches or pull requests

4 participants