
[QST] Composing operators #686

Closed · HugoPhibbs opened this issue Aug 2, 2024 · 21 comments · Fixed by #696

@HugoPhibbs (Contributor)

Hi. I couldn't find anything in the docs about this, but what is the best way to compose operators?

I've found that I sometimes get indeterminacy depending on whether I call run() on an operator immediately, or defer it until the operator is used somewhere further down the line in another operator.

Basically, I'm getting results similar to this situation:

// A and A_prime are otherwise identical - e.g. same shape, type, etc.
(A = matx::remap<0>(B, idx)).run();
A_prime = matx::remap<0>(B, idx);

// Some code further below gives different results depending on whether I use A or A_prime - even though they should be identical. E.g.:
(C = matx::vector_norm(A, {2}, matx::NormOrder::L2)).run();
(C_prime = matx::vector_norm(A_prime, {2}, matx::NormOrder::L2)).run(); // Different - produces wrong results

I didn't want to send my full code because it may be confusing, but if it's necessary to debug this, I can provide it.

Thanks.

@HugoPhibbs (Contributor, Author)

I may as well provide all the code.

It's a work in progress, so please excuse any gross inefficiencies.

Assume k = 2, m = 2000, n = 70,000, d = 784.

matx::tensor_t<float, 2>  GsDBSCAN::findDistancesMatX(matx::tensor_t<float, 2> &X_t, matx::tensor_t<int, 2> &A_t, matx::tensor_t<int, 2> &B_t, float alpha) {
    const int k = A_t.Shape()[1] / 2;
    const int m = B_t.Shape()[1];

    const int n = X_t.Shape()[0];
    const int d = X_t.Shape()[1];

    const int batchSize = GsDBSCAN::findDistanceBatchSize(alpha, n, d, k, m); // Around 300

    auto AFlat_t = matx::reshape(A_t, {n * 2 * k});

    auto ABatchFlat_t = matx::make_tensor<int>( {batchSize * 2 * k});
    auto BBatch_t = matx::make_tensor<int>( {ABatchFlat_t.Size(0), m});

    auto XBatch_t = matx::make_tensor<float>( {2*batchSize*k*m, d});
    auto XSubset_t = matx::make_tensor<float>( {batchSize, d});
    auto YBatch_t = matx::make_tensor<float>({batchSize, 2*k*m, d});
    auto distancesBatch_t = matx::make_tensor<float>({batchSize, 2 * k * m});

    auto distances_t = matx::make_tensor<float>({n, 2*k*m});

    for (int i = 0; i < n; i += batchSize) {
        int maxBatchIdx = i + batchSize - 1; // Index within X along the ROWS

        (XSubset_t = matx::slice(X_t, {i, 0}, {maxBatchIdx + 1, matx::matxEnd})).run();
//        XSubset_t = matx::slice(X_t, {i, 0}, {maxBatchIdx + 1, matx::matxEnd});

        (ABatchFlat_t = matx::slice(AFlat_t, {i * 2 * k}, {(maxBatchIdx + 1) * 2 * k})).run();
//        ABatchFlat_t = matx::slice(AFlat_t, {i * 2 * k}, {(maxBatchIdx + 1) * 2 * k});

        (BBatch_t = matx::remap<0>(B_t, ABatchFlat_t)).run();
//        BBatch_t = matx::remap<0>(B_t, ABatchFlat_t);

        auto BBatch_t_flat = matx::flatten(BBatch_t);

        (XBatch_t = matx::remap<0>(X_t, BBatch_t_flat)).run();
//        XBatch_t = matx::remap<0>(X_t, BBatch_t_flat);

        auto XBatchReshaped_t = matx::reshape(XBatch_t, {batchSize, 2*k*m, d});

        auto XSubsetReshaped_t = matx::reshape(XSubset_t, {batchSize, 1, d});

        (YBatch_t = XBatchReshaped_t - matx::repmat(XSubsetReshaped_t, {1, 2*k*m, 1})).run(); // Repmat is a workaround for subtracting tensors with naively incompatible shapes

        (distancesBatch_t = matx::vector_norm(YBatch_t, {2}, matx::NormOrder::L2)).run();

        (matx::slice(distances_t, {i, 0}, {maxBatchIdx + 1, matx::matxEnd}) = distancesBatch_t).run();
    }

    return distances_t;
}

Essentially, what I find is that my distances_t tensor varies greatly depending on which run() lines I uncomment. E.g., if I do:

 // (XSubset_t = matx::slice(X_t, {i, 0}, {maxBatchIdx + 1, matx::matxEnd})).run();
 XSubset_t = matx::slice(X_t, {i, 0}, {maxBatchIdx + 1, matx::matxEnd});

then the resulting distances_t tensor is different - failing the tests I've written to check its contents.

FYI, these results differ even after calling cudaDeviceSynchronize().

@luitjens (Collaborator) commented Aug 2, 2024

This line copies a permutation of B into A based on the indices in idx:

(A = matx::remap<0>(B, idx)).run();

This line does nothing, since you are not storing the output into an operator:

A_prime = matx::remap<0>(B, idx);

You need to construct the operator and pass it into the next operator to chain them together. That is, your line should look something like this:

auto A_primeOp = (A_prime = matx::remap<0>(B, idx));

Now A_primeOp can be run directly like this:

A_primeOp.run();

Or passed into another operation like this:

(n = norm(A_primeOp)).run();

The advantage of chaining operators together into other operators is that you get fewer kernel calls, which generally leads to significantly less launch latency and memory traffic. Each call to .run() corresponds to a single kernel call.
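To make that concrete, here is a minimal sketch of the difference (not from this issue; a, b, c, and tmp are assumed to be pre-allocated float tensors of the same shape):

// Unfused: two kernel launches, and the intermediate result
// round-trips through global memory via tmp.
(tmp = a + b).run();
(c = tmp * 2.0f).run();

// Fused: the expression is composed into a single operator, so only
// one kernel is launched when .run() is called.
(c = (a + b) * 2.0f).run();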

@luitjens (Collaborator) commented Aug 2, 2024

Also, I didn't inspect your code closely enough to see whether you have this issue, but another possible explanation would be if you were writing to indices that other threads read in the same kernel. Something like this is not allowed:

(A = transpose(A)).run();

The reason is that each element of the tensor gets processed by a different thread, which can lead to a race condition between the read of A(i,j) and the write of A(i,j).
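A minimal sketch of the usual workaround, assuming A is a square float tensor and A_tmp is a separately allocated tensor of the same shape (A_tmp is a name introduced here purely for illustration):

// Safe: the kernel reads A and writes A_tmp, so no thread overwrites
// an element that another thread still needs to read.
(A_tmp = transpose(A)).run();
// Copy back only if the result really needs to live in A itself.
(A = A_tmp).run();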

@HugoPhibbs (Contributor, Author) commented Aug 2, 2024

OK, to make sure I understand: A_prime = matx::remap<0>(B, idx) returns an operator that you then need to store? Hence why we need auto A_primeOp = (A_prime = matx::remap<0>(B, idx)); instead?

@luitjens (Collaborator) commented Aug 2, 2024 via email

@HugoPhibbs (Contributor, Author) commented Aug 2, 2024

OK, thanks. For the below:

auto A_primeOp = (A_prime = matx::remap<0>(B, idx));

is A_prime more or less a dummy variable then - assuming that the scope of A_prime ends after using norm?

I.e., what I'm trying to say is: if you wish to do deferred execution of the operators, do you more or less have to use a dummy variable - assuming that's what A_prime is?

@luitjens (Collaborator) commented Aug 2, 2024 via email

@cliffburdick (Collaborator)

> OK, thanks. For the below:
>
> auto A_primeOp = (A_prime = matx::remap<0>(B, idx));
>
> is A_prime more or less a dummy variable then?
>
> I.e., what I'm trying to say is: if you wish to do deferred execution of the operators, do you more or less have to use a dummy variable - assuming that's what A_prime is?

In that example, A_prime is a tensor that you're storing to, and A_primeOp is a variable describing the operation of storing into that tensor. When you call run(), that triggers the statement to do what your operator describes, which is storing the output into that tensor.

In reality, you likely wouldn't store that statement into a variable and would instead just do (A_prime = matx::remap<0>(B, idx)).run()
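Both spellings launch the same single kernel; naming the statement is only useful when you want to compose it further before running it. A short sketch, reusing the names from above:

// Stored form: the assignment statement is itself a lazy operator.
auto A_primeOp = (A_prime = matx::remap<0>(B, idx));
A_primeOp.run();

// Inline form: identical work, just without naming the statement.
(A_prime = matx::remap<0>(B, idx)).run();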

@HugoPhibbs (Contributor, Author)

@cliffburdick OK, thanks. But if I wanted optimality in terms of not launching the kernel multiple times, etc., I would be best off storing the statement, right?

@cliffburdick (Collaborator) commented Aug 2, 2024

> @cliffburdick OK, thanks. But if I wanted optimality in terms of not launching the kernel multiple times, etc., I would be best off storing the statement, right?

Yes. See this example here where several lines are composed, and only the final line runs the kernel.
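For reference, a minimal sketch of that pattern (the names x, idx, and out are illustrative, not taken from the linked example; assume they are pre-allocated tensors with compatible shapes):

// Build up a chain of lazy operators; nothing executes yet.
auto gathered = matx::remap<0>(x, idx);  // gather rows of x
auto shifted  = gathered - 1.0f;         // element-wise subtract
auto scaled   = shifted * 2.0f;          // element-wise scale

// Only this line launches a kernel; the whole chain is fused into it.
(out = scaled).run();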

@HugoPhibbs (Contributor, Author) commented Aug 2, 2024

OK, thanks. I followed the auto op = ... pattern for the entire loop body.

But now I've got a small error that I feel shouldn't be a problem...

Would you know why the below doesn't work?

auto YBatch_t_op = (XBatchReshaped_t_op - matx::repmat(XSubsetReshaped_t_op, {1, 2*k*m, 1})); 

auto YBatch_t_norm_op = matx::vector_norm(YBatch_t_op, {2}, matx::NormOrder::L2); // Gives 'no instance of constructor' error

(matx::slice(distances_t, {i, 0}, {maxBatchIdx + 1, matx::matxEnd}) = YBatch_t_norm_op).run();

@cliffburdick (Collaborator)

Can you paste the exact error?

@HugoPhibbs (Contributor, Author)

Yep, it's quite long though.

/usr/local/include/matx/operators/norm.h(163): error: no instance of constructor "matx::detail::NormOp<OpA, NormType>::NormOp [with OpA=matx::detail::matxBinaryOp<matx::detail::ReshapeOp<3, matx::detail::RemapOp<0, matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::matx_allocator<float>>>, matx::tensor_desc_t<cuda::std::__4::array<matx::index_t, 2UL>, cuda::std::__4::array<matx::index_t, 2UL>, 2>>, matx::detail::FlattenOp<matx::detail::RemapOp<0, matx::tensor_t<int, 2, matx::basic_storage<matx::raw_pointer_buffer<int, matx::matx_allocator<int>>>, matx::tensor_desc_t<cuda::std::__4::array<matx::index_t, 2UL>, cuda::std::__4::array<matx::index_t, 2UL>, 2>>, matx::detail::SliceOp<1, matx::detail::ReshapeOp<1, matx::tensor_t<int, 2, matx::basic_storage<matx::raw_pointer_buffer<int, matx::matx_allocator<int>>>, matx::tensor_desc_t<cuda::std::__4::array<matx::index_t, 2UL>, cuda::std::__4::array<matx::index_t, 2UL>, 2>>, cuda::std::__4::array<long long, 1UL>>>>>>, cuda::std::__4::array<long long, 3UL>>, matx::detail::RepMatOp<matx::detail::ReshapeOp<3, matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::matx_allocator<float>>>, matx::tensor_desc_t<cuda::std::__4::array<matx::index_t, 2UL>, cuda::std::__4::array<matx::index_t, 2UL>, 2>>, cuda::std::__4::array<long long, 3UL>>, 3>, matx::detail::BinOp<float, float, matx::detail::SubF<float, float>>>, NormType=matx::detail::NormTypeVector]" matches the argument list
            argument types are: (matx::detail::PermuteOp<matx::detail::matxBinaryOp<matx::detail::ReshapeOp<3, matx::detail::RemapOp<0, matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::matx_allocator<float>>>, matx::tensor_desc_t<cuda::std::__4::array<matx::index_t, 2UL>, cuda::std::__4::array<matx::index_t, 2UL>, 2>>, matx::detail::FlattenOp<matx::detail::RemapOp<0, matx::tensor_t<int, 2, matx::basic_storage<matx::raw_pointer_buffer<int, matx::matx_allocator<int>>>, matx::tensor_desc_t<cuda::std::__4::array<matx::index_t, 2UL>, cuda::std::__4::array<matx::index_t, 2UL>, 2>>, matx::detail::SliceOp<1, matx::detail::ReshapeOp<1, matx::tensor_t<int, 2, matx::basic_storage<matx::raw_pointer_buffer<int, matx::matx_allocator<int>>>, matx::tensor_desc_t<cuda::std::__4::array<matx::index_t, 2UL>, cuda::std::__4::array<matx::index_t, 2UL>, 2>>, cuda::std::__4::array<long long, 1UL>>>>>>, cuda::std::__4::array<long long, 3UL>>, matx::detail::RepMatOp<matx::detail::ReshapeOp<3, matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::matx_allocator<float>>>, matx::tensor_desc_t<cuda::std::__4::array<matx::index_t, 2UL>, cuda::std::__4::array<matx::index_t, 2UL>, 2>>, cuda::std::__4::array<long long, 3UL>>, 3>, matx::detail::BinOp<float, float, matx::detail::SubF<float, float>>>>, matx::NormOrder)
          detected during instantiation of "auto matx::vector_norm(const Op &, const int (&)[D], matx::NormOrder) [with Op=matx::detail::matxBinaryOp<matx::detail::ReshapeOp<3, matx::detail::RemapOp<0, matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::matx_allocator<float>>>, matx::tensor_desc_t<cuda::std::__4::array<matx::index_t, 2UL>, cuda::std::__4::array<matx::index_t, 2UL>, 2>>, matx::detail::FlattenOp<matx::detail::RemapOp<0, matx::tensor_t<int, 2, matx::basic_storage<matx::raw_pointer_buffer<int, matx::matx_allocator<int>>>, matx::tensor_desc_t<cuda::std::__4::array<matx::index_t, 2UL>, cuda::std::__4::array<matx::index_t, 2UL>, 2>>, matx::detail::SliceOp<1, matx::detail::ReshapeOp<1, matx::tensor_t<int, 2, matx::basic_storage<matx::raw_pointer_buffer<int, matx::matx_allocator<int>>>, matx::tensor_desc_t<cuda::std::__4::array<matx::index_t, 2UL>, cuda::std::__4::array<matx::index_t, 2UL>, 2>>, cuda::std::__4::array<long long, 1UL>>>>>>, cuda::std::__4::array<long long, 3UL>>, matx::detail::RepMatOp<matx::detail::ReshapeOp<3, matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::matx_allocator<float>>>, matx::tensor_desc_t<cuda::std::__4::array<matx::index_t, 2UL>, cuda::std::__4::array<matx::index_t, 2UL>, 2>>, cuda::std::__4::array<long long, 3UL>>, 3>, matx::detail::BinOp<float, float, matx::detail::SubF<float, float>>>, D=1]" 

@cliffburdick (Collaborator)

Can you please paste all the code like you did above? I want to make sure this error makes sense for the type.

@HugoPhibbs (Contributor, Author) commented Aug 2, 2024

Yep, sure thing.

I removed virtually all the initializations from before the loop - they're no longer necessary.

matx::tensor_t<float, 2>  GsDBSCAN::findDistancesMatX(matx::tensor_t<float, 2> &X_t, matx::tensor_t<int, 2> &A_t, matx::tensor_t<int, 2> &B_t, float alpha) {
    const int k = A_t.Shape()[1] / 2;
    const int m = B_t.Shape()[1];

    const int n = X_t.Shape()[0];
    const int d = X_t.Shape()[1];

    const int batchSize = GsDBSCAN::findDistanceBatchSize(alpha, n, d, k, m);

    auto AFlat_t = matx::flatten(A_t);

    auto distances_t = matx::make_tensor<float>({n, 2*k*m});

    for (int i = 0; i < n; i += batchSize) {
        int maxBatchIdx = i + batchSize - 1; // Index within X along the ROWS

        auto XSubset_t_op = matx::slice(X_t, {i, 0}, {maxBatchIdx + 1, matx::matxEnd});

        auto ABatchFlat_t_op = matx::slice(AFlat_t, {i * 2 * k}, {(maxBatchIdx + 1) * 2 * k});

        auto BBatch_t_op = matx::remap<0>(B_t, ABatchFlat_t_op);

        auto XBatch_t_op = matx::remap<0>(X_t, matx::flatten(BBatch_t_op));

        auto XBatchReshaped_t_op = matx::reshape(XBatch_t_op, {batchSize, 2*k*m, d});

        auto XSubsetReshaped_t_op = matx::reshape(XSubset_t_op, {batchSize, 1, d});

        auto YBatch_t_op = (XBatchReshaped_t_op - matx::repmat(XSubsetReshaped_t_op, {1, 2*k*m, 1})); // Repmat is a workaround for subtracting tensors with naively incompatible shapes

        auto YBatch_t_norm_op = matx::vector_norm(YBatch_t_op, {2}, matx::NormOrder::L2);

        (matx::slice(distances_t, {i, 0}, {maxBatchIdx + 1, matx::matxEnd}) = YBatch_t_norm_op).run();
    }

    return distances_t;
}

@cliffburdick (Collaborator)

Thanks. I'm not at a computer right now but I'll take a look when I have a chance unless someone else beats me to it.

@HugoPhibbs (Contributor, Author) commented Aug 2, 2024 via email

@luitjens (Collaborator) commented Aug 2, 2024

It looks like we have a template mismatch here. I think I understand what is going on. First, look here:

https://github.com/NVIDIA/MatX/blob/main/include/matx/operators/norm.h#L162-L163

template <typename Op, int D>
auto vector_norm(const Op &op, const int (&dims)[D],
                 NormOrder order = NormOrder::NONE) {
  auto perm = detail::getPermuteDims<Op::Rank()>(dims);
  auto permop = permute(op, perm);
  return detail::NormOp<Op, detail::NormTypeVector>(permop, order);
}

What I believe is happening is this: Op is one type, but permop is another type. This leads to the type passed into detail::NormOp<Op...> being different from the type of permop, which causes this error.

One fix would be to replace this:

return detail::NormOp<Op, detail::NormTypeVector>(permop, order);

with this:

return detail::NormOp(permop, order);

If that doesn't work, then we can probably replace <Op...> with <decltype(permop)...>.

@HugoPhibbs (Contributor, Author)

@luitjens OK, thanks. The first fix didn't seem to work, but replacing <Op...> with <decltype(permop)...> did.

@luitjens (Collaborator) commented Aug 3, 2024 via email

@HugoPhibbs reopened this Aug 4, 2024
@luitjens (Collaborator) commented Aug 7, 2024

Fixed in #696.
