CUDA: Add chunked processing for winding number algorithms #4195

oitel · 2025-02-25T11:22:07Z

No description provided.

Grantim · 2025-02-25T11:45:19Z

source/MRViewer/MRCudaAccessor.cpp

+size_t CudaAccessor::fromGridMemory( const Mesh& mesh, const Vector3i& )
 {
-    return fastWindingNumberMeshMemory( mesh ) + size_t( dims.x ) * dims.y * dims.z * sizeof( float );
+    return fastWindingNumberMeshMemory( mesh );
 }

-size_t CudaAccessor::fromVectorMemory( const Mesh& mesh, size_t inputSize )
+size_t CudaAccessor::fromVectorMemory( const Mesh& mesh, size_t )
 {
-    return fastWindingNumberMeshMemory( mesh ) + inputSize * ( sizeof( float ) + sizeof( Vector3f ) );
+    return fastWindingNumberMeshMemory( mesh );
 }

 size_t CudaAccessor::selfIntersectionsMemory( const Mesh& mesh )
 {
-    return fastWindingNumberMeshMemory( mesh ) + mesh.topology.faceSize() * sizeof( float );
+    return fastWindingNumberMeshMemory( mesh );
 }


Should we add some minimum fixed amount here?

Grantim · 2025-02-25T11:48:26Z

source/MRMesh/MRChunkIterator.h

+    /// chunk index
+    size_t index;


Do we need to store the index in the chunk?

It is useful for logging and progress callbacks.

Grantim · 2025-02-25T11:49:39Z

source/MRMesh/MRChunkIterator.cpp

+
+    const auto size = totalSize - overlap; // otherwise the last chunk's size may be smaller or equal to the overlap i.e. fully in the previous chunk
+    const auto step = chunkSize - overlap;
+    return ( size / step ) + !!( size % step ); // integer variant of `std::ceil( a / b )`


return (size + step - 1) / step;
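For reference, a minimal sketch of the chunk count with the suggested rounding-up division. The helper name `chunkCount` and the handling of inputs not larger than the overlap are assumptions made for illustration, not the actual MRChunkIterator code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not the MRChunkIterator implementation).
// Returns the number of chunks needed to cover `totalSize` items with chunks of
// `chunkSize` items, where consecutive chunks share `overlap` items.
inline std::size_t chunkCount( std::size_t totalSize, std::size_t chunkSize, std::size_t overlap )
{
    assert( chunkSize > overlap ); // the step between chunk starts must be positive
    if ( totalSize <= overlap )
        return totalSize > 0 ? 1 : 0; // assumption: a tiny input fits into a single chunk
    // subtract the overlap so the last chunk is never fully contained in the previous one
    const std::size_t size = totalSize - overlap;
    const std::size_t step = chunkSize - overlap;
    return ( size + step - 1 ) / step; // integer ceil( size / step )
}
```

Both forms give the same result for a positive step. The `( size / step ) + !!( size % step )` variant cannot overflow, while `size + step - 1` can wrap around for sizes close to the maximum of `size_t`, which is unlikely to matter for realistic buffer sizes.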

Grantim · 2025-02-25T11:51:26Z

source/MRCuda/MRCudaFastWindingNumber.cuh

+    const Dipole* __restrict__ dipoles;
+    const Node3* __restrict__ nodes;
+    const float3* __restrict__ meshPoints;
+    const FaceToThreeVerts* __restrict__ faces;


Should we initialize these with nullptrs?
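A sketch of what that could look like with default member initializers. The struct name `FastWindingNumberDataSketch` and the `isComplete` helper are hypothetical, the element types are forward-declared stand-ins (including `Float3` in place of CUDA's `float3`), and the `__restrict__` qualifiers are omitted so the snippet compiles as plain C++.

```cpp
#include <cstddef>

// Stand-in declarations so the sketch compiles without the CUDA headers.
struct Dipole;
struct Node3;
struct Float3;
struct FaceToThreeVerts;

// Hypothetical struct name; only the field names come from the diff above.
struct FastWindingNumberDataSketch
{
    // default member initializers make a default-constructed instance safe to inspect
    const Dipole* dipoles = nullptr;
    const Node3* nodes = nullptr;
    const Float3* meshPoints = nullptr;
    const FaceToThreeVerts* faces = nullptr;
};

// Hypothetical helper: true only when every pointer has been set.
inline bool isComplete( const FastWindingNumberDataSketch& d )
{
    return d.dipoles && d.nodes && d.meshPoints && d.faces;
}
```

With the initializers in place, a forgotten assignment shows up as a null pointer that can be checked on the host, rather than as an indeterminate value.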

Grantim · 2025-02-25T11:53:47Z

source/MRCuda/MRCudaFastWindingNumber.cu


    const auto q = ( meshPoints[face.verts[0]] + meshPoints[face.verts[1]] + meshPoints[face.verts[2]] ) / 3.0f;
-    processPoint( q, resVec[index], dipoles, nodes, meshPoints, faces, beta, index );
+    processPoint( q, resVec[index], dipoles, nodes, meshPoints, faces, beta );


Should we pass faceIndex instead of index here? (Please compare to the CPU version.)

Grantim · 2025-02-25T11:59:19Z

source/MRCuda/MRCudaFastWindingNumber.cpp

-        res.resize( size );
-        CUDA_LOGE_RETURN_UNEXPECTED( data_->cudaPoints.fromVector( points ) );
+        // TODO: allow user to set the upper limit
+        const auto maxBufferBytes = getCudaAvailableMemory();


Let's use ~70-80% of the max available memory, just in case.
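One way to express that headroom, as a minimal sketch. The helper name `cudaBufferBudget` and the exact 75% factor are assumptions; the input would be whatever `getCudaAvailableMemory()` reports.

```cpp
#include <cstddef>

// Hypothetical helper: the share of the reported free memory to spend on chunk buffers.
// 75% is an assumed value inside the suggested 70-80% range.
inline std::size_t cudaBufferBudget( std::size_t availableBytes )
{
    // divide before multiplying so the intermediate value cannot overflow
    return availableBytes / 4 * 3;
}
```

Integer arithmetic keeps the result exact for large byte counts, at the cost of rounding down by at most three bytes.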

Grantim · 2025-02-25T12:00:57Z

source/MRCuda/MRCudaFastWindingNumber.cpp

+        DynamicArray<float3> cudaPoints;
+        CUDA_LOGE_RETURN_UNEXPECTED( cudaPoints.resize( bufferSize ) );


In this case we use 3 floats per element, while bufferSize was computed for one float.
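To illustrate the sizing issue, a sketch that derives the element count from a byte budget, assuming each point needs one `float3` input and one `float` output on the device. The helper name `maxPointsPerChunk` is hypothetical and `Float3` is a stand-in for CUDA's `float3`.

```cpp
#include <cstddef>

// Stand-in for CUDA's float3 so the sketch compiles without the CUDA headers.
struct Float3 { float x, y, z; };

// Hypothetical helper: how many points fit into `budgetBytes` when every point
// occupies one Float3 (input position) plus one float (output winding number).
inline std::size_t maxPointsPerChunk( std::size_t budgetBytes )
{
    constexpr std::size_t bytesPerPoint = sizeof( Float3 ) + sizeof( float );
    return budgetBytes / bytesPerPoint;
}
```

Sizing the buffer by `sizeof( float )` alone would allow about four times more points than actually fit under this assumption.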

oitel added 18 commits February 24, 2025 16:17

WIP

43e6d03

WIP

b5e528d

WIP

3386f0d

WIP

b626480

Fix

ca0a7f9

Fix

a4d6bb2

Use all available memory

e426e90

Refactor data struct

982ad60

Migrate other winding number functions

1459541

Restore original methods

fce042c

Add restrict qualifiers

420f5bd

Restore formatting

351d27e

Merge branch 'master' into feature/cuda_buffer_slice

4075891

Update comments

63eb96e

Make ChunkIterator compatible with STL algorithms

1d9d0ef

Add unit tests for ChunkIterator

833dbb4

Fixup

1ce7a6f

Change memory limits

fe6bdfd

oitel requested a review from Grantim February 25, 2025 11:23

Add missing VS project items

3d4283c

Grantim reviewed Feb 25, 2025


oitel added 9 commits February 25, 2025 13:34

Fix algo

78a2c16

Fix algo

5eb51e3

Limit CUDA buffer size

afa4907

Fix float3 buffer size

e25b37c

Initialize struct fields with zeroes

f7ccfc3

Fix MSVC build

8dc1ab9

Set minimum buffer size limits

99287c4

Get rid of index field

7886b17

Fix test

eda0b75

oitel added 4 commits February 25, 2025 14:47

Remove excess declaration

b30a189

Add comment

905e11d

Fix MSVC build

f92af35

Merge branch 'master' into feature/cuda_buffer_slice

177fb4b

Grantim approved these changes Feb 25, 2025


Fix MSVC build

63aee71

oitel merged commit 937c0df into master Feb 25, 2025
32 checks passed

oitel deleted the feature/cuda_buffer_slice branch February 25, 2025 16:10
