CUDA: Add chunked processing for winding number algorithms #4195
Conversation
source/MRViewer/MRCudaAccessor.cpp
Outdated
| size_t CudaAccessor::fromGridMemory( const Mesh& mesh, const Vector3i& ) | ||
| { | ||
| return fastWindingNumberMeshMemory( mesh ) + size_t( dims.x ) * dims.y * dims.z * sizeof( float ); | ||
| return fastWindingNumberMeshMemory( mesh ); | ||
| } | ||
|
|
||
| size_t CudaAccessor::fromVectorMemory( const Mesh& mesh, size_t inputSize ) | ||
| size_t CudaAccessor::fromVectorMemory( const Mesh& mesh, size_t ) | ||
| { | ||
| return fastWindingNumberMeshMemory( mesh ) + inputSize * ( sizeof( float ) + sizeof( Vector3f ) ); | ||
| return fastWindingNumberMeshMemory( mesh ); | ||
| } | ||
|
|
||
| size_t CudaAccessor::selfIntersectionsMemory( const Mesh& mesh ) | ||
| { | ||
| return fastWindingNumberMeshMemory( mesh ) + mesh.topology.faceSize() * sizeof( float ); | ||
| return fastWindingNumberMeshMemory( mesh ); | ||
| } |
Should we add a minimum fixed amount here?
source/MRMesh/MRChunkIterator.h
Outdated
| /// chunk index | ||
| size_t index; |
Do we need to store the index in the chunk?
It is useful for logging and progress callbacks.
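A minimal sketch of how a stored chunk index can drive a progress callback. Only the `index` field comes from the reviewed diff; `Chunk`'s other fields, `splitIntoChunks`, and `processChunks` are hypothetical names introduced for illustration, and the sketch ignores overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical chunk descriptor; only `index` is taken from the reviewed diff.
struct Chunk
{
    std::size_t offset; // first element covered by the chunk
    std::size_t size;   // number of elements in the chunk
    std::size_t index;  // chunk index
};

// Splits [0, totalSize) into non-overlapping chunks of at most chunkSize elements.
inline std::vector<Chunk> splitIntoChunks( std::size_t totalSize, std::size_t chunkSize )
{
    assert( chunkSize > 0 );
    std::vector<Chunk> chunks;
    for ( std::size_t offset = 0; offset < totalSize; offset += chunkSize )
        chunks.push_back( { offset, std::min( chunkSize, totalSize - offset ), chunks.size() } );
    return chunks;
}

// Visits the chunks in order and reports progress in (0, 1] after each one.
// The callback returns false to cancel; the function then returns false as well.
inline bool processChunks( const std::vector<Chunk>& chunks, const std::function<bool( float )>& progress )
{
    for ( const auto& chunk : chunks )
    {
        // ... process elements [chunk.offset, chunk.offset + chunk.size) here ...
        if ( progress && !progress( float( chunk.index + 1 ) / float( chunks.size() ) ) )
            return false;
    }
    return true;
}
```

With the index stored in the chunk, the loop body does not need a separate counter, and a log line can name the chunk it refers to.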
source/MRMesh/MRChunkIterator.cpp
Outdated
|
|
||
| const auto size = totalSize - overlap; // otherwise the last chunk's size may be smaller or equal to the overlap i.e. fully in the previous chunk | ||
| const auto step = chunkSize - overlap; | ||
| return ( size / step ) + !!( size % step ); // integer variant of `std::ceil( a / b )` |
return (size + step - 1) / step;
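For comparison, here is a sketch of both ceiling-division forms side by side. The function names and the handling of inputs no longer than the overlap are assumptions of this sketch, not taken from the PR. One difference worth noting: `size + step - 1` can wrap around for values close to the maximum of `size_t`, while the remainder form cannot.

```cpp
#include <cassert>
#include <cstddef>

// Chunk count using the form suggested in the review: ( size + step - 1 ) / step.
// Assumes chunkSize > overlap; short inputs are handled explicitly (sketch's choice).
inline std::size_t chunkCount( std::size_t totalSize, std::size_t chunkSize, std::size_t overlap )
{
    assert( chunkSize > overlap ); // otherwise the step would be zero or underflow
    if ( totalSize == 0 )
        return 0;
    if ( totalSize <= overlap )
        return 1; // sketch's choice: a short input still needs one chunk
    const std::size_t size = totalSize - overlap;
    const std::size_t step = chunkSize - overlap;
    return ( size + step - 1 ) / step; // integer ceil( size / step )
}

// Same computation using the remainder form from the diff, for comparison.
inline std::size_t chunkCountRemainder( std::size_t totalSize, std::size_t chunkSize, std::size_t overlap )
{
    assert( chunkSize > overlap );
    if ( totalSize == 0 )
        return 0;
    if ( totalSize <= overlap )
        return 1;
    const std::size_t size = totalSize - overlap;
    const std::size_t step = chunkSize - overlap;
    return ( size / step ) + ( size % step != 0 ? 1 : 0 );
}
```

For example, 10 elements with a chunk size of 4 and an overlap of 1 are covered by the three chunks [0, 4), [3, 7), and [6, 10).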
| const Dipole* __restrict__ dipoles; | ||
| const Node3* __restrict__ nodes; | ||
| const float3* __restrict__ meshPoints; | ||
| const FaceToThreeVerts* __restrict__ faces; |
Should we initialize these with nullptrs?
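One way this could look is with default member initializers, sketched below. The struct name `MeshDataPointers`, the `valid()` helper, and the `Float3` stand-in for CUDA's `float3` are hypothetical; the `__restrict__` qualifiers from the diff are omitted to keep the sketch compiler-neutral.

```cpp
#include <cstddef>

// Forward declarations are enough here, since the struct only stores pointers.
struct Dipole;
struct Node3;
struct Float3; // stand-in for CUDA's float3
struct FaceToThreeVerts;

// Hypothetical holder for the device pointers listed in the diff.
struct MeshDataPointers
{
    const Dipole* dipoles = nullptr;
    const Node3* nodes = nullptr;
    const Float3* meshPoints = nullptr;
    const FaceToThreeVerts* faces = nullptr;

    // True only when every pointer has been set.
    bool valid() const { return dipoles && nodes && meshPoints && faces; }
};
```

A default-constructed instance then holds well-defined null pointers, so an accidental use before setup can be detected with a simple check.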
|
|
||
| const auto q = ( meshPoints[face.verts[0]] + meshPoints[face.verts[1]] + meshPoints[face.verts[2]] ) / 3.0f; | ||
| processPoint( q, resVec[index], dipoles, nodes, meshPoints, faces, beta, index ); | ||
| processPoint( q, resVec[index], dipoles, nodes, meshPoints, faces, beta ); |
Should we pass faceIndex instead of index here? (Please compare to the CPU version.)
| res.resize( size ); | ||
| CUDA_LOGE_RETURN_UNEXPECTED( data_->cudaPoints.fromVector( points ) ); | ||
| // TODO: allow user to set the upper limit | ||
| const auto maxBufferBytes = getCudaAvailableMemory(); |
Let's use ~70-80% of the max available memory, just in case.
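A small sketch of such a safety margin. The helper name `memoryBudget` and the default of 75% are assumptions chosen from the suggested range; the available byte count is passed in as a parameter, so the sketch does not depend on how it is queried.

```cpp
#include <cassert>
#include <cstddef>

// Share of the available device memory the sketch is willing to use, in percent.
// 75 is an assumed value inside the 70-80% range suggested in the review.
constexpr std::size_t kDefaultMemoryPercent = 75;

// Returns the number of bytes that may be used out of availableBytes.
// Dividing before multiplying avoids overflow for very large inputs,
// at the cost of rounding down by up to ( percent - 1 ) extra bytes.
inline std::size_t memoryBudget( std::size_t availableBytes, std::size_t percent = kDefaultMemoryPercent )
{
    assert( percent <= 100 );
    return availableBytes / 100 * percent;
}
```

The result could then serve as the upper limit for the chunk buffers instead of the full amount of available memory.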
| DynamicArray<float3> cudaPoints; | ||
| CUDA_LOGE_RETURN_UNEXPECTED( cudaPoints.resize( bufferSize ) ); |
In this case we use 3 floats per element, while bufferSize was computed for one float.
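The accounting could be sketched as follows, assuming each point needs one input `float3` and one output `float` on the device, as in the removed `fromVectorMemory` estimate. `Float3` is a stand-in for CUDA's `float3`, and `maxPointsPerChunk` is a hypothetical helper name.

```cpp
#include <cstddef>

// Stand-in for CUDA's float3: three packed floats.
struct Float3
{
    float x, y, z;
};

// Device bytes needed per processed point: one input position plus one output value.
constexpr std::size_t kBytesPerPoint = sizeof( Float3 ) + sizeof( float );

// Largest number of points whose input and output buffers both fit into bufferBytes.
inline std::size_t maxPointsPerChunk( std::size_t bufferBytes )
{
    return bufferBytes / kBytesPerPoint;
}
```

Sizing the chunk from the combined per-point cost keeps the point buffer and the result buffer within the same budget.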
No description provided.