[issue-858] GPU implementation of 911 vertices by NicolasJPosey · Pull Request #880 · UWB-Biocomputing/Graphitti

NicolasJPosey · 2025-08-27T18:35:43Z

Closes #858

Description

This PR introduces a GPU implementation of the NG911 model in connection with Nicolas Posey's master's capstone.

A small 911 test file was added to the GPU list in the RunTest.sh file to allow for automated regression testing of this new implementation. Documentation has also been added to help with the process of implementing new mirrored GPU implementations from existing, domain-specific CPU implementations (see CpuGpuArchitecture.md). Lastly, documentation of the existing small 911 test file and a new medium 911 test file has been added to start documenting the important configuration information of existing regression test files (see RegressionTestDocumentation.md).

Checklist (Mandatory for new features)

Added Documentation
Added Unit Tests

Testing (Mandatory for all changes)

GPU Test: test-medium-connected.xml Passed
GPU Test: test-large-long.xml Passed

Adds support for functions that take two uint64_t arguments so that loadEpochInputs can be registered and called from the OperationManager class.
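As a rough illustration of what registering a two-argument operation might look like, here is a minimal sketch. The class, enum, and method names below (`OperationManagerSketch`, `OperationSketch`, `registerOperation`, `executeOperation`) are hypothetical stand-ins and are not taken from Graphitti's actual OperationManager; only the idea of a callable taking two `uint64_t` arguments comes from the commit message.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical operation identifiers; the real project defines its own set.
enum class OperationSketch { loadEpochInputs };

// Hypothetical registry, assuming callbacks are stored per operation.
class OperationManagerSketch {
public:
   using TwoArgFn = std::function<void(uint64_t, uint64_t)>;

   // Store a callback that takes two uint64_t arguments.
   void registerOperation(OperationSketch op, TwoArgFn fn)
   {
      twoArgOps_[op].push_back(std::move(fn));
   }

   // Invoke every callback registered for the operation, in registration order.
   void executeOperation(OperationSketch op, uint64_t first, uint64_t second) const
   {
      auto it = twoArgOps_.find(op);
      if (it == twoArgOps_.end()) {
         return;   // nothing registered for this operation
      }
      for (const auto &fn : it->second) {
         fn(first, second);
      }
   }

private:
   std::map<OperationSketch, std::vector<TwoArgFn>> twoArgOps_;
};
```

A caller would register a lambda or bound member function once and then trigger it each epoch with the two step values.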

…-911vertices-gpu-implementation

Added a data member for storing the total number of events that are read into the InputManager. This allows us to define vector capacities based on the number of events being simulated.

AllVertices now has a non-virtual loadEpochInputs method. This calls two virtual methods, one for loading the epoch inputs and the other for copying the inputs to the GPU. The default behavior for both is to do nothing.
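The structure described above resembles the non-virtual interface idiom. The following is a minimal sketch under that assumption; `VerticesSketch`, `Vertices911Sketch`, the two hook names, and the parameter names are hypothetical, and only `loadEpochInputs` with its two `uint64_t` arguments is taken from the commit messages.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a base vertices class.
class VerticesSketch {
public:
   virtual ~VerticesSketch() = default;

   // Non-virtual entry point: fixes the order of the two steps for all subclasses.
   void loadEpochInputs(uint64_t currentStep, uint64_t endStep)
   {
      loadInputsForEpoch(currentStep, endStep);
      copyInputsToDevice();
   }

protected:
   // Both hooks do nothing by default, so models without inputs need no overrides.
   virtual void loadInputsForEpoch(uint64_t, uint64_t) {}
   virtual void copyInputsToDevice() {}
};

// Hypothetical subclass that records which hooks ran and in what order.
class Vertices911Sketch : public VerticesSketch {
public:
   std::vector<std::string> log;

protected:
   void loadInputsForEpoch(uint64_t currentStep, uint64_t endStep) override
   {
      log.push_back("load " + std::to_string(currentStep) + "-" + std::to_string(endStep));
   }

   void copyInputsToDevice() override
   {
      log.push_back("copy");   // a GPU build would copy host inputs to the device here
   }
};
```

Keeping the public method non-virtual means a CPU-only subclass can override just the loading hook and inherit the no-op copy.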

…od instead of a connections method

This makes more sense as a vertex behavior, since the same behavior also needs to run on the GPU.

…-911vertices-gpu-implementation

We need a dynamically sized array, so we use a vector instead of an array. But we want the implementation to be easily mirrored on the GPU, so we interact with the vector as we would with an array.

…ent call

The push back call was not easy to mirror on the GPU. We already have examples of using the EventBuffer and insertEvent call on the GPU, so we changed to that implementation. This also makes it clear what size the buffer should be, which again helps the mirrored GPU implementation.
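To show why a fixed-capacity insert is easier to mirror than push back, here is a minimal sketch of a preallocated buffer. `FixedEventBufferSketch` is a hypothetical class written for illustration and is not Graphitti's EventBuffer; only the `insertEvent` name comes from the text above, and its signature here is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity buffer: storage is allocated once, up front.
class FixedEventBufferSketch {
public:
   explicit FixedEventBufferSketch(std::size_t maxEvents) : data_(maxEvents, 0), count_(0) {}

   // Writes into preallocated storage and never reallocates.
   // Returns false when the buffer is full instead of growing.
   bool insertEvent(uint64_t timeStep)
   {
      if (count_ >= data_.size()) {
         return false;
      }
      data_[count_++] = timeStep;
      return true;
   }

   std::size_t size() const { return count_; }
   std::size_t capacity() const { return data_.size(); }
   uint64_t operator[](std::size_t i) const { return data_[i]; }

private:
   std::vector<uint64_t> data_;   // sized at construction, indexed like an array
   std::size_t count_;            // number of events inserted so far
};
```

Because the storage size is known at construction, a device-side mirror could allocate a plain array of the same length and use the same index arithmetic.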

This allows us to remove the resize calls which we don't want to do on the GPU. Also added a DoubleEventBuffer to use in place of RecordableVector<double>.

…-911vertices-gpu-implementation

The correct pattern is to first copy the device pointers to the CPU and then the values to the CPU data members. A uint64_t and a uint64_t pointer happen to be the same number of bytes, so the earlier code worked by coincidence. However, if that pattern is repeated for a type like float, an illegal memory error is thrown.

The GPU noise array only works when numVertices is at least 100 and a multiple of 100. Otherwise, an invalid kernel configuration error is thrown, which masks other possible errors.

Having asserts in kernels can cause them to fail silently. Using print statements and returning is a better way to fail inside kernels.

Clean up of commented out code, unnecessary extra variables, and unused methods.

The update to vertex creation that gives each vertex a data member of the same size for the GPU meant that we would never get a dropped call, because the queues were too large. The logic where we interact with vertex queues in PSAPs and RESPs was changed to act as if the size equals the number of trunks, which was the original implementation size.

RecordableVectors of the dynamic type are cleared after each epoch, but their size is not reset. We need subscript operator access for droppedCalls, so the type must be constant, which does not clear the vector after each epoch.

The current implementation for generating noise on a device makes some assumptions that break for the 911 GPU model. To get noise support for 911, we implemented a way for vertices to specify how many noise elements they need. A method was then implemented in GPUModel that rounds the input up to the nearest multiple of 100.
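The rounding step described above can be sketched with integer arithmetic. The function name `roundUpToMultipleOf100` is hypothetical and is not the name of the GPUModel method; the sketch also assumes the input is small enough that adding 99 does not overflow.

```cpp
#include <cstdint>

// Hypothetical helper: rounds n up to the nearest multiple of 100.
// Values that are already multiples of 100 (including 0) are returned unchanged.
constexpr uint64_t roundUpToMultipleOf100(uint64_t n)
{
   return ((n + 99) / 100) * 100;
}

// Compile-time checks of the rounding behavior.
static_assert(roundUpToMultipleOf100(1) == 100, "small inputs round up to 100");
static_assert(roundUpToMultipleOf100(100) == 100, "exact multiples are unchanged");
static_assert(roundUpToMultipleOf100(101) == 200, "anything past a multiple rounds up");
```

With this rounding, a graph whose vertices request, say, 250 noise elements would be given an array sized for 300.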

…-911vertices-gpu-implementation

NicolasJPosey · 2026-02-15T22:26:10Z

Testing/RegressionTesting/GoodOutput/Gpu/test-tiny-out.xml

Change due to allowing for noise when using fewer than 100 vertices in GPUModel.cpp

Simulator/Core/GPUModel.cpp

stiber

Two docs items.

docs/Developer/CpuGpuArchitecture.md

…ation relative to GPU

* [issue-723] 911 edges GPU implementation (#867)
* Use vector of char for consistency with neuro
* Move setEdgeClassID method down into AllNeuroEdges since it's not used in 911 edges
* Initial implementation of 911 edges for GPU
* Make public explicitly
* Fix clang format issue
* Another try at a clang fix
* Manually release array memory from stack on device copy
  This is to prevent a segmentation fault due to a stack overflow from the array declarations when running large graphs.
* Expand vertex type map reported in debug log
* Resolve post merge changes
* Update GPU results with updates from PR 877
* [issue-858] GPU implementation of 911 vertices (#880)
* Add loadEpochInputs to OperationManager
  Adds support for functions that take two uint64_t arguments so that loadEpochInputs can be registered and called from the OperationManager class.
* Add vertices device struct to 911 class
* Add total number of events data member to InputManager
  Added a data member for storing the total number of events that are read into the InputManager. This allows us to define vector capacities based on the number of events being simulated.
* Initial CPU GPU architecture documentation
* Refactor of loadEpochInputs to support loading inputs to GPU
  AllVertices now has a non-virtual loadEpochInputs method. This calls two virtual methods, one for loading the epoch inputs and the other for copying the inputs to the GPU. The default behavior for both is to do nothing.
* Refactor getEdgeToClosestResponder method to be an All911Vertices method instead of a connections method
  This makes more sense as a vertex behavior, since the same behavior also needs to run on the GPU.
* Forgot CPU code
* Some GPU implementations but is incomplete
* Remove reserve call since RecordableVector doesn't implement it
* Refactor internal vector use in PSAP and RESP advance logic
  We need a dynamically sized array, so we use a vector instead of an array. But we want the implementation to be easily mirrored on the GPU, so we interact with the vector as we would with an array.
* Convert call metrics to EventBuffers and swap push back for insert event call
  The push back call was not easy to mirror on the GPU. We already have examples of using the EventBuffer and insertEvent call on the GPU, so we changed to that implementation. This also makes it clear what size the buffer should be, which again helps the mirrored GPU implementation.
* Replace numeric bool with actual bool for readability
* Change vector type from RecordableVector to EventBuffer
  This allows us to remove the resize calls which we don't want to do on the GPU. Also added a DoubleEventBuffer to use in place of RecordableVector<double>.
* Bug fix for copying spike histories from device
  The correct pattern is to first copy the device pointers to the CPU and then the values to the CPU data members. A uint64_t and a uint64_t pointer happen to be the same number of bytes, so the earlier code worked by coincidence. However, if that pattern is repeated for a type like float, an illegal memory error is thrown.
* Add a guard and debugging message for GPU random noise
  The GPU noise array only works when numVertices is at least 100 and a multiple of 100. Otherwise, an invalid kernel configuration error is thrown, which masks other possible errors.
* Updates to support copying to and from GPU
* Support for copying to and from GPU and make type float for now
* Add GPU 911 vertices to make list
* Implementation runs but results aren't quite right
* Fix case sensitive copying of call responder types
* Fix bug using wrong size for queue length and utilization histories
* Remove debugging printfs and replace asserts with printfs for errors
  Having asserts in kernels can cause them to fail silently. Using print statements and returning is a better way to fail inside kernels.
* General cleanup
  Clean up of commented out code, unnecessary extra variables, and unused methods.
* Free the array used to determine available servers and units in kernels.
* Readd support for getting dropped calls
  The update to vertex creation that gives each vertex a data member of the same size for the GPU meant that we would never get a dropped call, because the queues were too large. The logic where we interact with vertex queues in PSAPs and RESPs was changed to act as if the size equals the number of trunks, which was the original implementation size.
* Fix error if a dropped call is found after the first epoch
  RecordableVectors of the dynamic type are cleared after each epoch, but their size is not reset. We need subscript operator access for droppedCalls, so the type must be constant, which does not clear the vector after each epoch.
* Support for noise in 911 models
  The current implementation for generating noise on a device makes some assumptions that break for the 911 GPU model. To get noise support for 911, we implemented a way for vertices to specify how many noise elements they need. A method was then implemented in GPUModel that rounds the input up to the nearest multiple of 100.
* Add assert for random number thread count
* Add support for using noise to simulate attempted redials
  Because only caller regions simulate attempted redials, we add a vector to map the caller region vertex IDs to the noise array on the device. This allows us to use the existing noise algorithm with larger graphs since we can only generate noise for up to 10000 vertices.
* Fix isFull error message to show right buffer size
* Fix bug with waiting queue check
  If the number of trunks and servers is equal and the queue is full, capacity minus busy servers is negative. Since dstQueueSize is of type uint64_t, it can't be negative. The comparison then gives a false positive that the queue is not full. The fix is to cast the size to an int so that the right comparison is done.
* Debugging statements for memory analysis
* Add some larger 911 graphs
* Updates to history to support less memory usage on GPU
  The call metrics account for the vast majority of the physical memory used by the GPU. By resizing each to a smaller value, we can fit larger graphs on the GPU by using more epochs with smaller steps per epoch.
* Fix firing rate value
  Firing rate should actually be equal to 1 since we can have at most 1 call per second.
* Fix issue using wrong buffer size
  The buffer size used for a CircularBuffer is 1 more than the capacity passed into the constructor. When we construct the buffer, we pass in the number of trunks but were effectively using 1 less during the simulation.
* Fix getting front index when we want end index for queue length calculation
* Add back in random redial attempt
* More updates to reduce memory usage
  Metrics that used totalNumberOfEvents and totalTimeSteps were using more memory than needed. These were changed to maxEventsPerEpoch and stepsPerEpoch respectively. Also changed copyTo and copyFrom in All911Edges to use heap memory to prevent stack overflows with large graphs.
* Fix bug with vertex queue size
  The buffer inside the CircularBuffer implementation is 1 larger than the capacity set at construction. VertexQueues are CircularBuffers so we add 1 where we use the buffer size.
* Another CircularBuffer size bug fix
  Fixed allocation, copyTo, and copyFrom for VertexQueues. They are CircularBuffers which internally have a buffer that is 1 more than the capacity. The sizes used were updated to be 1 more than the stepsPerEpoch to match the construction capacity.
* Fix firing rate and change epoch parameters to reduce memory
  Memory is mostly dependent on epoch duration so we decrease that parameter and increase the number of epochs parameter by the same factor. This keeps the total time steps constant but reduces memory usage. We can only have 1 call per step so the max firing rate should be 1.
* Add an approximate state wide, month long configuration
* General cleanup and adding of comments
* GPU Optimizations
  Remove some branching and make changes to reduce amount of register usage.
* Dataset updates
* Timing adds, documentation, and updates
* Add regression testing documentation markdown
* Update after changing Abandoned and QueueLength history types
* Add small 911 test to regression script
* Add larger 911 test
  This corresponds to Dataset A in Posey capstone report.
* Remove testing datasets
* Remove temp timing changes
* Correct how 2D arrays are copied from device to host
* Add noise state logging for debugging
* Noise is now generated and used for graphs with less than 100 vertices
* Fix formatting
* Try another clang fix
* Try to fix clang in function node file
* clang fix attempt
* more clang
* clang
* Clean up and port some optimizations to CPU
* clang formatting
* Rename GPU documentation file
* Remove trivial example and rewrite to clarify design of CPU implementation relative to GPU
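The waiting queue fix in the squashed commit message above hinges on how C++ compares signed and unsigned integers. The sketch below illustrates that pitfall only; the function names, parameter names, and the exact form of the comparison are assumptions, since the real check is not shown here. The explicit cast in the buggy version spells out the conversion the compiler would otherwise apply implicitly.

```cpp
#include <cstdint>

// Buggy form (assumed shape): the signed difference is converted to uint64_t,
// so a value of -1 becomes the largest 64-bit value and the queue never looks full.
bool isQueueFullBuggy(uint64_t queueSize, int capacity, int busyServers)
{
   int available = capacity - busyServers;   // may be negative
   return queueSize >= static_cast<uint64_t>(available);
}

// Fixed form: cast the unsigned size to int so the comparison is done on signed values.
// This assumes the queue size always fits in an int.
bool isQueueFullFixed(uint64_t queueSize, int capacity, int busyServers)
{
   int available = capacity - busyServers;   // may be negative
   return static_cast<int>(queueSize) >= available;
}
```

With hypothetical values of a queue size of 3, a capacity of 4, and 5 busy servers, the buggy check reports the queue as not full while the fixed check reports it as full. Both agree whenever the difference is non-negative.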

NicolasJPosey added 30 commits June 23, 2025 11:23

Add loadEpochInputs to OperationManager

08dd027

Adds support for functions that take two uint64_t arguments so that loadEpochInputs can be registered and called from the OperationManager class.

Add vertices device struct to 911 class

a34356f

Merge remote-tracking branch 'origin/PoseyDevelopment' into issue-858…

b02e61a

…-911vertices-gpu-implementation

Add total number of events data member to InputManager

aeef7d4

Added a data member for storing the total number of events that are read into the InputManager. This allows us to define vector capacities based on the number of events being simulated.

Initial CPU GPU architecture documentation

f58ad17

Refactor of loadEpochInputs to support loading inputs to GPU

b456abc

AllVertices now has a non-virtual loadEpochInputs method. This calls two virtual methods, one for loading the epoch inputs and the other for copying the inputs to the GPU. The default behavior for both is to do nothing.

Refactor getEdgeToClosestResponder method to be a All911Vertices meth…

8b54e96

…od instead of a connections method

This makes more sense as a vertex behavior, since the same behavior also needs to run on the GPU.

Forgot CPU code

6340556

Some GPU implementations but is incomplete

6cbb595

Merge remote-tracking branch 'origin/PoseyDevelopment' into issue-858…

9209e53

…-911vertices-gpu-implementation

Remove reserve call since RecordableVector doesn't implement it

bab8265

Refactor internal vector use in PSAP and RESP advance logic

4eba0c1

We need a dynamically sized array, so we use a vector instead of an array. But we want the implementation to be easily mirrored on the GPU, so we interact with the vector as we would with an array.

Replace numeric bool with actual bool for readability

bba175c

Change vector type from RecordableVector to EventBuffer

46f4000

This allows us to remove the resize calls which we don't want to do on the GPU. Also added a DoubleEventBuffer to use in place of RecordableVector<double>.

Merge remote-tracking branch 'origin/PoseyDevelopment' into issue-858…

03f598d

…-911vertices-gpu-implementation

Add a guard and debugging message for GPU random noise

d8b74c2

The GPU noise array only works when numVertices is at least 100 and a multiple of 100. Otherwise, an invalid kernel configuration error is thrown, which masks other possible errors.

Updates to support copying to and from GPU

b055214

Support for copying to and from GPU and make type float for now

c3893e5

Add GPU 911 vertices to make list

3fcf848

Implementation runs but results aren't quite right

944c941

Fix case sensitive copying of call responder types

45d2f31

Fix bug using wrong size for queue length and utilization histories

ba7d73a

Remove debugging printfs and replace asserts with printfs for errors

815e5b3

Having asserts in kernels can cause them to fail silently. Using print statements and returning is a better way to fail inside kernels.

General cleanup

1b9541e

Clean up of commented out code, unnecessary extra variables, and unused methods.

Free the array used to determine available servers and units in kernels.

df7d22b

stiber assigned NicolasJPosey Jan 23, 2026

NicolasJPosey added 10 commits January 30, 2026 17:31

Merge remote-tracking branch 'origin/PoseyDevelopment' into issue-858…

032ad01

…-911vertices-gpu-implementation

Correct how 2D arrays are copied from device to host

f1d252a

Add noise state logging for debugging

e32985d

Noise is now generated and used for graphs with less than 100 vertices

8e6bee0

Fix formatting

6219a8f

Try another clang fix

8a11cb7

Try to fix clang in function node file

2e7ef29

clang fix attempt

ff729f0

more clang

1e3104e

clang

c90f9ae

NicolasJPosey commented Feb 15, 2026

Simulator/Core/GPUModel.cpp (resolved)

NicolasJPosey added 2 commits February 15, 2026 18:18

Clean up and port some optimizations to CPU

44cc254

clang formatting

4f1bb1c

NicolasJPosey added the documentation, enhancement, GPU, and NG911 labels Feb 16, 2026

NicolasJPosey requested a review from stiber February 16, 2026 18:47

stiber linked an issue Feb 17, 2026 that may be closed by this pull request

Create GPU Implementation of NG911 Vertices #858

Open

stiber requested changes Mar 6, 2026

docs/Developer/CpuGpuArchitecture.md (outdated, resolved)

docs/Developer/CpuGpuArchitecture.md (outdated, resolved)

NicolasJPosey added 2 commits March 11, 2026 12:58

Rename GPU documentation file

6db008b

Remove trivial example and rewrite to clarify design of CPU implement…

df305dc

…ation relative to GPU

NicolasJPosey marked this pull request as ready for review March 12, 2026 04:26

NicolasJPosey requested a review from stiber March 12, 2026 04:27

stiber approved these changes Mar 12, 2026

NicolasJPosey merged commit f7e7fe1 into PoseyDevelopment Mar 12, 2026
2 checks passed

NicolasJPosey deleted the issue-858-911vertices-gpu-implementation branch March 13, 2026 17:31

NicolasJPosey merged 71 commits into PoseyDevelopment from issue-858-911vertices-gpu-implementation
