Loading weights into TinyCUDA #6
Hi! I'm very excited by TinyCUDA and I'd like to test it out for an inference task on a pre-trained model. I have the network weights as a .npy file and I'd ideally like to load them into the fully fused MLP. From a quick scan of the codebase it looks like there isn't any way to load pre-computed model weights (please correct me if I'm wrong). Do you have any advice on how I could go about accomplishing this?
Hi there, tiny-cuda-nn does let you load pre-computed model weights. You can use the corresponding parameter-setting methods. These methods expect a CPU pointer to densely laid out network parameters: first layer, followed by the hidden layers, followed by the output layer, all in row-major memory order. (Depending on how your tensors were laid out when you were training your model, you might have to sneak in a transposition... which would mean column major after all. Sorry about this confusion -- I don't think there's a common standard that could be easily used here.) Lastly, note that tiny-cuda-nn does not support biases, so if you haven't already, you'll need to make sure that the pre-trained model purely uses weight matrices + activations.
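In case a concrete picture helps, below is a minimal sketch (plain C++, no tiny-cuda-nn calls) of what such a densely laid out buffer could look like. The `LayerWeights` struct and `pack_params` helper are made-up illustrations, and the call that would actually hand the packed pointer to tiny-cuda-nn is deliberately omitted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-layer weight matrix (no bias), stored row-major.
struct LayerWeights {
    std::size_t rows;           // first matrix dimension
    std::size_t cols;           // second matrix dimension
    std::vector<float> values;  // rows * cols entries, row-major
};

// Concatenate the matrices in network order: input layer, hidden layers,
// output layer. The resulting CPU pointer is packed.data(); the tiny-cuda-nn
// call that consumes it is not shown here.
std::vector<float> pack_params(const std::vector<LayerWeights>& layers) {
    std::vector<float> packed;
    for (const LayerWeights& layer : layers) {
        // If your training framework stored this matrix transposed /
        // column-major, transpose it before appending.
        packed.insert(packed.end(), layer.values.begin(), layer.values.end());
    }
    return packed;
}
```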
Thanks for the detailed response! I've now had a chance to give it a shot, but I can't seem to get the number of params to match. Here's the Keras summary of the TensorFlow implementation of my network:
And here is my config for TinyCudaNN:
When I call
So my Tensorflow implementation has 31,424 parameters, while the TinyCudaNN version has 32,768. When I run the
It looks like my problem stems from the input and output layers. I'm not sure where 48 is coming from (I would think it would be 39, as that's the size of my input layer). Similarly I would think the 16 in the final layer would be 4, as that's my specified number of output neurons. Is there anything going on behind the scenes that could be causing this discrepancy?
Hi there, many apologies, I totally forgot to explain the following detail: the hardware matrix multipliers (TensorCores) operate on 16x16 matrix chunks, so the input and output layers are padded to the nearest multiple of 16. For the input layer (after the encoding), the padded dimensions are fed a value of 1 (not zero) to help the first layer of the neural network implicitly learn a bias term. For the output layer, any padded dimensions are trimmed away from the results. So on the Keras side, the first- and last-layer weight matrices need to be padded to these dimensions before loading.
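For reference, the parameter counts quoted above are consistent with this padding: assuming a 64-neuron-wide MLP with 7 hidden-to-hidden matrices (which is what the totals imply), the unpadded count is 39·64 + 7·64·64 + 64·4 = 31,424, and with the input padded to 48 and the output padded to 16 it becomes 48·64 + 7·64·64 + 64·16 = 32,768. Below is a sketch of how the Keras-side matrices could be zero-padded to those shapes; `pad_matrix` is a hypothetical helper, not part of tiny-cuda-nn, and the row/column orientation depends on how your weights are stored.

```cpp
#include <cstddef>
#include <vector>

// Zero-pad a row-major (rows x cols) matrix to (padded_rows x padded_cols).
// Zeros are safe here: padded *input* dimensions are fed a constant 1, so zero
// weights leave the layer's output unchanged, and padded *output* dimensions
// are trimmed away from the results anyway.
std::vector<float> pad_matrix(const std::vector<float>& w,
                              std::size_t rows, std::size_t cols,
                              std::size_t padded_rows, std::size_t padded_cols) {
    std::vector<float> out(padded_rows * padded_cols, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            out[r * padded_cols + c] = w[r * cols + c];
        }
    }
    return out;
}

// Example with the shapes from this thread: pad the 39-input first layer to 48
// inputs and the 4-output last layer to 16 outputs (along whichever axis of
// your stored matrix corresponds to those dimensions).
```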
By the way, I noticed that in your config you encode the first 3 dimensions. If you want to encode just those (e.g. with a Frequency encoding) and pass the remaining dimensions through unchanged, a Composite encoding can express that:

```
{
    "otype": "Composite",
    "nested": [
        {
            "n_dims_to_encode": 3, // Spatial dims
            "otype": "Frequency",
            "n_frequencies": 6
        },
        {
            // Number of remaining linear dims is automatically derived
            "otype": "Identity"
        }
    ]
}
```

(You can reverse the two nested encodings if it's the first 3 dimensions you would like to pass through.)
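Purely as an illustration of that reversed variant (parsing a JSON string with nlohmann::json is just one way to build the config; the keys mirror the snippet above):

```cpp
#include <nlohmann/json.hpp>

// Reversed nesting: pass the first 3 dimensions through unchanged and
// frequency-encode the remaining, automatically derived dimensions.
const nlohmann::json encoding_config = nlohmann::json::parse(R"({
    "otype": "Composite",
    "nested": [
        { "n_dims_to_encode": 3, "otype": "Identity" },
        { "otype": "Frequency", "n_frequencies": 6 }
    ]
})");
```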
Another implementation detail: I'll have a think about how to expose all this information more elegantly in the future...
Thanks! One step closer... I've got my network fixed and the weights loaded, but now I'm hitting some issues with
I've checked and the
I'm not sure what's going on with all the
I've tried running w/
However, all of the above errors aren't fatal and the program continues execution. But when I try to copy the prediction matrix from the device to the CPU to read it, I get another illegal memory access error, this time fatal:
Do you have any advice on how I should go about debugging this? Could it be caused by running out of memory? (I'm running on a T4 GPU.) Thanks again for the help!
I think you identified the root cause correctly. The reason the program execution continues at first is that destructors aren't permitted to throw exceptions and thus just print the errors they're getting (to allow the rest of the program to clean up). I recommend double-checking the inputs to the encoding/network.
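For concreteness, here's a generic CUDA-runtime debugging sketch (not tiny-cuda-nn specific): surface asynchronous errors right after the work is launched, and make sure the buffer handed to the encoding/network actually lives in device memory. Sizes and names below are illustrative assumptions.

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call failed.
#define CUDA_CHECK(call)                                                      \
    do {                                                                      \
        cudaError_t err = (call);                                             \
        if (err != cudaSuccess) {                                             \
            std::fprintf(stderr, "%s failed: %s\n", #call,                    \
                         cudaGetErrorString(err));                            \
            std::exit(EXIT_FAILURE);                                          \
        }                                                                     \
    } while (0)

int main() {
    // Hypothetical input batch: it must live in *device* memory before being
    // handed to the encoding/network (the culprit in this thread).
    std::vector<float> host_input(48 * 128, 0.0f); // padded width x batch size, illustrative
    float* device_input = nullptr;
    CUDA_CHECK(cudaMalloc(&device_input, host_input.size() * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(device_input, host_input.data(),
                          host_input.size() * sizeof(float),
                          cudaMemcpyHostToDevice));

    // ... run inference here ...

    // Surface asynchronous errors (e.g. illegal memory accesses) right away
    // instead of discovering them at the next device-to-host copy.
    CUDA_CHECK(cudaDeviceSynchronize());
    CUDA_CHECK(cudaGetLastError());

    CUDA_CHECK(cudaFree(device_input));
    return 0;
}
```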
I had forgotten to copy the input vector to the device. Silly silly... Thanks again for the help! Closing this for now.