Need Help with a Softmax Warning in TensorFlow 2.16 #67758
Comments
I am experiencing the same issue when implementing my own transformer encoder-decoder. So far, I am still missing positional encoding and some masking layers; I don't know whether those would affect it in any way. Here is my code, showing the same warning with tf-nightly and Python 3.11.9.
UserWarning: You are using a softmax over axis 3 of a tensor of shape (2, 8, 1, 1). This axis has size 1. The softmax operation will always return the value 1, which is likely not what you intended. Did you mean to use a sigmoid instead? Output shape: (2, 4, 10000)
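For reference (the commenter's code itself is not included above), here is a minimal sketch that produces the same attention-score shape: with a query and key of sequence length 1, MultiHeadAttention computes scores of shape (batch, heads, 1, 1), and the softmax over that size-1 axis is what triggers the warning.

```python
import numpy as np
from tensorflow import keras

# Minimal sketch, not the commenter's code: sequence length 1 on query/value
# yields attention scores of shape (2, 8, 1, 1), matching the warning above.
mha = keras.layers.MultiHeadAttention(num_heads=8, key_dim=16)
x = np.random.normal(size=(2, 1, 64)).astype("float32")
out = mha(query=x, value=x)  # softmax runs over a size-1 axis here
print(out.shape)  # (2, 1, 64)
```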
Hi @sp00N221,
Hey, thank you for taking the time to review my issue. I've had nothing but problems with my task over the past few days. I had a combination of a TransformerBlock and LSTM layers; coupled with Optuna, it was probably just too many variables and possibilities, causing the model to become unstable. I have now switched to this task: def objective(trial, features, target):
With this, I have no problems.
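For context only, the following is a hypothetical sketch of what an objective with that signature usually looks like when tuning a Keras model with Optuna; the search space, model, and data below are assumptions, not the author's actual code, which is truncated above.

```python
import numpy as np
import optuna
from tensorflow import keras

def objective(trial, features, target):
    # Hypothetical search space; the author's real hyperparameters are not shown.
    units = trial.suggest_int("units", 32, 256)
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)

    inputs = keras.Input(shape=(features.shape[1],))
    hidden = keras.layers.Dense(units, activation="relu")(inputs)
    outputs = keras.layers.Dense(1)(hidden)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(lr), loss="mse")

    history = model.fit(features, target, validation_split=0.2, epochs=5, verbose=0)
    return history.history["val_loss"][-1]

# Optuna objectives receive only `trial`, so the extra arguments are bound here.
features = np.random.normal(size=(200, 8)).astype("float32")
target = np.random.normal(size=(200, 1)).astype("float32")
study = optuna.create_study(direction="minimize")
study.optimize(lambda t: objective(t, features, target), n_trials=3)
```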
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
No
Source
source
TensorFlow version
2.16.1
Custom code
Yes
OS platform and distribution
Windows 11
Mobile device
No response
Python version
3.12.3
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
Hey everyone,
I'm running into a bit of a headache with TensorFlow 2.16 and could really use some help. I'm getting this annoying warning about a Softmax operation over an axis with size 1. This pops up when I'm using a custom TransformerBlock layer that includes MultiHeadAttention.
What I've Tried:
Debugging Dimensions:
Added print statements to check tensor shapes at different stages.
Used tf.squeeze to remove dimensions of size 1 before passing the tensor to MultiHeadAttention.
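For context, a small sketch (with assumed shapes, since the TransformerBlock code is not included in the issue) of what that squeeze does: removing the size-1 sequence axis leaves a rank-2 tensor, which no longer has the (batch, sequence, features) layout MultiHeadAttention attends over, so the warning only goes away once the sequence length is genuinely greater than 1.

```python
import numpy as np
import tensorflow as tf

# Assumed shape: (batch, seq, features) with a sequence axis of size 1 -- the
# situation that leads to (batch, heads, 1, 1) attention scores and the warning.
x = np.random.normal(size=(2, 1, 64)).astype("float32")
squeezed = tf.squeeze(x, axis=1)
print(squeezed.shape)  # (2, 64): rank 2, no sequence axis left for attention
```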
What I Need:
Is this a bug in TensorFlow 2.16? If yes, any workarounds or patches?
Best practices for handling tensor dimensions in MultiHeadAttention to avoid this?
Should I downgrade or wait for an update? If yes, which version should I try?
Additional Info:
Using LSTM and GRU layers followed by the custom TransformerBlock (see the sketch after this list).
Running on Windows with Python 3.12.
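A hedged sketch of that layer stack under assumed shapes (the actual TransformerBlock is not shown in the issue): keeping return_sequences=True on the recurrent layers preserves a time axis longer than 1, so the attention scores are (batch, heads, time, time) rather than (batch, heads, 1, 1), and the size-1 softmax warning does not appear.

```python
from tensorflow import keras

# Assumed input shape (time=20, features=32); only the shapes matter here.
inputs = keras.Input(shape=(20, 32))
x = keras.layers.LSTM(64, return_sequences=True)(inputs)   # (batch, 20, 64)
x = keras.layers.GRU(64, return_sequences=True)(x)         # (batch, 20, 64)
# Attention over a length-20 sequence: scores are (batch, 8, 20, 20).
attn = keras.layers.MultiHeadAttention(num_heads=8, key_dim=16)(x, x)
pooled = keras.layers.GlobalAveragePooling1D()(attn)
outputs = keras.layers.Dense(1)(pooled)
model = keras.Model(inputs, outputs)
model.summary()
```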
Any help or pointers would be greatly appreciated! Thanks!
Standalone code to reproduce the issue
Relevant log output