Gradients do not exist for variables during training #7
First, I would like to ask that you not open new issues when you already have an open issue for the same problem. I will try to help you to some extent, although, as already mentioned here as well, PrognosAIs is no longer supported. You are probably better off using some other package, or otherwise please indicate why specifically you want to run this code so I can try to help with that, for example to reproduce our exact experiments. Another note, since there seem to be spammers making use of this issue: do not download anything from any links. Some questions:
I have some clinical data and I am trying to train the model using my customized dataset. I am using this code for training.
Based on the error message and this information, I'm afraid the issue is that your GPU cannot handle the training: the model is quite large and the RTX4060 simply does not have enough memory, unfortunately. One thing you can try is to reduce the batch size to 1, if you haven't already done so, but I'm afraid that even that is probably not enough to make the model fit; it's not so much the batches as the model itself that takes up memory. Another option is to make sure that your GPU supports mixed precision and that training takes advantage of it. Check whether you see the message "GPU supports a mixed float16 policy" or "GPU support float16 precision policy" in the logs, then try to set the float policy in the config to "mixed" or "float16" to force it (see the sketch below). Unfortunately, that's the only advice I can give you. For evaluation you should be fine to run the model, but training of the model required 8 RTX2080Ti's, each with 11GB of memory. Training on a single RTX4060 is therefore going to be a challenge if you want to train the exact same model. You can of course reduce the number of filters per layer or the number of layers, but then the model is not the same. In conclusion: this is not a problem with the code but a hardware limitation, so I'm closing the issue.
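For reference, here is a minimal sketch of what enabling mixed precision looks like in plain TensorFlow/Keras. It is not the PrognosAIs config mechanism itself: the exact config key for the float policy may differ, and the stand-in model and batch size of 1 below are assumptions for illustration only.

```python
import tensorflow as tf

# Sketch only: enable mixed precision globally *before* the model is built so its
# layers adopt the policy. "mixed_float16" keeps variables in float32 but runs most
# ops in float16, roughly halving activation memory on GPUs that support it.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Tiny stand-in model; in practice this would be the network from custom_definition.py.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    # Keep the final layer in float32 for numerical stability of the loss.
    tf.keras.layers.Dense(1, dtype="float32"),
])
model.compile(optimizer="adam", loss="mse")

# Dummy data; a batch size of 1 minimises per-step memory, as suggested above.
x = tf.random.normal((4, 64, 64, 1))
y = tf.random.normal((4, 1))
model.fit(x, y, batch_size=1, epochs=1, verbose=0)
```

Even with both measures, the full model may still exceed the memory of a single consumer GPU, as noted above.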
I am using your custom_definition.py as the model for training, and I am facing this error when training starts.
Moreover, when TensorFlow is compiled with GPU support, there is a memory leakage issue as well. Can you please send me the requirements.txt file with which you trained your model?