Run extremely slow on 1080Ti #29
Comments
Hi, Thank you for the error report. While it is running, does nvidia-smi show any GPU utilization from ModelAngelo? If not, could you make sure that the conda environment's pytorch is able to access the GPU? Best, |
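One way to run the check suggested above is a short script executed inside the activated conda environment. This is a minimal sketch, not part of ModelAngelo: `describe_gpu_access` is a hypothetical helper name, and it only relies on standard PyTorch calls (`torch.cuda.is_available`, `torch.cuda.device_count`, `torch.cuda.get_device_name`).

```python
def describe_gpu_access():
    """Report whether PyTorch in the current environment can see a CUDA device.

    Hypothetical helper for illustration; it is not part of ModelAngelo.
    """
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is missing or failed to load in this environment.
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    devices = []
    if available:
        # List every device PyTorch can see, e.g. a 1080 Ti at index 0.
        devices = [torch.cuda.get_device_name(i)
                   for i in range(torch.cuda.device_count())]
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
        "devices": devices,
    }
```

Calling `print(describe_gpu_access())` in the same environment that provides `model_angelo` should show `cuda_available: True` and the card name. If `cuda_build` is `None`, a CPU-only PyTorch build was probably installed.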
This is very interesting. What happens when you type "which model_angelo"? Is it installed in the right conda environment? If you manually pass "--device 0" to "model_angelo build", does it use the GPU? |
Could you please send me the log file? |
2022-12-20 at 22:19:59 | ERROR | Error in ModelAngelo File "/home/vamsee/anaconda3/envs/model_angelo/bin/model_angelo", line 33, in
zipfile.BadZipFile: File is not a zip file |
Could you email me the log file itself at kjamali@mrc-lmb.cam.ac.uk? There seems to be an issue during installation. |
Sent |
UPDATE: Tried several things.
In the process, I noticed there was an HTTP request error. This has happened with both anaconda and miniconda and is apparently well documented. I found some workarounds for it, and the HTTP request errors disappeared.
I did notice something interesting though. Every time I start the run, there is a small spike in power usage and volatile memory usage (20W, 20%) of the GPU which then drops back down to base levels (~9W and 0%) in about 10-15 seconds. Unsure of what to do next. I've tried to eliminate all possible variables. If you think of any more please suggest. Willing to try. |
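To capture a spike like the one described above instead of watching `nvidia-smi` by eye, the readings can be polled and logged. This is a rough sketch under stated assumptions: `parse_smi_line` and `sample_gpus` are hypothetical helper names, and the script assumes `nvidia-smi` is on the PATH and that the card reports numeric values for all three queried fields.

```python
import subprocess

# Fields accepted by nvidia-smi's --query-gpu option.
QUERY = "utilization.gpu,power.draw,memory.used"


def parse_smi_line(line):
    """Parse one CSV line such as '20, 20.15, 512' into a dict of floats.

    Assumes numeric fields; some cards report '[N/A]' for power draw,
    which would raise ValueError here.
    """
    util, power, mem = [field.strip() for field in line.split(",")]
    return {
        "util_percent": float(util),
        "power_watts": float(power),
        "memory_mib": float(mem),
    }


def sample_gpus():
    """Return one reading per GPU by calling nvidia-smi once."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_smi_line(line)
            for line in result.stdout.splitlines() if line.strip()]
```

Calling `sample_gpus()` once per second in a loop while the build runs gives a timeline of utilization. A brief spike followed by 0% suggests the process touches the GPU at startup and then waits on something else.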
Hi, From the log file, it seems that the weight download did not happen correctly. Could you please delete the folder '/home/vamsee/.cache/torch' and try again? Sorry for the inconvenience |
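Before deleting the whole cache folder, it can help to confirm which downloaded file is actually broken, since the log shows `zipfile.BadZipFile`. This is a minimal sketch using only the standard library. `find_bad_zips` is a hypothetical helper, and the assumption that the weights sit under the cache directory as `.zip` files is not confirmed by the thread.

```python
import zipfile
from pathlib import Path


def find_bad_zips(cache_dir):
    """Return sorted paths of *.zip files under cache_dir that are not valid zips.

    Hypothetical helper; the '.zip' extension for the weights is an assumption.
    """
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.is_dir():
        return []
    return sorted(
        path
        for path in cache_dir.rglob("*.zip")
        # is_zipfile checks for the zip end-of-central-directory record,
        # which a truncated or interrupted download usually lacks.
        if path.is_file() and not zipfile.is_zipfile(path)
    )
```

For example, `find_bad_zips("~/.cache/torch")` lists candidates for deletion. An empty result means either that the files are intact or that the weights are stored under a different name or extension.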
Hi Kiarash, No inconvenience at all. I'm not sure your suggestion will help anymore. I've reinstalled it several times, including reinstalling the operating system itself (check my previous update for details). I feel like I am missing something but am not sure what. -Vamsee |
Does it always fail with the same message in the log? I think this has to do with the HTTP request failing. We could try manually downloading the weights, I could give you commands for that if the failure is always at the same point |
The HTTP error is internal to anaconda/miniconda and has been well documented across the web. I've disabled the SSL settings for it, and since then I haven't gotten the HTTP errors. I would like to try downloading the weights manually though. Maybe that'll help. Please send them to me. Thanks. |
UPDATE: I installed it on our HPC and this time it was marginally faster. It got through the whole initial process and then went through the GNN iterations. After the initial run, it seems to be quite fast (~5-10 minutes) for all the subsequent runs. If this is only related to the weights not getting downloaded, I'm wondering why that is the case. |
Sorry for the late replies, I am currently out of office for a couple of weeks. Yes, it was downloading weights before. I am unsure why it was so slow for you, but the runs after that are actually indicative of the model building speed. |
It took about 8 hrs for 50% of the run to finish. Incredibly slow. The PyTorch installation initially failed, so I removed the env and recreated it. I started fresh and it all went smoothly the next time. No errors.
model_angelo build -h
provides the appropriate help output. The command I ran includes the map, the FASTA file, and the output. The speed doesn't change whether or not I provide the device number.
Computer specs
OS - Linuxmint 20.1 (Ulyssa)
Cuda - 11.7
Driver version - 515.43.04
I've also tried the installation on a 3090 machine (runs Red Hat, driver version 515.76, CUDA 11.7). Same result.