Fatal error when cloning (sometimes) #184
@okemppin This definitely makes me think this is a filesystem or other system fault. Really all mepo is doing at this point is a You said you have enough disk space. How about your inodes? How are you doing on those?
@mathomp4 I was able to do a regular git clone of the GEOSgcm fixture at that time, and mepo did successfully clone some files (the fixture, I think) before the crash. I should have run the same command on a different server as a sanity check, but I didn't think of it; I also assumed it was a discover filesystem or other system issue that was somehow affecting only me and not the other people who tried it. I have plenty of inode allowance free, so that is likely not the explanation. Unfortunately, I have never seen this issue before or since that day, so I have no idea how to reproduce it.
@okemppin So weird. My only other thought is that sometimes I see weird things on discover if the TMPDIR space (or I guess if it happens again, try to record the head node it happened on. Then we can maybe try to figure out why that node is different.
@mathomp4 Will do. Thanks!
@okemppin And thanks to you for using
When I tried to clone the model on August 3, this happened:
nobackup/geos-new> mepo clone -b feature/esherman/gocart2g_aerochemUpdates_v10.17.6 git@github.com:GEOS-ESM/GEOSgcm.git
env | (t) v3.2.0
cmake | (t) v3.3.9
ecbuild | (t) geos/v1.0.6
fatal: cannot change to '/gpfsm/dnb32/okemppin/geos-new/./src/Shared/@NCEP_Shared': No such file or directory
The issue happened 10-20 times during the day as I retried (including after fresh logins). During that time, other colleagues verified that they could run the exact same command, and I also verified that I got the same error when trying a different branch. This was not in a batch session but in a regular discover login session. I have used mepo clone -b in the past without issues. There was also plenty of space available on the disk.
Later the same day (after 5pm), everything worked without any issues even though I had not changed anything since the earlier attempts, and I managed to clone the model.
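The thread suggests recording the head node and checking inode and TMPDIR space if the error recurs. Below is a minimal sketch of a shell helper that captures all three in one go before retrying `mepo clone`. The function name `collect_clone_diagnostics` is hypothetical. `df` reports filesystem-wide numbers, so per-user quotas on discover may need to be checked separately with whatever tool the site provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of mepo): capture the diagnostics suggested
# in this thread when a clone fails. Usage: collect_clone_diagnostics [dir]
collect_clone_diagnostics() {
    target_dir=${1:-.}
    # Record which node the failure happened on, and when.
    echo "host: $(uname -n)"
    echo "date: $(date)"
    # Filesystem-wide inode and block usage for the clone destination.
    echo "--- inode usage for $target_dir ---"
    df -i "$target_dir"
    echo "--- disk space for $target_dir ---"
    df -h "$target_dir"
    # Free space where TMPDIR points (falls back to /tmp if unset).
    tmp=${TMPDIR:-/tmp}
    echo "--- space in TMPDIR ($tmp) ---"
    df -h "$tmp"
}

collect_clone_diagnostics "$@"
```

Saving the output alongside the failing `mepo clone` log would let the node and filesystem state be compared against a later successful attempt.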