
Fatal error when cloning (sometimes) #184

Open
okemppin opened this issue Aug 9, 2021 · 5 comments

@okemppin

okemppin commented Aug 9, 2021

When I tried to clone the model on August 3 this happened:

nobackup/geos-new> mepo clone -b feature/esherman/gocart2g_aerochemUpdates_v10.17.6 git@github.com:GEOS-ESM/GEOSgcm.git
env | (t) v3.2.0
cmake | (t) v3.3.9
ecbuild | (t) geos/v1.0.6
fatal: cannot change to '/gpfsm/dnb32/okemppin/geos-new/./src/Shared/@NCEP_Shared': No such file or directory

The error recurred 10 to 20 times that day, including after fresh logins. During that time, colleagues verified that they could run the exact same command, and I also confirmed that I got the same error with a different branch. This was not in a batch session but on a regular Discover login node. I have used mepo clone -b in the past without issues, and there was plenty of space available on the disk.

Later the same day (after 5 pm), the same command worked without my changing anything from the earlier attempts, and I was able to clone the model.

@mathomp4
Member

mathomp4 commented Aug 9, 2021

@okemppin This definitely makes me think this is a filesystem or other system fault. Really, all mepo is doing at this point is a git clone, so it looks like the git clone itself failed?

You said you have enough disk space. How about your inodes? How are you doing on those?
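For checking the inode side of this, here is a minimal Python sketch, assuming a POSIX system where `os.statvfs` is available. `inode_usage` is a hypothetical helper, not part of mepo. Note that it reports filesystem-wide counts; per-user inode quotas on a shared system like Discover are tracked separately and would need the site's own quota tools.

```python
import os


def inode_usage(path: str = ".") -> dict:
    """Return inode counts for the filesystem that contains `path` (POSIX only)."""
    st = os.statvfs(path)
    total = st.f_files           # total inodes on the filesystem
    used = total - st.f_ffree    # inodes currently in use
    available = st.f_favail      # inodes still available to unprivileged users
    # Some filesystems report f_files == 0; avoid dividing by zero in that case.
    percent_used = 100.0 * used / total if total else 0.0
    return {
        "total": total,
        "used": used,
        "available": available,
        "percent_used": percent_used,
    }


if __name__ == "__main__":
    info = inode_usage(".")
    print(
        f"inodes: {info['used']} used of {info['total']} "
        f"({info['percent_used']:.1f}%), {info['available']} available"
    )
```

Running it from inside the clone target directory checks the same filesystem the clone writes to.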

mathomp4 self-assigned this Aug 9, 2021
@okemppin
Author

okemppin commented Aug 9, 2021

@mathomp4 I was able to do a regular git clone of the GEOSgcm fixture at that time, and mepo did successfully clone some files (the fixture, I think) before the crash. I should have run the same command on a different server as a sanity check, but I didn't think of it; I also assumed it was a Discover filesystem or other system issue that was somehow affecting only me and not the other people who tried it. I have plenty of free inode allowance, so that is probably not the explanation.

Unfortunately, I have not seen this issue before or since that day, so I have no idea how to reproduce it.

@mathomp4
Member

mathomp4 commented Aug 9, 2021

@okemppin So weird. My only other thought is that I sometimes see strange behavior on Discover when the TMPDIR space (or /tmp) is full. It's possible git uses temp space for some actions, and if that space is full, random commands might fail?

If it happens again, try to record which head node you were on. Then we can try to figure out why that node is different.
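One way to capture that information at the moment of a failure is a minimal Python sketch like the one below. `clone_environment_snapshot` is a hypothetical helper, not part of mepo. It assumes git resolves its temp location the same way Python does, which is not guaranteed: `tempfile.gettempdir()` checks `TMPDIR`, `TEMP`, and `TMP` before falling back to a platform default such as `/tmp`.

```python
import shutil
import socket
import tempfile


def clone_environment_snapshot() -> dict:
    """Collect the head node name and temp-space usage worth recording on a failed clone."""
    # gettempdir() honours TMPDIR, then TEMP/TMP, then platform defaults like /tmp.
    tmpdir = tempfile.gettempdir()
    usage = shutil.disk_usage(tmpdir)  # named tuple: total, used, free (bytes)
    return {
        "host": socket.gethostname(),
        "tmpdir": tmpdir,
        "tmp_free_bytes": usage.free,
        "tmp_percent_used": 100.0 * usage.used / usage.total if usage.total else 0.0,
    }


if __name__ == "__main__":
    for key, value in clone_environment_snapshot().items():
        print(f"{key}: {value}")
```

Pasting this output into the issue alongside the failing command would show whether the failures line up with a particular node or a full temp directory.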

@okemppin
Author

okemppin commented Aug 9, 2021

@mathomp4 Will do. Thanks!

@mathomp4
Member

mathomp4 commented Aug 9, 2021

@okemppin And thanks to you for using mepo. It is possible there is a bug in it. I'm...not the best Python programmer. 😄
