202204 release - exited on signal 11 (Segmentation fault) #11
@StevePny What test are you running that produces this segmentation fault? Is it one of the tests in the CI directory?
@StevePny To expand on my previous comment: is it one of the tests in the CI directory of the SHiELD_build repository?
@laurenchilutti I was able to install SHiELD and run these example cases (regional_Laura and global_nest_Laura) before the 202204 release.
@StevePny Looks like the NCEP library is causing the crash. The segmentation fault occurs at SHiELD_physics/gsmphys/sfcsub.F, line 2757 (commit 8c46d4f).
However, I still don't understand why. Before this line, another NCEP library routine, getgbh(), works just fine, and the same compiler flags and arguments worked previously.
@kaiyuan-cheng just checking in: has any progress been made on clearing up this issue, or should we continue with the pre-202204 version? (Jul 18, 2022)
I don't have a solution at this moment, unfortunately.
@StevePny It turns out that the default stack size of 8 MB is too small to hold the large one-dimensional array lbms. The solution is to set the stack size to unlimited.
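A minimal bash sketch of that fix, assuming the model is launched from a bash job script. The helper name ensure_unlimited_stack is hypothetical and not part of SHiELD. bash reports 'ulimit -s' in KiB, so the 8 MB default mentioned above shows up as 8192.

```shell
#!/usr/bin/env bash
# Sketch: raise the stack limit of this shell before launching the model.
# Child processes started from this shell afterwards inherit the new limit.
# Without -S or -H, bash's 'ulimit -s' sets both the soft and hard limits.

ensure_unlimited_stack() {
  echo "stack limit before: $(ulimit -s) (hard: $(ulimit -Hs))"
  if ulimit -s unlimited 2>/dev/null; then
    echo "stack limit now: $(ulimit -s)"
    return 0
  fi
  # An unprivileged user cannot raise a limit above the current hard limit.
  echo "could not set an unlimited stack; hard limit is $(ulimit -Hs)" >&2
  return 1
}

ensure_unlimited_stack || echo "continuing with the existing stack limit" >&2
```

The call has to happen in the same script that later starts mpirun or the model executable: a limit changed in one shell does not affect other shells or processes that are already running.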
kaiyuan-cheng closed this as completed on Sep 8, 2022.
Yes, the stack size error strikes again. 'limit stacksize unlimited' (the csh/tcsh form; in bash it is 'ulimit -s unlimited') should be run on all machines by default. We may want to make an FAQ item to this effect.
Lucas
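In the spirit of the suggested FAQ item, a job script could check the limit up front and stop with a readable message instead of crashing with a segmentation fault. This is a sketch with hypothetical helper names (stack_at_least, require_unlimited_stack). The thread does not say how much stack lbms actually needs, only that 8 MB was not enough, so requiring 'unlimited' is the conservative check.

```shell
#!/usr/bin/env bash
# Sketch of fail-fast guards for a job script. Both helper names are
# hypothetical; they are not claimed to ship with SHiELD.

# stack_at_least MIN_KIB: succeed if the soft stack limit is "unlimited"
# or at least MIN_KIB (bash reports the stack limit in KiB).
stack_at_least() {
  local min_kib=$1 cur
  cur=$(ulimit -S -s)
  if [ "$cur" = "unlimited" ]; then
    return 0
  fi
  [ "$cur" -ge "$min_kib" ]
}

# require_unlimited_stack: succeed only for an unlimited soft stack limit;
# otherwise print how to raise it and fail.
require_unlimited_stack() {
  local cur
  cur=$(ulimit -S -s)
  if [ "$cur" != "unlimited" ]; then
    echo "stack limit is ${cur} KiB; run 'ulimit -s unlimited' (bash)" \
         "or 'limit stacksize unlimited' (csh/tcsh) before the model" >&2
    return 1
  fi
}

# Example guard at the top of a run script:
# require_unlimited_stack || exit 1
```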
To provide a clarifying detail (Dec 1, 2022):
The Docker container does not inherit the system stack limit by default. The ulimit can be set on the command line when running the container, but 'unlimited' is not a permitted value. To specify an unlimited stack size in the Docker container, add this option:
--ulimit stack=-1
With this setting I can run the regional_Laura_test case on an AWS c6g.8xlarge EC2 instance.
Note: to be safe, I also set the stack size on the EC2 instance with:
ulimit -s unlimited

Hi, Steve. That is great; glad to hear it runs correctly now. This is useful background for other users of the container.
Lucas
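A sketch of the container launch described above. The image name is a hypothetical placeholder, and by default the script only prints the command (DRY_RUN=1 is a convention of this sketch, not a Docker feature). With DRY_RUN=0 the container reports its own stack limit through 'ulimit -s', which should read 'unlimited'. Docker expects numeric --ulimit values, with -1 standing for unlimited; when only one value is given, it is used for both the soft and hard limits.

```shell
#!/usr/bin/env bash
# Sketch: launch the container with an unlimited stack and confirm the limit
# seen inside it. IMAGE is a hypothetical placeholder, and the image is
# assumed to provide bash.
IMAGE="${IMAGE:-your-shield-image}"

# Docker's --ulimit takes numbers; -1 means unlimited ('unlimited' is rejected).
cmd=(docker run --rm --ulimit stack=-1 "$IMAGE" bash -c 'ulimit -s')

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: only print the command so it can be inspected first.
  printf '%q ' "${cmd[@]}"
  echo
else
  "${cmd[@]}"   # should print: unlimited
fi
```

The host-side 'ulimit -s unlimited' mentioned above is a separate setting: it governs processes started from the host shell, while the --ulimit flag governs processes inside the container.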
Running on Linux (Ubuntu, GNU compilers, Docker container):
I'm still working on getting the 202204 release to run successfully (i.e., to at least roughly replicate the pre-202204 version), and I'm currently getting this segmentation fault. A previous segmentation fault was fixed by building FMS from the 'main' branch of the FMS repo. I've symlinked aerosol.txt, solarconstant_noaa_an.txt, co2historicaldata_*.txt, and a few other key files into the INPUT/ directory (from their previous location in the main experiment directory):
For verification, I've also tried running the regional_Laura test case and get a similar error:
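The symlinking step described in this post could be scripted along these lines. This is a sketch, assuming the files sit directly in the experiment directory; link_inputs is a hypothetical helper, and the list covers only the files named above, not the "few other key files".

```shell
#!/usr/bin/env bash
# Sketch: symlink fixed input files from the experiment directory into INPUT/.
# link_inputs is a hypothetical helper; extend the file list for your setup.
link_inputs() {
  local expdir
  # Use an absolute path so the links stay valid when resolved from INPUT/.
  expdir=$(cd "$1" && pwd) || return 1
  mkdir -p "$expdir/INPUT"
  local f
  for f in "$expdir/aerosol.txt" \
           "$expdir/solarconstant_noaa_an.txt" \
           "$expdir"/co2historicaldata_*.txt; do
    [ -e "$f" ] || continue   # skip missing files and unmatched globs
    ln -sf "$f" "$expdir/INPUT/$(basename "$f")"
  done
}
```

Usage: 'link_inputs /path/to/experiment', where the path is a placeholder for the main experiment directory.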