LMOD_CACHED_LOADS causes non-zero exit code when loading a module #613

Closed
smoors opened this issue Nov 20, 2022 · 18 comments

@smoors

smoors commented Nov 20, 2022

Describe the bug
if LMOD_CACHED_LOADS is set to yes or 1, loading a module still works but returns exit code 1, without any error message.

To Reproduce

export LMOD_CACHED_LOADS=yes
module load foss/2022a
echo $?  # prints 1

Expected behavior
I would expect this to return 0, or is this intended behavior?

Desktop (please complete the following information):

  • OS: Linux
  • Linux distribution: CentOS 7
  • Lmod Version: 8.7.14
  • For Lmod Versions 8.7.7+, please run "module --miniConfig" and
    include the output here.

Modules based on Lua: Version 8.7.14  2022-11-01 10:59 -05:00
    by Robert McLay mclay@tacc.utexas.edu

Changes from Default Configuration
----------------------------------

Name                         Where Set  Default      Value
----                         ---------  -------      -----
LFS_VERSION                  D          1.6.3        1.8.0
LMOD_CACHED_LOADS            D          no           yes
LMOD_HAVE_LUA_TERM           C          no           yes
LMOD_PACKAGE_PATH            D          nil          <empty>
LMOD_PAGER                   C          less         /usr/bin/less
LMOD_SYSTEM_DEFAULT_MODULES  D          __unknown__  <empty>
LMOD_SYSTEM_NAME             E          false        hydra-skylake-ib
LMOD_TCLSH                   C          tclsh        /usr/bin/tclsh
MODULEPATH_ROOT              C                       /data/brussel/100/vsc10009/software/lmod/lmod-8.7.14/modulefiles
PATH_TO_LUA                  C          lua          /usr/bin/lua


Where Set -> D: default, E: environment, C: configuration
             lmod_cfg: lmod_config.lua SitePkg: SitePackage StdPkg: StandardPackage
             Other: Set somewhere outside of normal locations
@rtmclay
Member

rtmclay commented Nov 20, 2022

I just ran the following with Lmod 8.7.14

$ export LMOD_CACHED_LOADS=yes
$ module load gmt
$ echo $?
0

So this is not a general problem with using LMOD_CACHED_LOADS=yes.

Please follow the instructions included with the bug_report template to provide a working test case that shows the issue. Please use a small module tree.

@rtmclay
Member

rtmclay commented Nov 30, 2022

Do you have a test case for this issue, or can I close it?

@smoors
Author

smoors commented Dec 1, 2022

thanks for your answer. I didn't find time to create a test case yet, will try to do it this week.

@smoors
Author

smoors commented Dec 3, 2022

I traced this down to a caching error on a modulefile that modifies the MODULEPATH:

prepend_path("MODULEPATH", pathJoin(os.getenv("YALES2_HOME"), "modules"))

I guess that makes sense. is there a way around this that does not cause this error?

/usr/bin/lua: /usr/share/lmod/lmod/libexec/Spider.lua:567: stack overflow
stack traceback:
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        ...
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:582: in function 'l_build_keepT'
        /usr/share/lmod/lmod/libexec/Spider.lua:598: in function 'buildDbT'
        /usr/share/lmod/lmod/libexec/Cache.lua:630: in function 'build'
        /usr/share/lmod/lmod/libexec/spider:461: in function 'main'
        /usr/share/lmod/lmod/libexec/spider:842: in main chunk
        [C]: ?

@rtmclay
Member

rtmclay commented Dec 3, 2022

In general, there is probably no way to prevent all such errors. The spider cache is designed to walk all changes to $MODULEPATH.

It might be possible to detect the endless loop. If you can give me a test case that shows this failure, I'll take a look at it.

@smoors
Author

smoors commented Dec 4, 2022

in trying to create a minimal example, I discovered that it's actually not the MODULEPATH change itself that causes the failure, but the use of an environment variable, YALES2_HOME, that was set in the same module file:

setenv("YALES2_HOME", pathJoin(os.getenv("VSC_SCRATCH"), "yales2"))
prepend_path("MODULEPATH", pathJoin(os.getenv("YALES2_HOME"), "modules"))

of course, this can be trivially fixed:

yales2_home = pathJoin(os.getenv("VSC_SCRATCH"), "yales2")
setenv("YALES2_HOME", yales2_home)
prepend_path("MODULEPATH", pathJoin(yales2_home, "modules"))

thanks a lot for your help!

@wpoely86
Contributor

wpoely86 commented Dec 5, 2022

@rtmclay We're a bit puzzled by this. I thought Lmod only pushed changes to the environment at the end of a load, so something like:

setenv("YALES2_HOME", pathJoin(os.getenv("VSC_SCRATCH"), "yales2"))
prepend_path("MODULEPATH", pathJoin(os.getenv("YALES2_HOME"), "modules"))

should not work. But it does work (without cached loads). What am I missing?

@rtmclay
Member

rtmclay commented Dec 6, 2022

When Lmod loads a module, any setenv() function pushes the value in the current environment so that your setenv() followed by a prepend_path works. This feature existed in Tmod for a long time, so I reproduced it in Lmod.

However, when Lmod "loads" a module while building the spider cache, any setenv() call is currently ignored. That is why your use of a local Lua variable works in both a regular load and a spider load. I will change Lmod so that you can use setenv() followed by prepend_path(), as in your example, without requiring a local Lua variable.

But it is a little complicated because I'll have to restore the original environment after each module is evaluated (a.k.a. "loaded") in spider mode.

I'll update you when this fix is available
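
To make the difference described in this comment concrete, here is a toy model in plain Lua. It is not Lmod's code: make_evaluator, evaluate, and the "/scratch/user" value are hypothetical names made up for illustration. It only mimics the behaviour described here, where a normal load makes a setenv() value visible to a later lookup, while the spider cache build (as of 8.7.14) ignores it.

```lua
-- Toy model only: NOT Lmod's implementation. All names are hypothetical.
local function make_evaluator(mode)
   -- Stand-in for the process environment seen by the modulefile.
   local env = { VSC_SCRATCH = "/scratch/user" }
   local api = {}

   function api.getenv(name)
      return env[name]
   end

   function api.setenv(name, value)
      -- Assumption taken from the comment above: a normal load pushes the
      -- value into the current environment; the spider build ignores it.
      if mode == "load" then
         env[name] = value
      end
   end

   return api
end

-- Mimics a modulefile that sets a variable and then reads it back.
local function evaluate(mode)
   local m = make_evaluator(mode)
   m.setenv("YALES2_HOME", m.getenv("VSC_SCRATCH") .. "/yales2")
   return m.getenv("YALES2_HOME")
end

print(evaluate("load"))    -- the value set just before is visible
print(evaluate("spider"))  -- the value was never stored, so the lookup yields nil
```

With the local-variable workaround shown earlier in the thread, the path never passes through the environment, so both modes compute the same value.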

@wpoely86
Contributor

wpoely86 commented Dec 6, 2022

It's not a major issue to use a local Lua variable, but I thought this was the only way. Does this trick only work for setenv?

@rtmclay
Member

rtmclay commented Dec 7, 2022

It works for both setenv() and pushenv().

@smoors
Author

smoors commented Dec 8, 2022

so, there are 2 issues:

to solve the second problem, I found the following workaround:

execute {cmd="ml use $VSC_SCRATCH/yales2/modules",modeA={"load"}}
execute {cmd="ml unuse $VSC_SCRATCH/yales2/modules",modeA={"unload"}}

@smoors smoors closed this as completed Dec 8, 2022
@smoors smoors reopened this Dec 8, 2022
@wpoely86
Contributor

wpoely86 commented Dec 8, 2022

@rtmclay Is there a more elegant way to get this done? The execute statements I mean.

@rtmclay
Member

rtmclay commented Dec 11, 2022

The simple checks of:

  1. the directory string starts with a /
  2. the directory exists and is readable by the user

The only failure that is left is that you'll get a stack overflow because of an infinite loop. Instead of this execute{}, you ought to run "$LMOD_DIR/check_module_tree_syntax" whenever you update the module tree.

@smoors
Author

smoors commented Dec 19, 2022

the following works too (and is probably less hacky):

if ( mode() ~= "spider" ) then
    prepend_path("MODULEPATH", pathJoin(yales2_home, "modules"))
end

rtmclay pushed a commit that referenced this issue Dec 19, 2022
@rtmclay
Member

rtmclay commented Dec 19, 2022

You only want to do that if you don't want the modules in the $YALES2_HOME/modules directory to be spider-able.

I have modified Lmod so that, in spider mode, setenv() (and pushenv()) set variables in the local environment, just as normal loads do.

Please test Lmod 8.7.15 when you get the chance.

@smoors
Author

smoors commented Dec 27, 2022

I tested some more.
with both Lmod 8.7.15 and 8.7.14, it works unless there's also a setenv that uses os.getenv in the same module file.

for example, the following 2 lines cannot be in the same file:

setenv("Y2_PYTHON_VERSION", os.getenv("EBVERSIONPYTHON"))
prepend_path("MODULEPATH", pathJoin(yales2_home, "modules"))

rtmclay pushed a commit that referenced this issue Dec 27, 2022
@rtmclay
Member

rtmclay commented Dec 27, 2022

I am not able to see any errors or other issues when I add another setenv("name",os.getenv("NAME2")) in a module. I have added a new module file in rt/spider/mf4/Core/S/1.0.lua which has:

local yales2_home = "/unknown/a/b/c"
setenv("Y2_PYTHON_VERSION", os.getenv("EBVERSIONPYTHON"))
prepend_path("MODULEPATH", pathJoin(yales2_home, "modules"))

and EBVERSIONPYTHON is set to the string "3.7" in the environment. Please provide a bugReport example that shows the issue.

@smoors
Author

smoors commented Feb 20, 2023

I tested this again and now the issue is gone. must have been an error in the module tree.
thanks a lot for helping out!

@smoors smoors closed this as completed Feb 20, 2023