
Python3 libraries not found when running scram scripts using python3 #10765

Closed
khurtado opened this issue Aug 19, 2021 · 2 comments

Comments

@khurtado
Contributor

Impact of the bug
Invoking the scram scripts with python3 from a python3 environment does not work.

Describe the bug
Using the python3 version of wmagent fails at job runtime with the following error:

CRITICAL:root:Failure to run SCRAM setup scripts.
Details:
        Scram Command Diagnostic:
        Command : scramv1
        Architecture: slc7_amd64_gcc900
        Executed: scramv1 project CMSSW CMSSW_12_0_0_pre6
        Exit Status: 3
        Stdout: b''
        Stderr: b'WARNING: In non-interactive mode release checks e.g. deprecated releases, production architectures are disabled.\npython3: error while loading shared libraries: libpython3.8.so.1.0: cannot open shared object file: No such file or directory\n'
WARNING:root:Exit code: 50513 has been already added to the job report

How to reproduce it
Run workflows with py3wmagent using cmssw: CMSSW_12_0_0_pre6

Expected behavior
Workflows should run


@khurtado khurtado changed the title Python3 libraries not found for scram scripts when calling Startup.py with python3 Python3 libraries not found when running scram scripts using python3 Aug 19, 2021
@khurtado khurtado self-assigned this Aug 19, 2021
@khurtado
Contributor Author

khurtado commented Aug 19, 2021

This is a problem between the python3 COMP environment and the scramv1 environment

Steps to reproduce:

# Source COMP python3:
source /cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/python3/3.8.2-comp/etc/profile.d/init.sh
source /cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/py3-future/0.18.2/etc/profile.d/init.sh
source /cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/py3-setuptools/39.2.0/etc/profile.d/init.sh

# Now, run scramv1, the same way it is done at runtime:
scramv1 project CMSSW CMSSW_12_0_0_pre6
python3: error while loading shared libraries: libpython3.8.so.1.0: cannot open shared object file: No such file or directory
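The loader error above means the dynamically linked python3 cannot resolve libpython3.8.so.1.0 when the process that re-executes it does not carry the COMP LD_LIBRARY_PATH. A minimal sketch of that mechanism (hypothetical helper, not WMAgent or scram code):

```python
import os
import subprocess

def run_with_env(cmd, env):
    """Run cmd under the given environment; return (exit_code, stderr)."""
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return proc.returncode, proc.stderr

# With the COMP environment intact, the dynamic loader finds
# libpython3.8.so.1.0 through LD_LIBRARY_PATH. If a wrapper (here, scramv1)
# re-executes python3 with that variable dropped, the same command fails
# at load time before any Python code runs.
intact_env = dict(os.environ)
scrubbed_env = {k: v for k, v in os.environ.items() if k != "LD_LIBRARY_PATH"}
```

In the sourced COMP shell, `run_with_env(["python3", "-c", "pass"], scrubbed_env)` would be expected to reproduce the `error while loading shared libraries` message, while the same call with `intact_env` succeeds.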

More info: python3 itself does have the right LD_LIBRARY_PATH environment, so the problem lies in how scramv1 invokes it:

[khurtado@earth /tmpscratch/users/khurtado/work/py3bug/job/tmp]$ which python3
/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/python3/3.8.2-comp/bin/python3
[khurtado@earth /tmpscratch/users/khurtado/work/py3bug/job/tmp]$ ldd $(which python3)
	linux-vdso.so.1 =>  (0x00007ffd63bf1000)
	libpython3.8.so.1.0 => /cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/python3/3.8.2-comp/lib/libpython3.8.so.1.0 (0x00007fb795979000)
	libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fb795741000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb795521000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fb795319000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007fb795111000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fb794e09000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fb794a39000)
	libfreebl3.so => /lib64/libfreebl3.so (0x00007fb794831000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fb795d11000)
[khurtado@earth /tmpscratch/users/khurtado/work/py3bug/job/tmp]$ echo $LD_LIBRARY_PATH
/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/py3-setuptools/39.2.0/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/py3-future/0.18.2/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/py3-setuptools/39.2.0/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/python3/3.8.2-comp/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/python3/3.8.2-comp/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/db6/6.0.30/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/xz/5.2.2/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/libffi/3.2.1/lib64:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/libuuid/2.22.2/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/gdbm/1.12/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/sqlite/3.8.10.2/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/bz2lib/1.0.6/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/expat/2.0.1/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/zlib/1.2.8/lib:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/gcc/6.3.0/lib64:/cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/gcc/6.3.0/lib:/opt/crc/usr/local/lib:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el7-x86_64/lib64:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el7-x86_64/lib:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el7-x86_64/usr/lib64:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el7-x86_64/usr/lib:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el7-x86_64/usr/lib64/dcap:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el7-x86_64/usr/lib64/lcgdm::/opt/condor/RedHat7/lib/condor
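Given how long that search path is, a quick way to confirm the library is actually reachable is to scan LD_LIBRARY_PATH the same way the dynamic loader does. A small hypothetical helper (not part of the issue's tooling):

```python
import os

def find_lib(libname, search_path):
    """Return the first directory on a colon-separated search path that
    contains libname, or None if no directory does (mirroring the dynamic
    loader's left-to-right LD_LIBRARY_PATH lookup order)."""
    for d in search_path.split(":"):
        if d and os.path.isfile(os.path.join(d, libname)):
            return d
    return None

# Example: check whether libpython3.8.so.1.0 is visible in this environment.
# find_lib("libpython3.8.so.1.0", os.environ.get("LD_LIBRARY_PATH", ""))
```

Run in the parent shell this would return the `.../python3/3.8.2-comp/lib` directory; in the environment the scramv1 child sees, it would presumably return None, pinpointing the missing path entry.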

@amaltaro
Contributor

Shahzad (thanks!) provided a fix and deployed it to cvmfs today:
cms-sw/cmsdist#7222

Test workflows (including python3) are now running fine.

Thanks for looking into this, Kenyi.

PS: I still see a low rate of failures, which seem to only happen on T2_CH_CERN_HLT worker nodes. Hopefully it's just a stale cvmfs cache on some nodes.
