ROOT and CUDADataFormats/Common/interface/Product.h #37215

Closed
silviodonato opened this issue Mar 11, 2022 · 10 comments · Fixed by #37218

Comments

@silviodonato
Contributor

While studying GPU/CPU differences, I ran into this ROOT error with the abs function:

root -l /afs/cern.ch/work/s/sdonato/public/GPU_fluctuation_study/output_HLT_GPU_CPU_GPU2.root -b -q -e "Events->Scan(\"abs\(1\)\")"

The log is here: logError.txt.

@dpiparo has traced the problem to CUDADataFormats/Common/interface/Product.h:

[sdonato@lxplus764 src]$ root -b
root [0] #include "/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre6/src/CUDADataFormats/Common/interface/Product.h"
In file included from ROOT_prompt_0:1:
In file included from /cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre6/src/CUDADataFormats/Common/interface/Product.h:6:
In file included from /cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre6/src/CUDADataFormats/Common/interface/ProductBase.h:7:
/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre6/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:7:10: remark: could not acquire lock file for module 'cuda': failed to create unique file /cvmfs/cms.cern.ch/slc7_amd64_gcc10/lcg/root/6.24.07-f52350f4e0b802edeb9a2551a7d00b92/lib/cuda.pcm.lock-fc52a3ea: Read-only file system [-Rmodule-build]
#include <cuda_runtime.h>
         ^
/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre6/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:7:10: remark: building module 'cuda' as '/cvmfs/cms.cern.ch/slc7_amd64_gcc10/lcg/root/6.24.07-f52350f4e0b802edeb9a2551a7d00b92/lib/cuda.pcm' [-Rmodule-build]
error: unable to open output file '/cvmfs/cms.cern.ch/slc7_amd64_gcc10/lcg/root/6.24.07-f52350f4e0b802edeb9a2551a7d00b92/lib/cuda.pcm': 'Read-only file system'
/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre6/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:7:10: remark: finished building module 'cuda' [-Rmodule-build]
/cvmfs/cms.cern.ch/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_0_pre6/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:7:10: fatal error: could not build module 'cuda'
#include <cuda_runtime.h>
 ~~~~~~~~^

or

[sdonato@lxplus764 src]$ root -l 
root [0] #include <cuda_runtime.h>
ROOT_prompt_0:1:10: remark: could not acquire lock file for module 'cuda': failed to create unique file /cvmfs/cms.cern.ch/slc7_amd64_gcc10/lcg/root/6.24.07-f52350f4e0b802edeb9a2551a7d00b92/lib/cuda.pcm.lock-30779e7c: Read-only file system [-Rmodule-build]
#include <cuda_runtime.h>
         ^
ROOT_prompt_0:1:10: remark: building module 'cuda' as '/cvmfs/cms.cern.ch/slc7_amd64_gcc10/lcg/root/6.24.07-f52350f4e0b802edeb9a2551a7d00b92/lib/cuda.pcm' [-Rmodule-build]
error: unable to open output file '/cvmfs/cms.cern.ch/slc7_amd64_gcc10/lcg/root/6.24.07-f52350f4e0b802edeb9a2551a7d00b92/lib/cuda.pcm': 'Read-only file system'
ROOT_prompt_0:1:10: remark: finished building module 'cuda' [-Rmodule-build]
ROOT_prompt_0:1:10: fatal error: could not build module 'cuda'
#include <cuda_runtime.h>
 ~~~~~~~~^
root [1] 
@cmsbuild
Contributor

A new Issue was created by @silviodonato Silvio Donato.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor

assign core

@cmsbuild
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

Adding @pcanal

@makortel
Contributor

CUDADataFormats/Common/interface/Product.h is not intended to be persisted. All dictionaries with it should have been declared as transient. So I'm very confused why or how it ends up in the file.

@makortel
Contributor

makortel commented Mar 11, 2022

assign heterogeneous

@cmsbuild
Contributor

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

makortel commented Mar 11, 2022

The error log has

Error in <TClass::LoadClassInfo>: no interpreter information for class edm::Wrapper<hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias> > > is available even though it has a TClass initialization routine.
Error in <TClass::LoadClassInfo>: no interpreter information for class edm::Wrapper<hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias> > > is available even though it has a TClass initialization routine.

edmDumpEventContent shows

hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias> >    "hltHbherecoFromGPU"        ""                "HLTGPU"    

which has a dictionary allowing persistency

<class name="edm::Wrapper<hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias>>>"/>

The cms::cuda::Product is nevertheless not used there. Maybe the problem is that the same classes_def.xml declares transient products using the cms::cuda::Product, e.g.

<class name="edm::Wrapper<cms::cuda::Product<hcal::RecHitCollection<calo::common::ViewStoragePolicy>>>" persistent="false" />

and a persistent product leading to load of the CUDADataFormats/HcalRecHitSoA library ends up causing header parsing that fails (or something)?

Anyway, the dictionaries declared in CUDADataFormats should all be transient, and if anything there should really be persisted, that needs to go through DataFormats without any dependence on CUDA. Is this hltHbherecoFromGPU intended to be persisted, or was it just an accident?
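
To make the distinction between the two kinds of entries concrete, here is a minimal sketch that lists which `<class>` entries in a classes_def.xml fragment are not marked `persistent="false"`. The helper name `persistent_classes` and the regular expression are assumptions made for illustration; this is not a CMSSW or ROOT tool. A regular expression is used instead of an XML parser because the class names contain raw `<` characters inside attribute values.

```python
import re

# Matches <class name="..." ...> entries. Raw '<' inside attribute values
# would trip a strict XML parser, so a simple regex is used instead.
CLASS_RE = re.compile(r'<class\s+name="([^"]+)"([^>]*)>')


def persistent_classes(xml_text):
    """Return class names whose entry is not marked persistent="false"."""
    names = []
    for name, attrs in CLASS_RE.findall(xml_text):
        if 'persistent="false"' not in attrs:
            names.append(name)
    return names


# The two entries quoted earlier in this comment.
sample = '''
<class name="edm::Wrapper<hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias>>>"/>
<class name="edm::Wrapper<cms::cuda::Product<hcal::RecHitCollection<calo::common::ViewStoragePolicy>>>" persistent="false" />
'''

for name in persistent_classes(sample):
    print(name)
```

With the two entries above, only the `VecStoragePolicy<CUDAHostAllocatorAlias>` wrapper is reported, which is the one that shows up in the file.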

For use in CMSSW, dropping the product on input along the lines of (not tested)

process.source.inputCommands = cms.untracked.vstring("keep *", "drop *_hltHbherecoFromGPU_*_*") 

might work around the problem.
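
As a rough illustration of how the pattern in that drop statement is meant to select branches, the sketch below emulates keep/drop matching with Python's `fnmatch`. It assumes branch names of the form `type_moduleLabel_instance_process` and that later commands override earlier ones; the function `kept` and the type names in the example branches are hypothetical, and this is not the actual CMSSW selection code.

```python
from fnmatch import fnmatchcase


def kept(branch, commands):
    """Apply keep/drop commands in order; the last matching command wins."""
    keep = False
    for command in commands:
        action, pattern = command.split()
        if fnmatchcase(branch, pattern):
            keep = (action == "keep")
    return keep


commands = ["keep *", "drop *_hltHbherecoFromGPU_*_*"]

# Hypothetical branch names: only the module label and the process name
# of the first one are taken from the edmDumpEventContent output above.
gpu_branch = "HypotheticalType_hltHbherecoFromGPU__HLTGPU"
other_branch = "HypotheticalType_hltOtherModule__HLTGPU"

print(gpu_branch, kept(gpu_branch, commands))
print(other_branch, kept(other_branch, commands))
```

The empty product instance name is why two underscores appear in a row before the process name; the `*` in the third position of the pattern matches the empty string as well.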

@fwyzard
Contributor

fwyzard commented Mar 11, 2022

One more thing to clean up before/during/after the migration to Alpaka :-/

@makortel
Contributor

I opened PR #37218, which makes transient all the dictionaries in CUDADataFormats that weren't already.


4 participants