ITS: Implement early return for missing cluster data #15300

Open
f3sch wants to merge 2 commits into dev from f3sch-patch-2

@f3sch (Collaborator) commented Apr 21, 2026

@ehellbar this fixes the crash you observe when running:
```
o2-ctf-reader-workflow --max-tf 10 $GLOSET --ans-version 1.0 --ctf-dict none --delay 0 --loop 0 --ctf-input o2_ctf_run00554701_orbit0029026560_tf0000000001_epn315.root --onlyDet ITS,TPC --pipeline tpc-entropy-decoder:1  --configKeyValues "keyval.input_dir=$PWD;keyval.output_dir=/dev/null;;" | \
o2-gpu-reco-workflow $GLOSET --gpu-reconstruction "--severity info" --input-type=compressed-clusters-flat --disable-mc --output-type tracks,clusters,send-clusters-per-sector  --disable-ctp-lumi-request --pipeline gpu-reconstruction:${TPC_PIPELINES:-1} --configKeyValues "keyval.input_dir=$PWD;keyval.output_dir=/dev/null;;GPU_global.deviceType=CPU;GPU_proc.debugLevel=0;GPU_proc.tpcInputWithClusterRejection=1;GPU_proc.ompThreads=-1;GPU_proc.deviceNum=-2;" | \
o2-its-reco-workflow $GLOSET --trackerCA  --tracking-mode async --disable-mc --clusters-from-upstream  --pipeline its-tracker:${ITS_PIPELINES:-1},its-clusterer:${ITS_PIPELINES:-1}  --configKeyValues "keyval.input_dir=$PWD;keyval.output_dir=/dev/null;;ITSVertexerParam.phiCut=0.5;ITSVertexerParam.phiCut=0.2;ITSVertexerParam.clusterContributorsCut=3;ITSVertexerParam.tanLambdaCut=0.2;;;;ITSClustererParam.maxBCDiffToMaskBias=-1;MFTClustererParam.maxBCDiffToMaskBias=-1" | \
o2-dpl-run $GLOSET  --run -b | tee out_ITS-pipes${ITS_PIPELINES:-1}_ccdb-fetchers${NCCDB}_TPC${TPC_PIPELINES:-1}.log
```

The underlying reason for the crash is that, since the staggering PR, the ITS tracking code requires upstream producers to provide at least the expected time structure, e.g., the ROFs. In this specific case we take the clusters directly from the CTFs, i.e., we skip re-clusterization, which is where we make sure that the ROFs are correct. If no ITS data was recorded in the TF, the ROF vector is empty and you trigger this exception:
```
[40876:its-tracker]: [13:00:24][FATAL] Received inconsistent number of rofs on layer:-1 expected:576 received:0
```
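
To make the failure mode concrete, here is a minimal sketch of the kind of consistency check that produces such a message. It is not the actual O2 code: `rofCountConsistent`, `describeRofMismatch` and `requireConsistentRofs` are hypothetical names, and only the message shape is taken from the log line above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: true when the producer delivered exactly the number
// of ROFs the tracker assumes for this TF.
bool rofCountConsistent(std::size_t expected, std::size_t received)
{
  return expected == received;
}

// Hypothetical helper: builds a message in the same shape as the FATAL above.
std::string describeRofMismatch(int layer, std::size_t expected, std::size_t received)
{
  return "Received inconsistent number of rofs on layer:" + std::to_string(layer) +
         " expected:" + std::to_string(expected) +
         " received:" + std::to_string(received);
}

// Throws when the counts disagree. An empty ROF vector (received == 0)
// always fails as long as expected is non-zero.
void requireConsistentRofs(int layer, std::size_t expected, std::size_t received)
{
  if (!rofCountConsistent(expected, received)) {
    throw std::runtime_error(describeRofMismatch(layer, expected, received));
  }
}
```

With the numbers from the log (`layer = -1`, `expected = 576`, `received = 0`), a check of this shape throws, which is why a TF without ITS data cannot reach the tracker unguarded.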

Now, if there is no data to consume, be it no clusters or no ROFs, the processing is skipped entirely. One could also have made the ctf-reader ensure 'correct' ROF output, but I think it is better this way, maybe...
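
The early return described here can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `Cluster`, `ROFRecord`, `TFAction` and `decideTimeFrame` are hypothetical stand-ins, and the real change lives in the ITS tracker device.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the tracker inputs; the real O2 types differ.
struct Cluster {
  float x, y, z;
};
struct ROFRecord {
  std::size_t firstEntry;
  std::size_t nEntries;
};

enum class TFAction { Skip,
                      Process };

// Decide up front whether the TF carries anything to track. If either the
// clusters or the ROFs are missing, skip the TF before any ROF consistency
// check can run.
TFAction decideTimeFrame(const std::vector<Cluster>& clusters,
                         const std::vector<ROFRecord>& rofs)
{
  if (clusters.empty() || rofs.empty()) {
    return TFAction::Skip;
  }
  return TFAction::Process;
}
```

The point of guarding on the consumer side is that the tracker stays robust regardless of which upstream producer (ctf-reader or clusterer) delivered the inputs.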

@f3sch changed the title from "Implement early return for missing cluster data" to "ITS: Implement early return for missing cluster data" on Apr 21, 2026

@alibuild (Collaborator) commented:

Error while checking build/O2/fullCI_slc9 for 092403a at 2026-04-21 19:39:

++ [[ /sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18 != '' ]]
+++ /sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/bin/geant4-config --datasets
+++ sed 's/[^ ]* //'
+++ sed 's/G4/export G4/'
+++ sed 's/DATA /DATA=/'
++ export G4NEUTRONHPDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4NDL4.7 export G4LEDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4EMLOW8.5 export G4LEVELGAMMADATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/PhotonEvaporation5.7 export G4RADIOACTIVEDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/RadioactiveDecay5.6 export G4PARTICLEXSDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4PARTICLEXS4.0 export G4PIIDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4PII1.3 export G4REALSURFACEDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/RealSurface2.2 export G4SAIDXSDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4SAIDDATA2.0 export G4ABLADATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4ABLA3.3 export G4INCLDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4INCL1.2 export G4ENSDFSTATEDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4ENSDFSTATE2.3
++ G4NEUTRONHPDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4NDL4.7
++ G4LEDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4EMLOW8.5
++ G4LEVELGAMMADATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/PhotonEvaporation5.7
++ G4RADIOACTIVEDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/RadioactiveDecay5.6
++ G4PARTICLEXSDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4PARTICLEXS4.0
++ G4PIIDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4PII1.3
++ G4REALSURFACEDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/RealSurface2.2
++ G4SAIDXSDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4SAIDDATA2.0
++ G4ABLADATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4ABLA3.3
++ G4INCLDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4INCL1.2
++ G4ENSDFSTATEDATA=/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/share/Geant4/data/G4ENSDFSTATE2.3
++ rm -Rf /sw/BUILD/c929fbf0cb4fa09d00091dc8e0311273b21d3fbb/O2-RTC-test/rtc-test
++ mkdir /sw/BUILD/c929fbf0cb4fa09d00091dc8e0311273b21d3fbb/O2-RTC-test/rtc-test
++ pushd /sw/BUILD/c929fbf0cb4fa09d00091dc8e0311273b21d3fbb/O2-RTC-test/rtc-test
/sw/BUILD/c929fbf0cb4fa09d00091dc8e0311273b21d3fbb/O2-RTC-test/rtc-test /sw/BUILD/c929fbf0cb4fa09d00091dc8e0311273b21d3fbb/O2-RTC-test
++ type /sw/slc9_x86-64/O2/slc9_x86-64-slc9_x86-64-local14/lib/libO2GPUTrackingCUDA.so
/sw/slc9_x86-64/O2/slc9_x86-64-slc9_x86-64-local14/lib/libO2GPUTrackingCUDA.so is /sw/slc9_x86-64/O2/slc9_x86-64-slc9_x86-64-local14/lib/libO2GPUTrackingCUDA.so
++ type /sw/slc9_x86-64/O2/slc9_x86-64-slc9_x86-64-local14/lib/libO2GPUTrackingHIP.so
/sw/slc9_x86-64/O2/slc9_x86-64-slc9_x86-64-local14/lib/libO2GPUTrackingHIP.so is /sw/slc9_x86-64/O2/slc9_x86-64-slc9_x86-64-local14/lib/libO2GPUTrackingHIP.so
++ type /sw/slc9_x86-64/O2/slc9_x86-64-slc9_x86-64-local14/lib/libO2GPUTrackingOCL.so
/sw/slc9_x86-64/O2/slc9_x86-64-slc9_x86-64-local14/lib/libO2GPUTrackingOCL.so is /sw/slc9_x86-64/O2/slc9_x86-64-slc9_x86-64-local14/lib/libO2GPUTrackingOCL.so
+++ find /usr/local/cuda /usr/local/cuda-13 /usr/local/cuda-13.0 /usr/local/cuda-13.1 -type d -name stubs -prune -false -o '(' -type f -o -type l ')' -name libcuda.so -printf :%h -quit
++ LD_LIBRARY_PATH=/sw/slc9_x86-64/O2/slc9_x86-64-slc9_x86-64-local14/lib:/sw/slc9_x86-64/O2-customization/v1.0.0-7/lib:/sw/slc9_x86-64/googlebenchmark/1.9.5-2/lib:/sw/slc9_x86-64/GBL/V03-01-04-10/lib:/sw/slc9_x86-64/bookkeeping-api/v1.9.2-19/lib:/sw/slc9_x86-64/grpc/v1.71.0-20/lib:/sw/slc9_x86-64/c-ares/1.18.1-31/lib:/sw/slc9_x86-64/MLModels/20220530-13/lib:/sw/slc9_x86-64/ONNXRuntime/v1.22.0-81/lib:/sw/slc9_x86-64/pytorch_cpuinfo/alice1-17/lib:/sw/slc9_x86-64/safe_int/v3.0.28a-1/lib:/sw/slc9_x86-64/date/v3.0.3-16/lib:/sw/slc9_x86-64/gpu-system/cuda_13.1.115_arch@80_real#86_real#89_real#120_real#75_virtual@_home_F52XG4RPNRXWGYLMF5RXKZDBBI000000-rocm_6.3.42134_arch@gfx906#gfx908@_home_F5XXA5BPOJXWG3IK-opencl-miopen-migraphx-cudnn-tensorrt-2/lib:/sw/slc9_x86-64/onnx/v1.17.0-alice2-29/lib:/sw/slc9_x86-64/Eigen3/3.4.0-onnx1-16/lib:/sw/slc9_x86-64/VecGeom/v1.2.6-37/lib:/sw/slc9_x86-64/libjalienO2/0.2.3-9/lib:/sw/slc9_x86-64/fastjet/v3.4.1_1.052-alice3-18/lib:/sw/slc9_x86-64/cgal/6.1.1-6/lib:/sw/slc9_x86-64/MPFR/v3.1.3-25/lib:/sw/slc9_x86-64/GMP/v6.2.1-17/lib:/sw/slc9_x86-64/JAliEn-ROOT/0.7.17-1/lib:/sw/slc9_x86-64/Alice-GRID-Utils/0.0.7-4/lib:/sw/slc9_x86-64/json-c/v0.18.0-5/lib:/sw/slc9_x86-64/libwebsockets/v4.3.4-1/lib:/sw/slc9_x86-64/xjalienfs/1.7.0-14/lib:/sw/slc9_x86-64/DebugGUI/v0.8.0-41/lib:/sw/slc9_x86-64/libuv/v1.52.0-5/lib:/sw/slc9_x86-64/GLFW/3.4-5/lib:/sw/slc9_x86-64/MCStepLogger/v0.6.2-1/lib:/sw/slc9_x86-64/FairMQ/v1.10.1-13/lib:/sw/slc9_x86-64/FairCMakeModules/v1.0.0-36/lib:/sw/slc9_x86-64/ZeroMQ/v4.3.5-36/lib:/sw/slc9_x86-64/ms_gsl/4.2.1-12/lib:/sw/slc9_x86-64/Monitoring/v3.19.12-1/lib:/sw/slc9_x86-64/Configuration/v2.8.0-66/lib:/sw/slc9_x86-64/Ppconsul/v0.2.3-alice3-15/lib:/sw/slc9_x86-64/Common-O2/v1.6.4-13/lib:/sw/slc9_x86-64/libInfoLogger/v2.10.1-5/lib:/sw/slc9_x86-64/HepMC3/3.3.1-23/lib:/sw/slc9_x86-64/FairRoot/v18.4.9-alice3-150/lib:/sw/slc9_x86-64/FairLogger/v2.3.1-12/lib:/sw/slc9_x86-64/fmt/11.1.2-21/lib:/sw/slc9_x86-64/simulation/v1.0-133/lib:/sw
/slc9_x86-64/GEANT3/v4-5-21/lib:/sw/slc9_x86-64/GEANT3/v4-5-21/lib64:/sw/slc9_x86-64/GEANT4_VMC/v6-6-update1-p3-42/lib:/sw/slc9_x86-64/vgm/v5-3-106/lib:/sw/slc9_x86-64/GEANT4/v11.2.0-alice1-18/lib:/sw/slc9_x86-64/xercesc/Xerces-C_3_2_5-27/lib:/sw/slc9_x86-64/VMC/v2-1-23/lib:/sw/slc9_x86-64/ROOT/v6-36-04-alice9-25/lib:/sw/slc9_x86-64/nlohmann_json/v3.11.3-19/lib:/sw/slc9_x86-64/Vc/1.4.5-19/lib:/sw/slc9_x86-64/FFTW3/v3.3.9-38/lib:/sw/slc9_x86-64/TBB/v2022.3.0-11/lib:/sw/slc9_x86-64/XRootD/v5.8.4-17/lib:/sw/slc9_x86-64/GSL/v2.8-8/lib:/sw/slc9_x86-64/generators/v1.0-70/lib:/sw/slc9_x86-64/pythia6/428-alice4-11/lib:/sw/slc9_x86-64/ninja-fortran/fortran-v1.11.1.g9-10/lib:/sw/slc9_x86-64/pythia/v8315-alice1-21/lib:/sw/slc9_x86-64/HepMC/HEPMC_02_06_10-28/lib:/sw/slc9_x86-64/lhapdf/v6.5.2-45/lib:/sw/slc9_x86-64/arrow/v20.0.0-alice1-33/lib:/sw/slc9_x86-64/re2/2024-07-02-17/lib:/sw/slc9_x86-64/double-conversion/v3.4.0-5/lib:/sw/slc9_x86-64/RapidJSON/v1.1.0-alice2-37/lib:/sw/slc9_x86-64/flatbuffers/v24.3.25-24/lib:/sw/slc9_x86-64/xsimd/14.0.0-14/lib:/sw/slc9_x86-64/utf8proc/v2.11.2-5/lib:/sw/slc9_x86-64/protobuf/v29.3-20/lib:/sw/slc9_x86-64/Clang/v20.1.7-20/lib:/sw/slc9_x86-64/lz4/v1.10.0-5/lib:/sw/slc9_x86-64/boost/v1.90.0-alice1-2/lib:/sw/slc9_x86-64/bz2/1.0.8-23/lib:/sw/slc9_x86-64/lzma/v5.2.3-14/lib:/sw/slc9_x86-64/Python-modules/1.0-76/lib:/sw/slc9_x86-64/Python-modules-list/1.0-40/lib:/sw/slc9_x86-64/hdf5/1.14.6-15/lib:/sw/slc9_x86-64/Python/v3.10.19-13/lib:/sw/slc9_x86-64/libffi/v3.2.1-alice1-10/lib:/sw/slc9_x86-64/libffi/v3.2.1-alice1-10/lib64:/sw/slc9_x86-64/sqlite/v3.47.2-10/lib:/sw/slc9_x86-64/libpng/v1.6.47-16/lib:/sw/slc9_x86-64/FreeType/v2.10.1-25/lib:/sw/slc9_x86-64/AliEn-Runtime/v2-19-le-25/lib:/sw/slc9_x86-64/UUID/v2.27.1-15/lib:/sw/slc9_x86-64/AliEn-CAs/v1-14/lib:/sw/slc9_x86-64/libxml2/v2.9.3-23/lib:/sw/slc9_x86-64/abseil/20240722.0-17/lib:/sw/slc9_x86-64/ninja/fortran-v1.11.1.g9-25/lib:/sw/slc9_x86-64/CMake/v4.1.4-2/lib:/sw/slc9_x86-64/curl/7.70.0-24/lib:/sw
/slc9_x86-64/OpenSSL/v1.1.1m-15/lib:/sw/slc9_x86-64/zlib/v1.3.1-6/lib:/sw/slc9_x86-64/alibuild-recipe-tools/v0.3.0-1/lib:/sw/slc9_x86-64/GCC-Toolchain/v14.2.0-alice2-1/lib:/sw/slc9_x86-64/GCC-Toolchain/v14.2.0-alice2-1/lib64:/sw/slc9_x86-64/defaults-release/v1-8/lib:/usr/local/cuda-13.0/compat
++ o2-gpu-standalone-benchmark --noEvents -g --gpuType CUDA --RTCenable 1 --RTCcacheOutput 0 --RTCoptConstexpr 1 --RTCcompilePerKernel 1 --RTCTECHrunTest 2
GPU processing enabled
[INFO] GPU Tracker library loaded and GPU tracker object created sucessfully
[INFO] Created GPUReconstruction instance for device type CUDA (2)
Using default event settings, no event dir loaded (solenoidBz: -5.006680, constBz 0, maxTimeBin -2)
Standalone Test Framework for CA Tracker - Using GPU
[INFO] Starting CUDA RTC Compilation
[INFO] RTC Compilation finished (359.985958 seconds)
++ o2-gpu-standalone-benchmark --noEvents -g --gpuType HIP --RTCenable 1 --RTCcacheOutput 0 --RTCoptConstexpr 1 --RTCcompilePerKernel 1 --RTCTECHrunTest 2
GPU processing enabled
[INFO] GPU Tracker library loaded and GPU tracker object created sucessfully
[INFO] Created GPUReconstruction instance for device type HIP (3)
Using default event settings, no event dir loaded (solenoidBz: -5.006680, constBz 0, maxTimeBin -2)
Standalone Test Framework for CA Tracker - Using GPU
[INFO] Starting HIP RTC Compilation

Full log here.
