
[EMCAL-630] Force usage of references in range-based iteration over e… #7265

Merged
shahor02 merged 2 commits into AliceO2Group:dev from mfasDa:EMCAL-630
Oct 13, 2021

@mfasDa
Collaborator

@mfasDa mfasDa commented Oct 11, 2021

…vents / cells

Explicitly force references, preventing the implicit use of

  • move semantics
  • copies

in range-based loops over events and cells.
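
To illustrate what the change does at the language level, here is a minimal sketch that counts copies and moves in a range-based loop. `Tracked`, `makeCells`, `iterateByValue` and `iterateByReference` are hypothetical names for this example only and are not taken from the EMCAL code. As far as the language rules go, a by-value loop variable is copy-constructed from each element, while a reference binds to the element in place.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts how often it is copied or moved.
struct Tracked {
  static int copies;
  static int moves;
  int value;
  explicit Tracked(int v) : value(v) {}
  Tracked(const Tracked& other) : value(other.value) { ++copies; }
  Tracked(Tracked&& other) noexcept : value(other.value) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

struct CopyStats {
  int copies;
  int moves;
};

// Build n elements in place; reserve() avoids reallocations.
std::vector<Tracked> makeCells(int n)
{
  assert(n >= 0);
  std::vector<Tracked> cells;
  cells.reserve(n);
  for (int i = 0; i < n; ++i) {
    cells.emplace_back(i);
  }
  return cells;
}

// By value: each element is copy-constructed into the loop variable.
CopyStats iterateByValue(const std::vector<Tracked>& cells)
{
  Tracked::copies = 0;
  Tracked::moves = 0;
  int sum = 0;
  for (auto cell : cells) {
    sum += cell.value;
  }
  (void)sum;
  return {Tracked::copies, Tracked::moves};
}

// By reference: the loop variable aliases the element, nothing is constructed.
CopyStats iterateByReference(const std::vector<Tracked>& cells)
{
  Tracked::copies = 0;
  Tracked::moves = 0;
  int sum = 0;
  for (const auto& cell : cells) {
    sum += cell.value;
  }
  (void)sum;
  return {Tracked::copies, Tracked::moves};
}
```

Calling `iterateByValue(makeCells(3))` should report one copy per element and no moves, whereas `iterateByReference` should report neither.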

@mfasDa
Collaborator Author

mfasDa commented Oct 11, 2021

As discussed in #7217, in most cases cells have information from both digitizers; the exception is when the low gain digitizer is below its lower limit. We therefore need to handle the overlap in the corresponding range by selecting the more appropriate digitizer (and raise an error if either of the two is missing in a range where it is expected).
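
The selection described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Reading`, `Selected` and `selectGain`, the two limit parameters, the rule "prefer high gain while it is below its upper limit", and the idea that both amplitudes are expressed in the same units are all hypothetical and not taken from the converter.

```cpp
#include <cassert>
#include <stdexcept>

enum class Gain { High, Low };

// Hypothetical per-tower reading of one gain channel.
struct Reading {
  bool present;
  double amplitude;
};

struct Selected {
  Gain gain;
  double amplitude;
};

// lgLowerLimit: below this amplitude no low gain reading is expected (assumption).
// hgUpperLimit: at or above this amplitude high gain is no longer trusted (assumption).
Selected selectGain(const Reading& hg, const Reading& lg,
                    double lgLowerLimit, double hgUpperLimit)
{
  assert(lgLowerLimit <= hgUpperLimit);
  if (hg.present && lg.present) {
    // Overlap range: both channels exist, pick the more appropriate one.
    if (hg.amplitude < hgUpperLimit) {
      return {Gain::High, hg.amplitude};
    }
    return {Gain::Low, lg.amplitude};
  }
  if (hg.present) {
    // High gain alone is fine only where low gain is not expected.
    if (hg.amplitude < lgLowerLimit) {
      return {Gain::High, hg.amplitude};
    }
    throw std::runtime_error("low gain missing in a range where it is expected");
  }
  if (lg.present) {
    // Low gain alone is accepted only above the high gain upper limit.
    if (lg.amplitude >= hgUpperLimit) {
      return {Gain::Low, lg.amplitude};
    }
    throw std::runtime_error("high gain missing in a range where it is expected");
  }
  throw std::invalid_argument("no channel present for this tower");
}
```

Passing the limits as arguments keeps the sketch free of detector-specific numbers; the real thresholds would have to come from the EMCAL code or calibration.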

Over the past weekend, crashes of the raw-to-cell converter have been observed. The crashes happen rarely, but often enough that after a certain time all EPNs are killed. Unfortunately there are no debug symbols on the EPN, so the stack trace is hard to read. The corresponding lines are

munmap_chunk(): invalid pointer
*** Aborted

Backtrace:
/lib64/libc.so.6(gsignal+0x10f)[0x7fd4331b737f]
/lib64/libc.so.6(abort+0x127)[0x7fd4331a1db5]
/lib64/libc.so.6(+0x7a4e7)[0x7fd4331fa4e7]
/lib64/libc.so.6(+0x815ec)[0x7fd4332015ec]
/lib64/libc.so.6(+0x8189c)[0x7fd43320189c]
/opt/alisw/el8/O2/dataflow-20211009-1/lib/libO2EMCALWorkflow.so(_ZNSt8_Rb_treeIN2o217InteractionRecordESt4pairIKS1_St10shared_ptrISt6vectorINS0_5emcal13reco_workflow22RawToCellConverterSpec11RecCellInfoESaIS9_EEEESt10_Select1stISD_ESt4lessIS1_ESaISD_EE8_M_eraseEPSt13_Rb_tree_nodeISD_E+0x2a3)[0x7fd43b8479e3]
/opt/alisw/el8/O2/dataflow-20211009-1/lib/libO2EMCALWorkflow.so(_ZN2o25emcal13reco_workflow22RawToCellConverterSpec3runERNS_9framework17ProcessingContextE+0x29e5)[0x7fd43b846095]
/opt/alisw/el8/O2/dataflow-20211009-1/lib/libO2Framework.so(_ZN2o29framework20DataProcessingDevice22tryDispatchComputationERNS0_20DataProcessorContextERSt6vectorINS0_11DataRelayer12RecordActionESaIS6_EE+0x746)[0x7fd43ad6a406]
/opt/alisw/el8/O2/dataflow-20211009-1/lib/libO2Framework.so(_ZN2o29framework20DataProcessingDevice5doRunERNS0_20DataProcessorContextE+0x6b)[0x7fd43ad6ab9b]
/opt/alisw/el8/O2/dataflow-20211009-1/lib/libO2Framework.so(_ZN2o29framework20DataProcessingDevice3RunEv+0x28e)[0x7fd43ad6ca4e]
/opt/alisw/el8/FairMQ/v1.4.42-8/lib/libFairMQ.so.1.4(_ZN4fair2mq6Device10RunWrapperEv+0x372)[0x7fd4343030e2]
/opt/alisw/el8/FairMQ/v1.4.42-8/lib/libFairMQStateMachine.so.1.4(_ZN5boost6detail8function26void_function_obj_invoker1ISt8functionIFvN4fair2mq5StateEEEvS6_E6invokeERNS1_15function_bufferES6_+0x1d)[0x7fd433b8909d]
/opt/alisw/el8/FairMQ/v1.4.42-8/lib/libFairMQStateMachine.so.1.4(_ZN5boost8signals26detail11signal_implIFvN4fair2mq5StateEENS0_19optional_last_valueIvEEiSt4lessIiENS_8functionIS6_EENSB_IFvRKNS0_10connectionES5_EEENS0_5mutexEEclES5_+0x309)[0x7fd433b91f89]
/opt/alisw/el8/FairMQ/v1.4.42-8/lib/libFairMQStateMachine.so.1.4(_ZN4fair2mq3fsm8Machine_11ProcessWorkEv+0x4c0)[0x7fd433b92650]
/opt/alisw/el8/FairMQ/v1.4.42-8/lib/libFairMQStateMachine.so.1.4(_ZN4fair2mq12StateMachine11ProcessWorkEv+0x49)[0x7fd433b87969]
/opt/alisw/el8/FairMQ/v1.4.42-8/lib/libFairMQ.so.1.4(_ZN4fair2mq12DeviceRunner3RunEv+0xa97)[0x7fd4343136f7]
/opt/alisw/el8/O2/dataflow-20211009-1/lib/libO2Framework.so(_Z7doChildiPPcRN2o29framework15ServiceRegistryERKNS2_19RunningWorkflowInfoENS2_16RunningDeviceRefENS2_18ProcessingPoliciesERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEP9uv_loop_s+0x231)[0x7fd43ae9a9a1]
/opt/alisw/el8/O2/dataflow-20211009-1/lib/libO2Framework.so(_Z15runStateMachineRKSt6vectorIN2o29framework17DataProcessorSpecESaIS2_EERK12WorkflowInfoRKS_INS1_17DataProcessorInfoESaISA_EERKNS1_11CommandInfoERNS1_13DriverControlERNS1_10DriverInfoERS_INS1_17DeviceMetricsInfoESaISM_EERN5boost15program_options13variables_mapENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0xc0c)[0x7fd43ae9b9fc]
/opt/alisw/el8/O2/dataflow-20211009-1/lib/libO2Framework.so(_Z6doMainiPPcRKSt6vectorIN2o29framework17DataProcessorSpecESaIS4_EERKS1_INS3_26ChannelConfigurationPolicyESaIS9_EERKS1_INS3_16CompletionPolicyESaISE_EERKS1_INS3_14DispatchPolicyESaISJ_EERKS1_INS3_14ResourcePolicyESaISO_EERKS1_INS3_15ConfigParamSpecESaIST_EERNS3_13ConfigContextE+0x3102)[0x7fd43aea53c2]
o2-emcal-reco-workflow[0x40e671]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7fd4331a3493]
o2-emcal-reco-workflow[0x40a67e]
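
Even without debug symbols, the mangled names in the frames above can be made readable. The following is a minimal sketch using `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang); the helper name `demangle` is only illustrative. Applied to the `libO2EMCALWorkflow.so` frame, it should yield a `std::_Rb_tree<...>::_M_erase` instantiation keyed by `o2::InteractionRecord` and holding a `std::shared_ptr` to a `std::vector` of `RecCellInfo`, i.e. the node cleanup of a `std::map`.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Return the demangled form of an Itanium-ABI symbol, or the input if it
// cannot be demangled.
std::string demangle(const std::string& mangled)
{
  int status = 0;
  // __cxa_demangle allocates the result with malloc, so release it with free.
  std::unique_ptr<char, void (*)(void*)> name(
    abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status), std::free);
  // A non-zero status always comes with a null result.
  assert(status == 0 || name == nullptr);
  return name ? std::string(name.get()) : mangled;
}
```

On the command line, piping the backtrace through `c++filt` gives the same result.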

To my understanding this points to the destructors of the std::shared_ptr and the std::vector. The only way I could imagine this happening is through move semantics applied in a range-based for loop, although that should already show up with the first vector that gets deleted. So by forcing references I suppress the move semantics. Otherwise I am pretty clueless, because this whole part does not touch (raw) pointers. It is tricky to figure out from the logs, and I cannot reproduce the crash offline.
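
One way to check this hypothesis in isolation is a small sketch with a container of the same shape as the one in the backtrace. `EventMap`, `CellBuffer` and the helper functions are hypothetical stand-ins, with `int` replacing both `o2::InteractionRecord` and `RecCellInfo`. As far as the language rules go, a by-value loop variable copies the map entry, and with it the `std::shared_ptr`, rather than moving out of the map, so the stored pointers should stay valid either way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the types seen in the backtrace.
using CellBuffer = std::shared_ptr<std::vector<int>>;
using EventMap = std::map<int, CellBuffer>;

EventMap makeEvents(int nEvents, std::size_t nCells)
{
  assert(nEvents >= 0);
  EventMap events;
  for (int i = 0; i < nEvents; ++i) {
    events[i] = std::make_shared<std::vector<int>>(nCells, 0);
  }
  return events;
}

// By value: the pair is copied, so each shared_ptr temporarily has two owners.
long maxUseCountByValue(const EventMap& events)
{
  long maxCount = 0;
  for (auto entry : events) {
    maxCount = std::max(maxCount, entry.second.use_count());
  }
  return maxCount;
}

// By reference: no new owner is created while iterating.
long maxUseCountByReference(const EventMap& events)
{
  long maxCount = 0;
  for (const auto& entry : events) {
    maxCount = std::max(maxCount, entry.second.use_count());
  }
  return maxCount;
}

// After iterating, every buffer should still be owned by the map and intact.
bool allBuffersIntact(const EventMap& events, std::size_t nCells)
{
  for (const auto& entry : events) {
    if (!entry.second || entry.second->size() != nCells) {
      return false;
    }
  }
  return true;
}
```

In this single-threaded sketch the by-value loop should observe a use count of 2 and the by-reference loop a use count of 1, with all buffers intact afterwards. That would suggest the reference version mainly saves the extra ownership bookkeeping rather than preventing a move.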

@shahor02
Collaborator

@mfasDa if you tell me the working dir of this crashed job, I can try to extract its corefile (though using them on the EPN is not straightforward).

@mfasDa
Collaborator Author

mfasDa commented Oct 11, 2021

Hi @shahor02 ,

The crash happened in run 503757. These are the EPNs and working dirs:

epn207: /tmp/wn_dds/bd684d13-4ec8-4f4a-908b-7e951b891271/wn_online_epn207-ib.internal
epn208: /tmp/wn_dds/bd684d13-4ec8-4f4a-908b-7e951b891271/wn_online_epn208-ib.internal
epn212: /tmp/wn_dds/bd684d13-4ec8-4f4a-908b-7e951b891271/wn_online_epn212-ib.internal
epn213: /tmp/wn_dds/bd684d13-4ec8-4f4a-908b-7e951b891271/wn_online_epn213-ib.internal
epn214: /tmp/wn_dds/bd684d13-4ec8-4f4a-908b-7e951b891271/wn_online_epn214-ib.internal
epn215: /tmp/wn_dds/bd684d13-4ec8-4f4a-908b-7e951b891271/wn_online_epn215-ib.internal
epn216: /tmp/wn_dds/bd684d13-4ec8-4f4a-908b-7e951b891271/wn_online_epn216-ib.internal
epn217: /tmp/wn_dds/bd684d13-4ec8-4f4a-908b-7e951b891271/wn_online_epn217-ib.internal
epn218: /tmp/wn_dds/bd684d13-4ec8-4f4a-908b-7e951b891271/wn_online_epn218-ib.internal

I also tried to find the core dumps but didn't find any in the directory.

Thanks a lot for your help!

…vents / cells

Explicitly force references preventing implicit use
of
- move semantics
- copy
in range-based loop over events and cells.

In addition remove std::endl in a few LOG(DEBUG)/LOG(ERROR)
- Fix FEC ID in error message
- Use DDL instead of SM in error message
- Select also time of the raw fit from the selected
  channel
- Reset flags in all cases we find a second channel
  for the same tower
@mfasDa
Collaborator Author

mfasDa commented Oct 13, 2021

Unit test error unrelated.

@shahor02 shahor02 merged commit fb1ff43 into AliceO2Group:dev Oct 13, 2021
@shahor02
Collaborator

I had to squash it as rebase was producing conflicts.

@mfasDa
Collaborator Author

mfasDa commented Oct 13, 2021

Thanks a lot! Squash is fine

@mfasDa mfasDa deleted the EMCAL-630 branch October 13, 2021 09:55
jokonig pushed a commit to jokonig/AliceO2 that referenced this pull request Dec 16, 2021
[EMCAL-630] Force usage of references in range-based iteration over e…
shahor02 pushed a commit that referenced this pull request Jan 11, 2022
…on (#7886)

* Merge pull request #7265 from mfasDa/EMCAL-630

[EMCAL-630] Force usage of references in range-based iteration over e…

* [EMCAL-566] Add executable for performing time and bad cell calibration

- executable to run calibration, both time and bad channel, in offline
mode with local root histograms. Goal is to validate and debug the
calibration using run2 data.

- modified root to boost histogram conversion function: Specified the
return type, corrected bin numbers for boost histograms

- Added OpenMP option to cmake list. Goal is to parallelize the
time-calibration as many independent fits are performed

- Added implementation of time calibration in EMCALCalibExtractor

Co-authored-by: Markus Fasel <markus.fasel@cern.ch>
Co-authored-by: Joshua Koenig <joshua.konig@cern.ch>
jokonig pushed a commit to jokonig/AliceO2 that referenced this pull request Jan 26, 2022
[EMCAL-630] Force usage of references in range-based iteration over e…
jokonig pushed a commit to jokonig/AliceO2 that referenced this pull request Mar 18, 2022
[EMCAL-630] Force usage of references in range-based iteration over e…
shahor02 pushed a commit that referenced this pull request Mar 24, 2022
* Merge pull request #7265 from mfasDa/EMCAL-630

[EMCAL-630] Force usage of references in range-based iteration over e…

* [EMCAL-566]: Updated EMCal time calibration

- implemented arguments to optimize the time calibration: time window,
minimum number of entries needed for calibration
- Added option to store time distributions and calibration coefficients
in local root file
- Fixed initialization of EMCALCalibExtractor
- added CalibExtractor as shared pointer in CalibratorSpec

Co-authored-by: Markus Fasel <markus.fasel@cern.ch>
jokonig pushed a commit to jokonig/AliceO2 that referenced this pull request Apr 19, 2022
[EMCAL-630] Force usage of references in range-based iteration over e…
jokonig pushed a commit to jokonig/AliceO2 that referenced this pull request May 12, 2022
[EMCAL-630] Force usage of references in range-based iteration over e…
jokonig pushed a commit to jokonig/AliceO2 that referenced this pull request Jun 1, 2022
[EMCAL-630] Force usage of references in range-based iteration over e…
jokonig pushed a commit to jokonig/AliceO2 that referenced this pull request Jun 16, 2022
[EMCAL-630] Force usage of references in range-based iteration over e…
shahor02 pushed a commit that referenced this pull request Jun 17, 2022
* Merge pull request #7265 from mfasDa/EMCAL-630

[EMCAL-630] Force usage of references in range-based iteration over e…

* [EMCAL-566] Added option to profile calibration

- Added switch to obtain the time spent in the run function of the time
and bad channel calib. Needed because very high CPU consumption was
observed in online tests at point2 which cannot be reproduced
offline/locally. If option enabled (via EMCALCalibParams), time in ns is
printed out. Only for debug purposes
- Added options to set slot length and restriction to trigger
calibration only at end of run

Co-authored-by: Markus Fasel <markus.fasel@cern.ch>
martenole pushed a commit to martenole/AliceO2 that referenced this pull request Jun 28, 2022
* Merge pull request AliceO2Group#7265 from mfasDa/EMCAL-630

[EMCAL-630] Force usage of references in range-based iteration over e…

* [EMCAL-566] Added option to profile calibration

- Added switch to obtain the time spent in the run function of the time
and bad channel calib. Needed because very high CPU consumption was
observed in online tests at point2 which cannot be reproduced
offline/locally. If option enabled (via EMCALCalibParams), time in ns is
printed out. Only for debug purposes
- Added options to set slot length and restriction to trigger
calibration only at end of run

Co-authored-by: Markus Fasel <markus.fasel@cern.ch>
jokonig pushed a commit to jokonig/AliceO2 that referenced this pull request Jul 22, 2022
[EMCAL-630] Force usage of references in range-based iteration over e…
jokonig pushed a commit to jokonig/AliceO2 that referenced this pull request Aug 1, 2022
[EMCAL-630] Force usage of references in range-based iteration over e…
fjonasALICE pushed a commit to fjonasALICE/AliceO2 that referenced this pull request Oct 10, 2022
[EMCAL-630] Force usage of references in range-based iteration over e…
fjonasALICE pushed a commit to fjonasALICE/AliceO2 that referenced this pull request Oct 13, 2022
[EMCAL-630] Force usage of references in range-based iteration over e…
mfasDa added a commit to mfasDa/AliceO2 that referenced this pull request Nov 22, 2022
[EMCAL-630] Force usage of references in range-based iteration over e…
mhemmer-cern pushed a commit to mhemmer-cern/AliceO2 that referenced this pull request Mar 10, 2023
[EMCAL-630] Force usage of references in range-based iteration over e…
mfasDa added a commit to mfasDa/AliceO2 that referenced this pull request Mar 22, 2023
[EMCAL-630] Force usage of references in range-based iteration over e…
mhemmer-cern pushed a commit to mhemmer-cern/AliceO2 that referenced this pull request Apr 3, 2023
[EMCAL-630] Force usage of references in range-based iteration over e…
mhemmer-cern pushed a commit to mhemmer-cern/AliceO2 that referenced this pull request Apr 5, 2023
[EMCAL-630] Force usage of references in range-based iteration over e…
mhemmer-cern pushed a commit to mhemmer-cern/AliceO2 that referenced this pull request Apr 21, 2023
[EMCAL-630] Force usage of references in range-based iteration over e…
atriolo pushed a commit to atriolo/AliceO2 that referenced this pull request Oct 17, 2024
[EMCAL-630] Force usage of references in range-based iteration over e…
fchinu pushed a commit to fchinu/AliceO2 that referenced this pull request Sep 19, 2025
[EMCAL-630] Force usage of references in range-based iteration over e…