Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open issues regarding the ECAL local reconstruction on GPU #32480

Open
2 of 3 tasks
fwyzard opened this issue Dec 14, 2020 · 34 comments
Open
2 of 3 tasks

Open issues regarding the ECAL local reconstruction on GPU #32480

fwyzard opened this issue Dec 14, 2020 · 34 comments

Comments

@fwyzard
Copy link
Contributor

fwyzard commented Dec 14, 2020

Open issues regarding the ECAL local reconstruction on GPU

  • Update the ECAL ESProducts and move them to a more correct place
  • Migrate GPU code to use common constants and functions
  • Add references for magic numbers
@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2020

Update the ECAL ESProducts and move them to a more correct place

Make the equivalent changes regarding to the conditions that were done for HCAL in #32039.

Move the conditions types used on the GPU from

  • EventFilter/EcalRawToDigi/interface/ElectronicsMappingGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalGainRatiosGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalIntercalibConstantsGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalLaserAPDPNRatiosGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalLaserAPDPNRatiosRefGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalLaserAlphasGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalLinearCorrectionsGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalPedestalsGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalPulseCovariancesGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalPulseShapesGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalRecHitParametersGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalRechitADCToGeVConstantGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalRechitChannelStatusGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalSamplesCorrelationGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalTimeBiasCorrectionsGPU.h
  • RecoLocalCalo/EcalRecAlgos/interface/EcalTimeCalibConstantsGPU.h

and the corresponding .cc files to CondFormats/EcalObjects/

Update them to use

  • edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>> instead of float *
  • cms::cuda::make_device_unique<float[]> instead of cudaMalloc
  • cms::cuda::copyAsync instead of cudaMemcpyAsync

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2020

Migrate GPU code to use common constants and functions

The functions in RecoLocalCalo/EcalRecProducers/plugins/Common.h are only used by the GPU implementation.

They could be reused from DataFormats/EcalDigi/interface/EcalMGPASample.h.

@cmsbuild
Copy link
Contributor

A new Issue was created by @fwyzard Andrea Bocci.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2020

assign reconstruction

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2020

assign heterogeneous

@cmsbuild
Copy link
Contributor

New categories assigned: heterogeneous,reconstruction

@slava77,@perrotta,@makortel,@jpata,@fwyzard you have been requested to review this Pull request/Issue and eventually sign? Thanks

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2020

@thomreis FYI

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 25, 2020

Add references for magic numbers

RecoLocalCalo/EcalRecProducers/plugins/EcalMultifitParametersGPUESProducer.cc uses variois hard-coded numbers without a clear explanation of their origin.
They appear to be reused in the same way in

RecoLocalCalo/EcalRecProducers/plugins/EcalUncalibRecHitWorkerMultiFit.cc
RecoLocalCalo/EcalRecProducers/plugins/EcalUncalibRecHitWorkerRatio.cc
RecoLocalCalo/EcalRecProducers/plugins/EcalUncalibRecHitWorkerGlobal.cc

The original CPU code should be updated (for example, moving those numbers to a central place and adding comments to explain their source) and the GPU code should then be updated accordingly.

@jpata
Copy link
Contributor

jpata commented Jan 11, 2021

Taking note of a few more specific todos:

  • EventFilter/EcalRawToDigi/plugins/EcalRawToDigiGPU.cc: esConsumes and constants
  • RecoLocalCalo/EcalRecProducers/plugins/EcalRecHitBuilderKernels.cu: very long function, many short variable names with a long lifetime (t_i, t_f, t1, t2 etc), several unaddressed TODO comments in the code like "how the heck is this possible", "Please clean up the code, ... also the original one!" etc
  • RecoLocalCalo/EcalRecProducers/plugins/EcalRecHitProducerGPU.cc: might not have to be edm::ExternalWork, https://github.com/cms-sw/cmssw/pull/31719/files#r511164561
  • RecoLocalCalo/EcalRecProducers/plugins/KernelHelpers.cu: some hardcoded constants, comments like "why on hell things are so complex and not simple" to be removed
  • RecoLocalCalo/EcalRecProducers/plugins/TimeComputationKernels.cu: hardcoded constants
  • RecoLocalCalo/EcalRecProducers/python/ecalRecHitGPU_cfi.py: TODOs in the code with unclear status
  • CPU to GPU workflow comparisons do not produce equivalent results, Patatrack integration - ECAL local reconstruction (7/N) #31719 (comment)

@thomreis
Copy link
Contributor

thomreis commented Mar 2, 2021

Update the ECAL ESProducts and move them to a more correct place

Make the equivalent changes regarding to the conditions that were done for HCAL in #32039.

Move the conditions types used on the GPU from

* `EventFilter/EcalRawToDigi/interface/ElectronicsMappingGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalGainRatiosGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalIntercalibConstantsGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalLaserAPDPNRatiosGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalLaserAPDPNRatiosRefGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalLaserAlphasGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalLinearCorrectionsGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalPedestalsGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalPulseCovariancesGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalPulseShapesGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalRecHitParametersGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalRechitADCToGeVConstantGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalRechitChannelStatusGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalSamplesCorrelationGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalTimeBiasCorrectionsGPU.h`

* `RecoLocalCalo/EcalRecAlgos/interface/EcalTimeCalibConstantsGPU.h`

and the corresponding .cc files to CondFormats/EcalObjects/

Update them to use

* `edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>>` instead of `float *`

* `cms::cuda::make_device_unique<float[]>` instead of `cudaMalloc`

* `cms::cuda::copyAsync` instead of `cudaMemcpyAsync`

Hi @fwyzard is the change of the types and to the cms::cuda functions supposed to work for all occurrences? For some (like the ones in EcalGainRatiosGPU.h) this compiles fine but for others (e.g. in EcalMultifitParametersGPU.h) I get compilation errors like this one:

CMSSW_11_3_0_pre3/src/CondFormats/EcalObjects/src/EcalMultifitParametersGPU.cc:27:83: error: cannot convert 'cms::cuda::device::impl::make_device_unique_selector<double []>::unbounded_array' {aka 'std::unique_ptr<double [], cms::cuda::device::impl::DeviceDeleter>'} to 'double*' in assignment
   27 |         product.amplitudeFitParametersEB = cms::cuda::make_device_unique<double[]>(amplitudeFitParametersEB_.size(), cudaStream);
      |                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                                   |
      |                                                                                   cms::cuda::device::impl::make_device_unique_selector<double []>::unbounded_array {aka std::unique_ptr<double [], cms::cuda::device::impl::DeviceDeleter>}

and also this:

CMSSW_11_3_0_pre3/src/CondFormats/EcalObjects/src/EcalMultifitParametersGPU.cc:33:101: error: no matching function for call to 'copyAsync(double*&, const std::vector<double, cms::cuda::HostAllocator<double> >&, CUstream_st*&)'
   33 |         cms::cuda::copyAsync(product.amplitudeFitParametersEE, amplitudeFitParametersEE_, cudaStream);

@makortel
Copy link
Contributor

makortel commented Mar 2, 2021

Is product.amplitudeFitParametersEB perhaps double*? In that case it should be changed to cms::cuda::device::unique_ptr<double[]>.

@thomreis
Copy link
Contributor

thomreis commented Mar 2, 2021

No it is edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEB;. I have changed it from double*.

@makortel
Copy link
Contributor

makortel commented Mar 3, 2021

The second error also suggests product.amplitudeFitParametersEE to be a double *. Could you perhaps point to the code?

@fwyzard
Copy link
Contributor Author

fwyzard commented Mar 3, 2021

I think the current code is this:

  struct Product {
    ~Product();
    double *amplitudeFitParametersEB, *amplitudeFitParametersEE, *timeFitParametersEB, *timeFitParametersEE;
  };

and (just a fragment):

        // malloc
        cudaCheck(cudaMalloc((void**)&product.amplitudeFitParametersEB,
                             this->amplitudeFitParametersEB_.size() * sizeof(double)));

        // transfer
        cudaCheck(cudaMemcpyAsync(product.amplitudeFitParametersEB,
                                  this->amplitudeFitParametersEB_.data(),
                                  this->amplitudeFitParametersEB_.size() * sizeof(double),
                                  cudaMemcpyHostToDevice,
                                  cudaStream));

I think it should become something like

  struct Product {
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEB;
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEE;
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> timeFitParametersEB;
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> timeFitParametersEE;
  };

and:

        // malloc
        amplitudeFitParametersEB = cms::cuda::make_device_unique<double[]>(amplitudeFitParametersEB_.size(), stream);

        // transfer
        cms::cuda::copyAsync(product.amplitudeFitParametersEB, amplitudeFitParametersEB_, stream);

but I've been typing this in here without actually testing, so double check everything !

@thomreis
Copy link
Contributor

thomreis commented Mar 3, 2021

My branch is here: https://github.com/thomreis/cmssw/tree/ecal-local-reco-gpu-fix-issues/CondFormats/EcalObjects
I did not commit the changes yet since they do not compile.
But the changes I made look like what you have posted above (apart from the stream variable being named cudaStream)

In EcalMultifitParametersGPU.h:

class EcalMultifitParametersGPU {
public:
  struct Product {
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEB;
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEE;
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> timeFitParametersEB;
    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> timeFitParametersEE;
  };

In EcalMultifitParametersGPU.cc:

EcalMultifitParametersGPU::Product const& EcalMultifitParametersGPU::getProduct(cudaStream_t cudaStream) const {
  auto const& product = product_.dataForCurrentDeviceAsync(
      cudaStream, [this](EcalMultifitParametersGPU::Product& product, cudaStream_t cudaStream) {
        // allocate
        product.amplitudeFitParametersEB = cms::cuda::make_device_unique<double[]>(amplitudeFitParametersEB_.size(), cudaStream);
        product.amplitudeFitParametersEE = cms::cuda::make_device_unique<double[]>(amplitudeFitParametersEE_.size(), cudaStream);
        product.timeFitParametersEB = cms::cuda::make_device_unique<double[]>(timeFitParametersEB_.size(), cudaStream);
        product.timeFitParametersEE = cms::cuda::make_device_unique<double[]>(timeFitParametersEE_.size(), cudaStream);
        // transfer
        cms::cuda::copyAsync(product.amplitudeFitParametersEB, amplitudeFitParametersEB_, cudaStream);
        cms::cuda::copyAsync(product.amplitudeFitParametersEE, amplitudeFitParametersEE_, cudaStream);
        cms::cuda::copyAsync(product.timeFitParametersEB, timeFitParametersEB_, cudaStream);
        cms::cuda::copyAsync(product.timeFitParametersEE, timeFitParametersEE_, cudaStream);
      });
  return product;
}

@fwyzard
Copy link
Contributor Author

fwyzard commented Mar 3, 2021

I've tried making these changes on top of CMSSW_11_3_0_pre3:

diff --git a/RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h b/RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h
index 56aa0579ff77..a6c0b1c81aa2 100644
--- a/RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h
+++ b/RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h
@@ -4,6 +4,8 @@
 #include <array>
 
 #include "FWCore/ParameterSet/interface/ParameterSet.h"
+#include "FWCore/Utilities/interface/propagate_const_array.h"
+#include "HeterogeneousCore/CUDAUtilities/interface/device_unique_ptr.h"
 
 #ifndef __CUDACC__
 #include "HeterogeneousCore/CUDAUtilities/interface/HostAllocator.h"
@@ -13,8 +15,10 @@
 class EcalMultifitParametersGPU {
 public:
   struct Product {
-    ~Product();
-    double *amplitudeFitParametersEB, *amplitudeFitParametersEE, *timeFitParametersEB, *timeFitParametersEE;
+    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEB;
+    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> amplitudeFitParametersEE;
+    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> timeFitParametersEB;
+    edm::propagate_const_array<cms::cuda::device::unique_ptr<double[]>> timeFitParametersEE;
   };
 
 #ifndef __CUDACC__
@@ -29,8 +33,10 @@ public:
   }
 
 private:
-  std::vector<double, cms::cuda::HostAllocator<double>> amplitudeFitParametersEB_, amplitudeFitParametersEE_,
-      timeFitParametersEB_, timeFitParametersEE_;
+  std::vector<double, cms::cuda::HostAllocator<double>> amplitudeFitParametersEB_;
+  std::vector<double, cms::cuda::HostAllocator<double>> amplitudeFitParametersEE_;
+  std::vector<double, cms::cuda::HostAllocator<double>> timeFitParametersEB_;
+  std::vector<double, cms::cuda::HostAllocator<double>> timeFitParametersEE_;
 
   cms::cuda::ESProduct<Product> product_;
 #endif  // __CUDACC__
diff --git a/RecoLocalCalo/EcalRecAlgos/src/EcalMultifitParametersGPU.cc b/RecoLocalCalo/EcalRecAlgos/src/EcalMultifitParametersGPU.cc
index 010da6444b61..149ba92ff170 100644
--- a/RecoLocalCalo/EcalRecAlgos/src/EcalMultifitParametersGPU.cc
+++ b/RecoLocalCalo/EcalRecAlgos/src/EcalMultifitParametersGPU.cc
@@ -1,7 +1,6 @@
-#include "RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h"
-
 #include "FWCore/Utilities/interface/typelookup.h"
-#include "HeterogeneousCore/CUDAUtilities/interface/cudaCheck.h"
+#include "HeterogeneousCore/CUDAUtilities/interface/copyAsync.h"
+#include "RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h"
 
 EcalMultifitParametersGPU::EcalMultifitParametersGPU(edm::ParameterSet const& ps) {
   auto const& amplitudeFitParametersEB = ps.getParameter<std::vector<double>>("EBamplitudeFitParameters");
@@ -20,45 +19,20 @@ EcalMultifitParametersGPU::EcalMultifitParametersGPU(edm::ParameterSet const& ps
   std::copy(timeFitParametersEE.begin(), timeFitParametersEE.end(), timeFitParametersEE_.begin());
 }
 
-EcalMultifitParametersGPU::Product::~Product() {
-  cudaCheck(cudaFree(amplitudeFitParametersEB));
-  cudaCheck(cudaFree(amplitudeFitParametersEE));
-  cudaCheck(cudaFree(timeFitParametersEB));
-  cudaCheck(cudaFree(timeFitParametersEE));
-}
-
 EcalMultifitParametersGPU::Product const& EcalMultifitParametersGPU::getProduct(cudaStream_t cudaStream) const {
   auto const& product = product_.dataForCurrentDeviceAsync(
       cudaStream, [this](EcalMultifitParametersGPU::Product& product, cudaStream_t cudaStream) {
-        // malloc
-        cudaCheck(cudaMalloc((void**)&product.amplitudeFitParametersEB,
-                             this->amplitudeFitParametersEB_.size() * sizeof(double)));
-        cudaCheck(cudaMalloc((void**)&product.amplitudeFitParametersEE,
-                             this->amplitudeFitParametersEE_.size() * sizeof(double)));
-        cudaCheck(cudaMalloc((void**)&product.timeFitParametersEB, this->timeFitParametersEB_.size() * sizeof(double)));
-        cudaCheck(cudaMalloc((void**)&product.timeFitParametersEE, this->timeFitParametersEE_.size() * sizeof(double)));
+        // allocate GPU memory
+        product.amplitudeFitParametersEB = cms::cuda::make_device_unique<double[]>(amplitudeFitParametersEB_.size(), cudaStream);
+        product.amplitudeFitParametersEE = cms::cuda::make_device_unique<double[]>(amplitudeFitParametersEE_.size(), cudaStream);
+        product.timeFitParametersEB = cms::cuda::make_device_unique<double[]>(timeFitParametersEB_.size(), cudaStream);
+        product.timeFitParametersEE = cms::cuda::make_device_unique<double[]>(timeFitParametersEE_.size(), cudaStream);
 
         // transfer
-        cudaCheck(cudaMemcpyAsync(product.amplitudeFitParametersEB,
-                                  this->amplitudeFitParametersEB_.data(),
-                                  this->amplitudeFitParametersEB_.size() * sizeof(double),
-                                  cudaMemcpyHostToDevice,
-                                  cudaStream));
-        cudaCheck(cudaMemcpyAsync(product.amplitudeFitParametersEE,
-                                  this->amplitudeFitParametersEE_.data(),
-                                  this->amplitudeFitParametersEE_.size() * sizeof(double),
-                                  cudaMemcpyHostToDevice,
-                                  cudaStream));
-        cudaCheck(cudaMemcpyAsync(product.timeFitParametersEB,
-                                  this->timeFitParametersEB_.data(),
-                                  this->timeFitParametersEB_.size() * sizeof(double),
-                                  cudaMemcpyHostToDevice,
-                                  cudaStream));
-        cudaCheck(cudaMemcpyAsync(product.timeFitParametersEE,
-                                  this->timeFitParametersEE_.data(),
-                                  this->timeFitParametersEE_.size() * sizeof(double),
-                                  cudaMemcpyHostToDevice,
-                                  cudaStream));
+        cms::cuda::copyAsync(product.amplitudeFitParametersEB, amplitudeFitParametersEB_, cudaStream);
+        cms::cuda::copyAsync(product.amplitudeFitParametersEE, amplitudeFitParametersEE_, cudaStream);
+        cms::cuda::copyAsync(product.timeFitParametersEB, timeFitParametersEB_, cudaStream);
+        cms::cuda::copyAsync(product.timeFitParametersEE, timeFitParametersEE_, cudaStream);
       });
   return product;
 }

and they seem to build fine.

@thomreis
Copy link
Contributor

thomreis commented Mar 3, 2021

Thanks for confirming this. The only difference to my version that I can see is that I have moved the headers to CondFormats/EcalConds already. Maybe some BuildFile issue.

@fwyzard
Copy link
Contributor Author

fwyzard commented Mar 3, 2021

A silly thing: is it possible you are still including the old header from RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h, which is then found in the release, and uses the old types ?

@thomreis
Copy link
Contributor

thomreis commented Mar 3, 2021

Bingo! At least for two of the files with issues.

I still get some errors on other files. Does the recipe also work for classes that use std::vector<float> const& like, e.g. here: https://github.com/thomreis/cmssw/blob/ecal-local-reco-gpu-fix-issues/CondFormats/EcalObjects/interface/EcalTimeBiasCorrectionsGPU.h#L39-L42
The ones using std::vector<float, cms::cuda::HostAllocator<float>> all work now.

@fwyzard
Copy link
Contributor Author

fwyzard commented Mar 3, 2021

No, it only works for vectors that use the HostAllocator; a standard vector does not "pin" its memory, so the copies need an extra step to be asynchronous (which is done internally by CUDA, but requires a worker thread and an extra memory buffer).

Can you change the type to std::vector<float, cms::cuda::HostAllocator<float>> ? or is there any reason why not ?

@thomreis
Copy link
Contributor

thomreis commented Mar 3, 2021

I do not know why it was implemented like that for those classes. Maybe @amassiro does?

If it can be changed to std::vector<float, cms::cuda::HostAllocator<float>> does the currently implemented strategy to have only one Product.values variable that can be allocated for valuesEB_ and with an offset also for valuesEE_ work? Like, e.g. in https://github.com/thomreis/cmssw/blob/ecal-local-reco-gpu-fix-issues/CondFormats/EcalObjects/src/EcalPulseCovariancesGPU.cc#L17-L36

@amassiro
Copy link
Contributor

amassiro commented Mar 3, 2021

I do not know why it was implemented like that for those classes. Maybe @amassiro does?

I don't remember, it could be that at the beginning it was the only option.
Maybe @vkhristenko remembers the motivation

If it can be changed to std::vector<float, cms::cuda::HostAllocator<float>> does the currently implemented strategy to have only one Product.values variable that can be allocated for valuesEB_ and with an offset also for valuesEE_ work? Like, e.g. in https://github.com/thomreis/cmssw/blob/ecal-local-reco-gpu-fix-issues/CondFormats/EcalObjects/src/EcalPulseCovariancesGPU.cc#L17-L36

@vkhristenko
Copy link
Contributor

I do not know why it was implemented like that for those classes. Maybe @amassiro does?

I don't remember, it could be that at the beginning it was the only option.
Maybe @vkhristenko remembers the motivation

If it can be changed to std::vector<float, cms::cuda::HostAllocator<float>> does the currently implemented strategy to have only one Product.values variable that can be allocated for valuesEB_ and with an offset also for valuesEE_ work? Like, e.g. in https://github.com/thomreis/cmssw/blob/ecal-local-reco-gpu-fix-issues/CondFormats/EcalObjects/src/EcalPulseCovariancesGPU.cc#L17-L36

I do not remember if there was any reason for this (if not documented, prolly no), but just change to use that allocator accordingly (have to make copies, etc...)

@thomreis
Copy link
Contributor

thomreis commented Mar 4, 2021

The issue seems to lie here:

CMSSW_11_3_0_pre3/src/CondFormats/EcalObjects/src/EcalIntercalibConstantsGPU.cc:7:70: error: no matching function for call to 'std::vector<float, cms::cuda::HostAllocator<float> >::vector(<brace-enclosed initializer list>)'
    7 |     : valuesEB_{values.barrelItems()}, valuesEE_{values.endcapItems()} {}

@vkhristenko
Copy link
Contributor

vkhristenko commented Mar 4, 2021 via email

@fwyzard
Copy link
Contributor Author

fwyzard commented Mar 4, 2021

The issue seems to lie here:

CMSSW_11_3_0_pre3/src/CondFormats/EcalObjects/src/EcalIntercalibConstantsGPU.cc:7:70: error: no matching function for call to 'std::vector<float, cms::cuda::HostAllocator<float> >::vector(<brace-enclosed initializer list>)'
    7 |     : valuesEB_{values.barrelItems()}, valuesEE_{values.endcapItems()} {}

Yes - that is on purpose.
As I wrote earlier:

a standard vector does not "pin" its memory, so the copies need an extra step to be asynchronous (which is done internally by CUDA, but requires a worker thread and an extra memory buffer).

If there are no downsides, to me it seems better to change the vector to use the HostAllocator so the copy can be fully asynchronous.

@makortel what do you think ?

@makortel
Copy link
Contributor

makortel commented Mar 4, 2021

If there are no downsides, to me it seems better to change the vector to use the HostAllocator so the copy can be fully asynchronous.

I agree. The only downside I can think of is an additional copy of the data, but I think being able to copy asynchronously benefits more than that.

@thomreis
Copy link
Contributor

thomreis commented Mar 5, 2021

Is there a way to make this compile:

        auto const offset = this->valuesEB_.size();
        // transfer
        cms::cuda::copyAsync(product.values, valuesEB_, cudaStream);
        cms::cuda::copyAsync(product.values + offset, valuesEE_, cudaStream);

with values being

  struct Product {
    edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>> values;
  };

and valuesEB_ an valuesEE_ being

  std::vector<float, cms::cuda::HostAllocator<float>> valuesEB_;
  std::vector<float, cms::cuda::HostAllocator<float>> valuesEE_;

The current compile error is

CMSSW_11_3_0_pre3/src/CondFormats/EcalObjects/src/EcalIntercalibConstantsGPU.cc:31:76: error: no matching function for call to 'copyAsync(edm::propagate_const_array<std::unique_ptr<float [], cms::cuda::device::impl::DeviceDeleter> >::element_type*, const std::vector<float, cms::cuda::HostAllocator<float> >&, CUstream_st*&)'
   31 |         cms::cuda::copyAsync(product.values + offset, valuesEE_, cudaStream);

Or is there need for separate edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>> valuesEB and edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>> valuesEE?

@fwyzard
Copy link
Contributor Author

fwyzard commented Mar 5, 2021

Is there a way to make this compile [...]

No.

Or is there need for separate edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>> valuesEB and edm::propagate_const_array<cms::cuda::device::unique_ptr<float[]>> valuesEE?

If you want to use a single vector on the GPU, you should use a single vector on the CPU as well: fill it with the EB values first and EE values after.

@thomreis
Copy link
Contributor

thomreis commented Mar 5, 2021

Thanks. @amassiro @vkhristenko is this the preferred version? Having a single vector on the GPU?

@amassiro
Copy link
Contributor

amassiro commented Mar 5, 2021

Thanks. @amassiro @vkhristenko is this the preferred version? Having a single vector on the GPU?

For me it's the same, let's decide one strategy and propagate the same format everywhere.
I just don't want one strategy for uncalibrehcit and another for rechit conditions.
The rechit were adapted to the same procedure used by unucalibrechit, it's ok to have one single vector, if it makes things faster, everywhere.

@thomreis
Copy link
Contributor

thomreis commented Mar 5, 2021

I have seen both. E.g. EcalGainRatiosGPU (https://github.com/cms-sw/cmssw/blob/master/RecoLocalCalo/EcalRecAlgos/src/EcalGainRatiosGPU.cc#L8-L20) uses one vector for EB and EE while EcalMultifitParametersGPU (https://github.com/cms-sw/cmssw/blob/master/RecoLocalCalo/EcalRecAlgos/interface/EcalMultifitParametersGPU.h#L17) uses separate ones for EB and EE.

@thomreis
Copy link
Contributor

Hi @fwyzard the first two items are done now with PR #33116 merged.

@fwyzard
Copy link
Contributor Author

fwyzard commented Mar 24, 2021

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants