GPU Cellular Automaton #48

Conversation

felicepantaleo

Preliminary version of the GPU CA.
I'll make it a HeterogeneousEDProducer, move the copies to the correct place, and remove the commented-out code.
@rovere @fwyzard @VinInn @makortel

@cmsbot

cmsbot commented May 23, 2018

A new Pull Request was created by @felicepantaleo (Felice Pantaleo) for CMSSW_10_2_X_Patatrack.

It involves the following packages:

RecoPixelVertexing/PixelTriplets
RecoTracker/TkHitPairs

@cmsbot, @fwyzard can you please review it and eventually sign? Thanks.

cms-bot commands are listed here

@makortel makortel left a comment

I took a cursory look and have also some general comments

  • In general the reformatting is nice, but
    • there are some cases where IMHO it would be clearer to avoid line breaks
    • in principle it would be nicer to have the reformatting in its own PR
  • There is a lot of copy-paste (in addition to what there was already), but that is probably best to deal with later when the dust settles (i.e. after migrating to HeterogeneousEDProducer etc)


std::vector<int> theOuterLayerPairs;
std::vector<int> theInnerLayerPairs;
std::string name() const { return theName; }

This should return const std::string&.

Eventually it would be nice to avoid strings as the layer identifiers.
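
A minimal sketch of the suggested change, using a hypothetical `LayerPair` class (only `theName` and `name()` appear in the reviewed code; the class name and constructor are assumptions for illustration). Returning by const reference avoids copying the string on every call:

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the class under review; only theName and
// name() come from the reviewed code.
class LayerPair {
public:
  explicit LayerPair(std::string name) : theName(std::move(name)) {}

  // Returns a reference to the member instead of a copy.
  // The reference is valid only as long as this object is alive.
  const std::string& name() const { return theName; }

private:
  std::string theName;
};
```

Callers that need to keep the name beyond the lifetime of the object can still copy it explicitly into a `std::string`.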

edm::ParameterSet comparitorPSet =
cfg.getParameter<edm::ParameterSet>("SeedComparitorPSet");
std::string comparitorName =
comparitorPSet.getParameter<std::string>("ComponentName");

Could the line breaks be avoided?

}
}

-void CAHitQuadrupletGenerator::fillDescriptions(edm::ParameterSetDescription& desc) {
+void CAHitQuadrupletGenerator::fillDescriptions(
+edm::ParameterSetDescription &desc) {

Could the line break be avoided?

@@ -39,15 +39,15 @@ class RecHitsSortedInPhi {
typedef std::pair<HitIter,HitIter> Range;

using DoubleRange = std::array<int,4>;

All changes in this file are reformatting, so in principle could be avoided in this PR.

#include <cuda.h>
#include <cuda_runtime.h>

template <int maxSize, class T> struct GPUSimpleVector {

How is this different from
https://github.com/cms-patatrack/cmssw/blob/CMSSW_10_2_X_Patatrack/HeterogeneousCore/CUDAUtilities/interface/GPUSimpleVector.h
? (ok, I see int maxSize template parameter)

Anyway it would be better placed (eventually) in HeterogeneousCore/CUDAUtilities.

Indeed, can we avoid having two GPUSimpleVector classes ?

I discussed this with Felice and I agree the use case is valid (one GPU allocation vs. a GPU allocation per hit). The intention of the class is to provide a vector-like interface on top of a (dynamically allocated) array, close to what

template <typename T, unsigned int N>
class VecArray {

does on the CPU.

I'd suggest to treat this class similarly, i.e. rename to e.g. GPUVecArray (and reorder the template parameters as <class, int>) and move to HeterogeneousCore/CUDAUtilities/interface.
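
To illustrate the idea, here is a host-only sketch of a fixed-capacity, vector-like container. The name `GPUVecArray` and the `<class, int>` parameter order follow the suggestion above; the member names, the `-1` return value on overflow, and the inline storage are assumptions for illustration, not the actual CMSSW implementation. A device version would likely also need CUDA qualifiers and an atomic size update, which are omitted here.

```cpp
#include <cassert>

// Host-only sketch of a fixed-capacity, vector-like container.
// Member names and the overflow convention are assumptions.
template <class T, int maxSize>
struct GPUVecArray {
  // Appends an element; returns its index, or -1 if the array is full.
  int push_back(const T& element) {
    if (m_size >= maxSize)
      return -1;
    m_data[m_size] = element;
    return m_size++;
  }

  // Read-only access to an element that has already been stored.
  const T& operator[](int i) const {
    assert(i >= 0 && i < m_size);
    return m_data[i];
  }

  // Forgets the stored elements without touching the storage.
  void reset() { m_size = 0; }
  int size() const { return m_size; }
  static constexpr int capacity() { return maxSize; }

  T m_data[maxSize];
  int m_size = 0;
};
```

Because the capacity is a template parameter, an array of such objects can be placed in a single allocation, which matches the "one GPU allocation" use case mentioned above.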

<flags EDM_PLUGIN="1"/>
<flags CUDA_FLAGS="--expt-relaxed-constexpr"/>

you don't need to add this, it is included by default in the CUDA_FLAGS


std::vector<const HitDoublets *> hitDoublets;

const int numberOfHitsInNtuplet = 4;

this is unused

@cmsbot

cmsbot commented May 25, 2018

Pull request #48 was updated. @cmsbot, @fwyzard can you please check and sign again.

@makortel makortel left a comment

I repeat my earlier general comments on the formatting:

  • separating the formatting changes from the rest would make review of the rest much easier
  • there are many places where the line breaks make the code more difficult to read (IMHO)

//
// #include "CAHitQuadrupletGeneratorGPU.h"
// using CAHitQuadrupletGPUEDProducer = CAHitNtupletEDProducerT<CAHitQuadrupletGeneratorGPU>;
// DEFINE_FWK_MODULE(CAHitQuadrupletGPUEDProducer);

Please remove the commented code.

CAHitNtupletHeterogeneousEDProducer::CAHitNtupletHeterogeneousEDProducer(
const edm::ParameterSet &iConfig)
: HeterogeneousEDProducer<heterogeneous::HeterogeneousDevices<
heterogeneous::GPUCuda, heterogeneous::CPU>>(iConfig), doubletToken_(consumes<IntermediateHitDoublets>(iConfig.getParameter<edm::InputTag>("doublets"))),

The base class constructor call can be shortened to

: HeterogeneousEDProducer(iConfig),


CAHitNtupletHeterogeneousEDProducer::~CAHitNtupletHeterogeneousEDProducer() {

}

I suggest replacing the empty destructor by adding = default to the declaration on line 34.
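
A small sketch of the difference, using two hypothetical structs (not the producer itself). A destructor defaulted on its declaration states the intent directly and, for a class with only trivial members and no virtual destructor, keeps the type trivially destructible, whereas an empty user-provided body does not:

```cpp
#include <type_traits>

// Hypothetical classes, for illustration only.
struct WithEmptyDestructor {
  ~WithEmptyDestructor() {}  // user-provided, so never trivial
};

struct WithDefaultedDestructor {
  ~WithDefaultedDestructor() = default;  // defaulted on its declaration
};

static_assert(!std::is_trivially_destructible<WithEmptyDestructor>::value,
              "an empty user-provided destructor is not trivial");
static_assert(std::is_trivially_destructible<WithDefaultedDestructor>::value,
              "a destructor defaulted on its declaration can be trivial");
```

For a class that inherits a virtual destructor neither form is trivial, so there the gain is mainly readability and one less out-of-line definition.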

void CAHitNtupletHeterogeneousEDProducer::beginStreamGPUCuda(
edm::StreamID streamId, cuda::stream_t<> &cudaStream) {

}

It is not mandatory to implement beginStreamGPUCuda(), so you could remove the method altogether. But where do you allocate the GPU memory? (Answering my own question: in the constructor, but see the other comment on why the allocations should (eventually) be moved here.)


void CAHitNtupletHeterogeneousEDProducer::produceGPUCuda(
edm::HeterogeneousEvent &iEvent, const edm::EventSetup &iSetup,
cuda::stream_t<> &cudaStream) {

I think ordering acquireGPUCuda() and produceGPUCuda() in that order would make them easier to read.

dim3 numberOfBlocks_find(8, numberOfRootLayerPairs);
((GPUSimpleVector<maxNumberOfQuadruplets, Quadruplet>
*)(h_foundNtuplets[regionIndex]))
->reset();

If h_foundNtuplets really has to contain void *, could we at least use reinterpret_cast here (and elsewhere)?
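
As a sketch of what that could look like, assuming a hypothetical `FoundNtuplets` type standing in for `GPUSimpleVector<maxNumberOfQuadruplets, Quadruplet>` and a hypothetical helper `resetFoundNtuplets` (neither is part of the reviewed code):

```cpp
#include <vector>

// Hypothetical stand-in for the ntuplet container stored behind void*.
struct FoundNtuplets {
  int size = 3;
  void reset() { size = 0; }
};

// Resets the container stored at `index`, using a named cast instead of a
// C-style cast so the conversion is easy to spot and to search for.
inline void resetFoundNtuplets(std::vector<void*>& buffers, int index) {
  reinterpret_cast<FoundNtuplets*>(buffers[index])->reset();
}
```

For a conversion from `void*` specifically, `static_cast` is also sufficient; either named cast is easier to find in the code than the C-style form.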

cudaMemsetAsync(device_isOuterHitOfCell, 0,
maxNumberOfLayers * maxNumberOfHits *
sizeof(GPUSimpleVector<maxCellsPerHit, unsigned int>),
cudaStream_);

Could you add a comment that this resets temporary memory for the next event, and is not needed for reading the output? Just to explain why it is ok to have async call here without cudaStreamSynchronize().

const TrackingRegion &region = regionLayerPairs.region();
auto foundQuads = fetchKernelResult(index);
std::cout << foundQuads.size() << " found quads" << std::endl;
unsigned int numberOfFoundQuadruplets = foundQuads.size();

Indentation becomes inconsistent at this line. But I'd rather re-indent the lines above than those below.

const TrackingRegion &region = regionLayerPairs.region();
auto seedingHitSetsFiller = seedingHitSets->beginRegion(&region);
generator_.fillResults(regionDoublets, ntuplets, iSetup, seedingLayerHits,
cudaStream.id());

To me it looks like CAHitQuadrupletGeneratorGPU::fillResults() already loops over the regions and fills ntuplets for all regions. Why does it have to be repeated here for each region? Should the loop perhaps be something along

  generator_.fillResults(regionDoublets, ntuplets, iSetup, seedingLayerHits, cudaStream.id());
  int index = 0;
  for (const auto &regionLayerPairs : regionDoublets) {
    const TrackingRegion &region = regionLayerPairs.region();
    auto seedingHitSetsFiller = seedingHitSets->beginRegion(&region);
    fillNtuplets(seedingHitSetsFiller, ntuplets[index]);
    index++;
  }

instead?

<flags EDM_PLUGIN="1"/>
<flags CUDA_FLAGS="--expt-relaxed-constexpr"/>

@fwyzard commented earlier that this is not needed as the flag is included by default in CUDA_FLAGS.

@makortel

It is also noteworthy that (because of the formatting changes, IIUC) this PR conflicts with cms-sw#23363.

@fwyzard

fwyzard commented May 29, 2018

question: with these changes, do we use the new producer for the GPU workflow ?

@makortel

question: with these changes, do we use the new producer for the GPU workflow ?

No. Additional changes in the configuration are needed to use this new producer.

@cmsbot

cmsbot commented May 30, 2018

Pull request #48 was updated. @cmsbot, @fwyzard can you please check and sign again.

@fwyzard fwyzard force-pushed the CMSSW_10_2_X_Patatrack branch 2 times, most recently from 1965f2e to 30594f6 Compare June 4, 2018 16:10
@fwyzard

fwyzard commented Jun 5, 2018

Validation summary

Reference release CMSSW_10_2_0_pre4 at 926a81b
Development branch CMSSW_10_2_X_Patatrack at b1e6d1c
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_0_pre3-PU25ns_101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre3-101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_0_pre3-PU25ns_101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

  • reference DQM plots for reference release, workflow 10824.5
  • DQM plots for development release, workflow 10824.5
  • DQM plots for development release, workflow 10824.8
  • DQM plots for development release, workflow 10824.7
  • DQM plots for development release, workflow 10824.9 are missing
  • DQM plots for testing release, workflow 10824.5
  • DQM plots for testing release, workflow 10824.8
  • DQM plots for testing release, workflow 10824.7
  • DQM plots for testing release, workflow 10824.9 are missing
  • DQM comparison for reference workflow 10824.5
  • DQM comparison for workflow 10824.8
  • DQM comparison for workflow 10824.7
  • DQM comparison for workflow 10824.9

/RelValZMM_13/CMSSW_10_2_0_pre3-101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

  • reference DQM plots for reference release, workflow 10824.5
  • DQM plots for development release, workflow 10824.5
  • DQM plots for development release, workflow 10824.8 are missing
  • DQM plots for development release, workflow 10824.7
  • DQM plots for development release, workflow 10824.9 are missing
  • DQM plots for testing release, workflow 10824.5
  • DQM plots for testing release, workflow 10824.8
  • DQM plots for testing release, workflow 10824.7
  • DQM plots for testing release, workflow 10824.9 are missing
  • DQM comparison for reference workflow 10824.5
  • DQM comparison for workflow 10824.8
  • DQM comparison for workflow 10824.7
  • DQM comparison for workflow 10824.9

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_0_pre3-PU25ns_101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre3-101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/a56c9f39c1dfffed018bbd9a0f026238f9390c21/log .

@fwyzard

fwyzard commented Jun 5, 2018

The development summary for workflow 10824.8 is very succinct:

======== Error: Application received signal 139

Which is strange because the same workflow is successful in the validation of other PRs.

@cmsbot

cmsbot commented Jun 5, 2018

Pull request #48 was updated. @cmsbot, @fwyzard can you please check and sign again.


bool emptyRegionDoublets = false;
std::unique_ptr<RegionsSeedingHitSets> seedingHitSets;
std::vector<OrderedHitSeeds> ntuplets;

please update the member variable names to match the coding rules, i.e. add a trailing _

static constexpr int maxNumberOfHits = 1000;
static constexpr int maxNumberOfRegions = 30;

unsigned int numberOfRootLayerPairs = 0;

please update the member variable names to match the coding rules, i.e. add a trailing _

theInnerR, theOuterR);
}

// __host__ __device__

can you delete the commented out part ?

@cmsbot

cmsbot commented Jun 8, 2018

Pull request #48 was updated. @cmsbot, @fwyzard can you please check and sign again.

@felicepantaleo
Author

I still need to make use of the existing non-templated GPU vector and probably rebase the PR...

@fwyzard

fwyzard commented Jun 14, 2018

To reformulate the comment: the code should not crash.
If we use fixed-size buffers, their use should be protected, with the algorithms doing one of

  • adapting, e.g. processing all elements, a chunk at a time
  • processing only the elements that fit the buffer, and signalling a LogError
  • processing no elements, and signalling a LogError
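
The first two options can be sketched as follows. This is a host-only illustration under stated assumptions: `kBufferCapacity`, `processInChunks` and `processTruncated` are hypothetical names, and `std::cerr` stands in for the framework's LogError.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical fixed buffer capacity, for illustration only.
constexpr std::size_t kBufferCapacity = 4;

// Option 1: adapt, processing all elements one chunk at a time.
// Calls `process` once per chunk and returns the number of chunks used.
template <typename T, typename F>
std::size_t processInChunks(const std::vector<T>& input, F process) {
  std::size_t chunks = 0;
  for (std::size_t begin = 0; begin < input.size(); begin += kBufferCapacity) {
    const std::size_t count = std::min(kBufferCapacity, input.size() - begin);
    process(input.data() + begin, count);
    ++chunks;
  }
  return chunks;
}

// Option 2: process only what fits the buffer and report the overflow.
// Returns the number of elements actually processed.
template <typename T, typename F>
std::size_t processTruncated(const std::vector<T>& input, F process) {
  const std::size_t count = std::min(kBufferCapacity, input.size());
  if (count < input.size()) {
    // std::cerr is a stand-in for LogError in this sketch.
    std::cerr << "buffer overflow: processing " << count << " of "
              << input.size() << " elements\n";
  }
  process(input.data(), count);
  return count;
}
```

In both cases the buffer is never indexed past its capacity, so an oversized input degrades the result instead of crashing the job.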

@cmsbot

cmsbot commented Jun 14, 2018

Pull request #48 was updated. @cmsbot, @fwyzard can you please check and sign again.

@felicepantaleo
Author

@fwyzard, I have implemented your suggestion and replaced the assert with a LogError and return

@fwyzard

fwyzard commented Jun 14, 2018

Validation summary

Reference release CMSSW_10_2_0_pre5 at 30c7b03
Development branch CMSSW_10_2_X_Patatrack at 655e4ed
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_0_pre3-PU25ns_101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre3-101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_0_pre3-PU25ns_101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

  • reference DQM plots for reference release, workflow 10824.5
  • DQM plots for development release, workflow 10824.5
  • DQM plots for development release, workflow 10824.8
  • DQM plots for development release, workflow 10824.7
  • DQM plots for development release, workflow 10824.9 are missing
  • DQM plots for testing release, workflow 10824.5
  • DQM plots for testing release, workflow 10824.8 are missing
  • DQM plots for testing release, workflow 10824.7
  • DQM plots for testing release, workflow 10824.9 are missing
  • DQM comparison for reference workflow 10824.5
  • DQM comparison for workflow 10824.8
  • DQM comparison for workflow 10824.7
  • DQM comparison for workflow 10824.9

/RelValZMM_13/CMSSW_10_2_0_pre3-101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

  • reference DQM plots for reference release, workflow 10824.5
  • DQM plots for development release, workflow 10824.5
  • DQM plots for development release, workflow 10824.8
  • DQM plots for development release, workflow 10824.7
  • DQM plots for development release, workflow 10824.9 are missing
  • DQM plots for testing release, workflow 10824.5
  • DQM plots for testing release, workflow 10824.8 are missing
  • DQM plots for testing release, workflow 10824.7
  • DQM plots for testing release, workflow 10824.9 are missing
  • DQM comparison for reference workflow 10824.5
  • DQM comparison for workflow 10824.8
  • DQM comparison for workflow 10824.7
  • DQM comparison for workflow 10824.9

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_0_pre3-PU25ns_101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre3-101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/7be28d1421adf3f678cb1e9c25683564047e3e87/log .

@fwyzard

fwyzard commented Jun 14, 2018

As of 2ebbcf3, both 10824.8 workflows are failing: TTbar with SIGABRT (abort), Zmumu with SIGSEGV (segmentation violation).

@cmsbot

cmsbot commented Jun 14, 2018

Pull request #48 was updated. @cmsbot, @fwyzard can you please check and sign again.

@fwyzard

fwyzard commented Jun 16, 2018

Validation summary

Reference release CMSSW_10_2_0_pre5 at 30c7b03
Development branch CMSSW_10_2_X_Patatrack at 655e4ed
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_0_pre3-PU25ns_101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre3-101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_0_pre3-PU25ns_101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

  • reference DQM plots for reference release, workflow 10824.5
  • DQM plots for development release, workflow 10824.5
  • DQM plots for development release, workflow 10824.8
  • DQM plots for development release, workflow 10824.7
  • DQM plots for development release, workflow 10824.9 are missing
  • DQM plots for testing release, workflow 10824.5
  • DQM plots for testing release, workflow 10824.8
  • DQM plots for testing release, workflow 10824.7
  • DQM plots for testing release, workflow 10824.9 are missing
  • DQM comparison for reference workflow 10824.5
  • DQM comparison for workflow 10824.8
  • DQM comparison for workflow 10824.7
  • DQM comparison for workflow 10824.9

/RelValZMM_13/CMSSW_10_2_0_pre3-101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

  • reference DQM plots for reference release, workflow 10824.5
  • DQM plots for development release, workflow 10824.5
  • DQM plots for development release, workflow 10824.8
  • DQM plots for development release, workflow 10824.7
  • DQM plots for development release, workflow 10824.9 are missing
  • DQM plots for testing release, workflow 10824.5
  • DQM plots for testing release, workflow 10824.8 are missing
  • DQM plots for testing release, workflow 10824.7
  • DQM plots for testing release, workflow 10824.9 are missing
  • DQM comparison for reference workflow 10824.5
  • DQM comparison for workflow 10824.8
  • DQM comparison for workflow 10824.7
  • DQM comparison for workflow 10824.9

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_0_pre3-PU25ns_101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre3-101X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/c31f6d167e9a9330f83b4144f95152ae5eeffac3/log .

@fwyzard fwyzard merged commit 2b0f382 into cms-patatrack:CMSSW_10_2_X_Patatrack Jun 16, 2018
@fwyzard

fwyzard commented Jun 16, 2018

I have split the squash commit in two, to keep separate the work on the GPU::SimpleVector and GPU::VecArray, and the work on the CA.

@fwyzard

fwyzard commented Jun 16, 2018

@makortel sorry for not addressing your comments earlier; can you summarise which ones are still relevant, and I'll try to make the corresponding changes ?

@makortel

@fwyzard No problem, I tried to gather them below (most are mostly aesthetic and possibly subjective)

@fwyzard

fwyzard commented Jun 18, 2018

I'll implement 1. and 2., look into 3. and 4., and leave 5. as is for the moment... I'd rather make clang-format do the work for us.

@makortel

@fwyzard Sounds good, thanks!

@fwyzard

fwyzard commented Jun 18, 2018

OK, I've done something along the lines of 3. and 4. (and maybe 5.) as well.

See #83 for the clean up.

fwyzard added a commit that referenced this pull request Jun 18, 2018
Apply some clean up to the code and formatting of `CAHitNtupletHeterogeneousEDProducer` and `CAHitQuadrupletGeneratorGPU`, as suggested by @makortel during the review of #48:
  - clean up the `BuildFile.xml`;
  - remove unused data members and arguments from function calls;
  - percolate the CUDA stream instead of storing it as a data member.

Also:
  - add `cudaCheck` calls around memory allocations and copies;
  - reduce the number of memory allocations used to set up the GPU state.
fwyzard added a commit that referenced this pull request Oct 8, 2020
fwyzard added a commit that referenced this pull request Oct 20, 2020
fwyzard added a commit that referenced this pull request Oct 20, 2020
fwyzard added a commit that referenced this pull request Oct 23, 2020
fwyzard added a commit that referenced this pull request Nov 6, 2020
fwyzard added a commit that referenced this pull request Nov 6, 2020
fwyzard added a commit that referenced this pull request Nov 16, 2020
fwyzard added a commit that referenced this pull request Nov 27, 2020
fwyzard pushed a commit that referenced this pull request Dec 26, 2020
fwyzard added a commit that referenced this pull request Jan 15, 2021