
Cluster2TP assoc on GPU #105

Merged (20 commits, Jul 31, 2018)
Conversation

@VinInn commented Jul 26, 2018

This PR mostly contains a cluster-to-trackingParticle association on GPU.
It includes the possibility to dump all hits on the CPU with the corresponding TP, by adding to the config

process.tpClusterProducerHeterogeneous.dumpCSV = True

or, of course,

process.tpClusterProducerHeterogeneousPixelTrackingOnly.dumpCSV = True

for our workflows.

It also includes a "proto" doublet code, ready to produce Cells to be consumed by the CA.

I prefer that this is merged now.
We will proceed to create Cells and use them in the CA later.

@VinInn commented Jul 26, 2018

OK, to get a list of hits that can be compared, one can use:

grep "HIT" dump2.log | cut -d' ' -f2,4-15 | sort -g -k1 -k10 -k3 > zmumu2.csv

@VinInn commented Jul 26, 2018

Indeed, zmumu does not reproduce. The columns are
ev det charge xg yg zg rg iphi tkId pt n1 tkId2 pt2

diff zmumu1.csv zmumu2.csv | less

3341d3340
< 8 1830 99663 2.479492 13.942786 -50.228298 14.161539 14547 337 118 5 2 312
3502a3502
> 8 1830 99663 2.479492 13.942786 -50.228298 14.161539 14547 2 312 3 337 118
12420a12421
> 25 263 3560705 -1.036580 -7.205391 20.692194 7.279571 -17876 370 750 70 367 3577
12696d12696
< 25 263 3560705 -1.036580 -7.205391 20.692194 7.279571 -17876 367 3577 70 370 750
34684c34684
< 59 34 53883 -2.440561 1.777732 -9.959659 3.019382 26200 0 0 0 0 0
---
> 59 34 79036 -1.781883 2.457852 -12.177644 3.035810 22928 0 0 0 0 0
38699a38700

etc

Oops, no: it is the clus2TP that does not fully reproduce in the case of multiple TPs...
ev 59, det 34 instead seems a real issue,
so select only those with no second TP:

diff zmumu1.csv zmumu2.csv | grep "0 0 0"    
< 59 34 53883 -2.440561 1.777732 -9.959659 3.019382 26200 0 0 0 0 0
> 59 34 79036 -1.781883 2.457852 -12.177644 3.035810 22928 0 0 0 0 0
< 408 31 275932 -1.483108 2.348596 19.877537 2.777681 22260 0 0 0 0 0
> 408 31 380389 -1.003347 2.485823 21.884169 2.680676 20385 0 0 0 0 0
diff zmumu1.csv zmumu3.csv | grep "0 0 0"  
< 243 86 15754 2.166721 -2.250007 13.777572 3.123653 -8388 0 0 0 0 0
> 243 86 44356 2.109987 -2.308259 17.822094 3.127316 -8659 0 0 0 0 0
< 408 31 275932 -1.483108 2.348596 19.877537 2.777681 22260 0 0 0 0 0
> 408 31 380389 -1.003347 2.485823 21.884169 2.680676 20385 0 0 0 0 0
diff zmumu2.csv zmumu3.csv | grep "0 0 0"
< 59 34 79036 -1.781883 2.457852 -12.177644 3.035810 22928 0 0 0 0 0
> 59 34 53883 -2.440561 1.777732 -9.959659 3.019382 26200 0 0 0 0 0
< 243 86 15754 2.166721 -2.250007 13.777572 3.123653 -8388 0 0 0 0 0
> 243 86 44356 2.109987 -2.308259 17.822094 3.127316 -8659 0 0 0 0 0

@makortel left a comment

Spotted a few things that could be cleaned up, otherwise looks good to me.

count = step;
}
return first;
}

@makortel:

This is the same as in

template<typename RandomIt, typename T, typename Compare = less<T>>
constexpr
RandomIt lower_bound(RandomIt first, RandomIt last, const T& value, Compare comp={})

right?

@VinInn (Author):

Yes, but the cudastd one does not compile; it seems to require __device__ __host__, at least in this context.

@makortel:

Ok. Should we then consider decorating the cudastd ones with __device__ __host__? (possibly in a later PR)

@VinInn (Author):

Definitely!
I'd prefer we first find a location for a macro, or something, that guarantees __device__ __host__ are not defined if a non-CUDA compiler is used...
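As a concrete illustration of what is being asked for, here is a minimal sketch, assuming a hypothetical HOST_DEVICE macro and a cuda_std::less helper; this is not the code merged in this PR:

```cpp
// Sketch only: HOST_DEVICE and cuda_std::less are assumed names, not the PR's code.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE  // expands to nothing for a non-CUDA compiler
#endif

namespace cuda_std {

  // host/device-callable stand-in for std::less
  template <typename T>
  struct less {
    HOST_DEVICE constexpr bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs; }
  };

  // binary search usable from both host code and kernels
  template <typename RandomIt, typename T, typename Compare = less<T>>
  HOST_DEVICE constexpr RandomIt lower_bound(RandomIt first, RandomIt last, const T& value, Compare comp = {}) {
    auto count = last - first;
    while (count > 0) {
      auto step = count / 2;
      auto it = first + step;
      if (comp(*it, value)) {
        first = it + 1;
        count -= step + 1;
      } else {
        count = step;
      }
    }
    return first;
  }

}  // namespace cuda_std
```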

@VinInn (Author):

Actually, this is the error message:

/home/vin/GPUDoublets/CMSSW_10_2_0_pre6_Patatrack/src/HeterogeneousCore/CUDAUtilities/interface/cudastdAlgorithm.h(46): error: calling a __device__ function("operator()") from a __host__ __device__ function("lower_bound") is not allowed
          detected during instantiation of "RandomIt cuda_std::lower_bound(RandomIt, RandomIt, const T &, Compare) [with RandomIt=const std::array<uint32_t, 4UL> *, T=std::array<uint32_t, 4UL>, Compare=lambda [](const std::array<uint32_t, 4UL> &, const std::array<uint32_t, 4UL> &)->bool]" 
/home/vin/GPUDoublets/CMSSW_10_2_0_pre6_Patatrack/src/SimTracker/TrackerHitAssociation/plugins/ClusterSLOnGPU.cu(75): here

Pretty bizarre.

@makortel:

Maybe the lambda gets declared only as __device__?

@makortel commented Jul 26, 2018:

Found this https://devblogs.nvidia.com/new-compiler-features-cuda-8/
What would happen with

auto less = [] __host__ __device__ (...)->bool{

?

@VinInn (Author):

Yes, in a global function I can mark it device host:

-  auto less = [](std::array<uint32_t,4> const & a, std::array<uint32_t,4> const & b)->bool {
+  auto less = [] __device__ __host__ (std::array<uint32_t,4> const & a, std::array<uint32_t,4> const & b)->bool {

OK, fine, it compiles. I will make a new PR; you can judge how ugly it is...
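For reference, a self-contained reproduction of the pattern; this is a sketch, not PR code. __host__ __device__ lambdas need nvcc's --expt-extended-lambda flag (the CUDA 8 feature described in the post linked above), and calling std::array members from device code additionally needs --expt-relaxed-constexpr:

```cpp
// compile as: nvcc --expt-extended-lambda --expt-relaxed-constexpr test_lambda.cu
#include <array>
#include <cstdint>

int main() {
  // lambda marked callable from both host and device code
  auto less = [] __host__ __device__ (std::array<uint32_t, 4> const& a,
                                      std::array<uint32_t, 4> const& b) -> bool {
    return a[0] < b[0] || (!(b[0] < a[0]) && a[1] < b[1]);
  };
  std::array<uint32_t, 4> x{{1, 2, 0, 0}}, y{{1, 3, 0, 0}};
  return less(x, y) ? 0 : 1;  // exercises the host-side path
}
```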

const std::array<uint32_t,4> me{{id,ch,0,0}};

auto less = [](std::array<uint32_t,4> const & a, std::array<uint32_t,4> const & b)->bool {
return a[0]<b[0] || ( !(b[0]<a[0]) && a[1]<b[1]); // in this context we do not care of [2]

@makortel:

I'm not sure I understand the logic. !(b[0]<a[0]) is equivalent to a[0]<=b[0], which, given the left side of ||, has the same effect as a[0]==b[0]. I find the latter easier to understand.

@VinInn (Author):

Yes, this is the standard way to code lexicographic ordering in std, when the only requirement is the existence of operator< (not operator==).

@makortel:

Good point, thanks. On the other hand in this case the compared types are uint32_t, but ok.
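A tiny self-contained illustration of the pattern under discussion (the helper name lexLess is made up here): lexicographic ordering built from operator< alone.

```cpp
#include <cassert>

// Strict weak ordering on (a0,a1) vs (b0,b1) using only operator<.
// On the right branch, a0 < b0 is already known to be false, so
// !(b0 < a0) plays the role of a0 == b0 without requiring operator==.
template <typename T>
bool lexLess(T a0, T a1, T b0, T b1) {
  return a0 < b0 || (!(b0 < a0) && a1 < b1);
}

int main() {
  assert(lexLess(1, 9, 2, 0));   // first key decides
  assert(lexLess(1, 5, 1, 6));   // first keys tie, second key decides
  assert(!lexLess(1, 5, 1, 5));  // equal keys: not less
}
```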


cudaCheck(cudaMalloc((void**) & slgpu.me_d, sizeof(ClusterSLGPU)));
cudaCheck(cudaMemcpyAsync(slgpu.me_d, &slgpu, sizeof(ClusterSLGPU), cudaMemcpyDefault, stream.id()));
cudaCheck(cudaDeviceSynchronize());

@makortel:

IIUC this synchronization is not needed.
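A sketch of the reasoning (consumeClusterSL is a hypothetical kernel name): operations queued on one CUDA stream execute in submission order, so later kernels on the same stream already see the copied data; a device-wide sync is only needed when the host itself must wait.

```cpp
cudaCheck(cudaMalloc((void**)&slgpu.me_d, sizeof(ClusterSLGPU)));
cudaCheck(cudaMemcpyAsync(slgpu.me_d, &slgpu, sizeof(ClusterSLGPU), cudaMemcpyDefault, stream.id()));
// A kernel launched afterwards on the same stream is ordered after the copy,
// so it sees slgpu.me_d fully populated; no cudaDeviceSynchronize() is needed:
consumeClusterSL<<<blocks, threads, 0, stream.id()>>>(slgpu.me_d);
// A sync would only be required before the host reads back device results.
```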

cudaCheck(cudaMalloc((void**) & slgpu.n2_d,(ClusterSLGPU::MaxNumModules*256)*sizeof(uint32_t)));


cudaCheck(cudaMalloc((void**) & slgpu.me_d, sizeof(ClusterSLGPU)));

@makortel:

Are these freed anywhere?

@VinInn (Author):

oopsss, no.

@VinInn (Author):

done
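For the record, a sketch of what the fix amounts to (the destructor placement and names are an assumption based on the file paths in this PR): each cudaMalloc in the setup gets a matching cudaFree.

```cpp
// Hypothetical sketch: mirror the allocations done at construction time.
ClusterSLOnGPU::~ClusterSLOnGPU() {
  cudaCheck(cudaFree(slgpu.n2_d));
  cudaCheck(cudaFree(slgpu.me_d));
}
```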


assert(sl.me_d);
simLink<<<blocks, threadsPerBlock, 0, stream.id()>>>(dd.me_d,ndigis, hh.gpu_d, sl.me_d,n);
cudaStreamSynchronize(stream.id());

@makortel:

IIUC this synchronization is not needed (even for the dump below).

@VinInn (Author):

It is needed for the dump below, in case of other printfs (it can go inside the if).

@makortel:

Why? dumpLink below is launched asynchronously on the same CUDA stream, so I'd expect it to work without this synchronization.

@VinInn (Author):

It is printf that requires synchronization to dump the buffer to the host; otherwise it will overwrite the circular one on the device.
At least, this is what I understood (and observed).

@makortel:

Ok, so you want to protect against any potential earlier printf? Then yes, please move to inside the if (with a comment explaining the need).
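The behaviour described above matches the documented device-side printf model: output is written to a fixed-size FIFO on the device and only flushed to the host at synchronization points, so unflushed records can be overwritten if the buffer wraps. A standalone sketch:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void chatty() {
  printf("block %d thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
  // Device printf writes into a ring buffer (default size ~1 MB);
  // enlarge it if many records may accumulate between flushes.
  cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 16 * 1024 * 1024);
  chatty<<<2, 4>>>();
  // The buffer is flushed to stdout at sync points such as this one.
  cudaDeviceSynchronize();
  return 0;
}
```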


iEvent.put<Output>(std::move(output), [legacy](const GPUProduct& hits, CPUProduct& cpu) {
cpu = *legacy; delete legacy;
});

@makortel:

Nice example in favor of #100.

@VinInn (Author):

INDEED

@makortel:
+1 from me

@VinInn commented Jul 26, 2018

Some level of irreproducibility exists.
The number of clusters/hits is always the same:

[innocent@vinzen0]/home/vin/mc2018/crash% wc  zmumu6.csv                          
  276370  3869180 20537503 zmumu6.csv
[innocent@vinzen0]/home/vin/mc2018/crash% wc  zmumu5.csv
  276370  3869180 20537501 zmumu5.csv
[innocent@vinzen0]/home/vin/mc2018/crash% wc  zmumu4.csv
  276370  3869180 20537502 zmumu4.csv

Details change in a few cases; some can be attributed to cluster2TP (to be investigated):

[innocent@vinzen0]/home/vin/mc2018/crash% grep "5 0 126468" zmumu5.csv
5 0 126468 2.908309 0.966696 -25.288895 3.064762 3347 4 443 13 0 0 0
[innocent@vinzen0]/home/vin/mc2018/crash% grep "5 0 126468" zmumu6.csv
5 0 126468 2.908309 0.966696 -25.288895 3.064762 3347 4 443 12 0 0 1

Others really seem to come from the clusterizer (always for clusters not associated to any TP???):

[innocent@vinzen0]/home/vin/mc2018/crash% diff zmumu6.csv zmumu4.csv | grep " 0$" 
> 5 0 126468 2.908309 0.966696 -25.288895 3.064762 3347 4 443 13 0 0 0
< 59 34 79036 -1.781883 2.457852 -12.177644 3.035810 22928 0 0 0 0 0 0
> 59 34 53883 -2.440561 1.777732 -9.959659 3.019382 26200 0 0 0 0 0 0
< 480 49 22916 -3.064693 -0.377123 -18.870319 3.087810 -31490 0 0 0 0 0 0
> 480 49 188751 -2.813137 -1.259513 -18.027351 3.082225 -28377 0 0 0 0 0 0
[innocent@vinzen0]/home/vin/mc2018/crash% diff zmumu5.csv zmumu4.csv | grep " 0$" 
> 119 840 53604 -8.183054 13.835355 -22.581303 16.074184 21956 0 0 0 0 0 0
< 119 840 82958 -8.744591 13.499733 -22.080933 16.084486 22381 0 0 0 0 0 0
< 176 55 96198 -3.022953 -0.524170 19.582039 3.068061 -30976 0 0 0 0 0 0
> 176 55 109700 -2.934326 -0.835302 20.598766 3.050902 -29875 0 0 0 0 0 0
< 408 31 380389 -1.003347 2.485823 21.884169 2.680676 20385 0 0 0 0 0 0
> 408 31 275932 -1.483108 2.348596 19.877537 2.777681 22260 0 0 0 0 0 0
< 450 270 39491 0.347789 -6.659621 13.322131 6.668696 -15840 0 0 0 0 0 0
> 450 270 395927 1.614992 -6.495625 17.991888 6.693380 -13842 0 0 0 0 0 0
< 480 49 22916 -3.064693 -0.377123 -18.870319 3.087810 -31490 0 0 0 0 0 0
> 480 49 188751 -2.813137 -1.259513 -18.027351 3.082225 -28377 0 0 0 0 0 0

@fwyzard commented Jul 30, 2018

Validation summary

Reference release CMSSW_10_2_0_pre6 at a674e1f
Development branch CMSSW_10_2_X_Patatrack at 64e6201
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_0_pre6-PU25ns_102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre6-102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_0_pre6-PU25ns_102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre6-102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_0_pre6-PU25ns_102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_0_pre6-102X_upgrade2018_realistic_v7-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/e36b10437a73fea9d58141f501e488d387cd5645/log .

@@ -1,6 +1,15 @@
#ifndef HeterogeneousCore_CUDAUtilities_cudastdAlgorithm_h
#define HeterogeneousCore_CUDAUtilities_cudastdAlgorithm_h

#ifdef __CUDACC__

@VinInn does it work as well if you replace the whole block with just

#include <cuda_runtime.h>

?


Yes, #include <cuda_runtime.h> is enough, as it #defines away the CUDA-specific attributes when not building for CUDA (i.e. if __CUDACC__ is not defined).

The downside is that one must <use name="cuda"/> in the BuildFile, to let the compiler find cuda_runtime.h in the first place.
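In other words, a sketch (not the merged code) of what the header looks like after the replacement; the cuda_std::min helper is illustrative:

```cpp
#ifndef HeterogeneousCore_CUDAUtilities_cudastdAlgorithm_h
#define HeterogeneousCore_CUDAUtilities_cudastdAlgorithm_h

// Defines __host__/__device__ for nvcc and #defines them away otherwise;
// requires <use name="cuda"/> in the BuildFile so the header is found.
#include <cuda_runtime.h>

namespace cuda_std {
  // compiles in .cc (g++/clang++) and .cu (nvcc) translation units alike
  template <typename T>
  __host__ __device__ constexpr const T& min(const T& a, const T& b) {
    return b < a ? b : a;
  }
}

#endif
```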

@VinInn (Author):

so what is the decision?

@VinInn commented Jul 31, 2018:

Strictly speaking, clients of HeterogeneousCore/CUDAUtilities should use it (in tests as well).


I've tried to change the #defines, but I ran into link errors: my guess is that some code sees __host__ __device__ as __attribute__((host)) __attribute__((device)), while other code sees an empty #define, and the two symbols do not match...

I think the best solution would be either to #include <cuda_runtime.h>, or to patch the CUDA API wrappers to include that one instead of some internal CUDA includes.

@VinInn (Author):

I moved to #include <cuda_runtime.h>
and fixed the other `__NVCC__`.

#ifndef SimTrackerTrackerHitAssociationClusterHeterogeneousProduct_H
#define SimTrackerTrackerHitAssociationClusterHeterogeneousProduct_H

#ifndef __NVCC__

please check for __CUDACC__ rather than __NVCC__
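The distinction, as a comment-only sketch: __CUDACC__ says the translation unit is being compiled as CUDA, which is what a shared header needs to know; __NVCC__ merely says the nvcc driver is involved.

```cpp
#ifdef __CUDACC__
// Compiled as CUDA (nvcc processing a .cu file): __host__/__device__
// attributes, kernel launch syntax, and device code are available.
#else
// Compiled by a plain host compiler, or by nvcc passing a pure C++ file
// through to the host compiler: provide host-only declarations.
#endif
```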


namespace trackerHitAssociationHeterogeneousProduct {

#ifndef __NVCC__

ditto

ClusterSLGPU * gpu_d=nullptr;
};

#ifndef __NVCC__

ditto

@makortel commented Aug 1, 2018

Fixed in #111.

fwyzard later referenced this pull request in a series of commits (Oct 2020 to Apr 2021): "Implement a heterogeneous Cluster-to-TrackingParticle associator running on the GPU."