Follow up to the Pixel alpaka migration #43796

Open · 2 tasks
fwyzard opened this issue Jan 26, 2024 · 15 comments

fwyzard (Contributor) commented Jan 26, 2024

This issue keeps track of the items left open after the migration of the pixel reconstruction from CUDA to Alpaka:

General comments

  • Remove the legacy CUDA modules and data formats

  • Remove "Alpaka" from the names of the modules, after the CUDA ones are retired

cmsbuild (Contributor) commented Jan 26, 2024

cms-bot internal usage

cmsbuild (Contributor)

A new issue was created by @fwyzard (Andrea Bocci).

@antoniovilela, @Dr15Jones, @sextonkennedy, @makortel, @smuzaffar, @rappoccio can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

fwyzard (Contributor, Author) commented Jan 26, 2024

Follow-up to comments from #41285:

  • Address the TODO in DataFormats/SiPixelClusterSoA/interface/ClusteringConstants.h
    See this discussion.

  • Understand why we need using namespace pixelTopology; in DataFormats/TrackingRecHitSoA/src/classes.h and DataFormats/TrackingRecHitSoA/src/alpaka/classes_*.h.

  • Move the queue parameter to the beginning of the parameter list, and pass it by reference if possible (see the sketch after this list).
    See here, here and here.

  • Optimise the copies and synchronisations in DataFormats/TrackingRecHitSoA/interface/TrackingRecHitsDevice.h
    See here and here.

  • Find a better way to create CPU-only module configurations.
    See also #43780.

  • Remove duplicate and commented out code.
    For example, see here.
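
As a sketch of the queue-parameter convention discussed in the third item above (the type and function names below are illustrative only, not the actual CMSSW interfaces):

    #include <cstdint>

    namespace example {
      struct Queue {};       // stand-in for ALPAKA_ACCELERATOR_NAMESPACE::Queue
      struct HitsDevice {};  // stand-in for a device-side SoA collection

      // current style: the queue comes last and is taken by value
      HitsDevice makeHitsAsyncOld(uint32_t nHits, int32_t offsetBPIX2, Queue queue);

      // proposed style: the queue comes first and is taken by reference
      HitsDevice makeHitsAsync(Queue& queue, uint32_t nHits, int32_t offsetBPIX2);
    }  // namespace example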

fwyzard (Contributor, Author) commented Jan 26, 2024

Follow-up to comments from #41286:

  • Figure out why we need using namespace pixelTopology; in DataFormats/TrackSoA/src/classes.h.
    See here.

  • In DataFormats/TrackSoA/test/alpaka/TrackSoAHeterogeneous_test.dev.cc, is it intentional to fill the values beyond nTracks()?
    See here.

  • Consider abstracting the pattern of using a direct assignment on the serial CPU backend and an atomic CAS on all parallel backends, so that the #ifdef can be avoided in the algorithm code (see the sketch after this list).
    See here.

  • Find a better alternative than crashing on recursion overflow in RecoTracker/PixelSeeding/plugins/alpaka/CACell.h.
    See here.

  • Move RecoTracker/PixelTrackFitting/plugins/PixelTrackDumpAlpaka.cc under test/, and find a more appropriate name.
    See here.

  • Simplify the use of a helper class with static methods in DataFormats/TrackSoA/interface/alpaka/TrackUtilities.h.
    All the static functions operating on TrackSoAConstView could be moved to the global namespace of PixelTrackLayout.h along the lines of

    template <typename TrackerTraits>
    ALPAKA_FN_HOST_ACC ALPAKA_FN_INLINE constexpr float charge(const TrackSoAConstView<TrackerTraits>& tracks, int32_t i) {
    ...
    }

    after which they could be used simply as

    float ch = charge(tracksoa, i);

    instead of

    float ch = TracksUtilities<TrackerTraits>::charge(tracksoa, i);

    Unfortunately, with this approach the compiler fails to deduce the template arguments. Maybe we can find a smarter workaround...
    This seems to work, but maybe it is too generic (until we have concepts)?

    template <typename TrackSoAViewT>
    ALPAKA_FN_HOST_ACC ALPAKA_FN_INLINE static constexpr float charge(TrackSoAViewT const& tracks, int32_t i) {
        float v = tracks[i].state()(2);
        return float((0.0f < v) - (v < 0.0f));
    }

    See the discussion here.
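
A possible shape for the abstraction mentioned above (direct assignment on the serial backend vs atomic CAS on parallel backends), as a rough sketch: the backend-dependent choice is hidden behind a single device-side helper, so the #ifdef lives in one central place instead of in every kernel. The helper name and the is_serial_backend_v trait are made up for illustration; a real implementation would provide the trait centrally, e.g. in HeterogeneousCore/AlpakaInterface.

    #include <cstdint>
    #include <type_traits>

    #include <alpaka/alpaka.hpp>

    // Hypothetical trait: true only for the fully serial CPU accelerator.
    template <typename TAcc>
    inline constexpr bool is_serial_backend_v =
    #ifdef ALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED
        std::is_same_v<TAcc, alpaka::AccCpuSerial<alpaka::Dim<TAcc>, alpaka::Idx<TAcc>>>;
    #else
        false;
    #endif

    // Set *address to `value` only if it still holds `expected`: a plain load/store on
    // the serial backend, an atomic compare-and-swap on all parallel backends.
    template <typename TAcc>
    ALPAKA_FN_ACC inline void storeIfEqual(TAcc const& acc, uint32_t* address, uint32_t expected, uint32_t value) {
      if constexpr (is_serial_backend_v<TAcc>) {
        if (*address == expected)
          *address = value;
      } else {
        alpaka::atomicCas(acc, address, expected, value, alpaka::hierarchy::Blocks{});
      }
    }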

fwyzard (Contributor, Author) commented Jan 26, 2024

Follow-up to comments from #41287:

  • Find a common approach to avoid host-to-device copies when the device is actually the CPU.
    See this discussion regarding:
        if constexpr (std::is_same_v<Device, alpaka::DevCpu>) {
            event.emplace(deviceToken_, std::move(hostProduct));
        } else {
            BeamSpotDevice deviceProduct{event.queue()};
            alpaka::memcpy(event.queue(), deviceProduct.buffer(), hostProduct.const_buffer());
            event.emplace(deviceToken_, std::move(deviceProduct));
        }

This could be addressed at the same time as we introduce support for unified memory; a possible shape for such a helper is sketched below.
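
One possible shape for such a common helper, sketched under the assumption that device products are constructible from a queue and expose buffer()/const_buffer() like the existing portable collections (the helper name is hypothetical):

    #include <type_traits>
    #include <utility>

    #include <alpaka/alpaka.hpp>

    // Return either the host product itself (when the device is the CPU) or a newly
    // allocated device product filled with an asynchronous copy (on GPU backends).
    template <typename TDeviceProduct, typename TQueue, typename THostProduct>
    auto moveOrCopyToDevice(TQueue& queue, THostProduct&& hostProduct) {
      if constexpr (std::is_same_v<alpaka::Dev<TQueue>, alpaka::DevCpu>) {
        // host and device memory are the same: pass the host product through
        return std::forward<THostProduct>(hostProduct);
      } else {
        // allocate the device product and schedule the host-to-device copy
        TDeviceProduct deviceProduct{queue};
        alpaka::memcpy(queue, deviceProduct.buffer(), hostProduct.const_buffer());
        return deviceProduct;
      }
    }

With a helper along these lines the producer code above could collapse to a single call, e.g. event.emplace(deviceToken_, moveOrCopyToDevice<BeamSpotDevice>(event.queue(), std::move(hostProduct))).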

makortel (Contributor)

assign heterogeneous, reconstruction

@cms-sw/trk-dpg-l2 @cms-sw/tracking-pog-l2

cmsbuild (Contributor)

New categories assigned: heterogeneous,reconstruction

@fwyzard, @jfernan2, @makortel, @mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

fwyzard (Contributor, Author) commented Jan 26, 2024

Follow-up to comments from #41288:

  • Remove "alpaka" from the name of the SoA collections, and rename "host"/"device" to "reference"/"target".
    See the discussion here.

fwyzard (Contributor, Author) commented Jan 26, 2024

Follow-up to comments from #43294:

  • Implement a general mechanism to add a CPU-only version of an @alpaka module.
    Replace the current ad-hoc implementations, e.g.

    pixelVerticesAlpakaSerial = pixelVerticesAlpaka.clone(
        pixelTrackSrc = 'pixelTracksAlpakaSerial',
        alpaka = None
    )
    pixelVerticesAlpakaSerial._TypedParameterizable__type = 'alpaka_serial_sync::' + pixelVerticesAlpaka._TypedParameterizable__type.removesuffix('@alpaka')

    with a general-purpose function.

  • Implement pixel-only triplet workflows

  • Decide if we need to implement CPU-only workflows (e.g. .401).
    These could be identical to the .402 workflows, using the cmsDriver.py --accelerators cpu option.
    During the "HLT developments with GPU" meeting on 29/01/2024 we discussed this and decided against creating the CPU-only workflows.

  • Add the validation workflows to Configuration/PyReleaseValidation/python/relval_gpu.py, and configure the bot to run them.

AdrianoDee (Contributor)

(@fwyzard feel free to include it in the comment above; I would then delete this one.)

AdrianoDee (Contributor) commented Jan 30, 2024

fwyzard (Contributor, Author) commented Feb 9, 2024

Issues found while working on the follow-up to the Alpaka integration in CMSSW, #43853:

  • handle size --> size+1 when creating a SiPixelClustersHost from a SiPixelClustersDevice
  • move digi errors to the front of the collection

    @@ ...
    -          err->push_back(SiPixelErrorCompact{rawId, ww, error, fedId});
    +          err[gIndex].pixelErrors() = SiPixelErrorCompact{rawId, ww, error, fedId};
    +          alpaka::atomicInc(acc, &err.size(), 0xffffffff, alpaka::hierarchy::Blocks{});
  • rewrite all the code using for_each_element_in_block_strided and the other legacy strided-loop helpers (see the sketch after this list):
    • element_index_range_in_block
    • element_index_range_in_grid
    • for_each_element_in_block_strided
    • for_each_element_in_grid_strided
    • next_valid_element_index_strided
    in the following files:
    • RecoLocalTracker/SiPixelRecHits/plugins/alpaka/PixelRecHits.h
    • RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToClusterKernel.dev.cc
    • RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/CalibPixel.h
    • RecoTracker/PixelSeeding/plugins/alpaka/CAPixelDoubletsAlgos.h
    • RecoTracker/PixelSeeding/plugins/alpaka/CAFishbone.h
    • RecoTracker/PixelSeeding/plugins/alpaka/CAHitNtupletGeneratorKernelsImpl.h
  • Make the naming of the SoA, Layout, etc. consistent regarding TrackingRecHit vs TrackingRecHits
  • Make all include guards consistent with the file names
  • Re-read and simplify RecoTracker/PixelSeeding/plugins/alpaka/CAHitNtupletGenerator.cc
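
As a sketch of the kind of rewrite intended for the strided-loop helpers listed above, assuming the range-based uniform_elements helper from HeterogeneousCore/AlpakaInterface/interface/workdivision.h as the replacement (the loop body is made up):

    // before: callback-style helper
    cms::alpakatools::for_each_element_in_grid_strided(acc, nHits, [&](uint32_t i) {
      charge[i] = 0.f;
    });

    // after: range-based loop over the same elements
    for (auto i : cms::alpakatools::uniform_elements(acc, nHits)) {
      charge[i] = 0.f;
    }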

AdrianoDee (Contributor) commented Aug 25, 2024

  • The VertexSoA has a single index for both the tracks and the vertices, so it needs to be sized for the larger of the two (the tracks, of course). This means we are wasting memory (this is also true for Phase1). We need to split the two indices.

fwyzard (Contributor, Author) commented Aug 26, 2024

> • The VertexSoA has a single index for both the tracks and the vertices, so it needs to be sized for the larger of the two (the tracks, of course). This means we are wasting memory (this is also true for Phase1). We need to split the two indices.

@ericcano implemented this in #43952; I thought the interface was too cumbersome and would like to develop something more user-friendly, but if the impact is already measurable we could go ahead with the proposed implementation and iterate on it further in the future.

makortel (Contributor)

> • The VertexSoA has a single index for both the tracks and the vertices, so it needs to be sized for the larger of the two (the tracks, of course). This means we are wasting memory (this is also true for Phase1). We need to split the two indices.
>
> @ericcano implemented this in #43952; I thought the interface was too cumbersome and would like to develop something more user-friendly, but if the impact is already measurable we could go ahead with the proposed implementation and iterate on it further in the future.

Just to write out loud a possible alternative: implement ZVertex{Host,Device}SoA with two PortableCollection members for the arrays with different numbers of elements. I'm not sure, though, whether this approach would be significantly simpler (except maybe for the dictionaries). If the complexity lies more in how the Views are used, I'd expect an abstraction with a simpler interface to be equally feasible in both approaches.
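
A rough sketch of that alternative, with purely illustrative names (these are not the actual CMSSW data formats, and only a handful of columns is shown): the vertex-indexed and the track-indexed arrays live in two separate portable collections, each sized for its own maximum multiplicity.

    #include <cstdint>

    #include "DataFormats/Portable/interface/PortableHostCollection.h"
    #include "DataFormats/SoATemplate/interface/SoALayout.h"

    // columns indexed by vertex
    GENERATE_SOA_LAYOUT(VertexColumnsLayout,
                        SOA_COLUMN(float, zv),
                        SOA_COLUMN(float, wv),
                        SOA_COLUMN(float, chi2),
                        SOA_SCALAR(uint32_t, nvFinal))

    // columns indexed by track
    GENERATE_SOA_LAYOUT(TrackColumnsLayout,
                        SOA_COLUMN(int16_t, idv))

    using VertexColumnsSoA = VertexColumnsLayout<>;
    using TrackColumnsSoA = TrackColumnsLayout<>;

    // Host-side wrapper; a device-side counterpart would hold two PortableDeviceCollection
    // members instead, and the dictionaries would cover the two collections separately.
    class ZVertexHostExample {
    public:
      template <typename TQueue>
      ZVertexHostExample(int32_t maxVertices, int32_t maxTracks, TQueue const& queue)
          : vertices_{maxVertices, queue}, tracks_{maxTracks, queue} {}

      VertexColumnsSoA::View& verticesView() { return vertices_.view(); }
      TrackColumnsSoA::View& tracksView() { return tracks_.view(); }

    private:
      PortableHostCollection<VertexColumnsSoA> vertices_;
      PortableHostCollection<TrackColumnsSoA> tracks_;
    };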
