
Track parameter estimation for cpu and cuda #85

Merged
merged 3 commits into acts-project:main on Dec 10, 2021

Conversation

@beomki-yeo (Contributor) commented Sep 7, 2021:

traccc_pr_75 should be merged before this PR.

This PR adds track parameter estimation from the found seeds.
The algorithm is almost a copy-and-paste of EstimateTrackParamsFromSeed in Acts; the one difference is that the local position on the surface is obtained directly from the measurement, while the Acts version obtains it with a global-to-local transformation and a surface object.

There are a couple of features and issues that people might be interested in:

  1. The size of Eigen-based classes/structures is measured differently by the CPU and CUDA compilers, which makes data transfer unstable and incorrect. The data size is therefore synchronized by adding an alignment qualifier (see core/include/definitions/qualifiers.hpp; a sketch follows this list).
  2. I initially put a std::optional<measurement> member in spacepoint to refer to when obtaining the local track parameters, but it turned out that std::optional doesn't work in CUDA device code. Does anyone know how to make it work?
  3. The algebra used in the track parameter estimation relies on algebra-plugins, but it doesn't seem to work with the array plugin, since the functions in algebra-plugins do not have host/device qualifiers.
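
To illustrate point 1, a minimal sketch of what such an alignment qualifier could look like; the actual macro name and alignment values in qualifiers.hpp may differ:

#if defined(__CUDACC__)
#define TRACCC_ALIGN(x) __align__(x)
#else
#define TRACCC_ALIGN(x) alignas(x)
#endif

// Forcing one explicit alignment makes the struct's size and layout agree
// between the host and device compilers, so bulk host<->device copies of
// arrays of these objects stay well-defined.
struct TRACCC_ALIGN(16) aligned_track_params {
    float vector[6];      // bound parameter vector
    float covariance[36]; // 6x6 covariance matrix
};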

Meanwhile, I have observed that seed finding doesn't work with the eigen plugin and the double scalar type... I will look into that and make another PR to resolve it.

TO-DO list:

  1. measure the time for cpu and cuda
  2. optimization

Update on Sep 22

This can be merged after #92

@beomki-yeo changed the title from "Track parameter estimation for cpu and cuda" to "WIP: Track parameter estimation for cpu and cuda" on Sep 7, 2021
@krasznaa (Member) left a comment:

I already made a lot of comments; I'll stop here for now. I think we'd need to sort out what we want to do with the container classes first, since right now I just don't think they add any value over using the underlying vecmem types directly.

core/include/edm/collection.hpp (comments resolved)
core/include/seeding/track_params_estimation_helper.hpp (comments resolved)
@@ -55,7 +55,7 @@ struct seed_finding {
/// Callable operator for the cuda seed finding
///
/// @return seed_container is the vecmem seed container
host_seed_container operator()(
host_seed_container& operator()(
Member:

Wait... what? Are you returning a permanent object all of a sudden? This does not seem thread-safe at all... 🤔

@beomki-yeo (Contributor Author):

The reason for this change was to use the output of one algorithm directly as the input of the next. With the copy method, the memory resource is missing, so the output cannot be used directly by the next algorithm:

auto output = cuda_algorithm(input); // output cannot be used as an input for the next algorithm because its memory resource is missing

But since, as you mentioned, it is not thread-safe, I reverted to the copy method with a small modification:

algorithm::output_type output(mr);
output = cuda_algorithm(input);

Is that okay with you?

Member:

Sure. (Sorry for the late answer... 😦)

track_params_estimation(vecmem::memory_resource* mr)
: m_mr(mr), params(mr) {}

output_type& operator()(input_type& i) {
Member:

Again, returning a non-const reference to a member object seems terribly thread-unsafe. I would've just opted for copy elision on return in this function...

@beomki-yeo (Contributor Author):

Answered above.

@@ -22,7 +22,9 @@ class seeding_algorithm {
using output_type =
std::pair<host_internal_spacepoint_container, host_seed_container>;

seeding_algorithm(vecmem::memory_resource* mr = nullptr) : m_mr(mr) {
seeding_algorithm(vecmem::memory_resource* mr)
Member:

I guess this is not the only place where this pattern shows up, but this is where I'll comment about it...

We should only pass the memory resources around by pointer if it is allowed to pass a null pointer. In this case it's not really allowed anymore. So it would make for a better interface (in my opinion) to receive the memory resource as a reference. (vecmem::memory_resource&)

It is indeed annoying that the std classes receive memory resources through pointers, since they do allow the pointers to be invalid (in which case they fall back on a default resource). But with vecmem, in most cases we just need to pass explicitly created memory resources around. (Otherwise compilation on macOS can fail, as the Apple Clang compiler doesn't handle "default memory resources" quite correctly. Which provides a pretty good debugging tool for vecmem in the end. 😛)

@beomki-yeo (Contributor Author):

Okay!

@beomki-yeo (Contributor Author):

Hey, wait. If the input is a reference, it's no longer polymorphic over different memory_resource types.

@beomki-yeo (Contributor Author):

Do you just want it to be (vecmem::cuda::memory_resource& mr) for the cuda algorithm and (vecmem::sycl::memory_resource& mr) for the sycl algorithm?

Member:

References in C++ are the same as pointers in terms of polymorphism, so this should not be a problem. 🙂
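
For illustration, a minimal sketch of base-class-reference polymorphism with vecmem resources (use_resource is a hypothetical function, not traccc code):

#include <vecmem/memory/host_memory_resource.hpp>
#include <vecmem/memory/memory_resource.hpp>

// The base-class reference binds to any concrete resource, and virtual
// dispatch works through it exactly as it would through a pointer.
void use_resource(vecmem::memory_resource& mr) {
    void* p = mr.allocate(64);  // dispatches to the derived implementation
    mr.deallocate(p, 64);
}

int main() {
    vecmem::host_memory_resource host_mr;
    use_resource(host_mr);  // a cuda/sycl resource could be passed the same way
    return 0;
}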

@beomki-yeo (Contributor Author):

OK, thanks. The changes are applied.


private:
output_type params;
std::shared_ptr<traccc::cuda::track_params_estimation> tp;
Member:

Is it really necessary to use a shared pointer here? Remember, these things are not cheap.

Member:

I don't think this is necessarily a problem here; this is a pointer to a function object, so it is unlikely to be replaced very often. Since std::shared_ptr has zero de-referencing overhead, I don't foresee any performance degradation here.

Whether it is really necessary is debatable, but I can see some benefit to this approach given that function objects in C++ are often assumed to be copyable. Using shared pointers here can feasibly reduce the overhead of that copying. But perhaps Beomki had different reasons in mind?
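
To illustrate the copyability point, a minimal sketch with fully hypothetical types (heavy_kernel_runner and algorithm_wrapper are not traccc code):

#include <memory>

// A heavyweight function object that is not copyable; wrapping it in a
// std::shared_ptr makes the enclosing algorithm cheaply copyable, since
// copies only bump the reference count and share one instance.
struct heavy_kernel_runner {
    heavy_kernel_runner() = default;
    heavy_kernel_runner(const heavy_kernel_runner&) = delete;
    int operator()(int x) const { return 2 * x; }
};

class algorithm_wrapper {
public:
    algorithm_wrapper() : m_impl(std::make_shared<heavy_kernel_runner>()) {}
    int operator()(int x) const { return (*m_impl)(x); }

private:
    std::shared_ptr<heavy_kernel_runner> m_impl;  // shared, never deep-copied
};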

@beomki-yeo (Contributor Author):

@stephenswat is right. These algorithm objects are generated only once, before the actual algorithm runs. They could also be normal objects, but I was too lazy to write a copy operator for the algorithm classes. If it doesn't introduce any overhead, we can just keep it this way.

Member:

Just as for traccc::track_params_estimation_algorithm, I don't see what's the purpose of this class. 😕 What interface does it add on top of traccc::cuda::track_params_estimation?

Also, why does it not inherit from the traccc::algorithm base class? 😕

@beomki-yeo changed the title from "WIP: Track parameter estimation for cpu and cuda" to "Track parameter estimation for cpu and cuda" on Sep 7, 2021
@beomki-yeo (Contributor Author):

I updated algebra_plugins to the latest commit, since the rotate function of the Eigen algebra is missing in the currently pinned version.

@beomki-yeo (Contributor Author):

The current PR compiles, but there is a runtime error in the CUDA track parameter estimation.
It should work once algebra_plugin_PR17, which adds inline to the Eigen functions, is merged.

@beomki-yeo (Contributor Author):

A naive estimate shows that the average track parameter estimation time is:
cpu: ~0.0009 sec
cuda: ~0.0012 sec

So cuda is unfortunately slightly slower than cpu. It's not a serious problem, since the cuda seed finding time is usually 0.05 sec, but let me profile and optimize it a bit for better performance.

@stephenswat (Member):

If we're talking about 1.2 milliseconds here we're basically teetering on the border of what is achievable with memory transfer overhead and launch delays. Just following Amdahl's law I wouldn't necessarily recommend spending too much time optimizing this part of the code. Great job! 😄

@beomki-yeo (Contributor Author):

@stephenswat OK. I will check whether the results match the Acts track parameter estimation before this PR goes in. Will let you know once it is done.

@beomki-yeo (Contributor Author):

I added a test file, tests/cpu/algorithms/compare_with_acts_seeding.cpp, which compares the results of the acts and traccc seeding algorithms. Since I needed a hit data file, I opened traccc-data_MR1 in the traccc-data repository.

@asalzburger, or anyone else who has write access to traccc-data, can you review the MR?

@beomki-yeo (Contributor Author) commented Sep 13, 2021:

Other than traccc-data_MR1, I also need algebra-plugin_PR17 to be merged to make this PR work (@niermann999, or could anybody have a look?)

@beomki-yeo (Contributor Author):

Oh, I made the MR in the wrong repository. I moved it to the official one: traccc_data_mr_1

@beomki-yeo (Contributor Author):

Are there more comments on this?

@krasznaa (Member) left a comment:

I'm still very much in favour of this development, but I think the code could be simplified a bit in a number of places...

const seedfinder_config m_seedfinder_config;
const seedfilter_config m_seedfilter_config;
multiplet_estimator m_estimator;
seed_filtering m_seed_filtering;

vecmem::memory_resource& m_mr;
Member:

I know, I know... I keep harping on about this one issue... 😦

In my mind a class that uses vecmem::memory_resource should look like:

class my_class {

public:
   my_class( ..., vecmem::memory_resource& mr, ... ) : ..., m_mr( &mr ), ...

   my_class( const my_class& ) = default;
   my_class( my_class&& ) = default;

   my_class& operator=( const my_class& ) = default;
   my_class& operator=( my_class&& ) = default;

   vecmem::memory_resource& memory_resource() { return *m_mr; }
   const vecmem::memory_resource& memory_resource() const { return *m_mr; }

private:
   vecmem::memory_resource* m_mr;
};

So while its public interface should receive and hand out memory resources by reference (showing that the memory resource is certainly an existing one), internally it should use a pointer. This is because a reference member makes objects of the class non-copyable, since references cannot be reseated after creation. But if the member is a simple pointer, the default copy/move constructors/operators work just fine for the class.

Member:

As I had to realise through acts-project/vecmem#114, it's probably an even better idea to use std::reference_wrapper<vecmem::memory_resource> as the variable that "points at" the memory resource. As that both ensures that we would initialise it correctly under all circumstances with a reference, and also allows us to copy/assign our own objects with such variables.

https://en.cppreference.com/w/cpp/utility/functional/reference_wrapper

So in all the places where I was complaining about vecmem::memory_resource member variables, the ideal solution seems to be switching to std::reference_wrapper<vecmem::memory_resource>.
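
A minimal sketch of the resulting pattern (my_algorithm is a hypothetical class name):

#include <functional>

#include <vecmem/memory/memory_resource.hpp>

// The wrapper must be bound to a real resource at construction, yet it can
// be rebound by assignment, so the default copy/move constructors and
// assignment operators all keep working.
class my_algorithm {
public:
    explicit my_algorithm(vecmem::memory_resource& mr) : m_mr(mr) {}

    vecmem::memory_resource& memory_resource() const { return m_mr.get(); }

private:
    std::reference_wrapper<vecmem::memory_resource> m_mr;
};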

@beomki-yeo (Contributor Author):

Okay, reference_wrapper would definitely be better. Fixed now.

@@ -0,0 +1,26 @@
/** TRACCC library, part of the ACTS project (R&D line)
Member:

This is absolutely a question of convention... But I would rather call this file track_params_estimating.hpp. I think we should only use the .cuh extension on files that have actual CUDA code in them. Like a template class that launches kernels. (Which is not a good design idea, I just brought it up as an example.)

I myself took to organising my own code like the following recently:

  • helper_function.hpp: Declaration of the helper function, using regular C++;
  • helper_function.cu: Implementation of the helper function, using the CUDA runtime API, and implementing device code.

In such a setup one can come up with a relatively simple convention: .cuh header files must only be included by .cu source files, as .cpp source files would not be able to interpret them, while .hpp header files can be included by both .cpp and .cu files.
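
A sketch of that split, written out as two files in one listing (helper_function, double_elements and the launch parameters are illustrative, not actual traccc code):

// helper_function.hpp -- plain C++ declaration, includable from .cpp and
// .cu files alike.
#pragma once
#include <cstddef>

namespace traccc::cuda {
void helper_function(int* device_array, std::size_t size);
}

// helper_function.cu -- implementation; only nvcc ever sees the CUDA syntax.
#include "helper_function.hpp"

__global__ void double_elements(int* data, std::size_t size) {
    std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size) {
        data[i] *= 2;
    }
}

void traccc::cuda::helper_function(int* device_array, std::size_t size) {
    double_elements<<<(size + 255) / 256, 256>>>(device_array, size);
}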

@beomki-yeo (Contributor Author):

I see, if that's the usual convention. I changed all the .cuh files to .hpp.

namespace cuda {

/// track parameter estimation for cuda
class track_params_estimation {
Member:

🤔 I'm not sure what would be the best names, but traccc::cuda::track_params_estimating and traccc::cuda::track_params_estimation are way too similar.

But I'm also not sure what role this class is playing in general. 😕 Was this necessary for the previous implementation of how algorithms worked? In the current setup I think this is an unnecessary middle layer between traccc::track_params_estimation_algorithm and traccc::cuda::track_params_estimating.

@beomki-yeo (Contributor Author):

Right. I removed track_params_estimating since, as you mentioned, it was confusing. Instead, I implemented operator() directly in track_params_estimation.cu, which looks much better now.

On the algorithm part, which I will also comment on below: I removed track_params_estimation_algorithm, as it does not seem to be needed for the moment. The initial motivation for the algorithm version was to set configuration parameters without touching the original track_parameter_estimation in the core directory, but it is apparent that we don't need it, because track parameter estimation has no configuration parameters for the moment.

/// @param resource vecmem memory resource
void track_params_estimating(host_seed_container& seeds,
host_bound_track_parameters_collection& params,
vecmem::memory_resource* resource);
Member:

This should rather be a vecmem::memory_resource& reference.

@beomki-yeo (Contributor Author):

Right. Anyway, this file is removed now.


namespace traccc {

class track_params_estimation_algorithm
Member:

I'm continuing to be confused...

Is this class supposed to (eventually) decide between using the host or the CUDA implementation? If not, why does it exist? What additional value does it give on top of using traccc::track_params_estimation directly?

I'm probably missing something, but the naive UI in my head would be to have:

  • traccc::track_params_estimation_algorithm: An "algorithm" for running the track parameter estimation on the host.
  • traccc::cuda::track_params_estimation_algorithm: An "algorithm" for running the track parameter estimation using CUDA.

I think some of the layers introduced in this PR are not actually necessary. 🤔

@beomki-yeo (Contributor Author):

As I explained above, these are removed now



Comment on lines 54 to 71
traccc::clusterization_algorithm::input_type ca_input(cells_per_event);
traccc::clusterization_algorithm::output_type ca_output;

ca_output = ca(ca_input);

auto& measurements_per_event = ca_output.first;
auto& spacepoints_per_event = ca_output.second;

/*-----------------------
Seeding algorithm
-----------------------*/

traccc::seeding_algorithm::input_type sa_input(spacepoints_per_event);
traccc::seeding_algorithm::output_type sa_output;

sa_output = sa(sa_input);

auto& internal_sp_per_event = sa_output.first;
auto& seeds = sa_output.second;

/*----------------------------
Track params estimation
----------------------------*/

traccc::track_params_estimation_algorithm::input_type tp_input(seeds);
traccc::track_params_estimation_algorithm::output_type tp_output;

tp_output = tp(tp_input);

auto& params = tp_output;
Member:

With the updated algorithm semantics this could surely be simplified quite a bit. Most of these lines should not be needed anymore.

@beomki-yeo (Contributor Author):

Done
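
For reference, a sketch of how the chain above could collapse under the new semantics; whether each input must be moved depends on the final operator() signatures:

auto ca_output = ca(std::move(cells_per_event));
auto& measurements_per_event = ca_output.first;
auto& spacepoints_per_event = ca_output.second;

auto sa_output = sa(std::move(spacepoints_per_event));
auto& seeds = sa_output.second;

auto params = tp(std::move(seeds));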

Comment on lines 342 to 349
// Google Test can be run manually from the main() function
// or, it can be linked to the gtest_main library for an already
// set-up main() function primed to accept Google Test test cases.
int main(int argc, char** argv) {
::testing::InitGoogleTest(&argc, argv);

return RUN_ALL_TESTS();
}
Member:

The test should rather be linked against GTest::gtest_main than implement the main() function ourselves.

@beomki-yeo (Contributor Author):

Okay

@beomki-yeo (Contributor Author):

I simply removed int main(...), since the test was already linked to gtest. Not sure if that is what you meant, though.

@beomki-yeo (Contributor Author) commented Sep 29, 2021:

I applied the updated algorithm semantics to the seed finding as well as the track parameter estimation, even though that was not the initial purpose of this PR. I used rvalue references for the cuda algorithms, since const lvalue references don't work for vecmem containers. Maybe @stephenswat can look over the code to check whether I followed the intention of the semantics correctly.

I didn't adapt spacepoint_grouping to the updated algorithm semantics, but we can leave it as it is, because I will make another PR to remove it and rewrite the algorithm with detray::grid2 once acts-project/detray#96 is merged.

@beomki-yeo (Contributor Author):

Squashed into one commit

@stephenswat added the "feature" (New feature or request) label on Nov 11, 2021
@beomki-yeo (Contributor Author) commented Nov 12, 2021:

Rebased on the main branch.
With #103, the seq_example now runs for both the eigen and array plugins.
However, this PR does not generate executables for the different plugins from a single compilation; the cmake options need to be changed to get an executable for a different plugin. (The default option prioritizes the eigen plugin.)

Even though this is not very desirable, let's make that happen in a future PR... :)

pinging @krasznaa @stephenswat

p.s. the overall run time does not change much between the array and eigen plugins

@stephenswat (Member) left a comment:

Lots of good work here overall! It feels like there is a lot of unrelated work in this PR though - is that here for a reason or did it get included by accident? I also have some other comments, but otherwise I think this should be close to ready. 😄

m.variance = var / totalWeight;
m.variance = 1 / totalWeight * var;
Member:

Is there a reason to make this change? It seems to me like it would be slower and less numerically accurate. 😦

@beomki-yeo (Contributor Author):

It is just because operator/ is not defined for the array plugin in algebra-plugins.

@beomki-yeo (Contributor Author):

I will mark it with a TODO for a later change.
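
For instance, the marker could read something like this (a suggestion, not the committed code):

// TODO: revert to `m.variance = var / totalWeight;` once algebra-plugins
// defines operator/ for the array plugin; multiplying by the reciprocal is
// only a temporary workaround.
m.variance = 1 / totalWeight * var;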

Comment on lines -13 to +18
#include <cuda/seeding/doublet_counting.cuh>
#include <cuda/seeding/doublet_finding.cuh>
#include <cuda/seeding/seed_selecting.cuh>
#include <cuda/seeding/triplet_counting.cuh>
#include <cuda/seeding/triplet_finding.cuh>
#include <cuda/seeding/weight_updating.cuh>
#include <cuda/seeding/doublet_counting.hpp>
#include <cuda/seeding/doublet_finding.hpp>
#include <cuda/seeding/seed_selecting.hpp>
#include <cuda/seeding/triplet_counting.hpp>
#include <cuda/seeding/triplet_finding.hpp>
#include <cuda/seeding/weight_updating.hpp>
Member:

Why are we moving away from the .cuh extension? That's quite standard for CUDA programs. 😕

@beomki-yeo (Contributor Author):

I thought so too, but these files don't have any cuda-related code inside, so I decided to use the .hpp extension for them.

Member:

This was one of my recommendations: use .hpp on headers that can be included not only by .cu files, but by other C++ source files as well.

As I explained before, I like to use .cuh as a "contagious" extension. Marking something as .cuh should be done only when that particular header has code for launching a kernel, or when it includes some other .cuh file. In all other cases we should just use .hpp.

mutable std::shared_ptr<std::mutex> mutex{std::make_shared<std::mutex>()};
mutable host_doublet_counter_container doublet_counter_container;
mutable host_doublet_container mid_bot_container;
mutable host_doublet_container mid_top_container;
mutable host_triplet_counter_container triplet_counter_container;
mutable host_triplet_container triplet_container;
mutable host_seed_container seed_container;
Member:

I'm a little surprised to find this here. Are we relying on the internal state of our algorithms? I really think we should aim to have our algorithms be stateless!

@beomki-yeo (Contributor Author):

I know. Without these, I have to create the host doublet/triplet containers in the operator() function for every event, which makes the entire algorithm much slower (almost a 50% slowdown). Do you have a better idea for this?

device/cuda/include/cuda/seeding/seed_finding.hpp (comments resolved)
/// @return seed_collection is the vector of seeds per event
output_type operator()(
host_internal_spacepoint_container&& i) const override {
std::lock_guard<std::mutex> lock(*mutex.get());
Member:

If we really need to do locking (which I hope we don't!) I think this is probably a little coarse-grained. It will be hard to get any decent acceleration from parallelism this way.

@beomki-yeo (Contributor Author):

I also didn't want to make this change. The only way to remove the mutable internal members would be taking them as external parameters, but that is also coarse-grained.

@beomki-yeo (Contributor Author):

And I don't think we have to achieve more parallelization in the cuda operator(), which works at the grid-bin level.

Comment on lines 59 to 65
auto uvTransform = [](const vector3& local) -> vector2 {
vector2 uv;
scalar denominator = local[0] * local[0] + local[1] * local[1];
uv[0] = local[0] / denominator;
uv[1] = local[1] / denominator;
return uv;
};
Member:

This function doesn't actually act as a closure - maybe we can just define it as a free function? Or maybe move it the algebra plugins?

@beomki-yeo (Contributor Author):

Hmm, I don't recommend moving it into the algebra plugins.
I will just make it a free function in the same file.
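
A sketch of that free function, reusing the lambda's body; TRACCC_HOST_DEVICE is an assumed stand-in for whatever host/device qualifier macro core/include/definitions/qualifiers.hpp actually defines:

// Free-function version of the lambda above; vector2/vector3/scalar are the
// algebra-plugin types already used in this file.
inline TRACCC_HOST_DEVICE vector2 uv_transform(const vector3& local) {
    vector2 uv;
    const scalar denominator = local[0] * local[0] + local[1] * local[1];
    uv[0] = local[0] / denominator;
    uv[1] = local[1] / denominator;
    return uv;
}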

device/cuda/src/seeding/track_params_estimation.cu (comments resolved)
Comment on lines -19 to +22
class seeding_algorithm {
public:
using input_type = host_spacepoint_container&;
using output_type =
std::pair<host_internal_spacepoint_container, host_seed_container>;
class seeding_algorithm
: public algorithm<
std::pair<host_internal_spacepoint_container, host_seed_container>(
host_spacepoint_container&&)> {
Member:

Are these changes related to the track parameter seeding in some way or have they snuck into this PR?

@beomki-yeo (Contributor Author):

They snuck.

examples/run/cuda/seq_example_cuda.cpp (comments resolved)
std::reference_wrapper<vecmem::memory_resource> m_mr;

// mutable internal objects for multiplets
mutable std::shared_ptr<std::mutex> mutex{std::make_shared<std::mutex>()};
@beomki-yeo (Contributor Author):

Are people okay with these mutable objects and the mutex lock?
Without them, I have to declare the host doublet/multiplet containers in the operator() function for every event, which makes the entire algorithm much slower (almost a 50% slowdown).


output_type operator()(host_seed_container&& seeds) const override;

private:
std::reference_wrapper<vecmem::memory_resource> m_mr;
@beomki-yeo (Contributor Author):

This is something I don't understand well. What is a normal wrapper?


@beomki-yeo (Contributor Author):

is that here for a reason or did it get included by accident?

No good reason for that... :( I will try to keep future PRs more focused.

@beomki-yeo (Contributor Author):

Are there any more comments on this? Otherwise I would like to get it merged.

@beomki-yeo (Contributor Author):

@stephenswat @krasznaa Are there more issues that I need to address?

@stephenswat (Member) left a comment:

Looks fine to me - just two small things I would like to follow up on. When that is done we can check whether Attila or anyone else has any comments, and then I would be in favour of moving to merge. 😄

core/include/definitions/primitives.hpp (comment shown, resolved)
}

private:
std::reference_wrapper<vecmem::memory_resource> m_mr;
Member:

Not a huge deal, but I still don't see the need to use reference wrappers here as opposed to normal wrappers.

@beomki-yeo (Contributor Author):

That's something I'm not sure about; it was recommended by @krasznaa. I think he can explain the motivation to us.

@stephenswat (Member) left a comment:

I'll sign off on this for now, but I will leave just a bit of time for others to comment if they want.

@beomki-yeo (Contributor Author):

@stephenswat, it looks like there are no more comments. Can you get this in? We can come back to the wrapper question at any time in the future.

@beomki-yeo (Contributor Author):

it's squashed into one commit

@stephenswat (Member) left a comment:

Let's do it! 👍

@stephenswat merged commit af73a7b into acts-project:main on Dec 10, 2021