Make Clusterization Result Correct #197

Merged: 1 commit into acts-project:main on May 29, 2022

Conversation

beomki-yeo (Contributor) commented on May 22, 2022:

The clusterization result has been wrong, which made the seeding efficiency of seq_example drop to nearly zero.
I am glad that I finally fixed the problems (as far as I can detect).

There are three bugs that I fixed:

Bug 1. Wrong binning setup for clusterization

The data files that I updated in traccc-data were generated from the digitization configuration file (acts/Examples/Algorithms/Digitization/share/default-geometric-config-generic.json), which sets a different binning for each module.

Meanwhile, the traccc cell_module always had the same binning scheme:

// in cell.hpp
pixel_data pixel{-8.425, -36.025, 0.05, 0.05};

Thus, I extracted the binning information using Acts::GeometryHierarchyMap and applied it to pixel_data in csv.hpp.
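Roughly, the lookup works like the following sketch (hypothetical names; module_digi and the struct layouts are illustrative stand-ins, not the exact traccc definitions):

#include <Acts/Geometry/GeometryHierarchyMap.hpp>
#include <Acts/Geometry/GeometryIdentifier.hpp>

// Illustrative stand-ins for the traccc types involved.
struct pixel_data {
    float min_center_x, min_center_y, pitch_x, pitch_y;
};
struct module_digi {
    float min0, min1, pitch0, pitch1;  // per-module binning from the JSON
};

// Resolve the per-module segmentation, falling back to the old
// hard-coded default when the module is not in the map.
pixel_data get_pixel_data(
    const Acts::GeometryHierarchyMap<module_digi>& digi_map,
    Acts::GeometryIdentifier geo_id) {
    // find() returns the most specific entry (sensitive, layer,
    // volume, or global default) that matches this identifier.
    auto it = digi_map.find(geo_id);
    if (it != digi_map.end()) {
        return pixel_data{it->min0, it->min1, it->pitch0, it->pitch1};
    }
    return pixel_data{-8.425f, -36.025f, 0.05f, 0.05f};
}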

Bug 2. Cell module was shifted during reading in csv.hpp

There was still a mismatch between the header and items of cell_container: the cell module information of the next cell group was being recorded for the current cell group. It is fixed by replacing geometry_id with reference_id in the read_cells function.

// in csv.hpp
    while (creader.read(iocell)) {

        if (first_line_read and iocell.geometry_id != reference_id) {
            if (tfmap != nullptr) {
                // reference_id (the module of the cell group being
                // closed) must be used here, not iocell.geometry_id,
                // which already points at the next cell group.
                if (tfmap->contains(reference_id)) {
                    module.placement = (*tfmap)[reference_id];
                }
            }
            ...
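The underlying pattern is easy to get wrong, so here is a self-contained illustration of it with made-up data (not PR code): when streaming sorted (module id, cell) rows into groups, the group being flushed belongs to the previous id (reference_id), not to the row that triggered the flush.

#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

int main() {
    const std::vector<std::pair<std::uint64_t, int>> rows = {
        {1, 10}, {1, 11}, {2, 20}, {3, 30}, {3, 31}};

    std::uint64_t reference_id = 0;
    bool first_line_read = false;
    std::vector<int> group;

    auto flush = [&]() {
        // Correct: label the finished group with reference_id. Using
        // the current row's id here would attach the next module's
        // information to this group.
        std::cout << "module " << reference_id << ": " << group.size()
                  << " cells\n";
        group.clear();
    };

    for (const auto& [id, value] : rows) {
        if (first_line_read && id != reference_id) {
            flush();
        }
        group.push_back(value);
        reference_id = id;
        first_line_read = true;
    }
    if (!group.empty()) {
        flush();
    }
    return 0;
}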

Bug 3. Use the measurement-simhit-map for correct indexing

See #197 (comment)

Physics performance after fix

I will compare two cases:

  1. seq_example with the ttbar<140> cell file (clusterization + seeding)

  2. seeding_example with the ttbar<140> hit file (seeding)

beomki-yeo added the bug label on May 22, 2022.
beomki-yeo (Contributor, Author) commented:

One thing that I still don't understand is that some events of ttbar<200> and ttbar<300> give me terrible efficiency.
Precisely, the 5th event of ttbar<200> and many events of ttbar<300> have the problem.
All ttbar<20 - 140> samples look fine.

./bin/traccc_seq_example --detector_file=tml_detector/trackml-detector.csv --digitization_config_file=tml_detector/default-geometric-config-generic.json --cell_directory=tml_full/ttbar_mu200/ --hit_directory=tml_full/ttbar_mu200/ --particle_directory=tml_full/ttbar_mu200/ --events=1 --skip=4

beomki-yeo (Contributor, Author) commented:

Um, finally, I found the reason.
I was overlooking the existence of measurement-simhit-map.csv. In the past, I simply ignored it because I thought the measurement id and the hit id were always equal, which is wrong. That's why the seeding efficiency dropped beyond a seemingly random eta value.
Now it is fixed.
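For illustration, a minimal sketch of reading such a map (assuming the file has a header line and two comma-separated columns, measurement_id and hit_id; the actual reader in traccc may differ):

#include <cstdint>
#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Build the measurement id -> sim-hit id map, instead of assuming
// that the two ids are always equal.
std::map<std::uint64_t, std::uint64_t> read_meas_simhit_map(
    const std::string& path) {
    std::map<std::uint64_t, std::uint64_t> result;
    std::ifstream file(path);
    std::string line;
    std::getline(file, line);  // skip the header line
    while (std::getline(file, line)) {
        std::istringstream ss(line);
        std::string meas_id, hit_id;
        std::getline(ss, meas_id, ',');
        std::getline(ss, hit_id, ',');
        result[std::stoull(meas_id)] = std::stoull(hit_id);
    }
    return result;
}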

10 ttbar<200> events:

beomki-yeo (Contributor, Author) commented:

Could anybody review, please? :p

krasznaa (Member) left a comment:

Apart from questions about how tolerances are handled in the tests, I think this looks good.

I was very pleased to see that the SYCL code was propagating this info everywhere correctly already. Kudos to @konradkusiak97. I was absolutely expecting that we would've forgotten about it in at least 1-2 places. 😛

Depending on how you feel about setting up the tests, you could just go ahead and merge the PR like this. 😉

@@ -107,7 +107,7 @@ class ConnectedComponentAnalysisTests
         result.at(io_truth.geometry_id);

     traccc::scalar tol = std::max(
-        0.01, 0.0001 * std::max(io_truth.channel0, io_truth.channel1));
+        0.1, 0.0001 * std::max(io_truth.channel0, io_truth.channel1));
krasznaa (Member) commented:

😕 Did the tolerance worsen as a result of all of this? I thought if anything, this would make the test more accurate...

beomki-yeo (Contributor, Author) commented on May 28, 2022:

Well, the variance check below didn't pass after the changes:

            EXPECT_NEAR(match->variance[0], io_truth.variance0, tol);
            EXPECT_NEAR(match->variance[1], io_truth.variance1, tol);

TBH I don't know how this test works or how the data was created :/ But it's not surprising at all, as our variance estimation is not that robust. My guess is that regenerating the synthetic data with the changes would solve the problem here.

beomki-yeo (Contributor, Author) commented:

I think I figured out the reason.
The default pitch size of pixel_data is 1, which makes the reconstructed variance (match->variance[0] and match->variance[1]) 1/12, or ~0.08.
And io_truth.variance0 (the truth input from the synthetic data) is 0.
That's why the 0.01 tolerance doesn't work here.

Then someone will ask why it has been OK so far.
Before this PR, the pixel_data of cell_module got a default pitch size of 0.05:

pixel_data pixel{-8.425, -36.025, 0.05, 0.05};

This makes the reconstructed variance 0.05/12, which is fine with the 0.01 tolerance.
This PR removes the default value of pixel_data, which causes the unit test to fail with a variance of ~0.08.

The conclusion is that cca_test or the synthetic data needs some modification :p...
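For reference, a tiny check of the arithmetic (assuming the usual uniform-bin variance of p^2/12 for a pitch p, which reproduces the ~0.08 quoted above for a unit pitch):

#include <cstdio>

int main() {
    const double unit_pitch = 1.0;  // default pitch after this PR
    const double old_pitch = 0.05;  // old hard-coded pitch
    // Variance of a uniform distribution over one bin of width p.
    std::printf("var(p=1.00) = %.6f\n",
                unit_pitch * unit_pitch / 12.0);  // ~0.083 > 0.01 tol
    std::printf("var(p=0.05) = %.6f\n",
                old_pitch * old_pitch / 12.0);    // well below 0.01 tol
    return 0;
}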

Comment on lines +83 to +88
// Make sure that the difference in spacepoint position is less than
// 1%
EXPECT_TRUE(
    traccc::getter::norm(sp_recon.global - sp_truth.global) /
        traccc::getter::norm(sp_recon.global) <
    0.01);
krasznaa (Member) commented:

🤔 Interesting. Is a percentage figure really the most adequate here? In reality, I would hope that our reconstruction accuracy is some absolute value on the surface of the detectors, and not 0 in one corner and some larger value in the opposite corner. To me, the most physically accurate test here would be to say that the reconstructed position has to be within X um of the true position.

Which I would then express like:

EXPECT_LT(traccc::getter::norm(sp_recon.global - sp_truth.global), 0.01);

Assuming that we want to go for 10 um, for instance. Though admittedly, at this point we should start using the units from Acts to express these magic numbers more clearly.
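For instance, that check could be spelled with Acts' unit constants like this (a sketch assuming Acts' mm-based unit system, in which 0.01 mm is 10 um; sp_recon and sp_truth as in the existing test):

#include <Acts/Definitions/Units.hpp>

// 10 um expressed through Acts' unit constants instead of the magic
// number 0.01 (Acts uses mm as its base length unit).
using Acts::UnitConstants::um;
EXPECT_LT(traccc::getter::norm(sp_recon.global - sp_truth.global),
          10. * um);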

beomki-yeo (Contributor, Author) commented:

You are right, but let's go with the percentage for the moment. Once we add the Lorentz shift to the clusterization algorithm, we can test against the expected spatial resolution.

beomki-yeo (Contributor, Author) commented:

@krasznaa Hmm, I think it needs approval again after the force push.

krasznaa (Member) left a comment:

Let's go ahead like this then.

krasznaa merged commit 86f21a4 into acts-project:main on May 29, 2022.