Make Clusterization Result Correct #197

Merged: 1 commit into acts-project:main on May 29, 2022

Conversation

beomki-yeo (Contributor) commented on May 22, 2022:

The clusterization result has been wrong, which made the seeding efficiency of seq_example drop to nearly zero.
I am glad that I finally fixed the problems (as far as I can detect).

There are three bugs that I fixed:

Bug 1. Wrong binning setup for clusterization

The data files that I updated in traccc-data were generated from the digitization configuration file (acts/Examples/Algorithms/Digitization/share/default-geometric-config-generic.json), which sets a different binning for each module.

Meanwhile, the traccc cell_module always had the same binning scheme:

// in cell.hpp
pixel_data pixel{-8.425, -36.025, 0.05, 0.05};

Thus, I extracted the binning information using Acts::GeometryHierarchyMap and applied it to pixel_data in csv.hpp.
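Roughly, the lookup works like the following sketch (hypothetical names; module_digi and the struct layouts are illustrative stand-ins, not the exact traccc definitions):

#include <Acts/Geometry/GeometryHierarchyMap.hpp>
#include <Acts/Geometry/GeometryIdentifier.hpp>

// Illustrative stand-ins for the traccc types involved.
struct pixel_data {
    float min_center_x, min_center_y, pitch_x, pitch_y;
};
struct module_digi {
    float min0, min1, pitch0, pitch1;  // per-module binning from the JSON
};

// Resolve the per-module segmentation, falling back to the old
// hard-coded default when the module is not in the map.
pixel_data get_pixel_data(
    const Acts::GeometryHierarchyMap<module_digi>& digi_map,
    Acts::GeometryIdentifier geo_id) {
    // find() returns the most specific entry (sensitive, layer,
    // volume, or global default) that matches this identifier.
    auto it = digi_map.find(geo_id);
    if (it != digi_map.end()) {
        return pixel_data{it->min0, it->min1, it->pitch0, it->pitch1};
    }
    return pixel_data{-8.425f, -36.025f, 0.05f, 0.05f};
}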

Bug 2. Cell module was shifted during reading in csv.hpp

There was still a mismatch between the header and items of cell_container: the cell module information of the next cell group was being recorded for the current cell group. It is fixed by replacing geometry_id with reference_id in the read_cells function.

// in csv.hpp
    while (creader.read(iocell)) {

        if (first_line_read and iocell.geometry_id != reference_id) {
            if (tfmap != nullptr) {
                // reference_id (the module of the cell group being
                // closed) must be used here, not iocell.geometry_id,
                // which already points at the next cell group.
                if (tfmap->contains(reference_id)) {
                    module.placement = (*tfmap)[reference_id];
                }
            }
            ...
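The underlying pattern is easy to get wrong, so here is a self-contained illustration of it with made-up data (not PR code): when streaming sorted (module id, cell) rows into groups, the group being flushed belongs to the previous id (reference_id), not to the row that triggered the flush.

#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

int main() {
    const std::vector<std::pair<std::uint64_t, int>> rows = {
        {1, 10}, {1, 11}, {2, 20}, {3, 30}, {3, 31}};

    std::uint64_t reference_id = 0;
    bool first_line_read = false;
    std::vector<int> group;

    auto flush = [&]() {
        // Correct: label the finished group with reference_id. Using
        // the current row's id here would attach the next module's
        // information to this group.
        std::cout << "module " << reference_id << ": " << group.size()
                  << " cells\n";
        group.clear();
    };

    for (const auto& [id, value] : rows) {
        if (first_line_read && id != reference_id) {
            flush();
        }
        group.push_back(value);
        reference_id = id;
        first_line_read = true;
    }
    if (!group.empty()) {
        flush();
    }
    return 0;
}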

Bug 3. Use the measurement-simhit-map for correct indexing

See #197 (comment)

Physics performance after fix

I will compare two cases:

  1. seq_example with the ttbar<140> cell file (clusterization + seeding)

  2. seeding_example with the ttbar<140> hit file (seeding)

beomki-yeo added the bug label on May 22, 2022.
beomki-yeo (Contributor, Author) commented:

One thing that I still don't understand is that some events of ttbar<200> and ttbar<300> give me terrible efficiency.
Precisely, the 5th event of ttbar<200> and many events of ttbar<300> have the problem.
All ttbar<20 - 140> samples look fine.

./bin/traccc_seq_example --detector_file=tml_detector/trackml-detector.csv --digitization_config_file=tml_detector/default-geometric-config-generic.json --cell_directory=tml_full/ttbar_mu200/ --hit_directory=tml_full/ttbar_mu200/ --particle_directory=tml_full/ttbar_mu200/ --events=1 --skip=4

beomki-yeo (Contributor, Author) commented:

Um, finally, I found the reason.
I was overlooking the existence of measurement-simhit-map.csv. In the past, I simply ignored it because I thought the measurement id and the hit id were always equal, which is wrong. That's why the seeding efficiency dropped beyond a seemingly random eta value.
Now it is fixed.
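For illustration, a minimal sketch of reading such a map (assuming the file has a header line and two comma-separated columns, measurement_id and hit_id; the actual reader in traccc may differ):

#include <cstdint>
#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Build the measurement id -> sim-hit id map, instead of assuming
// that the two ids are always equal.
std::map<std::uint64_t, std::uint64_t> read_meas_simhit_map(
    const std::string& path) {
    std::map<std::uint64_t, std::uint64_t> result;
    std::ifstream file(path);
    std::string line;
    std::getline(file, line);  // skip the header line
    while (std::getline(file, line)) {
        std::istringstream ss(line);
        std::string meas_id, hit_id;
        std::getline(ss, meas_id, ',');
        std::getline(ss, hit_id, ',');
        result[std::stoull(meas_id)] = std::stoull(hit_id);
    }
    return result;
}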

10 ttbar<200> events:

beomki-yeo (Contributor, Author) commented:

Could anybody review, please? :p

krasznaa (Member) left a comment:

Apart from questions about how tolerances are handled in the tests, I think this looks good.

I was very pleased to see that the SYCL code was propagating this info everywhere correctly already. Kudos to @konradkusiak97. I was absolutely expecting that we would've forgotten about it in at least 1-2 places. 😛

Depending on how you feel about setting up the tests, you could just go ahead and merge the PR like this. 😉

@@ -107,7 +107,7 @@ class ConnectedComponentAnalysisTests
         result.at(io_truth.geometry_id);

     traccc::scalar tol = std::max(
-        0.01, 0.0001 * std::max(io_truth.channel0, io_truth.channel1));
+        0.1, 0.0001 * std::max(io_truth.channel0, io_truth.channel1));
krasznaa (Member) commented:

😕 Did the tolerance worsen as a result of all of this? I thought if anything, this would make the test more accurate...

beomki-yeo (Contributor, Author) commented on May 28, 2022:

Well, the variance check below didn't pass after the changes:

            EXPECT_NEAR(match->variance[0], io_truth.variance0, tol);
            EXPECT_NEAR(match->variance[1], io_truth.variance1, tol);

TBH I don't know how this test works or how the data was created :/ But it's not surprising at all, as our variance estimation is not that robust. My guess is that regenerating the synthetic data with the changes would solve the problem here.

beomki-yeo (Contributor, Author) commented:

I think I figured out the reason.
The default pitch size of pixel_data is 1, which makes the reconstructed variance (match->variance[0] and match->variance[1]) 1/12, or ~0.08.
And io_truth.variance0 (the truth input from the synthetic data) is 0.
That's why the 0.01 tolerance doesn't work here.

Then someone will ask why it has been OK so far.
Before this PR, the pixel_data of cell_module got a default pitch size of 0.05:

pixel_data pixel{-8.425, -36.025, 0.05, 0.05};

This makes the reconstructed variance 0.05/12, which is fine with the 0.01 tolerance.
This PR removes the default value of pixel_data, which causes the unit test to fail with a variance of ~0.08.

The conclusion is that cca_test or the synthetic data needs some modification :p...
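For reference, a tiny check of the arithmetic (assuming the usual uniform-bin variance of p^2/12 for a pitch p, which reproduces the ~0.08 quoted above for a unit pitch):

#include <cstdio>

int main() {
    const double unit_pitch = 1.0;  // default pitch after this PR
    const double old_pitch = 0.05;  // old hard-coded pitch
    // Variance of a uniform distribution over one bin of width p.
    std::printf("var(p=1.00) = %.6f\n",
                unit_pitch * unit_pitch / 12.0);  // ~0.083 > 0.01 tol
    std::printf("var(p=0.05) = %.6f\n",
                old_pitch * old_pitch / 12.0);    // well below 0.01 tol
    return 0;
}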

Comment on lines +83 to +88
// Make sure that the difference in spacepoint position is less than
// 1%
EXPECT_TRUE(
    traccc::getter::norm(sp_recon.global - sp_truth.global) /
        traccc::getter::norm(sp_recon.global) <
    0.01);
krasznaa (Member) commented:

🤔 Interesting. Is a percentage figure really the most adequate here? In reality, I would hope that our reconstruction accuracy is some absolute value on the surface of the detectors, and not 0 in one corner and some larger value in the opposite corner. To me, the most physically accurate test here would be to say that the reconstructed position has to be within X um of the true position.

Which I would then express like:

EXPECT_LT(traccc::getter::norm(sp_recon.global - sp_truth.global), 0.01);

Assuming that we want to go for 10 um, for instance. Though admittedly, at this point we should start using the units from Acts to express these magic numbers more clearly.
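For instance, that check could be spelled with Acts' unit constants like this (a sketch assuming Acts' mm-based unit system, in which 0.01 mm is 10 um; sp_recon and sp_truth as in the existing test):

#include <Acts/Definitions/Units.hpp>

// 10 um expressed through Acts' unit constants instead of the magic
// number 0.01 (Acts uses mm as its base length unit).
using Acts::UnitConstants::um;
EXPECT_LT(traccc::getter::norm(sp_recon.global - sp_truth.global),
          10. * um);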

beomki-yeo (Contributor, Author) commented:

You are right, but let's go with the percentage for the moment. Once we add the Lorentz shift to the clusterization algorithm, we can test against the expected spatial resolution.

beomki-yeo (Contributor, Author) commented:

@krasznaa Hmm, I think it needs approval again after the force push.

krasznaa (Member) left a comment:

Let's go ahead like this then.

krasznaa merged commit 86f21a4 into acts-project:main on May 29, 2022.