Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove duplicate particles in mergers #32

Merged
merged 40 commits into from
May 20, 2024

Conversation

VictorForouhar
Copy link
Collaborator

Do merging checks after unbinding an object, rather than the whole set of objects. This ensures we can update the boundness of particles before unbound ones are passed to the parent of the subhalo. In some cases, this leads to particle duplication.

@VictorForouhar
Copy link
Collaborator Author

Running first test with COLIBRE example box.

@VictorForouhar
Copy link
Collaborator Author

Early testing suggests new merging method does indeed decrease number of duplicate particles. We achieve the same number as tests where merging between subhaloes are disabled.

As a comparison, the master version produces:

Testing presence of duplicate particles in HBTplus at snapshot index 10.
Opening HBT catalogue: /cosma7/data/dp004/dc-foro1/colibre/hbt_testing/merged_master/010/SubSnap_010.{file_nr}.hdf5 --- DONE
1170 particles out of 8695713 are duplicate.
60 unique subhalos out of 57064 share particles.

Master version without merging:

Testing presence of duplicate particles in HBTplus at snapshot index 10. 
Opening HBT catalogue: /cosma7/data/dp004/dc-foro1/colibre/hbt_testing/removing_duplicates_no_mergers/010/SubSnap_010.{file_nr}.hdf5 --- DONE
70 particles out of 8673697 are duplicate.
13 unique subhalos out of 57064 share particles.

New merging version:

Testing presence of duplicate particles in HBTplus at snapshot index 10. 
Opening HBT catalogue: /cosma7/data/dp004/dc-foro1/colibre/hbt_testing/new_merging_criteria_fix/010/SubSnap_010.{file_nr}.hdf5 --- DONE
70 particles out of 8676564 are duplicate.
13 unique subhalos out of 57064 share particles.

@VictorForouhar
Copy link
Collaborator Author

VictorForouhar commented Apr 27, 2024

There are some small differences in the timing and number of mergers. I am assuming this is due to the fact that the master version only checks merging with the phase-space information of the first round of unbinding.

The new version uses up-to-date phase-space information, e.g. if a subhalo can become overlapping with its parent after accreting particles from its satellite. The master version does not take into account the newly accreted particles. Will check a few examples next week.

@VictorForouhar
Copy link
Collaborator Author

I also have an idea concerning the remaining set of duplicate particles, I will test it next week.

@VictorForouhar
Copy link
Collaborator Author

VictorForouhar commented Apr 29, 2024

This new version results in slightly more mergers relative to the original version. Below is the number of mergers at a given snapshot. Overall, 25 more objects are identified as merged by snapshot 12 in the new version, an increase of 0.6%.
image

Would have affected absolute position values, and potentially the position dispersion (if the core of a subhalo is split across a PCB)
To check values relative to old calculations. Run disabling the new
merging scheme, to obtain the same subhalo-subhalo comparisons.
@VictorForouhar
Copy link
Collaborator Author

Running a test with the asserts added in 3c7967e results in no crashes, indicating the core calculations and phase-space distances are the same the old and new versions. Prior to ad15256, they asserts would crash. I will now remove those temporary asserts.

@VictorForouhar
Copy link
Collaborator Author

I will re-run a test to see how the merger rates vary between old and new versions. In particular, see how #32 (comment) changes now that the spatial distance calculations are done correctly.

Should only be used for runs using the same FOF and particle information!
@VictorForouhar
Copy link
Collaborator Author

After the PCB correction, they are now in better agreement (note difference in y-axis scale of bottom plot). The number of merged objects is the same in both runs.
image

@VictorForouhar
Copy link
Collaborator Author

Looking into differences in merging of matched objects. Some appear to be disrupted in the old but merged in the new (first plot). Others have experienced mergers in the old but not in the new (second plot). I now believe my previous statement concerning the updating the phase-space properties after unbinding a single object vs all can also suppress mergers. It all depends on where the newly accreted particles shift the core to.
image
image

Information available within Unbind, whereas previous method relied on a
helper array that did not have the same ordering.
@VictorForouhar
Copy link
Collaborator Author

First example shown above is caused due to a small difference in Nbound of the parent (+1 particle is bound), and the PhaseSpaceDistance goes from > 2 to < 2.

@jchelly
Copy link
Collaborator

jchelly commented Apr 29, 2024

Running this on the colibre test with gcc address sanitizer it stops with a heap buffer overflow error at subhalo_tracking.cpp:679. There's a full stack trace (although with lots of repetition due to MPI) in /cosma7/data/dp004/jch/HBT-colibre/merging/remove_duplicate_particles_in_mergers/logs/job.HBT.6937740.out. I'll try it in DDT.

This would occur because we tried to access element  -1 in the halo array.
@VictorForouhar
Copy link
Collaborator Author

Running with address sanitizer now works after 200d30f. Will submit a FLAMINGO test now, which also crashed last night at this point.

@VictorForouhar
Copy link
Collaborator Author

The memory issues we currently have appear to be within the merger tree routines. Whilst that is getting solved, I will work on getting the last mechanism for particles duplication under control.

VictorForouhar and others added 3 commits May 3, 2024 09:11
* Masking function that gives preference to shallower subhaloes in the hierarchy

* Added masking from bottom to top

* Update value of TracerIndex to reflect change in Particle vector ordering

* Added function call to mask particles from the source subhalo
Called after saving bound properties.

* Skip FOF groups that only have a central or an unbound component

* Comments

* Moved location of CleanTracks
Makes more sense to be placed within save routine.

* Fixed array indexing for hierarchical relationships
At the stage where we want to do the mask cleaning, the NestedSubhalos values have been globalised and thus do not reflect the location in the local Subhalo array.

* tracer_counter only asserted for subhalos with Nbound > 0

* Update Nbound value after mask clean step.

Outdated value would lead to memory issues in Subhalo_t::KickNullParticles,
which relies of Nbound instead of Particles.size(). We should really make Nbound
consistent throughout the code!!

* Nasty bug with iterator start
Shout out to Rob for spotting it!

* Value of Nbound now means the number of remaining bound particles in the masked subset.

* VERY crude way to allocate sufficient memory in ExclusionList

* Split IO into two distinct steps: bound and source particle saving
This allows me to clean the particles after saving bound properties, but
before saving the source information used for restarts

* Removed stray timer

* More comments

* Fix in duplicate particle testing script.
Would only account for one of the TrackIds hosting a duplicate particle.

* Applied formatter

* Number of tracers used for merger tree default to minimum number of tracers
This value defaults to 10.

* Made the masking work.
- New debug checks added
- Masking from top to bottom and viceversa are called separately, which
  solves cases where the initial central guess was incorrect.
- Fixed a bug when checking whether the number of bound particles is
  changing when masking

* Formatting

* Avoid using MemberTable in CleanTracks()

* Move CleanTracks() call to just before AssignHosts

This avoids problems with restarting because we don't write an out
of date Nbound to the SubSnap files. It also means that we use all
of the bound particles from computing halo positions and velocities
for deciding centrals.

* Fix concerning orphan masses in central identification

Previously, newly created orphans would retain their bound masses,
meaning that they could remain as centrals even if another resolved
subhalo was present. This fix makes all orphans have zero mass, in
line with their zero bound particles.

* Updated comments

---------

Co-authored-by: John Helly <j.c.helly@durham.ac.uk>
src/subhalo_merge.cpp Outdated Show resolved Hide resolved
src/subhalo_merge.cpp Outdated Show resolved Hide resolved
src/subhalo_unbind.cpp Outdated Show resolved Hide resolved
@VictorForouhar
Copy link
Collaborator Author

Made the suggested changes. @robjmcgibbon

Copy link
Collaborator

@robjmcgibbon robjmcgibbon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\o/

@jchelly
Copy link
Collaborator

jchelly commented May 17, 2024

If I run the colibre test with asserts enabled I get a crash with this branch:

HBT: /cosma7/data/dp004/jch/HBT/crash/HBTplus/src/subhalo_tracking.cpp:1173: void SubhaloMasker_t::MaskTopBottom(HBTInt, std::vector<Subhalo_t>&, const MappedIndexTable_t<long int, long int>&): Assertion `insert_status.second == true' failed.

@VictorForouhar
Copy link
Collaborator Author

Okay, thanks for flagging. Weird that it was not triggered before... could it be some of the latest commits?

@jchelly
Copy link
Collaborator

jchelly commented May 17, 2024

It was processing snapshot 4:

saving 4086 subhalos to ./test_output//003
Conversion factor from SWIFT length units to 0.681 Mpc/h = 1
Conversion factor from SWIFT mass units to 6.81e+09 Msun/h = 1
Conversion factor from SWIFT velocity units to 1 km/s = 1
Null group ID is 2147483647
Number of ranks per node reading simultaneously is 28
33223360 particles loaded at Snapshot 4(36)
10103 groups loaded at snapshot 4(36)
HBT: /cosma7/data/dp004/jch/HBT/crash/HBTplus/src/subhalo_tracking.cpp:1173: void SubhaloMasker_t::MaskTopBottom(HBTInt, std::vector<Subhalo_t>&, const MappedIndexTable_t<long int, long int>&): Assertion `insert_status.second == true' failed.

@VictorForouhar
Copy link
Collaborator Author

I ran it with the last commit of the branch doing the masking, de59340d623044424df2ee9e2c258926aaf22c4c, and it did not crash. Merging these two branches must have caused the issue.

@VictorForouhar
Copy link
Collaborator Author

Which is weird, since that branch had already incorporated changes in the merging algorithm.

@VictorForouhar
Copy link
Collaborator Author

Branch immediately after merging worked. The issue is likely caused by the subsequent commits.

It would produce duplicate particles since we'd be adding the bound
component twice. Then the assert in the masking would fail due to the
presence of duplicates and the fail to insert some particles.
@VictorForouhar
Copy link
Collaborator Author

Latest commit fixes the assert failure. Essentially, I was calling the MergeTo routine twice if it ran in DEBUG mode.

@VictorForouhar VictorForouhar merged commit b8d5f0a into master May 20, 2024
@VictorForouhar VictorForouhar deleted the remove_duplicate_particles_in_mergers branch May 20, 2024 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants