SO list offsets are wrong/counterintuitive #108
Thanks @bwvdnbro for recording the issue here. tl;dr: you are almost right: yes, there is an element being skipped when building these arrays.

The longer story now: like I mentioned in Slack, I had actually been looking at this code these days in the context of trying to reduce the maximum memory usage (see #71). The code has some extra complexity because of how these arrays are allocated and indexed (see VELOCIraptor-STF/src/substructureproperties.cxx, lines 3306 to 3311 in b3c3371).
Then they are 1-indexed (VELOCIraptor-STF/src/substructureproperties.cxx, line 3372 in b3c3371).
And again in VELOCIraptor-STF/src/substructureproperties.cxx, lines 3751 to 3758 in b3c3371.
The same repeats where an element is skipped, at line 1461 in b3c3371.
In that line the indexing should happen with …. As part of the changes that I've been working on for #71 I've also changed the indexing strategy for these arrays.
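The allocation and indexing scheme being discussed (an array of "ngroup + 1" vectors filled by a 1-based loop, as the commit message below describes) can be sketched in a few lines of Python; the particle IDs here are made up:

```python
ngroup = 3

# ngroup + 1 slots are allocated, but groups are filled at indices
# 1..ngroup, so slot 0 is never used -- this is the element that the
# 1-based indexing skips.
so_pids = [[] for _ in range(ngroup + 1)]
for i in range(1, ngroup + 1):
    so_pids[i] = [10 * i, 10 * i + 1]   # made-up particle IDs

print(so_pids[0])   # the skipped element: []
```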
Sorry, I meant element 1 and not 0 (my mind is just not used to thinking about 1 as the first element). There is absolutely no hurry; I am currently using the …
Spherical overdensity information (particle IDs and types) is collected by two different routines: GetInclusiveMasses and GetSOMasses. Both routines use the following technique: first, for "ngroup" groups, a C array-of-vectors (AOV) of "ngroup + 1" elements was allocated via "new" (and eventually delete[]'ed). Vectors for each group are filled independently as groups are processed; the loop over groups is 1-indexed, and indexing into these AOVs happens with the same iteration variable, meaning that their element 0 is skipped. Finally, these AOVs are passed to WriteSOCatalog for writing. WriteSOCatalog is aware of the 1-based indexing, and additionally it flattens the AOVs into a single vector (thus duplicating their memory requirement) before finally writing the data into the output HDF5 file.

This commit originally aimed to reduce the memory overhead of the final writing of data into the HDF5 file (see #71). The main change required to achieve this is to perform the flattening of data at separate times, such that particle IDs and types are not flattened at the same time, but one after the other, with memory from the first flattening operation being released before the second flattening happens, thus reducing the maximum memory requirement of the application. This goal was achieved.

However, while performing these changes two things became clear: firstly, that using a vector-of-vectors (VOV) was a better interface than an AOV (due to automatic memory management), and secondly that the 1-based indexing of the original AOVs introduced much complexity in the code. The scope of these changes was then broadened to cover these two extra changes, and therefore this commit grew considerably in size. In particular, the 0-indexing of the VOVs allowed us to more easily use std algorithms that clarify the intent in certain places of the code.
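The sequential-flattening idea described above can be sketched in Python (the function and dataset names here are invented for illustration, not the actual VELOCIraptor API): only one flattened copy is alive at any moment, so the peak memory overhead of writing is roughly halved.

```python
from itertools import chain

def write_so_catalog(so_pids, so_types, write):
    # Flatten and write the particle IDs first...
    flat = list(chain.from_iterable(so_pids))
    write("Particle_IDs", flat)
    del flat                      # ...release before the second flattening...
    # ...and only then flatten and write the particle types.
    flat = list(chain.from_iterable(so_types))
    write("Particle_Types", flat)

out = {}
write_so_catalog([[1, 2], [3]], [[0, 0], [1]],
                 lambda name, data: out.__setitem__(name, data))
print(out["Particle_IDs"])    # [1, 2, 3]
```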
There are other minor changes that have also been included in this commit, mostly to reduce variable scopes, reduce code duplication, and such. Assertions have also been sprinkled here and there to add further assurance that the code is working as expected. As an unintended side-effect, this commit also fixed the wrongly-calculated Offset dataset, which was off by one index in the original values. This problem was reported in #108, and seems to have always been there.
@bwvdnbro please have a look at the changes in the … branch.
Hi @rtobar, I will take a look at this tomorrow. I have never actually run VELOCIraptor before, so it may take a while. So far I have only been looking at output provided by someone else.
@rtobar, I can now confirm that the changes in the … branch produce the expected output.
Branch merged now, thanks for testing.
Describe the bug
The offsets stored in the SO list output seem inconsistent with the SO sizes in the same file. My (possibly wrong) expectation is that the particle IDs belonging to SO i are stored in pIDs[offset[i] : offset[i]+size[i]], but that is not the case.

To Reproduce
The problem can be best illustrated through the following Python snippet that can be applied to any HDF5 SO list output:
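The snippet itself did not survive this copy of the issue; a rough reconstruction of the kind of check it performs (the names ofs, siz and npart are placeholders for the offset dataset, the size dataset, and the length of the particle ID dataset) could look like:

```python
def check_offsets(ofs, siz, npart):
    # Collect the two consistency violations described below: the final
    # offset plus the final size should reach the end of the particle ID
    # list, and every offset should equal the previous offset plus the
    # previous size.
    errors = []
    if ofs[-1] + siz[-1] != npart:
        errors.append("final offset + final size != length of pIDs")
    for i in range(1, len(ofs)):
        if ofs[i] != ofs[i - 1] + siz[i - 1]:
            errors.append(f"ofs[{i}] != ofs[{i - 1}] + siz[{i - 1}]")
    return errors

# Consistent offsets pass; skewed ones (as reported) trigger both messages:
print(check_offsets([0, 2, 5], [2, 3, 4], 9))   # []
print(check_offsets([0, 2, 6], [2, 3, 4], 9))   # two error messages
```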
This will produce two error messages. The first one because the final offset is not separated from the end of the particle ID list by the size of the final SO, and the second because the next offset does not match the previous offset plus the previous size.
Expected behavior
ofs[i] == ofs[i-1] + siz[i-1]
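Equivalently, ofs should be the exclusive prefix sum of siz; a small sketch with illustrative sizes:

```python
def expected_offsets(siz):
    # offset[i] is the sum of all sizes before group i (exclusive prefix
    # sum), so ofs[i] == ofs[i-1] + siz[i-1] holds by construction.
    ofs, total = [], 0
    for s in siz:
        ofs.append(total)
        total += s
    return ofs

print(expected_offsets([2, 3, 4]))   # [0, 2, 5]
```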
Put differently, I would expect ofs to be equivalent to ….

Log files
Not applicable.
Environment (please complete the following information):
Irrelevant for this problem.
Additional context
I think the problem is situated in io.cxx:1450, where the value of SOpids[0] seems to be (incorrectly) omitted from the loop.