Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib/vector/Vlib: always write out topo files in update mode #3459

Merged
merged 5 commits into from
Oct 15, 2024

Conversation

metzm
Copy link
Contributor

@metzm metzm commented Feb 26, 2024

When a vector with topology is opened in update mode, but nothing is changed, topology info (topo, cidx, sidx) is not written out. This PR fixes this.

I am not sure if this is a bug or an enhancement.

@metzm metzm added enhancement New feature or request vector Related to vector data processing C Related code is in C libraries labels Feb 26, 2024
@metzm metzm added this to the 8.4.0 milestone Feb 26, 2024
@metzm metzm requested a review from HuidaeCho February 26, 2024 15:52
Copy link
Member

@HuidaeCho HuidaeCho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works without Vect_build_partial() or Vect_build() as long as the map is Vect_close(). Still, G_fatal_error() needs to add an error handler to call Vect_close(). Is it possible just to crash without closing the map?

@metzm
Copy link
Contributor Author

metzm commented Feb 26, 2024

When a vector is opened with Vect_open_update2(), the topo, cidx, and sidx files are deleted and need to be created anew which happens with Vect_close(). That means with a crash before `Vect_close(), these files and with them all of the topology info are lost.

@HuidaeCho HuidaeCho self-requested a review February 26, 2024 17:18
HuidaeCho
HuidaeCho previously approved these changes Feb 26, 2024
@wenzeslaus wenzeslaus added bug Something isn't working and removed enhancement New feature or request labels Feb 26, 2024
@HuidaeCho HuidaeCho enabled auto-merge (squash) February 26, 2024 17:20
@wenzeslaus
Copy link
Member

When a vector is opened with Vect_open_update2(), the topo, cidx, and sidx files are deleted and need to be created anew which happens with Vect_close().

Thanks for the explanation. This PR then sounds like a bug fix to me. I think it is fair to say that expected behavior of "open-for-update, do-nothing, close" sequence is that it would not invalidate the data.

@metzm
Copy link
Contributor Author

metzm commented Feb 26, 2024

With 8ef73e8 there is a better solution where a map opened in update mode survives with level 2, any changes would be discarded, i.e. if Vect_close() can not be called because of a crash earlier on, the old topo, sidx, cidx, fidx files would still exist.

HuidaeCho
HuidaeCho previously approved these changes Feb 26, 2024
@HuidaeCho
Copy link
Member

HuidaeCho commented Feb 26, 2024

With 8ef73e8 there is a better solution where a map opened in update mode survives with level 2, any changes would be discarded, i.e. if Vect_close() can not be called because of a crash earlier on, the old topo, sidx, cidx, fidx files would still exist.

Works well! Thanks!

Once topology is built after any changes, Vect_close() or v.build is needed and there is no way to discard them.

@metzm
Copy link
Contributor Author

metzm commented Feb 27, 2024

I am not convinced with the current proposed solution because in case of a crash this can lead to a modified coor file (modified geometries) and old topo, sidx, cidx files no longer matching the new coor file which can cause strange errors.

Therefore I prefer my first solution were even with no changes, the topo, sidx, cidx files are created anew. In case of a crash, there will be no topo, sidx, cidx files and v.build will be needed to recreate them from the coor file.

@HuidaeCho
Copy link
Member

HuidaeCho commented Feb 27, 2024

I am not convinced with the current proposed solution because in case of a crash this can lead to a modified coor file (modified geometries) and old topo, sidx, cidx files no longer matching the new coor file which can cause strange errors.

Therefore I prefer my first solution were even with no changes, the topo, sidx, cidx files are created anew. In case of a crash, there will be no topo, sidx, cidx files and v.build will be needed to recreate them from the coor file.

Did a quick test.

  • open; fatal error (2nd solution preferred)
    • 1st solution: Level downgraded; Need v.build
    • 2nd solution: Level 2
  • open; close (tie)
    • 1st solution: Level 2
    • 2nd solution: Level 2
  • open; changes; fatal error (1st solution preferred)
    • 1st solution: Level downgraded; Need v.build; Changes preserved
    • 2nd solution: v.info doesn't work; Level 2 after v.build; Changes preserved
  • open; changes; close (1st solution preferred)
    • 1st solution: Level downgraded; Need v.build; Changes preserved
    • 2nd solution: v.info doesn't work; Level 2 after v.build; Changes preserved
  • open; changes; build; fatal error (1st solution preferred)
    • 1st solution: Level downgraded; Need v.build; Changes preserved
    • 2nd solution: v.info doesn't work; Level 2 after v.build; Changes preserved
  • open; changes; build; close (tie)
    • Everything is good!

In no case, changes were discarded. With open; changes; close breaking v.info with the 2nd solution, your 1st solution might be better. I think with either solution, a custom error handler that builds and closes the map would be needed. If that's missing, the 1st solution would be better (downgrade over breaking v.info).

Or breaking v.info with the 2nd solution might also be better instead of silently (hard to notice) downgrading level 1 with the 1st, causing issues later? At least, the error message mentions rebuild topology:

$ v.info roadsmajor_upper_half
WARNING: Size of 'coor' file differs from value saved in topology file
WARNING: Please rebuild topology for vector map
         <roadsmajor_upper_half@user1>
ERROR: Unable to open topology file for vector map
       <roadsmajor_upper_half@user1>

Having said that, there are only three vector modules that use Vect_open_update2?():

$ cd vector
$ find -type f -name "*.c" -exec grep -H "Vect_open_update2\?(" {} \; | cut -d/ -f2 | sort -u
v.clean
v.edit
v.patch

So I'm not sure...

@wenzeslaus
Copy link
Member

This may not be obvious since we have mostly tests of individual tools, but these examples can be rewritten as tests in Python without a need to create a separate binary like we have for 3D raster library. We have examples of using ctypes to test the C library functions for both grass.gunittest tests using ctypes (also without grass.pygrass) and pytest-based tests.

@marisn
Copy link
Contributor

marisn commented Feb 28, 2024

Probably we could take a different route – modify functions that write coor file to set "dirty" bit (e.g. touch dirty or topo version -1) before write indicating topo et al. files being not up to date. Thus no "dirty" bit == no need to modify topo file. "Dirty" bit present == failure to open on level 2, rebuild required. I didn't look deep into code to see how easy this could be implemented.

@metzm
Copy link
Contributor Author

metzm commented Feb 28, 2024

@marisn This does not solve the problem of a crash.

I prefer solution 1 where the support files are deleted when opening and always written out anew when closing. Writing out these files is fast and does not require rebuilding topology. With solution 2, there is the danger of ending up with non matching/wrong support files which can lead to difficult to debug problems.

@HuidaeCho
Copy link
Member

@marisn This does not solve the problem of a crash.

I prefer solution 1 where the support files are deleted when opening and always written out anew when closing. Writing out these files is fast and does not require rebuilding topology. With solution 2, there is the danger of ending up with non matching/wrong support files which can lead to difficult to debug problems.

@metzm Based on my test, I think solution 1 should be preferred because with either solution, we need an error handler to properly close the map and it'll be the developer's mistake if that doesn't happen and solution 1 will be less harmful in that case.

HuidaeCho
HuidaeCho previously approved these changes Feb 28, 2024
@HuidaeCho
Copy link
Member

I saw that same error between edits without topology building. I was wondering if it's always required to rebuild topology or if there are any cases where we don't need it when editing features. Will it be slow to rebuild it if there are a lot of features?

@metzm
Copy link
Contributor Author

metzm commented Mar 2, 2024

[...] Will it be slow to rebuild it if there are a lot of features?

If topology is already built and you just want that the cidx is marked up to date, it should be enough to call Vect_build() without calling Vect_build_partial(GV_BUILD_NONE) first, and this should be very fast.

@metzm
Copy link
Contributor Author

metzm commented Mar 2, 2024

In pygrass, here https://github.com/OSGeo/grass/blob/main/python/grass/pygrass/vector/abstract.py#L454-L470
and here https://github.com/OSGeo/grass/blob/main/python/grass/pygrass/vector/abstract.py#L454
the vector is first closed, then topology is built, then the vector is closed again. This seems wrong to me, topology should be built when the vector is still open, or the other way around, the vector should only be closed after topology has been built, not before.

@marisn
Copy link
Contributor

marisn commented Mar 3, 2024

This seems wrong to me, topology should be built when the vector is still open, or the other way around, the vector should only be closed after topology has been built, not before.

Please create a PR for the documentation as neither Vect_build or Vect_close mention this important bit.

@wenzeslaus wenzeslaus modified the milestones: 8.4.0, Future Apr 27, 2024
@metzm metzm changed the title Vlib: always write out topo files in update mode lib/vlib: always write out topo files in update mode Oct 3, 2024
@metzm metzm changed the title lib/vlib: always write out topo files in update mode lib/vector/Vlib: always write out topo files in update mode Oct 3, 2024
@metzm
Copy link
Contributor Author

metzm commented Oct 3, 2024

This seems wrong to me, topology should be built when the vector is still open, or the other way around, the vector should only be closed after topology has been built, not before.

Please create a PR for the documentation as neither Vect_build or Vect_close mention this important bit.

Vect_build() needs to be called as int Vect_build(struct Map_info *Map), thus a valid struct Map_info pointer must be passed which implies an opened vector map.

@github-actions github-actions bot added the Python Related code is in Python label Oct 3, 2024
@metzm metzm requested a review from HuidaeCho October 4, 2024 15:40
@metzm
Copy link
Contributor Author

metzm commented Oct 4, 2024

With the fix in pygrass, all checks have now passed. Ready to merge?

@echoix echoix modified the milestones: Future, 8.5.0 Oct 4, 2024
@metzm metzm added the backport to 8.4 PR needs to be backported to release branch 8.4 label Oct 7, 2024
@metzm metzm merged commit f7d2ecf into OSGeo:main Oct 15, 2024
26 checks passed
@metzm metzm deleted the Vlib_update_l2 branch October 15, 2024 13:12
metzm added a commit that referenced this pull request Oct 15, 2024
* Vlib: always write out topo files in update mode
* delete support files only when closing
* fix pygrass Vect_close

---------

Co-authored-by: Huidae Cho <grass4u@gmail.com>
@neteler neteler removed the backport to 8.4 PR needs to be backported to release branch 8.4 label Oct 16, 2024
@neteler neteler modified the milestones: 8.5.0, 8.4.1 Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working C Related code is in C libraries Python Related code is in Python vector Related to vector data processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants