Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libgis: Distinguish dirs and objects in dir creation #1681

Merged
merged 12 commits into from
Jul 13, 2021

Conversation

wenzeslaus
Copy link
Member

The current G_make_mapset_element() is used in several places to create
nested directories such as grid3/some_raster3d_name for particular maps/objects.
However, it seems to me that that's very different from what G_make_mapset_element()
should be doing which is creating directories (such as fcell, vector) for these
objects.

This PR introduces new API functions which make distinction between these scenarios.
One function is for what most of the original API calls element that is the directory
for objects of the same type. The other is for the objects themselves.

The three new functions replace two existing ones.
Two replace G_make_mapset_element() covering the two cases and one replaces
G_make_mapset_element_tmp() covering only the common directory case
simply because I didn't see the tmp version to be used in that other way.

Distinguishing the two scenarios allows the code to handle differently a race
condition during creation when another process creates the same directory
between the failed access call and mkdir call.

In case of directory such as fcell or vector, all we asked
for was that the directory exists at the end of the function call which
will be fulfilled even when mkdir happened in another process.

Same race condition for directories such as vector/vector_map_name
is actuall problem and reporting it sooner rather than later is advantageous in
detecting the issue (that is two processes are trying to create map of the same name).

I think part of the issue is the unclear naming. Element sometimes refers to
the particular file such as fcell/raster_map_name, but here it refers to the
directory of these files such as fcell. This new API does not use great names,
but the idea is to refer to element group directories (such as cell) and specific
element files or directories (raster map piece, sub-directory for a specific vector map).

The current G_make_mapset_element() is used in several places to create
nested directories such as grid3/some_raster3d_name for particular maps/objects.
However, it seems to me that that's very different from what G_make_mapset_element()
should be doing which is creating directories (such as fcell, vector) for these
objects.

This PR introduces new API functions which make distinction between these scenarios.
One function is for what most of the original API calls element that is the directory
for objects of the same type. The other is for the objects themselves.

The three new functions replace two existing ones.
Two replace G_make_mapset_element() covering the two cases and one replaces
G_make_mapset_element_tmp() covering only the common directory case
simply because I didn't see the tmp version to be used in that other way.

Distinguishing the two scenarios allows the code to handle differently a race
condition during creation when another process creates the same directory
between the failed access call and mkdir call.

In case of directory such as fcell or vector, all we asked
for was that the directory exists at the end of the function call which
will be fulfilled even when mkdir happened in another process.

Same race condition for directories such as vector/vector_map_name
is actuall problem and reporting it sooner rather than later is advantageous in
detecting the issue (that is two processes are trying to create map of the same name).

I think part of the issue is the unclear naming. Element sometimes refers to
the particular file such as fcell/raster_map_name, but here it refers to the
directory of these files such as fcell. This new API does not use great names,
but the idea is to refer to element group directories (such as cell) and specific
element files or directories (raster map piece, sub-directory for a specific vector map).
@wenzeslaus
Copy link
Member Author

Unresolved cases:

general/g.filename/main.c:79:    G_make_mapset_element(element);
display/d.mon/start.c:153:    G_make_mapset_element(mon_path);

However, the old function is still in place, so it will simply continue to work in the same way as before.

@wenzeslaus wenzeslaus added the bug Something isn't working label Jun 25, 2021
@wenzeslaus
Copy link
Member Author

Suggestions for better naming are welcome! How do you call things in the internal mapset structure?

├── cats  <- element, element group, directory, ...?
│   ├── aspect  <- element, file, name, object, ...?
│   ├── basins
│   ├── bugsites
├── cell
│   ├── aspect
│   ├── basins
│   ├── bugsites
...
├── cell_misc
│   ├── aspect
│   │   ├── f_format
        ...
│   │   └── stats
...
├── sqlite
│   └── sqlite.db
├── vector
│   ├── address_points  <- element, object, subelement, subdirectory? (This is where the API issue is!)
│   │   ├── cidx
        ...
│   │   └── topo
...
└── windows
    └── lakes

Answering this may help me explaining what I'm trying to do here :-)

@HuidaeCho
Copy link
Member

HuidaeCho commented Jun 25, 2021

First, _directory to _dir to be consistent with G_mkdir(), G_is_dirsep(), G_convert_dirseps_to_*(), and find_program_dir_ext().

What about this?

  • G_make_mapset_element_type_directory => G_make_mapset_element_group
  • G_make_mapset_directory_element => G_make_mapset_element (current version as is)
  • G_make_mapset_element_type_directory_tmp => G_make_mapset_element_group_tmp
  • G_make_mapset_directory_element_tmp => G_make_mapset_element_tmp (not implemented in the commit? current version as is)

This way, we only introduce two API functions that ignore race conditions (G_make_mapset_element_group*).

@HuidaeCho
Copy link
Member

HuidaeCho commented Jun 25, 2021

Answering this may help me explaining what I'm trying to do here :-)

Elements optionally in a subelement (vector) in an element group? I think for subelements (per-vector directories), race conditions are not OK, so we can just reuse G_make_mapset_element(_tmp)?

@wenzeslaus
Copy link
Member Author

_directory to _dir to be consistent...

Sounds good!

What about this?

  • G_make_mapset_element_type_directory => G_make_mapset_element_group

I like the idea, the problem is that "element group" is what "element" was or is referring to in many places.

  • G_make_mapset_directory_element => G_make_mapset_element (current version as is)

I also think this would be quite fitting and easy to do, but the confusion about what element is might be the very reason this change is needed. Element definitely sounds like one of the objects in the database, but in many places in the API element is the type of the object (e.g., functions with parameters element and name where element is the type, i.e., directory name, and name is the file name, i.e., name of the raster map etc.).

Maybe the current confusion is somehow rooted in the flipped structure of the raster format which is fcell/name rather than raster/name/fcell. Using the same functions for vector (or 3D raster) is then when the need for two different functions (with whatever names) arises, because it is used for directories which are the objects, i.e., carry names of the vector and 3D raster maps.

Another issue is that if we keep current version as is, element would refer to the actual vector map (its directory), but for rasters it does not correspond to the actual raster, although that's a legacy and perhaps we should no consider that.

With "directory element" (dir_element), I was trying to express that it is just not fcell/name file but rather vector/name directory, i.e., element which is a directory.

"Element group" and "dir(rectory) element" combo would work for me, except that "group element" is or would be something very different.

  • G_make_mapset_element_type_directory_tmp => G_make_mapset_element_group_tmp

This way, we only introduce two API functions that ignore race conditions (G_make_mapset_element_group*).

  • G_make_mapset_directory_element_tmp => G_make_mapset_element_tmp (not implemented in the commit? current version as is)

This would be the race sensitive "unique-name" version. I didn't see it used in the code. Maybe it still should be in the API.

Elements optionally in a subelement (vector) in an element group?

I think for subelements (per-vector directories), race conditions are not OK, so we can just reuse G_make_mapset_element(_tmp)?

Right. The functionality is really the same. I just thought the current API is causing confusion.

This sounds like a lot naming discussion for 2-4 new functions, but confusing naming is what lead to these problems I think.

@wenzeslaus
Copy link
Member Author

Updates on the unresolved functions:

  • general/g.filename/main.c: Looks like it should not be creating that directory. It is a read-only module, so it should not be writing. Any ideas why it is creating that directory?
  • display/d.mon/start.c: Is it using G_make_mapset_element() as sort of mkdir -p (create dir with parents)?

@wenzeslaus
Copy link
Member Author

The new version uses:

G_make_mapset_object_group(const char *type)
G_make_mapset_dir_object(const char *type, const char *name)

I think the second function now more addresses the "vector/name" use case where you are anyway required to call G_make_mapset_object_group first to ensure that the parent directory exists (in the previous version). Here, it is resolved in the library and you don't need to construct any paths manually (as in the original code).

wenzeslaus added a commit to wenzeslaus/grass that referenced this pull request Jun 27, 2021
The previous behavior is creating the corresponding directory in the current
mapset anytime the module is executed (the intention was anytime the path is
in the current mapset according to the doc).

Creating the directory makes sense given the intended use of the module
(notably, the module is used only in one v6 addon as of now).
Given that there is a distinction between creation of an element and file
(even when file is a directory), having the functionality in a module
allows for treating the element case in a special way (see OSGeo#1681).

However, given the primary function of this module (constructing a path regardless of
its existence), writing operation is unexpected, so the new default is not writing
anything and creation needs to be explicitly requested using a new -c flag
which causes fatal error when the constructed path is not in the current mapset
(asking for creation of something outside of the current mapset is considered a usage error).
@wenzeslaus
Copy link
Member Author

As for the failing test, the different temporal tests do not fail here although higher number of repetitions is needed. (They do no fail every time on master branch, so I'm re-ruining the tests to get more repetitions.) Anyway, this is a good sign.

python/grass/pygrass/modules/interface/testsuite/test_pygrass_modules_interface_doctests.py fails here, but it fails also on master branch time to time, so unrelated.

In the tests for 3510140 (Ubuntu 20.04), I unfortunately still see:

Running ./python/grass/script/testsuite/test_parallel_temp_region.sh...
========================================================================
./data/script_using_temporary_region.py
Pure parallel test: Got 48 but expected 50 maps
========================================================================
FAILED ./python/grass/script/testsuite/test_parallel_temp_region.sh

Give that G_make_mapset_dir_object (which I fixed only later) is not used for rasters, it means that the issue there still persists, but fixing this test was not a goal of this PR anyway.

wenzeslaus added a commit that referenced this pull request Jul 13, 2021
The previous behavior is creating the corresponding directory in the current
mapset anytime the module is executed (the intention was only when the path is
in the current mapset according to the doc).

Creating the directory makes sense in general given the intended use of the module
(get path for a file in mapset, possibly in order to create it).
Given that there is a distinction between creation of an element (dir) and file
(even when file is a directory), having the functionality in a module
allows for treating the element case in a special way (not implemented yet, but see #1681).
However, notably, the module is used only in one v6 addon as of now.

However, given the primary function of this module (constructing a path regardless of
its existence), writing operation is unexpected, so the new default is not writing
anything and creation needs to be explicitly requested using a new -c flag
which causes fatal error when the constructed path is not in the current mapset
(asking for creation of something outside of the current mapset is considered a usage error).
@wenzeslaus
Copy link
Member Author

Updates on the unresolved functions:

  • general/g.filename/main.c: Updated to the race ignoring version after update to the module API and doc.
  • display/d.mon/start.c: I may just leave it as is since the I'm keeping G_make_mapset_element() in the API and parallel execution is not that critical to d.mon.

@marisn
Copy link
Contributor

marisn commented Jul 13, 2021

* [general/g.filename/main.c](https://github.com/OSGeo/grass/blob/master/general/g.filename/main.c#L79): Looks like it should not be creating that directory. It is a read-only module, so it should not be writing. Any ideas why it is creating that directory?

Scripting in shell can be real PITA. A lot of legacy functionality exists just to hold hand for shell scripters.

Copy link
Contributor

@marisn marisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always found these internal mapset element management functions confusing. I would not call current naming a huge improvement, but it is not worse than before and I do not have any ideas how to improve. Probably due to weird, inconsistent internal layout of mapset (rast vs vectors). I am certain we will return to this question for GRASS 9 ;-)

…tion of an object which is a dir. Function is internal (double underscore after G, used consistently, and documented such that element is the name of the concrete object, so keeping the name as is.
@wenzeslaus
Copy link
Member Author

I always found these internal mapset element management functions confusing.

Agreed and you are now alone which is what probably caused the current situation. The old API was trying so hard to hide that these are directories, but it never provided enough abstraction for things such as vectors, groups, and cell_misc.

I would not call current naming a huge improvement, but it is not worse than before and I do not have any ideas how to improve.

Thank you. I appreciate your comment. I feel the same. I added some documentation with examples which should make it usable and good starting point for future improvements.

... Probably due to weird, inconsistent internal layout of mapset (rast vs vectors). I am certain we will return to this question for GRASS 9 ;-)

I'm on board for this!

@wenzeslaus
Copy link
Member Author

I'm merging this because it fixes cases when multiple modules at once try to create a shared directory in an empty mapset, the tests are passing, and the new API increases chances that new code will be written correctly.

This fixed the g.filename case, but it does not fix d.mon and several places, esp. in the vector library. Specifically, this does not fix the G_make_mapset_element call in display/d.mon/start.c. Additionally, this does not modify multiple places, esp. in the vector library, where a "%s/%s" construct is used to create a directory, but the G_make* call is not made directly. These cases now use the behavior before 3c37460 (similarly to #1677 since 3c37460 was never intended to enforce the strict fail-on-creation-race behavior anyway). This is an side effect of the API changes. In the future, all places where "%s/%s" is used to construct element, should be replaced by more precise API where the "element name" is never constructed by the caller (whatever element is in the given context).

This fixes all occasionally failing temporal tests (never observed here to fail) and also all three observed errors encountered in the python/grass/script/testsuite/test_parallel_temp_region.sh test (presumably including the originally not fixed case which is likely the cell_misc case). Here are some example (now fixed) errors from the test_parallel_temp_region.sh test:

ERROR: Unable to make mapset element cell (/home/runner/nc_spm_full_v2alpha2/__python_grass_script_test_parallel_temp_region/cell): File exists
ERROR: Unable to make mapset element .tmp/fv-az180-858 (/home/runner/nc_spm_full_v2alpha2/__python_grass_script_test_parallel_temp_region/.tmp): File exists
ERROR: Unable to make mapset element cell_misc/parallel_tmp_region_parallel_2_size_150_nesting_0 (/home/runner/nc_spm_full_v2alpha2/__python_grass_script_test_parallel_temp_region/cell_misc): File exists

@wenzeslaus wenzeslaus merged commit 23a8e6d into OSGeo:master Jul 13, 2021
@wenzeslaus wenzeslaus deleted the new-api-for-element-mkdir branch July 13, 2021 20:27
@neteler neteler added this to the 8.0.0 milestone Dec 9, 2021
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Oct 26, 2022
The previous behavior is creating the corresponding directory in the current
mapset anytime the module is executed (the intention was only when the path is
in the current mapset according to the doc).

Creating the directory makes sense in general given the intended use of the module
(get path for a file in mapset, possibly in order to create it).
Given that there is a distinction between creation of an element (dir) and file
(even when file is a directory), having the functionality in a module
allows for treating the element case in a special way (not implemented yet, but see OSGeo#1681).
However, notably, the module is used only in one v6 addon as of now.

However, given the primary function of this module (constructing a path regardless of
its existence), writing operation is unexpected, so the new default is not writing
anything and creation needs to be explicitly requested using a new -c flag
which causes fatal error when the constructed path is not in the current mapset
(asking for creation of something outside of the current mapset is considered a usage error).
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Oct 26, 2022
The G_make_mapset_element() function is used in several places to create
nested directories such as grid3/some_raster3d_name for particular maps/objects.
However, the original purpose of G_make_mapset_element() seems to be creating directories
(such as fcell, vector) for these objects (objects such as elevation or bridges).

This change introduces new API functions which make distinction between these scenarios.
One function is for what most of the original API calls element that is the directory
for objects of the same type. The other is for the objects themselves.

The three new functions replace two existing ones.
Two replace G_make_mapset_element() covering the two cases and one replaces
G_make_mapset_element_tmp() covering only the common directory case
because there is no direct use of this functionality in the current code.
In the new API, dir_object is for the vector/name case and object_group for the vector and fcell cases.
The vector/name case is a single function call with more parameters rather than two function calls
and replaces the use of %s/%s syntax which is what was used before.

Distinguishing the two scenarios allows the code to handle differently a race
condition during creation when another process creates the same directory
between the failed access call and mkdir call.

In case of directory such as fcell or vector, all we asked
for was that the directory exists at the end of the function call which
will be fulfilled even when mkdir happened in another process.

Same race condition for directories such as cell_misc/raster_map_name or vector/vector_map_name
is actual problem and reporting it sooner rather than later is advantageous in
detecting the issue (assuming that the issue is two processes are trying to create map of the same name).
However, this is not the new behavior for some vector files where %s/%s syntax is still used
to create the parent directories without making any distinctions about the purpose of the directory.
This is the behavior which the code had before 3c37460
which was intended to imporve error message, but it changed also the the on-creation-race behavior.

The underlying issue seems to be, at least partially, the unclear naming. Element sometimes refers to
the particular file such as fcell/raster_map_name, but in the G_make functions, it refers to the
directory of these files such as fcell. This new API does not use any great names,
but the idea is to refer to element group directories (such as cell) and specific
element files or directories (raster map piece, sub-directory for a specific vector map).
The hope is that in the new API, words directory, object, and type are used in somewhat common way.

Creation of misc elements (cell_misc and group) is now the two-phase creation of an object which is a dir. Function is internal (double underscore after G), used consistently, and documented such that element is the name of the concrete object, so keeping the name as is.

The documentation of functions is based on the current usage and new names although the terminology is not a new official terminology.
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Feb 17, 2023
The previous behavior is creating the corresponding directory in the current
mapset anytime the module is executed (the intention was only when the path is
in the current mapset according to the doc).

Creating the directory makes sense in general given the intended use of the module
(get path for a file in mapset, possibly in order to create it).
Given that there is a distinction between creation of an element (dir) and file
(even when file is a directory), having the functionality in a module
allows for treating the element case in a special way (not implemented yet, but see OSGeo#1681).
However, notably, the module is used only in one v6 addon as of now.

However, given the primary function of this module (constructing a path regardless of
its existence), writing operation is unexpected, so the new default is not writing
anything and creation needs to be explicitly requested using a new -c flag
which causes fatal error when the constructed path is not in the current mapset
(asking for creation of something outside of the current mapset is considered a usage error).
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Feb 17, 2023
The G_make_mapset_element() function is used in several places to create
nested directories such as grid3/some_raster3d_name for particular maps/objects.
However, the original purpose of G_make_mapset_element() seems to be creating directories
(such as fcell, vector) for these objects (objects such as elevation or bridges).

This change introduces new API functions which make distinction between these scenarios.
One function is for what most of the original API calls element that is the directory
for objects of the same type. The other is for the objects themselves.

The three new functions replace two existing ones.
Two replace G_make_mapset_element() covering the two cases and one replaces
G_make_mapset_element_tmp() covering only the common directory case
because there is no direct use of this functionality in the current code.
In the new API, dir_object is for the vector/name case and object_group for the vector and fcell cases.
The vector/name case is a single function call with more parameters rather than two function calls
and replaces the use of %s/%s syntax which is what was used before.

Distinguishing the two scenarios allows the code to handle differently a race
condition during creation when another process creates the same directory
between the failed access call and mkdir call.

In case of directory such as fcell or vector, all we asked
for was that the directory exists at the end of the function call which
will be fulfilled even when mkdir happened in another process.

Same race condition for directories such as cell_misc/raster_map_name or vector/vector_map_name
is actual problem and reporting it sooner rather than later is advantageous in
detecting the issue (assuming that the issue is two processes are trying to create map of the same name).
However, this is not the new behavior for some vector files where %s/%s syntax is still used
to create the parent directories without making any distinctions about the purpose of the directory.
This is the behavior which the code had before 3c37460
which was intended to imporve error message, but it changed also the the on-creation-race behavior.

The underlying issue seems to be, at least partially, the unclear naming. Element sometimes refers to
the particular file such as fcell/raster_map_name, but in the G_make functions, it refers to the
directory of these files such as fcell. This new API does not use any great names,
but the idea is to refer to element group directories (such as cell) and specific
element files or directories (raster map piece, sub-directory for a specific vector map).
The hope is that in the new API, words directory, object, and type are used in somewhat common way.

Creation of misc elements (cell_misc and group) is now the two-phase creation of an object which is a dir. Function is internal (double underscore after G), used consistently, and documented such that element is the name of the concrete object, so keeping the name as is.

The documentation of functions is based on the current usage and new names although the terminology is not a new official terminology.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants