libgis: Distinguish dirs and objects in dir creation #1681
The current G_make_mapset_element() is used in several places to create nested directories such as grid3/some_raster3d_name for particular maps/objects. However, it seems to me that this is very different from what G_make_mapset_element() should be doing, which is creating the directories (such as fcell, vector) that hold these objects. This PR introduces new API functions which distinguish between these scenarios. One function is for what most of the original API calls an element, that is, the directory for objects of the same type. The other is for the objects themselves. The three new functions replace two existing ones: two replace G_make_mapset_element(), covering the two cases, and one replaces G_make_mapset_element_tmp(), covering only the common directory case, simply because I didn't see the tmp version used in the other way. Distinguishing the two scenarios allows the code to handle differently a race condition during creation, when another process creates the same directory between the failed access call and the mkdir call. For a directory such as fcell or vector, all we asked for was that the directory exists at the end of the function call, which is fulfilled even when the mkdir happened in another process. The same race condition for directories such as vector/vector_map_name is an actual problem, and reporting it sooner rather than later helps in detecting the issue (that is, two processes trying to create a map of the same name). I think part of the issue is the unclear naming. Element sometimes refers to a particular file such as fcell/raster_map_name, but here it refers to the directory of these files, such as fcell. This new API does not use great names, but the idea is to refer to element group directories (such as cell) and specific element files or directories (a raster map piece, a sub-directory for a specific vector map).
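The two race policies described above can be sketched in plain POSIX C. This is only an illustration, not the libgis implementation: the helper names `make_dir_if_missing` and `mkdir_after_failed_check` and the `race_ok` parameter are assumptions made for this example.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: the mkdir step that runs after access() reported
 * that the directory is missing.
 * Returns 1 if this call created the directory, 0 if another process
 * created it first and that is acceptable, -1 on any other failure. */
static int mkdir_after_failed_check(const char *path, bool race_ok)
{
    if (mkdir(path, 0777) == 0)
        return 1;
    /* EEXIST here means somebody else created the directory between
     * our access() check and our mkdir() call. */
    if (errno == EEXIST && race_ok)
        return 0;
    return -1;
}

/* Hypothetical helper combining the check and the creation.
 * race_ok = true:  shared directory such as "fcell" or "vector";
 *                  all we need is that it exists afterwards.
 * race_ok = false: directory of one named object such as "vector/name";
 *                  losing the race means a name conflict, so report it. */
static int make_dir_if_missing(const char *path, bool race_ok)
{
    if (access(path, F_OK) == 0)
        return 0; /* already present, nothing to do */
    return mkdir_after_failed_check(path, race_ok);
}
```

Calling `mkdir_after_failed_check()` directly on a directory that already exists simulates the lost race: with `race_ok` set it returns 0, without it the call fails.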
Unresolved cases: However, the old function is still in place, so it will simply continue to work in the same way as before.
Suggestions for better naming are welcome! What do you call things in the internal mapset structure? Answering this may help me explain what I'm trying to do here :-)
First, what about this?
This way, we only introduce two API functions that ignore race conditions (G_make_mapset_element_group*).
Elements optionally in a subelement (vector) in an element group? I think for subelements (per-vector directories), race conditions are not OK, so we can just reuse G_make_mapset_element(_tmp)?
Sounds good!
I like the idea, the problem is that "element group" is what "element" was or is referring to in many places.
I also think this would be quite fitting and easy to do, but the confusion about what an element is might be the very reason this change is needed. Element definitely sounds like one of the objects in the database, but in many places in the API element is the type of the object (e.g., functions with parameters element and name where element is the type, i.e., the directory name, and name is the file name, i.e., the name of the raster map etc.). Maybe the current confusion is somehow rooted in the flipped structure of the raster format which is Another issue is that if we keep the current version as is, element would refer to the actual vector map (its directory), but for rasters it does not correspond to the actual raster, although that's a legacy and perhaps we should not consider that. With "directory element" ( "Element group" and "dir(ectory) element" combo would work for me, except that "group element" is or would be something very different.
This would be the race sensitive "unique-name" version. I didn't see it used in the code. Maybe it still should be in the API.
Right. The functionality is really the same. I just thought the current API is causing confusion. This sounds like a lot of naming discussion for 2-4 new functions, but confusing naming is what led to these problems, I think.
Updates on the unresolved functions:
… cases, vector/name case as a single function call
The new version uses: I think the second function now better addresses the "vector/name" use case where you are anyway required to call
The previous behavior was to create the corresponding directory in the current mapset any time the module was executed (according to the doc, the intention was any time the path is in the current mapset). Creating the directory makes sense given the intended use of the module (notably, the module is used only in one v6 addon as of now). Given that there is a distinction between creation of an element and of a file (even when the file is a directory), having the functionality in a module allows for treating the element case in a special way (see OSGeo#1681). However, given the primary function of this module (constructing a path regardless of its existence), a write operation is unexpected, so the new default is to write nothing. Creation needs to be explicitly requested using a new -c flag, which causes a fatal error when the constructed path is not in the current mapset (asking for creation of something outside of the current mapset is considered a usage error).
As for the failing test, the different temporal tests do not fail here, although a higher number of repetitions is needed. (They do not fail every time on the master branch, so I'm re-running the tests to get more repetitions.) Anyway, this is a good sign.
In the tests for 3510140 (Ubuntu 20.04), I unfortunately still see: Given that G_make_mapset_dir_object (which I fixed only later) is not used for rasters, it means that the issue there still persists, but fixing this test was not a goal of this PR anyway.
The previous behavior was to create the corresponding directory in the current mapset any time the module was executed (according to the doc, the intention was only when the path is in the current mapset). Creating the directory makes sense in general given the intended use of the module (get the path for a file in a mapset, possibly in order to create it). Given that there is a distinction between creation of an element (dir) and of a file (even when the file is a directory), having the functionality in a module allows for treating the element case in a special way (not implemented yet, but see #1681). Notably, the module is used only in one v6 addon as of now. However, given the primary function of this module (constructing a path regardless of its existence), a write operation is unexpected, so the new default is to write nothing. Creation needs to be explicitly requested using a new -c flag, which causes a fatal error when the constructed path is not in the current mapset (asking for creation of something outside of the current mapset is considered a usage error).
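The decision logic described in this commit message could look roughly like the sketch below. It is not the g.filename source: the function `plan_path`, the `path_action` enum, and the simplified `mapset/element/name` path layout are assumptions made for illustration. Only the -c flag and its rules come from the text above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result of evaluating one request. */
enum path_action {
    PATH_ONLY,        /* default: report the path, write nothing */
    PATH_CREATE,      /* -c given and the path is in the current mapset */
    PATH_USAGE_ERROR  /* -c given for a different mapset: fatal error */
};

/* Hypothetical helper: builds a simplified "<mapset>/<element>/<name>"
 * path into buf and decides what the module should do with it.
 * create_flag stands for the new -c flag. */
static enum path_action plan_path(char *buf, size_t size,
                                  const char *mapset,
                                  const char *current_mapset,
                                  const char *element, const char *name,
                                  bool create_flag)
{
    /* Constructing the path never depends on whether it exists. */
    snprintf(buf, size, "%s/%s/%s", mapset, element, name);

    if (!create_flag)
        return PATH_ONLY;
    /* Creating something outside of the current mapset is a usage error. */
    if (strcmp(mapset, current_mapset) != 0)
        return PATH_USAGE_ERROR;
    return PATH_CREATE;
}
```

Keeping path construction separate from the creation decision mirrors the point made above: the module's primary job is read-only, and writing happens only on explicit request.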
Scripting in shell can be a real PITA. A lot of legacy functionality exists just to hold the hands of shell scripters.
I always found these internal mapset element management functions confusing. I would not call current naming a huge improvement, but it is not worse than before and I do not have any ideas how to improve. Probably due to weird, inconsistent internal layout of mapset (rast vs vectors). I am certain we will return to this question for GRASS 9 ;-)
…tion of an object which is a dir. Function is internal (double underscore after G), used consistently, and documented such that element is the name of the concrete object, so keeping the name as is.
Agreed, and you are not alone, which is probably what caused the current situation. The old API was trying so hard to hide that these are directories, but it never provided enough abstraction for things such as vectors, groups, and cell_misc.
Thank you. I appreciate your comment. I feel the same. I added some documentation with examples, which should make it usable and a good starting point for future improvements.
I'm on board for this!
I'm merging this because it fixes cases when multiple modules at once try to create a shared directory in an empty mapset, the tests are passing, and the new API increases the chances that new code will be written correctly. This fixed the g.filename case, but it does not fix d.mon and several places, esp. in the vector library. Specifically, this does not fix the This fixes all occasionally failing temporal tests (never observed here to fail) and also all three observed errors encountered in the
The G_make_mapset_element() function is used in several places to create nested directories such as grid3/some_raster3d_name for particular maps/objects. However, the original purpose of G_make_mapset_element() seems to be creating the directories (such as fcell, vector) that hold these objects (objects such as elevation or bridges). This change introduces new API functions which distinguish between these scenarios. One function is for what most of the original API calls an element, that is, the directory for objects of the same type. The other is for the objects themselves. The three new functions replace two existing ones: two replace G_make_mapset_element(), covering the two cases, and one replaces G_make_mapset_element_tmp(), covering only the common directory case because there is no direct use of this functionality in the current code. In the new API, dir_object is for the vector/name case and object_group is for the vector and fcell cases. The vector/name case is a single function call with more parameters rather than two function calls, and it replaces the %s/%s syntax used before. Distinguishing the two scenarios allows the code to handle differently a race condition during creation, when another process creates the same directory between the failed access call and the mkdir call. For a directory such as fcell or vector, all we asked for was that the directory exists at the end of the function call, which is fulfilled even when the mkdir happened in another process. The same race condition for directories such as cell_misc/raster_map_name or vector/vector_map_name is an actual problem, and reporting it sooner rather than later helps in detecting the issue (assuming that the issue is two processes trying to create a map of the same name). However, this is not the new behavior for some vector files where the %s/%s syntax is still used to create the parent directories without making any distinction about the purpose of the directory.
This is the behavior which the code had before 3c37460, which was intended to improve the error message but also changed the on-creation-race behavior. The underlying issue seems to be, at least partially, the unclear naming. Element sometimes refers to a particular file such as fcell/raster_map_name, but in the G_make functions it refers to the directory of these files, such as fcell. This new API does not use any great names, but the idea is to refer to element group directories (such as cell) and specific element files or directories (a raster map piece, a sub-directory for a specific vector map). The hope is that in the new API the words directory, object, and type are used in a somewhat common way. Creation of misc elements (cell_misc and group) is now the two-phase creation of an object which is a dir. The function is internal (double underscore after G), used consistently, and documented such that element is the name of the concrete object, so the name is kept as is. The documentation of the functions is based on the current usage and the new names, although the terminology is not a new official terminology.
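As a rough illustration of the "single function call with more parameters" idea, the sketch below takes the type directory and the object name separately instead of a pre-formatted "%s/%s" string. The names `make_dir` and `make_dir_object` and the `mapset_path` parameter are hypothetical and do not reproduce the libgis signatures. For brevity, the sketch treats any pre-existing object directory as a conflict, which is stricter than the access-then-mkdir logic the commit message describes.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create one directory. An already existing
 * directory is an error unless exist_ok is non-zero. */
static int make_dir(const char *path, int exist_ok)
{
    if (mkdir(path, 0777) == 0)
        return 0;
    return (errno == EEXIST && exist_ok) ? 0 : -1;
}

/* Hypothetical helper: create "<mapset_path>/<type>/<name>".
 * The shared type directory (e.g. "vector") may already exist or may be
 * created concurrently by another process; the per-object directory
 * (e.g. "vector/bridges") must be created by this call. */
static int make_dir_object(const char *mapset_path, const char *type,
                           const char *name)
{
    char path[4096];
    int n;

    n = snprintf(path, sizeof(path), "%s/%s", mapset_path, type);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    if (make_dir(path, 1) != 0) /* group directory: existing is fine */
        return -1;

    n = snprintf(path, sizeof(path), "%s/%s/%s", mapset_path, type, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return make_dir(path, 0); /* object directory: existing is a conflict */
}
```

Because the caller passes the type and the name as separate arguments, the function knows which directory is shared and which one belongs to a single object, something a combined "%s/%s" string cannot express.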