Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wxGUI/datacatalog: avoid ValueError when map name corrupted with multiple @ chars #966

Merged
merged 1 commit into from
Sep 11, 2020

Conversation

petrasovaa
Copy link
Contributor

No description provided.

@petrasovaa petrasovaa merged commit 9ed1a86 into OSGeo:master Sep 11, 2020
@petrasovaa petrasovaa deleted the catalog-fix-split branch September 11, 2020 15:02
@cmbarton
Copy link
Contributor

This solves the problem. Thanks.

@cmbarton
Copy link
Contributor

cmbarton commented Oct 6, 2020

Apparently it did not solve the problem. I just compiled 7.9 again today and the data catalog fails on launch with an unacceptable GRASS file name.

g.gui
Launching <wxpython> GUI in the background, please wait...
GRASS nc_spm_08_grass7/PERMANENT:~ > Process Process-68:
Traceback (most recent call last):
  File "/Applications/GRASS-7.9.app/Contents/Resources/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Applications/GRASS-7.9.app/Contents/Resources/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Applications/GRASS-7.9.app/Contents/Resources/gui/wxpython/datacatalog/tree.py", line 170, in getLocationTree
    maps_dict[mapset].append({'name': name, 'type': ltype})
KeyError: 'APELS@APELS'

@cmbarton
Copy link
Contributor

cmbarton commented Oct 6, 2020

It looks like the proposed fix may not have been applied.

@cmbarton
Copy link
Contributor

cmbarton commented Oct 6, 2020

I guess it was applied. But it still doesn't work. This is an unusual case, but it's hard to anticipate what might go wrong in a filename problem and cover all the bases. It still seems like try/accept is a more robust way to deal with this. No need to even give an error message if you think that is a problem. The problem file just won't be in the catalog--which is OK because GRASS could not read it anyway. But as it is, it blocks launching the important new data catalog.

@cmbarton
Copy link
Contributor

cmbarton commented Oct 6, 2020

replacing line 170 in tree.py with the following does fix this problem

        try:
            maps_dict[mapset].append({'name': name, 'type': ltype})
        except:
            pass 

@cmbarton
Copy link
Contributor

@petrasovaa Not sure if you saw my comment here. I did not reopen, but perhaps we should. The current code will cause the catalog set up to fail if a split on "@" creates a file name that is unacceptable in any way as a GRASS map name. My suggestion last week is a simple trap that lets the catalog build correctly without the offending map name.

@wenzeslaus
Copy link
Member

I'm sorry to be so direct, but the new error only shows how arbitrary the whole hunt for #959 fix is. If I'm guessing correctly, if there would be rsplit instead of split, it would work for you and we would not be having this discussion. If one split works and not the other that suggests that, if there is an issue at all, the issue is in g.list which produces output which cannot be parsed, i.e. produces invalid output.

As for addressing this specific issue, is it worth making the code more complex and thus increase maintenance cost for sake of an issue which seems like it will never happen again?

Generally, if data catalog needs to check map name and mapset name validity and ignore the invalid ones as the title of your original #959 suggests, so be it, sounds reasonable to me, but putting try-except-pass around every other line of code is unlikely to make the code more robust, but the opposite result is quite likely.

@cmbarton
Copy link
Contributor

Hi Vaclav. You miss my point. The new data catalog is a significant change in how GRASS will work. It is a fantastic achievement, especially since it has to parse though the very complicated legacy GRASS file structure that is impenetrable to most users, and deal with GRASS naming conventions for directories and files that are much stricter than those of any modern OS (only alpha-numeric characters plus underlines, must begin with a non-numeric, nothing else allowed). Because GRASS expects the database structure and names of directories and files, that simply live on the hard drive and not in an internally managed database, to conform to these strict organizational rules, it could make the data catalog code brittle, risking failure when it parses through the database. For this reason, every other code chunk in the parsing chain is enclosed in try/except error traps--except the very last parsing code chunk. I don't know why this last bit of code was missed. But the result is that when the code parses the database, if it runs into a single file that does not conform to GRASS naming conventions, the data catalog silently fails and will not initialize.

As you say, there is no easy way to anticipate all the possible deviations from GRASS naming conventions that are perfectly legitimate file names in modern systems. That's why I'm just suggesting to put a try/except trap around this last chunk of parsing code--just like the chunks before it--to keep a single file from silently preventing the data catalog from initializing. All the other preceding code chunks generate an error message for the except clause, which seems like a good idea. But at the least, 3 lines of error trap in this place to let the data catalog initialize does not seem to be an overly complex change and should not have repercussions any place else. I'm not suggesting try/except any place else. If there is a better way to avoid having such a single file crash the catalog on launch, that's great too.

@wenzeslaus
Copy link
Member

Thanks for the detailed explanation. If the issue is dealing with invalid databases, let's talk about that and not about hiding the error. It seems to me that g.list is returning invalid output if the database is corrupt in the way that mapset name or map name are not valid, specifically when they contain @ sign. How should g.list behave then the names are invalid? How it should behave when the invalid names cannot be represented in the output? Should g.list have some more robust output format for scripting such as JSON which could accommodate invalid names?

@cmbarton
Copy link
Contributor

I stole a bit of time today to look at this some more. It is baffling. It is not crashing on a GRASS command, but on a Python one. Here is what is happening.

I have grid3d display files that were stored by GRASS long ago as fully specified names (i.e., with the mapset). These are directories that GRASS reads as 3D maps. For example, one has a directory name of "punxo1_g3d@APELS".

This gets parsed by tree.py into a list called listOfMaps as "raster_3d/punxo1_g3d@APELS@APELS", where the string before the slash is the map type, the string after the last @ is the mapset, and the part in between is the map name.

The first split, line 168, splits on the "/". The first part is "ltype" and the second part is "wholename".

The next split (line 169 and the last of the entire parsing chain) tries to split wholename into the map name and mapset by splitting on the "@"

When I first hit this issue, the split on the list entry "punxo1_g3d@APELS@APELS" was splitting into 3 parts: "punxo1_g3d", "APELS", and "APELS". But the command was trying to parse the split into 2 variables. That caused the data catalog parsing to error out. Here is the original command.

name, mapset = wholename.split('@')

Anna tried to fix this by specifying that the last split should only split once

name, mapset = wholename.split('@', maxsplit=1)

But this command splits on the first "@", not the last one. So for this directory, the mapset is stored as "APELS@APELS". This causes the next statement to error out when it tries to store this in an existing dictionary for the mapset.

maps_dict[mapset].append({'name': name, 'type': ltype})

The maps_dict is made in line 148, with an entry for each mapset (parsed previously). So while there is an entry for maps_dict["APELS"], there is not one for maps_dict["APELS@APELS"]

So what is the best thing to do?

I can easily fix my files of course. But since this error was caused by some weird (if old) GRASS behavior, it might happen to other people. And because it simply stops the data catalog from initializing with no error message at all, it will be very difficult for anyone to figure out what is wrong. It has taken us a lot of effort to do this.

If we want to make the code more robust here, there are 2 ways.

  1. Change line 169 to split on only the last "@" to make sure it gets the mapset correct, regardless of the map name. This will probably solve the problem.

  2. I still lean a bit towards a try/except for this last piece of the parsing chain, like the rest of it, because who knows what other weird issue will pop up in this parsing. Like the preceding error traps, it could have an error message in the except clause that points the user to where the offending map directory lies.

        queue.put(
            (maps_dict,
             _("Failed to read map <{n}> from location <{l}>.").format(
                 l=location, n=wholename)))
        gscript.try_remove(tmp_gisrc_file)

@wenzeslaus
Copy link
Member

name, mapset = wholename.split('@', maxsplit=1)

But this command splits on the first "@", not the last one. So for this directory, the mapset is stored as "APELS@APELS". This causes the next statement to error out when it tries to store this in an existing dictionary for the mapset.

And that's my point about g.list output being invalid in this case. How anybody is supposed parse it? It is the first or second @? What if there are three @ signs? Nobody knows because g.list just returns separator (@) as part of the item/value. Is @ in mapset name or map name in g.list output? g.list knows, but the caller does not, so the problem is in g.list.

So what is the best thing to do?

  1. Change line 169 to split on only the last "@" to make sure it gets the mapset correct, regardless of the map name. This will probably solve the problem.

  2. I still lean a bit towards a try/except for this last piece of the parsing chain, like the rest of it, because who knows what other weird issue will pop up in this parsing. Like the preceding error traps, it could have an error message in the except clause that points the user to where the offending map directory lies.

  1. Change g.list to guarantee a parseable output, either by not returning maps with invalid names, or by not returning those invalid map names which have @ in them, or by providing format such as JSON which works even when map name is invalid.

@cmbarton
Copy link
Contributor

  1. Change g.list to guarantee a parseable output, either by not returning maps with invalid names, or by not returning those invalid map names which have @ in them, or by providing format such as JSON which works even when map name is invalid.

Yes. This is a good 3rd option. I was focusing on the data catalog code alone. g.list acquires maps, mapsets, and map types as separate pieces of information. Instead of parsing them into a text string, if it could also optionally output them in a more parsable format (JSON or even a python dictionary) it would help this kind of issue.

@cmbarton
Copy link
Contributor

While I do not deny that there may be larger issues to consider here. This specific issue remains a potential problem that can cause the data catalog to be non-functional and basically render GRASS non-functional. Until the larger issues are worked out, this specific issue can be fixed for now by 3 short lines of code, replacing line 184 of tree.py with:

            try:
                maps_dict[mapset].append({'name': name, 'type': ltype})
            except:
                pass

@neteler neteler added this to the 8.0.0 milestone Dec 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Need error trap for unacceptable map names launching new data catalog
4 participants