config_archive.xml mistaken regular expressions

The short version:

`h\.bgc\..*?.?[_\d+]+\.nc$`  should be replaced by:
`h\.bgc\..*(\._\d+)?\.nc$`

`ic.[-\d(_\d)?+]+(\._\d*)?\.nc(\.\d*)?$` should be replaced by 
`ic\.[-\d_]+(\._\d*)?\.nc(\.\d*)?$`

---------------
I believe that the target of `[_\d+]` in `h\.bgc\..*?.?[_\d+]+\.nc$` is the optional instance number suffix, e.g. `._0001`.
The `[]` around `_\d+` makes the + a literal instead of a metacharacter (1 or more \d).

In addition, the `+` following the `[_\d+]` means that there _must_ be an instance number (actually, at least one `+`, `_`, or `\d`),
which is not the goal.  I tested that the existing RE:
matches `CASE.bgc.moredescription._0001.nc`
but not `CASE.bgc.moredescription.nc`
(Putting an _ between 'more' and 'description' short-circuits the match, which is another flaw of the `[_\d+]`)
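
The behaviour described above can be reproduced with a minimal sketch using Python's `re` module. The pattern is copied from this issue; the file names are hypothetical and include an `h.` prefix so that the full pattern applies, and the helper name `old_bgc_matches` is only for illustration. Other regex engines may differ in details.

```python
import re

# Existing pattern, copied from the issue text.
OLD_BGC = re.compile(r"h\.bgc\..*?.?[_\d+]+\.nc$")


def old_bgc_matches(name: str) -> bool:
    """Return True if the existing pattern matches somewhere in `name`."""
    return OLD_BGC.search(name) is not None


# Inside [...] the + is an ordinary character, so the class accepts a literal "+".
plus_is_literal = re.fullmatch(r"[_\d+]", "+") is not None

if __name__ == "__main__":
    # Hypothetical file names, not taken from a real archive.
    for name in (
        "CASE.h.bgc.moredescription._0001.nc",  # instance number present
        "CASE.h.bgc.moredescription.nc",        # no instance number
        "CASE.h.bgc.moredescription.+++.nc",    # literal plus signs
    ):
        print(f"{name}: {old_bgc_matches(name)}")
```

In this sketch the name without an instance number is rejected, while a name whose last component is only `+` characters is accepted.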

One final suggestion is that `.bgc\..*?.?` is hard to parse and maybe needlessly complex.
The `.bgc.` part is clear, but `.*?` is a lazy/non-greedy expression that matches the minimum of 0 or more characters.
If nothing follows .bgc., then it matches 0 characters.  The `[_\d+]` would make it match all of the characters up to
the _, \d, or +.   But the optional single character `.?` between those makes it hard to predict what will match.
My best guess is that it makes the parser look ahead for a _, \d, or +, then look for a character before it,
and then assign any remaining characters before the optional character to the `.*?`.

I think that the `[_\d+]` should be replaced by an optional group (in which the + is a metacharacter)
and the `.*?.?` should be replaced by `.*`:
`h\.bgc\..*(\._\d+)?\.nc$`
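
A quick sanity check of the suggested replacement, again as a sketch in Python's `re` with hypothetical file names and a hypothetical helper name (`new_bgc_match`). It uses the form with the trailing `$` from the short version. One thing it also shows: because `.*` is greedy, it absorbs the instance suffix itself, so the optional group matches but does not capture anything. That does not matter if the pattern is only used to decide whether a file name matches.

```python
import re

# Suggested replacement pattern from this issue.
NEW_BGC = re.compile(r"h\.bgc\..*(\._\d+)?\.nc$")


def new_bgc_match(name: str):
    """Return the match object for the suggested pattern, or None."""
    return NEW_BGC.search(name)


if __name__ == "__main__":
    # Hypothetical file names.
    with_instance = new_bgc_match("CASE.h.bgc.moredescription._0001.nc")
    without_instance = new_bgc_match("CASE.h.bgc.moredescription.nc")
    print(with_instance is not None)      # True
    print(without_instance is not None)   # True
    print(with_instance.group(1))         # None: `.*` already consumed "._0001"
```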

---------------

Farther down I see `ic.[-\d(_\d)?+]+(\._\d*)?\.nc`.   
`ic.` should probably be `ic\.`.
`?+` seems to mean 'matches the previous token  between zero and one times, as many times as possible, 
without giving back (possessive)'.
But it's inside a `[ ]` list, so the [RE tester](https://regex101.com/) I'm using makes it look like the group syntax is ignored 
and the metacharacters are disabled:
```
- matches the character - with index 45₁₀ (2D₁₆ or 55₈) literally (case sensitive)
\d matches a digit (equivalent to [0-9])
(_ matches a single character in the list (_ (case sensitive)
   ( matches the character ( with index 40₁₀ (28₁₆ or 50₈) literally (case sensitive)
   _ matches the character _ with index 95₁₀ (5F₁₆ or 137₈) literally (case sensitive)
\d matches a digit (equivalent to [0-9])
)?+ matches a single character in the list )?+ (case sensitive)
   ) matches the character ) with index 41₁₀ (29₁₆ or 51₈) literally (case sensitive)
   ? matches the character ? with index 63₁₀ (3F₁₆ or 77₈) literally (case sensitive)
   + matches the character + with index 43₁₀ (2B₁₆ or 53₈) literally (case sensitive)
```
It does seem to find strings of `-`, `_`, and digits, but so does `ic\.[-\d_]+\.`, which is simpler.
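
To compare the two `ic` patterns directly, here is a small sketch in Python's `re`. Both patterns are copied from the short version at the top of this issue; the file names and the names `OLD_IC`, `NEW_IC`, and `extra_chars` are hypothetical. It lists the ASCII characters that the existing character class accepts and the simplified one does not, and shows two names that only the existing pattern lets through.

```python
import re

# Existing and suggested patterns, copied from the issue text.
OLD_IC = re.compile(r"ic.[-\d(_\d)?+]+(\._\d*)?\.nc(\.\d*)?$")
NEW_IC = re.compile(r"ic\.[-\d_]+(\._\d*)?\.nc(\.\d*)?$")


def accepted(char_class: str) -> set:
    """ASCII characters accepted by a single character class."""
    return {c for c in map(chr, range(128)) if re.fullmatch(char_class, c)}


# Characters the existing class accepts beyond "-", "_" and digits.
extra_chars = sorted(accepted(r"[-\d(_\d)?+]") - accepted(r"[-\d_]"))

if __name__ == "__main__":
    print(extra_chars)  # ['(', ')', '+', '?']
    # Hypothetical file names.
    for name in (
        "CASE.ic.0001-01-01.nc",        # plain date
        "CASE.ic.0001-01-01._0001.nc",  # date plus instance number
        "CASE.ic.(?+).nc",              # only metacharacter look-alikes
        "CASE.icX0001-01-01.nc",        # no literal dot after "ic"
    ):
        print(name, bool(OLD_IC.search(name)), bool(NEW_IC.search(name)))
```

The last two names are accepted by the existing pattern and rejected by the simplified one, which is consistent with the parentheses, `?`, and `+` being treated as literals and with the unescaped `.` after `ic` matching any character.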

config_archive.xml mistaken regular expressions #257
