Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check only reports one issue per file #60

Open
SID800 opened this issue Oct 3, 2019 · 3 comments
Open

check only reports one issue per file #60

SID800 opened this issue Oct 3, 2019 · 3 comments
Labels
enhancement New feature or request parser-revamp Related to core parser logic/functionality

Comments

@SID800
Copy link

SID800 commented Oct 3, 2019

Env:
ksconf 0.7.7rc1
Windows

Issue:
check only reports one issue per .conf file, even though there are other issues present. here is an censored example. between each run, I would correct the identified issue and save the .conf file.

c:\splunkdev\deployment-apps>ksconf check TA-MyCO_AddonName_SH\default\*.conf
Successfully parsed TA-MyCO_AddonName_SH\default\app.conf
Successfully parsed TA-MyCO_AddonName_SH\default\eventtypes.conf
Successfully parsed TA-MyCO_AddonName_SH\default\props.conf
Error in file TA-MyCO_AddonName_SH\default\savedsearches.conf:  Stanza [dashboard1] has duplicate key 'alert.digest_mode' in file TA-MyCO_AddonName_SH\default\savedsearches.conf
Completed checking 4 files.  rc=20 Breakdown:
   3 files were parsed successfully.
   1 files failed.

c:\splunkdev\deployment-apps>ksconf check TA-MyCO_AddonName_SH\default\*.conf
Successfully parsed TA-MyCO_AddonName_SH\default\app.conf
Successfully parsed TA-MyCO_AddonName_SH\default\eventtypes.conf
Successfully parsed TA-MyCO_AddonName_SH\default\props.conf
Error in file TA-MyCO_AddonName_SH\default\savedsearches.conf:  Stanza [dashboard1] has duplicate key 'request.ui_dispatch_view' in file TA-MyCO_AddonName_SH\default\savedsearches.conf
Completed checking 4 files.  rc=20 Breakdown:
   3 files were parsed successfully.
   1 files failed.

@lowell80 lowell80 added the enhancement New feature or request label Oct 4, 2019
@lowell80
Copy link
Member

lowell80 commented Oct 4, 2019

This is a known limitation. I've had this conversation with other users in the Slack channel about this a few times. (I thought an enhancement request was already open on the topic, but I can't find it now.) So thanks again for taking the time to provide feedback, and documenting this enhancement :-)

There's a fundamental parsing challenge here. Once a parsing error is encountered, the parser ends up in an unknown state, and therefore it's quite difficult to know what exactly is wrong with what comes next.

For example:

[sourcetype-1]
TIME_FORMAT = %s
TZ = UTC
SEDCMD-hide-password = s/password/*******/g

[sourcetype-2
TIME_FORMAT = %Y-%m-%d %T
EVAL-units case(bytes>1073741824 "GB", bytes> 1048576, "MB", bytes> 1024, "Kb", true(), "bytes")
...

So of course we should report that there's a problem with [sourcetype-2. As a human I can easily tell you that somebody simply forgot a closing ]. But parsers aren't that smart. (Also, there are legitimate lines that start with a [ that are NOT new stanzas, so we can't report that directly.) So if we just try to ignore that first error, and force the parser to press on, then we'll also issue a report saying that [sourcetype-1] has a duplicate TIME_FORMAT entry; which clearly is misleading. So we could just say, okay if you encounter an error, skip that stanza and the one after, and then start from a clean slate on sourcetype-3 (for example). Well now any legitimate errors in sourcetype-2 regarding EVAL-units won't get reported (or, maybe that was reported as en error against sourcetype-1?)

There's probably ways to optimize this output, but what you end up with is less consistent and predictable results. In other words, right now we know that ksconf check will only ever report the very first error, the first one it encounters (moving down through the file.) But if we try to optimize this, some of that consistency goes away. And simple code often is less buggy.

I realize this can be frustrating to keep going back to the same file over and over again. I've often found that new users, in particular, find this frustrating. (And it make sense, it's one of the first things you encounter. Basically any ksconf command you run will tell you about issues in your conf file, and you can't do anything "fun" with your conf files until you fix them.) But this should be mostly a one-time problem. (I strongly encourage users to setup a ksconf check hook as part of their normal git commit process, which helps ensures that bad conf entries get fixed quickly before they are propagated. Pre-commit is awesome for this, BTW!) So if you keep on top of these syntax errors, then in the long-run, this is typically quite manageable.

I pretty frequently make mistakes while editing .conf files, (that's a big reason why I wrote this tool in the first place!), but I typically don't make dozens of mistakes each time I make an edit. Typically only one or sometimes two, but not dozens.

I do want to make some sort of "greedy error" reporting option mode available at some point in the future. So that users can "opt-in" for more tentative error reporting which would be helpful when first-importing new files. But I hesitate to invest in that kind of code change right now because I'm planning on completely overhauling the core conf file parsing logic and implementing it a more featured, and proper-parsing sort of way. I'm certainly open to revisiting that priority. My primary motivation here is wanting to avoid implementing this type of logic twice.

TL; DR

  1. This is a current limitation
  2. This is really only a big deal the first time you run ksconf; Tools like pre-commit can be used to force validity checks.
  3. Half-fixing it would likely be worse (error reports could be wrong or misleading)
  4. I do plan on revisiting this in future version; when conf file parsing engine is rewritten. So if you prefer multiple errors over accuracy, then you can can "opt-in" to that mode.

Sorry for that really long response, but since this has come up a few times, I wanted to explain the topic more completely as a reference for others. Hopefully you found that useful.

This certainly is a "final decision" or anything like that. I do have open questions on this myself. Should this be treated as a technical or training issue? Or, should this be a higher priority, simply because it negatively impacts a user's first impression of the tool?

@SID800
Copy link
Author

SID800 commented Oct 4, 2019

Thanks for the detailed explanation. I can see where you're coming from with the code simplicity.

Counter argument would be any modern compiler finds/displays all the errors in the code at once and not the first one ;)

@lowell80
Copy link
Member

lowell80 commented Oct 4, 2019

Fair point. They also using proper parsers developed over decades with thousands of man hours invented. So in terms of maturity curve, friendly and accurate syntax error reporting, who knows, ksconf may even be ahead of the game at this point? LOL

The parser in ksconf right now is about 90 lines of code (+20 for unicode/BOM support); so it's quite simple.) I've looked at adopting some proper parsing frameworks; which will undoubtedly increase the code base, and decrease performance (not that parsing time is all that significant), but the big benefit is that it's not hand-coded so it can do a bunch better job at error reporting and/or attempting to resume after a syntax error.

If you or anyone you know has some hands on experience with TatSu I'd welcome the help!

@lowell80 lowell80 added the parser-revamp Related to core parser logic/functionality label Mar 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request parser-revamp Related to core parser logic/functionality
Projects
None yet
Development

No branches or pull requests

2 participants