You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What exactly happens when an error occurs in VW is not clear. This makes it hard to reason about or have confidence in what is happening. For the benefit of both users and maintainers we should set clear expectations at what should occur in the face of errors.
As can be seen in the current behavior section, the situation differs for per parser. If an error is encountered while parsing text a potentially invalid example is still used for learning. For JSON it depends on how the error originated as to what happens.
Recently --strict_parse was (added)[https://github.com//pull/1906] as a way to turn a parse issue into a fatal error. Prior to this change any issue encountered in the parser was printed and then ignored and VW continued along. Perhaps in a bad state.
This proposal is primarily targeted at clearing up behavior for command line (driver) usage, but library behavior is discussed for completeness.
Goals
Define expected behavior when a failure occurs during:
Initialization
Parsing
Learning/Predicting
Describe difference in error behavior of library vs driver usage
Allow for customization of handling in driver mode
Non-goals
Define or discuss the implementation or mechanism of error handling or propagation
Define or discuss the error-free intention of predict/learn. Errors originating from learn/predict can be interpreted as originating from a validation step prior to the traditional predict/learn routine.
Current behavior
Initialization
If an error is encountered during initialization this is seen as fatal.
Driver: error is considered fatal
Library: error is emitted to caller
Parsing
Text
Driver: if in --strict_parse mode, error is fatal. Otherwise, it is logged and ignored. VW continues using the error-causing example.
Library: if in --strict_parse mode, error is emitted to caller. Otherwise, it is logged and ignored. Caller has no way of knowing there was an issue.
DSJSON
When parsing DSJSON input:
If the error was caused by a RapidJSON error AND --strict_parse is on then it is a fatal error
If the error was caused by a RapidJSON error AND --strict_parse is off then the example is skipped
If the error was caused by a throw in the SAX parser then it results in a fatal error
JSON
When parsing JSON input:
If the error was caused by a RapidJSON error then it results in a fatal error
If the error was caused by a throw in the SAX parser then it results in a fatal error
Predict/learn
Driver: error is considered fatal
Library: error is emitted to caller
Customization
Driver:
--strict_parse allows for errors that are normally ignored to become fatal
Library: errors are emitted and can be handled
Proposed expected behavior
Initialization
No change.
Parsing and Learning/Predicting
Behavior should be consistent no matter which parser is being used.
Driver:
Introduce: --on_parse_error=, [skip,exit]
skip means the entire example is skipped
exit means the process exits
--strict_parse implies exit and is deprecated
To be most consistent with current behavior the default will be skip
Introduce: --on_learn_error=, [skip,exit]
skip means the entire example is skipped
exit means the process exits
To be most consistent with current behavior the default will be exit
Introduce: --on_error=, [skip,exit]
This is shorthand that will set both on_parse_error and on_learn_error to the value. If both on_error and any of on_parse_error or on_learn_error are set then it is a fatal initialization error.
Library: on_parse_error, on_learn_error, on_error, strict_parse will become strictly driver concepts and will not affect library mode.
strict_parse will stop affecting library usage (it does currently)
Errors will always be emitted to the caller
If a parser fails to parse its input it should fail and not attempt retry/reread logic internally
To achieve this a better interface for library based parsing may be required. As currently it is modeled as giving a source and expecting examples to be produced from it.
Note: this proposal intentionally removes the ability to try learning on a bad example as is currently possible with the text parser.
Note 2: metrics are currently reported for the number of lines of input skipped because of parse errors in DSJSON. This metric should be expanded to be for any parser and no longer be a DSJSON concept.
Timeline
Given the current "broken" nature of error behavior I propose this change be brought in for VW 9.
The text was updated successfully, but these errors were encountered:
Proposal: Clear error semantics
What exactly happens when an error occurs in VW is not clear. This makes it hard to reason about or have confidence in what is happening. For the benefit of both users and maintainers we should set clear expectations at what should occur in the face of errors.
As can be seen in the current behavior section, the situation differs for per parser. If an error is encountered while parsing text a potentially invalid example is still used for learning. For JSON it depends on how the error originated as to what happens.
Recently
--strict_parse
was (added)[https://github.com//pull/1906] as a way to turn a parse issue into a fatal error. Prior to this change any issue encountered in the parser was printed and then ignored and VW continued along. Perhaps in a bad state.This proposal is primarily targeted at clearing up behavior for command line (driver) usage, but library behavior is discussed for completeness.
Goals
Non-goals
Current behavior
Initialization
If an error is encountered during initialization this is seen as fatal.
Parsing
Text
--strict_parse
mode, error is fatal. Otherwise, it is logged and ignored. VW continues using the error-causing example.--strict_parse
mode, error is emitted to caller. Otherwise, it is logged and ignored. Caller has no way of knowing there was an issue.DSJSON
When parsing DSJSON input:
--strict_parse
is on then it is a fatal error--strict_parse
is off then the example is skippedJSON
When parsing JSON input:
Predict/learn
Customization
--strict_parse
allows for errors that are normally ignored to become fatalProposed expected behavior
Initialization
No change.
Parsing and Learning/Predicting
Behavior should be consistent no matter which parser is being used.
--on_parse_error=
, [skip
,exit
]skip
means the entire example is skippedexit
means the process exits--strict_parse
impliesexit
and is deprecatedskip
--on_learn_error=
, [skip
,exit
]skip
means the entire example is skippedexit
means the process exitsexit
--on_error=
, [skip
,exit
]on_parse_error
andon_learn_error
to the value. If bothon_error
and any ofon_parse_error
oron_learn_error
are set then it is a fatal initialization error.on_parse_error
,on_learn_error
,on_error
,strict_parse
will become strictly driver concepts and will not affect library mode.strict_parse
will stop affecting library usage (it does currently)Note: this proposal intentionally removes the ability to try learning on a bad example as is currently possible with the text parser.
Note 2: metrics are currently reported for the number of lines of input skipped because of parse errors in DSJSON. This metric should be expanded to be for any parser and no longer be a DSJSON concept.
Timeline
Given the current "broken" nature of error behavior I propose this change be brought in for VW 9.
The text was updated successfully, but these errors were encountered: