Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Clear error semantics #3234

Open
jackgerrits opened this issue Aug 18, 2021 · 0 comments
Open

Proposal: Clear error semantics #3234

jackgerrits opened this issue Aug 18, 2021 · 0 comments
Assignees
Labels
Feature Request New feature requested in system

Comments

@jackgerrits
Copy link
Member

jackgerrits commented Aug 18, 2021

Proposal: Clear error semantics

What exactly happens when an error occurs in VW is not clear. This makes it hard to reason about or have confidence in what is happening. For the benefit of both users and maintainers we should set clear expectations at what should occur in the face of errors.

As can be seen in the current behavior section, the situation differs for per parser. If an error is encountered while parsing text a potentially invalid example is still used for learning. For JSON it depends on how the error originated as to what happens.

Recently --strict_parse was (added)[https://github.com//pull/1906] as a way to turn a parse issue into a fatal error. Prior to this change any issue encountered in the parser was printed and then ignored and VW continued along. Perhaps in a bad state.

This proposal is primarily targeted at clearing up behavior for command line (driver) usage, but library behavior is discussed for completeness.

Goals

  • Define expected behavior when a failure occurs during:
    • Initialization
    • Parsing
    • Learning/Predicting
  • Describe difference in error behavior of library vs driver usage
  • Allow for customization of handling in driver mode

Non-goals

  • Define or discuss the implementation or mechanism of error handling or propagation
  • Define or discuss the error-free intention of predict/learn. Errors originating from learn/predict can be interpreted as originating from a validation step prior to the traditional predict/learn routine.

Current behavior

Initialization

If an error is encountered during initialization this is seen as fatal.

  • Driver: error is considered fatal
  • Library: error is emitted to caller

Parsing

Text

  • Driver: if in --strict_parse mode, error is fatal. Otherwise, it is logged and ignored. VW continues using the error-causing example.
  • Library: if in --strict_parse mode, error is emitted to caller. Otherwise, it is logged and ignored. Caller has no way of knowing there was an issue.

DSJSON

When parsing DSJSON input:

  • If the error was caused by a RapidJSON error AND --strict_parse is on then it is a fatal error
  • If the error was caused by a RapidJSON error AND --strict_parse is off then the example is skipped
  • If the error was caused by a throw in the SAX parser then it results in a fatal error

JSON

When parsing JSON input:

  • If the error was caused by a RapidJSON error then it results in a fatal error
  • If the error was caused by a throw in the SAX parser then it results in a fatal error

Predict/learn

  • Driver: error is considered fatal
  • Library: error is emitted to caller

Customization

  • Driver:
    • --strict_parse allows for errors that are normally ignored to become fatal
  • Library: errors are emitted and can be handled

Proposed expected behavior

Initialization

No change.

Parsing and Learning/Predicting

Behavior should be consistent no matter which parser is being used.

  • Driver:
    • Introduce: --on_parse_error=, [skip,exit]
      • skip means the entire example is skipped
      • exit means the process exits
      • --strict_parse implies exit and is deprecated
      • To be most consistent with current behavior the default will be skip
    • Introduce: --on_learn_error=, [skip,exit]
      • skip means the entire example is skipped
      • exit means the process exits
      • To be most consistent with current behavior the default will be exit
    • Introduce: --on_error=, [skip,exit]
      • This is shorthand that will set both on_parse_error and on_learn_error to the value. If both on_error and any of on_parse_error or on_learn_error are set then it is a fatal initialization error.
  • Library: on_parse_error, on_learn_error, on_error, strict_parse will become strictly driver concepts and will not affect library mode.
    • strict_parse will stop affecting library usage (it does currently)
    • Errors will always be emitted to the caller
    • If a parser fails to parse its input it should fail and not attempt retry/reread logic internally
      • To achieve this a better interface for library based parsing may be required. As currently it is modeled as giving a source and expecting examples to be produced from it.

Note: this proposal intentionally removes the ability to try learning on a bad example as is currently possible with the text parser.

Note 2: metrics are currently reported for the number of lines of input skipped because of parse errors in DSJSON. This metric should be expanded to be for any parser and no longer be a DSJSON concept.

Timeline

Given the current "broken" nature of error behavior I propose this change be brought in for VW 9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Request New feature requested in system
Projects
None yet
Development

No branches or pull requests

1 participant