Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC Allow optional strict decoding of windows-1251/1252 (new ops) #798

Closed
wants to merge 2 commits into from

Conversation

samcv
Copy link
Member

@samcv samcv commented Feb 7, 2018

This is an RFC for comment on these changes. The op names need to be
changed, so name suggestions are appreciated. Also I would like comment
on whether I should just change the arguments for any function which
is not used in a MoarVM OP, instead of having redundant functions,
would be appreciated.


We have new versions of encode, decode, and encoderep that allow
the new config flag. decoderep never existed before, so this feature
was added to the decoder functions and a function to decode supporting
replacements and a flag was added (decode2rep)

Flags for these op currently have only one setting:
Config flag 1:

  • Throws when it encounters a character which does not map to the
    other encoding.
  • When used under replacement mode, this causes
    replacements to be done with codepoints that may fit into the target
    encoding but are invalid (i.e. 129 in windows-1252).

This adds new ops (names will likely be changed)

  • decode2rep: we never had a decoderep op. This op replaces decoded
    characters that don't have official mappings with a supplied
    replacement string. Currently it is limited to replacement of strings
    with only one grapheme.
  • encode2: like encode except you can set an additional config flag
  • decode2: like decode except you can set an additional config flag
  • encoderep2: like encoderep except you can set an additional config
    flag

@jnthn
Copy link
Member

jnthn commented Feb 8, 2018

Not entirely keen on the 2 in the op name. Maybe conf could be better (so, encodeconf, encoderepconf, decoderepconf, etc.)

I think I'd prefer it slightly if the default were to be strict and we had a flag that made it loose.

One other thought: it's tiresome to implement every decoder twice, once streaming and once not. I wonder if we can do everything in terms of the streaming (not at the op level, but at the function level).

We have new versions of `encode`, `decode`, and `encoderep` that allow
a new config flag. These ops differ from the previous encode/decode
ops in that by default they decode using strict rather than permissive
methods. They throw on encountering a codepoint that is not mapped in
the standard and optionally will decode permissively.
These new ops all end with *config.

While all other ops are `config` variants of current ops, `decoderep`
never existed before, so `decoderepconfig` has been added to allow
decoding with replacements

By default with windows-1252 and windows-1251 these new ops:
* Throw when they encounter a character which does not map to the
  other encoding.
* When used under replacement mode (*repconfig), this causes
  replacements to be done with codepoints that may fit into the target
  encoding but are invalid (i.e. 129 in windows-1252).

This adds new ops:
* `decoderepconfig`: Strict and replaces decoded
   characters that don't have official mappings with a supplied
   replacement string. Currently it is limited to substituting the first
   grapheme of the supplied replacement string (should be useful in most
   cases).
* `encodeconfig`: like `encode` but strict by default and new config flag
* `decodeconfig`: like `decode` but strict by default and new config flag
* `encoderepconfig`: like `encoderep` but strict by default and new config flag
@samcv
Copy link
Member Author

samcv commented Feb 11, 2018

@jnthn I've updated it so that all the ops end with config and the default is to be strict. Full details in the commit message of the updated commit. I will probably merge it after the next release.

@samcv
Copy link
Member Author

samcv commented Apr 11, 2018

This was added in to master. Closing.

@samcv samcv closed this Apr 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants