Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cleanup msgpack str/bytes type mess #968

Closed
ThomasWaldmann opened this issue Apr 23, 2016 · 3 comments
Closed

cleanup msgpack str/bytes type mess #968

ThomasWaldmann opened this issue Apr 23, 2016 · 3 comments
Assignees
Labels
Milestone

Comments

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Apr 23, 2016

old msgpack

  • pack with use_bin_type=False: str -> raw, bytes -> raw
  • unpack with raw=True: raw -> bytes (so we never get str back automatically)
  • used by borg 0.x .. 1.2 (and attic)

new msgpack (2.0 spec)

  • pack with use_bin_type=True: str -> text, bytes -> bin
  • unpack with raw=False: text -> str, bin -> bytes
  • text is identical to raw!
  • encoding_errors defaults to strict for both pack and unpack, but guess it could be surrogateescapealso.

msgpack and borg

using old msgpack format is the cause for the ugly bytes-type keys in item dict (item[b'source']), see also #926.

by using an Item class it got a bit prettier a while ago, see #981.

borg 1.2 added borg.helpers.msgpack compatibility wrapper.

Breaking:

  • obviously, serializer output can not be the same as before (repo, manifest, archives, items, key files, RPC stream)
  • old code can't deal with new output
  • PR WIP: implement msgpack wrapper so that bytes and str roundtrip #973 (not merged) could not deal with old repo
  • if same new method is implemented for key files and RPC stream, the new code maybe can't deal with such old key files and old remote borg.

are we lucky?

new packb serialises 'foo' in same way as old packb serialized b'foo', so we can just drop the b everywhere.

>>> packb('string', use_bin_type=False) # old
b'\xa6string'

>>> packb(b'bytes!', use_bin_type=False) # old
b'\xa6bytes!'

>>> packb('string', use_bin_type=True) # new
b'\xa6string'

>>> packb(b'bytes!', use_bin_type=True) # new
b'\xc4\x06bytes!'

we just need to be careful:

  • new packb with bytes: generates different output.
  • new unpackb used with old data: would try to decode to str, which isn't wanted for binary data.
>>> m = packb(b'bytes!', use_bin_type=False)  # m == old bytes data (same for str)

>>> unpackb(m, raw=False) # new, "usual way" -> decodes to str
'bytes!'

>>> unpackb(m, raw=True) # new, "unusual way" -> does not decode, keeps bytes
b'bytes!'

borg 1.3

  • pack use_bin_type = True to generate new data in the future format
  • unpack raw=True + we know what we want and decode to str manually if desired

borg 2.0

  • we only have new data, change to raw=False
@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented May 4, 2022

Just tried switching msgpack to the 2.0 spec format.

Lots of changes in the code and tests are needed for that (but it gets prettier, usually no b'key', but just 'key', also less encode/decode).

I'ld like to have the new msgpack format at least for all persistent data structures (repo, archives, items, keys, ...).

But: the RPC protocol also uses msgpack, so we would need the old format to talk to old borg servers. Also, in the old archives, items, ... we of course also have serialized with the old format.

Any idea how that could nicely work without duplicating lots of code?

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 5, 2022
see ticket and borg.helpers.msgpack docstring.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 5, 2022
see ticket and borg.helpers.msgpack docstring.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 5, 2022
see ticket and borg.helpers.msgpack docstring.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 5, 2022
see ticket and borg.helpers.msgpack docstring.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 5, 2022
see ticket and borg.helpers.msgpack docstring.
@ThomasWaldmann
Copy link
Member Author

Hmm, weird idea about how to finish with this right now:

If we would tell the unpacker to use raw=False now, it would:

  • unpack old stuff always to str (decoding it internally using surrogateescape) - if that was wrong at the value should be bytes, we could just encode it again. as surrogatescape deals with anything, that should work.
  • unpack new stuff (correctly) to bytes or str

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 6, 2022
see ticket and borg.helpers.msgpack docstring.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 6, 2022
see ticket and borg.helpers.msgpack docstring.

this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).

still needed compat to the past is done via want_bytes decoder in borg.item.
@ThomasWaldmann ThomasWaldmann self-assigned this May 6, 2022
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 6, 2022
see ticket and borg.helpers.msgpack docstring.

this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).

still needed compat to the past is done via want_bytes decoder in borg.item.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 6, 2022
see ticket and borg.helpers.msgpack docstring.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 6, 2022
see ticket and borg.helpers.msgpack docstring.

this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).

still needed compat to the past is done via want_bytes decoder in borg.item.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 18, 2022
see ticket and borg.helpers.msgpack docstring.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue May 18, 2022
see ticket and borg.helpers.msgpack docstring.

this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).

still needed compat to the past is done via want_bytes decoder in borg.item.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Jun 9, 2022
see ticket and borg.helpers.msgpack docstring.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Jun 9, 2022
see ticket and borg.helpers.msgpack docstring.

this changeset implements the full migration to
msgpack 2.0 spec (use_bin_type=True, raw=False).

still needed compat to the past is done via want_bytes decoder in borg.item.
@ThomasWaldmann ThomasWaldmann modified the milestones: fluorine, 2.0.0a2 Jun 26, 2022
@ThomasWaldmann
Copy link
Member Author

This is mostly done for now, see the code in borg2 branch.

The only thing left to do is all the "we know what it should be" style type conversion code in borg.item.

@ThomasWaldmann ThomasWaldmann removed this from archive / item in breaking Jul 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant