Incomplete functionality #3

snake-biscuits · 2021-07-03T01:48:40Z

I was looking at using this library for converting some python objects that use the struct module into C++ definitions

Unfortunately I found that this repo only handles a small subset of valid struct format strings

struct allows spaces in format strings, as well as integers preceding an identifier.
"I 128s 2h" is a valid format for the struct module, but not for this repo

Instead it fails on the first space, since it isn't listed as a valid character, rather than skipping it
The same for any integer.

I've made my own implementation to handle this, rather than a fork, since I also have to handle my custom objects
(The binary data is represented in nested structures, so my definitions also have to take that into account)

One common special case to look out for is strings:
With struct both "5c" & "5s" represent the equivalent of char[5] in C
However, the objects returned are different:

>>> import struct
>>> struct.unpack("5c", b"\x00\x01\x02\x03\x04")
(0, 1, 2, 3, 4)
>>> struct.unpack("5s", b"words")
b"words"

This is an important restriction, in writing my own function to standardise the format strings,
I had to include a case for strings, to ensure I recognised it as the single object struct would expect

Unfortunately my solution is a little clunky, but I'll share it here if you think you'd find it useful:

def simplify(_format):
	"""Converts a format string for the struct module into a list of type chars"""
    _format = _format.replace(" ", "")
    if _format[-1].isnumeric():
        raise RuntimeError("_format is invalid (ends in a number)")
    out, i = list(), 0
    while i < len(_format):
        char = _format[i]
        if char.isnumeric(): # find the whole number
            j = 1
            while _format[i:i+j].isnumeric():
                j += 1
            j -= 1
            count = int(_format[i:i + j])
            i += j
            f = _format[i]
            # NOTE: f is only assumed to be valid, validity is not checked here
            if f == "s":  # string
                out.append(f"{count}s")
            else:
                out.extend([f] * count)  # "2I" -> ["I", "I"]
        elif char.isalpha():  # lazily collect type
            out.append(_format[i])
        elif char == "?":  # bool is not an alphabetical character, but is still valid
            out.append("?")
        else:
            raise RuntimeError(f"Invalid character '{char}' in _format") 
        i += 1
    return out

This could probably be tidier but in my testing it seems to work with the format strings I'm using
(no pointer types, padding, size_t or 0 length types)

Using this with the dictionary lookups in struct_parse should handle both spaces and types preceeded by counts
Strings are passed through as "128s" etc., so a special case would need to be added
Presumably noting the type as string and also the size

I hope this is useful in expanding this repo to support all valid format strings the struct module accepts

The text was updated successfully, but these errors were encountered:

snake-biscuits · 2021-07-07T03:59:10Z

I've cleaned up my solution considerably using regex, via the built in re module

def split_format(_format):
    _format = re.findall(r"[0-9]*[xcbB\?hHiIlLqQnNefdspP]", _format.replace(" ", ""))
    out = list()
    for f in _format:
        match_numbered = re.match(r"([0-9]+)([xcbB\?hHiIlLqQnNefdpP])", f)
        # NOTE: does not decompress strings
        if match_numbered is not None:
            count, f = match_numbered.groups()
            out.extend(f * int(count))
        else:
            out.append(f)
    return out

this returns a tuple of strings, with all types but strings being multiplied out
should be far easier to work with than my old solution
one major point to note is that the second regex pattern does not match for s, leaving strings unchanged

zmarvel · 2021-07-09T03:04:26Z

Thanks for taking the time to report this bug and for sharing your code! I'll leave this issue open and take a look at it implementing it when I get some time 🙂

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incomplete functionality #3

Incomplete functionality #3

snake-biscuits commented Jul 3, 2021

snake-biscuits commented Jul 7, 2021

zmarvel commented Jul 9, 2021

Incomplete functionality #3

Incomplete functionality #3

Comments

snake-biscuits commented Jul 3, 2021

snake-biscuits commented Jul 7, 2021

zmarvel commented Jul 9, 2021