Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What would you like to be added to Construct? #824

Open
arekbulski opened this issue Jan 17, 2020 · 67 comments
Open

What would you like to be added to Construct? #824

arekbulski opened this issue Jan 17, 2020 · 67 comments
Labels

Comments

@arekbulski
Copy link
Member

@arekbulski arekbulski commented Jan 17, 2020

@DanseMacabre
I am opening this ticket for a chat...

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 17, 2020

@yannayl
I also welcome your input. How is Lazy* stuff working for ya?

@arekbulski arekbulski changed the title Chat with DanseMacabre What would you like to be added to Construct? Jan 17, 2020
@Evidlo

This comment has been minimized.

Copy link

@Evidlo Evidlo commented Jan 18, 2020

I'm still looking for a way to differentiate exceptions raised by Checksum.

I have multiple Checksums in my application which check different conditions (e.g. file validation, credentials authentication), but I have to print out a really generic error message when any of these fail because there's no way to tell the exceptions apart.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 18, 2020

I dont think this is really necessary to be available to everyone. But the solution is simple, just copy paste the source of Checksum into your script and change the constructor to take an exception type. Simple.

class Checksum(Construct):
    def __init__(self, checksumfield, hashfunc, bytesfunc, exception):
        super(Checksum, self).__init__()
        self.checksumfield = checksumfield
        self.hashfunc = hashfunc
        self.bytesfunc = bytesfunc
        self.flagbuildnone = True
        self.exception = exception

    def _parse(self, stream, context, path):
        hash1 = self.checksumfield._parsereport(stream, context, path)
        hash2 = self.hashfunc(self.bytesfunc(context))
        if hash1 != hash2:
            raise self.exception("wrong checksum, read %r, computed %r" % (
                hash1 if not isinstance(hash1,bytestringtype) else binascii.hexlify(hash1),
                hash2 if not isinstance(hash2,bytestringtype) else binascii.hexlify(hash2), ))
        return hash1

    def _build(self, obj, stream, context, path):
        hash2 = self.hashfunc(self.bytesfunc(context))
        self.checksumfield._build(hash2, stream, context, path)
        return hash2

    def _sizeof(self, context, path):
        return self.checksumfield._sizeof(context, path)
@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 18, 2020

This approach was kinda already written in the docs, just not so directly. See for yourself:
https://construct.readthedocs.io/en/latest/extending.html#constructs
I added the green box just now.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 18, 2020

I just found your old Issue that you had for this: #731

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 19, 2020

I guess we can also post here #826 which is showing path in exceptin messages. Already done.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 19, 2020

MESSAGE TO EVERYONE: I am about to remove Embedded and EmbeddedSwitch from the core library. I know its in use here and there but please understand, the semantics have changed so much that using this as is now is pointless.

@sbasharati

This comment has been minimized.

Copy link

@sbasharati sbasharati commented Jan 20, 2020

@arekbulski Some code examples to migrate Embedded and EmbeddedSwitch to 2.10 would be useful.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 20, 2020

The alternative to embedding is simply nesting. Nesting is already covered in the docs through and through. Nesting is the recommended way of doing things anyway.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 20, 2020

@sbasharati

This comment has been minimized.

Copy link

@sbasharati sbasharati commented Jan 21, 2020

@arekbulski GreedyByte is no longer able to build from bytearray in python 3.
the example below works in python 2.7 but not 3. prior to 2.10 this was working for both 2 and 3. I believe btyearray should be treated as bytes.

d = bytearray(b'\x11\x01\x05')
GreedyBytes.build(d)

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 21, 2020

This must have been an issue at 2.9 or earlier, because 2.10 didnt touch these things.
But yes, I will fix this.
Fixed it.

@yannayl

This comment has been minimized.

Copy link

@yannayl yannayl commented Jan 23, 2020

Sorry for being late to the party. I moved to different projects, I still use construct regularly but I am very satisfied with the current feature set. Thanks, btw :)

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 24, 2020

TBH I am also very satisfied with the current feature set. :) Aside of 1-2 open tickets to be fixed in foreseeable time, there is little for me to do here. Makes me feel kinda useless.

@yannayl

This comment has been minimized.

Copy link

@yannayl yannayl commented Jan 24, 2020

There is always a place for a talented programmer.
If you are looking for a major feature then the only thing I can think of is fuzzing support. But I don't have a concrete project for it in hand (I use other fuzzing engines that also offer complementary features such as process management in addition to input generation).
You may take a look at this https://github.com/enigmatos/Fuzzlon I used construct to generate random 802.15.4 packets.

@arekbulski arekbulski reopened this Jan 24, 2020
@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 24, 2020

Do you mean adding something like build but in a way that all values are random and not provided to the build function? I dont think I will be adding something like that.

@yannayl

This comment has been minimized.

Copy link

@yannayl yannayl commented Jan 24, 2020

Yes, something along these lines more or less. In a fuzzing course I took 3 years ago in BlackHat they gave us a library for fuzzing that was heavily inspired by construct. So there was a need back then. Not sure if there is now.

@guywhataguy

This comment has been minimized.

Copy link

@guywhataguy guywhataguy commented Jan 24, 2020

It would be very helpful if there were examples of using construct to parse and manipulate complex yet common protocols such as IPv4, ICMP, TCP, etc...

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 24, 2020

The deprecated_gallery code is there for you to bring it up to 2.10...
PR me the code after you translated it.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 24, 2020

@yannayl I was looking over FuzzIon source and found this naughty boy:

def LEBitsInteger(n):
    return ct.ByteSwapped(ct.BitsInteger(n))
@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 24, 2020

@guywhataguy Sorry for being condescending. Sometimes I just get too much of it, people coming and just demanding code. The gallery is there, but someone familiar with the actual data formats should rewrite the code for 2.10. Mostly removing embeddings and such. Networking packets are tricky, especially DNS packets. I never got that right.

@yannayl

This comment has been minimized.

Copy link

@yannayl yannayl commented Jan 24, 2020

@arekbulski haha yes, you remember we talked about it?
Many twisted protocols out there. I think I even PRed you the code to support it as part of the library.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 24, 2020

Yes, the insane idea of applying little endianness to non-multiples of 8 bits. Jeez luize.

@yannayl

This comment has been minimized.

Copy link

@yannayl yannayl commented Jan 25, 2020

Bitwise endianness is a thing 🤷
#770 (comment)

@yannayl

This comment has been minimized.

Copy link

@yannayl yannayl commented Jan 26, 2020

@arekbulski as much as I feel for GHidra users, I see no reason to support python 2 any further. Of course it requires updating a major so nobody cries too much.
Worst case, if python2 user wants to use it, this version is fine. After all, we are having hard time to find any new features to implement.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 26, 2020

Yes, if you want Python 2 support then use 2.9.*.
PS: What is GHidra and whats all about it?

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 26, 2020

Now I understand why you were hinting at type annotations. Yeah, python 3.5 or 3.6 support those. The thing is, I do not like those. It adds yet-another-one syntax and complicates things beyond need.

@danilobellini

This comment has been minimized.

Copy link

@danilobellini danilobellini commented Jan 26, 2020

AFAIK, Python 2.7.18 will be released on April 2020, and it was claimed that it would be the last Python 2.7.x version. Nevertheless, a lot of packages are dropping support for Python 2 these days.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 26, 2020

I think I will also change the Container class to derive from OrderedDict and not implement all of this stuff from scratch. Should make less lines of code nicely.

@yannayl

This comment has been minimized.

Copy link

@yannayl yannayl commented Jan 26, 2020

Yes, type hinting complicates things a little but it plays really nicely with static analysis tools which are very helpful (e.g. IDE suggestions).

Ghidra is the NSA's reverse engineering tool that became open source last year. Its written in Java and can be scripted with Jython. That's the only legit use of python2 I'm aware of.

https://twitter.com/matalaz/status/1107989329631215616?s=19

@danilobellini

This comment has been minimized.

Copy link

@danilobellini danilobellini commented Jan 26, 2020

Check RawCopy class, it provides a byte length too.

I've found this way:

ExprAdapter(
    RawCopy(Prefixed(
        lengthfield=Int16ub,
        subcon=Union(
            0,
            "integer" / Int32sb,
            "bytes" / Bytes(4),
        ),
        includelength=True,
    )),
    lambda obj, ctx: Container(size=obj.length, **obj.value),
    lambda obj, ctx: {"value": obj},
)

Indeed, that puts the size of the object in the container on parsing, and that should work even when the lengthfield has variable length.

Anyway, I still think this behavior can be included as a storelength="size" keyword argument for Prefixed, one that could also check the length value on building if it's part of the input dictionary.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 27, 2020

Its too custom made to be included in the mainstream version of the library. Copy paste Prefixed and modify it to your needs.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 27, 2020

Its official, I just dropped 2.7 and 3.5 from the supported runtimes.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 27, 2020

Readthedocs just broken. Assisstance required:
sphinx-doc/sphinx#7069

@tmr232

This comment has been minimized.

Copy link

@tmr232 tmr232 commented Jan 28, 2020

@yannayl @arekbulski
I played a bit with type annotations & construct recently.
I kinda like the results

from construct import *
from typing import Type

def attruct(cls: Type):
    return Struct(**cls.__annotations__)


@attruct
class Format:
    signature: Const(b"BMP")
    width: Int8ub
    height: Int8ub
    pixels: Array(this.width * this.height, Byte)


@attruct
class OtherFormat:
    @attruct
    class SomeFmt:
        signature: Const(b"BMP")
        width: Int8ub
        height: Int8ub
        pixels: Array(this.width * this.height, Byte)

    format: SomeFmt

and PyCharm seems to like them too. Figured some actual examples may help the discussion.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 28, 2020

This leaves me on the fence. On one hand, I like the examples, on the other hand, there is already way too many ways of writing structs. That includes + operator, Struct(a=Byte), Struct("a"/Byte). I suggest to everyone to use the last syntax.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 28, 2020

Those annotations already have a ticket: #814 (comment)

@tmr232

This comment has been minimized.

Copy link

@tmr232 tmr232 commented Jan 28, 2020

Given how little code is required to make it work, I'm happy as long as we keep the Struct(a=Byte) option.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 28, 2020

Fine, I was thinking of removing it but sure okay.

@tmr232

This comment has been minimized.

Copy link

@tmr232 tmr232 commented Jan 28, 2020

Well, my decorator from the previous example could easily be replaced with:

def attruct(cls: Type):
    return Struct(*(key / value for key, value in cls.__annotations__.items()))

So it doesn't really matter. Though I personally prefer the = syntax over the / one.

@yannayl

This comment has been minimized.

Copy link

@yannayl yannayl commented Jan 28, 2020

I think I will switch to the = operator as well. It looks more readable imho

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 29, 2020

I get it guys. Syntax stays.
Unfortunately all work stops until RTD docs start building successfully.

@tmr232

This comment has been minimized.

Copy link

@tmr232 tmr232 commented Jan 30, 2020

@arekbulski it seems like the RTD build is still using Python2.7.
Adding a .readthedocs.yml and requiring that the docs get built with 3.7 should probably do the trick (though I don't know how to test it myself)
https://docs.readthedocs.io/en/latest/config-file/v2.html

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 30, 2020

Could you PR it? I dont know how to write it myself.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 30, 2020

@tmr232

This comment has been minimized.

Copy link

@tmr232 tmr232 commented Jan 30, 2020

Created #835
I think it has what it needs to work, but I guess it needs to be a branch in the project so that we can test it.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 30, 2020

IT WORKS!!! IT WORKS!!!! I owe you a beer, man.
Now we just need something to work on...

@tmr232

This comment has been minimized.

Copy link

@tmr232 tmr232 commented Jan 30, 2020

I guess it's high-time to drop all the extra Python2 code.

@tmr232

This comment has been minimized.

Copy link

@tmr232 tmr232 commented Jan 30, 2020

@arekbulski would you accept PRs that remove legacy code / simplify internals? (Not at all sure that I'll submit any, but I might)

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 30, 2020

Sure, I would consider such PRs. You can also tell me in advance what you want to remove so I can tell you yay or nay on the spot.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Jan 30, 2020

I also like when people submit new test cases, to extend the test suite. Thats something worthy.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Feb 6, 2020

I am taking a short leave since there is little for me to do.
Priority ONE; fixing imp in compilation'
I would very much like to dedicate this work to our user base.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Feb 6, 2020

@jpsnyder You are welcome to share with us, your ideas for changes.

@arekbulski

This comment has been minimized.

Copy link
Member Author

@arekbulski arekbulski commented Feb 6, 2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.