Add symbols API #505
I haven't looked at this closely enough to properly understand the techniques being used here to translate from the symbol info to the high-level API, but I'm concerned that using regexes to interpret the C types could be pretty fragile. The pmdsky-debug tooling does this to some extent, but at least in that case those assumptions are all tightly coupled to the headers themselves since they live together in the same repo. Making the same sort of tacit assumptions outside of the original repo makes me a bit worried. C is pretty flexible in the way things can be declared. Do these regexes break down if there's some slight variation on the "normal" syntax/style used in the pmdsky-debug headers currently? For example:
// typedef'd struct
typedef struct {
    int x;
} typedef_struct;
struct bare_struct {
    typedef_struct y;
};
// keywords in weird places
char const *const const_string;
const char *volatile *volatile volatile_string;
// parens in weird places
int *(array_of_ptrs[8]);
int (*ptr_to_array)[8];
// un-typedef'd function pointer types
void (*function_ptr)(int);
void (*signal(int, void (*fp)(int)))(int);
// macro magic, like the existing ENUM_8_BIT and ENUM_16_BIT macros
#define MAKE_PAIR(a, b) \
    struct pair_##a##_##b { \
        a first; \
        b second; \
    };
MAKE_PAIR(int, char);
struct pair_int_char pair;
Not that all of these constructs are particularly likely, but they are still possible. Have you considered using some proper AST-based approach? Something like |
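To make the fragility concern above concrete, here is a minimal Python sketch. The pattern `NAIVE_DECL` is hypothetical and is not the regex used in this PR; it only stands in for the kind of "type, stars, name, optional array" pattern that handles the common style and silently rejects the rest.

```python
import re

# Hypothetical pattern for "<type words> <stars><name>[N];" declarations.
# It is an illustration only, not the pattern used in the PR.
NAIVE_DECL = re.compile(
    r"""
    ^\s*
    (?P<type>[A-Za-z_][\w ]*?)   # base type words, e.g. "struct foo"
    \s+
    (?P<stars>\**)               # pointer stars attached to the name
    (?P<name>[A-Za-z_]\w*)       # declared identifier
    (?P<array>\[\d+\])?          # optional fixed-size array suffix
    \s*;\s*$
    """,
    re.VERBOSE,
)


def parse_naive(decl):
    """Return (type, stars, name, array) or None if the pattern does not match."""
    m = NAIVE_DECL.match(decl)
    if m is None:
        return None
    return m.group("type"), m.group("stars"), m.group("name"), m.group("array")


SAMPLES = [
    "int x;",
    "typedef_struct y;",
    "char const *const const_string;",
    "int *(array_of_ptrs[8]);",
    "int (*ptr_to_array)[8];",
    "void (*function_ptr)(int);",
]

for decl in SAMPLES:
    status = "ok" if parse_naive(decl) else "NOT PARSED"
    print(f"{status:10} {decl}")
```

Only the first two samples parse. The qualifier after the `*`, the parenthesized declarators, and the function pointer all fall outside what the pattern expects, which is the failure mode a real C parser would avoid.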
You're right, it won't work with weirder declarations. However, right now the only headers that are parsed are those under the |
Well if we're only best-effort parsing I don't expect the extremely weird constructs, but you never know. The more tame ones could conceivably slip through? I remember somebody at some point tried to use the
// Declaration that's commented out
// extern int foo;
extern int bar /* inline comment */ ;
extern struct baz
var; // long comment that causes the formatter to wrap the line halfway through the declaration
So maybe it's worth making sure the regexes are resilient to that sort of thing, at least. |
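One way to get that resilience, sketched in Python under some assumptions: strip comments first, then match declarations with a pattern that tolerates arbitrary whitespace (including newlines). The names `strip_comments`, `find_externs` and both patterns are hypothetical, and the comment stripper assumes comment markers never appear inside string literals, which seems reasonable for headers that only declare symbols.

```python
import re

# Block comments (possibly multi-line) and line comments.
# Assumption: no "//" or "/*" inside string literals in the headers.
C_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

# "extern <type> <name>;" where any whitespace, including newlines,
# may separate the tokens. Only plain, struct and enum types are covered.
EXTERN_DECL = re.compile(
    r"\bextern\s+(?P<type>(?:struct\s+|enum\s+)?\w+)\s+(?P<name>\w+)\s*;"
)


def strip_comments(source):
    """Replace every comment with a single space so tokens stay separated."""
    return C_COMMENT.sub(" ", source)


def find_externs(source):
    """Return (type, name) pairs for the extern declarations that survive."""
    cleaned = strip_comments(source)
    return [
        (" ".join(m.group("type").split()), m.group("name"))
        for m in EXTERN_DECL.finditer(cleaned)
    ]


HEADER = """\
// Declaration that's commented out
// extern int foo;
extern int bar /* inline comment */ ;
extern struct baz
var; // long comment that causes the formatter to wrap the line
"""

print(find_externs(HEADER))
```

The commented-out `foo` is dropped, while `bar` and the wrapped `var` declaration are both picked up.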
Would the |
It shouldn't be a big deal, but I'd probably just make sure the parsing code here doesn't rely on the
Yeah it is rather weird, I probably wouldn't willingly allow it in code reviews. It's probably fine not to worry about it. I brought it up because there are C projects that put comments in the middle of code like this for various reasons. Some people like to use it to comment out stuff. Though I think probably it's more common in function prototypes for commenting individual parameters within the same line. |
Isn't the tag required when declaring a variable of that type? I'm not that familiar with C. |
"tag" refers to the "name" part of the struct. For |
So C supports the same syntax as more recent languages where struct and enum types can be referred to using only the tag? Why is |
I wouldn't say it "supports" it. You can only do it if you define the struct with typedefs, i.e. typedef struct {...} MyStruct; It's a stylistic preference. Some projects, like the Linux kernel, prefer to write the |
Oh, I see. |
After over two hours of pain, I have realized that you can't just keep a copy of the I really wish SkyTemple had better documentation. It is quite frustrating to realize you've been doing everything wrong because there were no hints as to what the proper way of implementing things is. |
@End45 I merged #506 so tests should now run through again when you update your branch. I'll do a review once all the checks run through, but in general I think this is great so I don't expect to have many change requests! |
Alright! I still expect to add a few more commits here though. Likely just minor fixes. |
Alright, now that the UI part is implemented, I think the backend is ready! The only thing that is missing would be formatting. However, I have some issues with the suggested changes there, especially with all the unnecessary line breaks the formatter wants to add. Some sections become borderline unreadable afterwards. I think the formatter settings would need to be reviewed. |
@End45 The formatter has next to no settings by design. You can disable the formatter for some lines: |
Looking at the docs, I think one of the reasons why so many line breaks are introduced is because the default max line length is 88 characters. Maybe it could be bumped to 120. I just tested it and that gets rid of most of the problematic lines. The rest are either not too bad after reformatting or warrant using the suppression tag. |
No, we will keep the defaults. |
Why though? 120 is the default I see the most nowadays, and the current setting really makes a lot of stuff almost unreadable. I could just litter the code with suppression tags, but I think it would be an even worse solution. |
79 is the maximum required by PEP 8. |
That makes no sense to me, tbh. Code is viewed on a single window 99% of the time, so that's what it should be optimized for. You can't tell me that it makes sense to hinder readability and inflate the line count just so we don't have to use the horizontal scrollbar in the 1% of cases where code is viewed on a split window. Again, if you think it's a better solution, I'll just add suppression statements anywhere I consider them to be necessary, but I really don't get why we would make everything harder for ourselves just because an arbitrary standard says so. |
Personally I work with multiple panes and zoomed in code all the time, I use 80 columns for that reason everywhere. |
I'll move this conversation to a thread on the Discord server, since I don't want to flood the PR with pictures. |
I raised the character line limit to 120. |
Great! Really nothing to add, just overall really cool!
Adds an API that combines binary editing and the pmdsky-debug symbols. Requires SkyTemple/pmdsky-debug-py#16.
The following components have been added:
RWValue
New class under common/ that can be used to read/write a value of a given type at a given offset on a certain binary. Has one subclass for each supported data type. It mostly encapsulates the read_<type> and write_<type> methods that are currently implemented in utils.py, while adding support for new data types (such as fixed-point numbers and instruction immediates).
RWSymbol
New class under hardcoded/symbols. Used to link symbol information (normally pulled from Symbol instances) to RWValue instances. Combined, this allows reading/writing the value of a symbol with a single method call, while abstracting everything related to its type or offset.
CType
Small helper class used to parse some C Types.
SymbolPath
Allows accessing the fields and elements of complex symbol types (such as arrays and structs) using a path string.
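As a rough illustration of the path-string idea, here is a Python sketch. The syntax it accepts (dotted field names and bracketed integer indices, e.g. `entities[1].pos.x`) is an assumption and may differ from what SymbolPath actually supports; `parse_path` and `resolve` are hypothetical names, and the nested dicts and lists only stand in for struct and array symbols.

```python
import re
from typing import List, Union

# Either an identifier (optionally preceded by a dot) or a "[N]" index.
_TOKEN = re.compile(r"\.?([A-Za-z_]\w*)|\[(\d+)\]")


def parse_path(path: str) -> List[Union[str, int]]:
    """Split a path string into field names (str) and element indices (int)."""
    parts: List[Union[str, int]] = []
    pos = 0
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"Invalid path at position {pos}: {path!r}")
        if m.group(1) is not None:
            parts.append(m.group(1))
        else:
            parts.append(int(m.group(2)))
        pos = m.end()
    return parts


def resolve(value, path: str):
    """Walk nested dicts/lists following the parsed path."""
    for part in parse_path(path):
        value = value[part]
    return value


# Stand-in data: a struct with an array of structs.
data = {"entities": [{"pos": {"x": 1}}, {"pos": {"x": 7}}]}
print(parse_path("entities[1].pos.x"))
print(resolve(data, "entities[1].pos.x"))
```

Parsing the path once into a list keeps the lookup step trivial and makes malformed paths fail early with a clear error.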
BinaryDataGetter
API class that allows retrieving some information about the game's binaries and the symbols contained in it. It's based on an existing implementation in one of SkyTemple's modules, which directly uses reflection to get the list of symbols for the current ROM. I thought it would be nice to abstract all that.
I've only implemented the methods I'm going to need, but more could be added in the future if required.
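The RWValue idea described above (one subclass per data type, each reading or writing at a fixed offset in a binary) could look roughly like the following Python sketch. All class names are hypothetical and do not match the PR's real classes; little-endian byte order and a signed 32-bit fixed-point format with 8 fractional bits are assumptions made for the example.

```python
import struct
from abc import ABC, abstractmethod


class RWValueSketch(ABC):
    """Hypothetical base class: a typed value at a fixed offset in a binary."""

    def __init__(self, binary: bytearray, offset: int):
        self.binary = binary
        self.offset = offset

    @abstractmethod
    def read(self):
        """Decode the value stored at self.offset."""

    @abstractmethod
    def write(self, value) -> None:
        """Encode the value and store it at self.offset."""


class U16Value(RWValueSketch):
    """Unsigned 16-bit integer, little-endian (assumed byte order)."""

    def read(self) -> int:
        return struct.unpack_from("<H", self.binary, self.offset)[0]

    def write(self, value: int) -> None:
        struct.pack_into("<H", self.binary, self.offset, value)


class Fixed24x8Value(RWValueSketch):
    """Signed 32-bit fixed-point with 8 fractional bits (assumed layout)."""

    def read(self) -> float:
        raw = struct.unpack_from("<i", self.binary, self.offset)[0]
        return raw / 256

    def write(self, value: float) -> None:
        struct.pack_into("<i", self.binary, self.offset, round(value * 256))


binary = bytearray(8)
u16 = U16Value(binary, 2)
u16.write(0x1234)
fixed = Fixed24x8Value(binary, 4)
fixed.write(1.5)
print(hex(u16.read()), fixed.read(), binary.hex())
```

A symbol wrapper in the spirit of RWSymbol would then only need to pick the right subclass from the symbol's type and pass in its offset.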
Manual structs
I wanted to have support for structs, but that would have required parsing every C type in pmdsky-debug in order to figure out the size and fields of each possible type. That's too much work, so I instead decided to manually implement an API for the most relevant structs I could find. They are described in symbols/manual/structs.py. The main downside is that the info could get outdated if any of those structs change in pmdsky-debug, but it's better than parsing the repo's entire C headers.
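For readers unfamiliar with the approach, a hand-written struct description might look something like this Python sketch. Everything in it is hypothetical: the class names, the `struct example_pair` entry, and its field offsets are made up for illustration and are not taken from structs.py or pmdsky-debug.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FieldSketch:
    """One struct field: byte offset inside the struct and a type label."""
    offset: int
    type_name: str


@dataclass(frozen=True)
class StructSketch:
    """Manually maintained layout of a C struct."""
    size: int
    fields: Dict[str, FieldSketch]

    def field_offset(self, base: int, name: str) -> int:
        """Absolute offset of a field for a struct instance located at base."""
        return base + self.fields[name].offset


# Made-up example entry; a real table would mirror the C headers by hand.
MANUAL_STRUCTS = {
    "struct example_pair": StructSketch(
        size=8,
        fields={
            "first": FieldSketch(offset=0, type_name="int32"),
            "second": FieldSketch(offset=4, type_name="uint8"),
        },
    ),
}

layout = MANUAL_STRUCTS["struct example_pair"]
print(hex(layout.field_offset(0x100, "second")))
```

The trade-off mentioned above shows up directly here: the offsets and sizes are plain data, so they have to be updated by hand whenever the corresponding header changes.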
Manual enums
Similar idea, but with enums. I've included several types I considered relevant, listing each of their possible values.
Tests
I wrote tests for all the relevant classes implemented and included them in the tests/ subfolder. I didn't look too much into how tests are normally set up because it seemed quite complicated. I didn't need a real ROM to run them either. So I don't know if the implementation needs changes. If so, let me know.
The tests will fail until the pmdsky-debug-py PR mentioned above is merged. They all pass in my dev environment.
Current TODOs
I'll mark the PR as a draft for now. Suggestions are welcome.