Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

m68k: data race initializing global g_instruction_table in build_opcode_table() #1397

Closed
tmfink opened this issue Feb 26, 2019 · 30 comments
Closed

Comments

@tmfink
Copy link
Contributor

tmfink commented Feb 26, 2019

The global array g_instruction_table init in M680XDisassembler.c:build_opcode_table() is a race condition.

Declaration:
https://github.com/aquynh/capstone/blob/de952a3e5a519b4a0d0dd2ef921a755891860ca6/arch/M68K/M68KDisassembler.c#L250-L251

Initialization:
https://github.com/aquynh/capstone/blob/de952a3e5a519b4a0d0dd2ef921a755891860ca6/arch/M68K/M68KDisassembler.c#L3787-L3827

This caused problems while working on the Rust language bindings. The tests are run in parallel:
- PR: capstone-rust/capstone-rs#60
- Travis CI failure: https://travis-ci.org/capstone-rust/capstone-rs/jobs/498529801#L623

Possible solutions:

  • Introduce synchronization in build_opcode_table()
    • Would require adding a threading library dependency (like pthread)
  • Declare array with appropriate values statically
    • Most efficient (run time) method
    • Does not add extra dependencies or change API
    • Requires deeper knowledge of m68k code
    • May require writing a script to generate C declaration
  • Introduce capstone_init() function that must be called before any other Capstone functions
    • API breaking change
    • Less ergonomic for users
  • Move the global variable into the private cs_struct
    • Less efficient: wastes time initializing and memory

Similar past issue: #1171

@aquynh
Copy link
Collaborator

aquynh commented Feb 26, 2019

@emoon, i think the solution (2) is the best.

@radare
Copy link
Contributor

radare commented Feb 26, 2019 via email

@emoon
Copy link
Contributor

emoon commented Feb 26, 2019

I would agree with @radare that a init function would be cleaner but (2) wouldn't break anything so I guess is what needs to be done. I guess what I need to do is to write a script that generates the table. I can include the script as an optional thing that people that changes the code may need to use. Users in general will never need to poke around in this code so having a manual step isn't the biggest issue I would say and in many cases one wouldn't need to change this anyway.

I can try to look at this during the week and hopefully today.

@aquynh
Copy link
Collaborator

aquynh commented Feb 26, 2019

I would agree with @radare that a init function would be cleaner but (2) wouldn't break anything so I guess is what needs to be done. I guess what I need to do is to write a script that generates the table. I can include the script as an optional thing that people that changes the code may need to use. Users in general will never need to poke around in this code so having a manual step isn't the biggest issue I would say and in many cases one wouldn't need to change this anyway.

m68k is a stable architecture, without any future update (?), so this is one time change, and should work well.

@emoon
Copy link
Contributor

emoon commented Feb 26, 2019

Yeah I think it will be fine.

@tmfink
Copy link
Contributor Author

tmfink commented Feb 26, 2019

@emoon definitely include the script that is used to generate the array. That will make future contributions easier. :-)

@emoon
Copy link
Contributor

emoon commented Feb 26, 2019

@tmfink yes, that is the plan :)

@radare
Copy link
Contributor

radare commented Feb 26, 2019 via email

@emoon
Copy link
Contributor

emoon commented Feb 26, 2019

Well the fix in this case won't involve any threading setup but I agree it would be nice with an init call but I will do the pre-create table approach here.

@emoon
Copy link
Contributor

emoon commented Feb 27, 2019

It's important to realize with this change there will be a file with ~65535 lines of code in it. While no compilers these days will have any issues with (esp as it's data only) it will likely bump the compile times a bit.

@tmfink
Copy link
Contributor Author

tmfink commented Feb 27, 2019 via email

@emoon
Copy link
Contributor

emoon commented Feb 27, 2019

Sure it can be an include.

@radare
Copy link
Contributor

radare commented Feb 27, 2019 via email

@aquynh
Copy link
Collaborator

aquynh commented Feb 27, 2019 via email

@aquynh
Copy link
Collaborator

aquynh commented Feb 27, 2019 via email

@emoon
Copy link
Contributor

emoon commented Feb 28, 2019

Alright. I will implement this on Friday. Expect a PR then

@fanfuqiang
Copy link
Contributor

fanfuqiang commented Mar 1, 2019

Why not add a C11 atomic flag?

typedef struct {

    void (*instruction)(m68k_info *info);    /* handler function */

     _Atomic _Bool is_init;         /* C11 atomic type */

    uint word2_mask;              /* mask the 2nd word */

    uint word2_match;             /* what to match after masking */

instruction_struct;

@aquynh
Copy link
Collaborator

aquynh commented Mar 1, 2019 via email

@fanfuqiang
Copy link
Contributor

Because we want to support all compilers?

It's part of standard C11 language, all platforms, and compilers can support.

@aquynh
Copy link
Collaborator

aquynh commented Mar 1, 2019 via email

@fanfuqiang
Copy link
Contributor

Even older compiler versions?

Well, maybe not.
compilers released after 2011 should be works.

@emoon
Copy link
Contributor

emoon commented Mar 1, 2019

I can use the C11 atomics for this yes if @aquynh think it's acceptable to require a C11 compatible compiler. Also there is no need to have is_init per entry. It's possible to rewrite the init loop instead and have several threads working on setting up the array with just one atomic counter.

@aquynh
Copy link
Collaborator

aquynh commented Mar 1, 2019 via email

@emoon
Copy link
Contributor

emoon commented Mar 1, 2019

Ouch :( ok, well I have a PR for the static table anyway.

@tmfink
Copy link
Contributor Author

tmfink commented Mar 1, 2019

Other issue with using atomics:

  • It's slower/more complicated than just having the table defined statically
  • There could be complications with calling into the C code from other language (such as Rust) where the caller needs to obey some extra logic
  • C11 means the standard was defined in 2011; compilers can be slow to adopt the standard. Different compilers may also only support parts of the standard.

@emoon
Copy link
Contributor

emoon commented Mar 1, 2019

To be fair having the table static has it's issue also:

  • Slower compile times
  • More memory usage during compilation
  • Larger source code

Also the calling code from the outside would be identical when using atomic so that wouldn't be much of an issue. Actually the creation of the table would be faster because multiple threads would do it at the same time but that it's not really a big issue as it's only being done once.

But as written above atomics won't be used anyway and I have the PR ready for the static table anyway.

@tmfink
Copy link
Contributor Author

tmfink commented Mar 1, 2019

@emoon Can we redefine instruction_struct to use smaller types (such as uint16_t) instead of uint?

https://github.com/aquynh/capstone/blob/de952a3e5a519b4a0d0dd2ef921a755891860ca6/arch/M68K/M68KDisassembler.c#L240-L244

Also, while you are already modifying the file, can you eliminate the uint #define? The width can change on different architectures. If we only need 32 bits, then we should say uint32_t.
https://github.com/aquynh/capstone/blob/de952a3e5a519b4a0d0dd2ef921a755891860ca6/arch/M68K/M68KDisassembler.c#L62-L64

Thanks for your work!

@emoon
Copy link
Contributor

emoon commented Mar 1, 2019

Sure I can fix that and thanks :) (edit: This is now fixed in the PR)

@emoon
Copy link
Contributor

emoon commented Mar 4, 2019

With #1408 being merged can this be closed now?

@aquynh
Copy link
Collaborator

aquynh commented Mar 5, 2019

close, thanks!

@aquynh aquynh closed this as completed Mar 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants