Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API design: string arguments (static) #7

Closed
Gekkio opened this issue Aug 25, 2015 · 7 comments
Closed

API design: string arguments (static) #7

Gekkio opened this issue Aug 25, 2015 · 7 comments

Comments

@Gekkio
Copy link
Contributor

Gekkio commented Aug 25, 2015

One design decision that is often discussed is how string arguments are handled. This issue is for discussing static string arguments (= string literals). Also, the discussion considers safe Rust code. As usual, in unsafe code, all bets are off.

First, let's look at the facts:

  • ImGui expects UTF-8 strings with a null terminator. Interior nul characters are therefore not supported, and will truncate the string
  • Invalid UTF-8 will not crash ImGui, and is comparable to a typo in text. A crash is a bug, and should be fixed upstream
  • Rust &str slices are guaranteed to be UTF-8
  • Rust CString values are guaranteed to have a null terminator and no interior nuls
  • Rust supports both string literals with type &'static str (e.g "hello") and byte string literals &'static [u8] (e.g. b"Hello")
  • Rust string literals are UTF-8 with support for interior nuls
  • Rust byte string literals are ASCII + optional arbitrary bytes with escapes
  • Rust source code files are UTF-8

See the Rust reference for more details about string and byte string literals.

Syntax choices

Let's look at some different syntax choices (not considering possible conversion using Into):

  1. ui.text(im_str!("Hello")); (macro)
  2. ui.text(ImStr::new("Hello")); (ImStr)
  3. ui.text(CString::new("Hello").unwrap()); (CString)
  4. ui.text("Hello"); (string)
  5. ui.text("Hello\0"); (string with explicit null terminator)
  6. ui.text(b"Hello"); (byte string)
  7. ui.text(b"Hello\0"); (byte string with explicit null terminator)

Most people would agree with me that number 4 is the optimal syntax. Number 6 doesn't add any value and in fact makes it impossible to use non-ASCII characters in the literal. Numbers 5 and 7 require the library user to remember the null terminator. Number 1 uses a macro, which is a very custom way of doing things. Numbers 2 are 3 are quite similar from a syntax point of view, but by using a custom ImStr type we can guarantee some extra safety, and can avoid the extra error check by handling interior nuls in a different way.

Dynamic allocation

One important question is whether passing a string requires dynamic allocation and copying. Dynamic allocation is required in these situations:

  • The string requires modifications (e.g. adding null terminator, skipping interior nuls)
  • The used type wants to take ownership of the string instead of borrowing it (e.g. CString always makes a copy)

I'm not yet sure if avoiding dynamic allocation is an unnecessary micro-optimization or an important design goal. I think that it should at least be easy to avoid dynamic allocation.

Another question in dynamic allocation is whether it should be explicit or implicit. For example, should we use the Into trait to do conversions to ImStr, and force the library user to call .into() if a conversion requires allocation? In general Rust tends to be explicit about things, so explicit calls would probably be more Rust-like.

Null termination

ImGui requires null terminated strings, while Rust embraces string slices with known sizes and no null termination. Therefore, there's a couple ways to handle null termination (in decreasing order of attractiveness):

  1. Conditional null termination at compile-time
  2. Unconditional null termination at compile-time
  3. Conditional null termination at runtime
  4. Unconditional null termination at runtime
  5. No null termination

Conditional termination means that we don't add a null terminator if the last byte is already null. I don't know a way of doing this at compile-time, so 1 doesn't seem to be possible at the moment. Having no null termination would be hazardous and is not acceptable for safe Rust code.

Interior nuls

Since ImGui uses the nul character for termination, interior nuls will truncate the string. There's a couple of ways to approach this:

  1. Interior nuls are a compile-time error
  2. Interior nuls are skipped at compile-time
  3. Interior nuls are a runtime error
  4. Interior nuls are skipped at runtime
  5. Interior nuls are left as is and will cause truncation

Since interior nuls will not cause any crashes, IMHO number 5 is reasonable. I find this behaviour to be similar to many other situations which Rust considers to be not unsafe, but possibly undesirable.

UTF-8 validity

ImGui wants UTF-8 strings but won't crash if invalid UTF-8 is passed to it. Having a compile-time or runtime guarantee of UTF-8 is nice, but should only be included if it doesn't complicate the API.

If the API uses Rust string literals in some way, UTF-8 is guaranteed at compile-time. Therefore, byte strings or other approaches need to have some value over strings to be worth using.

Comparison of approaches

1. Macro

Dynamic allocation: None
Null termination: By default 4. Unconditional null termination at runtime
Interior nuls: By default 5. Interior nuls are left as is and will cause truncation
UTF-8 validity: Guaranteed at compile-time

The im_str macro expands to code that concatenates a null terminator at compile time, and uses an unsafe constructor to create an ImStr instance:

im_str!("Hello")

basically compiles to

unsafe { ImStr::from_bytes("Hello\0".as_bytes()) }

Pros:

  • Null terminator concatenation is done at compile time
  • Based on string literals, so UTF-8 validity is guaranteed without extra cost (UTF-8 correctness is checked compile-time)
  • No runtime extra costs

Cons:

  • The macro is a bit unwieldy to use

2. ImStr

This just makes the ImStr wrapping explicit, so everything depends on how the literal argument works. See 4, 5, 6 and 7.

3. CString

Dynamic allocation: On every use
Null termination: 4. Unconditional null termination at runtime
Interior nuls: 3. Interior nuls are a runtime error
UTF-8 validity: By default not guaranteed

Pros:

  • IMHO none

Cons:

  • Dynamic allocation
  • Requires handling of runtime NulError
  • Not guaranteed to be UTF-8
  • IMHO does not add any value over a custom type

4. String

Dynamic allocation: On every use
Null termination: By default 4. Unconditional null termination at runtime
Interior nuls: By default 5. Interior nuls are left as is and will cause truncation
UTF-8 validity: Guaranteed at compile-time

Pros:

  • Easy to use and read
  • Guaranteed to be valid UTF-8 at compile-time

Cons:

  • Dynamic allocation
  • Null termination needs to be handled at runtime

5. String with null terminator

Dynamic allocation: Conditional
Null termination: By default 3. Conditional null termination at runtime
Interior nuls: By default 5. Interior nuls are left as is and will cause truncation
UTF-8 validity: Guaranteed at compile-time

Pros:

  • Quite easy to use and read
  • Guaranteed to be valid UTF-8 at compile-time

Cons:

  • Non-ASCII literals are hard to use
  • Library user needs to remember to add the null terminator
  • Null termination needs to be handled at runtime

6. Byte string

Dynamic allocation: On every use
Null termination: By default 4. Unconditional null termination at runtime
Interior nuls: By default 5. Interior nuls are left as is and will cause truncation
UTF-8 validity: By default not guaranteed

Pros:

  • Easy to use if only ASCII is used

Cons:

  • Dynamic allocation
  • Non-ASCII literals are hard to use
  • Not guaranteed to be UTF-8
  • Null termination needs to be handled at runtime

7. Byte string with null terminator

Dynamic allocation: Conditional
Null termination: By default 3. Conditional null termination at runtime
Interior nuls: By default 5. Interior nuls are left as is and will cause truncation
UTF-8 validity: By default not guaranteed

Pros:

  • Quite easy to use if only ASCII is used

Cons:

  • Non-ASCII literals are hard to use
  • Not guaranteed to be UTF-8
  • Library user needs to remember to add the null terminator
  • Null termination needs to be handled at runtime
@Gekkio
Copy link
Contributor Author

Gekkio commented Aug 25, 2015

So, after writing that big wall of text I'd like to make a proposal :)

Firstly, I don't want to give up on the macro. It might be a good idea to promote another approach as the primary way of doing things, but for me the macro hits the sweet spot of readability and runtime cost.

Secondly, I'd like to keep allocations explicit in some way, so .into() calls are required when doing conversions.

Thirdly, as I explained in the text, I think interior nul handling is not necessary since the behaviour is not unsafe.

So, there would be three ways of passing strings:

  1. ui.text("Hello".into());
  2. ui.text("Hello\0".into());
  3. ui.text(im_str!("Hello"));

Primary way: string literals + Into + conditional runtime checks

This is based on nwydo's proposal on Reddit

By adding a conversion with Into from &'a str to ImStr<'a>, we can use normal string literals with a runtime cost. Null terminator handling would be conditional at runtime, so if the library user remembers to add null terminators, no copy would be needed.

Approach Runtime cost
"Hello".into() Check + copy + append null
"Hello\0".into() Check
im_str!("Hello") None

Advanced way: macro

The advanced option is to use the current macro as is.

@ocornut
Copy link
Contributor

ocornut commented Aug 26, 2015

To chime in a little: (I am not a Rust programmer but I've used ImGui extensively)

I'm not yet sure if avoiding dynamic allocation is an unnecessary micro-optimization or an important design goal.

I don't think you want or can avoid them in every case, but for passing literal I think it would make meaningful difference. As you start using ImGui for serious work, submitting thousands of items, lists, tree nodes, etc. You don't want every label to be causing an allocation. It would be acceptable if the system required an allocation the first time but none after while (I don't know if that's very compatible with Rust ecosystem but it's close to what ImGui often does itself).

For large amount of UI, unless the user manually clip things, the CPU bottleneck within ImGui is usually the calculation of text size (width/height) which requires a single pass on every character of the string. So any treatment on a string is likely to increase this bottleneck. Once the size is calculated we can advance the layout and perform clipping.

Based on this my intuition is that the macro is preferable. I would call it imstr! without the underscore personally, considering it is a single word.

@emoon
Copy link

emoon commented Dec 26, 2015

Hi, I had/have the same problem so I decided to go with this approach:

I just send in regular str (no macro) and then I have a small StringHandler which has a local array of 512 bytes (on the stack) if the string fits in there I just copy it there, null terminate and and use that when I call down to ImGui otherwise I just use CString instead.

I would say that 99% of the cases will fit in 512 bytes and can of course be changed to larger size if wanted.

I made an implementation over here

https://github.com/emoon/ProDBG/blob/rust/api/rust/prodbg/src/cfixed_string.rs

@bitshifter
Copy link
Contributor

I've taken a slightly different approach to support &str as parameters to support ui.text("Hello"); syntax without allocations or needing to make temporary null-terminated copies of the string.

I've modified ImGui and cimgui to support taking a string slice parameter instead of a char* for strings. The slice is a struct that contains begin and end pointers to the string data. It's easy to create this from a Rust &str.

There's some discussion about the changes on the ImGui issue tracker here ocornut/imgui#494 and the ImGui changes are here https://github.com/bitshifter/imgui/tree/imstr

I've had to also fork cimgui here https://github.com/bitshifter/cimgui/tree/imstr

And finally imgui-rs here https://github.com/bitshifter/imgui-rs/tree/imstr

For example, with these changes the hello_world.rs now looks like:

fn hello_world<'a>(ui: &Ui<'a>) {
    ui.window("Hello world")
        .size((300.0, 100.0), ImGuiSetCond_FirstUseEver)
        .build(|| {
            ui.text("Hello world!");
            ui.text("This...is...imgui-rs!");
            ui.separator();
            let mouse_pos = ui.imgui().mouse_pos();
            ui.text(&format!("Mouse Position: ({:.1},{:.1})", mouse_pos.0, mouse_pos.1));
        })
}

I'm not sure if my ImStr changes will make it into ImGui itself, but even if they don't, it shouldn't (hopefully) be a huge effort to maintain a fork with string slice support.

@Gekkio are you interested in a PR of for this?

@Gekkio
Copy link
Contributor Author

Gekkio commented Jan 24, 2016

This looks really good! If this is implemented in upstream, it would greatly improve the ergonomics of this library.

That being said, I'm not sure yet how I feel about using a fork. Firstly, the fork would need to maintained actively. Secondly, we'd need to make sure the changes are correct and continue to be correct in the future. Upstream has no tests, so maintaining correctness and same semantics in a fork is not completely trivial.

So, right now I can't say yes, but I'm also not saying no, because the difference in ergonomics is huge and warrants attention, and this is very impressive work. I'll think about this and review some of the changes when I have some time

@bitshifter
Copy link
Contributor

That's understandable Joonas.

My intention is to keep the fork updated with official imgui releases, so keep an eye on the branch :)

Obviously it's early days but I'm hoping maintaining this fork won't involve much maintenance, code changes have been reasonably minimal so far. I agree testing isn't entirely trivial, thought the demo does a good job of exercising most of the code paths and it's reasonably quick to do a manual sanity check.

@dbr
Copy link
Contributor

dbr commented Oct 8, 2021

Closing, superseded by #517

@dbr dbr closed this as completed Oct 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants