Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Better handling of null terminator in strings #2678

Open
1 task done
gabrieldemarmiesse opened this issue May 16, 2024 · 4 comments
Open
1 task done
Labels
enhancement New feature or request mojo-repo Tag all issues with this label

Comments

@gabrieldemarmiesse
Copy link
Contributor

gabrieldemarmiesse commented May 16, 2024

Review Mojo's priorities

What is your request?

I would like a discussion to answer the following questions:

  1. Is having a null terminator in String aligned with the goals of Mojo?
  2. If yes, what can we do to make working with String less painful concerning the null terminator? And make String more safe?

What is your motivation for this change?

Once again, I have hit the issue of the null terminator when working with String (I was creating the String from an empty list instead of a list with one zero element).
I cannot count the number of bugs we have had so far because of the null terminator.

The ones I can gather so far:

and I'm sure there are more.

Some might say "skill issue", then it seems we're all "unskilled" 😆

I don't have enough knowledge to say if it's absolutely necessary or not. But I'm sure that IF it is necessary to have the null terminator in string, we can make it a bit more fool-proof. I believe it also closely relates with the goals of Mojo about memory safety as the null terminator caused many memory bugs so far. Can we add guardrails here?

The null terminator has also an impact on permformance. As we may need to resort to clever tricks when implementing SSO to squeeze one more byte out of the inline buffer

This adds some instructions to every access to a String size, and String creation, and it also takes one more byte for every String created (when we have SSO, that's one more byte to set in the inline buffer per String).

@gabrieldemarmiesse gabrieldemarmiesse added enhancement New feature or request mojo-repo Tag all issues with this label labels May 16, 2024
@gryznar
Copy link
Contributor

gryznar commented May 16, 2024

It seems to me, that representing String via pointer and size could be much more reliable and performant

@soraros
Copy link
Contributor

soraros commented May 17, 2024

We should clearly document the rationale for our choice, regardless of what the final decision may be.

@Brian-M-J
Copy link

For references to prior work, we can look at Rust and C++.

Rust treats c-style strings as a separate type: CString

C++ also recommends this in its core guidelines:

F.25: Use a zstring or a not_null<zstring> to designate a C-style string

SL.str.3: Use zstring or czstring to refer to a C-style, zero-terminated, sequence of characters

See zstring in the Guidelines Support Library for more details.

@nmsmith
Copy link
Contributor

nmsmith commented May 24, 2024

What can we do to make working with String less painful concerning the null terminator?

Whether or not String null-terminates its character buffer should be an implementation detail. If the implementation is correct, then people who are merely "working with String" (i.e. using it in their programs) should never encounter any issues.

I think it's fine for people who are implementing String to deal with whatever pain it brings. The data type is fundamentally unsafe, because you're working with a raw memory allocation. There's no way to fix that. (The same is true for the people who are implementing List.)

That said, I have no opinions on whether or not String is null-terminated.

modularbot pushed a commit that referenced this issue May 28, 2024
[External] [stdlib] Add the `normalize_index` function

This PR is some kind of mix between
#2386 and
#2384 which have issues (aborting)
or have too many conflicts because the PR is too big.

This PR solves part of #2251 and
#2337

We try here to give the ground work for indexing correctly. This
function added can then be used wherever we work with sequences.

Two things I noticed during development:
1) The `debug_assert` does not run in unit tests. Is there any way to
enable it? We currently have out-of-bounds bugs in our test suite.
2) The null terminator is causing pain, again, again, and again. Do we
have any plans to make it optional when working with String? I opened
#2678 to discuss this.

To avoid to fix those issues in this PR, I used the `normalize_index` on
the `__refitem__` of `InlineArray` which doesn't have widespread use yet
and isn't impacted by out-of-bounds bugs.

My recommendation would be to merge this PR then to rebase
#2386 and
#2384 on it. We should also
afterwards fix the out of bounds bugs that can be triggered in the test
suite by enabling debug_assert.

The diff might seem big, but no worries, it's mostly the licenses and
docstrings :)

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2677
MODULAR_ORIG_COMMIT_REV_ID: 66e7121a6a333c16284eb33a89eb85c034c296c3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request mojo-repo Tag all issues with this label
Projects
None yet
Development

No branches or pull requests

5 participants