-
-
Notifications
You must be signed in to change notification settings - Fork 741
Adds fromStringz #1607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds fromStringz #1607
Conversation
+/ | ||
|
||
inout(char)[] fromStringz(inout(char)* cString) @system pure { | ||
return cString ? cString[0 .. strlen(cString)] : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no point introducing a function that is unsafe by default. conv.to
deliberately allocates when passed a char*
. If you just want to "peek" at the string, then a more appropriate name is peekStringz
. But unsafe functions like these should not be added to Phobos.
As for a safe version of fromStringz
, what should instead be added is a generic fromUTFz
which would mirror toUTFz
. But you'll have to pint @jmdavis for more advice.
Edit: I meant ping, although giving @jmdavis a pint of beer ain't bad either! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur. This function is trivial, and it's doing something that arguably should not be encouraged. to!string
does the safe thing by allocating a new string, which is what we want. If you really want to do something unsafe like this, you should know what you're doing, and it's so trivial to just do it on your own, that this function doesn't buy you much, and it comes with the distinct downside of encouraging people to do the unsafe thing rather than the safe thing - potentially without properly understanding the repercussions of doing so.
So, I vote for closing this pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of this pull request is for std.conv.to!string
to be consistent, and for the appropriate partner for toStringz
to be module local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of this pull request is for
std.conv.to!string
to be consistent,
What's inconsistent about to!string
?
and for the appropriate partner for
toStringz
to be module local.
That would be needless code duplication. to!string
already does the conversion in the other direction, and toStringz
is rather particular in the type of conversion it's doing, which is why it's its own thing (also, I believe that toStringz
predates std.conv.to
by quite a bit). I can understand wanting the functions for converting to and from inout(char)*
to be in the same module, but std.conv.to
is the general function for conversions in Phobos, so it doesn't live in std.string, which only does stuff specific to strings, whereas toStringz
is string-specific. Also, the conversion that toStringz
is doing needs to be clear where in the code where it's used due to its unsafe nature, whereas converting in the other direction does not require that (unless you're doing what the implementation here is doing, which is usually not what you want to do and shouldn't be encouraged).
So, I think that you're just going to have to live with the fact that there is no opposite to toStringz
in std.string. We try and organize Phobos so that it's straightforward and understandable where everything is, but we're not perfect about it, and there are always grey areas and areas where intelligent people can and will disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether it's "safe" or not depends on the caller, just like any string function in D that returns a slice of any of its arguments, which is something like half of the functions in std.string
. The function will always be an unsafe operation because it's the responsibility of the caller to make sure the C string passed is null-terminated (which is the inherent problem with the shortcut in to!string
). The function clearly documents that it returns a slice of the argument, so it's perfectly consistent with the rest of std.string
.
fromStringz
retains the mutability of its argument so it's not like the caller ends up with the convention-ridden string
unless the rarely seen immutable(char)*
was passed, in which case not copying the result is likely perfectly correct.
The to!string
shortcut is a terrible design decision and it should instead be consistent with formattedWrite
. The fact that it unconditionally makes a copy while other parameter types for to!string
may intentionally avoid a copy based on mutability is also a bad design decision. Having the broken, deceptive to
as the only alternative in Phobos for doing this common operation is what makes fromStringz
so desirable, regardless of whether to!string
is fixed or not.
It treats pointers to characters as C strings instead of just being general pointers, even going to the length of doing a memory-unsafe operation in the process. edit: Which, if it wasn't clear, is inconsistent with the formatting functions which sensibly don't assume anything is a C string.
I bind and wrap a lot of C libraries and I rarely want |
How is that inconsistent? It's not like there's any other conversion that it could do when converting a char* to a string, not unless you were looking to slice it, which would be completely inconsistent with how
Well, I suppose it depends on what you're doing with the strings, but doing what |
It should treat it as any other pointer T*. From
That depends entirely on where the C string came from, just as with other When the C string comes from say, the return value of a C function in a C library, the documentation for the C function (or somewhere else in the documentation for the library) will describe the lifetime/ownership/memory of the C string, and just like in C, intelligent decisions regarding copying should be based on that. Unconditionally making a GC copy of the string is not an intelligent decision, it is a nuisance.
... seriously? It's unnecessary boilerplate that depends on the C standard library function // return value is documented to be read-only and with global lifetime
private extern(C) immutable(char)* get_foo() pure;
string getFoo() @trusted pure
{
import core.stdc.string : strlen;
auto p = get_foo();
return p[0 .. strlen(p)];
}
// Vs.
string getFoo() @trusted pure
{
import std.string : fromStringz; // Although std.string is a more likely pre-existing dependency than core.stdc.string
return fromStringz(get_foo());
} The second version avoids introducing the variable When people ask how to convert a C string to a D slice we should point them to |
Making a copy in |
@JakobOvrum I think that we're going to have to agree to disagree on this one. I think that it's almost always better to use |
I haven't seen a counter-argument to any of the points I raised.
That's not how memory management in C works; you act based on the documentation, not implicitly relying on any particular behaviour. When GC management is appropriate for the caller,
I just demonstrated the disadvantages to this. If you think some of my particular points are invalid, then I invite you to address them directly.
I'm perfectly open to be convinced, and you should too. |
The only disadvantage I see is the possibility of an unnecessary allocation. If you really know that the allocation is unnecessary, than you can slice the |
Whether fixing |
I disagree that it's the dumb decision. I think that it most cases, it's very much the right decision, because it's safer. And that seems to be the core of our disagreement. However, even if we agreed that |
I don't think safety is a matter for comparatives like "safer". Either it's safe or it's unsafe. When escaping the slice is desired, by convention, it will in almost every case require the type In my experience with working with C libraries in D, it's rarely the correct decision. It's only the correct decision when the memory backing the C string is transient and the caller wants to escape a slice to it. The type system saves us from making a mistake here. When I say almost every case, I'm referring to the cases when GC memory is not involved, where escaping strings of const or mutable element types is sometimes legitimate, when accompanied with the relevant documentation, exactly like in C. However, I don't think those cases are relevant. When I say "dumb" I don't mean it's always wrong, I mean that it's an unnecessary, damaging simplification, in this case exacerbated by the fact that it has no actual advantage.
I agree and I think that's an orthogonal issue, my only point pertaining this was that if the current behaviour of |
@JakobOvrum It's a good point about the type system helping us out here, since as long as we don't anything like cast to |
If Secondly, changing what So if As for |
I don't see the inconsistency.
Deprecations have been known to happen through
The hypothetical |
Late, but another point is that |
I think many of us have agreed that doing those |
Yes, I'm one of them. However, the intention is the important bit; we want to avoid allocations when possible. With |
I just want to separate out these into two functions. I typically use |
Do you mean you'd want a
I think |
Ping. This just came up on IRC again, where it became apparent that the manual slicing code has another disadvantage, in that it's easy to end up with code like: c_function(args)[0 .. strlen(c_function(args))]; Which is needlessly inefficient. |
How about using a different name for the function, then? Surely if it was called |
As I've repeatedly argued in this PR, it's no less safe than edit: A third level of explicitness is added by the explicit |
While there is nothing unsafe about what the function does, I'm assuming most people would expect a conversion function returning D |
Yes, |
I disagree completely! If the returned type has the same constness as the given pointer, the type system will force the function user to make the appropriate decision:
So, this pull looks great to me just the way it is. As for the |
I'd like to note that often the best place to introduce immutable is in the C binding, e.g. making a function return |
When is this getting pulled in, then? I don't see any remaining arguments against. |
Since we have |
I think I must have missed the |
Yes, it would be nice if we could unify |
Well, changing Side-note: We keep daydreaming about the compiler automatically fixing code between releases, e.g. automatic to!string(char_) => fromStringz(char_). I think we should actually start experimenting with this, to make transitions easy and deprecations faster. |
It would have to go through a lengthy deprecation process. As pointed out months ago, it's not true that deprecation is impossible here. |
Lengthy or not it still doesn't fix your code automagically. I want to delegate the work from the user to the compiler. It's unrelated to this pull, of course. Btw, I'm not against this pull anymore. |
Needs code rewriting and/or compiler plugins, which needs compiler-as-a-library, which needs ddmd? |
It all depends on how soon DDMD is a reality. :) |
Returns a $(D string) slice of a C-style null-terminated string. | ||
|
||
$(RED Important Note:) The returned $(D string) is a slice of the original buffer. | ||
+/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a string
if the pointer argument is not immutable, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah just update the docs here and this will be ready to merge methinks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinging @dcousens on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dcousens ping on the docs. Other than that this is fine. Maybe a more explanatory name, e.g. stringzAsString
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the docs aren't changed within the day, I'll merge and make a fixup pull.
As for stringzAsString
, we already have toStringz
so it makes sense to call this one fromStringz
, it's the first thing users are going to look for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delay, I had not caught up on this thread in a small while.
Updated documentation to reflect that it is not a |
/++ | ||
Returns a $(D char[]) slice of a C-style null-terminated string. | ||
|
||
$(RED Important Note:) The returned $(D char[]) is a slice of the original buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just name it array
, not char[]
since it can be any of these: immutable(char)[]
/ const(char)[]
/ char[]
.
@blackwhale oic. At any rate, the need for proof can stop safely at enregistering. |
@andralex: wrong pull! :) |
@@ -136,6 +136,22 @@ unittest | |||
}); | |||
} | |||
|
|||
/++ | |||
Returns a $(D array) slice of a C-style null-terminated string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returns a D-style array of
Let's move forward with this after correx to the dox. |
Oh, the question remains whether this belongs in |
Finally! :) edit:
|
In this pull request, instead of the currently present special casing in
std.conv.to!string
forchar*
,std.string.fromStringz
is provided as an alternative solution.This more readily compliments the existing
std.string.toStringz
function, and allowsto!string
to be consistent with semantics similar toformattedWrite
/writeln
etc.While
std.conv.to!string(const(char*) s)
has not been deprecated, it is open to discussion as to what path would be the best to take if this request is accepted.