dcousens
Contributor

In this pull request, instead of the currently present special casing in std.conv.to!string for char*, std.string.fromStringz is provided as an alternative solution.

This more readily complements the existing std.string.toStringz function, and allows to!string to be consistent with semantics similar to formattedWrite/writeln etc.

While std.conv.to!string(const(char*) s) has not been deprecated, it is open to discussion as to what path would be the best to take if this request is accepted.

+/

inout(char)[] fromStringz(inout(char)* cString) @system pure {
return cString ? cString[0 .. strlen(cString)] : null;

There is no point introducing a function that is unsafe by default. conv.to deliberately allocates when passed a char*. If you just want to "peek" at the string, then a more appropriate name is peekStringz. But unsafe functions like these should not be added to Phobos.

As for a safe version of fromStringz, what should instead be added is a generic fromUTFz which would mirror toUTFz. But you'll have to pint @jmdavis for more advice.

Edit: I meant ping, although giving @jmdavis a pint of beer ain't bad either! :)

Member

I concur. This function is trivial, and it's doing something that arguably should not be encouraged. to!string does the safe thing by allocating a new string, which is what we want. If you really want to do something unsafe like this, you should know what you're doing, and it's so trivial to just do it on your own, that this function doesn't buy you much, and it comes with the distinct downside of encouraging people to do the unsafe thing rather than the safe thing - potentially without properly understanding the repercussions of doing so.

So, I vote for closing this pull request.

Contributor Author

The point of this pull request is for std.conv.to!string to be consistent, and for the appropriate partner for toStringz to be module local.

Member

The point of this pull request is for std.conv.to!string to be consistent,

What's inconsistent about to!string?

and for the appropriate partner for toStringz to be module local.

That would be needless code duplication. to!string already does the conversion in the other direction, and toStringz is rather particular in the type of conversion it's doing, which is why it's its own thing (also, I believe that toStringz predates std.conv.to by quite a bit). I can understand wanting the functions for converting to and from inout(char)* to be in the same module, but std.conv.to is the general function for conversions in Phobos, so it doesn't live in std.string, which only does stuff specific to strings, whereas toStringz is string-specific. Also, the conversion that toStringz is doing needs to be clear at the point in the code where it's used due to its unsafe nature, whereas converting in the other direction does not require that (unless you're doing what the implementation here is doing, which is usually not what you want to do and shouldn't be encouraged).

So, I think that you're just going to have to live with the fact that there is no opposite to toStringz in std.string. We try and organize Phobos so that it's straightforward and understandable where everything is, but we're not perfect about it, and there are always grey areas and areas where intelligent people can and will disagree.

Contributor

Whether it's "safe" or not depends on the caller, just like any string function in D that returns a slice of any of its arguments, which is something like half of the functions in std.string. The function will always be an unsafe operation because it's the responsibility of the caller to make sure the C string passed is null-terminated (which is the inherent problem with the shortcut in to!string). The function clearly documents that it returns a slice of the argument, so it's perfectly consistent with the rest of std.string.

fromStringz retains the mutability of its argument so it's not like the caller ends up with the convention-ridden string unless the rarely seen immutable(char)* was passed, in which case not copying the result is likely perfectly correct.

The to!string shortcut is a terrible design decision and it should instead be consistent with formattedWrite. The fact that it unconditionally makes a copy while other parameter types for to!string may intentionally avoid a copy based on mutability is also a bad design decision. Having the broken, deceptive to as the only alternative in Phobos for doing this common operation is what makes fromStringz so desirable, regardless of whether to!string is fixed or not.

@JakobOvrum
Contributor

What's inconsistent about to!string?

It treats pointers to characters as C strings instead of just being general pointers, even going to the length of doing a memory-unsafe operation in the process.

edit:

Which, if it wasn't clear, is inconsistent with the formatting functions which sensibly don't assume anything is a C string.
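
The difference described here can be illustrated with a minimal sketch. The helper names viaTo and viaFormat are hypothetical and exist only for this illustration; the sketch assumes the Phobos behaviour discussed in this thread (to!string scans a char* for the NUL terminator, while format prints it like any other pointer).

```d
import std.conv : to;
import std.format : format;

/// Hypothetical helper: converts via to!string, which applies the
/// C-string convention and returns a newly allocated copy.
string viaTo(char* p) { return to!string(p); }

/// Hypothetical helper: converts via format, which treats p like any
/// other pointer and formats its address.
string viaFormat(char* p) { return format("%s", p); }

unittest
{
    // A NUL-terminated buffer, as a C API might hand back.
    char[3] buf = ['h', 'i', '\0'];

    assert(viaTo(buf.ptr) == "hi");
    assert(viaFormat(buf.ptr) != "hi"); // an address, not the text
}
```

The block can be run with something like dmd -unittest -main -run on the file.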

(unless you're doing what the implementation here is doing, which is usually not what you want to do and shouldn't be encouraged).

I bind and wrap a lot of C libraries and I rarely want to!string's "dumb" behaviour on this.

@jmdavis
Member

jmdavis commented Sep 29, 2013

It treats pointers to characters as C strings instead of just being general pointers, even going to the length of doing a memory-unsafe operation in the process.

How is that inconsistent? It's not like there's any other conversion that it could do when converting a char* to a string, not unless you were looking to slice it, which would be completely inconsistent with how std.conv.to functions in all other cases.

I bind and wrap a lot of C libraries and I rarely want to!string's "dumb" behaviour on this.

Well, I suppose it depends on what you're doing with the strings, but doing what fromStringz here is doing forces you to be very careful about what you do with the resulting string, since it risks becoming invalid if it's kept around in the D code at all. Regardless, what fromStringz here is doing is so trivial to do on your own, that I don't think that it's worth adding. All it does is save you a few characters, and anyone who wants this knows what to do.

@JakobOvrum
Contributor

How is that inconsistent? It's not like there's any other conversion that it could do when converting a char* to a string, not unless you were looking to slice it, which would be completely inconsistent with how std.conv.to functions in all other cases.

It should treat it as any other pointer T*. From to!string's perspective there is nothing to suggest that the C string convention is being applied. std.format makes the right decision here and so should std.conv.

Well, I suppose it depends on what you're doing with the strings, but doing what fromStringz here is doing forces you to be very careful about what you do with the resulting string, since it risks becoming invalid if it's kept around in the D code at all.

That depends entirely on where the C string came from, just as with other std.string functions that can return aliases where it depends on where the slices came from. (Intuitively,) any concern the caller has for the input C string transfers directly to the output slice.

When the C string comes from say, the return value of a C function in a C library, the documentation for the C function (or somewhere else in the documentation for the library) will describe the lifetime/ownership/memory of the C string, and just like in C, intelligent decisions regarding copying should be based on that. Unconditionally making a GC copy of the string is not an intelligent decision, it is a nuisance.

Regardless, what fromStringz here is doing is so trivial to do on your own, that I don't think that it's worth adding. All it does is save you a few characters, and anyone who wants this knows what to do.

... seriously? It's unnecessary boilerplate that depends on the C standard library function strlen, and even in the most trivial cases the boilerplate is quite significant:

// return value is documented to be read-only and with global lifetime
private extern(C) immutable(char)* get_foo() pure;

string getFoo() @trusted pure
{
    import core.stdc.string : strlen;
    auto p = get_foo();
    return p[0 .. strlen(p)];
}

// Vs.

string getFoo() @trusted pure
{
    import std.string : fromStringz; // Although std.string is a more likely pre-existing dependency than core.stdc.string
    return fromStringz(get_foo());
}

The second version avoids introducing the variable p which is helpful for readability of longer functions, and conveys the intent much more clearly than the raw boilerplate.

When people ask how to convert a C string to a D slice we should point them to fromStringz, not explain to them how to implement it themselves. The name and location of the function can also be intuitively deduced by users familiar with toStringz.

@JakobOvrum
Contributor

Making a copy in fromStringz also prevents users from leveraging fromStringz when they wanted to use D functions or slice vector operations etc. to mutate the C string in place.
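
A minimal sketch of that in-place use, assuming the non-copying fromStringz proposed in this PR (available from std.string in later Phobos releases). The function name upperInPlace is hypothetical, and the sketch only handles ASCII.

```d
import std.string : fromStringz;

/// Hypothetical helper: upper-cases a NUL-terminated ASCII C string in
/// place by iterating over a D slice of the same memory.
void upperInPlace(char* zstr) @system
{
    import std.ascii : toUpper;

    // The slice aliases the C buffer, so writes go straight through.
    foreach (ref c; fromStringz(zstr))
        c = toUpper(c);
}

unittest
{
    char[6] buf = ['h', 'e', 'l', 'l', 'o', '\0'];
    upperInPlace(buf.ptr);
    assert(buf[0 .. 5] == "HELLO");
    assert(buf[5] == '\0'); // the terminator is outside the slice and untouched
}
```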

@jmdavis
Member

jmdavis commented Sep 29, 2013

@JakobOvrum I think that we're going to have to agree to disagree on this one. I think that it's almost always better to use to!string and ensure that the memory is managed by the GC rather than relying on C functions not freeing the memory on you. Sure, there are cases where it's better to slice the pointer, but all that requires is slicing and calling strlen, which I think is trivial. And as I think that to!string is almost always the better way to go, I really don't like the idea of encouraging people to slice char* by introducing fromStringz. But from the sounds of it, it's highly unlikely that we're going to agree on this.

@JakobOvrum
Contributor

@JakobOvrum I think that we're going to have to agree to disagree on this one.

I haven't seen a counter-argument to any of the points I raised.

I think that it's almost always better to use to!string and ensure that the memory is managed by the GC rather than relying on C functions not freeing the memory on you.

That's not how memory management in C works; you act based on the documentation, not implicitly relying on any particular behaviour. When GC management is appropriate for the caller, fromStringz(zstr).idup is a trivial operation that is even conventionally enforced by the type system as zstr is usually typed const.

Sure, there are cases where it's better to slice the pointer, but all that requires is slicing and calling strlen, which I think is trivial.

I just demonstrated the disadvantages to this. If you think some of my particular points are invalid, then I invite you to address them directly.

But from the sounds of it, it's highly unlikely that we're going to agree on this.

I'm perfectly open to being convinced, and you should be too.

@jmdavis
Member

jmdavis commented Sep 29, 2013

I just demonstrated the disadvantages to this. If you think some of my particular points are invalid, then I invite you to address them directly.

The only disadvantage I see is the possibility of an unnecessary allocation. If you really know that the allocation is unnecessary, then you can slice the char*, but using to!string guarantees that you don't have to worry about the lifetime of the char*, and it makes it so that it plays nicely with all of the D code which takes string explicitly. I would consider slicing the char* to be an optimization that should only be done when necessary.

@JakobOvrum
Contributor

fromStringz(zstr).idup is the equivalent of to!string(zstr) for a non-immutable C string. It makes the allocation explicit, and the responsibility of the caller which is how the string vs const(char)[] issue is conventionally handled in D. Not adding the idup is a compile-time error. There's no advantage to to!string's behaviour here, and it also has other issues, such as abstracting away an inherently @system operation in a function template that is usually memory safe, all the while being inconsistent with std.format which rightfully treats pointers to characters as any other pointer, which is memory safe.
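
The compile-time aspect of this argument can be sketched as follows. The function name ownedCopy is hypothetical; the sketch assumes std.string.fromStringz with the inout signature shown in this PR.

```d
import std.string : fromStringz;

/// Hypothetical helper: copies a borrowed C string into GC-owned,
/// immutable memory.
string ownedCopy(const(char)* zstr) @system
{
    // For a const(char)* argument the result is const(char)[], which does
    // not implicitly convert to string, so the copy must be spelled out.
    static assert(is(typeof(fromStringz(zstr)) == const(char)[]));
    static assert(!is(typeof(fromStringz(zstr)) : string));

    return fromStringz(zstr).idup;
}

unittest
{
    char[4] buf = ['a', 'b', 'c', '\0'];
    string s = ownedCopy(buf.ptr);
    buf[0] = 'X';        // later changes to the source buffer...
    assert(s == "abc");  // ...do not affect the copy
}
```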

Whether fixing to!string is an option or not, having only the deceptively convenient, copy-happy to!string and the tedious option of manual slicing, users are lured into making the "dumb" decision. With fromStringz, the user is empowered to make the right decision without uglifying the code with boilerplate.

@jmdavis
Member

jmdavis commented Sep 29, 2013

Whether fixing to!string is an option or not, having only the deceptively convenient, copy-happy to!string and the tedious option of manual slicing, users are lured into making the "dumb" decision. With fromStringz, the user is empowered to make the right decision without uglifying the code with boilerplate.

I disagree that it's the dumb decision. I think that in most cases, it's very much the right decision, because it's safer. And that seems to be the core of our disagreement.

However, even if we agreed that to!string's behavior was a bad idea, I think that it's pretty clear that at this point, it would break way too much code to change it.

@JakobOvrum
Contributor

I disagree that it's the dumb decision. I think that in most cases, it's very much the right decision, because it's safer. And that seems to be the core of our disagreement.

I don't think safety is a matter for comparatives like "safer". Either it's safe or it's unsafe. When escaping the slice is desired, by convention, it will in almost every case require the type string, i.e. immutable elements. As such there's no way to make a mistake with fromStringz that results in unsafe code because of reasons explained in my previous comment (except when the input is not null-terminated, but that only pertains to the memory safety of the implementation itself, which is not the issue at hand).

In my experience with working with C libraries in D, it's rarely the correct decision. It's only the correct decision when the memory backing the C string is transient and the caller wants to escape a slice to it. The type system saves us from making a mistake here.

When I say almost every case, I'm referring to the cases when GC memory is not involved, where escaping strings of const or mutable element types is sometimes legitimate, when accompanied with the relevant documentation, exactly like in C. However, I don't think those cases are relevant.

When I say "dumb" I don't mean it's always wrong, I mean that it's an unnecessary, damaging simplification, in this case exacerbated by the fact that it has no actual advantage.

However, even if we agreed that to!string's behavior was a bad idea, I think that it's pretty clear that at this point, it would break way too much code to change it.

I agree and I think that's an orthogonal issue; my only point pertaining to this was that if the current behaviour of to!string on C strings can be seen as undesirable, then that's another point in favour of introducing fromStringz.

@jmdavis
Member

jmdavis commented Sep 29, 2013

@JakobOvrum It's a good point about the type system helping us out here, since as long as we don't do anything like cast to string, we're not going to end up with string, and if we don't end up with string, it's unlikely that someone is going to keep the resultant string around and end up with a freed string. That being the case, fromStringz isn't such a bad idea, so I'm not quite as against it. However, I still think that unless you're going to just briefly do something with the string and throw it away, converting it to string by allocating a new one like to!string does is the correct thing to do.

@ghost

ghost commented Sep 29, 2013

If fromStringz is going to mirror toStringz, it has to allocate. You cannot have such an inconsistency with two functions that complement each other (from|to-Stringz).

Secondly, changing what to!S(char*) does is too late. We cannot even introduce a deprecation stage for this call, the best we could do is add a big red warning in the ddoc about to changing at some point. But people who already know what to does do not re-read its documentation. And there's an insane amount of code that would suddenly do the wrong thing at runtime due to this semantics change.

So if to will never change, then fromStringz is not needed. At best it should be an alias to to!string. But it's unnecessary, there's really no need to have multiple aliases that do the same thing as it would only serve to confuse the reader.

As for toStringz, it was introduced before toUTFz and was even suggested to be removed (or was that toUTF16z?), but it would have broken a lot of code so it was decided to keep these aliases around (there was a large debate in the forums). Changing what to does would do even worse by breaking what the code does rather than issuing a compiler error.

@JakobOvrum
Contributor

If fromStringz is going to mirror toStringz, it has to allocate. You cannot have such an inconsistency with two functions that complement each other (from|to-Stringz).

I don't see the inconsistency. toStringz is an additive operation so it may need a reallocation (but not always), while fromStringz is strictly an in-place transformation.

Secondly, changing what to!S(char*) does is too late. We cannot even introduce a deprecation stage for this call, the best we could do is add a big red warning in the ddoc about to changing at some point.

Deprecations have been known to happen through pragma(msg, ...). However, it's orthogonal to this PR.

So if to will never change, then fromStringz is not needed.

The hypothetical fromStringz that unconditionally allocates, sure; but that was never proposed in a PR.

@JakobOvrum
Contributor

If fromStringz is going to mirror toStringz, it has to allocate. You cannot have such an inconsistency with two functions that complement each other (from|to-Stringz).

Late, but another point is that toStringz tries rather desperately to not allocate.

@ghost

ghost commented Oct 20, 2013

Late, but another point is that toStringz tries rather desperately to not allocate.

I think many of us have agreed that doing those if ((cast(size_t) p & 3) && *p == 0) checks is unreliable.

@JakobOvrum
Contributor

I think many of us have agreed that doing those if ((cast(size_t) p & 3) && *p == 0) checks is unreliable.

Yes, I'm one of them. However, the intention is the important bit; we want to avoid allocations when possible. With toStringz, that's generally not possible. With fromStringz, that's always possible; and as I've tried to illustrate in this discussion, pushing the decision to copy up to the caller is both the more general/composable interface and the most idiomatic.

@ghost

ghost commented Oct 20, 2013

I just want to separate out these into two functions. I typically use peekStringz for this purpose (implemented in my own code), so it's obvious that I'm not the owner of the string. It adds a bit of documentation at the call site, kind of like assumeUnique does over a simple cast.

@JakobOvrum
Contributor

I just want to separate out these into two functions.

Do you mean you'd want a fromStringz that unconditionally copies? It would be a really frivolous wrapper for peekStringz(zstr).idup, and the explicit idup is arguably much more idiomatic D, with the convention of pushing array copies upwards in code.

I typically use peekStringz for this purpose (implemented in my own code), so it's obvious that I'm not the owner of the string. It adds a bit of documentation at the call site, kind of like assumeUnique does over a simple cast.

I think peekStringz is a misleading name; when the input is immutable(char)*, the result is string, which would then be escapable without error. assumeUnique is a poor comparison because fromStringz doesn't cast or otherwise raise a red flag, the escapability of the result is neatly encoded in the type for the vast majority of code (sometimes one may wish to escape char[] or const(char)[], but then one must be aware of the volatile source of the string in the first place to do the right thing).

@JakobOvrum
Contributor

Ping.

This just came up on IRC again, where it became apparent that the manual slicing code has another disadvantage, in that it's easy to end up with code like:

c_function(args)[0 .. strlen(c_function(args))];

Which is needlessly inefficient.
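
A minimal sketch of the double evaluation, using a stand-in for the C side. Here c_function is a hypothetical function defined in D (with a call counter) purely so the example is self-contained; a real binding would only declare it.

```d
import std.string : fromStringz;

// Counts how often the stand-in "C" function is invoked.
private int calls;

// Hypothetical stand-in for a C function that returns a static C string.
extern(C) const(char)* c_function() nothrow @nogc
{
    ++calls;
    return "foo".ptr; // D string literals are NUL-terminated
}

unittest
{
    import core.stdc.string : strlen;

    // Manual slicing written as a one-liner calls the function twice.
    calls = 0;
    auto a = c_function()[0 .. strlen(c_function())];
    assert(a == "foo" && calls == 2);

    // fromStringz evaluates its argument once.
    calls = 0;
    auto b = fromStringz(c_function());
    assert(b == "foo" && calls == 1);
}
```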

@minexew

minexew commented Jan 25, 2014

How about using a different name for the function, then? Surely if it was called fromStringzUnsafe most people wouldn't just slap it into their code without reading the documentation. Or we could have a pair of functions fromStringzRef & fromStringzDup or something like that, I don't know...

@JakobOvrum
Contributor

As I've repeatedly argued in this PR, it's no less safe than to!string. The only inherent unsafety of the function is the C-string assumption made by strlen, which is stealthily inherent in to!string too. With fromStringz, it's reflected in both the name and documentation of the function.

edit:

A third level of explicitness is added by the explicit @system attribute that can be seen in its signature, which is necessarily absent from to.

@minexew

minexew commented Jan 25, 2014

While there is nothing unsafe about what the function does, I'm assuming most people would expect a conversion function returning D string to return a value that will stay valid (and immutable) no matter what happens to the source data. Am I misinterpreting something?

@JakobOvrum
Contributor

Yes, fromStringz does not return string.

@CyberShadow
Member

If fromStringz is going to mirror toStringz, it has to allocate.

I disagree completely!

If the returned type has the same constness as the given pointer, the type system will force the function user to make the appropriate decision:

  • If the string pointed to volatile memory, the mutable/const attribute indicates that. The situation is no different than working with the result of byLine.
  • If the user wants to convert the result to a string, they are now forced to choose a way to do that by the type system:
      • If the user knows that the string is unique or immutable, and there are no lifetime/ownership issues, they can cast the result to a string. Overriding the type system by casting puts the burden on the user.
      • In all other cases, the user is only one .idup away from getting a unique immutable copy of the string, all without introducing multiple functions or whatnot.

So, this pull looks great to me just the way it is.

As for the to!T mess, how about deprecating it, and making the deprecation message tell the user to disambiguate by either using format("%s", p) or fromStringz(p) instead? format interprets char pointers as any other pointers.

@JakobOvrum
Contributor

If the user knows that the string is unique or immutable, and there are no lifetime/ownership issues, they can cast the result to a string. Overriding the type system by casting puts the burden on the user.

I'd like to note that often the best place to introduce immutable is in the C binding, e.g. making a function return immutable(char)* instead of const(char)* based on the documentation of the function. Then there's not even a need for casting.
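
A minimal sketch of this approach. The names get_version and libraryVersion are hypothetical, and the "C" function is defined in D here only so the example is self-contained; in a real binding it would be a declaration whose immutable return type is justified by the C library's documentation.

```d
import std.string : fromStringz;

// Hypothetical binding. The (imagined) C documentation promises read-only
// data with global lifetime, so the return type is immutable(char)*.
extern(C) immutable(char)* get_version() nothrow @nogc
{
    return "1.2.3".ptr; // D string literals are NUL-terminated
}

/// Hypothetical wrapper: the result is already a string, so neither a
/// cast nor a copy is needed.
string libraryVersion() @trusted
{
    return fromStringz(get_version());
}

unittest
{
    static assert(is(typeof(fromStringz(get_version())) == string));
    assert(libraryVersion() == "1.2.3");
    assert(libraryVersion().ptr is get_version()); // same memory, no copy
}
```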

@minexew

minexew commented Feb 1, 2014

When is this getting pulled in, then? I don't see any remaining arguments against.
Also note that there is currently an inconsistency between to!string(char*) and to!wstring(wchar*). While the former assumes NUL-terminated string, the latter just prints the pointer. What a mess! :)

@ghost

ghost commented Feb 1, 2014

Since we have toUTFz it would make sense to also have fromUTFz, where fromStringz would just call or be an alias of a fromUTFz instance.

@ghost

ghost commented Feb 1, 2014

I think I must have missed the inout part when I first reviewed that code.

@JakobOvrum
Contributor

Since we have toUTFz it would make sense to also have fromUTFz, where fromStringz would just call or be an alias of a fromUTFz instance.

Yes, it would be nice if we could unify to/fromStringz and to/fromUTFz into something consistent.

@ghost

ghost commented Feb 1, 2014

Well, changing to will definitely break a lot of code. I'd say it would ultimately be for the better.. but I hate code breakage. :|

Side-note: We keep daydreaming about the compiler automatically fixing code between releases, e.g. automatic to!string(char*) => fromStringz(char*). I think we should actually start experimenting with this, to make transitions easy and deprecations faster.

@JakobOvrum
Contributor

Well, changing to will definitely break a lot of code. I'd say it would ultimately be for the better.. but I hate code breakage. :|

It would have to go through a lengthy deprecation process. As pointed out months ago, it's not true that deprecation is impossible here.

@ghost

ghost commented Feb 1, 2014

Lengthy or not it still doesn't fix your code automagically. I want to delegate the work from the user to the compiler. It's unrelated to this pull, of course.

Btw, I'm not against this pull anymore.

@yebblies
Contributor

yebblies commented Feb 1, 2014

We keep daydreaming about the compiler automatically fixing code between releases, e.g. automatic to!string(char*) => fromStringz(char*). I think we should actually start experimenting with this, to make transitions easy and deprecations faster.

Needs code rewriting and/or compiler plugins, which needs compiler-as-a-library, which needs ddmd?

@ghost

ghost commented Feb 1, 2014

Needs code rewriting and/or compiler plugins, which needs compiler-as-a-library, which needs ddmd?

It all depends on how soon DDMD is a reality. :)

@yebblies
Contributor

yebblies commented Feb 1, 2014

It all depends on how soon DDMD is a reality. :)

The technical problems have all been solved, the only thing left is fixing the code layout, and then testing. I've done about 5/25 of the visitor refactorings needed, and the bottleneck is review and merge speed! (hint hint)

Returns a $(D string) slice of a C-style null-terminated string.

$(RED Important Note:) The returned $(D string) is a slice of the original buffer.
+/
Member

It's not a string if the pointer argument is not immutable, is it?

Yeah just update the docs here and this will be ready to merge methinks.

Pinging @dcousens on this.

Member

@dcousens ping on the docs. Other than that this is fine. Maybe a more explanatory name, e.g. stringzAsString?

If the docs aren't changed within the day, I'll merge and make a fixup pull.

As for stringzAsString, we already have toStringz so it makes sense to call this one fromStringz, it's the first thing users are going to look for.

Contributor Author

Sorry about the delay, I had not caught up on this thread in a little while.

@dcousens
Contributor Author

Updated documentation to reflect that it is not a string returned but a char[].
Assuming this was the desired change, this should be ready to go @AndrejMitrovic.

/++
Returns a $(D char[]) slice of a C-style null-terminated string.

$(RED Important Note:) The returned $(D char[]) is a slice of the original buffer.

Just name it array, not char[] since it can be any of these: immutable(char)[] / const(char)[] / char[].

@andralex
Member

@blackwhale oic. At any rate, the need for proof can stop safely at enregistering.

@ghost

ghost commented Mar 15, 2014

@andralex: wrong pull! :)

@@ -136,6 +136,22 @@ unittest
});
}

/++
Returns a $(D array) slice of a C-style null-terminated string.
Member

Returns a D-style array of $(D char) (possibly qualified with $(D immutable) or $(D const) depending on the source) given a zero-terminated C-style string. The underlying data is the same. The original string is not changed and not copied.
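
The documented behaviour can be checked with a short sketch, assuming the implementation shown in this PR (a slice of the argument up to, but excluding, the terminator, and null for a null pointer).

```d
import std.string : fromStringz;

unittest
{
    char[4] buf = ['a', 'b', 'c', '\0'];

    char[] s = fromStringz(buf.ptr);
    assert(s == "abc");        // the terminator is not part of the slice
    assert(s.ptr is buf.ptr);  // same underlying data, nothing copied
    assert(buf[3] == '\0');    // the original string is unchanged

    const(char)* none = null;
    assert(fromStringz(none) is null); // null in, null out
}
```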

@andralex
Member

Let's move forward with this after correx to the dox.

@andralex
Member

Oh, the question remains whether this belongs in std.conv or std.string.

@JakobOvrum
Contributor

Finally! :)

edit:

Oh, the question remains whether this belongs in std.conv or std.string.

toStringz is in std.string. If we choose std.conv, we would have to move toStringz there as well. At that point, I think it's best we just stick with std.string for the time being.

andralex added a commit that referenced this pull request Mar 16, 2014
@andralex andralex merged commit 32c72bb into dlang:master Mar 16, 2014
7 participants