std.conv.to should allow conversion between any pair of string/wstring/dstring/char*/wchar*/dchar* #9929

Open
dlangBugzillaToGithub opened this issue Jul 13, 2012 · 25 comments

Comments

@dlangBugzillaToGithub

dlang-bugzilla (@CyberShadow) reported this on 2012-07-13T05:23:29Z

Transferred from https://issues.dlang.org/show_bug.cgi?id=8384

Description

import std.conv;
import std.string;

unittest
{
	static void test(T)(T lp)
	{
		assert(format("%s", lp) == "Hello, world!");
		assert(to!string(lp)    == "Hello, world!");
	}

	test("Hello, world!" .ptr);
	test("Hello, world!"w.ptr);
	test("Hello, world!"d.ptr);
}

wchar* conversion is commonly needed for Windows programming, as UTF-16 is the native encoding for Unicode Windows API functions.

issues.dlang (@jmdavis) commented on 2012-07-13T12:00:53Z

So, you expect %s on a pointer to give you the string that it points to? Why? It's a pointer, not a string. It's going to convert the pointer. That works as expected.

to!string should take null-terminated string and give you a string, and it does that. This code passes:

import std.conv;
import std.string;

void main()
{
    static void test(T)(T lp)
    {
        assert(to!string(lp), "hello world");
    }

    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}

So, I'd say that as far as your code goes, there's nothing wrong with it. It functions exactly as expected. What _doesn't_ work is this:

import std.conv;
import std.string;

void main()
{
    static void test(T)(T lp)
    {
        assert(to!wstring(lp), "hello world");
        assert(to!dstring(lp), "hello world");
    }

    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}

The code doesn't even compile, giving these errors:

/home/jmdavis/dmd2/linux/bin/../../src/phobos/std/conv.d(819): Error: incompatible types for ((cast(immutable(dchar)[])_adDupT(&_D12TypeInfo_Aya6__initZ,value[cast(ulong)0..strlen(cast(const(char*))value)])) ? (null)): 'immutable(dchar)[]' and 'string'
/home/jmdavis/dmd2/linux/bin/../../src/phobos/std/conv.d(268): Error: template instance std.conv.toImpl!(immutable(dchar)[],immutable(char)*) error instantiating
q.d(8):        instantiated from here: to!(immutable(char)*)
q.d(11):        instantiated from here: test!(immutable(char)*)
q.d(8): Error: template instance std.conv.to!(immutable(dchar)[]).to!(immutable(char)*) error instantiating
q.d(11):        instantiated from here: test!(immutable(char)*)
q.d(11): Error: template instance q.main.test!(immutable(char)*) error instantiating


dlang-bugzilla (@CyberShadow) commented on 2012-07-13T13:36:05Z

> to!string should take null-terminated string and give you a string, and it does
> that. This code passes:

Is it something that was fixed recently (within the last two weeks)? My two-week-old dmd git build and dpaste still print offsets for wchar* and dchar*: http://dpaste.dzfl.pl/26a2b284

> So, you expect %s on a pointer to give you the string that it points to? Why?

I think that, before all else, we should be looking for good reasons why format("%s", foo) and to!string(foo) produce different results. Why should one format the offset and the other do a conversion?

Second, I believe that the principle of least surprise is making this case rather clear: if the programmer tries to print a char*, it's almost certain that they want to print the null-terminated string at the given address, rather than a hexadecimal representation of the address (which are rarely useful to the end-user). Generic code is the only exception I can think of, in which case a cast to void* is in order.

> What _doesn't_ work is this:

I think this should call the appropriate toUTFx functions from std.utf.


dlang-bugzilla (@CyberShadow) commented on 2012-07-13T13:42:17Z

> I think this should call the appropriate toUTFx functions from std.utf.

Sorry about that, misread your example. I guess, ideally, conversion between any pair of {|w|d}{char*|string} should work.
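
For illustration only, here is a minimal sketch of the pointer-to-string half of such a conversion matrix. `fromCString` is a hypothetical helper written for this example, not a Phobos function. It assumes the pointer really is null-terminated, and it leans on `std.conv.to` for transcoding between the array types and on `std.utf.toUTFz` for the string-to-pointer direction.

```d
import std.conv : to;
import std.traits : PointerTarget, isPointer, isSomeChar, isSomeString;
import std.utf : toUTFz;

/// Hypothetical helper (not part of Phobos): reads a null-terminated
/// char/wchar/dchar string and transcodes it to the requested D string type.
D fromCString(D, P)(P ptr)
if (isSomeString!D && isPointer!P && isSomeChar!(PointerTarget!P))
{
    if (ptr is null)
        return null;

    // Find the terminator; this is only safe if the data is null-terminated.
    size_t len = 0;
    while (ptr[len] != 0)
        ++len;

    // Slice the pointer, then let std.conv.to transcode between UTF widths.
    return to!D(ptr[0 .. len]);
}

void main()
{
    // String literals are null-terminated, so .ptr yields a valid C string.
    assert(fromCString!string("Hello, world!"w.ptr) == "Hello, world!");
    assert(fromCString!wstring("Hello, world!"d.ptr) == "Hello, world!"w);
    assert(fromCString!dstring("Hello, world!".ptr) == "Hello, world!"d);

    // The opposite direction (D string to null-terminated pointer) via toUTFz.
    const(wchar)* wz = toUTFz!(const(wchar)*)("Hello, world!");
    assert(fromCString!string(wz) == "Hello, world!");
}
```

The sketch ignores the constness combinations entirely; it only shows that each cell of the matrix can be reduced to "slice up to the terminator, then transcode".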


issues.dlang (@jmdavis) commented on 2012-07-13T13:59:09Z

format and writeln are supposed to behave the same, because they both operate on format strings (they _don't_ currently behave 100% the same, but format's current implementation will be replaced with the new xformat's implementation in a few months - after the "scheduled for deprecation" time period). to!string is an entirely different beast.

std.conv.to is asking for an explicit conversion to string, whereas format and writeln are converting according to the format specifiers, and %s indicates the default string representation of the type. char*, wchar*, and dchar* are pointers - _not_ strings - and should not be treated as strings. Pointers print their address with %s. Making char*, wchar*, and dchar* print themselves as strings would be inconsistent with other pointer types, and operating on char*, wchar*, and dchar* should be discouraged, not encouraged.

to!string is treated differently, because you're asking for an explicit conversion, and we _do_ need to be able to convert null-terminated strings to D strings.

So, while I can see your point, I really don't think that having format or writeln treat char*, wchar*, or dchar* as null-terminated strings is a good idea. We should provide a means of converting them to D strings but not do anything to encourage using them as-is without converting them.


dlang-bugzilla (@CyberShadow) commented on 2012-07-13T14:25:36Z

OK, fair enough.

I've updated the enhancement request's title according to my previous comment.

Test:

-----------------------------------------------------------------------------

import std.conv;

void test1(T)(T lp)
{
    test2!( string)(lp);
    test2!(wstring)(lp);
    test2!(dstring)(lp);
    test2!(  char*)(lp);
    test2!( wchar*)(lp);
    test2!( dchar*)(lp);
}

void test2(D, S)(S lp)
{
    D dest = to!D(lp);
    assert(to!string(dest) == "Hello, world!");
}

unittest
{
    test1("Hello, world!" );
    test1("Hello, world!"w);
    test1("Hello, world!"d);
    test1("Hello, world!" .ptr);
    test1("Hello, world!"w.ptr);
    test1("Hello, world!"d.ptr);
}


dlang-bugzilla (@CyberShadow) commented on 2012-07-13T14:31:04Z

Oh, I forgot about constness.

I guess that raises the number of combinations to (2*3*3)^2 = 324.


code (@MartinNowak) commented on 2012-07-13T14:37:07Z

Hooray for using "static" foreach to conveniently enumerate all the cases to test!


issues.dlang (@jmdavis) commented on 2012-07-13T14:48:31Z

> Hooray for using "static" foreach to conveniently enumerate all the cases to
> test!

Yeah. I do that all of the time when I have to test with multiple types (especially with strings), and I always push for string-related tests to do that when I see that someone is looking to submit code to Phobos for a function that takes one or more strings as templated types, and their tests don't do that. It's just one of those things that everyone who writes much in the way of unit tests in D should learn and know about.
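
As a rough sketch of the pattern being described: a `foreach` over a compile-time sequence of types is unrolled by the compiler, so one loop body is instantiated per type. The example below uses `std.meta.AliasSeq`, which is the current Phobos name and postdates this comment. The helper name `roundTripAll` is made up for the example, and only the three array string types are covered, not the pointer types.

```d
import std.conv : to;
import std.meta : AliasSeq;

/// Round-trips a sample through every (source, destination) string pair
/// and returns how many pairs were exercised. Hypothetical helper.
size_t roundTripAll()
{
    size_t pairs = 0;

    // Each foreach over an AliasSeq of types is unrolled at compile time.
    foreach (Src; AliasSeq!(string, wstring, dstring))
    {
        foreach (Dst; AliasSeq!(string, wstring, dstring))
        {
            Src source = to!Src("Hello, world!");
            Dst converted = to!Dst(source);
            assert(to!string(converted) == "Hello, world!");
            ++pairs;
        }
    }
    return pairs;
}

void main()
{
    // 3 source types x 3 destination types
    assert(roundTripAll() == 9);
}
```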


dlang-bugzilla (@CyberShadow) commented on 2012-08-15T13:24:08Z

Another case of confusion due to format treating C strings as pointers:

http://stackoverflow.com/q/11975353/21501

I still think that the current behavior, regardless of how much it makes sense from a design/consistency/orthogonality/etc. perspective, is simply not useful and fails the principle of least surprise in most expected cases.

I strongly believe that we should either forbid passing char pointers to format/writeln (and force the user to cast to void* or convert to a D string), or print them as C null-terminated strings.


issues.dlang (@jmdavis) commented on 2012-08-15T13:35:28Z

char* acts identically to the other pointer types, and I fully believe that it should stay that way. We've pretty much removed all of the D features which involved either treating a string as char* or a char* as a string (including disallowing implicit conversion of string to const char*). The _only_ feature that the language has which supports that is the fact that string literals have a null character one past their end and will implicitly convert to const char*.

It would be a huge mistake IMHO to support doing _anything_ with character pointers which treats them as strings without requiring an explicit conversion of some kind. Anyone who continues to think of char* as being a string in D is just asking for trouble. They need to learn to use strings correctly.

If you really want to use char* as a string in functions like format or writeln, then simply either use to!string or ptr[0 .. strlen(ptr)].
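
A small sketch of the two explicit conversions mentioned here, assuming a valid null-terminated `char*`. `asSlice` is a hypothetical helper name. The slice aliases the original buffer without copying, whereas `to!string` allocates a new D string.

```d
import core.stdc.string : strlen;
import std.conv : to;
import std.format : format;

/// Hypothetical helper: views a C string as a D slice without copying.
const(char)[] asSlice(const(char)* ptr)
{
    return ptr is null ? null : ptr[0 .. strlen(ptr)];
}

void main()
{
    // String literals are null-terminated, so .ptr is a valid C string here.
    const(char)* ptr = "Hello, world!".ptr;

    // Slicing: no allocation, but the slice is only valid while the buffer is.
    assert(format("%s", asSlice(ptr)) == "Hello, world!");

    // to!string: copies the characters into a new D string.
    assert(format("%s", to!string(ptr)) == "Hello, world!");
}
```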


dlang-bugzilla (@CyberShadow) commented on 2012-08-15T13:48:30Z

Sorry, I don't think that your categorical point of view is constructive. As long as D interfaces with C libraries and programs, people will continue to attempt to use C strings together with or in place of D strings, and issues like the above will continue to appear.

How often would a typical D user want to print / format the address of a character, versus the null-terminated string at that address?

> It would be a huge mistake IMHO to support doing _anything_ with character
> pointers which treats them as strings without requiring an explicit conversion
> of some kind. 

Why would it be a mistake? What exactly do we lose by allowing writeln/format to understand C strings?

> Anyone who continues to think of char* as being a string in D is
> just asking for trouble.

What kind of trouble?

> They need to learn to use strings correctly.

D printing an address when text was expected will sooner generate a "D sucks" reaction than an "Oops, I need to learn to use strings correctly" one.

> If you really want to use char* as a string in functions like format or
> writeln, then simply either use to!string or ptr[0 .. strlen(ptr)].

That's not really simple, considering some spots where that (verbose) modification needs to be made would be discovered only late at runtime, and even then the actual problem is not obvious to identify (as seen in the SO question above).


dlang-bugzilla (@CyberShadow) commented on 2012-08-15T13:56:00Z

I would like to stress a point that I hope will clarify my view of the logic that writeln/format should use.

Printing/formatting memory addresses is extremely rarely useful!

Except for some dirty debugging, I can't imagine a case where the user expects that passing a pointer to something to format would yield the hex representation of that address.

I believe that printing a pointer as a hex address should be the fallback, last-resort behavior, if there is no better representation for the said type. (This also allows discussion of calling toString() on struct pointers.)

For the rare case that the user intends to actually print a pointer, this is easily accomplished by a cast to size_t and using the appropriate hex format specifier.
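
A sketch of that explicit approach, for the case where the address itself is what the user wants. `addressHex` is a hypothetical helper name. The check parses the text back rather than comparing against a fixed string, because the actual address varies from run to run.

```d
import std.conv : to;
import std.format : format;

/// Hypothetical helper: renders an address as lowercase hexadecimal.
string addressHex(const(void)* p)
{
    // The cast makes the intent explicit: format an integer, not a pointer.
    return format("%x", cast(size_t) p);
}

void main()
{
    int value = 42;
    string text = addressHex(&value);

    // Parsing the hex text back yields the same address.
    assert(to!size_t(text, 16) == cast(size_t) &value);
}
```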


issues.dlang (@jmdavis) commented on 2012-08-15T13:57:15Z

Anyone who does not understand that char* is _not_ a string will continue to make mistakes like trying to concatenate a char* to a string ( http://stackoverflow.com/questions/11914070/why-can-i-not-concatenate-a-constchar-to-a-string-in-d ) or trying to pass a string directly to a C function. They will constantly run into problems when dealing with strings. char* is _not_ a string and should not be treated as such. Treating it as a string with something like writeln will just help further the misconception that char* is a string and hinder people learning and using D. D programmers need to understand the difference between char* and string. char* should _not_ be treated as special, because it's not.


dlang-bugzilla (@CyberShadow) commented on 2012-08-15T14:01:42Z

First of all, you are conflating ignorance between the two string types with my arguments. Users who are aware that D has its own way of handling strings are still open to making frustrating mistakes.

Second, getting unexpected output is not a good way to teach people about this. Hence my earlier proposal to make writeln/format REJECT char pointer types, on the basis that the user's intention is ambiguous (I don't think so personally, but obviously that's just my opinion).


issues.dlang (@jmdavis) commented on 2012-08-15T14:06:49Z

I'm saying that we shouldn't treat char* differently from int* just because some newbies expect char* to act like a string. And if you know D, then you know that char* is _not_ a string, and I don't see how you could expect it to be treated as one. Either making char* act like a string or disallowing printing it would make it act differently from other pointer types just to appease the folks who mistakenly think that char* is a string.


dlang-bugzilla (@CyberShadow) commented on 2012-08-15T14:08:44Z

Well, then how about removing the pointer-printing feature entirely, and issue a compile-time error on all pointer types?


dlang-bugzilla (@CyberShadow) commented on 2012-08-15T14:12:50Z

> And if you know D, then you know that char* is _not_ a string,
> and I don't see how you could expect it to be treated as one.

I don't think this argument is valid, because it assumes that all D users are always aware of the types they pass to writeln/format. In the SO case, the argument is a function result, and the function's return type is not explicitly written in the user's code.

People often expect the compiler to shout at them if they try to pass incompatible types to a function. writeln/format accept char pointers, but ultimately do something with them that in 99% of cases is simply not useful, and put the user in search of their mistake all across the data flow.


destructionator commented on 2012-08-15T14:34:54Z

I think rejecting might be the best option because if you treat it as a string, what if it doesn't have a 0 terminator? That could easily happen if you pass it a pointer to a D string.

I don't think that is technically un-@safe, but it could be a problem anyway to get an unexpected crash because of it. At least with to!string(char*) you might think about it for a minute and avoid the problem.


So on one hand, I think it should just work, but on the other hand the compile time error might be the most sane.


issues.dlang (@jmdavis) commented on 2012-08-15T14:40:14Z

> Well, then how about removing the pointer-printing feature entirely, and issue
> a compile-time error on all pointer types?

So, you're suggesting that we remove a useful feature because newbies coming from C/C++ keep mistakenly thinking that char* is a string?


dlang-bugzilla (@CyberShadow) commented on 2012-08-15T14:44:20Z

Your formulation misrepresents the weight of the scales. Please seriously take into account the overall benefit for D of both decisions. The feature is nearly useless and does more harm than good, and "newbies coming from C/C++" is, again, a misrepresentation as discussed above. It is also incorrect: someone used to e.g. using SDL bindings in another language may expect that the types returned by the binding would be compatible with the language's native functionality.


andrej.mitrovich (@AndrejMitrovic) commented on 2013-01-13T10:34:43Z

*** Issue 6157 has been marked as a duplicate of this issue. ***


andrej.mitrovich (@AndrejMitrovic) commented on 2013-01-13T10:35:51Z

(In reply to comment #21)
> *** Issue 6157 has been marked as a duplicate of this issue. ***

FYI: http://d.puremagic.com/issues/show_bug.cgi?id=6157 has an experimental implementation in the attachment (for conv.to), but I'm not an expert on things Unicode.


dfj1esp02 commented on 2014-02-20T10:21:15Z

(In reply to comment #19)
> So, you're suggesting that we remove a useful feature because newbies coming
> from C/C++ keep mistakenly thinking that char* is a string?

char* is the way to represent null-terminated strings and C programmers are not mistaken in that.

As to the useful feature, it can be done with %p format specifier - that's what printf does.


simen.kjaras commented on 2016-04-14T21:41:58Z

https://github.com/D-Programming-Language/phobos/pull/4199

The PR covers conversion from {X}char* to {Y}char[], but not the other way around. No such conversions are currently supported at all, so I took the liberty of not implementing that without a bit more discussion.

Are there convincing reasons to support any of those conversions at all?


github-bugzilla commented on 2016-04-26T20:14:12Z

Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/60a233372a96abab810f030b4e3ff494987aa25e
Partial fix of Issue 8384 - std.conv.to should allow conversion between any pair of string/wstring/dstring/char*/wchar*/dchar*

https://github.com/dlang/phobos/commit/22c7f11265d62ad1ac387bc9aaa90b742f9563b2
Merge pull request #4199 from Biotronic/fix-8384

Partial fix of Issue 8384 - std.conv.to should allow conversion betwe…

@LightBender LightBender removed the P4 label Dec 6, 2024