Skip to content

Conversation

jmdavis
Copy link
Member

@jmdavis jmdavis commented Jul 27, 2013

This makes it so that if you're operating on chars or wchars with
std.ascii functions, you don't have to cast the return values back to
char or wchar, but it still allows you to operate on dchars without
mangling any Unicode values.

@monarchdodra
Copy link
Collaborator

Why does ubyte get special treatment? From my tests, any integral with size smaller than 4* gets accepted. This code works:

void foo(dchar){}

void main()
{
    uint d = uint.max;
    foo(d);
}

That said, I'm unsure about the the uint to dchar cast. Is the fact it is accepted a bug?

void main()
{
    dchar f = uint.max; //Illegal
    uint d = uint.max;
    dchar e = d; //But this is OK?
}

@jmdavis
Copy link
Member Author

jmdavis commented Jul 27, 2013

The purpose of accepting ubyte is to deal with cases where you want to operate on ASCII-specifically and so you have arrays of ubyte rather than string in order to avoid having them be treated as ranges of dchar. I think that we should move towards having string functions in general accept arrays ubyte when it makes sense for them to operate on ASCII without decoding.

The isX functions will accept all integral types of 4 bytes or smaller whether we like it or not, because those integral types will implicitly convert to dchar. The toX functions are templated so that they can return the same type that they're passed in and as such the template constraint has to choose what it accepts, and so in order to accept ubyte in addition to the 3 character types, those functions explicitly accept ubyte. I suppose that I could make them just accept any integral type for consistency's sake, though it was only really ever the intention for any of them to accept dchar and ubyte. The other integral types just get accepted because of the conversion rules.

As for your example with dchar.max, I don't know why the first line compiles and the others don't. In general, the character types are treated very much like integral types and such conversions just work. I suspect that the first line is a bug, but I don't know.

@monarchdodra
Copy link
Collaborator

OK. Your explanation makes sense. As for the whole dchar valid range thing, I'll ask around the boards and maybe file it. It still isn't very clear to me if a dchar can go past 0x10FFFF.

@jmdavis
Copy link
Member Author

jmdavis commented Jul 27, 2013

Okay. I updated it so that toLower and toUpper will accept anything which implicitly converts to dchar (including user-defined types, for better or worse, since that would have worked before - though in that case, they return dchar rather than the original type, which was also how it would have worked before). I also removed all of the stuff that I had added which talked about ubyte, since the fact that toLower and toUpper now explicitly state that they accept anything which implicitly converts to dchar covers ubyte.

@monarchdodra
Copy link
Collaborator

Implemented as:

auto toUpper(C)(C c)
    if (is(C : dchar))
out(result)
{
    assert(!isLower(result));
}
body
{
    alias Ret = Select!(isScalarType!C, C, dchar);
    return cast(Ret)(c - ('a' - 'A'));
}

Helps keep the public interface cleaner, and the cognitive overhead down. This is always desired (IMO).

@monarchdodra
Copy link
Collaborator

One of the things that I noticed is that this behavior just changed:

    string s = "hello";
    foreach(ref e; s)
    {
        auto a = toUpper(e);
        static assert(is(typeof(a) == char));
    }

Never mind that a is a char and not a dchar now (that's the point), but I'm wondering if it is good to preserve the qualification? EG, maybe use:

Select!(isScalarType!C, Unqual!C, dchar);

(or equivalent of course)

I don't see much point in returning a built-in value type as const/immutable anyways. As for "shared", I think it becomes outright wrong?

@monarchdodra
Copy link
Collaborator

A third comment: This changes a non-template function into a template function, just because of the return type, yet the implementation remains unchanged. This might be premature optimization (or even non-optimization due to the size of said function), but maybe it should be implemented as a forward to a non-template function? EG:

//Public template that only filters and changes return type
auto toUpper(C)(C c) @safe pure nothrow
    if (is(C : dchar))
out(result)
{
    assert(!isLower(result));
}
body
{
    alias Ret = Select!(isScalarType!C, C, dchar);
    return cast(Ret) toUpperImpl(c);
}

//Non template that does actual job.
private dchar toUpperImpl(dchar c) @safe pure nothrow
{
    return isLower(c) ? cast(dchar)(c - ('a' - 'A')) : c;
}

I don't know. I don't think it makes much of a change either way, so either way is fine for me.

@jmdavis
Copy link
Member Author

jmdavis commented Jul 28, 2013

Okay, update per your excellent suggestions. I had made the overloads separate primarily because of wanting to mark the overload for the built-in types with @safe pure nothrow, but it looks like that does get inferred properly in spite of the cast (which I suppose I should have realized given that I got away with marking it as @safe rather than @trusted). And I didn't know about Select, so thanks for the tip, though in this case, I think that static if works better, as it avoids the need for another function.

@monarchdodra
Copy link
Collaborator

Hum... I thought I learned Select from you? It's used in algorithm.find anyways.

One of the issues with your re-entering approach is that the output contract get validated twice, which is sub-optimal for non-release builds. It's not a big deal, but I guess we should avoid it if we can.

characters, and all $(D toX) functions do nothing to unicode characters.
All of the functions in std.ascii accept Unicode characters but effectively
ignore them. All $(D isX) functions return $(D false) for Unicode
characters, and all $(D toX) functions do nothing to Unicode characters.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to get technical, these functions will ignore unicodes outside of the Basic Latin (EG ASCII) Block. ASCII characters are still Unicode characters.

Maybe re-writing this in the form of "Will accept any unicode charater, but ignore those that aren't ASCII" would be more accurate?

@jmdavis
Copy link
Member Author

jmdavis commented Jul 28, 2013

Hum... I thought I learned Select from you? It's used in algorithm.find anyways.

Well, if I knew about it previously, I'd completely forgotten about it since.

One of the issues with your re-entering approach is that the output contract get validated twice, which is sub-optimal for non-release builds. It's not a big deal, but I guess we should avoid it if we can.

Then I'll just remove the contract and avoid the extra code. I don't particularly like out contracts anyway. Aside from inheritance, they're pretty useless. They're just testing what the unit tests are supposed to test.

This makes it so that if you're operating on chars or wchars with
std.ascii functions, you don't have to cast the return values back to
char or wchar, but it still allows you to operate on dchars without
mangling any Unicode values.
@jmdavis
Copy link
Member Author

jmdavis commented Jul 28, 2013

Okay. Updated.

@monarchdodra
Copy link
Collaborator

Then I'll just remove the contract and avoid the extra code.

I expected you'd do that. I don't really think that was meant to be in an out contract either anyways.

As far as I'm concerned, this pull is ready to go. I'll let it sit for a couple of days, just in case, and then I'll merge it.

@jmdavis
Copy link
Member Author

jmdavis commented Jul 29, 2013

I don't really think that was meant to be in an out contract either anyways.

If you're going to have an out contract for toLower or toUpper, then using isLower and isUpper in their respective out contracts is exactly the correct thing to do. I just don't agree with the very principle of out contracts. They incur constant overhead doing what unit tests are supposed to do (without incurring the overhead). And in this case, it's trivial for the unit tests to actually cover every possible input, making the out contract make that much less sense.

@monarchdodra
Copy link
Collaborator

We're going to go off topic now, but in any case, the way I understood it, a "contract" is meant to validate values at the boundaries of a function, that may be wrong, but through no fault of the actual function. For example, taking the classic "sqrt" function, this is the "I work fine on positive numbers. The contract here, is you don't feed me negative numbers, those are outside of my legal range".

An out contract, (again, the way I understood it), is to validate output that could be wrong due to user input. For example, if you write "POW(int)", then the contract could be "I work fine, as long as the input you give me won't cause my output to overflow".

I'm definitely not arguing that the contract implementations were wrong, just that, as you said, that whole out contract had no business being there in the first place. That was the stuff of unittests.

@jmdavis
Copy link
Member Author

jmdavis commented Jul 29, 2013

I think that you misunderstand the purpose of out contracts blocks, much as you seem to have the basic idea of DbC correct (though it could be that we both understand perfectly well and just aren't quite communicating it properly).

The purpose of the in contract block is to check that the input is legal according to the "contract" of the function - what the function expects and requires. The function - per the contract - then guarantees that it will generate the correct output for that input. The purpose of the out contract block is to verify that the function kept up its part of the bargain and that the output is therefore correct for the given input. The out contract block doesn't care whether the input was valid or not, because the in contract block should have already caught all of the cases where the input was invalid. It assumes that the input was valid and is just verifying that the output is correct. The "perfect" out contract block would throw an AssertError every time that the function generated incorrect output and never throw if it generated correct output.

As such, in the case of sqrt, the "perfect" out contract block would verify that the result was exactly the square root of the input, and it would never even be run if the input violated the function's contract (as the code would never get past the in contract block). However, in most cases, actually verifying the correctness of the output is at least as expensive as calculating it in the first place, making having out contracts prohibitive, and proper unit tests make the redundant - without the overhead - making out contracts rather pointless in general.

And if the output of toLower is supposed to be the lowercase version of the letter passed in, then the perfect out contract block would test that the return value was the lowercase version of the input. However, D doesn't support saving the input for the out contract block to see (the out contract block only has access to the return value), so the best that it could do is check that the output was lowercase. So, what the code was doing before was technically as correct as we could make the out contract block in D. The only problem with it was that proper unit tests make out contract blocks redundant.


static assert(isSafe!(toUpper!char));
static assert((functionAttributes!((){'a'.toUpper();}) & FunctionAttribute.pure_) != 0);
static assert((functionAttributes!((){'a'.toUpper();}) & FunctionAttribute.nothrow_) != 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like for is, you can omit the () in front of the block/lambda here. EG:

is(typeof({T t = t;}));
or
if (functionAttributes!({'a'.toUpper();}) != FunctionAttribute.pure_)

Also, I wouldn't complain about a != 0 in an if, but it is adding an extra pair of parenthesis.

I'll merge this evening regardless.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

functionAttributes!({'a'.toUpper();}) != FunctionAttribute.pure_ would be wrong, because the result of functionAttributes is a bunch of |ed flags, and pure_ is just one of them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you are saying. I think there might have been a misunderstanding. Are you saying this is wrong?

static assert(functionAttributes!({'a'.toUpper();}) & FunctionAttribute.safe);
static assert(functionAttributes!({'a'.toUpper();}) & FunctionAttribute.pure_);
static assert(functionAttributes!({'a'.toUpper();}) & FunctionAttribute.nothrow_); 

It works on my machine...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. That will work. What won't work is if (functionAttributes!({'a'.toUpper();}) != FunctionAttribute.pure_), because in that example you're comparing the result of functionAttributes against pure_, whereas in your last example, you're checking that the result of & on pure_ is non-zero.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh... right '-_- . I see it now. That above test was non-sense :D I missed that I changed & into a !=.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, while I'm not necessarily against not having the != 0, I would point out that it does help avoid someone thinking that the & should have been a &&. Closer inspection of the expression should make it fairly clear anyway, but that's always a risk when using & or | in boolean expressions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point too. I'm fine with leaving it the way it is too. As you want of course.

monarchdodra added a commit that referenced this pull request Aug 2, 2013
Fix issue# 10717.

Merged.
@monarchdodra monarchdodra merged commit f7a593f into dlang:master Aug 2, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants