Skip to content

Conversation

DmitryOlshansky
Copy link
Member

Second attempt to fold it in after review.

Of things new:

  • to{Lower,Upper}InPlace that tries hard to be in-place as far as Unicode allows it.
  • std.string versions are aliased to std.uni ones
  • better management of tables. Now only the ones that are actually get used are pulled into into the executable (kudos to Rainer!)

Still unknown:

  • FreeBSD '-release' bug

@monarchdodra
Copy link
Collaborator

Nitpick: The undocumented function toTitlecase should be called toTitle. It might sound vague at first, because we aren't that use to the concept of "titlecase", but it is better to stay consistent with toUpper and toLower (and not toUppercase and toLowercase).

Also nitpick: Since you are bumping our Unicode version, you are introducing Title Case. This breaks a few things, as noted in this bug report:
http://d.puremagic.com/issues/show_bug.cgi?id=10204
Your current pull fixes toUpper and toLower, so no problem there, but capitalize is now broken according to the new Unicode standard.

The fix is pretty trivial, if you can do it. If not, if nothing else, a TODO would be nice.

@DmitryOlshansky
Copy link
Member Author

Nitpick: The undocumented function toTitlecase should be called toTitle.

Maybe, for the moment it's both undocumented and private. I would go for a separate pull that enables it and adds new interface for toXYZ family. Then fix capitalize, etc. Let's eat this elephant by bits and pieces :)

Also nitpick: Since you are bumping our Unicode version, you are introducing Title Case. This breaks a few things, as noted in this bug report...

BTW Titlecase is not anything new, we simply neglected it for years.
http://www.unicode.org/Public/4.1.0/ucd/SpecialCasing.txt

E.g. even 4.1 explicitly mentions titlecase!
And I haven't seen any big news in 6.2 vs 6.1, just a set of minor additions.

capitalize is now broken according to the new Unicode standard.

Was broken and a we have a bugzilla entry now, so we are in a good shape ... :)

@monarchdodra
Copy link
Collaborator

so we are in good shape ... :)

Didn't notice it was also private. I think 6.2 introduced a lot of new code that actually made use of titlecase, as opposed to before. In any case, noted. We'll process these minor issues later.

@DmitryOlshansky
Copy link
Member Author

@braddr Excuse me for pinging, but what kind of FreeBSD these boxes are running?
It seems I can't get 9.1 to pass the test suite as is at all (fails in std.conv, std.datetime etc.).

@braddr
Copy link
Member

braddr commented Jun 19, 2013

8.1 and 8.2, I believe. I'm sure neither is 9.x

@DmitryOlshansky
Copy link
Member Author

8.1 and 8.2, I believe. I'm sure neither is 9.x

Darn.. and on 9.1 at least std.uni release/debug tests are passing. Downloading older ones.

@quickfur
Copy link
Member

W00t! Looks like the new std.uni is ready to merge? Any chance of seeing this pulled anytime soon?

@quickfur
Copy link
Member

@andralex poke

@DmitryOlshansky
Copy link
Member Author

Okay, looks like our patient has stabilized :)

However I have bad news as it seems to uncovered a compiler bug (with release build flags).
Looking at the last diff the problem revolves around 2 things: local non-static template function and a slice of an r-value (taking address of r-value in expression):

  Grapheme g = decompose!T(ch); 
  assert(equalS(g[], r), text(g[], " vs ", r));        

vs

  assert(equalS(decompose!T(ch)[], r), text(decompose!T(ch)[], " vs ", r));        

opSlice here is taking an address of a stack-allocated temporary Grapheme and uses that to access elements.

My understansding is that D guarantees that r-values survive untill full expression is evaluated (in this case CallExp). Hence both pieces of code should be equivalent. Thus I think it'd be worthwhile to compare disassembly of 2 versions.

@DmitryOlshansky
Copy link
Member Author

@braddr Any chance to get an ssh account on these FreeBSDs? I can't reproduce this ellusive segfault somewhere in jemalloc free on my VMs (8.3 and 9.1) with or without my latest patch.

@@ -257,8 +257,8 @@ private:
enum IR:uint {
Char = 0b1_00000_00, //a character
Any = 0b1_00001_00, //any character
CodepointSet = 0b1_00010_00, //a most generic CodepointSet [...]
Trie = 0b1_00011_00, //CodepointSet implemented as Trie
CodepointSetOld = 0b1_00010_00, //a most generic CodepointSetOld [...]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet another proof that vertical alignment sucks. Please align, or (better) get rid of alignment altogether.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, it's rather me being to hasty to do find/replace.
Among others cases like comments the ones with IR opcodes should have stayed intact too, will fix.

@andralex
Copy link
Member

andralex commented Jul 1, 2013

Why all the renames from CodePointSet to CodePointSetOld?

@DmitryOlshansky
Copy link
Member Author

Why all the renames from CodePointSet to CodePointSetOld?

Name clash of old internal CodepointSet of std.regex with the new public one from std.uni.
Since they are not exactly drop-in replacements (internal ones are more hackish) the code needs a fair amount of work and I'd rather do this step by step. The mass rename was a nice stop gap solution to this when I hacked my way through to integrate std.uni with the rest of phobos and reading hundreds of pages of errors.

I guess something less intrusive (now that everything is stable) is in order... renamed imports/selective imports?

@DmitryOlshansky
Copy link
Member Author

Here, now the full diff should be far simpler ;)

@ghost
Copy link

ghost commented Jul 4, 2013

I recommend marking any functions which should not be used by user code to be private, even if they're not documented. We've had cases where people used undocumented code before and ended up breaking code.

@ghost
Copy link

ghost commented Jul 4, 2013

E.g. copyBackwards.

@DmitryOlshansky
Copy link
Member Author

Good catch, these cludges should have been below private: section.

@ghost
Copy link

ghost commented Jul 4, 2013

Do you think by any chance Trie can be moved into another module? I'm not worried about line length, but I've seen Trie's used for other purposes before (in other libs) and it might be useful to put the structure somewhere where it's more visible, so people don't have to ask "is there a Trie in Phobos?".

@DmitryOlshansky
Copy link
Member Author

Do you think by any chance Trie can be moved into another module?

Absolutely. That is going to happen once I tackled the following:
a) Figure out few more generic ways about it, as it is now it may be called FixedTrie
b) Need to finalize a few things about interface that goes hand in hand with a)
c) Review also showed that Trie is not quite ready to be a public artifact and might warrant a review of its own once fully generalized

@ghost
Copy link

ghost commented Jul 4, 2013

P.S. there's an unfortunate thing about Sequence, DMD internally counts the instance count and if you try to instantiate foreach (i; Sequence!(0, 499)) you'll get this error:

test.d(10): Error: template instance std.typetuple.TypeTuple!() recursive expansion

500 is basically a hardcoded number of max recursive instantiations.

It's just something to keep in mind. I'm not sure if this can be written in a simpler way to avoid many template instances.

@DmitryOlshansky
Copy link
Member Author

About Sequence - truth be told it can be fixed, but I never use it beyond about 4-5 items.
I'd love to see it as StaticIota or whatever in the std library. I think I've seen it resurface a few times in Phobos already.

@monarchdodra
Copy link
Collaborator

Well, let's not forget these are static iterations... If you iterate 500
times, the compiler will have to create (at least) 500 lines of code. This
is a limitation, but I'd say hardly a problematic one... Worst case
scenario, a two-level sequence could get you to 250000 iterations, no?

On Fri, Jul 5, 2013 at 11:40 AM, Dmitry Olshansky
notifications@github.comwrote:

About Sequence - truth be told it can be fixed, but I never use it beyond
about 4-5 items.
I'd love to see it as StaticIota or whatever in the std library. I think
I've seen it resurface a few times in Phobos already.


Reply to this email directly or view it on GitHubhttps://github.com//pull/1347#issuecomment-20509382
.

@DmitryOlshansky
Copy link
Member Author

Funnly here is my Sequence used in std.regex (this is why I recall seeing it somewhere):

//generate code for TypeTuple(S, S+1, S+2, ... E)
@system string ctGenSeq(int S, int E)
{
    string s = "alias TypeTuple!(";
    if(S < E)
        s ~= to!string(S);
    for(int i = S+1; i < E;i++)
    {
        s ~= ", ";
        s ~= to!string(i);
    }
    return s ~") Sequence;";
}

//alias to TypeTuple(S, S+1, S+2, ... E)
template Sequence(int S, int E)
{
    mixin(ctGenSeq(S,E));
}

No extra templates but some costly CTFE. Anyway I think it's safe to put aside this matter for a separate ER/pull pair.

@DmitryOlshansky
Copy link
Member Author

Rebased. Anything else preventing the merge?

As per recent pull for std.string icmp/sicmp are checked to work in CTFE here as well.

Poured in some work to let DMD also get decent performance (even w/o inlining) otherwise debug builds would take a horrible performance hit.

O.T. I've found an interesting issue about inlining is that -noboundscheck hugely affects what is scanned to be inlined basically it's everything vs next to nothing (investigating what might cause it).

@DmitryOlshansky
Copy link
Member Author

Update: seems like I failed the first time, wierd. Anyway now finally all green.

@quickfur
Copy link
Member

quickfur commented Aug 5, 2013

@andralex @jmdavis Can we merge this yet? It's green now, and it has been quite a while since it was approved in review.

jmdavis added a commit that referenced this pull request Aug 8, 2013
@jmdavis jmdavis merged commit c109594 into dlang:master Aug 8, 2013
@monarchdodra
Copy link
Collaborator

So that's it? This is pulled? MOST AWESOME. Congratulations. This warrants someone making an announcement on the boards (and a party!)

@quickfur
Copy link
Member

quickfur commented Aug 8, 2013

Woohoo!!! Finally!!! Congrats to all involved, this is definitely a big step for Phobos.

@DmitryOlshansky
Copy link
Member Author

Cool, thanks @jmdavis !
In the meantime I'll port std.regex to use new and shiny abstractions.

@CyberShadow
Copy link
Member

A regression was reported for this pull request:
https://d.puremagic.com/issues/show_bug.cgi?id=12455

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants