Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add std.string.outdent #156

Closed
wants to merge 19 commits into from
Closed
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
252 changes: 251 additions & 1 deletion std/string.d
Expand Up @@ -60,7 +60,7 @@ module std.string;

import core.exception : onRangeError;
import core.vararg, core.stdc.stdio, core.stdc.stdlib, core.stdc.string,
std.ascii, std.conv, std.exception, std.format, std.functional,
std.algorithm, std.ascii, std.conv, std.exception, std.format, std.functional,
std.metastrings, std.range, std.regex, std.stdio, std.traits,
std.typetuple, std.uni, std.utf;

Expand Down Expand Up @@ -3620,6 +3620,256 @@ unittest
assert(wrap("u u") == "u u\n");
}

/******************************************
* Removes indentation from a multi-line string or an array of single-line strings.
*
* This uniformly outdents the text as much as possible.
* Whitespace-only lines are always converted to blank lines.
*
* The indentation style must be consistent (except for whitespace-only lines)
* or else this will throw an Exception.
*
* Works at compile-time.
*
* Example:
* ---
* writeln(q{
* import std.stdio;
* void main() {
* writeln("Hello");
* }
* }.outdent());
* ---
*
* Output:
* ---
*
* import std.stdio;
* void main() {
* writeln("Hello");
* }
*
* ---
*
*/

S outdent(S)(S str) if(isSomeString!S)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making it so that it specifically takes a string type (e.g. C[] outdent(C)(C[] str) if(isSomeChar!C)) will make it so that outdent can work with immutable strings. As it is, it'll probably fail to compile with an immutable string (unless you get lucky and both split and join take immutable strings).

{
if(str == "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably doesn't matter that much, but In general, I'd be inclined to argue that empty should be used rather than == "".

return "";

S[] lines;
if(__ctfe)
lines = str.ctfe_split("\n");
else
lines = str.split("\n");

lines = outdent(lines);

if(__ctfe)
return lines.ctfe_join("\n");
else
return lines.join("\n");
}

/// ditto
S[] outdent(S)(S[] lines) if(isSomeString!S)
{
if(lines.length == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again. In general, using empty is probably better, but since it's an array, it doesn't really matter all that much. It does matter with containers though, so it's just generally better to use empty. It's also shorter for whatever that's worth.

return null;

bool isNonWhite(dchar ch)
{
if(__ctfe)
return !ctfe_isWhite(ch);
else
return !std.uni.isWhite(ch);
}
S leadingWhiteOf(S str)
{
return str[ 0 .. $-find!(isNonWhite)(str).length ];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return str[0 .. $ - find!(not!(std.uni.isWhite))(str).length];

(not is from std.functional)

In non-CTFE this could also be

return to!S(until!(not!isWhite)(str));

but it may be less efficient (not benchmarked)

}

// Apply leadingWhiteOf, but emit null instead for whitespace-only lines
S[] indents;
indents.length = lines.length;
foreach(i, line; lines)
{
string stripped;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be S rather than string? I don't think that it'll compile if S isn't string. Also, I'd argue that this is a prime example of when a ternary operator should be used, since it allows you to directly initialize the variable rather than having to declare it and then assign it with an if-else statement. That's arguably a stylistic choice though. It would allow you to use auto, however, which completely eliminates the string vs S issue here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be S rather than string?

Yup, good catch.

Also, I'd argue that this is a prime example of when a ternary operator
should be used, since it allows you to directly initialize the variable
rather than having to declare it and then assign it with an if-else
statement. That's arguably a stylistic choice though. It would allow you
to use auto, however, which completely eliminates the string vs S
issue here.

I'm normally a huge fan of "?:". Somehow, my brain tends to think of
"if(__ctfe)" as if it were a static if, so it didn't even occur to me to use
it here. You're right, that'd be a bug improvement.

Your other suggestions sound good too, I'll take care of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other places in the code where that applies, not just here, so if you want to change if(_ctfe) ... to use ternary operators, you're going to want to look over all of your code here to make sure that you catch them all.

if(__ctfe)
stripped = line.ctfe_strip();
else
stripped = line.strip();

indents[i] = stripped==""? null : leadingWhiteOf(line);
}

S shorterAndNonNull(S a, S b) {
if(a is null) return b;
if(b is null) return a;

return (a.length < b.length)? a : b;
};
auto shortestIndent = std.algorithm.reduce!(shorterAndNonNull)(indents);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a stylistic choice, so you're free to do it how you'd like, but I believe that most of use don't use the parens with ! when they're not needed, and it does improve legibility to leave it out. So, I'd definitely suggest that you leave it out, but since it's a stylistic choice, it's up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  •    string stripped;
    
  •    if(__ctfe)
    
  •        stripped = line.ctfe_strip();
    
  •    else
    
  •        stripped = line.strip();
    
  •    indents[i] = stripped==""? null : leadingWhiteOf(line);
    
  • }
  • S shorterAndNonNull(S a, S b) {
  •    if(a is null) return b;
    
  •    if(b is null) return a;
    
  •    return (a.length < b.length)? a : b;
    
  • };
  • auto shortestIndent =
    std.algorithm.reduce!(shorterAndNonNull)(indents);

It's a stylistic choice, so you're free to do it how you'd like, but I
believe that most of use don't use the parens with ! when they're not
needed, and it does improve legibility to leave it out. So, I'd
definitely suggest that you leave it out, but since it's a stylistic
choice, it's up to you.

I agree, I like skipping the parens after ! when possible, too. Must have
just missed this one somehow, probably while refactoring something...


foreach(i; 0..lines.length)
{
if(indents[i] is null)
lines[i] = "";
else if(indents[i].startsWith(shortestIndent))
lines[i] = lines[i][shortestIndent.length..$];
else
{
if(__ctfe)
assert(false, "Inconsistent indentation");
else
throw new Exception("Inconsistent indentation");
}
}

return lines;
}

private S ctfe_join(S)(S[] strs, S delim) if(isSomeString!S)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std.range.join is CTFE-able.

{
S value = "";

foreach(i, str; strs)
value ~= (i==0?"":delim) ~ str;

return value;
}

private S[] ctfe_split(S)(S str, S delim) if(isSomeString!S)
{
S[] arr;
S match;
while((match = ctfe_find(str, delim)).length > 0)
{
arr ~= str[0..$-match.length];
str = match[delim.length..$];
}
arr ~= str;
return arr;
}

private bool ctfe_isWhite(dchar c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... but std.uni.isWhite is CTFE-able.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... but std.uni.isWhite is CTFE-able.

Hmm, it wasn't when I originally wrote it in my own libs. Looks like it is
now.

(not is from std.functional)
The cast is unnecessary. All char types are implicitly convertible to
dchar.

Thanks, will fix.

std.range.join is CTFE-able.

It isn't in 2.054. Though it looks like it is in the git version I grabbed
yesterday. Maybe I'll use it and submit a separate pull request for some
CTFEability tests to make sure stuff likely to be needed at compile time
stays CTFEable... IIRC, Don did say on the NG that after 2.054 there
shouldn't be more CTFEability regressions.

{
foreach(i; 0..std.ascii.whitespace.length)
if(c == std.ascii.whitespace[i])
return true;

return c == lineSep || c == paraSep ||
c == '\u0085' || c == '\u00A0' || c == '\u1680' || c == '\u180E' ||
(c >= '\u2000' && c <= '\u200A') ||
c == '\u202F' || c == '\u205F' || c == '\u3000';
}

private S ctfe_find(S)(S haystack, S needle) if(isSomeString!S)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Find appears to work with CTFE - or at least it did with what I just tried. Did you find a particular case where it didn't? If it does, then you should use it instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Find appears to work with CTFE - or at least it did with what I just
tried. Did you find a particular case where it didn't? If it does, then
you should use it instead.

Look like this must be another one that didn't work when I originally wrote
it, but apperently does now. I'll try it.

{
if(haystack.length < needle.length)
return null;

for(size_t i=0; i <= haystack.length-needle.length; i++)
{
if(haystack[i..i+needle.length] == needle)
return haystack[i..$];
}
return null;
}

private S ctfe_strip(S)(S str) if(isSomeString!S)
{
return str.ctfe_stripl().ctfe_stripr();
}

private S ctfe_stripl(S)(S str) if(isSomeString!S)
{
size_t startIndex = str.length;

foreach(i, ElementEncodingType!S ch; str)
if(!ctfe_isWhite(cast(dchar)ch))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast is unnecessary. All char types are implicitly convertible to dchar.

{
startIndex = i;
break;
}

return str[startIndex..$];
}

private S ctfe_stripr(S)(S str) if(isSomeString!S)
{
size_t endIndex = 0;

foreach_reverse(i, ElementEncodingType!S ch; str)
if(!ctfe_isWhite(cast(dchar)ch))
{
endIndex = i+1;
break;
}

return str[0..endIndex];
}

unittest
{
debug(string) printf("string.outdent.unittest\n");

static assert(ctfe_split("a--b-b--ccc---d----e--", "--") == ["a","b-b","ccc","-d","","e",""]);
static assert(ctfe_split("-Xa", "-X") == ["","a"]);

static assert(ctfe_split("a--b-b--ccc---d----e--", "--") == ["a","b-b","ccc","-d","","e",""]);
static assert(ctfe_split("-Xa", "-X") == ["","a"]);

static assert(ctfe_join([""," ","","A","","BC","","D"," ",""], "\n") == "\n \n\nA\n\nBC\n\nD\n \n");

static assert(ctfe_isWhite(' '));
static assert(ctfe_isWhite('\t'));
static assert(ctfe_isWhite('\n'));
static assert(!ctfe_isWhite('d'));
static assert(!ctfe_isWhite('-'));

static assert(ctfe_find("abcde", "d" ) == "de");
static assert(ctfe_find("abcde", "cd") == "cde");
static assert(ctfe_find("abcde", "cX") == "");

static assert(ctfe_find("abc", "a") == "abc");
static assert(ctfe_find("abc", "c") == "c");
static assert(ctfe_find("aabbcc", "aa") == "aabbcc");
static assert(ctfe_find("aabbcc", "cc") == "cc");

static assert(ctfe_find("abc", "abcde") == "");

static assert(ctfe_strip(" \tHi \r\n") == "Hi");
static assert(ctfe_strip("Hi") == "Hi");
static assert(ctfe_strip(" \t \r\n") == "");
static assert(ctfe_strip("") == "");

enum testStr =
"
\t\tX
\tX
\t\t

\t\t\tX
\t ";

enum expected =
"
\tX
X


\t\tX
";

assert(testStr.outdent() == expected);
assert("".outdent() == "");
assert(" \n \t\n ".outdent() == "\n\n");

enum ctfeResult = testStr.outdent();
assert(ctfeResult == expected);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should augment these tests to test multiple string types. For an example, look at std.string.indexOf. It uses foreach and TypeTuple. In the extreme case, you can test every variation of character type and mutability, but at minimum, it should be tested with string, wstring, and dstring. You should also test some non-ASCII characters. '\U00010143' is used in several tests in std.string and std.utf, because it's multiple code units for both char and wchar. That'll help catch bugs due to code that assumes that a code point is a single code unit.


private template softDeprec(string vers, string date, string oldFunc, string newFunc)
{
Expand Down