doc/Type/Str.pod6

=begin pod

=TITLE class Str

=SUBTITLE String of characters

    class Str is Cool does Stringy { }

Built-in class for strings. Objects of type C<Str> are immutable.

=head1 Methods

=head2 routine chop

    multi sub    chop(Str:D --> Str:D)
    multi method chop(Str:D: $chars = 1 --> Str:D)

Returns the string with C<$chars> characters removed from the end.

=head2 routine chomp

Defined as:

    multi sub    chomp(Str:D  --> Str:D)
    multi method chomp(Str:D: --> Str:D)

Returns the string with a logical newline (any codepoint that has the
C<NEWLINE> property) removed from the end.

Examples:

    say chomp("abc\n");       # OUTPUT: «abc␤»
    say "def\r\n".chomp;      # OUTPUT: «def␤» NOTE: \r\n is a single grapheme!
    say "foo\r".chomp;        # OUTPUT: «foo␤»

=head2 routine lc

Defined as:

    multi sub    lc(Str:D  --> Str:D)
    multi method lc(Str:D: --> Str:D)

Returns a lower-case version of the string.

Examples:

    lc("A"); # RESULT: «"a"»
    "A".lc;  # RESULT: «"a"»

=head2 routine uc

    multi sub    uc(Str:D  --> Str:D)
    multi method uc(Str:D: --> Str:D)

Returns an uppercase version of the string.

=head2 routine fc

    multi sub    fc(Str:D  --> Str:D)
    multi method fc(Str:D: --> Str:D)

Does a Unicode "fold case" operation suitable for doing caseless
string comparisons.  (In general, the returned string is unlikely to
be useful for any purpose other than comparison.)

=head2 routine tc

    multi sub    tc(Str:D  --> Str:D)
    multi method tc(Str:D: --> Str:D)

Does a Unicode "titlecase" operation, that is changes the first character in
the string to title case, or to upper case if the character has no title case
mapping

=head2 routine tclc

    multi sub    tclc(Str:D  --> Str:D)
    multi method tclc(Str:D: --> Str:D)

Turns the first character to title case, and all other characters to lower
case

=head2 routine wordcase

=for code
multi sub    wordcase(Cool $x  --> Str)
multi sub    wordcase(Str:D $x --> Str)
multi method wordcase(Str:D: :&filter = &tclc, Mu :$where = True --> Str)

Returns a string in which C<&filter> has been applied to all the words
that match C<$where>. By default, this means that the first letter of
every word is capitalized, and all the other letters lowercased.

=head2 method unival

    multi method unival(Str:D --> Numeric)

Returns the numeric value that the first codepoint in the invocant represents,
or C<NaN> if it's not numeric.

    say '4'.unival;     # OUTPUT: «4␤»
    say '¾'.unival;     # OUTPUT: «0.75␤»
    say 'a'.unival;     # OUTPUT: «NaN␤»

=head2 method univals

    multi method univals(Str:D --> List)

Returns a list of numeric values represented by each codepoint in the invocant
string, and C<NaN> for non-numeric characters.

    say "4a¾".univals;  # OUTPUT: «(4 NaN 0.75)␤»

=head2 routine chars

    multi sub    chars(Cool  $x --> Int:D)
    multi sub    chars(Str:D $x --> Int:D)
    multi sub    chars(str   $x --> int)
    multi method chars(Str:D:   --> Int:D)

Returns the number of characters in the string in graphemes. On the JVM, this
currently erroneously returns the number of codepoints instead.

=head2 method encode

    multi method encode(Str:D: $encoding, $nf --> Blob)

Returns a L<Blob> which represents the original string in the given encoding
and normal form. The actual return type is as specific as possible, so
C<$str.encode('UTF-8')> returns a C<utf8> object,
C<$str.encode('ISO-8859-1')> a C<buf8>.

=head2 routine index

    multi sub    index(Cool $s, Str:D $needle, Cool $startpos = 0 --> Int)
    multi method index(Cool $needle, Cool $startpos = 0 --> Int)

Searches for C<$needle> in the string starting from C<$startpos>. It returns
the offset into the string where C<$needle> was found, and an undefined value
if it was not found.

Examples:

    say index "Camelia is a butterfly", "a";     # OUTPUT: «1␤»
    say index "Camelia is a butterfly", "a", 2;  # OUTPUT: «6␤»
    say index "Camelia is a butterfly", "er";    # OUTPUT: «17␤»
    say index "Camelia is a butterfly", "Camel"; # OUTPUT: «0␤»
    say index "Camelia is a butterfly", "Onion"; # OUTPUT: «Nil␤»

    say index("Camelia is a butterfly", "Onion").defined ?? 'OK' !! 'NOT'; # OUTPUT: «NOT␤»

=head2 routine rindex

    multi sub    rindex(Str:D $haystack, Str:D $needle, Int $startpos = $haystack.chars --> Int)
    multi method rindex(Str:D $haystack: Str:D $needle, Int $startpos = $haystack.chars --> Int)

Returns the last position of C<$needle> in C<$haystack> not after C<$startpos>.
Returns an undefined value if C<$needle> wasn't found.

Examples:

    say rindex "Camelia is a butterfly", "a";     # OUTPUT: «11␤»
    say rindex "Camelia is a butterfly", "a", 10; # OUTPUT: «6␤»

=head2 method match

    method match($pat, :continue(:$c), :pos(:$p), :global(:$g), :overlap(:$ov), :exhaustive(:$ex), :st(:$nd), :rd(:$th), :$nth, :$x --> Match)

Performs a match of the string against C<$pat> and returns a L<Match> object if there is a successful match,
and C<(Any)> otherwise. Matches are stored in C<$/>. If C<$pat> is not a L<Regex> object, match will coerce
the argument to a Str and then perform a literal match against C<$pat>.

A number of optional named parameters can be specified, which alter how the match is performed.

=item :continue

The :continue   adverb takes as an argument the position where the regex should start to search.
If no position is specified for :c it will default to 0 unless $/ is set, in which case it defaults to $/.to.

=item :pos

Takes a position as an argument. Fails if regex cannot be matched from that position, unlike :continue.

=item :global

Instead of searching for just one match and returning a Match object, search for every non-overlapping match and return them in a List.

=item :overlap

Finds all matches including overlapping matches, but only returns one match from each starting position.

=item :exhaustive

Finds all possible matches of a regex, including overlapping matches and matches that start at the same position.

=item :st, :nd, rd, nth

Takes an integer as an argument and returns the nth match in the string.

=item :x

Takes as an argument the number of matches to return, stopping once the specified number of matches has been reached.

Examples:
=begin code

say "properly".match('perl');                     # OUTPUT: «｢perl｣␤»
say "properly".match(/p.../);                     # OUTPUT: «｢perl｣␤»
say "1 2 3".match([1,2,3]);                       # OUTPUT: «｢1 2 3｣␤»
say "a1xa2".match(/a./, :continue(2));            # OUTPUT: «｢a2｣␤»
say "abracadabra".match(/ a .* a /, :exhaustive);
# OUTPUT: «(｢abracadabra｣ ｢abracada｣ ｢abraca｣ ｢abra｣ ｢acadabra｣ ｢acada｣ ｢aca｣ ｢adabra｣ ｢ada｣ ｢abra｣)␤»
say 'several words here'.match(/\w+/,:global);    # OUTPUT: «(｢several｣ ｢words｣ ｢here｣)␤»
say 'abcdef'.match(/.*/, :pos(2));                # OUTPUT: «｢cdef｣␤»
say "foo[bar][baz]".match(/../, :1st);            # OUTPUT: «｢fo｣␤»
say "foo[bar][baz]".match(/../, :2nd);            # OUTPUT: «｢o[｣␤»
say "foo[bar][baz]".match(/../, :3rd);            # OUTPUT: «｢ba｣␤»
say "foo[bar][baz]".match(/../, :4th);            # OUTPUT: «｢r]｣␤»
say "foo[bar][baz]bada".match('ba', :x(2));       # OUTPUT: «(｢ba｣ ｢ba｣)␤»

=end code


=head2 routine parse-base

    multi sub    parse-base(Str:D $num, Int:D $radix --> Numeric)
    multi method parse-base(Str:D $num: Int:D $radix --> Numeric)

Performs the reverse of L«C<base>|/routine/base» by converting a string
with a base-C<$radix> number to its base-10 L«C<Numeric>|/type/Numeric»
equivalent. Will L«C<fail>|/routine/fail» if radix is not in range C<2..36>
or of the string being parsed contains characters that are not valid
for the specified base.

    1337.base(32).parse-base(32).say; # OUTPUT: «1337␤»
    'Perl6'.parse-base(30).say;       # OUTPUT: «20652936␤»
    'FF.DD'.parse-base(16).say;       # OUTPUT: «255.863281␤»

See also: L«:16<FF> syntax for number literals|/syntax/Number%20literals»

=head2 routine parse-names

    sub    parse-names(Str:D $names  --> Str:D)
    method parse-names(Str:D $names: --> Str:D)

Takes string with comma-separated Unicode names of characters and
returns a string composed of those characters. Will L«C<fail>|/routine/fail»
if any of the characters' names are empty or not recognized. Whitespace
around character names is ignored.

    say "I {parse-names 'TWO HEARTS'} Perl"; # OUTPUT: «I 💕 Perl␤»
    'TWO HEARTS, BUTTERFLY'.parse-names.say; # OUTPUT: «💕🦋␤»

=head2 routine split

=for code :skip-test
multi sub    split(  Str:D $delimiter, Str:D $input, $limit = Inf,
  :$skip-empty, :$v, :$k, :$kv, :$p --> Positional)
multi sub    split(Regex:D $delimiter, Str:D $input, $limit = Inf,
  :$skip-empty, :$v, :$k, :$kv, :$p --> Positional)
multi sub    split(List:D $delimiters, Str:D $input, $limit = Inf,
  :$skip-empty, :$v, :$k, :$kv, :$p --> Positional)
multi method split(Str:D:   Str:D $delimiter, $limit = Inf,
  :$skip-empty, :$v, :$k, :$kv, :$p --> Positional)
multi method split(Str:D: Regex:D $delimiter, $limit = Inf,
  :$skip-empty, :$v, :$k, :$kv, :$p --> Positional)
multi method split(Str:D: List:D $delimiters, $limit = Inf,
  :$skip-empty, :$v, :$k, :$kv, :$p --> Positional)

Splits a string up into pieces based on delimiters found in the string.

If C<DELIMITER> is a string, it is searched for literally and not treated
as a regex.  If C<DELIMITER> is the empty string, it effectively returns all
characters of the string separately (plus an empty string at the begin and at
the end).  If C<PATTERN> is a regular expression, then that will be used
to split up the string.  If C<DELIMITERS> is a list, then all of its elements
will be considered a delimiter (either a string or a regular expression) to
split the string on.

The optional C<LIMIT> indicates in how many segments the string should be
split, if possible.  It defaults to B<Inf> (or B<*>, whichever way you look at
it), which means "as many as possible". Note that specifying negative limits
will not produce any meaningful results.

A number of optional named parameters can be specified, which alter the
result being returned.  The C<:v>, C<:k>, C<:kv> and C<:p> named parameters
all perform a special action with regards to the delimiter found.

=item :skip-empty

If specified, do not return empty strings before or after a delimiter.

=item :v

Also return the delimiter.  If the delimiter was a regular expression, then
this will be the associated C<Match> object. Since this stringifies as the
delimiter string found, you can always assume it is the delimiter string if
you're not interested in further information about that particular match.

=item :k

Also return the B<index> of the delimiter.  Only makes sense if a list of
delimiters was specified: in all other cases, this will be B<0>.

=item :kv

Also return both the B<index> of the delimiter, as well as the delimiter.

=item :p

Also return the B<index> of the delimiter and the delimiter as a C<Pair>.

Examples:

    say split(";", "a;b;c").perl;           # OUTPUT: «("a", "b", "c")␤»
    say split(";", "a;b;c", :v).perl;       # OUTPUT: «("a", ";", "b", ";", "c")␤»
    say split(";", "a;b;c", 2).perl;        # OUTPUT: «("a", "b;c").Seq␤»
    say split(";", "a;b;c", 2, :v).perl;    # OUTPUT: «("a", ";", "b;c")␤»
    say split(";", "a;b;c,d").perl;         # OUTPUT: «("a", "b", "c,d")␤»
    say split(/\;/, "a;b;c,d").perl;        # OUTPUT: «("a", "b", "c,d")␤»
    say split(<; ,>, "a;b;c,d").perl;       # OUTPUT: «("a", "b", "c", "d")␤»
    say split(/<[;,]>/, "a;b;c,d").perl;    # OUTPUT: «("a", "b", "c", "d")␤»
    say split(<; ,>, "a;b;c,d", :k).perl;   # OUTPUT: «("a", 0, "b", 0, "c", 1, "d")␤»
    say split(<; ,>, "a;b;c,d", :kv).perl;  # OUTPUT: «("a", 0, ";", "b", 0, ";", "c", 1, ",", "d")␤»

    say "".split("x").perl;                 # OUTPUT: «("",)␤»
    say "".split("x", :skip-empty).perl;    # OUTPUT: «()␤»

    say "abcde".split("").perl;             # OUTPUT: «("", "a", "b", "c", "d", "e", "")␤»
    say "abcde".split("",:skip-empty).perl; # OUTPUT: «("a", "b", "c", "d", "e")␤»

=head2 routine comb

    multi sub    comb(Str:D   $matcher, Str:D $input, $limit = Inf)
    multi sub    comb(Regex:D $matcher, Str:D $input, $limit = Inf, Bool :$match)
    multi sub    comb(Int:D $size, Str:D $input, $limit = Inf)
    multi method comb(Str:D $input:)
    multi method comb(Str:D $input: Str:D   $matcher, $limit = Inf)
    multi method comb(Str:D $input: Regex:D $matcher, $limit = Inf, Bool :$match)
    multi method comb(Str:D $input: Int:D $size, $limit = Inf)

Searches for C<$matcher> in C<$input> and returns a list of all matches
(as C<Str> by default, or as L<Match> if C<$match> is True), limited to at most
C<$limit> matches.

If no matcher is supplied, a list of characters in the string
(e.g. C<$matcher = rx/./>) is returned.

Examples:

    say "abc".comb.perl;                 # OUTPUT: «("a", "b", "c").Seq␤»
    say 'abcdefghijk'.comb(3).perl;      # OUTPUT: «("abc", "def", "ghi", "jk").Seq␤»
    say 'abcdefghijk'.comb(3, 2).perl;   # OUTPUT: «("abc", "def").Seq␤»
    say comb(/\w/, "a;b;c").perl;        # OUTPUT: «("a", "b", "c").Seq␤»
    say comb(/\N/, "a;b;c").perl;        # OUTPUT: «("a", ";", "b", ";", "c").Seq␤»
    say comb(/\w/, "a;b;c", 2).perl;     # OUTPUT: «("a", "b").Seq␤»
    say comb(/\w\;\w/, "a;b;c", 2).perl; # OUTPUT: «("a;b",).Seq␤»

If the matcher is an integer value, it is considered to be a matcher that
is similar to / . ** matcher /, but which is about 30x faster.

=head2 routine lines

    multi sub    lines(Str:D $input, $limit = Inf --> Positional)
    multi method lines(Str:D $input: $limit = Inf --> Positional)

Returns a list of lines (without trailing newline characters), i.e. the
same as a call to C<$input.comb( / ^^ \N* /, $limit )> would.

Examples:

    say lines("a\nb").perl;    # OUTPUT: «("a", "b").Seq␤»
    say lines("a\nb").elems;   # OUTPUT: «2␤»
    say "a\nb".lines.elems;    # OUTPUT: «2␤»
    say "a\n".lines.elems;     # OUTPUT: «1␤»

=head2 routine words

    multi sub    words(Str:D $input, $limit = Inf --> Positional)
    multi method words(Str:D $input: $limit = Inf --> Positional)

Returns a list of non-whitespace bits, i.e. the same as a call to
C<$input.comb( / \S+ /, $limit )> would.

Examples:

    say "a\nb\n".words.perl;       # OUTPUT: «("a", "b").Seq␤»
    say "hello world".words.perl;  # OUTPUT: «("hello", "world").Seq␤»
    say "foo:bar".words.perl;      # OUTPUT: «("foo:bar",).Seq␤»
    say "foo:bar\tbaz".words.perl; # OUTPUT: «("foo:bar", "baz").Seq␤»

=head2 routine flip

    multi sub    flip(Str:D  --> Str:D)
    multi method flip(Str:D: --> Str:D)

Returns the string reversed character by character.

Examples:

    "Perl".flip;  # RESULT: «lreP»
    "ABBA".flip;  # RESULT: «ABBA»

=head2 sub sprintf

 multi sub sprintf( Str:D $format, *@args --> Str:D)

This function is mostly identical to the C library C<sprintf> and
C<printf> functions.  The only difference between the two
functions is that C<sprintf> returns a string while the C<printf> function
writes to a file.

The C<$format> is scanned for C<%> characters. Any C<%> introduces a
format token. Format tokens have the following grammar:

 grammar Str::SprintfFormat {
  regex format_token { '%': <index>? <precision>? <modifier>? <directive> }
  token index { \d+ '$' }
  token precision { <flags>? <vector>? <precision_count> }
  token flags { <[ \x20 + 0 \# \- ]>+ }
  token precision_count { [ <[1..9]>\d* | '*' ]? [ '.' [ \d* | '*' ] ]? }
  token vector { '*'? v }
  token modifier { < ll l h V q L > }
  token directive { < % c s d u o x e f g X E G b p n i D U O F > }
 }

Directives guide the use (if any) of the arguments. When a directive
(other than C<%>) is used, it indicates how the next argument
passed is to be formatted into the string to be created.

NOTE: The information below is for a fully functioning C<sprintf>
implementation which hasn't been achieved yet. Formats or features not
yet implemented are marked NYI.

The directives are:

=begin table

 %   a literal percent sign
 c   a character with the given codepoint
 s   a string
 d   a signed integer, in decimal
 u   an unsigned integer, in decimal
 o   an unsigned integer, in octal
 x   an unsigned integer, in hexadecimal
 e   a floating-point number, in scientific notation
 f   a floating-point number, in fixed decimal notation
 g   a floating-point number, in %e or %f notation
 X   like x, but using uppercase letters
 E   like e, but using an uppercase "E"
 G   like g, but with an uppercase "E" (if applicable)
 b   an unsigned integer, in binary

=end table

Compatibility:

=begin table

 i   a synonym for %d
 D   a synonym for %ld
 U   a synonym for %lu
 O   a synonym for %lo
 F   a synonym for %f

=end table

Perl 5 (non-)compatibility:

=begin table

 n   produces a runtime exception
 p   produces a runtime exception

=end table

Modifiers change the meaning of format directives, but are largely
no-ops (the semantics are still being determined).

=begin table

 h  interpret integer as native "short" (typically int16)
 NYI l  interpret integer as native "long" (typically int32 or int64)
 NYI ll interpret integer as native "long long" (typically int64)
 NYI L  interpret integer as native "long long" (typically uint64)
 NYI q  interpret integer as native "quads" (typically int64 or larger)

=end table

Between the C<%> and the format letter, you may specify several
additional attributes controlling the interpretation of the format. In
order, these are:

B<format parameter index>

An explicit format parameter index, such as C<2$>. By default,
C<sprintf> will format the next unused argument in the list, but this
allows you to take the arguments out of order:

  sprintf '%2$d %1$d', 12, 34;      # OUTPUT: «34 12␤»
  sprintf '%3$d %d %1$d', 1, 2, 3;  # OUTPUT: «3 1 1␤»

B<flags>

One or more of:

=begin table
   space   prefix non-negative number with a space
   +       prefix non-negative number with a plus sign
   -       left-justify within the field
   0       use leading zeros, not spaces, for required padding
   #       ensure the leading "0" for any octal,
           prefix non-zero hexadecimal with "0x" or "0X",
           prefix non-zero binary with "0b" or "0B"
=end table

For example:

  sprintf '<% d>',  12;   # OUTPUT: «< 12>␤»
  sprintf '<% d>',   0;   # OUTPUT: «< 0>"»
  sprintf '<% d>', -12;   # OUTPUT: «<-12>␤»
  sprintf '<%+d>',  12;   # OUTPUT: «<+12>␤»
  sprintf '<%+d>',   0;   # OUTPUT: «<+0>"»
  sprintf '<%+d>', -12;   # OUTPUT: «<-12>␤»
  sprintf '<%6s>',  12;   # OUTPUT: «<    12>␤»
  sprintf '<%-6s>', 12;   # OUTPUT: «<12    >␤»
  sprintf '<%06s>', 12;   # OUTPUT: «<000012>␤»
  sprintf '<%#o>',  12;   # OUTPUT: «<014>␤»
  sprintf '<%#x>',  12;   # OUTPUT: «<0xc>␤»
  sprintf '<%#X>',  12;   # OUTPUT: «<0XC>␤»
  sprintf '<%#b>',  12;   # OUTPUT: «<0b1100>␤»
  sprintf '<%#B>',  12;   # OUTPUT: «<0B1100>␤»

When a space and a plus sign are given as the flags at once, the space
is ignored:

  sprintf '<%+ d>', 12;   # OUTPUT: «<+12>␤»
  sprintf '<% +d>', 12;   # OUTPUT: «<+12>␤»

When the C<#> flag and a precision are given in the C<%o> conversion, the
precision is incremented if it's necessary for the leading "0":

  sprintf '<%#.5o>', 0o12;     # OUTPUT: «<000012>␤»
  sprintf '<%#.5o>', 0o12345;  # OUTPUT: «<012345>␤»
  sprintf '<%#.0o>', 0;        # OUTPUT: «<>␤» # zero precision results in no output!

B<vector flag>

This flag tells Perl 6 to interpret the supplied string as a vector of
integers, one for each character in the string. Perl 6 applies the
format to each integer in turn, then joins the resulting strings with
a separator (a dot, C<'.'>, by default). This can be useful for
displaying ordinal values of characters in arbitrary strings:

=begin code :skip-test
  NYI sprintf "%vd", "AB\x{100}";           # "65.66.256"
=end code

You can also explicitly specify the argument number to use for the
join string using something like C<*2$v>; for example:

=begin code :skip-test
  NYI sprintf '%*4$vX %*4$vX %*4$vX',       # 3 IPv6 addresses
          @addr[1..3], ":";
=end code

B<(minimum) width>

Arguments are usually formatted to be only as wide as required to
display the given value. You can override the width by putting a
number here, or get the width from the next argument (with C<*> ) or
from a specified argument (e.g., with C<*2$>):

 sprintf "<%s>", "a";           # OUTPUT: «<a>␤»
 sprintf "<%6s>", "a";          # OUTPUT: «<     a>␤»
 sprintf "<%*s>", 6, "a";       # OUTPUT: «<     a>␤»
=begin code :skip-test
 NYI sprintf '<%*2$s>', "a", 6; # "<     a>"
=end code
 sprintf "<%2s>", "long";       # OUTPUT: «<long>␤» (does not truncate)

If a field width obtained through C<*> is negative, it has the same
effect as the C<-> flag: left-justification.

B<precision, or maximum width>

You can specify a precision (for numeric conversions) or a maximum
width (for string conversions) by specifying a C<.> followed by a
number. For floating-point formats, except C<g> and C<G>, this
specifies how many places right of the decimal point to show (the
default being 6). For example:

  # these examples are subject to system-specific variation
  sprintf '<%f>', 1;    # OUTPUT: «"<1.000000>"␤»
  sprintf '<%.1f>', 1;  # OUTPUT: «"<1.0>"␤»
  sprintf '<%.0f>', 1;  # OUTPUT: «"<1>"␤»
  sprintf '<%e>', 10;   # OUTPUT: «"<1.000000e+01>"␤»
  sprintf '<%.1e>', 10; # OUTPUT: «"<1.0e+01>"␤»

For "g" and "G", this specifies the maximum number of digits to show,
including those prior to the decimal point and those after it; for
example:

  # These examples are subject to system-specific variation.
  sprintf '<%g>', 1;        # OUTPUT: «<1>␤»
  sprintf '<%.10g>', 1;     # OUTPUT: «<1>␤»
  sprintf '<%g>', 100;      # OUTPUT: «<100>␤»
  sprintf '<%.1g>', 100;    # OUTPUT: «<1e+02>␤»
  sprintf '<%.2g>', 100.01; # OUTPUT: «<1e+02>␤»
  sprintf '<%.5g>', 100.01; # OUTPUT: «<100.01>␤»
  sprintf '<%.4g>', 100.01; # OUTPUT: «<100>␤»

For integer conversions, specifying a precision implies that the
output of the number itself should be zero-padded to this width, where
the C<0> flag is ignored:

(Note that this feature currently works for unsigned integer conversions, but not
for signed integer.)

  =begin code :skip-test
  NYI sprintf '<%.6d>', 1;      # <000001>
  NYI sprintf '<%+.6d>', 1;     # <+000001>
  NYI sprintf '<%-10.6d>', 1;   # <000001    >
  NYI sprintf '<%10.6d>', 1;    # <    000001>
  NYI sprintf '<%010.6d>', 1;   #     000001>
  NYI sprintf '<%+10.6d>', 1;   # <   +000001>
  sprintf '<%.6x>', 1;         # OUTPUT: «<000001>␤»
  sprintf '<%#.6x>', 1;        # OUTPUT: «<0x000001>␤»
  sprintf '<%-10.6x>', 1;      # OUTPUT: «<000001    >␤»
  sprintf '<%10.6x>', 1;       # OUTPUT: «<    000001>␤»
  sprintf '<%010.6x>', 1;      # OUTPUT: «<    000001>␤»
  sprintf '<%#10.6x>', 1;      # OUTPUT: «<  0x000001>␤»
  =end code

For string conversions, specifying a precision truncates the string to
fit the specified width:

  sprintf '<%.5s>', "truncated";   # OUTPUT: «<trunc>␤»
  sprintf '<%10.5s>', "truncated"; # OUTPUT: «<     trunc>␤»

You can also get the precision from the next argument using C<.*>, or
from a specified argument (e.g., with C<.*2$>):

  =begin code :skip-test
  sprintf '<%.6x>', 1;       # OUTPUT: «<000001>␤»
  sprintf '<%.*x>', 6, 1;    # OUTPUT: «<000001>␤»
  NYI sprintf '<%.*2$x>', 1, 6;  # "<000001>"
  NYI sprintf '<%6.*2$x>', 1, 4; # "<  0001>"
  =end code

If a precision obtained through C<*> is negative, it counts as having
no precision at all:

  sprintf '<%.*s>',  7, "string";   # OUTPUT: «<string>␤»
  sprintf '<%.*s>',  3, "string";   # OUTPUT: «<str>␤»
  sprintf '<%.*s>',  0, "string";   # OUTPUT: «<>␤»
  sprintf '<%.*s>', -1, "string";   # OUTPUT: «<string>␤»
  sprintf '<%.*d>',  1, 0;          # OUTPUT: «<0>␤»
  sprintf '<%.*d>',  0, 0;          # OUTPUT: «<>␤»
  sprintf '<%.*d>', -1, 0;          # OUTPUT: «<0>␤»

B<size>

For numeric conversions, you can specify the size to interpret the
number as using C<l>, C<h>, C<V>, C<q>, C<L>, or C<ll>. For integer
conversions (C<d> C<u> C<o> C<x> C<X> C<b> C<i> C<D> C<U> C<O>),
numbers are usually assumed to be whatever the default integer size is
on your platform (usually 32 or 64 bits), but you can override this to
use instead one of the standard C types, as supported by the compiler
used to build Perl 6:

(Note: None of the following have been implemented.)

=begin table

   hh          | interpret integer as C type "char" or "unsigned char"
   h           | interpret integer as C type "short" or "unsigned short"
   j           | interpret integer as C type "intmax_t", only with a C99 compiler (unportable)
   l           | interpret integer as C type "long" or "unsigned long"
   q, L, or ll | interpret integer as C type "long long", "unsigned long long", or "quad" (typically 64-bit integers)
   t           | interpret integer as C type "ptrdiff_t"
   z           | interpret integer as C type "size_t"
=end table

B<order of arguments>

Normally, C<sprintf> takes the next unused argument as the value to
format for each format specification. If the format specification uses
C<*> to require additional arguments, these are consumed from the
argument list in the order they appear in the format specification
before the value to format. Where an argument is specified by an
explicit index, this does not affect the normal order for the
arguments, even when the explicitly specified index would have been
the next argument.

So:

   my $a = 5; my $b = 2; my $c = 'net';
   sprintf "<%*.*s>", $a, $b, $c; # OUTPUT: «<   ne>␤»

uses C<$a> for the width, C<$b> for the precision, and C<$c> as the value to
format; while:

=for code :skip-test
  NYI sprintf '<%*1$.*s>', $a, $b;

would use C<$a> for the width and precision and C<$b> as the value to format.

Here are some more examples; be aware that when using an explicit
index, the C<$> may need escaping:

=for code :skip-test
 sprintf "%2\$d %d\n",      12, 34;     # OUTPUT: «34 12␤␤»
 sprintf "%2\$d %d %d\n",   12, 34;     # OUTPUT: «34 12 34␤␤»
 sprintf "%3\$d %d %d\n",   12, 34, 56; # OUTPUT: «56 12 34␤␤»
 NYI sprintf "%2\$*3\$d %d\n",  12, 34,  3; # " 34 12\n"
 NYI sprintf "%*1\$.*f\n",       4,  5, 10; # "5.0000\n"

=comment TODO: document effects of locale

Other examples:

=for code :skip-test
 NYI sprintf "%ld a big number", 4294967295;
 NYI sprintf "%%lld a bigger number", 4294967296;
 sprintf('%c', 97);                  # OUTPUT: «a␤»
 sprintf("%.2f", 1.969);             # OUTPUT: «1.97␤»
 sprintf("%+.3f", 3.141592);         # OUTPUT: «+3.142␤»
 sprintf('%2$d %1$d', 12, 34);       # OUTPUT: «34 12␤»
 sprintf("%x", 255);                 # OUTPUT: «ff␤»

Special case: 'sprintf("<b>%s</b>\n", "Perl 6")' will not work, but
one of the following will:

 sprintf Q:b "<b>%s</b>\n",  "Perl 6"; # OUTPUT: «<b>Perl 6</b>␤␤»
 sprintf     "<b>\%s</b>\n", "Perl 6"; # OUTPUT: «<b>Perl 6</b>␤␤»
 sprintf     "<b>%s\</b>\n", "Perl 6"; # OUTPUT: «<b>Perl 6</b>␤␤»

=head2 method starts-with

    multi method starts-with(Str:D: Str(Cool) $needle --> True:D)

Returns C<True> if the invocant is identical to or starts with C<$needle>.

    say "Hello, World".starts-with("Hello");     # OUTPUT: «True␤»
    say "https://perl6.org/".starts-with('ftp'); # OUTPUT: «False␤»

=head2 method ends-with

    multi method ends-with(Str:D: Str(Cool) $needle --> True:D)

Returns C<True> if the invocant is identical to or ends with C<$needle>.

    say "Hello, World".ends-with('Hello');      # OUTPUT: «False␤»
    say "Hello, World".ends-with('ld');         # OUTPUT: «True␤»

=head2 method subst

    multi method subst(Str:D: $matcher, $replacement, *%opts)

Returns the invocant string where C<$matcher> is replaced by C<$replacement>
(or the original string, if no match was found).

There is an in-place syntactic variant of C<subst> spelled
C<s/matcher/replacement/>.

C<$matcher> can be a L<Regex>, or a literal C<Str>. Non-Str matcher arguments
of type L<Cool> are coerced to C<Str> for literal matching.

    my $some-string = "Some foo";
    my $another-string = $some-string.subst(/foo/, "string"); # gives 'Some string'
    $some-string.=subst(/foo/, "string"); # in-place substitution. $some-string is now 'Some string'

The replacement can be a closure:

    my $i = 41;
    my $str = "The answer is secret.";
    my $real-answer = $str.subst(/secret/, {++$i}); # The answer to everything

Here are other examples of usage:

    my $str = "Hey foo foo foo";
    $str.subst(/foo/, "bar", :g); # global substitution - returns Hey bar bar bar

    $str.subst(/foo/, "no subst", :x(0)); # targeted substitution. Number of times to substitute. Returns back unmodified.
    $str.subst(/foo/, "bar", :x(1)); #replace just the first occurrence.

    $str.subst(/foo/, "bar", :nth(3)); # replace nth match alone. Replaces the third foo. Returns Hey foo foo bar

The C<:nth> adverb has readable English-looking variants:

    say 'ooooo'.subst: 'o', 'x', :1st; # OUTPUT: «xoooo␤»
    say 'ooooo'.subst: 'o', 'x', :2nd; # OUTPUT: «oxooo␤»
    say 'ooooo'.subst: 'o', 'x', :3rd; # OUTPUT: «ooxoo␤»
    say 'ooooo'.subst: 'o', 'x', :4th; # OUTPUT: «oooxo␤»

The following adverbs are supported

=begin table

    short                       long        meaning
    =====                       ====        =======
    :g                          :global     tries to match as often as possible
    :nth(Int|Callable|Whatever)             only substitute the nth match; aliases: :st, :nd, :rd, and :th
    :ss                         :samespace  preserves whitespace on substitution
    :ii                         :samecase   preserves case on substitution
    :mm                         :samemark   preserves character marks (e.g. 'ü' replaced with 'o' will result in 'ö')
    :x(Int|Range|Whatever)                  substitute exactly $x matches

=end table

Note that only in the C<s///> form C<:ii> implies C<:i> and C<:ss> implies
C<:s>. In the method form, the C<:s> and C<:i> modifiers must be added to the
regex, not the C<subst> method call.

=head2 method subst-mutate

Where C<subst> returns the modified string and leaves the original
unchanged, it is possible to mutate the original string by using
C<subst-mutate>. If the match is successful, the method returns a C<Match>
object representing the successful match; if C<:g> (or C<:global>) argument
is used, returns a C<List> of C<Match> objects. If no matches happen,
returns C<Any>.

    my $some-string = "Some foo";
    my $match = $some-string.subst-mutate(/foo/, "string");
    say $some-string;  # OUTPUT: «Some string␤»
    say $match;        # OUTPUT: «｢foo｣␤»
    $some-string.subst-mutate(/<[oe]>/, '', :g); # remove every o and e, notice the :g named argument from .subst

=head2 routine substr

    multi sub    substr(Str:D $s, Int:D $from, Int:D $chars = $s.chars - $from --> Str:D)
    multi sub    substr(Str:D $s, Range $from-to --> Str:D)
    multi method substr(Str:D $s: Int:D $from, Int:D $chars = $s.chars - $from --> Str:D)
    multi method substr(Str:D $s: Range $from-to --> Str:D)

Returns a part of the string, starting from the character with index C<$from>
(where the first character has index 0) and with length C<$chars>.  If a range is
specified, its first and last indices are used to determine the size of the substring.

Examples:

    substr("Long string", 6, 3);     # RESULT: «tri»
    substr("Long string", 6);        # RESULT: «tring»
    substr("Long string", 6, *-1);   # RESULT: «trin»
    substr("Long string", *-3, *-1); # RESULT: «in»

=head2 method substr-eq

    multi method substr-eq(Str:D:  Str(Cool) $test-string, Int(Cool) $from --> Bool)
    multi method substr-eq(Cool:D: Str(Cool) $test-string, Int(Cool) $from --> Bool)

Returns C<True> if the C<$test-string> exactly matches the C<String> object,
starting from the given initial index C<$from>.  For example, beginning with
the string C<"foobar">, the substring C<"bar"> will match from index 3:

    my $string = "foobar";
    say $string.substr-eq("bar", 3);    # OUTPUT: «True␤»

However, the substring C<"barz"> starting from index 3 won't match even
though the first three letters of the substring do match:

    my $string = "foobar";
    say $string.substr-eq("barz", 3);   # OUTPUT: «False␤»

Naturally, to match the entire string, one merely matches from index 0:

    my $string = "foobar";
    say $string.substr-eq("foobar", 0); # OUTPUT: «True␤»

Since this method is inherited from the C<Cool> type, it also works on
integers.  Thus the integer C<42> will match the value C<342> starting from
index 1:

    my $integer = 342;
    say $integer.substr-eq(42, 1);      # OUTPUT: «True␤»

As expected, one can match the entire value by starting at index 0:

    my $integer = 342;
    say $integer.substr-eq(342, 0);     # OUTPUT: «True␤»

Also using a different value or an incorrect starting index won't match:

    my $integer = 342;
    say $integer.substr-eq(42, 3);      # OUTPUT: «False␤»
    say $integer.substr-eq(7342, 0);    # OUTPUT: «False␤»

=head2 method substr-rw

    method substr-rw($from, $length?)

A version of C<substr> that returns a L<Proxy|/type/Proxy> functioning as a
writable reference to a part of a string variable. Its first argument, C<$from>
specifies the index in the string from which a substitution should occur, and
its last argument, C<$length> specifies how many characters are to be replaced.

For example, in its method form, if one wants to take the string C<"abc">
and replace the second character (at index 1) with the letter C<"z">, then
one does this:

    my $string = "abc";
    $string.substr-rw(1, 1) = "z";
    $string.say;                         # OUTPUT: «azc␤»

Note that new characters can be inserted as well:

    my $string = 'azc';
    $string.substr-rw(2, 0) = "-Zorro-"; # insert new characters BEFORE the character at index 2
    $string.say;                         # OUTPUT: «az-Zorro-c␤»

C<substr-rw> also has a function form, so the above examples can also be
written like so:

    my $string = "abc";
    substr-rw($string, 1, 1) = "z";
    $string.say;                          # OUTPUT: «azc␤»
    substr-rw($string, 2, 0) = "-Zorro-";
    $string.say;                          # OUTPUT: «az-Zorro-c␤»

It is also possible to alias the writable reference returned by C<substr-rw>
for repeated operations:

    my $string = "A character in the 'Flintstones' is: barney";
    $string ~~ /(barney)/;
    my $ref := substr-rw($string, $0.from, $0.to);
    $string.say;
    # OUTPUT: «A character in the 'Flintstones' is: barney␤»
    $ref = "fred";
    $string.say;
    # OUTPUT: «A character in the 'Flintstones' is: fred␤»
    $ref = "wilma";
    $string.say;
    # OUTPUT: «A character in the 'Flintstones' is: wilma␤»

Notice that the start position and length of string to replace has been
specified via the C<.from> and C<.to> methods on the C<Match> object, C<$0>.
It is thus not necessary to count characters in order to replace a
substring, hence making the code more flexible.

=head2 routine samemark

    multi sub samemark(Str:D $string, Str:D $pattern --> Str:D)
    method    samemark(Str:D: Str:D $pattern --> Str:D)

Returns a copy of C<$string> with the mark/accent information for each
character changed such that it matches the mark/accent of the corresponding
character in C<$pattern>. If C<$string> is longer than C<$pattern>, the
remaining characters in C<$string> receive the same mark/accent as the last
character in C<$pattern>. If C<$pattern> is empty no changes will be made.

Examples:

    say 'åäö'.samemark('aäo');                        # OUTPUT: «aäo␤»
    say 'åäö'.samemark('a');                          # OUTPUT: «aao␤»

    say samemark('Pêrl', 'a');                        # OUTPUT: «Perl␤»
    say samemark('aöä', '');                          # OUTPUT: «aöä␤»

=head2 method succ

    method succ(Str:D --> Str:D)

Returns the string incremented by one.

String increment is "magical". It searches for the last alphanumeric
sequence that is not preceded by a dot, and increments it.

    '12.34'.succ;      # RESULT: «13.34»
    'img001.png'.succ; # RESULT: «img002.png»

The actual increment step works by mapping the last alphanumeric
character to a character range it belongs to, and choosing the next
character in that range, carrying to the previous letter on overflow.

    'aa'.succ;   # RESULT: «ab»
    'az'.succ;   # RESULT: «ba»
    '109'.succ;  # RESULT: «110»
    'α'.succ;    # RESULT: «β»
    'a9'.succ;   # RESULT: «b0»

String increment is Unicode-aware, and generally works for scripts where a
character can be uniquely classified as belonging to one range of characters.

=head2 method pred

    method pred(Str:D: --> Str:D)

Returns the string decremented by one.

String decrementing is "magical" just like string increment (see
L<succ>). It fails on underflow

=for code :skip-test
'b0'.pred;           # RESULT: «a9»
'a0'.pred;           # Failure
'img002.png'.pred;   # RESULT: «img001.png»

=head2 routine ord

    multi sub    ord(Str:D  --> Int:D)
    multi method ord(Str:D: --> Int:D)

Returns the codepoint number of the base characters of the first grapheme
in the string.

Example:

    ord("A"); # 65
    "«".ord;  # 171

=head2 method ords

    multi method ords(Str:D: --> Positional)

Returns a list of Unicode codepoint numbers that describe the codepoints making up the string.

Example:

    "aå«".ords; # (97 229 171)

Strings are represented as graphemes. If a character in the string is represented by multiple
codepoints, then all of those codepoints will appear in the result of C<ords>. Therefore, the
number of elements in the result may not always be equal to L<chars>, but will be equal to
L<codes>.

The codepoints returned will represent the string in L<NFC>. See the L<NFD>, L<NFKC>, and
C<NFKD> methods if other forms are required.

=head2 method trans

    multi method trans(Str:D: Pair:D \what, *%n --> Str)
    multi method trans(Str:D: *@changes, :complement(:$c), :squash(:$s), :delete(:$d) --> Str)

Replaces one or many characters with one or many characters. Ranges are
supported, both for keys and values. Regexes work as keys. In case a list of
keys and values is used, substrings can be replaced as well. When called with
C<:complement> anything but the matched value or range is replaced with a
single value. With C<:delete> the matched characters are removed.  Combining
C<:complement> and C<:delete> will remove anything but the matched values.  The
adverb C<:squash> will reduce repeated matched characters to a single
character.

Example:

    my $str = 'say $x<b> && $y<a>';
    $str.=trans( '<' => '«' );
    $str.=trans( '<' => '«', '>' => '»' );

    $str.=trans( [ '<'   , '>'   , '&' ] =>
                 [ '&lt;', '&gt;', '&amp;' ]);

    $str.=trans( ['a'..'y'] => ['A'..'z'] );

    "abcdefghij".trans(/<[aeiou]> \w/ => '');                     # RESULT: «cdgh»

    "a123b123c".trans(['a'..'z'] => 'x', :complement);            # RESULT: «axxxbxxxc»
    "a123b123c".trans('23' => '', :delete);                       # RESULT: «a1b1c»
    "aaa1123bb123c".trans('a'..'z' => 'A'..'Z', :squash);         # RESULT: «A1123B123C»
    "aaa1123bb123c".trans('a'..'z' => 'x', :complement, :squash); # RESULT: «aaaxbbxc»

=head2 method indent

    multi method indent(Int $steps where { $_ == 0 } )
    multi method indent(Int $steps where { $_ > 0  } )
    multi method indent($steps where { .isa(Whatever) || .isa(Int) && $_ < 0 } )

Indents each line of the string by C<$steps>. If C<$steps> is negative,
it outdents instead. If C<$steps> is L<C<*>|*>, then the string is
outdented to the margin:

    "  indented by 2 spaces\n    indented even more".indent(*)
        eq "indented by 2 spaces\n  indented even more"

=head2 method trim

    method trim(Str:D: --> Str)

Remove leading and trailing whitespace. It can be use both as a method
on strings and as a function. When used as a method it will return
the trimmed string. In order to do in-place trimming, once needs to write
C<.=trim>


    my $line = '   hello world    ';
    say '<' ~ $line.trim ~ '>';        # OUTPUT: «<hello world>␤»
    say '<' ~ trim($line) ~ '>';       # OUTPUT: «<hello world>␤»
    $line.trim;
    say '<' ~ $line ~ '>';             # OUTPUT: «<   hello world    >␤»
    $line.=trim;
    say '<' ~ $line ~ '>';             # OUTPUT: «<hello world>␤»

See also L<trim-trailing> and L<trim-leading>

=head2 method trim-trailing

    method trim-trailing(Str:D: --> Str)

Remove the whitespace characters from the end of a string. See also L<trim>.

=head2 method trim-leading

    method trim-leading(Str:D: --> Str)

Remove the whitespace characters from the beginning of a string. See also L<trim>.

=head2 method NFC

    method NFC(Str:D: --> NFC:D)

Returns a codepoint string in L<NFC|/type/NFC> format (Unicode Normalization
Form C / Composed).

=head2 method NFD

    method NFD(Str:D: --> NFD:D)

Returns a codepoint string in L<NFD|/type/NFD> format (Unicode Normalization
Form D / Decomposed).

=head2 method NFKC

    method NFKC(Str:D: --> NFKC:D)

Returns a codepoint string in L<NFKC|/type/NFKC> format (Unicode Normalization
Form KC / Compatibility Composed).

=head2 method NFKD

    method NFKD(Str:D: --> NFKD:D)

Returns a codepoint string in L<NFKD|/type/NFKD> format (Unicode Normalization
Form KD / Compatibility Decomposed).

=head2 method ACCEPTS

    multi method ACCEPTS(Str:D: $other)

Returns C<True> if the string is L<the same as|eq> C<$other>.

=head2 sub val

    multi sub val(Str:D $MAYBEVAL, :$val-or-fail)

Given a C<Str> that may be parsable as a numeric value, it will
attempt to construct the appropriate L<allomorph|/language/glossary#Allomorph>,
returning one of L<IntStr|/type/IntStr>, L<NumStr|/type/NumStr>, L<RatStr|/type/RatStr>
or L<ComplexStr|/type/ComplexStr> or a plain C<Str> if a numeric value cannot
be parsed.  If the C<:val-or-fail> adverb is provided it will return an
L<X::Str::Numeric|/type/X::Str::Numeric> rather than the original string if it
cannot parse the string as a number.

    say val("42").WHAT;    # OUTPUT: «(IntStr)␤»
    say val("42e0").WHAT;  # OUTPUT: «(NumStr)␤»
    say val("42.0").WHAT;  # OUTPUT: «(RatStr)␤»
    say val("42+0i").WHAT; # OUTPUT: «(ComplexStr)␤»

=end pod

# vim: expandtab shiftwidth=4 ft=perl6