DRAFT: Synopsis 32: Setting Library - Str
Created: 19 Mar 2009 (extracted from S29-functions.pod)
Last Modified: 2015-07-24
Version: 13
The document is a draft.
General notes about strings:
The Str class contains strings encoded at the NFG level. Other standard Unicode normalizations can be found in their appropriately named types: NFC, NFD, NFKC, and NFKD. The Uni type contains a string in a mixture of normalizations (i.e., not normalized). S15 describes these in more detail.
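Here is a minimal Perl 6 sketch of the difference between graphemes and codepoints. It assumes a compiler such as Rakudo that provides the Str, NFC and NFD types. Using .elems on the normalized types to count codepoints is an assumption about the implementation; this synopsis does not specify it.

```raku
# "x" followed by U+0301 COMBINING ACUTE ACCENT has no precomposed form:
# it is one grapheme in a Str (NFG) but two codepoints in NFC and NFD.
my $xa = "x\x[301]";
say $xa.chars;        # 1
say $xa.NFC.elems;    # 2
say $xa.NFD.elems;    # 2

# U+00E9 LATIN SMALL LETTER E WITH ACUTE is precomposed, so NFC keeps
# it as one codepoint while NFD splits it into "e" plus the accent.
my $e = "\x[E9]";
say $e.NFC.elems;     # 1
say $e.NFD.elems;     # 2
```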
The following are all provided by the Str class, as well as related classes:
- chop
-
multi method chop(Str $string: $n = 1 --> Str) is export
Returns the string with an optional number of characters removed from the end. Defaults to removing one character.
- chomp
-
multi method chomp(Str $string: --> Str) is export
Returns the string with one newline removed from the end. An arbitrary terminator can be removed if the input filehandle has marked the string for where the "newline" begins. (Presumably this is stored as a property of the string.) Otherwise, a standard newline is removed.
Note: Most users should just let their I/O handles autochomp instead. (Autochomping is the default.)
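The following sketch illustrates chop and chomp. It assumes a Perl 6 compiler such as Rakudo that implements the optional count on chop, and the output comments reflect that assumption.

```raku
# chomp removes one trailing newline; chop removes characters from the end.
my $line = "hello world\n";
say $line.chomp;            # hello world
say "hello".chop;           # hell
say "hello".chop(2);        # hel
say "no newline".chomp;     # no newline

# "e" + U+0301 forms a single grapheme, so chop removes both codepoints.
say "cafe\x[301]".chop;     # caf
```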
- lc
-
multi method lc(Str $string: --> Str) is export
Returns the input string after forcing each character to its lowercase form. Note that one-to-one mapping is not in general guaranteed; different forms may be chosen according to context.
- uc
-
multi method uc(Str $string: --> Str) is export
Returns the input string after forcing each character to its uppercase (not titlecase) form. Note that one-to-one mapping is not in general guaranteed; different forms may be chosen according to context.
- fc
-
multi method fc(Str $string: --> Str) is export
Does a Unicode "fold case" operation suitable for doing caseless string comparisons. (In general, the returned string is unlikely to be useful for any purpose other than comparison.)
- tc
-
multi method tc(Str $string: --> Str) is export
Converts the first character of a string to titlecase form, leaving the rest of the characters unchanged, then returns the modified string. If there is no titlecase mapping for the first character, the entire string is returned unchanged. In any case, this function never changes any character after the first. (It is like the old Perl 5 ucfirst function in that respect.)
- tclc
-
multi method tclc(Str $string: --> Str) is export
Forces the first character of a string to titlecase and the rest of the characters to lowercase, then returns the modified string.
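Here is a brief illustration of the case-mapping methods, again assuming Rakudo-like behaviour. The example uses "ß" because its uppercase and fold-case forms are two characters long, which is exactly the kind of mapping that is not one-to-one.

```raku
# Full case mappings: "ß" becomes "SS" under uc and "ss" under fc.
say "Straße".lc;            # straße
say "Straße".uc;            # STRASSE
say "Straße".fc;            # strasse

# tc changes only the first character; tclc also lowercases the rest.
say "hELLO wORLD".tc;       # HELLO wORLD
say "hELLO wORLD".tclc;     # Hello world

# fc is intended for caseless comparison.
say "STRASSE".fc eq "straße".fc;   # True
```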
- wordcase
-
multi method wordcase(Str $string: :&filter = &tclc, :$where = True --> Str) is export
Performs a substitutional mapping of each word in the string, defaulting to the tclc mapping. Words are defined as Perl 6 identifiers, hence admit hyphens and apostrophes when followed by a letter. (Note that trailing apostrophes don't matter when casemapping.) The following should have the same result:
.wordcase;
.subst(:g, / <ident>+ % <[ \- ' ]> /, *.Str.tclc)
The filter function is always applied to the first and last word, and additionally to any intermediate word that smartmatches with the where parameter. Assuming suitable definitions of word lists, standard English capitalization might be handled with something like this:
my $where = none map *.fc, @conjunctions, @prepositions;
.wordcase(:$where);
(Note that the "standard" authorities disagree on the prepositions!)
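To make the wordcase rules of this section concrete, here is a sketch of them written as a plain substitution. The filter is applied to the first and last word, and to intermediate words whose fc form smartmatches $where; every other word is lowercased. The helper spec-wordcase is hypothetical and not part of the Str API. It reuses the word regex shown above, with the character class spelled <['\-]>.

```raku
# Hypothetical helper sketching the wordcase rules described in this section.
sub spec-wordcase(Str $s, :&filter = &tclc, Mu :$where = True --> Str) {
    my $word = / <ident>+ % <['\-]> /;      # identifiers joined by ' or -
    my $last = $s.match($word, :g).end;     # index of the final word
    my $i    = 0;                           # index of the word being replaced
    $s.subst($word, -> $m {
        my $n = $i++;
        ($n == 0 || $n == $last || $m.Str.fc ~~ $where)
            ?? filter($m.Str)               # first, last, or matching word
            !! $m.Str.lc                    # any other word is lowercased
    }, :g);
}

say spec-wordcase("the lord of the rings", :where(none <of the>));
# The Lord of the Rings
```

The helper counts the words up front so that it can recognize the last word while substituting from left to right.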
[XXX: Is case-insensitive matching on wordcase's part necessary?] The smartmatching is done case-insensitively, so you should store your exceptions in fc form. If the where smartmatch does not match, then the word will be forced to lowercase.
There is no provision for an alternate regex; if you need a custom word recognizer, you can write your own .subst as above.
- samecase
-
multi method samecase(Str $string: Str $pattern --> Str) is export
Has the effect of making the case of the string match the case pattern in $pattern. (Used internally by s:ii///; see S05.)
- samemark
-
multi method samemark(Str $string: Str $pattern --> Str) is export
Has the effect of making the marks (accents and other diacritics) of the string match the marking pattern in $pattern. (Used internally by s:mm///; see S05.)
- length
-
This method does not exist in Perl 6. You must use either chars or codes, depending on what kind of count you need.
- chars
-
multi method chars(Str $string: --> Int) is export
Returns the number of characters in the string. For Str this corresponds to the number of graphemes; for other types this is equivalent to codes.
- codes
-
multi method codes(Str $string: --> Int) is export
Returns the number of codepoints in the string. For Str this is the number of codepoints the string would have as an NFC type string.
- bytes
-
Gone. Use $str.encode($encoding).bytes instead.
- encode
-
multi method encode($encoding = $?ENC --> Blob)
Returns a Blob which represents the original string in the given encoding. The actual return type is as specific as possible, so $str.encode('UTF-8') returns a utf8 object, and $str.encode('ISO-8859-1') a blob8.
Str.encode is functionally equivalent to NFC.encode. If you mean one of the other normalization forms, convert the Str to the appropriate type first.
- index
-
multi method index(Str $string: Str $substring, Int $pos) is export
index searches for the first occurrence of $substring in $string, starting at $pos.
If the substring is found, then the value returned represents the position of the first character of the substring. If the substring is not found, Nil is returned. Do not evaluate it as a number, because that will assume 0 and issue a warning.
[Note: if $substring is not of the same string type as $string, should that cause an error, or should $substring be converted to $string's type?]
- pack
-
multi pack(*@items where { all(@items) ~~ Pair } --> buf8)
multi pack(Str $template, *@items --> buf8)
pack takes a list of pairs and formats the values according to the specification of the keys. Alternatively, it takes a string $template and formats the rest of its arguments according to the specifications in the template string. The result is a sequence of bytes.
Templates are strings of the form:
grammar Str::PackTemplate {
    regex TOP { ^ <template> $ }
    regex template { [ <group> | <specifier> <count>? ]* }
    token group { \( <template> \) }
    token specifier { <[aAZbBhHcCsSiIlLnNvVqQjJfdFDpPuUwxX\@]> \!? }
    token count { \* | \[ [ \d+ | <specifier> ] \] | \d+ }
}
In the pairwise mode, each key must contain a single <group> or <specifier>, and the values must be either scalar arguments or arrays.
[ Note: Need more documentation and need to figure out what Perl 5 things no longer make sense. Does Perl 6 need any extra formatting features? -ajs ]
[I think pack formats should be human readable but compiled to an internal form for efficiency. I also think that compact classes should be able to express their serialization in pack form if asked for it with .packformat or some such. -law]
- rindex
-
multi method rindex(Str $string: Str $substring, Int $pos) is export
Returns the position of the last $substring in $string. If $pos is specified, then the search starts at that location in $string and works backwards. See index for more detail.
- split
-
multi sub split(Str $delimiter, Str $input, Int $limit = Inf, Bool :$all = False --> List)
multi sub split(Regex $delimiter, Str $input, Int $limit = Inf, Bool :$all = False --> List)
multi method split(Str $input: Str $delimiter, Int $limit = Inf, Bool :$all = False --> List)
multi method split(Str $input: Regex $delimiter, Int $limit = Inf, Bool :$all = False --> List)
Splits a string up into pieces based on delimiters found in the string.
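Here is a sketch of the string and Regex delimiter forms, plus the optional limit, assuming a Rakudo-like implementation. The :all adverb is left out of this example.

```raku
my $csv = "a,b,,c";
say $csv.split(",").join("|");         # a|b||c   (empty field kept)
say $csv.split(",", 2).join("|");      # a|b,,c   (at most 2 pieces)
say "a1b22c".split(/\d+/).join("|");   # a|b|c    (Regex delimiter)
say split(",", $csv).elems;            # 4        (sub form: delimiter first)
```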
Delimiters can be specified as either a Regex or a constant string type. The split function no longer has either a default delimiter or a default invocant. In general you should use words to split on whitespace now, or comb to break into individual characters. (See below.)
If the :all adverb is supplied to the string delimiter form, the delimiter will be returned in alternation with the split values. In Regex delimiter form, the delimiters are returned as Match objects in alternation with the split values. Unlike in Perl 5, if the delimiter contains multiple captures, they are returned as submatches of a single Match object. (And since Match does Capture, whether these Match objects eventually flatten or not depends on whether the expression is bound into a list or slice context.)
You may also split lists and filehandles.
$*ARGS.split(/\n[\h*\n]+/)
splits on paragraphs, for instance. Lists and filehandles are automatically fed through cat in order to pretend to be a string. The resulting Cat is lazy. Accessing a filehandle as both a filehandle and as a Cat is undefined.
- comb
-
multi sub comb(Str $matcher, Str $input, Int $limit = Inf, Bool :$match --> List)
multi sub comb(Regex $matcher, Str $input, Int $limit = Inf, Bool :$match --> List)
multi method comb(Str $input: Str $matcher, Int $limit = Inf, Bool :$match --> List)
multi method comb(Str $input: Regex $matcher = /./, Int $limit = Inf, Bool :$match --> List)
The comb function looks through a string for the interesting bits, ignoring the parts that don't match. In other words, it's a version of split where you specify what you want, not what you don't want.
That means the same restrictions apply to the matcher rule as do to split's delimiter rule.
By default it pulls out all individual characters. Saying
$string.comb(/pat/, $n)
is equivalent to
map {.Str}, $string.match(rx:global:x(0..$n):c/pat/)
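Here is a short illustration of comb, assuming a Rakudo-like implementation. The last line restates the equivalence above in simplified form, using :g in place of the :x and :c adverbs.

```raku
my $s = "Perl 6 in 2015";
say $s.comb.elems;                  # 14   (default matcher: every character)
say $s.comb(/\d+/).join("|");       # 6|2015
say $s.comb(/\d+/, 1).join("|");    # 6    (limit: first match only)

# Same strings obtained through .match, then stringified.
say $s.match(/\d+/, :g).map(*.Str).join("|");   # 6|2015
```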
You may also comb lists and filehandles.
+$*IN.comb
counts the characters on standard input, for instance.
comb(/./, $thing)
returns a list of single-character strings from anything that can give you a Str. Lists and filehandles are automatically fed through cat in order to pretend to be a string. This Cat is also lazy.
If the :match adverb is applied, a list of Match objects (one per match) is returned instead of strings. This can be used to access capturing subrules in the matcher. The unmatched portions are never returned; if you want them, use split(:all). If the function is combing a lazy structure, the return values may also be lazy. (Strings are not lazy, however.)
- lines
-
multi method lines(Str $input: Int $limit = Inf --> List) is export
Returns a list of lines, i.e. the same as a call to
$input.comb(/ ^^ \N* /, $limit)
would.
- words
-
multi method words(Str $input: Int $limit = Inf --> List) is export
Returns a list of non-whitespace bits, i.e. the same as a call to
$input.comb(/ \S+ /, $limit)
would.
- flip
-
The flip function reverses a string character by character.
multi method flip(Str $str: --> Str) is export
This method will misplace combining characters on non-Str types.
- sprintf
-
multi method sprintf(Str $format: *@args --> Str) is export
This function is mostly identical to the C library sprintf function.
The $format is scanned for % characters. Any % introduces a format token. Format tokens have the following grammar:
grammar Str::SprintfFormat {
    regex format_token { '%': ['%' | <index>? <precision>? <directive>] }
    token index { \d+ '$' }
    token precision { <flags>? <vector>? <precision_count> }
    token flags { <[ \x20 + 0 \# \- ]>+ }
    token precision_count { [ <[1..9]>\d* | '*' ]? [ '.' [ \d* | '*' ] ]? }
    token vector { '*'? v }
    token directive { <[csduoxefgXEGbpniDUOF]> }
}
Directives guide the use (if any) of the arguments. When a directive (other than %) is used, it indicates how the next argument passed is to be formatted into the string.
The directives are:
%   a literal percent sign (must be literally '%%')
c   a character with the given codepoint
s   a string
d   an integer, in decimal
b   an integer, in binary
o   an integer, in octal
x   an integer, in hexadecimal
X   like x, but using uppercase letters
e   a floating-point number, in scientific notation
f   a floating-point number, in fixed decimal notation
g   a floating-point number, in %e or %f notation
E   like e, but using an uppercase "E"
G   like g, but with an uppercase "E" (if applicable)
Compatibility:
i   a synonym for %d
u   a synonym for %d
D   a synonym for %d
U   a synonym for %u
O   a synonym for %o
F   a synonym for %f
Perl 5 (non-)compatibility:
n   produces a runtime exception
p   produces a runtime exception
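The following sprintf calls exercise the directives, flags, width and precision, and explicit argument index from the grammar above, assuming a Rakudo-like implementation. The method form on the format string is shown last.

```raku
# Width/precision, left-justify flag, and zero-padding flag.
say sprintf("%5.2f|%-5s|%05d", 3.14159, "ab", 42);   #  3.14|ab   |00042
# Integer directives in different bases.
say sprintf("%x %X %o %b", 255, 255, 8, 5);          # ff FF 10 101
# Explicit argument index (the index token).
say sprintf("%2\$s %1\$s", "world", "hello");        # hello world
# A literal percent sign.
say sprintf("100%%");                                 # 100%
# Method form, with the format string as invocant.
say "%s has %d items".sprintf("cart", 3);            # cart has 3 items
```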
- fmt
-
multi method fmt(Scalar $scalar: Str $format = '%s' --> Str)
multi method fmt(List $list: Str $format = '%s', Str $separator = ' ' --> Str)
multi method fmt(Hash $hash: Str $format = "%s\t%s", Str $separator = "\n" --> Str)
multi method fmt(Pair $pair: Str $format = "%s\t%s" --> Str)
A set of wrappers around sprintf. A call to the scalar version $o.fmt($format) returns the result of sprintf($format, $o). A call to the list version @a.fmt($format, $sep) returns the result of @a.map({ sprintf($format, $_) }).join($sep). A call to the hash version %h.fmt($format, $sep) returns the result of %h.pairs.map({ sprintf($format, $_.key, $_.value) }).join($sep). A call to the pair version $p.fmt($format) returns the result of sprintf($format, $p.key, $p.value).
- substr
-
multi sub substr(Str $string, Int $start, Int $length? --> Str) is export
multi sub substr(Str $string, &start, Int $length? --> Str) is export
multi sub substr(Str $string, Int $start, &end --> Str) is export
multi sub substr(Str $string, &start, &end --> Str) is export
multi sub substr(Str $string, Range $start-end --> Str) is export
multi method substr(Str $string: Int $start, Int $length? --> Str) is export
multi method substr(Str $string: &start, Int $length? --> Str) is export
multi method substr(Str $string: Int $start, &end --> Str) is export
multi method substr(Str $string: &start, &end --> Str) is export
multi method substr(Str $string: Range $start-end --> Str) is export
substr returns a substring of $string between the given points. The starting position can be specified as either an integer or a Callable taking the length of the string as its only argument. The endpoint can be specified by either an Int specifying the length of the substring, or a Callable taking the length of the string as its only argument and returning the last character to take. The bounds of the substring can be specified by a Range instead.
If the specified length or endpoint goes past the end of the string, or if no endpoint is specified, the rest of the string from the starting point will be returned.
Here is an example of its use:
$initials = substr($first_name,0,1) ~ substr($last_name,0,1);
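Here is a runnable variant of the example above, plus the WhateverCode and Range forms, assuming a Rakudo-like implementation. The sample names are made up for illustration.

```raku
my $s = "Hello, World";
say substr($s, 0, 5);      # Hello  (start and length)
say $s.substr(7);          # World  (no length: rest of the string)
say $s.substr(*-5);        # World  (start counted from the end)
say $s.substr(7..9);       # Wor    (Range of positions)
say $s.substr(7, 100);     # World  (length past the end is clipped)

my ($first_name, $last_name) = "Fred", "Flintstone";
my $initials = substr($first_name, 0, 1) ~ substr($last_name, 0, 1);
say $initials;             # FF
```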
The function fails if the start position and/or length is negative or undefined. (If the length argument is not given, it defaults to the rest of the string.) Either the start position or the end position may be specified relative to the end of the string using a WhateverCode whose argument will be the position of the end of the string. While it is illegal for the start position to be outside of the string, the final position is allowed to be off the end of the string.
- substr-rw
-
multi sub substr-rw(Str $string, Int $start, Int $length? --> Str) is rw is export
multi sub substr-rw(Str $string, &start, Int $length? --> Str) is rw is export
multi sub substr-rw(Str $string, Int $start, &end --> Str) is rw is export
multi sub substr-rw(Str $string, &start, &end --> Str) is rw is export
multi sub substr-rw(Str $string, Range $start-end --> Str) is rw is export
multi method substr-rw(Str $string: Int $start, Int $length? --> Str) is rw is export
multi method substr-rw(Str $string: &start, Int $length? --> Str) is rw is export
multi method substr-rw(Str $string: Int $start, &end --> Str) is rw is export
multi method substr-rw(Str $string: &start, &end --> Str) is rw is export
multi method substr-rw(Str $string: Range $start-end --> Str) is rw is export
A version of substr that returns a writable reference to a part of a string variable:
my $string = "one of the characters in the Flintstones is: barney";
$string ~~ /(barney)/;
substr-rw($string, $0.from, $0.to - $0.from) = "fred";
This writable reference can be the target of an alias, for repeated operations:
my $r := substr-rw($string, $0.from, $0.to - $0.from);
$r = "fred";   # "barney" replaced by "fred"
$r = "wilma";  # "fred" replaced by "wilma"
Please note that only the start point is kept by the reference: any change to the length of the string before the start point will render the reference useless. So it is probably safest to keep only one writable reference per string, or to make sure that all replacement strings have the same length.
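Here is a self-contained version of the substr-rw example, assuming a Rakudo-like implementation. It passes the capture's length, not its end position, as the third argument, since that argument is a length in the signatures above. When it reuses an alias, it keeps each replacement the same length, as recommended.

```raku
my $string = "one of the characters in the Flintstones is: barney";
if $string ~~ /(barney)/ {
    # The third argument is a length, so derive it from the capture.
    substr-rw($string, $0.from, $0.to - $0.from) = "fred";
}
say $string;   # one of the characters in the Flintstones is: fred

# An alias to a substring; same-length replacements keep it in sync.
my $s = "abcdef";
my $r := substr-rw($s, 2, 2);   # refers to "cd"
$r = "XY";
say $s;                          # abXYef
$r = "cd";
say $s;                          # abcdef
```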
- trim
-
multi method trim() is export;
multi method trim-leading() is export;
multi method trim-trailing() is export;
The trim method returns a copy of the string with leading and trailing whitespace removed. The methods trim-leading and trim-trailing are similar, but with only leading or trailing whitespace removed, respectively.
- unpack
-
XXX To be defined
- match
-
method match(Str $self: Regex $search, *%adverbs --> Match) is export
Returns the result of checking the given string against $search. See S05 for details.
- subst
-
method subst(Str $self: Regex $search, Str $replacement, *%adverbs --> Str) is export
Returns a string in which the portion matching $search has been replaced with $replacement. See S05 for details.
- trans
-
method trans(Str $self: *@changes where { all(@changes) ~~ Pair }, *%adverbs --> Str) is export;
Takes a list of Pairs and replaces each occurrence of a Pair's key with its respective value. See S05 for details.
- indent
-
multi method indent($str: Int() $steps --> Str) is export
multi method indent($str: Whatever $steps --> Str) is export
Returns a re-indented string in which $steps spaces have been added to the beginning of each line. If a line already begins with horizontal whitespace, the new spaces are added after that whitespace.
If the whitespace at the beginning of the line consists of only \x20 spaces, \x20 spaces are added as indentation as well. If the whitespace at the beginning of the line consists of some other kind of horizontal whitespace, that kind of whitespace is added as indentation. If the whitespace at the beginning of the line consists of two or more different kinds of horizontal whitespace, again \x20 spaces are used.
If $steps is negative, removes that many spaces instead. Should any line contain too few leading spaces, only those are removed and a warning is issued. At most one such warning is issued per .indent call.
If $steps is *, removes just enough indentation to give the least-indented line zero indentation.
Empty lines don't participate in re-indenting at all. That is, a line with 0 characters will still have 0 characters after the call. It also will not cause a warning to be issued.
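Here is a sketch of the three kinds of $steps argument, assuming a Rakudo-like implementation. The .raku calls serve only to make the newlines visible in the output.

```raku
# Positive steps add spaces; existing space indentation is extended
# and the empty line is left untouched.
say "foo\n  bar\n\nbaz".indent(2).raku;   # "  foo\n    bar\n\n  baz"

# Negative steps remove that many spaces from each line.
say "    a\n  b".indent(-2).raku;         # "  a\nb"

# Whatever removes the smallest indentation found on any line.
say "    a\n  b".indent(*).raku;          # "  a\nb"
```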
The method will assume hard tabs to be equivalent to ($?TABSTOP // 8) spaces, and will treat any other horizontal whitespace character as equivalent to one \x20 space. If the indenting doesn't "add up evenly", one hard tab needs to be exploded into the equivalent number of spaces before the unindenting of that line.
Decisions on how to indent each line are based solely on characters on that line. Thus, an .indent call on a multiline string amounts to .lines».indent($steps).join("\n"), modulo exotic line endings in the original string and the proviso about empty lines.
- IO
-
method IO(--> IO::Path) is export
Returns an IO::Path, using the string as the file path.
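Here is a small example, assuming a Rakudo-like IO::Path. In Rakudo, basename and extension are IO::Path methods; this synopsis does not define them. Creating the path object does not touch the file system.

```raku
my $path = "docs/notes.txt".IO;
say $path.^name;       # IO::Path
say $path.basename;    # notes.txt
say $path.extension;   # txt
```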
- path
-
method path(--> IO::Path) is export
A deprecated form of IO.
- succ
-
method succ(--> Str) is export
Increments the string to the next numeric or alphabetic value, and returns the resulting string. The autoincrement operator ++ uses succ to determine the new value.
The last portion of the string before the first period (which may be the entire string) is incremented, using <rangechar> to determine which characters are eligible to be incremented. See "Autoincrement precedence" in S03 for details.
- pred
-
method pred(--> Str) is export
Decrements the string to the previous numeric or alphabetic value, and returns the resulting string. The autodecrement operator -- uses pred to determine the new value.
When attempting to decrement a string, such as "a0", where the result would remove the leftmost characters, pred returns failure instead.
The last portion of the string before the first period (which may be the entire string) is decremented, using <rangechar> to determine which characters are eligible to be decremented. See "Autoincrement precedence" in S03 for details.
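Here are some increments and decrements, assuming Rakudo-like behaviour. The final lines check the "a0" case without asserting what a particular compiler returns; per the rule above, a failure is expected.

```raku
say "a".succ;            # b
say "az".succ;           # ba    (carry into the previous letter)
say "Zz".succ;           # AAa   (carry past the start adds a character)
say "a9".succ;           # b0    (digits and letters carry into each other)
say "img001.png".succ;   # img002.png
say "b".pred;            # a
say "ba".pred;           # az

my $v = "a9";
$v++;                    # ++ uses succ
say $v;                  # b0

# Calling .defined marks a Failure as handled, so nothing is thrown here.
my $p = "a0".pred;
say $p.defined ?? "pred returned $p" !! "pred returned a failure";
```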
Rod Adams <rod@rodadams.net>
Larry Wall <larry@wall.org>
Aaron Sherman <ajs@ajs.com>
Mark Stosberg <mark@summersault.com>
Carl Mäsak <cmasak@gmail.com>
Moritz Lenz <moritz@faui2k3.org>
Tim Nelson <wayland@wayland.id.au>
Brent Laabs <bslaabs@gmail.com>