=begin pod
=TITLE class Str
=SUBTITLE String of characters
class Str is Cool does Stringy { }
Built-in class for strings. Objects of type C<Str> are immutable.
=head1 Methods
=head2 routine chop
multi sub chop(Str:D --> Str:D)
multi method chop(Str:D: $chars = 1 --> Str:D)
Returns the string with C<$chars> characters removed from the end.
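The following is a minimal sketch, assuming the signatures above; the sample
string is chosen only for illustration:
=begin code
say "Perl".chop;      # OUTPUT: «Per»
say "Perl".chop(2);   # OUTPUT: «Pe»
say chop("Perl");     # OUTPUT: «Per»
=end code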
=head2 routine chomp
Defined as:
multi sub chomp(Str:D --> Str:D)
multi method chomp(Str:D: --> Str:D)
Returns the string with a logical newline (any codepoint that has the
C<NEWLINE> property) removed from the end.
Examples:
say chomp("abc\n"); # OUTPUT: «abc»
say "def\r\n".chomp; # OUTPUT: «def» NOTE: \r\n is a single grapheme!
say "foo\r".chomp; # OUTPUT: «foo»
=head2 routine lc
Defined as:
multi sub lc(Str:D --> Str:D)
multi method lc(Str:D: --> Str:D)
Returns a lower-case version of the string.
Examples:
lc("A"); # RESULT: «"a"»
"A".lc; # RESULT: «"a"»
=head2 routine uc
multi sub uc(Str:D --> Str:D)
multi method uc(Str:D: --> Str:D)
Returns an uppercase version of the string.
=head2 routine fc
multi sub fc(Str:D --> Str:D)
multi method fc(Str:D: --> Str:D)
Does a Unicode "fold case" operation suitable for doing caseless
string comparisons. (In general, the returned string is unlikely to
be useful for any purpose other than comparison.)
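As a hedged illustration of why C<fc> rather than C<lc> is the safer choice
for caseless comparison (the sample words are chosen only for this sketch):
=begin code
say "Straße".fc eq "STRASSE".fc;   # OUTPUT: «True»
say "Straße".lc eq "STRASSE".lc;   # OUTPUT: «False»
=end code
Case folding maps C<ß> to C<ss>, so both spellings fold to the same string,
whereas lowercasing leaves C<ß> untouched.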
=head2 routine tc
multi sub tc(Str:D --> Str:D)
multi method tc(Str:D: --> Str:D)
Does a Unicode "titlecase" operation, that is, changes the first character in
the string to title case, or to upper case if the character has no title case
mapping.
=head2 routine tclc
multi sub tclc(Str:D --> Str:D)
multi method tclc(Str:D: --> Str:D)
Turns the first character to title case, and all other characters to lower
case.
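A short sketch contrasting C<tc> and C<tclc> (the mixed-case input is made up
for illustration): C<tc> touches only the first character, while C<tclc> also
lowercases the rest.
=begin code
say "peRL".tc;     # OUTPUT: «PeRL»
say "peRL".tclc;   # OUTPUT: «Perl»
=end code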
=head2 routine wordcase
=for code
multi sub wordcase(Cool $x --> Str)
multi sub wordcase(Str:D $x --> Str)
multi method wordcase(Str:D: :&filter = &tclc, Mu :$where = True --> Str)
Returns a string in which C<&filter> has been applied to all the words
that match C<$where>. By default, this means that the first letter of
every word is capitalized, and all the other letters lowercased.
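The named parameters can be sketched as follows, assuming the method
signature above; the sample sentences are illustrative only:
=begin code
say "perl 6 programming".wordcase;
# OUTPUT: «Perl 6 Programming»
say "have fun working on perl".wordcase(:where({ .chars > 3 }));
# OUTPUT: «Have fun Working on Perl»
say "have fun working on perl".wordcase(:filter(&uc), :where({ .chars > 3 }));
# OUTPUT: «HAVE fun WORKING on PERL»
=end code
Words that do not match C<$where> are left as they are.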
=head2 method unival
multi method unival(Str:D: --> Numeric)
Returns the numeric value that the first codepoint in the invocant represents,
or C<NaN> if it's not numeric.
say '4'.unival; # OUTPUT: «4»
say '¾'.unival; # OUTPUT: «0.75»
say 'a'.unival; # OUTPUT: «NaN»
=head2 method univals
multi method univals(Str:D: --> List)
Returns a list of numeric values represented by each codepoint in the invocant
string, and C<NaN> for non-numeric characters.
say "4a¾".univals; # OUTPUT: «(4 NaN 0.75)»
=head2 routine chars
multi sub chars(Cool $x --> Int:D)
multi sub chars(Str:D $x --> Int:D)
multi sub chars(str $x --> int)
multi method chars(Str:D: --> Int:D)
Returns the number of characters in the string, counted in graphemes. On the
JVM, this currently erroneously returns the number of codepoints instead.
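A brief sketch of grapheme-based counting, assuming a backend that counts
graphemes as described above (the sample strings are illustrative):
=begin code
say "Perl".chars;       # OUTPUT: «4»
say "\r\n".chars;       # OUTPUT: «1» (\r\n is a single grapheme)
say "e\x[301]".chars;   # OUTPUT: «1» (base letter plus combining accent)
=end code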
=head2 method encode
multi method encode(Str:D: $encoding, $nf --> Blob)
Returns a L<Blob> which represents the original string in the given encoding
and normal form. The actual return type is as specific as possible, so
C<$str.encode('UTF-8')> returns a C<utf8> object,
C<$str.encode('ISO-8859-1')> a C<buf8>.
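As a hedged sketch (the sample string is invented for this example), the same
string yields different bytes depending on the encoding:
=begin code
say "aü".encode('UTF-8').list;        # OUTPUT: «(97 195 188)»
say "aü".encode('UTF-8').bytes;       # OUTPUT: «3»
say "aü".encode('ISO-8859-1').list;   # OUTPUT: «(97 252)»
=end code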
=head2 routine index
multi sub index(Cool $s, Str:D $needle, Cool $startpos = 0 --> Int)
multi method index(Cool $needle, Cool $startpos = 0 --> Int)
Searches for C<$needle> in the string starting from C<$startpos>. It returns
the offset into the string where C<$needle> was found, and an undefined value
if it was not found.
Examples:
say index "Camelia is a butterfly", "a"; # OUTPUT: «1»
say index "Camelia is a butterfly", "a", 2; # OUTPUT: «6»
say index "Camelia is a butterfly", "er"; # OUTPUT: «17»
say index "Camelia is a butterfly", "Camel"; # OUTPUT: «0»
say index "Camelia is a butterfly", "Onion"; # OUTPUT: «Nil»
say index("Camelia is a butterfly", "Onion").defined ?? 'OK' !! 'NOT'; # OUTPUT: «NOT»
=head2 routine rindex
multi sub rindex(Str:D $haystack, Str:D $needle, Int $startpos = $haystack.chars --> Int)
multi method rindex(Str:D $haystack: Str:D $needle, Int $startpos = $haystack.chars --> Int)
Returns the last position of C<$needle> in C<$haystack> not after C<$startpos>.
Returns an undefined value if C<$needle> wasn't found.
Examples:
say rindex "Camelia is a butterfly", "a"; # OUTPUT: «11»
say rindex "Camelia is a butterfly", "a", 10; # OUTPUT: «6»
=head2 method match
method match($pat, :continue(:$c), :pos(:$p), :global(:$g), :overlap(:$ov), :exhaustive(:$ex), :st(:$nd), :rd(:$th), :$nth, :$x --> Match)
Performs a match of the string against C<$pat> and returns a L<Match> object if there is a successful match,
and C<(Any)> otherwise. Matches are stored in C<$/>. If C<$pat> is not a L<Regex> object, match will coerce
the argument to a Str and then perform a literal match against C<$pat>.
A number of optional named parameters can be specified, which alter how the match is performed.
=item :continue
The C<:continue> adverb takes as an argument the position where the regex should start to search.
If no position is specified for C<:c>, it defaults to 0 unless C<$/> is set, in which case it defaults to C<$/.to>.
=item :pos
Takes a position as an argument. Fails if regex cannot be matched from that position, unlike :continue.
=item :global
Instead of searching for just one match and returning a Match object, search for every non-overlapping match and return them in a List.
=item :overlap
Finds all matches including overlapping matches, but only returns one match from each starting position.
=item :exhaustive
Finds all possible matches of a regex, including overlapping matches and matches that start at the same position.
=item :st, :nd, :rd, :th, :nth
Takes an integer as an argument and returns the nth match in the string.
=item :x
Takes as an argument the number of matches to return, stopping once the specified number of matches has been reached.
Examples:
=begin code
say "properly".match('perl'); # OUTPUT: «「perl」»
say "properly".match(/p.../); # OUTPUT: «「prop」»
say "1 2 3".match([1,2,3]); # OUTPUT: «「1 2 3」»
say "a1xa2".match(/a./, :continue(2)); # OUTPUT: «「a2」»
say "abracadabra".match(/ a .* a /, :exhaustive);
# OUTPUT: «(「abracadabra」 「abracada」 「abraca」 「abra」 「acadabra」 「acada」 「aca」 「adabra」 「ada」 「abra」)»
say 'several words here'.match(/\w+/,:global); # OUTPUT: «(「several」 「words」 「here」)»
say 'abcdef'.match(/.*/, :pos(2)); # OUTPUT: «「cdef」»
say "foo[bar][baz]".match(/../, :1st); # OUTPUT: «「fo」»
say "foo[bar][baz]".match(/../, :2nd); # OUTPUT: «「o[」»
say "foo[bar][baz]".match(/../, :3rd); # OUTPUT: «「ba」»
say "foo[bar][baz]".match(/../, :4th); # OUTPUT: «「r]」»
say "foo[bar][baz]bada".match('ba', :x(2)); # OUTPUT: «(「ba」 「ba」)»
=end code
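The difference between C<:global> and C<:overlap> can be sketched with a
string in which matches may share characters (the input is made up for
illustration):
=begin code
say "aaaa".match(/aa/, :global);    # OUTPUT: «(「aa」 「aa」)»
say "aaaa".match(/aa/, :overlap);   # OUTPUT: «(「aa」 「aa」 「aa」)»
=end code
With C<:global> the second match starts where the first ended; with
C<:overlap> a match is attempted from each starting position.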
=head2 routine parse-base
multi sub parse-base(Str:D $num, Int:D $radix --> Numeric)
multi method parse-base(Str:D $num: Int:D $radix --> Numeric)
Performs the reverse of L«C<base>|/routine/base» by converting a string
with a base-C<$radix> number to its base-10 L«C<Numeric>|/type/Numeric»
equivalent. Will L«C<fail>|/routine/fail» if radix is not in range C<2..36>
or if the string being parsed contains characters that are not valid
for the specified base.
1337.base(32).parse-base(32).say; # OUTPUT: «1337»
'Perl6'.parse-base(30).say; # OUTPUT: «20652936»
'FF.DD'.parse-base(16).say; # OUTPUT: «255.863281»
See also: L«:16<FF> syntax for number literals|/syntax/Number%20literals»
=head2 routine parse-names
sub parse-names(Str:D $names --> Str:D)
method parse-names(Str:D $names: --> Str:D)
Takes a string with comma-separated Unicode names of characters and
returns a string composed of those characters. Will L«C<fail>|/routine/fail»
if any of the characters' names are empty or not recognized. Whitespace
around character names is ignored.
say "I {parse-names 'TWO HEARTS'} Perl"; # OUTPUT: «I 💕 Perl»
'TWO HEARTS, BUTTERFLY'.parse-names.say; # OUTPUT: «💕🦋»
=head2 routine split
=for code :skip-test
multi sub split( Str:D $delimiter, Str:D $input, $limit = Inf,
:$skip-empty, :$v, :$k, :$kv, :$p --> Positional)
multi sub split(Regex:D $delimiter, Str:D $input, $limit = Inf,
:$skip-empty, :$v, :$k, :$kv, :$p --> Positional)
multi sub split(List:D $delimiters, Str:D $input, $limit = Inf,
:$skip-empty, :$v, :$k, :$kv, :$p --> Positional)
multi method split(Str:D: Str:D $delimiter, $limit = Inf,
:$skip-empty, :$v, :$k, :$kv, :$p --> Positional)
multi method split(Str:D: Regex:D $delimiter, $limit = Inf,
:$skip-empty, :$v, :$k, :$kv, :$p --> Positional)
multi method split(Str:D: List:D $delimiters, $limit = Inf,
:$skip-empty, :$v, :$k, :$kv, :$p --> Positional)
Splits a string up into pieces based on delimiters found in the string.
If C<$delimiter> is a string, it is searched for literally and not treated
as a regex. If C<$delimiter> is the empty string, the string is split into its
individual characters (plus an empty string at the beginning and at
the end). If C<$delimiter> is a regular expression, it is used
to split up the string. If C<$delimiters> is a list, each of its elements
(either a string or a regular expression) is treated as a delimiter to
split the string on.
The optional C<$limit> indicates in how many segments the string should be
split, if possible. It defaults to B<Inf> (or B<*>, whichever way you look at
it), which means "as many as possible". Note that specifying negative limits
will not produce any meaningful results.
A number of optional named parameters can be specified, which alter the
result being returned. The C<:v>, C<:k>, C<:kv> and C<:p> named parameters
all perform a special action with regards to the delimiter found.
=item :skip-empty
If specified, do not return empty strings before or after a delimiter.
=item :v
Also return the delimiter. If the delimiter was a regular expression, then
this will be the associated C<Match> object. Since this stringifies as the
delimiter string found, you can always assume it is the delimiter string if
you're not interested in further information about that particular match.
=item :k
Also return the B<index> of the delimiter. Only makes sense if a list of
delimiters was specified: in all other cases, this will be B<0>.
=item :kv
Also return both the B<index> of the delimiter, as well as the delimiter.
=item :p
Also return the B<index> of the delimiter and the delimiter as a C<Pair>.
Examples:
say split(";", "a;b;c").perl; # OUTPUT: «("a", "b", "c")»
say split(";", "a;b;c", :v).perl; # OUTPUT: «("a", ";", "b", ";", "c")»
say split(";", "a;b;c", 2).perl; # OUTPUT: «("a", "b;c").Seq»
say split(";", "a;b;c", 2, :v).perl; # OUTPUT: «("a", ";", "b;c")»
say split(";", "a;b;c,d").perl; # OUTPUT: «("a", "b", "c,d")»
say split(/\;/, "a;b;c,d").perl; # OUTPUT: «("a", "b", "c,d")»
say split(<; ,>, "a;b;c,d").perl; # OUTPUT: «("a", "b", "c", "d")»
say split(/<[;,]>/, "a;b;c,d").perl; # OUTPUT: «("a", "b", "c", "d")»
say split(<; ,>, "a;b;c,d", :k).perl; # OUTPUT: «("a", 0, "b", 0, "c", 1, "d")»
say split(<; ,>, "a;b;c,d", :kv).perl; # OUTPUT: «("a", 0, ";", "b", 0, ";", "c", 1, ",", "d")»
say "".split("x").perl; # OUTPUT: «("",)»
say "".split("x", :skip-empty).perl; # OUTPUT: «()»
say "abcde".split("").perl; # OUTPUT: «("", "a", "b", "c", "d", "e", "")»
say "abcde".split("",:skip-empty).perl; # OUTPUT: «("a", "b", "c", "d", "e")»
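The C<:p> parameter has no example above, so here is a hedged sketch; the
variable name C<@parts> is introduced only for this illustration. Delimiters
come back as C<Pair>s mixed in with the string pieces, so the two can be told
apart by type:
=begin code
my @parts = split(<; ,>, "a;b;c,d", :p);
say @parts.elems;                          # OUTPUT: «7»
say @parts.grep(Str).join;                 # OUTPUT: «abcd»
say @parts.grep(Pair).map(*.key).join;     # OUTPUT: «001»
say @parts.grep(Pair).map(*.value).join;   # OUTPUT: «;;,»
=end code
Each key is the index of the delimiter within the list that was passed in,
and each value is the delimiter that was found.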
=head2 routine comb
multi sub comb(Str:D $matcher, Str:D $input, $limit = Inf)
multi sub comb(Regex:D $matcher, Str:D $input, $limit = Inf, Bool :$match)
multi sub comb(Int:D $size, Str:D $input, $limit = Inf)
multi method comb(Str:D $input:)
multi method comb(Str:D $input: Str:D $matcher, $limit = Inf)
multi method comb(Str:D $input: Regex:D $matcher, $limit = Inf, Bool :$match)
multi method comb(Str:D $input: Int:D $size, $limit = Inf)
Searches for C<$matcher> in C<$input> and returns a list of all matches
(as C<Str> by default, or as L<Match> if C<$match> is True), limited to at most
C<$limit> matches.
If no matcher is supplied, a list of the characters in the string
(i.e., as if C<$matcher = rx/./>) is returned.
Examples:
say "abc".comb.perl; # OUTPUT: «("a", "b", "c").Seq»
say 'abcdefghijk'.comb(3).perl; # OUTPUT: «("abc", "def", "ghi", "jk").Seq»
say 'abcdefghijk'.comb(3, 2).perl; # OUTPUT: «("abc", "def").Seq»
say comb(/\w/, "a;b;c").perl; # OUTPUT: «("a", "b", "c").Seq»
say comb(/\N/, "a;b;c").perl; # OUTPUT: «("a", ";", "b", ";", "c").Seq»
say comb(/\w/, "a;b;c", 2).perl; # OUTPUT: «("a", "b").Seq»
say comb(/\w\;\w/, "a;b;c", 2).perl; # OUTPUT: «("a;b",).Seq»
If the matcher is an integer value, it is treated as a matcher that
is similar to C</ . ** matcher />, but about 30x faster.
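A small sketch of the C<:match> parameter, assuming the signatures above (the
variable name C<@m> and the sample string are invented for this example).
With C<:match>, each element is a L<Match> object, so positional information
remains available:
=begin code
my @m = "a1b22".comb(/\d+/, :match);
say @m.map(*.Str);    # OUTPUT: «(1 22)»
say @m.map(*.from);   # OUTPUT: «(1 3)»
=end code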
=head2 routine lines
multi sub lines(Str:D $input, $limit = Inf --> Positional)
multi method lines(Str:D $input: $limit = Inf --> Positional)
Returns a list of lines (without trailing newline characters), i.e. the
same as a call to C<$input.comb( / ^^ \N* /, $limit )> would.
Examples:
say lines("a\nb").perl; # OUTPUT: «("a", "b").Seq»
say lines("a\nb").elems; # OUTPUT: «2»
say "a\nb".lines.elems; # OUTPUT: «2»
say "a\n".lines.elems; # OUTPUT: «1»
=head2 routine words
multi sub words(Str:D $input, $limit = Inf --> Positional)
multi method words(Str:D $input: $limit = Inf --> Positional)
Returns a list of non-whitespace bits, i.e. the same as a call to
C<$input.comb( / \S+ /, $limit )> would.
Examples:
say "a\nb\n".words.perl; # OUTPUT: «("a", "b").Seq»
say "hello world".words.perl; # OUTPUT: «("hello", "world").Seq»
say "foo:bar".words.perl; # OUTPUT: «("foo:bar",).Seq»
say "foo:bar\tbaz".words.perl; # OUTPUT: «("foo:bar", "baz").Seq»
=head2 routine flip
multi sub flip(Str:D --> Str:D)
multi method flip(Str:D: --> Str:D)
Returns the string reversed character by character.
Examples:
"Perl".flip; # RESULT: «lreP»
"ABBA".flip; # RESULT: «ABBA»
=head2 sub sprintf
multi sub sprintf( Str:D $format, *@args --> Str:D)
This function is mostly identical to the C library's C<sprintf> and
C<printf> functions. The only difference between the two
functions is that C<sprintf> returns a string, while C<printf>
writes to a filehandle.
The C<$format> is scanned for C<%> characters. Any C<%> introduces a
format token. Format tokens have the following grammar:
grammar Str::SprintfFormat {
regex format_token { '%': <index>? <precision>? <modifier>? <directive> }
token index { \d+ '$' }
token precision { <flags>? <vector>? <precision_count> }
token flags { <[ \x20 + 0 \# \- ]>+ }
token precision_count { [ <[1..9]>\d* | '*' ]? [ '.' [ \d* | '*' ] ]? }
token vector { '*'? v }
token modifier { < ll l h V q L > }
token directive { < % c s d u o x e f g X E G b p n i D U O F > }
}
Directives guide the use (if any) of the arguments. When a directive
(other than C<%>) is used, it indicates how the next argument
passed is to be formatted into the string to be created.
NOTE: The information below is for a fully functioning C<sprintf>
implementation which hasn't been achieved yet. Formats or features not
yet implemented are marked NYI.
The directives are:
=begin table
%   a literal percent sign
c   a character with the given codepoint
s   a string
d   a signed integer, in decimal
u   an unsigned integer, in decimal
o   an unsigned integer, in octal
x   an unsigned integer, in hexadecimal
e   a floating-point number, in scientific notation
f   a floating-point number, in fixed decimal notation
g   a floating-point number, in %e or %f notation
X   like x, but using uppercase letters
E   like e, but using an uppercase "E"
G   like g, but with an uppercase "E" (if applicable)
b   an unsigned integer, in binary
=end table
Compatibility:
=begin table
i   a synonym for %d
D   a synonym for %ld
U   a synonym for %lu
O   a synonym for %lo
F   a synonym for %f
=end table
Perl 5 (non-)compatibility:
=begin table
n   produces a runtime exception
p   produces a runtime exception
=end table
Modifiers change the meaning of format directives, but are largely
no-ops (the semantics are still being determined).
=begin table
     h    interpret integer as native "short" (typically int16)
NYI  l    interpret integer as native "long" (typically int32 or int64)
NYI  ll   interpret integer as native "long long" (typically int64)
NYI  L    interpret integer as native "long long" (typically uint64)
NYI  q    interpret integer as native "quads" (typically int64 or larger)
=end table
Between the C<%> and the format letter, you may specify several
additional attributes controlling the interpretation of the format. In
order, these are:
B<format parameter index>
An explicit format parameter index, such as C<2$>. By default,
C<sprintf> will format the next unused argument in the list, but this
allows you to take the arguments out of order:
sprintf '%2$d %1$d', 12, 34; # OUTPUT: «34 12»
sprintf '%3$d %d %1$d', 1, 2, 3; # OUTPUT: «3 1 1»
B<flags>
One or more of:
=begin table
space   prefix non-negative number with a space
+       prefix non-negative number with a plus sign
-       left-justify within the field
0       use leading zeros, not spaces, for required padding
#       ensure the leading "0" for any octal,
        prefix non-zero hexadecimal with "0x" or "0X",
        prefix non-zero binary with "0b" or "0B"
=end table
For example:
sprintf '<% d>', 12; # OUTPUT: «< 12>»
sprintf '<% d>', 0; # OUTPUT: «< 0>»
sprintf '<% d>', -12; # OUTPUT: «<-12>»
sprintf '<%+d>', 12; # OUTPUT: «<+12>»
sprintf '<%+d>', 0; # OUTPUT: «<+0>»
sprintf '<%+d>', -12; # OUTPUT: «<-12>»
sprintf '<%6s>', 12; # OUTPUT: «<    12>»
sprintf '<%-6s>', 12; # OUTPUT: «<12    >»
sprintf '<%06s>', 12; # OUTPUT: «<000012>»
sprintf '<%#o>', 12; # OUTPUT: «<014>»
sprintf '<%#x>', 12; # OUTPUT: «<0xc>»
sprintf '<%#X>', 12; # OUTPUT: «<0XC>»
sprintf '<%#b>', 12; # OUTPUT: «<0b1100>»
sprintf '<%#B>', 12; # OUTPUT: «<0B1100>»
When a space and a plus sign are given as the flags at once, the space
is ignored:
sprintf '<%+ d>', 12; # OUTPUT: «<+12>»
sprintf '<% +d>', 12; # OUTPUT: «<+12>»
When the C<#> flag and a precision are given in the C<%o> conversion, the
precision is incremented if it's necessary for the leading "0":
sprintf '<%#.5o>', 0o12; # OUTPUT: «<000012>»
sprintf '<%#.5o>', 0o12345; # OUTPUT: «<012345>»
sprintf '<%#.0o>', 0; # OUTPUT: «<>» # zero precision results in no output!
B<vector flag>
This flag tells Perl 6 to interpret the supplied string as a vector of
integers, one for each character in the string. Perl 6 applies the
format to each integer in turn, then joins the resulting strings with
a separator (a dot, C<'.'>, by default). This can be useful for
displaying ordinal values of characters in arbitrary strings:
=begin code :skip-test
NYI sprintf "%vd", "AB\x[100]"; # "65.66.256"
=end code
You can also explicitly specify the argument number to use for the
join string using something like C<*2$v>; for example:
=begin code :skip-test
NYI sprintf '%*4$vX %*4$vX %*4$vX', # 3 IPv6 addresses
@addr[1..3], ":";
=end code
B<(minimum) width>
Arguments are usually formatted to be only as wide as required to
display the given value. You can override the width by putting a
number here, or get the width from the next argument (with C<*> ) or
from a specified argument (e.g., with C<*2$>):
sprintf "<%s>", "a"; # OUTPUT: «<a>»
sprintf "<%6s>", "a"; # OUTPUT: «<     a>»
sprintf "<%*s>", 6, "a"; # OUTPUT: «<     a>»
=begin code :skip-test
NYI sprintf '<%*2$s>', "a", 6; # "<     a>"
=end code
sprintf "<%2s>", "long"; # OUTPUT: «<long>» (does not truncate)
If a field width obtained through C<*> is negative, it has the same
effect as the C<-> flag: left-justification.
B<precision, or maximum width>
You can specify a precision (for numeric conversions) or a maximum
width (for string conversions) by specifying a C<.> followed by a
number. For floating-point formats, except C<g> and C<G>, this
specifies how many places right of the decimal point to show (the
default being 6). For example:
# these examples are subject to system-specific variation
sprintf '<%f>', 1; # OUTPUT: «"<1.000000>"»
sprintf '<%.1f>', 1; # OUTPUT: «"<1.0>"»
sprintf '<%.0f>', 1; # OUTPUT: «"<1>"»
sprintf '<%e>', 10; # OUTPUT: «"<1.000000e+01>"»
sprintf '<%.1e>', 10; # OUTPUT: «"<1.0e+01>"»
For "g" and "G", this specifies the maximum number of digits to show,
including those prior to the decimal point and those after it; for
example:
# These examples are subject to system-specific variation.
sprintf '<%g>', 1; # OUTPUT: «<1>»
sprintf '<%.10g>', 1; # OUTPUT: «<1>»
sprintf '<%g>', 100; # OUTPUT: «<100>»
sprintf '<%.1g>', 100; # OUTPUT: «<1e+02>»
sprintf '<%.2g>', 100.01; # OUTPUT: «<1e+02>»
sprintf '<%.5g>', 100.01; # OUTPUT: «<100.01>»
sprintf '<%.4g>', 100.01; # OUTPUT: «<100>»
For integer conversions, specifying a precision implies that the
output of the number itself should be zero-padded to this width, where
the C<0> flag is ignored:
(Note that this feature currently works for unsigned integer conversions, but not
for signed integer.)
=begin code :skip-test
NYI sprintf '<%.6d>', 1; # <000001>
NYI sprintf '<%+.6d>', 1; # <+000001>
NYI sprintf '<%-10.6d>', 1; # <000001    >
NYI sprintf '<%10.6d>', 1; # <    000001>
NYI sprintf '<%010.6d>', 1; # <    000001>
NYI sprintf '<%+10.6d>', 1; # <   +000001>
sprintf '<%.6x>', 1; # OUTPUT: «<000001>»
sprintf '<%#.6x>', 1; # OUTPUT: «<0x000001>»
sprintf '<%-10.6x>', 1; # OUTPUT: «<000001    >»
sprintf '<%10.6x>', 1; # OUTPUT: «<    000001>»
sprintf '<%010.6x>', 1; # OUTPUT: «<    000001>»
sprintf '<%#10.6x>', 1; # OUTPUT: «<  0x000001>»
=end code
For string conversions, specifying a precision truncates the string to
fit the specified width:
sprintf '<%.5s>', "truncated"; # OUTPUT: «<trunc>»
sprintf '<%10.5s>', "truncated"; # OUTPUT: «<     trunc>»
You can also get the precision from the next argument using C<.*>, or
from a specified argument (e.g., with C<.*2$>):
=begin code :skip-test
sprintf '<%.6x>', 1; # OUTPUT: «<000001>»
sprintf '<%.*x>', 6, 1; # OUTPUT: «<000001>»
NYI sprintf '<%.*2$x>', 1, 6; # "<000001>"
NYI sprintf '<%6.*2$x>', 1, 4; # "<  0001>"
=end code
If a precision obtained through C<*> is negative, it counts as having
no precision at all:
sprintf '<%.*s>', 7, "string"; # OUTPUT: «<string>»
sprintf '<%.*s>', 3, "string"; # OUTPUT: «<str>»
sprintf '<%.*s>', 0, "string"; # OUTPUT: «<>»
sprintf '<%.*s>', -1, "string"; # OUTPUT: «<string>»
sprintf '<%.*d>', 1, 0; # OUTPUT: «<0>»
sprintf '<%.*d>', 0, 0; # OUTPUT: «<>»
sprintf '<%.*d>', -1, 0; # OUTPUT: «<0>»
B<size>
For numeric conversions, you can specify the size to interpret the
number as using C<l>, C<h>, C<V>, C<q>, C<L>, or C<ll>. For integer
conversions (C<d> C<u> C<o> C<x> C<X> C<b> C<i> C<D> C<U> C<O>),
numbers are usually assumed to be whatever the default integer size is
on your platform (usually 32 or 64 bits), but you can override this to
use instead one of the standard C types, as supported by the compiler
used to build Perl 6:
(Note: None of the following have been implemented.)
=begin table
hh | interpret integer as C type "char" or "unsigned char"
h | interpret integer as C type "short" or "unsigned short"
j | interpret integer as C type "intmax_t", only with a C99 compiler (unportable)
l | interpret integer as C type "long" or "unsigned long"
q, L, or ll | interpret integer as C type "long long", "unsigned long long", or "quad" (typically 64-bit integers)
t | interpret integer as C type "ptrdiff_t"
z | interpret integer as C type "size_t"
=end table
B<order of arguments>
Normally, C<sprintf> takes the next unused argument as the value to
format for each format specification. If the format specification uses
C<*> to require additional arguments, these are consumed from the
argument list in the order they appear in the format specification
before the value to format. Where an argument is specified by an
explicit index, this does not affect the normal order for the
arguments, even when the explicitly specified index would have been
the next argument.
So:
my $a = 5; my $b = 2; my $c = 'net';
sprintf "<%*.*s>", $a, $b, $c; # OUTPUT: «<   ne>»
uses C<$a> for the width, C<$b> for the precision, and C<$c> as the value to
format; while:
=for code :skip-test
NYI sprintf '<%*1$.*s>', $a, $b;
would use C<$a> for the width and precision and C<$b> as the value to format.
Here are some more examples; be aware that when using an explicit
index, the C<$> may need escaping:
=for code :skip-test
sprintf "%2\$d %d\n", 12, 34; # OUTPUT: «34 12»
sprintf "%2\$d %d %d\n", 12, 34; # OUTPUT: «34 12 34»
sprintf "%3\$d %d %d\n", 12, 34, 56; # OUTPUT: «56 12 34»
NYI sprintf "%2\$*3\$d %d\n", 12, 34, 3; # " 34 12\n"
NYI sprintf "%*1\$.*f\n", 4, 5, 10; # "5.0000\n"
=comment TODO: document effects of locale
Other examples:
=for code :skip-test
NYI sprintf "%ld a big number", 4294967295;
NYI sprintf "%lld a bigger number", 4294967296;
sprintf('%c', 97); # OUTPUT: «a»
sprintf("%.2f", 1.969); # OUTPUT: «1.97»
sprintf("%+.3f", 3.141592); # OUTPUT: «+3.142»
sprintf('%2$d %1$d', 12, 34); # OUTPUT: «34 12»
sprintf("%x", 255); # OUTPUT: «ff»
Special case: 'sprintf("<b>%s</b>\n", "Perl 6")' will not work, because
C«%s</b>» inside a double-quoted string is parsed as a hash subscript to be
interpolated, but one of the following will:
sprintf Q:b "<b>%s</b>\n", "Perl 6"; # OUTPUT: «<b>Perl 6</b>»
sprintf "<b>\%s</b>\n", "Perl 6"; # OUTPUT: «<b>Perl 6</b>»
sprintf "<b>%s\</b>\n", "Perl 6"; # OUTPUT: «<b>Perl 6</b>»
=head2 method starts-with
multi method starts-with(Str:D: Str(Cool) $needle --> Bool:D)
Returns C<True> if the invocant is identical to or starts with C<$needle>.
say "Hello, World".starts-with("Hello"); # OUTPUT: «True»
say "https://perl6.org/".starts-with('ftp'); # OUTPUT: «False»
=head2 method ends-with
multi method ends-with(Str:D: Str(Cool) $needle --> Bool:D)
Returns C<True> if the invocant is identical to or ends with C<$needle>.
say "Hello, World".ends-with('Hello'); # OUTPUT: «False»
say "Hello, World".ends-with('ld'); # OUTPUT: «True»
=head2 method subst
multi method subst(Str:D: $matcher, $replacement, *%opts)
Returns the invocant string where C<$matcher> is replaced by C<$replacement>
(or the original string, if no match was found).
There is an in-place syntactic variant of C<subst> spelled
C<s/matcher/replacement/>.
C<$matcher> can be a L<Regex>, or a literal C<Str>. Non-Str matcher arguments
of type L<Cool> are coerced to C<Str> for literal matching.
my $some-string = "Some foo";
my $another-string = $some-string.subst(/foo/, "string"); # gives 'Some string'
$some-string.=subst(/foo/, "string"); # in-place substitution. $some-string is now 'Some string'
The replacement can be a closure:
my $i = 41;
my $str = "The answer is secret.";
my $real-answer = $str.subst(/secret/, {++$i}); # The answer to everything
Here are other examples of usage:
my $str = "Hey foo foo foo";
$str.subst(/foo/, "bar", :g); # global substitution; returns "Hey bar bar bar"
$str.subst(/foo/, "no subst", :x(0)); # :x gives the number of substitutions to make; with 0, the string is returned unmodified
$str.subst(/foo/, "bar", :x(1)); # replace just the first occurrence
$str.subst(/foo/, "bar", :nth(3)); # replace only the nth match, here the third foo; returns "Hey foo foo bar"
The C<:nth> adverb has readable English-looking variants:
say 'ooooo'.subst: 'o', 'x', :1st; # OUTPUT: «xoooo»
say 'ooooo'.subst: 'o', 'x', :2nd; # OUTPUT: «oxooo»
say 'ooooo'.subst: 'o', 'x', :3rd; # OUTPUT: «ooxoo»
say 'ooooo'.subst: 'o', 'x', :4th; # OUTPUT: «oooxo»
The following adverbs are supported:
=begin table
short                        long        meaning
=====                        ====        =======
:g                           :global     tries to match as often as possible
:nth(Int|Callable|Whatever)              only substitute the nth match; aliases: :st, :nd, :rd, and :th
:ss                          :samespace  preserves whitespace on substitution
:ii                          :samecase   preserves case on substitution
:mm                          :samemark   preserves character marks (e.g. 'ü' replaced with 'o' will result in 'ö')
:x(Int|Range|Whatever)                   substitute exactly $x matches
=end table
Note that only in the C<s///> form C<:ii> implies C<:i> and C<:ss> implies
C<:s>. In the method form, the C<:s> and C<:i> modifiers must be added to the
regex, not the C<subst> method call.
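As a small sketch of the last point (the sample text is invented for
illustration), case-insensitive matching in the method form is requested
inside the regex itself:
=begin code
say "Hello World".subst(/:i hello/, "Goodbye");   # OUTPUT: «Goodbye World»
say "Hello World".subst(/hello/, "Goodbye");      # OUTPUT: «Hello World»
=end code
The second call finds no match, so the original string is returned.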
=head2 method subst-mutate
Where C<subst> returns the modified string and leaves the original
unchanged, C<subst-mutate> modifies the original string in place. If the
match is successful, the method returns a C<Match> object representing the
successful match; if the C<:g> (or C<:global>) argument is used, it returns a
C<List> of C<Match> objects. If nothing matches, it returns C<Any>.
my $some-string = "Some foo";
my $match = $some-string.subst-mutate(/foo/, "string");
say $some-string; # OUTPUT: «Some string»
say $match; # OUTPUT: «「foo」»
$some-string.subst-mutate(/<[oe]>/, '', :g); # remove every o and e; note the :g named argument, as in .subst
=head2 routine substr
multi sub substr(Str:D $s, Int:D $from, Int:D $chars = $s.chars - $from --> Str:D)
multi sub substr(Str:D $s, Range $from-to --> Str:D)
multi method substr(Str:D $s: Int:D $from, Int:D $chars = $s.chars - $from --> Str:D)
multi method substr(Str:D $s: Range $from-to --> Str:D)
Returns a part of the string, starting from the character with index C<$from>
(where the first character has index 0) and with length C<$chars>. If a range is
specified, its first and last indices are used to determine the size of the substring.
Examples:
substr("Long string", 6, 3); # RESULT: «tri»
substr("Long string", 6); # RESULT: «tring»
substr("Long string", 6, *-1); # RESULT: «trin»
substr("Long string", *-3, *-1); # RESULT: «in»
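
The C<Range> candidates listed above can be sketched in the same way. The ranges are chosen for illustration; both endpoints are character indices and are included in the result:

=begin code
say substr("Long string", 5..10);  # OUTPUT: «string»
say "Long string".substr(0..3);    # OUTPUT: «Long»
=end code
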
=head2 method substr-eq
multi method substr-eq(Str:D: Str(Cool) $test-string, Int(Cool) $from --> Bool)
multi method substr-eq(Cool:D: Str(Cool) $test-string, Int(Cool) $from --> Bool)
Returns C<True> if the C<$test-string> exactly matches the C<Str> object,
starting from the given initial index C<$from>. For example, beginning with
the string C<"foobar">, the substring C<"bar"> will match from index 3:
my $string = "foobar";
say $string.substr-eq("bar", 3); # OUTPUT: «True»
However, the substring C<"barz"> starting from index 3 won't match even
though the first three letters of the substring do match:
my $string = "foobar";
say $string.substr-eq("barz", 3); # OUTPUT: «False»
Naturally, to match the entire string, one merely matches from index 0:
my $string = "foobar";
say $string.substr-eq("foobar", 0); # OUTPUT: «True»
Since this method is inherited from the C<Cool> type, it also works on
integers. Thus the integer C<42> will match the value C<342> starting from
index 1:
my $integer = 342;
say $integer.substr-eq(42, 1); # OUTPUT: «True»
As expected, one can match the entire value by starting at index 0:
my $integer = 342;
say $integer.substr-eq(342, 0); # OUTPUT: «True»
Using a different value or an incorrect starting index won't match:
my $integer = 342;
say $integer.substr-eq(42, 3); # OUTPUT: «False»
say $integer.substr-eq(7342, 0); # OUTPUT: «False»
=head2 method substr-rw
method substr-rw($from, $length?)
A version of C<substr> that returns a L<Proxy|/type/Proxy> functioning as a
writable reference to a part of a string variable. Its first argument, C<$from>,
specifies the index in the string at which the replacement starts, and
its last argument, C<$length>, specifies how many characters are to be replaced.
For example, in its method form, if one wants to take the string C<"abc">
and replace the second character (at index 1) with the letter C<"z">, then
one does this:
my $string = "abc";
$string.substr-rw(1, 1) = "z";
$string.say; # OUTPUT: «azc»
Note that new characters can be inserted as well:
my $string = 'azc';
$string.substr-rw(2, 0) = "-Zorro-"; # insert new characters BEFORE the character at index 2
$string.say; # OUTPUT: «az-Zorro-c»
C<substr-rw> also has a function form, so the above examples can also be
written like so:
my $string = "abc";
substr-rw($string, 1, 1) = "z";
$string.say; # OUTPUT: «azc»
substr-rw($string, 2, 0) = "-Zorro-";
$string.say; # OUTPUT: «az-Zorro-c»
It is also possible to alias the writable reference returned by C<substr-rw>
for repeated operations:
my $string = "A character in the 'Flintstones' is: barney";
$string ~~ /(barney)/;
my $ref := substr-rw($string, $0.from, $0.to - $0.from); # the third argument is a length, not an end position
$string.say;
# OUTPUT: «A character in the 'Flintstones' is: barney»
$ref = "fred";
$string.say;
# OUTPUT: «A character in the 'Flintstones' is: fred»
$ref = "wilma";
$string.say;
# OUTPUT: «A character in the 'Flintstones' is: wilma»
Notice that the start position and length of the string to replace have been
derived from the C<.from> and C<.to> methods on the C<Match> object, C<$0>.
It is thus not necessary to count characters in order to replace a
substring, which makes the code more flexible.
=head2 routine samemark
multi sub samemark(Str:D $string, Str:D $pattern --> Str:D)
method samemark(Str:D: Str:D $pattern --> Str:D)
Returns a copy of C<$string> with the mark/accent information for each
character changed such that it matches the mark/accent of the corresponding
character in C<$pattern>. If C<$string> is longer than C<$pattern>, the
remaining characters in C<$string> receive the same mark/accent as the last
character in C<$pattern>. If C<$pattern> is empty no changes will be made.
Examples:
say 'åäö'.samemark('aäo'); # OUTPUT: «aäo»
say 'åäö'.samemark('a'); # OUTPUT: «aao»
say samemark('Pêrl', 'a'); # OUTPUT: «Perl»
say samemark('aöä', ''); # OUTPUT: «aöä»
=head2 method succ
method succ(Str:D: --> Str:D)
Returns the string incremented by one.
String increment is "magical". It searches for the last alphanumeric
sequence that is not preceded by a dot, and increments it.
'12.34'.succ; # RESULT: «13.34»
'img001.png'.succ; # RESULT: «img002.png»
The actual increment step works by mapping the last alphanumeric
character to a character range it belongs to, and choosing the next
character in that range, carrying to the previous letter on overflow.
'aa'.succ; # RESULT: «ab»
'az'.succ; # RESULT: «ba»
'109'.succ; # RESULT: «110»
'α'.succ; # RESULT: «β»
'a9'.succ; # RESULT: «b0»
String increment is Unicode-aware, and generally works for scripts where a
character can be uniquely classified as belonging to one range of characters.
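
As a small sketch with a hypothetical file name, the C<++> operator on a string variable uses the same increment, so a carry inside the alphanumeric sequence before the extension looks like this:

=begin code
my $file = 'img009.png';   # hypothetical file name, for illustration only
$file++;                   # ++ on a Str increments it via .succ
say $file;                 # OUTPUT: «img010.png»
=end code
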
=head2 method pred
method pred(Str:D: --> Str:D)
Returns the string decremented by one.
String decrementing is "magical" just like string increment (see
L<succ>). It fails on underflow:
=for code :skip-test
'b0'.pred; # RESULT: «a9»
'a0'.pred; # Failure
'img002.png'.pred; # RESULT: «img001.png»
=head2 routine ord