@@ -6234,117 +6234,151 @@ X<split>
6234
6234
6235
6235
=item split
6236
6236
6237
- Splits the string EXPR into a list of strings and returns that list. By
6238
- default, empty leading fields are preserved, and empty trailing ones are
6239
- deleted. (If all fields are empty, they are considered to be trailing.)
6237
+ Splits the string EXPR into a list of strings and returns the
6238
+ list in list context, or the size of the list in scalar context.
6240
6239
6241
- In scalar context, returns the number of fields found.
6240
+ If only PATTERN is given, EXPR defaults to C<$_>.
6242
6241
6243
- If EXPR is omitted, splits the C<$_> string. If PATTERN is also omitted,
6244
- splits on whitespace (after skipping any leading whitespace). Anything
6245
- matching PATTERN is taken to be a delimiter separating the fields. (Note
6246
- that the delimiter may be longer than one character.)
6242
+ Anything in EXPR that matches PATTERN is taken to be a separator
6243
+ that separates the EXPR into substrings (called "I<fields>") that
6244
+ do B<not> include the separator. Note that a separator may be
6245
+ longer than one character or even have no characters at all (the
6246
+ empty string, which is a zero-width match).
6247
+
6248
+ The PATTERN need not be constant; an expression may be used
6249
+ to specify a pattern that varies at runtime.
6250
+
6251
+ If PATTERN matches the empty string, the EXPR is split at the match
6252
+ position (between characters). As an example, the following:
6253
+
6254
+ print join(':', split('b', 'abc')), "\n";
6255
+
6256
+ uses the 'b' in 'abc' as a separator to produce the output 'a:c'.
6257
+ However, this:
6258
+
6259
+ print join(':', split('', 'abc')), "\n";
6260
+
6261
+ uses empty string matches as separators to produce the output
6262
+ 'a:b:c'; thus, the empty string may be used to split EXPR into a
6263
+ list of its component characters.
6264
+
6265
+ As a special case for C<split>, the empty pattern given in
6266
+ L<match operator|perlop/"m/PATTERN/msixpodualgc"> syntax (C<//>) specifically matches the empty string, which is contrary to its usual
6267
+ interpretation as the last successful match.
6268
+
6269
+ If PATTERN is C</^/>, then it is treated as if it used the
6270
+ L<multiline modifier|perlreref/OPERATORS> (C</^/m>), since it
6271
+ isn't much use otherwise.
6272
+
6273
+ As another special case, C<split> emulates the default behavior of the
6274
+ command line tool B<awk> when the PATTERN is either omitted or a I<literal
6275
+ string> composed of a single space character (such as S<C<' '>> or
6276
+ S<C<"\x20">>, but not e.g. S<C</ />>). In this case, any leading
6277
+ whitespace in EXPR is removed before splitting occurs, and the PATTERN is
6278
+ instead treated as if it were C</\s+/>; in particular, this means that
6279
+ I<any> contiguous whitespace (not just a single space character) is used as
6280
+ a separator. However, this special treatment can be avoided by specifying
6281
+ the pattern S<C</ />> instead of the string S<C<" ">>, thereby allowing
6282
+ only a single space character to be a separator.
6283
+
6284
+ If omitted, PATTERN defaults to a single space, S<C<" ">>, triggering
6285
+ the previously described I<awk> emulation.
6247
6286
6248
6287
If LIMIT is specified and positive, it represents the maximum number
6249
- of fields the EXPR will be split into, though the actual number of
6250
- fields returned depends on the number of times PATTERN matches within
6251
- EXPR. If LIMIT is unspecified or zero, trailing null fields are
6252
- stripped (which potential users of C<pop> would do well to remember).
6253
- If LIMIT is negative, it is treated as if an arbitrarily large LIMIT
6254
- had been specified. Note that splitting an EXPR that evaluates to the
6255
- empty string always returns the empty list, regardless of the LIMIT
6256
- specified.
6288
+ of fields into which the EXPR may be split; in other words, LIMIT is
6289
+ one greater than the maximum number of times EXPR may be split. Thus,
6290
+ the LIMIT value C<1> means that EXPR may be split a maximum of zero
6291
+ times, producing a maximum of one field (namely, the entire value of
6292
+ EXPR). For instance:
6257
6293
6258
- A pattern matching the empty string (not to be confused with
6259
- an empty pattern C<//>, which is just one member of the set of patterns
6260
- matching the empty string), splits EXPR into individual
6261
- characters. For example:
6294
+ print join(':', split(//, 'abc', 1)), "\n";
6262
6295
6263
- print join(':', split(/ */, 'hi there')), "\n";
6296
+ produces the output 'abc', and this:
6264
6297
6265
- produces the output 'h:i:t:h:e:r:e'.
6298
+ print join(':', split(//, 'abc', 2)), "\n";
6266
6299
6267
- As a special case for C<split>, the empty pattern C<//> specifically
6268
- matches the empty string; this is not be confused with the normal use
6269
- of an empty pattern to mean the last successful match. So to split
6270
- a string into individual characters, the following:
6300
+ produces the output 'a:bc', and each of these:
6271
6301
6272
- print join(':', split(//, 'hi there')), "\n";
6302
+ print join(':', split(//, 'abc', 3)), "\n";
6303
+ print join(':', split(//, 'abc', 4)), "\n";
6273
6304
6274
- produces the output 'h:i: :t:h:e:r:e'.
6305
+ produces the output 'a:b:c'.
6275
6306
6276
- Empty leading fields are produced when there are positive-width matches at
6277
- the beginning of the string; a zero-width match at the beginning of
6278
- the string does not produce an empty field. For example:
6307
+ If LIMIT is negative, it is treated as if it were instead arbitrarily
6308
+ large; as many fields as possible are produced.
6279
6309
6280
- print join(':', split(/(?=\w)/, 'hi there!'));
6310
+ If LIMIT is omitted (or, equivalently, zero), then it is usually
6311
+ treated as if it were instead negative but with the exception that
6312
+ trailing empty fields are stripped (empty leading fields are always
6313
+ preserved); if all fields are empty, then all fields are considered to
6314
+ be trailing (and are thus stripped in this case). Thus, the following:
6281
6315
6282
- produces the output 'h:i :t:h:e:r:e!'. Empty trailing fields, on the other
6283
- hand, are produced when there is a match at the end of the string (and
6284
- when LIMIT is given and is not 0), regardless of the length of the match.
6285
- For example:
6316
+ print join(':', split(',', 'a,b,c,,,')), "\n";
6286
6317
6287
- print join(':', split(//, 'hi there!', -1)), "\n";
6288
- print join(':', split(/\W/, 'hi there!', -1)), "\n";
6318
+ produces the output 'a:b:c', but the following:
6289
6319
6290
- produce the output 'h:i: :t:h:e:r:e:!:' and 'hi:there:', respectively,
6291
- both with an empty trailing field.
6320
+ print join(':', split(',', 'a,b,c,,,', -1)), "\n";
6292
6321
6293
- The LIMIT parameter can be used to split a line partially
6322
+ produces the output 'a:b:c:::'.
6294
6323
6295
- ($login, $passwd, $remainder) = split(/:/, $_, 3);
6324
+ In time-critical applications, it is worthwhile to avoid splitting
6325
+ into more fields than necessary. Thus, when assigning to a list,
6326
+ if LIMIT is omitted (or zero), then LIMIT is treated as though it
6327
+ were one larger than the number of variables in the list; for the
6328
+ following, LIMIT is implicitly 4:
6296
6329
6297
- When assigning to a list, if LIMIT is omitted, or zero, Perl supplies
6298
- a LIMIT one larger than the number of variables in the list, to avoid
6299
- unnecessary work. For the list above LIMIT would have been 4 by
6300
- default. In time critical applications it behooves you not to split
6301
- into more fields than you really need.
6330
+ ($login, $passwd, $remainder) = split(/:/);
6302
6331
6303
- If the PATTERN contains parentheses, additional list elements are
6304
- created from each matching substring in the delimiter.
6332
+ Note that splitting an EXPR that evaluates to the empty string always
6333
+ produces zero fields, regardless of the LIMIT specified.
6305
6334
6306
- split(/([,-])/, "1-10,20", 3);
6335
+ An empty leading field is produced when there is a positive-width
6336
+ match at the beginning of EXPR. For instance:
6307
6337
6308
- produces the list value
6338
+ print join(':', split(/ /, ' abc')), "\n";
6309
6339
6310
- (1, '-', 10, ',', 20)
6340
+ produces the output ':abc'. However, a zero-width match at the
6341
+ beginning of EXPR never produces an empty field, so that:
6311
6342
6312
- If you had the entire header of a normal Unix email message in $header,
6313
- you could split it up into fields and their values this way:
6343
+ print join(':', split(//, ' abc'));
6314
6344
6315
- $header =~ s/\n(?=\s)//g; # fix continuation lines
6316
- %hdrs = (UNIX_FROM => split /^(\S*?):\s*/m, $header);
6345
+ produces the output S<' :a:b:c'> (rather than S<': :a:b:c'>).
6317
6346
6318
- The pattern C</PATTERN/> may be replaced with an expression to specify
6319
- patterns that vary at runtime. (To do runtime compilation only once,
6320
- use C</$variable/o>.)
6347
+ An empty trailing field, on the other hand, is produced when there is a
6348
+ match at the end of EXPR, regardless of the length of the match
6349
+ (of course, unless a non-zero LIMIT is given explicitly, such fields are
6350
+ removed, as in the last example). Thus:
6321
6351
6322
- As a special case, specifying a PATTERN of space (S<C<' '>>) will split on
6323
- white space just as C<split> with no arguments does. Thus, S<C<split(' ')>> can
6324
- be used to emulate B<awk>'s default behavior, whereas S<C<split(/ /)>>
6325
- will give you as many initial null fields (empty string) as there are leading spaces.
6326
- A C<split> on C</\s+/> is like a S<C<split(' ')>> except that any leading
6327
- whitespace produces a null first field. A C<split> with no arguments
6328
- really does a S<C<split(' ', $_)>> internally.
6352
+ print join(':', split(//, ' abc', -1)), "\n";
6329
6353
6330
- A PATTERN of C</^/> is treated as if it were C</^/m>, since it isn't
6331
- much use otherwise.
6354
+ produces the output S<' :a:b:c:'>.
6332
6355
6333
- Example:
6356
+ If the PATTERN contains
6357
+ L<capturing groups|perlretut/Grouping things and hierarchical matching>,
6358
+ then for each separator, an additional field is produced for each substring
6359
+ captured by a group (in the order in which the groups are specified,
6360
+ as per L<backreferences|perlretut/Backreferences>); if any group does not
6361
+ match, then it captures the C<undef> value instead of a substring. Also,
6362
+ note that any such additional field is produced whenever there is a
6363
+ separator (that is, whenever a split occurs), and such an additional field
6364
+ does B<not> count towards the LIMIT. Consider the following expressions
6365
+ evaluated in list context (each returned list is provided in the associated
6366
+ comment):
6334
6367
6335
- open(PASSWD, '/etc/passwd');
6336
- while (<PASSWD>) {
6337
- chomp;
6338
- ($login, $passwd, $uid, $gid,
6339
- $gcos, $home, $shell) = split(/:/);
6340
- #...
6341
- }
6368
+ split(/-|,/, "1-10,20", 3)
6369
+ # ('1', '10', '20')
6370
+
6371
+ split(/(-|,)/, "1-10,20", 3)
6372
+ # ('1', '-', '10', ',', '20')
6373
+
6374
+ split(/-|(,)/, "1-10,20", 3)
6375
+ # ('1', undef, '10', ',', '20')
6342
6376
6343
- As with regular pattern matching, any capturing parentheses that are not
6344
- matched in a C<split()> will be set to C<undef> when returned:
6377
+ split(/(-)|,/, "1-10,20", 3)
6378
+ # ('1', '-', '10', undef, '20')
6345
6379
6346
- @fields = split /(A)|B/, "1A2B3";
6347
- # @fields is (1, 'A', 2, undef, 3)
6380
+ split(/(-)|(,)/, "1-10,20", 3)
6381
+ # ('1', '-', undef, '10', undef, ',', '20')
6348
6382
6349
6383
=item sprintf FORMAT, LIST
6350
6384
X<sprintf>
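The behavior described in the revised C<split> text can be checked with a short Perl sketch. It is not part of the patch. The C<show> helper is a hypothetical name introduced here only to make C<undef> fields visible; the inputs are taken from the examples in the diff, except the two whitespace strings, which are assumed for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perlfunc): join fields with ':' and
# spell out undef so that unmatched capture groups are visible.
sub show {
    return join ':', map { defined $_ ? $_ : 'undef' } @_;
}

# An empty pattern splits between characters.
my $chars    = show(split(//, 'abc'));                # a:b:c

# A positive LIMIT caps the number of fields.
my $limited  = show(split(//, 'abc', 2));             # a:bc

# Trailing empty fields are stripped unless LIMIT is negative.
my $stripped = show(split(',', 'a,b,c,,,'));          # a:b:c
my $kept     = show(split(',', 'a,b,c,,,', -1));      # a:b:c:::

# A literal single-space string triggers awk emulation; / / does not.
my $awk      = show(split(' ', '  foo  bar '));       # foo:bar
my $single   = show(split(/ /, ' foo bar'));          # :foo:bar

# A capture group that does not participate yields undef.
my $captured = show(split(/(-)|,/, '1-10,20', 3));    # 1:-:10:undef:20

print "$_\n" for $chars, $limited, $stripped, $kept,
                 $awk, $single, $captured;
```

Passing C<split> directly as the argument list of C<show> keeps it in list context, so the fields themselves are returned rather than their count.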