Split Strings on Character Change

This is derived from my blog post answering Week 20 of the Perl Weekly Challenge, organized by Mohammad S. Anwar, as well as from the answers made by others to the same challenge.

The challenge reads as follows:

Write a script to accept a string from command line and split it on change of character. For example, if the string is "ABBCDEEF", then it should split like "A", "BB", "C", "D", "EE", "F".

My Solutions

For this, it seemed fairly obvious to me that a simple regex in a one-liner should do the trick.

$ perl6 -e 'say ~$/ if "ABBBCDEEF" ~~ m:g/( (.) $0*)/;'

$ perl6 -e 'say ~$/ if "ABBCDEEF" ~~ m:g/( (.) $0*)/;'

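The ((.)$0*) pattern captures a character with (.) and then any immediately following repetitions of that same character with $0*. With the :g adverb, each run of identical characters yields one Match object, and the list of these matches is stored in $/, which we just need to stringify for output.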
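To make the structure of these matches more concrete, here is a small sketch (the sub name runs-with-char is only illustrative and not taken from any solution) that applies the same pattern through the .match method with the :g adverb. For each Match object $m, $m[0] is the outer capture holding the whole run, and $m[0][0] is the nested (.) capture holding the repeated character.

```raku
# Illustrative helper: pairs each run of identical characters
# with the character it repeats.
sub runs-with-char(Str $in) {
    # With :g, .match returns a List of Match objects, one per run.
    $in.match(/ ( (.) $0* ) /, :g).map: -> $m {
        # $m[0] is the outer group (the whole run);
        # $m[0][0] is the nested (.) group (the repeated character).
        $m[0].Str => $m[0][0].Str
    }
}

.say for runs-with-char("ABBBCDEEF");
```

Run on "ABBBCDEEF", it prints one pair per run, for example BBB => B for the second one.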

Just in case the quote marks and commas are part of the desired output (which I don't really believe), they are easy to add:

$ perl6 -e 'print join ", ", map {"\"$_\""}, "ABBCDEEF" ~~ m:g/((.)$0*)/'
"A", "BB", "C", "D", "EE", "F"

If we don't want to use a regex and prefer a more traditional procedural approach, we can split the input string into individual letters, loop over them, and act depending on whether the current letter is equal to the previous one. For example:

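use v6;

# Splits $in into runs of identical letters and returns them joined by ", ".
sub split-str ($in) {
    my $prev = "";       # previous letter seen
    my $tmp-str = "";    # run currently being accumulated
    my @out;             # completed runs
    for $in.comb -> $letter {
        if $letter eq $prev {
            $tmp-str ~= $letter;                      # same letter: extend the run
        } else {
            push @out, $tmp-str if $tmp-str ne "";    # letter changed: store the finished run
            $tmp-str = $letter;                       # and start a new one
            $prev = $letter;
        }
    }
    push @out, $tmp-str if $tmp-str ne "";            # store the last run, if any
    return join ", ", @out;
}

sub MAIN (Str $input = "ABBBCDEEF") {
    say split-str $input;
}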

When using the default input parameter ("ABBBCDEEF"), this prints the following:

$ perl6 split-string.p6
A, BBB, C, D, EE, F

Alternative Solutions

Arne Sommer devised a solution somewhat similar to the procedural approach I outlined just above: splitting the input string into an array of individual letters and then looping over each letter to check whether it is the same as the previous one. Adam Russell, who was apparently offering solutions in Perl 6 for the first time, also used a procedural approach, but with a repeat ... while loop, printing the letters on the fly within the loop.
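For readers less familiar with the repeat ... while loop, which always runs its body at least once before testing the condition, here is a minimal sketch of that style. It is not Adam's actual code, and the sub name print-runs is only illustrative. It prints each run as soon as it is complete and also returns the list of runs.

```raku
# Illustrative sketch (not Adam Russell's code): a repeat ... while loop
# that prints each run of identical characters as soon as it is complete.
sub print-runs(Str $in) {
    my @chars = $in.comb;
    my @runs;
    # repeat always runs its body once, so handle the empty string first.
    return @runs unless @chars;
    my $i = 0;
    repeat {
        my $run = @chars[$i++];              # start a new run
        # Extend the run while the next character equals the previous one.
        while $i < @chars && @chars[$i] eq @chars[$i - 1] {
            $run ~= @chars[$i++];
        }
        say $run;                            # print on the fly
        @runs.push: $run;
    } while $i < @chars;
    @runs
}

print-runs("ABBCDEEF");
```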

Ruben Westerberg also used a procedural approach on an array of letters, but with a fairly original and clever use of state variables, as well as a somewhat unexpected use of the when ... default construct; his solution is also the only one using the NEXT and LAST phasers.
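As a reminder of how when ... default behaves inside a for loop (a successful when block moves on to the next iteration), here is a minimal sketch in that spirit. It does not reproduce Ruben's code and leaves out his state variables and phasers; the sub name runs-when is only illustrative.

```raku
# Illustrative sketch (not Ruben Westerberg's code): building the runs
# with when ... default inside a for loop.
sub runs-when(Str $in) {
    my @out;
    for $in.comb {                  # each character becomes the topic $_
        # "so" turns the whole condition into a Bool, which when accepts as is.
        when so @out && @out[*-1].ends-with($_) {
            @out[*-1] ~= $_;        # same character: extend the current run
        }
        default {
            @out.push: $_;          # different character: start a new run
        }
    }
    @out
}

say runs-when("ABBCDEEF").join(", ");
```

This should print A, BB, C, D, EE, F.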

Jaldhar H Vyas was the only person other than me to suggest a Perl 6 one-liner, and also the only one to use a regex pattern (similar to mine) in a substitution rather than a simple match:

perl6 -e ' @*ARGS.shift.subst(/ ( (.)$0* ) /, { "\"$0\"" }, :g).subst("\"\"", "\", \"", :g).say; ' "ABBCDEEF"

Francis Whittle, Martin Barth, Randy Lauen, Joelle Maslak, and Feng Chang used regex patterns almost identical to mine above, but passed that pattern as an argument to the comb built-in. As an example, here is Joelle's solution:

sub MAIN(Str:D $input) {
    # comb with a regex returns the substrings matched by the pattern
    my @matches = $input.comb( / (.) $0* / );
    say @matches.join("\n");
}

Kevin Colyer, Noud, and Athanasius used the same regex pattern as I did, along with a similar syntax to retrieve the matched runs.

Ozzy also used a regex, but with named captures rather than using the $0 special variable (which is really a shortcut for $/[0]):

$string.match: / ( $<l>=<.alpha> $<l>* )+ /;    # Quantified capture yields array $/[0] of Match objects
say $/[0][*].Str;                               # Stringify each Match object to see the overall match
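Here is a small sketch combining a named capture with the comb approach used by others. Like Ozzy's code, it assumes that a named capture such as $<ch> can serve as a backreference later in the same regex. It also checks the remark that $0 is just a shortcut for $/[0]. The sub name runs-named is only illustrative.

```raku
# Illustrative sketch: the run pattern written with a named capture.
# $<ch> captures one character; $<ch>* matches any repetitions of it.
sub runs-named(Str $in) {
    $in.comb(/ $<ch>=(.) $<ch>* /)
}

say runs-named("ABBCDEEF");

# $0 is shorthand for $/[0]: both name the same Match object.
if "ABBC" ~~ / (.) $0 / {
    say $0.Str;          # the doubled character, B
    say $0 === $/[0];    # True
}
```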

Roger Bell West also used something similar to a named capture (although it really stores the value of capture $0 in a variable declared inside the regex with :my):

sub splitchange ($in) {
   # {} forces $/ to be updated so that $0 is available to the :my declaration
   return map {$_.Str}, $in ~~ m:g/(.) {} :my $c = $0; ($c*)/;
}

Damian Conway doesn't participate directly in the Perl Weekly Challenge but usually comments on it afterwards, and his beautifully crafted solutions are always worth contemplating. His latest blog post passes a regex as an argument to the comb built-in:

use v6.d;

sub MAIN (\str) {
    .say for str.comb: /(.) $0*/
}

See Also

See also the following blog posts:

Update: Yary did not participate in the challenge, but on Aug. 22, 2019 he posted "Splitting on a change, Challenge 20 Task 1", which notes that the challenge asked "to accept a string from command line and split it on change of character", and further comments: "But every solution that I read in the recap looked for runs of the same character instead of the literal interpretation of the challenge." I'm not sure I fully understand his objection, as these seem equivalent to me, but I guess that Yary would like to see something that directly detects the places where the character changes. The blog's code examples are mainly in Perl 5, but the last one is in Perl 6:

say "ABBCDEEF".split(/<?before (.) {} :my $c=$0;><!after $c> /).perl
("", "A", "BB", "C", "D", "EE", "F").Seq

Yary further asks: any idea on how to remove the spurious empty string at the beginning?
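One possible answer is the :skip-empty named argument of Str.split, which drops empty strings from the result; the first sub below simply adds it to Yary's regex. The second sub is an alternative sketch that takes the "split on change" wording literally without a regex: it records the indices at which a character differs from its predecessor and cuts the string at those positions. Both sub names are only illustrative.

```raku
# Yary's zero-width split; :skip-empty drops the empty string
# that the split at position 0 would otherwise produce.
sub split-on-change(Str $in) {
    $in.split(/<?before (.) {} :my $c=$0;><!after $c> /, :skip-empty)
}

# Illustrative alternative: find the change points, then cut the string there.
sub cut-at-changes(Str $in) {
    return () unless $in.chars;                    # no runs in an empty string
    my @chars = $in.comb;
    # Index $i is a change point when @chars[$i] differs from @chars[$i - 1].
    my @cuts = (1 ..^ @chars.elems).grep({ @chars[$_] ne @chars[$_ - 1] });
    my @bounds = 0, |@cuts, @chars.elems;         # start, change points, end
    # Each pair of consecutive bounds delimits one run.
    (@bounds Z @bounds[1..*]).map(-> ($from, $to) { $in.substr($from, $to - $from) })
}

say split-on-change("ABBCDEEF").perl;
say cut-at-changes("ABBCDEEF").join(", ");
```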

Wrapping up

Please let me know if I forgot any of the challengers or if you think my explanation of your code misses something important.

If you want to participate in the Perl Weekly Challenge, please visit this site.