
Tips and Tricks Strings

Björn Lindqvist edited this page May 27, 2016 · 2 revisions


Split a String on Whitespace

This is an incredibly common task, but how to do it in Factor might not be obvious. Python has a handy split() method on strings. Called without arguments, it splits on runs of any whitespace, including Unicode whitespace:

>>> u"foo    \u00a0bar\u205Fmeh".split()
[u'foo', u'bar', u'meh']

Factor has a split word in the splitting vocab that seems similar. However, it requires you to specify which delimiters to split on, so you might use it like this:

IN: "foo    \u0000a0bar\u00205Fmeh" " \n\t" split harvest .
{ "foo" " bar meh" }
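To see why the second string survives, here is a rough Python analogue of `" \n\t" split harvest`. The helpers `split_on` and `harvest` are hypothetical. They are written only to mirror what the Factor words do (split at any delimiter and keep empty pieces, then drop the empties) and are not part of any library.

```python
# Hypothetical helpers that mirror Factor's `split` and `harvest`; not a library API.
def split_on(s, delims):
    """Split s at every character that appears in delims, keeping empty pieces."""
    pieces = []
    start = 0
    for i, ch in enumerate(s):
        if ch in delims:
            pieces.append(s[start:i])
            start = i + 1
    pieces.append(s[start:])
    return pieces


def harvest(pieces):
    """Drop empty pieces, as Factor's harvest does."""
    return [p for p in pieces if p]


text = "foo    \u00a0bar\u205fmeh"
result = harvest(split_on(text, " \n\t"))
# result is ["foo", "\u00a0bar\u205fmeh"]: U+00A0 and U+205F are not in the
# delimiter string, so they stay inside the second piece.
print(result)
```

The "spaces" visible in `" bar meh"` above are these two characters, not ASCII spaces.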

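The second string still contains U+00A0 (no-break space) and U+205F (medium mathematical space). They print like ordinary spaces but were not in the delimiter list. To make it work properly you would have to list every whitespace character to split on, which is tedious because the Unicode standard defines so many of them. It is better to split with split-when and the blank? predicate from unicode.categories: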

IN: "foo    \u0000a0bar\u00205Fmeh" [ blank? ] split-when harvest .
{ "foo" "bar" "meh" }
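The difference is that split-when takes a predicate instead of a delimiter list. A minimal Python sketch of the same idea follows. `split_when` and `harvest` are hypothetical helpers named after the Factor words. Python's `str.isspace` stands in for blank?; the two are assumed to agree on these characters, but their exact character sets may differ.

```python
from typing import Callable, List


# Hypothetical helper mirroring Factor's split-when; not a library API.
def split_when(s: str, pred: Callable[[str], bool]) -> List[str]:
    """Split s at every character for which pred returns True."""
    pieces: List[str] = []
    start = 0
    for i, ch in enumerate(s):
        if pred(ch):
            pieces.append(s[start:i])
            start = i + 1
    pieces.append(s[start:])
    return pieces


# Hypothetical helper mirroring Factor's harvest: drop empty pieces.
def harvest(pieces: List[str]) -> List[str]:
    return [p for p in pieces if p]


text = "foo    \u00a0bar\u205fmeh"
# str.isspace is True for ASCII space, U+00A0 and U+205F alike.
result = harvest(split_when(text, str.isspace))
print(result)  # ['foo', 'bar', 'meh']
```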

Factor has no default arguments, which is why this version is longer than the Python equivalent. The split word from the pcre vocab offers a more succinct alternative:

IN: QUALIFIED: pcre
IN: "foo    \u0000a0bar\u00205Fmeh" "\\s" pcre:split .
{ "foo" "bar" "meh" }
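For comparison, the regex approach looks like this with Python's standard re module. This sketch shows Python's behaviour only, not how Factor's pcre binding is configured. In Python 3, `\s` in a str pattern matches Unicode whitespace unless the `re.ASCII` flag is given, so U+00A0 and U+205F count as separators. Using `\s+` treats a run of whitespace as one separator. A single `\s` leaves empty strings between adjacent whitespace characters.

```python
import re

text = "foo    \u00a0bar\u205fmeh"

# One separator per run of Unicode whitespace.
result = re.split(r"\s+", text)
print(result)  # ['foo', 'bar', 'meh']

# One separator per whitespace character: empty strings appear between
# adjacent whitespace characters and would need filtering (like harvest).
single = re.split(r"\s", text)
print([p for p in single if p])  # ['foo', 'bar', 'meh']
```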