# ak.Strings regex functionality
Demonstration of regex functionality in Strings

Set the `CHPL_RE2` environment variable and rebuild Chapel:
```
export CHPL_RE2=bundled
```
This applies to Chapel v1.25.0. For v1.24, set the `CHPL_REGEXP` flag instead (`export CHPL_REGEXP=re2`).

The regex functionality uses Chapel's `regex` module, which is built on Google's RE2 library. RE2 gives up some features, such as lookahead and lookbehind, in exchange for guarantees that searches complete in time linear in the size of the input and in a fixed amount of stack space.
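
Because lookahead is unavailable, a pattern that relies on it usually has to be rewritten, often with a capture group. The sketch below uses Python's built-in `re` module (not Arkouda or RE2) purely to illustrate the rewrite; the sample string and variable names are made up for the example.

```python
import re

text = "1 string 1"

# Lookahead form: match a digit only when " string" follows it.
# Python's `re` accepts this; RE2 rejects the `(?=...)` syntax.
with_lookahead = re.search(r"\d(?= string)", text).group(0)

# Lookahead-free form: consume " string" as part of the match
# and capture only the digit.
with_capture = re.search(r"(\d) string", text).group(1)

print(with_lookahead, with_capture)
```

The second pattern consumes more of the input than the first, so the two are not interchangeable when matches may overlap.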

In [None]:
import arkouda as ak
ak.connect()

## substring search
`contains`, `startswith`, and `endswith` each return a boolean array indicating whether each element contains, starts with, or ends with a match of the regex pattern.

`match` returns a boolean array indicating whether the entire element matches the regex pattern, so a pattern that covers only part of an element yields `False`.
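
As a rough analogy for the four checks described above, here is a sketch in plain Python using the standard `re` module. It does not call Arkouda; the helper names are hypothetical, and edge-case behavior in Arkouda may differ.

```python
import re

strings = ['{} string {}'.format(i, i) for i in range(1, 6)]

def contains(s, pattern):
    # True if the pattern matches anywhere in s
    return re.search(pattern, s) is not None

def starts_with(s, pattern):
    # re.match only matches at the beginning of s
    return re.match(pattern, s) is not None

def ends_with(s, pattern):
    # Anchor the pattern to the end of s
    return re.search('(?:' + pattern + r')\Z', s) is not None

def matches_entirely(s, pattern):
    # True only if the whole of s matches the pattern
    return re.fullmatch(pattern, s) is not None

contains_result = [contains(s, 'string \\d') for s in strings]
starts_result = [starts_with(s, '\\d str') for s in strings]
ends_result = [ends_with(s, 'ing \\d') for s in strings]
full_result = [matches_entirely(s, '\\d string \\d') for s in strings]
partial_result = [matches_entirely(s, 'ing \\d') for s in strings]

print(contains_result, starts_result, ends_result, full_result, partial_result)
```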

In [None]:
strings = ak.array(['{} string {}'.format(i, i) for i in range(1, 6)])

In [None]:
strings

In [None]:
strings.contains('string \\d', regex=True)

In [None]:
strings.startswith('\\d str', regex = True)

In [None]:
strings.endswith('ing \\d', regex = True)

In [None]:
strings.match('\\d string \\d')

In [None]:
strings.match('ing \\d')

## peel
Peel off one or more delimited fields from each string (similar to Python's `str.partition`), returning two new arrays of strings.
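
To make the idea concrete, the following plain-Python sketch peels a single field from each string at the first regex match. It is only an illustration and does not use Arkouda: `peel_once` is a hypothetical helper, it assumes the delimiter is kept at the end of the left-hand piece when requested, and its handling of strings with no delimiter is the sketch's own choice.

```python
import re

def peel_once(s, pattern, include_delimiter=False):
    """Split s at the first regex match, like a regex-aware str.partition."""
    m = re.search(pattern, s)
    if m is None:
        # Choice made for this sketch only; not checked against Arkouda.
        return "", s
    left_end = m.end() if include_delimiter else m.start()
    return s[:left_end], s[m.end():]

under = ['a_b', 'c___d', 'e__f____g']

pairs = [peel_once(s, '_+') for s in under]
left = [l for l, _ in pairs]
right = [r for _, r in pairs]

left_with_delim = [peel_once(s, '_+', include_delimiter=True)[0] for s in under]

print(left, right)
print(left_with_delim)
```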

In [None]:
under = ak.array(['a_b', 'c___d', 'e__f____g'])

In [None]:
under

In [None]:
under.peel('_+', regex=True)

In [None]:
under.peel('_+', includeDelimiter=True, regex=True)

## flatten
Given an array of strings where each string encodes a variable-length sequence delimited by a common substring, flattening unpacks the sequences into a flat array of individual elements.
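
The idea can be sketched in plain Python with `re.split`. This is an illustration only, not Arkouda's implementation: `flatten_with_segments` is a hypothetical helper, and it assumes that "segments" means the index in the flat array where each original string's elements begin.

```python
import re

def flatten_with_segments(strings, pattern):
    flat = []
    segments = []
    for s in strings:
        # Record where this string's elements start in the flat list
        segments.append(len(flat))
        flat.extend(re.split(pattern, s))
    return flat, segments

under = ['one_two', 'three_____four____five', 'six']
flat, segments = flatten_with_segments(under, '_+')

print(flat)      # ['one', 'two', 'three', 'four', 'five', 'six']
print(segments)  # [0, 2, 5]
```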

Regex flatten:

In [None]:
under = ak.array(['one_two', 'three_____four____five', 'six'])

In [None]:
under.flatten('_+', return_segments=True, regex=True)

Compared to the existing (non-regex) flatten:

In [None]:
orig = ak.array(['one|two', 'three|four|five', 'six'])

In [None]:
orig.flatten('|', return_segments=True)

## findall and find_locations
find_locations(pattern)

    Finds pattern matches and returns pdarrays containing the number, start positions, and lengths of matches.
    Results of find_locations are cached in regex_dict.

findall(pattern, return_match_origins)

    Returns all non-overlapping matches of pattern in Strings as a new Strings object.
    If return_match_origins is True, also returns a pdarray containing the index of the original string that each match came from.
    Uses the output of find_locations.
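
The relationship between the two functions can be sketched in plain Python with `re.finditer`. This is an illustration under assumptions, not Arkouda's implementation: the `_sketch` helpers are hypothetical, they return Python lists rather than pdarrays, and start positions are taken relative to the beginning of each string.

```python
import re

def find_locations_sketch(strings, pattern):
    num_matches, starts, lengths = [], [], []
    for s in strings:
        found = list(re.finditer(pattern, s))
        num_matches.append(len(found))            # matches per string
        starts.extend(m.start() for m in found)   # offset within the string
        lengths.extend(m.end() - m.start() for m in found)
    return num_matches, starts, lengths

def findall_sketch(strings, pattern):
    matches, origins = [], []
    for i, s in enumerate(strings):
        for m in re.finditer(pattern, s):
            matches.append(m.group(0))
            origins.append(i)  # index of the string this match came from
    return matches, origins

under = ['one_two', 'three_____four____five', 'six']

num_matches, starts, lengths = find_locations_sketch(under, '_+')
matches, origins = findall_sketch(under, '_+')

print(num_matches, starts, lengths)
print(matches, origins)
```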

In [None]:
under = ak.array(['one_two', 'three_____four____five', 'six'])

In [None]:
under.find_locations('_+')

In [None]:
under.cached_regex_patterns()

In [None]:
under.findall('_+', return_match_origins=True)

## shutdown

In [None]:
ak.shutdown()