# String Operations
[String Operations](https://numpy.org/doc/stable/reference/routines.char.html)

In [1]:
import numpy as np

## String Operations

[`np.char.add`](https://numpy.org/doc/stable/reference/generated/numpy.char.add.html#numpy.char.add)

Return element-wise string concatenation for two arrays of str or unicode.

In [2]:
np.char.add("Hello, ", "World!")

array('Hello, World!', dtype='<U13')
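The description talks about two arrays. Here is a small sketch of that case with made-up sample strings. It relies on NumPy's usual broadcasting rules, so a single string on one side is paired with every element on the other side.

```python
import numpy as np

names = np.array(["Alice", "Bob"])
# Element-wise: each left string is joined to the string at the same position on the right
pairs = np.char.add(names, np.array([" Smith", " Jones"]))
# Broadcasting: the scalar "!" is appended to every element
shouted = np.char.add(names, "!")
print(pairs)    # ['Alice Smith' 'Bob Jones']
print(shouted)  # ['Alice!' 'Bob!']
```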

[`np.char.multiply`](https://numpy.org/doc/stable/reference/generated/numpy.char.multiply.html)

Return `(a * i)`, that is, string repetition (each string concatenated with itself *i* times), element-wise.

In [4]:
a = np.array(["a", "b", "c"])
np.char.multiply(a, 3)

array(['aaa', 'bbb', 'ccc'], dtype='<U3')

In [5]:
i = np.array([1,2,3])
np.char.multiply(a, i)

array(['a', 'bb', 'ccc'], dtype='<U3')

In [6]:
np.char.multiply(np.array(['a']), i)

array(['a', 'aa', 'aaa'], dtype='<U3')

In [7]:
a = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape((2,3))
np.char.multiply(a, 3)

array([['aaa', 'bbb', 'ccc'],
       ['ddd', 'eee', 'fff']], dtype='<U3')

In [8]:
np.char.multiply(a, i)

array([['a', 'bb', 'ccc'],
       ['d', 'ee', 'fff']], dtype='<U3')

[`np.char.mod`](https://numpy.org/doc/stable/reference/generated/numpy.char.mod.html)

Return `(a % i)`, that is, pre-Python 2.6 string formatting (interpolation), element-wise for a pair of array_likes of str or unicode.

In [10]:
np.char.mod("%s, World!", "Hello")

array('Hello, World!', dtype='<U13')
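The one-liner above formats a single string. Below is a sketch of the element-wise case, where both the templates and the values are arrays. The data is made up for illustration, and each template is paired with the value at the same position.

```python
import numpy as np

templates = np.array(["%d bytes", "%d bits"])
values = np.array([8, 64])
# Each template is formatted with the value at the same index
result = np.char.mod(templates, values)
print(result)  # ['8 bytes' '64 bits']
```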

[`np.char.capitalize`](https://numpy.org/doc/stable/reference/generated/numpy.char.capitalize.html)

Return a copy with only the first character of each element capitalized.

In [11]:
c = np.array(['a1b2', '1b2a', 'b2a1', '2a1b'], 'S4')
c

array([b'a1b2', b'1b2a', b'b2a1', b'2a1b'], dtype='|S4')

In [12]:
np.char.capitalize(c)

array([b'A1b2', b'1b2a', b'B2a1', b'2a1b'], dtype='|S4')

[`np.char.center`](https://numpy.org/doc/stable/reference/generated/numpy.char.center.html)

Return a copy of *a* with its elements centered in a string of length *width*.

In [15]:
c = np.array(['a1b2', '1b2a', 'b2a1', '2a1b'])
c

array(['a1b2', '1b2a', 'b2a1', '2a1b'], dtype='<U4')

In [16]:
np.char.center(c, width=9)

array(['   a1b2  ', '   1b2a  ', '   b2a1  ', '   2a1b  '], dtype='<U9')

In [17]:
np.char.center(c, width=9, fillchar="*")

array(['***a1b2**', '***1b2a**', '***b2a1**', '***2a1b**'], dtype='<U9')

[`np.char.decode`](https://numpy.org/doc/stable/reference/generated/numpy.char.decode.html)

Calls `bytes.decode` element-wise.

> Decode Bytes Encode Strings

> Decode bytes to string.

In [19]:
c = np.array([
    b'\x81\xc1\x81\xc1\x81\xc1',
    b'@@\x81\xc1@@',
    b'\x81\x82\xc2\xc1\xc2\x82\x81'
])
c

array([b'\x81\xc1\x81\xc1\x81\xc1', b'@@\x81\xc1@@',
       b'\x81\x82\xc2\xc1\xc2\x82\x81'], dtype='|S7')

In [20]:
np.char.decode(c, encoding='cp037')

array(['aAaAaA', '  aA  ', 'abBABba'], dtype='<U7')

[`np.char.encode`](https://numpy.org/doc/stable/reference/generated/numpy.char.encode.html)

Calls `str.encode` element-wise.

> Decode Bytes Encode Strings

> Encode string to bytes.

In [22]:
s = "😁🤳"
np.char.encode(s)

array(b'\xf0\x9f\x98\x81\xf0\x9f\xa4\xb3', dtype='|S8')
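`encode` and `decode` call `str.encode` and `bytes.decode` element-wise. Using the same codec in both directions should therefore give back the original strings. Here is a quick round-trip check on sample words, assuming UTF-8 as the codec.

```python
import numpy as np

words = np.array(["café", "naïve", "😁"])
# str -> bytes with an explicit codec
encoded = np.char.encode(words, encoding="utf-8")
# bytes -> str with the same codec recovers the original text
decoded = np.char.decode(encoded, encoding="utf-8")
print(encoded.dtype.kind, decoded.dtype.kind)  # S U
print((decoded == words).all())                # True
```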

[`np.char.expandtabs`](https://numpy.org/doc/stable/reference/generated/numpy.char.expandtabs.html)

Return a copy of each string element where all tab characters are replaced by one or more spaces.

In [25]:
s = "\tkind of\t\t tabbed in"
s

'\tkind of\t\t tabbed in'

In [26]:
np.char.expandtabs(s)

array('        kind of          tabbed in', dtype='<U34')
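`np.char.expandtabs` also accepts a `tabsize` argument, which defaults to 8. The sketch below sets tab stops every 4 columns on a sample string. Each tab pads to the next stop, so it does not always insert the same number of spaces.

```python
import numpy as np

row = "a\tbb\tccc"
# Tab stops every 4 columns: "a" + 3 spaces, then "bb" + 2 spaces
expanded = np.char.expandtabs(row, tabsize=4)
print(repr(expanded.item()))  # 'a   bb  ccc'
```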

[`np.char.join`](https://numpy.org/doc/stable/reference/generated/numpy.char.join.html)

Return a string which is the concatenation of the strings in the sequence *seq*.
Calls *str.join* element-wise.

In [27]:
np.char.join('-', 'osd')

array('o-s-d', dtype='<U5')

In [28]:
np.char.join(['-', '.'], ['ghc', 'osd'])

array(['g-h-c', 'o.s.d'], dtype='<U5')

[`np.char.ljust`](https://numpy.org/doc/stable/reference/generated/numpy.char.ljust.html)

Return an array with the elements of *a* left-justified in a string of length *width*.

In [30]:
np.char.ljust("Hey", width=6, fillchar="!")

array('Hey!!!', dtype='<U6')

[`np.char.lower`](https://numpy.org/doc/stable/reference/generated/numpy.char.lower.html)

Return an array with elements converted to lowercase.

In [31]:
c = np.array(['A1B C', '1BCA', 'BCA1'])
c

array(['A1B C', '1BCA', 'BCA1'], dtype='<U5')

In [32]:
np.char.lower(c)

array(['a1b c', '1bca', 'bca1'], dtype='<U5')

[`np.char.lstrip`](https://numpy.org/doc/stable/reference/generated/numpy.char.lstrip.html)

For each element in *a*, return a copy with the leading characters removed.

In [42]:
a = np.char.rjust("Hey", width=6, fillchar="!") # getting ahead of ourselves with rjust
a

array('!!!Hey', dtype='<U6')

In [43]:
np.char.lstrip(a, chars="!")

array('Hey', dtype='<U6')
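Note that *chars* is treated as a set of characters, not as a prefix. Any leading run made of those characters is removed, in any order. A quick check on sample strings:

```python
import numpy as np

tags = np.array(["xyxHey", "yyHo", "Hi"])
# chars="yx" strips any leading mix of 'x' and 'y', regardless of order
stripped = np.char.lstrip(tags, chars="yx")
print(stripped)  # ['Hey' 'Ho' 'Hi']
```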

[`np.char.partition`](https://numpy.org/doc/stable/reference/generated/numpy.char.partition.html)

Partition each element in *a* around *sep*.

For each element in *a*, split the element at the first occurrence of *sep*, and return 3 strings containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return 3 strings containing the string itself, followed by two empty strings.

In [44]:
a = "10;11;21;32"
np.char.partition(a, ";")

array(['10', ';', '11;21;32'], dtype='<U8')

In [45]:
np.char.partition(a, "|")

array(['10;11;21;32', '', ''], dtype='<U11')
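With an array input, each element gets its own three-part split, so the result gains a trailing axis of length 3. Here is a sketch on sample key/value strings. The shape shown is, as far as I can tell, what the legacy `np.char` interface returns.

```python
import numpy as np

pairs = np.array(["key=value", "flag"])
# One row of (before, sep, after) per input element
parts = np.char.partition(pairs, "=")
print(parts.shape)     # (2, 3)
print(parts.tolist())  # [['key', '=', 'value'], ['flag', '', '']]
```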

[`np.char.replace`](https://numpy.org/doc/stable/reference/generated/numpy.char.replace.html)

For each element in *a*, return a copy of the string with all occurrences of substring *old* replaced by *new*.

In [59]:
a = np.array(["That is a mango", "Monkeys eat mangos"])
np.char.replace(a, 'mango', 'banana')

array(['That is a banana', 'Monkeys eat bananas'], dtype='<U19')

In [62]:
np.char.replace(
    np.char.add(np.char.add(a[0], ", "), np.char.lower(a[1])), 
    'mango', 'banana', count=1
)

array('That is a banana, monkeys eat mangos', dtype='<U36')

In [48]:
a = np.array(["The dish is fresh", "This is it"])
np.char.replace(a, 'is', 'was')

array(['The dwash was fresh', 'Thwas was it'], dtype='<U19')
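The `dwash`/`Thwas` output shows that `np.char.replace` matches substrings, not whole words. One workaround steps outside `np.char` and uses Python's `re` module with word boundaries (`\b`). A sketch on the same sentences:

```python
import re
import numpy as np

sentences = np.array(["The dish is fresh", "This is it"])
# \b marks a word boundary, so "is" inside "dish" or "This" is left alone
pattern = re.compile(r"\bis\b")
fixed = np.array([pattern.sub("was", s) for s in sentences])
print(fixed)  # ['The dish was fresh' 'This was it']
```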

[`np.char.rjust`](https://numpy.org/doc/stable/reference/generated/numpy.char.rjust.html)

Return an array with the elements of *a* right-justified in a string of length *width*.

In [64]:
np.char.rjust(" Hey", width=6, fillchar="-")

array('-- Hey', dtype='<U6')
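Here are `ljust`, `center` and `rjust` side by side, padding the same sample strings to width 6 with `*`. When the padding cannot be split evenly, `center` follows Python's `str.center`. In this case that puts the extra fill character on the right.

```python
import numpy as np

words = np.array(["ab", "xyz"])
left = np.char.ljust(words, 6, fillchar="*")
middle = np.char.center(words, 6, fillchar="*")
right = np.char.rjust(words, 6, fillchar="*")
print(left)    # ['ab****' 'xyz***']
print(middle)  # ['**ab**' '*xyz**']
print(right)   # ['****ab' '***xyz']
```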

[`np.char.rpartition`](https://numpy.org/doc/stable/reference/generated/numpy.char.rpartition.html)

Partition (split) each element around the right-most separator.

In [66]:
a = "10;11;21;32"
np.char.rpartition(a, ";")

array(['10;11;21', ';', '32'], dtype='<U8')

Compare to `np.char.partition`:

In [67]:
a = "10;11;21;32"
np.char.partition(a, ";")

array(['10', ';', '11;21;32'], dtype='<U8')

In [68]:
a = "10;11;21;32"
np.char.rpartition(a, "|")

array(['', '', '10;11;21;32'], dtype='<U11')

[`np.char.rsplit`](https://numpy.org/doc/stable/reference/generated/numpy.char.rsplit.html)

For each element in *a*, return a list of the words in the string, using *sep* as the delimiter string.
Calls *str.rsplit* element-wise. Except for splitting from the right, `rsplit` behaves like `split`.

In [69]:
a = "  much whitespace     "
np.char.rsplit(a)

array(list(['much', 'whitespace']), dtype=object)

In [70]:
np.char.split(a)

array(list(['much', 'whitespace']), dtype=object)
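The two calls above return the same result because nothing limits the number of splits. The difference only shows up with `maxsplit`, as in this sketch on a sample string. Here `.item()` unwraps the list from the 0-d object array.

```python
import numpy as np

text = "usr local bin"
# With maxsplit=1, split cuts once from the left and rsplit cuts once from the right
left = np.char.split(text, maxsplit=1).item()
right = np.char.rsplit(text, maxsplit=1).item()
print(left)   # ['usr', 'local bin']
print(right)  # ['usr local', 'bin']
```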

[`np.char.rstrip`](https://numpy.org/doc/stable/reference/generated/numpy.char.rstrip.html)

For each element in *a*, return a copy with trailing characters removed.

In [71]:
c = np.array(['aAaAaA', 'abBABba'], dtype='S7')
c

array([b'aAaAaA', b'abBABba'], dtype='|S7')

In [72]:
np.char.rstrip(c, b'a')

array([b'aAaAaA', b'abBABb'], dtype='|S7')

In [73]:
np.char.rstrip(c, b'A')

array([b'aAaAa', b'abBABba'], dtype='|S7')

[`np.char.split`](https://numpy.org/doc/stable/reference/generated/numpy.char.split.html)

For each element in *a*, return a list of the words in the string, using *sep* as the delimiter string.

In [74]:
np.char.split("Hello my future")

array(list(['Hello', 'my', 'future']), dtype=object)

[`np.char.splitlines`](https://numpy.org/doc/stable/reference/generated/numpy.char.splitlines.html)

For each element in *a*, return a list of the lines in the element, breaking at line boundaries.

In [112]:
lines = """brocoli
cauliflower
cabbage
water cress
raddish
rutabaga
"""

np.char.splitlines(lines)

array(list(['brocoli', 'cauliflower', 'cabbage', 'water cress', 'raddish', 'rutabaga']),
      dtype=object)
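By default the line breaks are dropped, and passing `keepends=True` keeps them. A small sketch with a sample two-line string:

```python
import numpy as np

text = "first\nsecond\n"
dropped = np.char.splitlines(text).item()
kept = np.char.splitlines(text, keepends=True).item()
print(dropped)  # ['first', 'second']
print(kept)     # ['first\n', 'second\n']
```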

[`np.char.strip`](https://numpy.org/doc/stable/reference/generated/numpy.char.strip.html)

For each element in *a*, return a copy with the leading and trailing characters removed.

In [76]:
c = np.array(['aAaAaA', '  aA  ', 'abBABba'])
c

array(['aAaAaA', '  aA  ', 'abBABba'], dtype='<U7')

In [77]:
np.char.strip(c)

array(['aAaAaA', 'aA', 'abBABba'], dtype='<U7')

In [78]:
np.char.strip(c, 'a')

array(['AaAaA', '  aA  ', 'bBABb'], dtype='<U7')

In [79]:
np.char.strip(np.char.strip(c), 'a')

array(['AaAaA', 'A', 'bBABb'], dtype='<U7')

[`np.char.swapcase`](https://numpy.org/doc/stable/reference/generated/numpy.char.swapcase.html)

Return element-wise a copy of the string with uppercase characters converted to lowercase and vice versa.

In [81]:
c = np.array(['abAB', 'aBcDeFgH'])
c

array(['abAB', 'aBcDeFgH'], dtype='<U8')

In [82]:
np.char.swapcase(c)

array(['ABab', 'AbCdEfGh'], dtype='<U8')

[`np.char.title`](https://numpy.org/doc/stable/reference/generated/numpy.char.title.html)

Return element-wise title-cased version of string or unicode. Title-case words start with an uppercase character; all remaining cased characters are lowercase.

In [83]:
cruciferous = """brocoli
cauliflower
cabbage
water cress
raddish
"""

first_part = "how to cultivate"

print(cruciferous)
print(first_part)

brocoli
cauliflower
cabbage
water cress
raddish

how to cultivate


In [84]:
np.char.title(cruciferous)

array('Brocoli\nCauliflower\nCabbage\nWater Cress\nRaddish\n',
      dtype='<U48')

In [85]:
np.char.title(first_part)

array('How To Cultivate', dtype='<U16')

In [98]:
np.char.splitlines(np.char.title(cruciferous))

array(list(['Brocoli', 'Cauliflower', 'Cabbage', 'Water Cress', 'Raddish']),
      dtype=object)

In [129]:
vegetables = np.char.split(np.char.title(cruciferous))
vegetables

array(list(['Brocoli', 'Cauliflower', 'Cabbage', 'Water', 'Cress', 'Raddish']),
      dtype=object)

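**NOTE** The result above is a 0-d object array wrapping a single Python list (`array(list([...]), dtype=object)`), which is awkward to use as an array of strings, and `np.char.split` also breaks "Water Cress" into two words. Splitting with Python's `str.splitlines` first and then building a NumPy string array, as below, works better.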

In [132]:
vegetables = cruciferous.splitlines()
np.array(vegetables)

array(['brocoli', 'cauliflower', 'cabbage', 'water cress', 'raddish'],
      dtype='<U11')

In [135]:
vegetables = cruciferous.splitlines()
np.char.add(np.char.add(np.char.title(first_part), " "), np.char.title(np.array(vegetables)))

array(['How To Cultivate Brocoli', 'How To Cultivate Cauliflower',
       'How To Cultivate Cabbage', 'How To Cultivate Water Cress',
       'How To Cultivate Raddish'], dtype='<U28')
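Another option is to unwrap the list with `.item()` and build a string array from it. This assumes the input is a single string, so the result is a 0-d object array. Unlike `np.char.split`, this approach keeps "Water Cress" together. A sketch reusing the `cruciferous` string from above:

```python
import numpy as np

cruciferous = """brocoli
cauliflower
cabbage
water cress
raddish
"""
# splitlines on a single string returns a 0-d object array; .item() extracts the list
titled = np.array(np.char.splitlines(np.char.title(cruciferous)).item())
print(titled.dtype)  # <U11
print(titled[3])     # Water Cress
```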

[`np.char.translate`](https://numpy.org/doc/stable/reference/generated/numpy.char.translate.html)

For each element in *a*, return a copy of the string where all characters occurring in the optional argument *deletechars* are removed, and the remaining characters have been mapped through the given translation table.

The example below is adapted from [this one](https://www.educative.io/answers/what-is-the-numpychartranslate-function-in-python):

In [138]:
myarray = np.array(["That", "There", "The"])
myarray

array(['That', 'There', 'The'], dtype='<U5')

In [140]:
# dict for translation table
mydict = {"T": "B", "h": "l", "e":"a", "a":"o"}

# actual translation table (str.maketrans is a static method, so call it on str)
mytable = str.maketrans(mydict)
mytable

{84: 'B', 104: 'l', 101: 'a', 97: 'o'}

In [141]:
np.char.translate(myarray, mytable)

array(['Blot', 'Blara', 'Bla'], dtype='<U5')
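The example above only maps characters. For str arrays, characters can be deleted in the table itself by mapping them to `None`, which is what the three-argument form of `str.maketrans` does. As far as I can tell, the *deletechars* argument is meant for bytes input. A sketch that removes vowels from the same words:

```python
import numpy as np

words = np.array(["That", "There", "The"])
# The third argument of str.maketrans lists characters to delete (mapped to None)
no_vowels = str.maketrans("", "", "aeiou")
result = np.char.translate(words, no_vowels)
print(result)  # ['Tht' 'Thr' 'Th']
```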

## Comparison

[`np.char.equal`](https://numpy.org/doc/stable/reference/generated/numpy.char.equal.html#numpy.char.equal)

Return (x1 == x2) element-wise.

**NOTE** The variables below serve as example data for the entire **Comparison** subsection.

In [143]:
x1 = np.array(["Hey", "Now"])
x2 = np.array(["hey", "now"])

In [144]:
np.char.equal(x1, x2)

array([False, False])

In [145]:
np.char.equal(x1, np.char.title(x2))

array([ True,  True])

[`np.char.not_equal`](https://numpy.org/doc/stable/reference/generated/numpy.char.not_equal.html)

Return (x1 != x2) element-wise.

In [146]:
np.char.not_equal(x1, x2)

array([ True,  True])

In [147]:
np.char.not_equal(x1, np.char.title(x2))

array([False, False])

[`np.char.greater_equal`](https://numpy.org/doc/stable/reference/generated/numpy.char.greater_equal.html)

Return (x1 >= x2) element-wise.

Unlike `np.greater_equal`, this comparison is performed by first stripping whitespace characters from the end of the string. This behavior is provided for backward-compatibility with numarray.

In [148]:
np.char.greater_equal(x1, x2)

array([False, False])

In [151]:
np.char.greater_equal(x1, np.char.title(x2))

array([ True,  True])
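The test data above never has trailing whitespace, so the stripping described earlier does not show up. The same stripping applies to `np.char.equal`. This sketch compares it with the plain `==` operator on sample strings that differ only by trailing spaces:

```python
import numpy as np

padded = np.array(["abc  ", "abc"])
plain = np.array(["abc", "abd"])
# np.char comparisons strip trailing whitespace before comparing
stripped_eq = np.char.equal(padded, plain)
# The == operator compares the strings exactly as stored
exact_eq = padded == plain
print(stripped_eq)  # [ True False]
print(exact_eq)     # [False False]
```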

[`np.char.less_equal`](https://numpy.org/doc/stable/reference/generated/numpy.char.less_equal.html)

Return (x1 <= x2) element-wise.

Unlike `numpy.less_equal`, this comparison is performed by first stripping whitespace characters from the end of the string. This behavior is provided for backward-compatibility with numarray.

In [153]:
np.char.less_equal(x1, x2)

array([ True,  True])

In [152]:
np.char.less_equal(x1, np.char.title(x2))

array([ True,  True])

[`np.char.greater`](https://numpy.org/doc/stable/reference/generated/numpy.char.greater.html)

Return (x1 > x2) element-wise.

In [154]:
np.char.greater(x1, x2)

array([False, False])

In [155]:
np.char.greater(x1, np.char.title(x2))

array([False, False])

[`np.char.less`](https://numpy.org/doc/stable/reference/generated/numpy.char.less.html)

Return (x1 < x2) element-wise.

In [156]:
np.char.less(x1, x2)

array([ True,  True])

In [157]:
np.char.less(x1, np.char.title(x2))

array([False, False])

[`np.char.compare_chararrays`](https://numpy.org/doc/stable/reference/generated/numpy.char.compare_chararrays.html)

Performs element-wise comparison of two string arrays using the comparison operator specified by *cmp_op*.

In [159]:
a = np.array(["a", "b", "cde"])
b = np.array(["a", "a", "dec"])
np.char.compare_chararrays(a, b, ">", True)

array([False,  True, False])
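The operator string (*cmp_op* above) can be any of `"<"`, `"<="`, `"=="`, `">="`, `">"` and `"!="`. The last argument is the flag that strips trailing whitespace. A sketch looping over a few operators with the arrays from the example above:

```python
import numpy as np

a = np.array(["a", "b", "cde"])
b = np.array(["a", "a", "dec"])
# Collect the element-wise result for several comparison operators
results = {op: np.char.compare_chararrays(a, b, op, True) for op in ["<", "<=", "==", "!="]}
for op, res in results.items():
    print(op, res)
```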

## String information
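
Here are a few functions from this group of `np.char`: `str_len`, `count`, `find` and `startswith`. Each is sketched below on sample data, and each returns a numeric or boolean array rather than strings.

```python
import numpy as np

fruit = np.array(["banana", "apple", "cherry"])
lengths = np.char.str_len(fruit)       # number of characters per element
a_counts = np.char.count(fruit, "a")   # non-overlapping occurrences of "a"
an_pos = np.char.find(fruit, "an")     # lowest index of "an", or -1 if absent
ch_start = np.char.startswith(fruit, "ch")
print(lengths)   # [6 5 6]
print(a_counts)  # [3 1 0]
print(an_pos)    # [ 1 -1 -1]
print(ch_start)  # [False False  True]
```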

## Convenience class