# 03.10 - Vectorized String Operations

To manipulate strings efficiently, Pandas goes one step further and introduces **vectorized string operations**.

### Introducing Pandas String Operations

Here is a vectorized arithmetic operation in NumPy, where a single expression acts on every element of the array:

In [1]:
import numpy as np
x = np.array([2, 3, 5, 7, 11, 13])
x * 2

array([ 4,  6, 10, 14, 22, 26])

Pandas simply extends the concept of vectorization to strings:

In [5]:
import pandas as pd
data = ['peter', 'Paul', None, 'MARY', 'gUIDO']
names = pd.Series(data)
names

0    peter
1     Paul
2     None
3     MARY
4    gUIDO
dtype: object

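For example, here is how we capitalize every entry of the Series above through its <code>str</code> attribute; the missing value is skipped rather than raising an error: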

In [6]:
names.str.capitalize()

0    Peter
1     Paul
2     None
3     Mary
4    Guido
dtype: object
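
To see why the missing value matters, the sketch below contrasts a plain Python loop with the <code>str</code> attribute on the same data. The variable name `capitalized` is only illustrative, and how the missing entry is displayed can depend on the Pandas version.

```python
import pandas as pd

data = ['peter', 'Paul', None, 'MARY', 'gUIDO']
names = pd.Series(data)

# A plain list comprehension fails as soon as it reaches the None entry,
# because None has no .capitalize() method.
try:
    [s.capitalize() for s in data]
except AttributeError as err:
    print("list comprehension failed:", err)

# The vectorized version skips the missing entry and keeps it missing.
capitalized = names.str.capitalize()
print(capitalized)
```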

We can also list all the available vectorized string operations by using tab completion after <code>names.str.</code> in an interactive session.
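
Outside an interactive session, roughly the same list can be obtained with Python's built-in `dir()`. This is a minimal sketch; the exact set of names returned depends on the installed Pandas version.

```python
import pandas as pd

names = pd.Series(['peter', 'Paul', None, 'MARY', 'gUIDO'])

# Collect the public attributes of the str accessor (skip names that start with "_").
str_methods = [name for name in dir(names.str) if not name.startswith('_')]

print(len(str_methods))
print(str_methods[:10])
```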

### Tables of Pandas String Methods

To illustrate some of the key methods available, we will be using the following series of names: 

In [8]:
monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

#### Methods similar to Python string methods

Most Pandas string methods have an equivalent built-in Python string method:

<pre>

len()     lower()       translate()   islower()
ljust()   upper()       startswith()  isupper()
rjust()   find()        endswith()    isnumeric()
center()  rfind()       isalnum()     isdecimal()
zfill()   index()       isalpha()     split()
strip()   rindex()      isdigit()     rsplit()
rstrip()  capitalize()  isspace()     partition()
lstrip()  swapcase()    istitle()     rpartition()

</pre>
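
As a quick illustration, here are a few of these methods applied to the `monte` Series defined above. The return type varies with the method: strings, integers, Booleans, or lists. The result variable names are only illustrative.

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

lowered = monte.str.lower()                 # Series of lowercase strings
lengths = monte.str.len()                   # Series of integers
starts_with_t = monte.str.startswith('T')   # Boolean mask
parts = monte.str.split()                   # Series of lists, split on whitespace

print(lengths.tolist())
print(starts_with_t.tolist())
print(parts[0])
```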

#### Methods using regular expressions

Additionally, there are several methods that accept regular expressions to examine the content of each string, following the conventions of Python's <code>re</code> module:

<pre>

Method      Description

match()     Call re.match() on each element, returning a boolean
extract()   Call re.search() on each element, returning the first match's groups as strings
findall()   Call re.findall() on each element
replace()   Replace occurrences of pattern with some other string
contains()  Call re.search() on each element, returning a boolean
count()     Count occurrences of pattern
split()     Equivalent to str.split(), but accepts regexps
rsplit()    Equivalent to str.rsplit(), splitting from the right

</pre>

With these methods, we can filter on specific criteria using regular expressions, for example finding all the names that start _and_ end with a consonant:

In [10]:
monte.str.findall(r'^[^AEIOU].*[^aeiou]$')

0    [Graham Chapman]
1                  []
2     [Terry Gilliam]
3                  []
4       [Terry Jones]
5     [Michael Palin]
dtype: object
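
The same pattern can also serve as a filter, and a capture group can pull out part of each string. The sketch below uses `contains()` to build a Boolean mask and `extract()` to grab the first run of letters in each name; the pattern `([A-Za-z]+)` and the variable names are illustrative choices, not part of the example above.

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

# Boolean mask: True where the name starts and ends with a consonant.
mask = monte.str.contains(r'^[^AEIOU].*[^aeiou]$')
consonant_names = monte[mask]

# extract() returns the first capture group of the first match in each element;
# expand=False gives a Series instead of a one-column DataFrame.
first_names = monte.str.extract(r'([A-Za-z]+)', expand=False)

print(consonant_names.tolist())
print(first_names.tolist())
```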

#### Miscellaneous methods

<pre>
Method           Description

get()            Index each element
slice()          Slice each element
slice_replace()  Replace slice in each element with passed value
cat()            Concatenate strings
repeat()         Repeat values
normalize()      Return Unicode form of string
pad()            Add whitespace to left, right, or both sides of strings
wrap()           Split long strings into lines with length less than a given width
join()           Join strings in each element of the Series with passed separator
get_dummies()    Extract dummy variables as a DataFrame
</pre>
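
A short sketch of three of these methods follows. The `info` Series and its pipe-separated codes are hypothetical data invented for this illustration; they do not come from the examples above.

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

# slice(): take the first three characters of each name.
prefixes = monte.str.slice(0, 3)

# get(): index into each element; here, the last item of each split list.
last_names = monte.str.split().str.get(-1)

# get_dummies(): split hypothetical pipe-separated codes into indicator columns.
info = pd.Series(['B|C|D', 'B|D', 'A|C', 'B|D', 'B|C', 'B|C|D'])
dummies = info.str.get_dummies('|')

print(prefixes.tolist())
print(last_names.tolist())
print(dummies)
```

Chaining `split()` with `get()` works because the `str` attribute also operates on a Series whose elements are lists.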