# Advanced Regex

## 1. Capturing Groups

Up to now, we've used the `search` function to check if a string matched a certain pattern. But the only thing we've done with the result is print it. Printing is useful when we want to see if a string matches a certain pattern. But most of the time, we want to take the information that we matched and use it for something else.

For example, we may want to extract the hostname or a process ID from a log line and use that value for another operation. For that we need to use a concept of regular expressions called **capturing groups**. 

Capturing groups are portions of the pattern that are enclosed in parentheses. Let's say that we have a list of people's full names. These names are stored as last name, comma, first name. We want to turn this around and create a string that starts with the first name followed by the last name. We can do this using a regular expression with capturing groups. Let's see how this works. 

First we'll create a matching pattern that matches a group of letters followed by a comma, a space, and then another group of letters. To capture our groups, we'll put each group of letters between parentheses like this.

In [2]:
import re
result = re.search(r'^(\w*), (\w*)$', 'Barreta, Trent')
result

<re.Match object; span=(0, 14), match='Barreta, Trent'>

Great, we have a match. Remember that `\w` will match letters, numbers, and underscores. The match object has more attributes and methods than the ones shown by print, so we are going to start using them now. Let's look at the output of the `groups` method.

In [3]:
print(result.groups())

('Barreta', 'Trent')


Because we defined two separate groups, the `groups` method returns a tuple of two elements. We can also use indexing on the match object to access these groups. Index 0 contains the text matched by the entire regular expression. Each successive index contains the text matched by the corresponding capturing group. So let's look at the element at index 0.

In [4]:
print(result[0])

Barreta, Trent


That's the whole string. The following indexes correspond to each of the captured groups. Let's check this out.

In [5]:
print(result[1])
print(result[2])

Barreta
Trent
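
As a small aside, indexing is not the only way to read the groups. The sketch below assumes Python 3.6 or later (where match objects support indexing) and shows that `result[n]` gives the same text as `result.group(n)`, while `groups()` leaves out the whole match and returns only the captured groups.

```python
import re

result = re.search(r'^(\w*), (\w*)$', 'Barreta, Trent')

# Indexing a match object gives the same text as group() with the same number.
print(result[0] == result.group(0))  # True
print(result[1] == result.group(1))  # True
print(result[2] == result.group(2))  # True

# groups() holds only the captured groups, so it can be unpacked directly.
last_name, first_name = result.groups()
print(first_name, last_name)  # Trent Barreta
```

Unpacking `groups()` into named variables can be easier to read than numeric indexes when a pattern has more than a couple of groups.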


With both groups available, we can put them together in the new order.

In [6]:
'{} {}'.format(result[2], result[1])

'Trent Barreta'

Okay, so now that we've got this more or less working, let's put this into a function that does the rearranging for us. We'll start by defining a function called `rearrange_name` that receives a name as a parameter.

In [10]:
def rearrange_name(name):
    result = re.search(r'^(\w*), (\w*)$', name)
    if result is None:
        return name
    return '{} {}'.format(result[2], result[1])

In [11]:
rearrange_name('Barreta, Trent')

'Trent Barreta'

Cool, this seems to be working. But what if we give it something a little bit more challenging?

In [12]:
rearrange_name('Nguyen, Brian E.')

'Nguyen, Brian E.'

The regular expression didn't match because we used `\w`, which only matches letters, numbers, and underscores. It doesn't match the space or the period in the given name, so the middle initial wasn't recognized as part of it. Can you figure out how to fix it?
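
To see the problem in isolation, here is a minimal sketch that applies `\w` to the given name on its own. The variable name `given_name` and the `user_42` sample are only illustrative.

```python
import re

given_name = 'Brian E.'

# \w+ stops at the space and at the period, so the name splits into two pieces.
pieces = re.findall(r'\w+', given_name)
print(pieces)  # ['Brian', 'E']

# \w does accept digits and underscores, not only letters.
print(re.findall(r'\w+', 'user_42'))  # ['user_42']

# Anchored with ^ and $, \w* cannot cover the whole given name.
whole = re.search(r'^\w*$', given_name)
print(whole)  # None
```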

What we need to do here is add the extra characters that we want to allow in the names. In this example we'd want to add spaces and dots. For other names we can also include dashes. So after updating our pattern, this is what our function would look like.

In [13]:
def rearrange_name(name):
    result = re.search(r"^([\w \.-]*), ([\w \.-]*)$", name)
    if result is None:
        return name
    return '{} {}'.format(result[2], result[1])

In [14]:
rearrange_name('Nguyen, Brian E.')

'Brian E. Nguyen'
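
As a quick check, the updated function can be run against a few more inputs. The sample names below are made up for illustration: one has dashes, and one has no comma at all, so the pattern does not match and the name comes back unchanged.

```python
import re

def rearrange_name(name):
    # Same function as above: letters, digits, underscores, spaces, dots, and dashes are allowed.
    result = re.search(r"^([\w \.-]*), ([\w \.-]*)$", name)
    if result is None:
        return name
    return '{} {}'.format(result[2], result[1])

# Hypothetical sample inputs, not taken from real data.
samples = ['Barreta, Trent', 'Nguyen, Brian E.', 'Smith-Jones, Mary-Ann', 'Madonna']
rearranged = [rearrange_name(name) for name in samples]

for name in rearranged:
    print(name)
# Trent Barreta
# Brian E. Nguyen
# Mary-Ann Smith-Jones
# Madonna
```

Names containing characters outside the bracketed set, such as apostrophes, would still be returned unchanged, so the character class may need to grow depending on the data.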