# Regular Expressions: Groups

Suppose that we'd like to extract some email addresses from a body of text. For example: 

> Students should email me at michael.perlmutter.ucla@gmail.com. 
        Nonstudents should use perlmutter@math.ucla.edu.

We'd like to extract the username and domain of each of these two email addresses.

In [4]:
s = """Students should email me at michael.perlmutter.ucla@gmail.com. 
        Nonstudents should use perlmutter@math.ucla.edu."""

For this we can use **groups**. Groups let us capture "parts" of a match, and optionally give them names, so that each part can be processed separately.

Intuitively, we are looking for: 

1. **The username**: a sequence of one or more letters, numbers, and periods, followed by
2. an `@` symbol, followed by
3. **the domain**: another sequence of letters, numbers, and periods.

The domain should not include the final `.` that ends each sentence in the string above, so we require it to end in a letter.

To see how groups work, let's take a look at an interactive demonstration in [Pythex](https://pythex.org/). 

In [5]:
# Import Python's regular expression module
import re

In [7]:
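# Extract the full email address:
# one or more letters, numbers, and periods, followed by @,
# followed by more letters, numbers, and periods, ending in a letter
# (so the sentence-ending period is not included).
# Note: [A-Za-z] is used because the range A-z also matches [ \ ] ^ _ `
pattern=r"[A-Za-z0-9.]+@[A-Za-z0-9.]+[A-Za-z]"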
result=re.search(pattern,s)
result

<re.Match object; span=(28, 61), match='michael.perlmutter.ucla@gmail.com'>
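One pitfall worth knowing when writing letter classes: the range `A-z` covers every ASCII character between `A` and `z`, which includes the punctuation characters `[`, `\`, `]`, `^`, `_`, and the backtick, whereas `A-Za-z` matches letters only. A small sketch illustrating the difference (the test string is made up for this example):

```python
import re

# A made-up string containing punctuation that sits between "Z" and "a" in ASCII
test = "ab^_cd"

# The range A-z accepts ^ and _ as if they were letters
loose = re.findall(r"[A-z]+", test)

# The ranges A-Z and a-z accept letters only
strict = re.findall(r"[A-Za-z]+", test)

print(loose)
print(strict)
```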

We have extracted the first email address, but now we need to create groups.

In [9]:
# The pattern has no groups yet, so groups() returns an empty tuple
result.group(), result.groups()

('michael.perlmutter.ucla@gmail.com', ())

We create groups by wrapping parts of the pattern in parentheses:

In [10]:
# Group 1 is the username, group 2 is the domain
pattern=r"([A-Za-z0-9.]+)@([A-Za-z0-9.]+[A-Za-z])"
result=re.search(pattern,s)
result

<re.Match object; span=(28, 61), match='michael.perlmutter.ucla@gmail.com'>

In [12]:
result.group()

'michael.perlmutter.ucla@gmail.com'

In [13]:
result.groups()

('michael.perlmutter.ucla', 'gmail.com')
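Individual groups can also be read one at a time. As a minimal sketch, reusing the string from above: `group(1)` and `group(2)` return the first and second parenthesized groups, while `group(0)` is the whole match. Python also lets us label a group with the `(?P<name>...)` syntax; the names `username` and `domain` below are our own choice for illustration.

```python
import re

s = """Students should email me at michael.perlmutter.ucla@gmail.com. 
        Nonstudents should use perlmutter@math.ucla.edu."""

# Numbered groups: group(0) is the whole match, group(1) and group(2) are the parts
pattern = r"([A-Za-z0-9.]+)@([A-Za-z0-9.]+[A-Za-z])"
result = re.search(pattern, s)
whole, username, domain = result.group(0), result.group(1), result.group(2)

# Named groups: (?P<name>...) attaches a label to a group
# (the names "username" and "domain" are chosen here for illustration)
named_pattern = r"(?P<username>[A-Za-z0-9.]+)@(?P<domain>[A-Za-z0-9.]+[A-Za-z])"
named_result = re.search(named_pattern, s)
parts = named_result.groupdict()

print(username, domain)
print(parts)
```

Named groups can make longer patterns easier to read, since `named_result.group("domain")` says what is being extracted without counting parentheses.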

Now, if we use `findall`, we get one tuple of groups for each email address in the string:

In [14]:
re.findall(pattern,s)

[('michael.perlmutter.ucla', 'gmail.com'), ('perlmutter', 'math.ucla.edu')]
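Note that `findall` returns tuples here only because the pattern contains more than one group; with no groups it returns the full matched strings instead. When we want both the full match and its parts, `re.finditer` yields one match object per email address. A short sketch of both behaviors (the dictionary keys `email`, `username`, and `domain` are our own labels):

```python
import re

s = """Students should email me at michael.perlmutter.ucla@gmail.com. 
        Nonstudents should use perlmutter@math.ucla.edu."""

grouped = r"([A-Za-z0-9.]+)@([A-Za-z0-9.]+[A-Za-z])"
ungrouped = r"[A-Za-z0-9.]+@[A-Za-z0-9.]+[A-Za-z]"

# Without groups, findall returns the full matches as strings
full_matches = re.findall(ungrouped, s)

# finditer yields Match objects, so the full match and the groups are both available
records = [
    {"email": m.group(0), "username": m.group(1), "domain": m.group(2)}
    for m in re.finditer(grouped, s)
]

print(full_matches)
for record in records:
    print(record)
```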