# Regular Expressions: Groups

Suppose that we'd like to extract some email addresses from a body of text. For example: 

> You can reach me at phil@math.ucla.edu, or my friend Jean-Luc at Picard10@ucla.edu. 

We'd like to extract the username and the domain of each of these two email addresses. 

In [1]:
s = "You can reach me at phil@math.ucla.edu, or my friend Jean-Luc at Picard10@ucla.edu."
s

'You can reach me at phil@math.ucla.edu, or my friend Jean-Luc at Picard10@ucla.edu.'

For this we can use **groups**. A group is a part of the pattern enclosed in parentheses. Groups let us capture the corresponding "parts" of each match separately, enabling further processing. 

Intuitively, we are looking for: 

1. **The username**: a sequence of one or more letters and digits, followed by 
2. An `@` symbol, followed by  
3. **The domain**: a sequence of one or more letters or the symbol `.`.
4. The domain should end in a letter, so that the sentence-ending `.` after Picard's address is not included. 
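Point 4 is easy to overlook, so here is a minimal sketch of what goes wrong without it. The variable names `with_period` and `without_period` are illustrative only, and the patterns are one possible way of writing the description above, not the only one.

```python
import re

s = "You can reach me at phil@math.ucla.edu, or my friend Jean-Luc at Picard10@ucla.edu."

# If "." is allowed anywhere in the domain, the sentence-ending period
# is captured as part of the second address.
with_period = re.findall(r"[A-Za-z0-9]+@[a-z\.]+", s)

# Requiring the domain to end in one or more letters leaves it out.
without_period = re.findall(r"[A-Za-z0-9]+@[a-z\.]+[a-z]+", s)

print(with_period)     # ['phil@math.ucla.edu', 'Picard10@ucla.edu.']
print(without_period)  # ['phil@math.ucla.edu', 'Picard10@ucla.edu']
```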

To see how groups work, let's take a look at an interactive demonstration in [Pythex](https://pythex.org/). 

In [2]:
import re

In [5]:
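# Username: letters and digits. Domain: letters and dots, ending in a letter.
# The range A-Za-z is used because A-z would also match [ \ ] ^ _ and `.
pattern = r"[A-Za-z0-9]+@[a-z\.]+[a-z]+"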
result = re.search(pattern, s)
result

<re.Match object; span=(20, 38), match='phil@math.ucla.edu'>

In [7]:
result.group(), result.groups()

('phil@math.ucla.edu', ())

Here `result.group()` returns the entire match, while `result.groups()` is empty because the pattern contains no parentheses and therefore defines no groups. Next we enclose the username and the domain in parentheses, so that each becomes a group. 

In [8]:
# Same pattern, with parentheses around the username and the domain.
pattern = r"([A-Za-z0-9]+)@([a-z\.]+[a-z]+)"
result = re.search(pattern, s)

In [10]:
result.groups()

('phil', 'math.ucla.edu')
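
Individual groups can also be retrieved one at a time. The sketch below shows access by position, plus Python's named-group syntax `(?P<name>...)`. The group names `user` and `domain` are chosen here for illustration and do not appear in the patterns above.

```python
import re

s = "You can reach me at phil@math.ucla.edu, or my friend Jean-Luc at Picard10@ucla.edu."

# Positional access: group(0) is the whole match, group(1) and group(2)
# are the first and second parenthesized groups.
result = re.search(r"([A-Za-z0-9]+)@([a-z\.]+[a-z]+)", s)
whole, user, domain = result.group(0), result.group(1), result.group(2)
print(whole, user, domain)  # phil@math.ucla.edu phil math.ucla.edu

# Named access: (?P<name>...) attaches a name to a group.
named = re.search(r"(?P<user>[A-Za-z0-9]+)@(?P<domain>[a-z\.]+[a-z]+)", s)
print(named.group("user"))  # phil
print(named.groupdict())    # {'user': 'phil', 'domain': 'math.ucla.edu'}
```

Named groups tend to be easier to read than numeric indices once a pattern has more than two or three groups.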

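`re.search()` returns only the first match. To get every match, we can use `re.findall()` instead. When the pattern contains more than one group, it returns a list of tuples: one tuple per match, with one entry per group. 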

In [11]:
re.findall(pattern, s)

[('phil', 'math.ucla.edu'), ('Picard10', 'ucla.edu')]
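
If we need more than the captured strings (for example, the position of each match), `re.finditer()` yields a match object for every match. The following is a minimal sketch; the list `records` and its dictionary keys are illustrative names.

```python
import re

s = "You can reach me at phil@math.ucla.edu, or my friend Jean-Luc at Picard10@ucla.edu."
pattern = r"([A-Za-z0-9]+)@([a-z\.]+[a-z]+)"

records = []
for m in re.finditer(pattern, s):
    # Each m is a match object, so groups() works as it does with re.search().
    user, domain = m.groups()
    records.append({"user": user, "domain": domain, "start": m.start()})

for record in records:
    print(record["user"], record["domain"])
# phil math.ucla.edu
# Picard10 ucla.edu
```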