# Regular Expression Pattern

## Character Identifiers

The following table lists the character identifiers that can be used in a pattern. Each identifier matches exactly one character from a particular class, such as any digit or any whitespace character.

<table>
  <tr>
    <th style="text-align:center;">Character</th>
    <th style="text-align:center;">Description</th>
    <th style="text-align:center;">Example Pattern Code</th>
    <th style="text-align:center;">Example Match</th>
  </tr>
  <tr>
    <td style="text-align:center;">\d</td>
    <td style="text-align:left;">A digit</td>
    <td style="text-align:center;">file_\d\d</td>
    <td style="text-align:center;">file_25</td>
  </tr>
  <tr>
    <td style="text-align:center;">\w</td>
    <td style="text-align:left;">Alphanumeric character or underscore</td>
    <td style="text-align:center;">\w-\w\w\w</td>
    <td style="text-align:center;">A-b_1</td>
  </tr>
  <tr>
    <td style="text-align:center;">\s</td>
    <td style="text-align:left;">Whitespace character (e.g. space, tab, newline)</td>
    <td style="text-align:center;">a\sb\sc</td>
    <td style="text-align:center;">a b c</td>
  </tr>
  <tr>
    <td style="text-align:center;">\D</td>
    <td style="text-align:left;">A non-digit</td>
    <td style="text-align:center;">\D\D\D</td>
    <td style="text-align:center;">ABC</td>
  </tr>
  <tr>
    <td style="text-align:center;">\W</td>
    <td style="text-align:left;">Non-word character (neither alphanumeric nor underscore)</td>
    <td style="text-align:center;">\W\W\W\W\W</td>
    <td style="text-align:center;">*-+=)</td>
  </tr>
  <tr>
    <td style="text-align:center;">\S</td>
    <td style="text-align:left;">Non-whitespace</td>
    <td style="text-align:center;">\S\S\S\S</td>
    <td style="text-align:center;">Yoyo</td>
  </tr>
</table>
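As a quick check on the table, the sketch below runs every example pattern against its example match. It uses `re.fullmatch`, which succeeds only when the pattern consumes the whole sample string. The `examples` list is simply the table copied into Python and is not part of the original notebook.

```python
import re

# (pattern, sample) pairs copied from the character identifier table above
examples = [
    (r"file_\d\d", "file_25"),
    (r"\w-\w\w\w", "A-b_1"),      # \w also matches the underscore
    (r"a\sb\sc", "a b c"),
    (r"\D\D\D", "ABC"),
    (r"\W\W\W\W\W", "*-+=)"),
    (r"\S\S\S\S", "Yoyo"),
]

for pattern, sample in examples:
    # fullmatch returns a Match object only if the entire sample matches the pattern
    matched = re.fullmatch(pattern, sample) is not None
    print(f"{pattern} matches {sample}: {matched}")
```

Every line should end in `True`. Changing any sample (for example, `"file_2x"` for the first pattern) makes the corresponding check fail.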


```python
import re

text = "The phone number of the agent is 408-555-1234. Call soon!"

phone = re.search("408-555-1234", text)  # search the text for this exact string

phone
```

```
<re.Match object; span=(33, 45), match='408-555-1234'>
```

```python
# \d matches any digit, so this pattern finds the phone number without knowing its exact digits
phone = re.search(r"\d\d\d-\d\d\d-\d\d\d\d", text)

phone
```

```
<re.Match object; span=(33, 45), match='408-555-1234'>
```
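The pattern above is written as a raw string (`r"..."`). With the `r` prefix, Python leaves backslashes alone, so the regex engine receives `\d` exactly as typed. Without it, the string literal's own escape processing runs first, and recent Python versions warn about unrecognized escapes such as `"\d"`. The sketch below shows that a raw string and a hand-escaped ordinary string are the same text.

```python
import re

text = "The phone number of the agent is 408-555-1234. Call soon!"

# A raw string keeps each backslash, so this is the 6-character text \d\d\d
raw_pattern = r"\d\d\d"
# The equivalent ordinary string needs every backslash escaped by hand
escaped_pattern = "\\d\\d\\d"

print(raw_pattern == escaped_pattern)        # True
print(len(raw_pattern))                      # 6
print(re.search(raw_pattern, text).group())  # 408
```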

```python
phone.group()  # return the text that the pattern matched
```

```
'408-555-1234'
```
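`re.search` returns only the first match. If nothing in the text matches, it returns `None`, and calling `.group()` on `None` raises an `AttributeError`. The sketch below checks for that case before extracting the text. `find_phone` is a hypothetical helper written for illustration and is not part of the `re` module.

```python
import re

def find_phone(text):
    """Hypothetical helper: return the first ddd-ddd-dddd number in text, or None."""
    match = re.search(r"\d\d\d-\d\d\d-\d\d\d\d", text)
    # re.search returns None when nothing matches, so check before calling .group()
    if match is None:
        return None
    return match.group()

print(find_phone("The phone number of the agent is 408-555-1234. Call soon!"))  # 408-555-1234
print(find_phone("No number here."))  # None
print(find_phone("111-222-3333 or 444-555-6666"))  # 111-222-3333 (first match only)
```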

## Quantifiers

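Now, if we want to search for a pattern that has a variable length, we can use quantifiers. A quantifier applies to the character (or group) immediately before it and specifies how many times that character may repeat. The following table lists the quantifiers that can be used to define a pattern that matches a variable-length string.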

<table>
  <tr>
    <th style="text-align:center;">Character</th>
    <th style="text-align:center;">Description</th>
    <th style="text-align:center;">Example Pattern Code</th>
    <th style="text-align:center;">Example Match</th>
  </tr>
  <tr>
    <td style="text-align:center;">+</td>
    <td style="text-align:left;">Occurs one or more times</td>
    <td style="text-align:center;">Version \w-\w+</td>
    <td style="text-align:center;">Version A-b1_1</td>
  </tr>
  <tr>
    <td style="text-align:center;">{3}</td>
    <td style="text-align:left;">Occurs exactly 3 times</td>
    <td style="text-align:center;">\D{3}</td>
    <td style="text-align:center;">ABC</td>
  </tr>
  <tr>
    <td style="text-align:center;">{2,4}</td>
    <td style="text-align:left;">Occurs 2 to 4 times</td>
    <td style="text-align:center;">\d{2,4}</td>
    <td style="text-align:center;">123</td>
  </tr>
  <tr>
    <td style="text-align:center;">{3,}</td>
    <td style="text-align:left;">Occurs 3 or more times</td>
    <td style="text-align:center;">\w{3,}</td>
    <td style="text-align:center;">anycharacters</td>
  </tr>
  <tr>
    <td style="text-align:center;">*</td>
    <td style="text-align:left;">Occurs zero or more times</td>
    <td style="text-align:center;">A*B*C*</td>
    <td style="text-align:center;">AAACC</td>
  </tr>
  <tr>
    <td style="text-align:center;">?</td>
    <td style="text-align:left;">Occurs zero times or once</td>
    <td style="text-align:center;">plurals?</td>
    <td style="text-align:center;">plural</td>
  </tr>
</table>

```python
# Quantifiers shorten the pattern: \d{3} means "exactly three digits in a row"
phone = re.search(r"\d{3}-\d{3}-\d{4}", text)

phone
```

```
<re.Match object; span=(33, 45), match='408-555-1234'>
```
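To see each quantifier from the table at work, the sketch below checks every example pattern against its example match. It also tries two strings that fall just outside a quantifier's allowed range. `re.fullmatch` is used deliberately: `re.search(r"\d{2,4}", "12345")` would still succeed by matching the substring `"1234"`, whereas `fullmatch` requires the whole string to fit.

```python
import re

# (pattern, sample) pairs copied from the quantifier table above
examples = [
    (r"Version \w-\w+", "Version A-b1_1"),
    (r"\D{3}", "ABC"),
    (r"\d{2,4}", "123"),
    (r"\w{3,}", "anycharacters"),
    (r"A*B*C*", "AAACC"),        # B* matches zero B's here
    (r"plurals?", "plural"),     # the trailing s is optional
]

for pattern, sample in examples:
    print(pattern, "->", re.fullmatch(pattern, sample) is not None)

# Strings just outside a quantifier's range do not fully match
print(re.fullmatch(r"\d{2,4}", "12345"))  # None: five digits exceed the upper bound of 4
print(re.fullmatch(r"\w{3,}", "ab"))      # None: two characters are below the minimum of 3
```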

### ___re.compile()___

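We can also use the ___re.compile()___ function to create a pattern object that can be reused later to search text. The parentheses in the pattern below create groups, which capture parts of the match so that each part can be retrieved on its own with ___group()___.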

```python
phone_pattern = re.compile(r"(\d{3})-(\d{3})-(\d{4})")  # create a pattern object with three groups

results = phone_pattern.search(text)  # search the text using the compiled pattern object

results
```

```
<re.Match object; span=(33, 45), match='408-555-1234'>
```
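A compiled pattern is most useful when the same expression is applied to many strings. The sketch below reuses one pattern object on a few sample strings; the extra sample strings are made up for illustration. Calling the object's `.search()` method gives the same result as passing the object to `re.search`. The `re` module also caches recently used patterns internally, so the main benefits of compiling are reuse and readability rather than a large speedup for a handful of calls.

```python
import re

# Compile once, then reuse the same pattern object on several strings
phone_pattern = re.compile(r"(\d{3})-(\d{3})-(\d{4})")

texts = [
    "The phone number of the agent is 408-555-1234. Call soon!",
    "Office: 650-555-0000",   # illustrative sample
    "No phone listed.",       # illustrative sample with no match
]

for t in texts:
    match = phone_pattern.search(t)  # method on the compiled object
    print(match.group() if match else None)
```

This prints `408-555-1234`, `650-555-0000`, and `None`, one per line.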

```python
results.group()  # return the full matched text
```

```
'408-555-1234'
```

```python
results.group(1)  # return the text captured by the first group
```

```
'408'
```

```python
results.group(2)  # return the text captured by the second group
```

```
'555'
```

```python
results.group(3)  # return the text captured by the third group
```

```
'1234'
```
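Instead of calling `group()` once per group, `groups()` returns every captured group as a tuple, which can be unpacked in one step. `group(0)` is the whole match, and passing several group numbers returns a tuple. The variable names `area_code`, `exchange`, and `line_number` are illustrative labels only.

```python
import re

text = "The phone number of the agent is 408-555-1234. Call soon!"
phone_pattern = re.compile(r"(\d{3})-(\d{3})-(\d{4})")
results = phone_pattern.search(text)

# groups() returns all captured groups as a tuple, so it can be unpacked directly
area_code, exchange, line_number = results.groups()
print(area_code, exchange, line_number)  # 408 555 1234

# group(0) is the whole match, the same as group() with no argument
print(results.group(0))  # 408-555-1234

# Several group numbers at once return a tuple
print(results.group(1, 3))  # ('408', '1234')
```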