# Overview of Regular Expressions

Regular expressions (sometimes called regex for short) allow a user to search for strings using almost any sort of rule they can come up with. For example, finding all capital letters in a string, or finding a phone number in a document.

Regular expressions are notorious for their seemingly strange syntax. This strange syntax is a byproduct of their flexibility. Regular expressions have to be able to describe any string pattern you can imagine, which is why they have a complex pattern format.

## Searching for Basic Patterns

Let's imagine that we have the following string:

In [3]:
text = "The agent's phone number is 408-555-1234. Call soon!"

We'll start off by trying to find out if the string "phone" is inside the text string. Now we could quickly do this with:

In [4]:
'phone' in text

True

But let's show the format for regular expressions, because later on we will be searching for patterns that won't have such a simple solution.

In [6]:
import re

In [8]:
pattern = 'phone'

In [10]:
re.search(pattern, text)

<_sre.SRE_Match object; span=(12, 17), match='phone'>

In [11]:
pattern = "NOT IN TEXT"

In [12]:
re.search(pattern, text)

Now we've seen that re.search() takes the pattern, scans the text, and then returns a Match object. If the pattern is not found, None is returned (in Jupyter Notebook this just means that nothing is output below the cell).
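Because re.search() can return None, code outside a notebook usually checks the result before using it. The following is a minimal sketch of that check, reusing the same text string; the `results` dictionary is a name made up for this illustration only.

```python
import re

text = "The agent's phone number is 408-555-1234. Call soon!"

# `results` is a hypothetical helper name used only for this example.
results = {}
for pattern in ["phone", "NOT IN TEXT"]:
    match = re.search(pattern, text)
    # re.search() returns None when nothing matches, so compare against None.
    results[pattern] = match is not None
    if match is None:
        print(f"{pattern!r} was not found")
    else:
        print(f"{pattern!r} was found")
```

Comparing with `is None` makes the intent explicit.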

Let's take a closer look at this Match object.

In [13]:
pattern = 'phone'

In [14]:
match = re.search(pattern, text)

In [20]:
match

<_sre.SRE_Match object; span=(12, 17), match='phone'>

Notice the span. The Match object also provides the start and end index of the match.

In [21]:
match.span()

(12, 17)

In [18]:
match.start()

12

In [19]:
match.end()

17
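The start and end indices follow the usual Python slicing convention (the end index is exclusive), so they can be used to slice the original string. The sketch below assumes the first text string from earlier in this section; the variable names `before`, `matched_text`, and `after` are introduced here for illustration, and match.group() is a standard Match method not otherwise covered above.

```python
import re

text = "The agent's phone number is 408-555-1234. Call soon!"
match = re.search("phone", text)

# span() returns the (start, end) pair; end is exclusive, like a slice.
start, end = match.span()

before = text[:start]           # everything before the match
matched_text = text[start:end]  # the matched substring itself
after = text[end:]              # everything after the match

print(repr(before))
print(repr(matched_text))
print(repr(after))

# match.group() returns the same matched substring directly.
print(match.group() == matched_text)
```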

But what if the pattern occurs more than once?

In [22]:
text = "my phone is a new phone"

In [23]:
match = re.search("phone", text)

In [24]:
match.span()

(3, 8)

Notice it only