# What is a regular expression?

A regular expression is a sequence of characters that defines a search pattern. Usually this pattern is used by string-searching algorithms for "find" or "find and replace" operations on strings. Regular expressions are often abbreviated as regex or regexp, and they are supported by many source code editors and programming languages.

For example, you might need to find all the email addresses in a text document. A regular expression defines the pattern to search for, and the search algorithm uses it to find matches such as email addresses, telephone numbers, or URLs.
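As a rough illustration, the sketch below applies a deliberately simplified email pattern with `re.findall()`, which is introduced further down. Both the pattern and the sample text are assumptions made for this example. Real-world email validation is considerably more involved.

```python
import re

# Hypothetical sample text for this illustration
sample = "Contact alice@example.com or bob.smith@mail.example.org for details."

# Simplified pattern (an assumption, NOT a complete email validator):
# name characters, '@', one domain label, a literal dot, then the rest of the domain
email_pattern = r'[\w.+-]+@[\w-]+\.[\w.-]+'

emails = re.findall(email_pattern, sample)
print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']
```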

____________

### ___Regex patterns___

A regular expression is a pattern that is matched against a subject string from left to right. Most characters stand for themselves in a pattern and match the corresponding characters in the subject. As a simple example, the pattern `the` will match any string that contains the substring `the`, such as `the`, `them`, or `therefore`.
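A small sketch of literal matching with `re.search()`, using a hypothetical word list. The function returns a match object, which is truthy, or `None`, which is falsy, so it can be used directly as a condition.

```python
import re

pattern = 'the'
words = ['the', 'them', 'therefore', 'other', 'The', 'tea']  # hypothetical sample words

# Keep every word that contains the literal substring 'the'
matched = [word for word in words if re.search(pattern, word)]
print(matched)  # ['the', 'them', 'therefore', 'other']
```

`The` is not matched because literal characters are case-sensitive by default. Passing `re.IGNORECASE` as the `flags` argument, as in `re.search(pattern, word, re.IGNORECASE)`, relaxes that.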

```python
text = "The agent's phone number is 408-555-1234. Call soon!"

# A plain substring test with Python's `in` operator (no regex involved yet)
'phone' in text
```

```
True
```

__________________

### ___re.search()___

Python's built-in `re` module provides several functions for working with regular expressions, and they are well worth learning. One of the most commonly used is `re.search()`, which scans through the string and returns a match object for the first location where the pattern matches. If no match is found, it returns `None`. The `re` module is part of the standard library, but it must be imported before use.

```python
import re

pattern = 'phone'

re.search(pattern, text)
```

```
<re.Match object; span=(12, 17), match='phone'>
```

```python
text[12:17]  # the slice of the text where the pattern was found
```

```
'phone'
```

```python
pattern = 'NOT IN TEXT'

re.search(pattern, text)  # returns None when the pattern is not found, so nothing is displayed
```
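`re.search()` returns `None` instead of raising an error when nothing matches, so code that uses the result usually checks for `None` first. Here is a minimal sketch of that check, reusing the sample text from above. The `results` dictionary exists only to collect the outcomes.

```python
import re

text = "The agent's phone number is 408-555-1234. Call soon!"

results = {}
for pattern in ['phone', 'NOT IN TEXT']:
    match = re.search(pattern, text)
    if match is None:
        # Nothing was found; calling match.span() here would raise AttributeError
        results[pattern] = None
        print(f'{pattern!r}: no match')
    else:
        results[pattern] = match.span()
        print(f'{pattern!r}: found at {match.span()}')
```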

```python
pattern = 'phone'

match = re.search(pattern, text)

match
```

```
<re.Match object; span=(12, 17), match='phone'>
```

__________________

### ___match.span()___

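The `span()` method of a match object returns the start and end indices of the match as a tuple. The end index is exclusive, so the pair can be used directly as slice bounds, as in `text[12:17]` above.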

```python
match.span()
```

```
(12, 17)
```
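Because `span()` returns a plain tuple, it can be unpacked and used to slice the original string. A short sketch, rebuilding the same `text` and `match` so it runs on its own:

```python
import re

text = "The agent's phone number is 408-555-1234. Call soon!"
match = re.search('phone', text)

start, end = match.span()  # unpack the (start, end) tuple
print(text[start:end])     # phone
print(end - start)         # 5, the length of the matched substring
```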

__________________

### ___match.start()___

The `start()` method returns the index of the first character of the matched substring. It is equivalent to `match.span()[0]`.

```python
match.start()
```

```
12
```
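One practical use of `start()` is slicing off everything that precedes the match. A minimal, self-contained sketch on the same sample text:

```python
import re

text = "The agent's phone number is 408-555-1234. Call soon!"
match = re.search('phone', text)

# Everything before the match stops just short of index match.start()
before = text[:match.start()]
print(repr(before))  # "The agent's "
```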

__________________

### ___match.end()___

The `end()` method returns the index just past the last character of the matched substring, since the end of a span is exclusive. It is equivalent to `match.span()[1]`.

```python
match.end()
```

```
17
```
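Because `end()` is exclusive, `text[match.end():]` is exactly the remainder of the string after the match, and `end() - start()` equals the length of the matched text. A quick sketch:

```python
import re

text = "The agent's phone number is 408-555-1234. Call soon!"
match = re.search('phone', text)

# text[match.end()] is the first character AFTER the match
after = text[match.end():]
print(repr(after))                                        # ' number is 408-555-1234. Call soon!'
print(match.end() - match.start() == len(match.group()))  # True
```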

```python
text = 'my phone once, my phone twice'

match = re.search(pattern, text)

match  # only the first instance is returned
```

```
<re.Match object; span=(3, 8), match='phone'>
```
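If you only need the occurrence after the first one, one option is to compile the pattern with `re.compile()` and pass a starting position to the compiled pattern's `search()` method. This is an alternative technique to the functions covered below, not one this notebook otherwise uses. The optional `pos` argument tells `search()` where to begin scanning.

```python
import re

text = 'my phone once, my phone twice'
regex = re.compile('phone')

first = regex.search(text)
# Pattern.search(string, pos) begins scanning at index pos
second = regex.search(text, first.end())
print(first.span(), second.span())  # (3, 8) (18, 23)
```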

__________________

### ___re.findall()___

The `re.findall()` function returns a list of all non-overlapping matches in the string.

```python
matches = re.findall(pattern, text)

matches  # returns all instances
```

```
['phone', 'phone']
```
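Two consequences of `findall()` returning a list: `len()` gives the number of matches, and a failed search yields an empty list rather than `None`. A short sketch:

```python
import re

text = 'my phone once, my phone twice'

phones = re.findall('phone', text)
print(len(phones))  # 2, the number of non-overlapping matches

missing = re.findall('NOT IN TEXT', text)
print(missing)      # [] rather than None
```

If the pattern contains capturing groups, `findall()` returns the captured group contents instead of the whole match.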

__________________

### ___re.finditer()___

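The `re.finditer()` function returns an iterator that yields a match object for each non-overlapping match, so the position of every match is available, not just its text.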

```python
for match in re.finditer(pattern, text):
    print(match.span())
```

```
(3, 8)
(18, 23)
```
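Since each item is a full match object, positions and matched text can be collected side by side. An iterator can only be traversed once, which is why this sketch calls `re.finditer()` a second time instead of reusing the first iterator:

```python
import re

text = 'my phone once, my phone twice'

spans = [m.span() for m in re.finditer('phone', text)]
texts = [m.group() for m in re.finditer('phone', text)]  # fresh iterator
print(spans)                               # [(3, 8), (18, 23)]
print(texts == re.findall('phone', text))  # True for a pattern without groups
```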


__________________

### ___match.group()___

With no arguments, the `group()` method returns the entire substring matched by the pattern. It is equivalent to `match[0]`.

```python
for match in re.finditer(pattern, text):
    print(match.group())
```

```
phone
phone
```
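A quick check of the equivalence on a single match. `match.group(0)` names group 0, the whole match, explicitly, and the indexing form `match[0]` requires Python 3.6 or later:

```python
import re

text = 'my phone once, my phone twice'
match = re.search('phone', text)

# Three equivalent ways to get the whole matched substring
print(match.group(), match.group(0), match[0])  # phone phone phone
```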
