# Python RegEx
- A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

- RegEx can be used to check if a string contains the specified search pattern.

# RegEx Module
- Python has a built-in package called re, which can be used to work with Regular Expressions.

## RegEx in Python
- When you have imported the re module, you can start using regular expressions:

- Search the string to see if it starts with "The" and ends with "Spain":

In [1]:
import re

# Check if the string starts with "The" and ends with "Spain":

txt = "The rain in Spain"
x = re.search(r"^The.*Spain$", txt)

if x:
    print("YES! We have a match!")
else:
    print("No match")

YES! We have a match!


## The findall() Function
- The findall() function returns a list containing all matches.

In [5]:
import re

txt = "The rain in Spain"
x = re.findall("ai", txt)

print(x)

['ai', 'ai']


- The list contains the matches in the order they are found.

- If no matches are found, an empty list is returned:

In [8]:
import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)

print(x)

[]
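
- As a small sketch of the same function with a less literal pattern, the cell below collects every whole word that ends in "ain". The pattern and the variable name `words` are illustrative choices, not part of the original example:

```python
import re

txt = "The rain in Spain"

# \b marks a word boundary and \w* matches any run of word characters,
# so this pattern picks out whole words that end in "ain".
words = re.findall(r"\b\w*ain\b", txt)

print(words)  # ['rain', 'Spain']
```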


## The search() Function
- The search() function searches the string for a match, and returns a Match object if there is a match.

- If there is more than one match, only the first occurrence of the match will be returned:

In [9]:
import re

txt = "The rain in Spain"
x = re.search(r"\s", txt)

print("The first white-space character is located in position:",
      x.start())

The first white-space character is located in position: 3



- If no matches are found, the value None is returned:

In [10]:
import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)

print(x)

None
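
- Because search() can return None, calling a method such as .start() on the result fails when nothing matched. The cell below is a minimal sketch of guarding against that. The helper name `first_position` and the choice of -1 as the "not found" value are assumptions made for illustration:

```python
import re

txt = "The rain in Spain"

def first_position(pattern, text):
    """Return the start index of the first match, or -1 if there is none."""
    match = re.search(pattern, text)
    if match is None:
        # No match: calling match.start() here would raise AttributeError.
        return -1
    return match.start()

print(first_position(r"\s", txt))       # 3
print(first_position("Portugal", txt))  # -1
```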


## The split() Function
- The split() function returns a list where the string has been split at each match:

- Split at each white-space character:

In [11]:
import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)

print(x)

['The', 'rain', 'in', 'Spain']



- You can limit the number of splits by specifying the maxsplit parameter:

In [14]:
import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)

print(x)

['The', 'rain in Spain']
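
- When the text contains runs of several white-space characters, splitting on a single `\s` produces empty strings. The cell below is a sketch of one way to handle that with `\s+`. The sample string with repeated spaces and the variable names are assumptions added for illustration:

```python
import re

# Sample text with repeated spaces (not from the original example).
txt = "The  rain in   Spain"

# Splitting at every single white-space character leaves empty strings
# wherever two separators sit next to each other.
single = re.split(r"\s", txt)

# \s+ treats each run of white-space as one separator.
runs = re.split(r"\s+", txt)

# maxsplit still limits how many splits are made.
limited = re.split(r"\s+", txt, maxsplit=2)

print(single)   # ['The', '', 'rain', 'in', '', '', 'Spain']
print(runs)     # ['The', 'rain', 'in', 'Spain']
print(limited)  # ['The', 'rain', 'in   Spain']
```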



## The sub() Function
- The sub() function replaces the matches with the text of your choice:

- Replace every white-space character with the number 9:

In [None]:
import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
# x = re.sub(r"\s", "-", txt)

print(x)

The9rain9in9Spain



- You can control the number of replacements by specifying the count parameter:

- Replace the first 2 occurrences:

In [22]:
import re

txt = "The rain in Spain"
x = re.sub(r"\s", "-", txt, count=2)

print(x)

The-rain-in Spain
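
- As a brief sketch of how count behaves, the cell below compares the default (replace everything) with count=1. It also uses re.subn(), a related function not covered above, which returns the new string together with the number of replacements made. The variable names are illustrative:

```python
import re

txt = "The rain in Spain"

# count defaults to 0, which means "replace every match".
all_replaced = re.sub(r"\s", "-", txt)

# count=1 stops after the first replacement.
first_only = re.sub(r"\s", "-", txt, count=1)

# subn() returns a (new_string, number_of_replacements) tuple.
result, n = re.subn(r"\s", "-", txt)

print(all_replaced)  # The-rain-in-Spain
print(first_only)    # The-rain in Spain
print(result, n)     # The-rain-in-Spain 3
```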



## Match Object
- A Match Object is an object containing information about the search and the result.

- Do a search that will return a Match Object:

In [23]:
import re

txt = "The rain in Spain"
x = re.search("ai", txt)

print(x)

<re.Match object; span=(5, 7), match='ai'>


- The Match object has properties and methods used to retrieve information about the search, and the result:

- .span() returns a tuple containing the start and end positions of the match.
- .string returns the string passed into the function.
- .group() returns the part of the string where there was a match.

- Print the start and end positions of the first match.

- The regular expression looks for any word that starts with an uppercase "S":

In [25]:
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)

print(x.span())

(12, 17)


- Print the string passed into the function:

In [26]:
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)

print(x.string)

The rain in Spain


- Print the part of the string where there was a match.

- The regular expression looks for any word that starts with an uppercase "S":

In [27]:
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)

print(x.group())

Spain
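
- The three cells above can be combined into one sketch that also checks for None before using the Match object. It shows that slicing the original string with the span gives the same text as .group(). The variable names `match`, `start`, and `end` are illustrative:

```python
import re

txt = "The rain in Spain"
match = re.search(r"\bS\w+", txt)

if match is not None:
    start, end = match.span()

    print(match.group())                    # Spain
    print(match.start(), match.end())       # 12 17
    # Slicing the searched string with the span reproduces the matched text.
    print(match.string[start:end] == match.group())  # True
```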
