A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.

Python has a built-in package called re, which can be used to work with Regular Expressions.

In [1]:
import re
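As a minimal sketch of the "check if a string contains the pattern" idea, `re.search` returns a Match object when the pattern occurs anywhere in the string and `None` otherwise. The variable names `has_rain` and `has_snow` are illustrative and do not come from the original notebook.

```python
import re

txt = "The rain in Spain"

# re.search returns a Match object on success and None on failure,
# so comparing against None gives a plain True/False answer.
has_rain = re.search("rain", txt) is not None
has_snow = re.search("snow", txt) is not None

print(has_rain)  # True
print(has_snow)  # False
```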

![Screenshot%20%28305%29.png](attachment:Screenshot%20%28305%29.png)

In [10]:
# findall

In [2]:
txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)

['ai', 'ai']
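A short sketch of what `re.findall` gives back when nothing matches: the result is an empty list rather than `None`, so it can be passed to `len()` or looped over without a guard. The search word "Portugal" is an arbitrary example.

```python
import re

txt = "The rain in Spain"

# Matches are returned in the order they are found.
matches = re.findall("ai", txt)

# No match produces an empty list, not None.
none_found = re.findall("Portugal", txt)

print(len(matches))  # 2
print(none_found)    # []
```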


In [11]:
# search

In [3]:
txt = "The rain in Spain"
x = re.search(" ", txt)

In [6]:
x

<re.Match object; span=(3, 4), match=' '>

In [7]:
print("The first white-space character is located in position:", x.start())

The first white-space character is located in position: 3


In [12]:
# split

In [9]:
txt = "The rain in Spain"
x = re.split(" ", txt)
print(x)

['The', 'rain', 'in', 'Spain']


In [13]:
# Note: the maxsplit parameter limits how many splits are performed.

txt = "The rain in Spain"
x = re.split(" ", txt, maxsplit=1)
print(x)

['The', 'rain in Spain']
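Because the first argument of `re.split` is a pattern and not just a literal separator, one call can split on several delimiters. The sketch below uses a made-up input string with mixed separators to show this.

```python
import re

# Hypothetical input with double spaces, a comma, and a semicolon
txt = "The  rain,in;Spain"

# [ ,;]+ matches one or more spaces, commas, or semicolons in a row
parts = re.split(r"[ ,;]+", txt)

print(parts)  # ['The', 'rain', 'in', 'Spain']
```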


In [14]:
# sub

In [15]:
txt = "The rain in Spain"
x = re.sub("in", "9", txt)
print(x)

The ra9 9 Spa9


In [16]:
txt = "The rain in Spain"
x = re.sub("in", "9", txt, count=2)
print(x)

The ra9 9 Spain
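When the number of replacements matters, `re.subn` may be a convenient alternative to `re.sub`: it returns a tuple of the new string and the number of substitutions made. A minimal sketch, with `new_txt` and `n` as illustrative names:

```python
import re

txt = "The rain in Spain"

# subn returns (new_string, number_of_replacements)
new_txt, n = re.subn("in", "9", txt)

print(new_txt)  # The ra9 9 Spa9
print(n)        # 3
```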


#### Match Object

A Match object contains information about the search and its result. If the pattern is not found, `re.search` returns `None` instead of a Match object.

In [17]:
txt = "The rain in Spain"
x = re.search("ai", txt)
print(x)

<re.Match object; span=(5, 7), match='ai'>


In [19]:
# A Match object has attributes and methods for retrieving information about the search and its result:

# .span() returns a tuple containing the start and end positions of the match
# .string returns the string passed into the function
# .group() returns the part of the string where there was a match

# Example: the regular expression looks for the first word that starts with an uppercase "S"

In [23]:
txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())

(12, 17)


In [24]:
txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)

The rain in Spain


In [29]:
txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())

Spain
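Match objects also expose `.start()`, `.end()`, and numbered capture groups. The sketch below adds two pairs of parentheses to the pattern used above, purely for illustration, so that `.group(1)` and `.group(2)` have something to return.

```python
import re

txt = "The rain in Spain"

# Group 1 captures the leading "S", group 2 the rest of the word.
m = re.search(r"\b(S)(\w+)", txt)

if m is not None:  # guard against the no-match case
    whole = m.group()      # same as m.group(0)
    first = m.group(1)
    rest = m.group(2)
    print(whole)               # Spain
    print(first, rest)         # S pain
    print(m.start(), m.end())  # 12 17
    # Slicing the original string with start/end gives the match text back
    print(txt[m.start():m.end()])  # Spain
```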


## SETS

![Screenshot%20%28306%29-2.png](attachment:Screenshot%20%28306%29-2.png)

In [30]:
txt = "The rain in Spain"

#Check if the string has any a, r, or n characters:
x = re.findall("[arn]", txt)
print(x)

['r', 'a', 'n', 'n', 'a', 'n']


In [31]:
txt = "8 times before 11:45 AM"

#Check if the string has any characters from a to z lower case, and A to Z upper case:
x = re.findall("[a-zA-Z]", txt)
print(x)

['t', 'i', 'm', 'e', 's', 'b', 'e', 'f', 'o', 'r', 'e', 'A', 'M']


In [32]:
txt = "8 times before 11:45 AM"

#Check if the string has any two-digit numbers, from 00 to 59:
x = re.findall("[0-5][0-9]", txt)
print(x)

['11', '45']


In [33]:
txt = "8 times before 11:45 AM+PM"

#Check if the string has any + characters:
x = re.findall("[+]", txt)
print(x)

['+']
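A set can also be negated: a `^` as the first character inside the brackets matches any character that is not listed. A minimal sketch, reusing the earlier sentence and excluding a, r, n, and the space character:

```python
import re

txt = "The rain in Spain"

# [^arn ] matches every character EXCEPT a, r, n, and space
others = re.findall("[^arn ]", txt)

print(others)  # ['T', 'h', 'e', 'i', 'i', 'S', 'p', 'i']
```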


## Metacharacters

![Screenshot%20%28308%29.png](attachment:Screenshot%20%28308%29.png)

In [34]:
txt = "That will be 59 dollars"

#Find all digit characters:
x = re.findall(r"\d", txt)
print(x)

['5', '9']
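Patterns that contain backslashes are usually written as raw strings (`r"..."`), so that Python does not interpret the backslash before the `re` module sees it. The sketch below shows one case where this matters: in a normal string, `"\b"` is a backspace character, not a word boundary.

```python
import re

txt = "The rain in Spain"

# Normal string: "\b" becomes a backspace character, which is not in txt
plain = re.findall("\bSpain", txt)

# Raw string: the regex engine receives \b and treats it as a word boundary
raw = re.findall(r"\bSpain", txt)

print(plain)  # []
print(raw)    # ['Spain']
```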


In [40]:
txt_1 = "Hello, World! How are you?"
txt_2 = "Hello, World!"

x = re.findall("^Hello, World!$", txt_1)
y = re.findall("^Hello, World!$", txt_2)

# The pattern matches only when the entire string is exactly "Hello, World!",
# with no characters before or after it

print(x)
print(y)

[]
['Hello, World!']
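By default `^` and `$` anchor to the start and end of the whole string. With the `re.MULTILINE` flag they also match at the start and end of each line. A minimal sketch, using a made-up two-line string:

```python
import re

# Hypothetical two-line input
txt = "Hello, World!\nHow are you?"

# Default: $ cannot match before the newline in the middle of the string
default = re.findall("^Hello, World!$", txt)

# MULTILINE: ^ and $ also match at line boundaries
multiline = re.findall("^Hello, World!$", txt, flags=re.MULTILINE)

print(default)    # []
print(multiline)  # ['Hello, World!']
```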


In [42]:
txt = "he is going to the zoo"

# Search for a sequence that starts with "is" followed by 1 or more (any) characters,
# and ends ($) with "o":

x = re.findall("is.+o$", txt)
print(x)

['is going to the zoo']
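The `+` quantifier is greedy, meaning it consumes as many characters as it can while still allowing the rest of the pattern to match. Appending `?` makes it lazy. The sketch below drops the `$` anchor to make the difference visible; the names `greedy` and `lazy` are illustrative.

```python
import re

txt = "he is going to the zoo"

# Greedy: .+ runs to the last "o" in the string
greedy = re.findall("is.+o", txt)

# Lazy: .+? stops at the first "o" it can reach
lazy = re.findall("is.+?o", txt)

print(greedy)  # ['is going to the zoo']
print(lazy)    # ['is go']
```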


In [43]:
txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains either "falls" or "stays":
x = re.findall("falls|stays", txt)
print(x)

['falls']


## Special Sequences

![Screenshot%20%28307%29.png](attachment:Screenshot%20%28307%29.png)

In [44]:
txt = "The rain in Spain"

#Check if the string starts with "The":
x = re.findall(r"\AThe", txt)
print(x)

['The']


In [52]:
txt = "Gain more knowledge to gain confidence"

# Check if "now" is present, but NOT at the beginning of a word:
x = re.findall(r"\Bnow\w*", txt)
print(x)

['nowledge']


In [53]:
txt = "Gain more knowledge to gain confidence"

# Return a match at every word character (letters a-z and A-Z, digits 0-9, and the underscore _ character):
x = re.findall(r"\w", txt)
print(x)

['G', 'a', 'i', 'n', 'm', 'o', 'r', 'e', 'k', 'n', 'o', 'w', 'l', 'e', 'd', 'g', 'e', 't', 'o', 'g', 'a', 'i', 'n', 'c', 'o', 'n', 'f', 'i', 'd', 'e', 'n', 'c', 'e']


In [54]:
txt = "The rain in Spain"

#Check if the string ends with "Spain":
x = re.findall(r"Spain\Z", txt)
print(x)

['Spain']
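Special sequences combine with quantifiers in the usual way. As a minimal sketch, `\d+` collects whole numbers instead of single digits, and `\s` matches each whitespace character. The sample string is the one from the Sets section.

```python
import re

txt = "8 times before 11:45 AM"

# \d+ matches runs of one or more digits
numbers = re.findall(r"\d+", txt)

# \s matches a single whitespace character
spaces = re.findall(r"\s", txt)

print(numbers)      # ['8', '11', '45']
print(len(spaces))  # 4
```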
