# Python RegEx

A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.



# RegEx Module

Python has a built-in package called `re`, which can be used to work with Regular Expressions.

Import the `re` module:

In [1]:
import re

# RegEx in Python

Once you have imported the `re` module, you can start using regular expressions.

The following example checks whether a string starts with "The" and ends with "Spain":


In [2]:
import re

#Check if the string starts with "The" and ends with "Spain":

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)

if x:
  print("YES! We have a match!")
else:
  print("No match")


YES! We have a match!


In [11]:
import re
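# Find every occurrence of "brahma" ("b+" means one or more "b" characters)
text = "brahma reddy is working in the soft suave technologies.brahma"
matches = re.findall("b+rahma", text)
for match in matches:
    print(match)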

brahma
brahma


# RegEx Functions

The re module offers a set of functions that allows us to search a string for a match:

<table class="ws-table-all notranslate">
<tbody><tr>
<th style="width:120px">Function</th>
<th>Description</th>
</tr>
<tr>
<td><a href="#findall">findall</a></td>
<td>Returns a list containing all matches</td>
</tr>
<tr>
<td><a href="#search">search</a></td>
<td>Returns a <a href="#matchobject">Match object</a> if there is a match anywhere in the string</td>
</tr>
<tr>
<td><a href="#split">split</a></td>
<td>Returns a list where the string has been split at each match </td>
</tr>
<tr>
<td><a href="#sub">sub</a></td>
<td>Replaces one or many matches with a string</td>
</tr>
</tbody></table>
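
As a quick orientation, the four functions in the table can be tried side by side on one string. This is only a minimal sketch; the variable names are illustrative and each function is covered in more detail in its own section.

```python
import re

txt = "The rain in Spain"

# findall: every non-overlapping match, as a list of strings
all_matches = re.findall(r"ain", txt)

# search: a Match object for the first match, or None if there is none
first = re.search(r"ain", txt)

# split: the pieces of the string that lie between the matches
parts = re.split(r" ", txt)

# sub: a new string in which every match has been replaced
replaced = re.sub(r"ain", "AIN", txt)

print(all_matches)   # ['ain', 'ain']
print(first.span())  # (5, 8)
print(parts)         # ['The', 'rain', 'in', 'Spain']
print(replaced)      # The rAIN in SpAIN
```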

# Metacharacters

Metacharacters are characters with a special meaning:

<table class="ws-table-all notranslate">
<tbody><tr>
<th style="width:120px">Character</th>
<th>Description</th>
<th style="width:120px">Example</th>
</tr>
<tr>
<td>[]</td>
<td>A set of characters</td>
<td>"[a-m]"</td>
</tr>
<tr>
<td>\</td>
<td>Signals a special sequence (can also be used to escape special characters)</td>
<td>r"\d"</td>
</tr>
<tr>
<td>.</td>
<td>Any character (except the newline character)</td>
<td>"he..o"</td>
</tr>
<tr>
<td>^</td>
<td>Starts with</td>
<td>"^hello"</td>
</tr>
<tr>
<td>$</td>
<td>Ends with</td>
<td>"planet$"</td>
</tr>
<tr>
<td>*</td>
<td>Zero or more occurrences</td>
<td>"he.*o"</td>
</tr>
<tr>
<td>+</td>
<td>One or more occurrences</td>
<td>"he.+o"</td>
</tr>
<tr>
<td>?</td>
<td>Zero or one occurrence</td>
<td>"he.?o"</td>
</tr>
<tr>
<td>{}</td>
<td>Exactly the specified number of occurrences</td>
<td>"he.{2}o"</td>
</tr>
<tr>
<td>|</td>
<td>Either or</td>
<td>"falls|stays"</td>
</tr>
<tr>
<td>()</td>
<td>Capture and group</td>
<td>&nbsp;</td>
</tr>
</tbody></table>

In [13]:
# [] set of characters
import re
x="brahma reddy"
y=re.findall("[a-b]",x)
print(y)

['b', 'a', 'a']


In [14]:
# \ signals a special sequence
import re

txt = "That will be 59 dollars"

#Find all digit characters:

x = re.findall(r"\d", txt)
print(x)


['5', '9']


In [15]:
# . any character (except the newline character)
import re

txt = "hello planet"

#Search for a sequence that starts with "he", followed by two (any) characters, and an "o":

x = re.findall("he..o", txt)
print(x)


['hello']


In [16]:
# ^ start with regex
import re

txt = "hello planet"

#Check if the string starts with 'hello':

x = re.findall("^hello", txt)
if x:
  print("Yes, the string starts with 'hello'")
else:
  print("No match")


Yes, the string starts with 'hello'


In [17]:
# $ ends with regex
import re

txt = "hello planet"

#Check if the string ends with 'planet':

x = re.findall("planet$", txt)
if x:
  print("Yes, the string ends with 'planet'")
else:
  print("No match")


Yes, the string ends with 'planet'


In [18]:
# * zero or more occurrences
import re

txt = "hello planet"

#Search for a sequence that starts with "he", followed by 0 or more (any) characters, and an "o":

x = re.findall("he.*o", txt)

print(x)

['hello']


In [23]:
# + one or more occurrences
import re
x="brahma"
y=re.findall("b.+ahma",x)
print(y)

['brahma']


In [29]:
# ? zero or one occurrence
# Two characters ("aa") sit between "br" and "hma", so ".?" cannot bridge them and nothing matches
import re
x="braahma"
y=re.findall("br.?hma",x)
print(y)

[]


In [30]:
import re

txt = "hello planet"

#Search for a sequence that starts with "he", followed by exactly 2 (any) characters, and an "o":

x = re.findall("he.{2}o", txt)

print(x)


['hello']


In [31]:
import re

txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains either "falls" or "stays":

x = re.findall("falls|stays", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")


['falls']
Yes, there is at least one match!
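
The table lists `()` without an example, so here is a minimal sketch of capture groups, combined with `{}`. The sample string is made up for illustration. It also shows the `{m,n}` form of `{}`, which accepts a range of repetitions rather than one exact count.

```python
import re

# Hypothetical sample text, chosen only for this illustration
txt = "Order 66 shipped on 2024-01-15"

# Parentheses capture the year, month and day separately.
# {4} and {2} require exactly that many digits.
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", txt)

if m:
    print(m.group())   # whole match: 2024-01-15
    print(m.group(1))  # first group: 2024
    print(m.groups())  # all groups: ('2024', '01', '15')

# {2,4} accepts between 2 and 4 digits in a row.
ranges = re.findall(r"\d{2,4}", txt)
print(ranges)          # ['66', '2024', '01', '15']
```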


# Special Sequences

A special sequence is a `\` followed by one of the characters in the list below, and has a special meaning:

<table class="ws-table-all notranslate">
<tbody><tr>
<th style="width:120px">Character</th>
<th>Description</th>
<th style="width:120px">Example</th>
</tr>
<tr>
<td>\A</td>
<td>Returns a match if the specified characters are at the beginning of the string</td>
<td>r"\AThe"</td>
</tr>
<tr>
<td>\b</td>
<td>Returns a match where the specified characters are at the beginning or at the end of a word<br>(the "r" prefix makes Python treat the pattern as a "raw string", so that \b is not read as a backspace character)</td>
<td>r"\bain"<br>r"ain\b"</td>
</tr>
<tr>
<td>\B</td>
<td>Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word</td>
<td>r"\Bain"<br>r"ain\B"</td>
</tr>
<tr>
<td>\d</td>
<td>Returns a match for each digit (numbers from 0-9)</td>
<td>r"\d"</td>
</tr>
<tr>
<td>\D</td>
<td>Returns a match for each character that is NOT a digit</td>
<td>r"\D"</td>
</tr>
<tr>
<td>\s</td>
<td>Returns a match for each white-space character</td>
<td>r"\s"</td>
</tr>
<tr>
<td>\S</td>
<td>Returns a match for each character that is NOT a white-space character</td>
<td>r"\S"</td>
</tr>
<tr>
<td>\w</td>
<td>Returns a match for each word character (letters from a to z and A to Z, digits from 0-9, and the underscore _ character)</td>
<td>r"\w"</td>
</tr>
<tr>
<td>\W</td>
<td>Returns a match for each character that is NOT a word character</td>
<td>r"\W"</td>
</tr>
<tr>
<td>\Z</td>
<td>Returns a match if the specified characters are at the end of the string</td>
<td>r"Spain\Z"</td>
</tr>
</tbody></table>
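
The raw-string note in the table is easy to overlook, so here is a small sketch of what goes wrong without the `r` prefix. In an ordinary Python string literal, `"\b"` is the backspace character, so the regex engine never sees a word-boundary sequence. The variable names are illustrative.

```python
import re

txt = "The rain in Spain"

plain = "\bain"   # backspace character followed by "ain"
raw = r"\bain"    # backslash, "b", then "ain"

print(len(plain))  # 4
print(len(raw))    # 5

# Without the r prefix the pattern looks for "ain" followed by a backspace.
without_raw = re.findall("ain\b", txt)
with_raw = re.findall(r"ain\b", txt)

print(without_raw)  # []
print(with_raw)     # ['ain', 'ain']
```

Writing every pattern as a raw string is a simple habit that avoids this class of error.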

In [32]:
import re

txt = "The rain in Spain"

#Check if the string starts with "The":

x = re.findall(r"\AThe", txt)

print(x)

if x:
  print("Yes, there is a match!")
else:
  print("No match")

['The']
Yes, there is a match!


In [33]:
import re

txt = "The rain in Spain"

#Check if "ain" is present at the beginning of a WORD:

x = re.findall(r"\bain", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")


[]
No match


In [34]:
import re

txt = "The rain in Spain"

#Check if "ain" is present at the end of a WORD:

x = re.findall(r"ain\b", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

['ain', 'ain']
Yes, there is at least one match!


In [35]:
import re

txt = "The rain in Spain"

#Check if "ain" is present, but NOT at the beginning of a word:

x = re.findall(r"\Bain", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")


['ain', 'ain']
Yes, there is at least one match!


In [36]:
import re

txt = "The rain in Spain"

#Check if "ain" is present, but NOT at the end of a word:

x = re.findall(r"ain\B", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

[]
No match


In [37]:
import re

txt = "The rain in Spain"

#Check if the string contains any digits (numbers from 0-9):

x = re.findall(r"\d", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

[]
No match


In [38]:
import re

txt = "The rain in Spain"

#Return a match at every white-space character:

x = re.findall(r"\s", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")


[' ', ' ', ' ']
Yes, there is at least one match!


In [39]:
import re

txt = "The rain in Spain"

#Return a match at every NON white-space character:

x = re.findall(r"\S", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")


['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!


In [40]:
import re

txt = "The rain in Spain"

#Return a match at every word character (letters from a to z and A to Z, digits from 0-9, and the underscore _ character):

x = re.findall(r"\w", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!


In [41]:
import re

txt = "The rain in Spain"

#Return a match at every NON word character (anything that is not a letter, digit or underscore, such as "!", "?" or white space):

x = re.findall(r"\W", txt)

print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")


[' ', ' ', ' ']
Yes, there is at least one match!


In [42]:
import re

txt = "The rain in Spain"

#Check if the string ends with "pain":

x = re.findall(r"pain\Z", txt)

print(x)

if x:
  print("Yes, there is a match!")
else:
  print("No match")


['pain']
Yes, there is a match!
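
`\A` and `\Z` look interchangeable with `^` and `$`, but they differ in two cases. The sketch below illustrates them: a trailing newline, and the `re.MULTILINE` flag. The sample strings are made up for illustration.

```python
import re

line = "The rain in Spain\n"  # note the trailing newline

# $ also matches just before a newline at the very end of the string.
dollar_hit = re.search(r"Spain$", line) is not None
# \Z matches only at the absolute end of the string.
z_hit = re.search(r"Spain\Z", line) is not None

print(dollar_hit)  # True
print(z_hit)       # False

text = "one\nThe end"

# With re.MULTILINE, ^ matches at the start of every line; \A does not.
caret_hits = re.findall(r"^The", text, re.MULTILINE)
a_hits = re.findall(r"\AThe", text, re.MULTILINE)

print(caret_hits)  # ['The']
print(a_hits)      # []
```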


# Sets

A set is a group of characters inside a pair of square brackets `[]`. It matches any single character from that group:

<table class="ws-table-all notranslate">
<tbody><tr>
<th style="width:120px">Set</th>
<th>Description</th>
</tr>
<tr>
<td>[arn]</td>
<td>Returns a match where one of the specified characters (<code class="w3-codespan">a</code>,
<code class="w3-codespan">r</code>, or <code class="w3-codespan">n</code>) is present</td>
</tr>
<tr>
<td>[a-n]</td>
<td>Returns a match for any lower case character, alphabetically between
<code class="w3-codespan">a</code> and <code class="w3-codespan">n</code></td>
</tr>
<tr>
<td>[^arn]</td>
<td>Returns a match for any character EXCEPT <code class="w3-codespan">a</code>,
<code class="w3-codespan">r</code>, and <code class="w3-codespan">n</code></td>
</tr>
<tr>
<td>[0123]</td>
<td>Returns a match where any of the specified digits (<code class="w3-codespan">0</code>,
<code class="w3-codespan">1</code>, <code class="w3-codespan">2</code>, or <code class="w3-codespan">3</code>) is present</td>
</tr>
<tr>
<td>[0-9]</td>
<td>Returns a match for any digit between
<code class="w3-codespan">0</code> and <code class="w3-codespan">9</code></td>
</tr>
<tr>
<td>[0-5][0-9]</td>
<td>Returns a match for any two-digit number from <code class="w3-codespan">00</code> to <code class="w3-codespan">59</code></td>
</tr>
<tr>
<td>[a-zA-Z]</td>
<td>Returns a match for any character alphabetically between
<code class="w3-codespan">a</code> and <code class="w3-codespan">z</code>, lower case OR upper case</td>
</tr>
<tr>
<td>[+]</td>
<td>In sets, <code class="w3-codespan">+</code>, <code class="w3-codespan">*</code>,
<code class="w3-codespan">.</code>, <code class="w3-codespan">|</code>,
<code class="w3-codespan">()</code>, <code class="w3-codespan">$</code>, and <code class="w3-codespan">{}</code>
have no special meaning, so <code class="w3-codespan">[+]</code> means: return a match for any
<code class="w3-codespan">+</code> character in the string</td>
</tr>
</tbody></table>
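
A few of the sets from the table, tried on short strings. This is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# One of the listed characters
listed = re.findall(r"[arn]", "The rain in Spain")
print(listed)    # ['r', 'a', 'n', 'n', 'a', 'n']

# Anything EXCEPT the listed characters
excluded = re.findall(r"[^arn]", "rain")
print(excluded)  # ['i']

# Two-digit numbers from 00 to 59
minutes = re.findall(r"[0-5][0-9]", "8 times before 11:45 AM")
print(minutes)   # ['11', '45']

# Inside a set, + is an ordinary character
pluses = re.findall(r"[+]", "1 + 2 + 3")
print(pluses)    # ['+', '+']

# Lower case OR upper case letters
letters = re.findall(r"[a-zA-Z]", "R2-D2")
print(letters)   # ['R', 'D']
```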

# The findall() Function

The `findall()` function returns a list containing all matches:

In [43]:
import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)

['ai', 'ai']


The list contains the matches in the order they are found.

If no matches are found, an empty list is returned:

In [46]:
import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)

[]
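
One detail worth knowing: when the pattern contains capture groups `()`, `findall()` returns the text of the groups rather than the whole match. The sketch below shows the three cases; the sample string is made up for illustration.

```python
import re

# Hypothetical sample text
txt = "width=100 height=250"

# No group: each item is the whole match.
whole = re.findall(r"\w+=\d+", txt)
print(whole)       # ['width=100', 'height=250']

# One group: each item is that group's text only.
one_group = re.findall(r"\w+=(\d+)", txt)
print(one_group)   # ['100', '250']

# Two groups: each item is a tuple holding both groups.
two_groups = re.findall(r"(\w+)=(\d+)", txt)
print(two_groups)  # [('width', '100'), ('height', '250')]
```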


# The search() Function

The `search()` function searches the string for a match, and returns a Match object if there is a match.

If there is more than one match, only the first occurrence of the match will be returned:

In [47]:
import re

txt = "The rain in Spain"
x = re.search(r"\s", txt)

print("The first white-space character is located in position:", x.start())


The first white-space character is located in position: 3


If no matches are found, the value None is returned:

In [48]:
import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)

None
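
Because `search()` can return `None`, calling a method such as `.start()` on the result without checking first raises an `AttributeError` when nothing matched. One way to guard against that is sketched below; the helper name `first_match_position` is hypothetical and not part of the `re` module.

```python
import re

def first_match_position(pattern, text):
    """Return the start index of the first match, or -1 if there is none."""
    m = re.search(pattern, text)
    if m is None:
        return -1
    return m.start()

txt = "The rain in Spain"
print(first_match_position(r"\s", txt))        # 3
print(first_match_position(r"Portugal", txt))  # -1
```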


# The split() Function

The split() function returns a list where the string has been split at each match:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)

['The', 'rain', 'in', 'Spain']

You can limit the number of splits by specifying the `maxsplit` parameter:

In [50]:
import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)

['The', 'rain in Spain']
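
When the text contains runs of white space, splitting on a single `\s` leaves empty strings in the result. The sketch below compares `\s` with `\s+`, under the assumption that a run of white space should count as one separator. The sample string is made up for illustration.

```python
import re

# Two spaces, a tab, and three spaces between the words
txt = "The  rain\tin   Spain"

# \s splits at every single white-space character.
single = re.split(r"\s", txt)
print(single)   # ['The', '', 'rain', 'in', '', '', 'Spain']

# \s+ treats a whole run of white space as one separator.
runs = re.split(r"\s+", txt)
print(runs)     # ['The', 'rain', 'in', 'Spain']

# maxsplit limits the number of splits; the remainder stays in the last item.
limited = re.split(r"\s+", txt, maxsplit=2)
print(limited)  # ['The', 'rain', 'in   Spain']
```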


# The sub() Function

The sub() function replaces the matches with the text of your choice:

In [51]:
import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)

The9rain9in9Spain


You can limit the number of replacements by specifying the `count` parameter:

In [52]:
import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x)

The9rain9in Spain
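
The replacement does not have to be fixed text. The sketch below goes slightly beyond the examples above: it shows a replacement that refers back to captured groups, a replacement computed by a function, and `re.subn()`, which also reports how many replacements were made.

```python
import re

txt = "The rain in Spain"

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+) in (\w+)", r"\2 in \1", txt)
print(swapped)  # The Spain in rain

# A function receives each Match object and returns the replacement text.
shouted = re.sub(r"\b\w*ain\b", lambda m: m.group().upper(), txt)
print(shouted)  # The RAIN in SPAIN

# subn returns the new string together with the number of replacements.
result = re.subn(r"\s", "9", txt, count=2)
print(result)   # ('The9rain9in Spain', 2)
```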


# Match Object

A Match Object is an object containing information about the search and the result.

Note: If there is no match, the value None will be returned, instead of the Match Object.

In [54]:
import re

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x) #this will print an object

<re.Match object; span=(5, 7), match='ai'>


The Match object has properties and methods used to retrieve information about the search, and the result:

<p>
<code class="w3-codespan">.span()</code> returns a tuple containing the start and end positions of the match.<br>
<code class="w3-codespan">.string</code> returns the string passed into the function.<br>
<code class="w3-codespan">.group()</code> returns the part of the string where there was a match.<br>
</p>

Print the position (start and end) of the first match.

The regular expression looks for any word that starts with an upper case "S":

In [56]:
import re

#Search for an upper case "S" character in the beginning of a word, and print its position:

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())

(12, 17)


Print the string that was passed into the function:

In [57]:
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)

The rain in Spain


Print the part of the string where there was a match.

The regular expression again looks for any word that starts with an upper case "S":

In [58]:
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())

Spain
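
The three examples above fit together: slicing the original string with the positions from `.span()` gives the same text as `.group()`. The sketch below checks this. It also uses named groups (`(?P<name>...)`), which the text above does not cover, as one possible way to pull out parts of a match. The group names `first` and `rest` are made up for illustration.

```python
import re

txt = "The rain in Spain"

# Same pattern as above, with the "S" and the remaining letters in named groups
m = re.search(r"\b(?P<first>S)(?P<rest>\w+)", txt)

if m:
    start, end = m.span()
    print(m.start(), m.end())           # 12 17
    print(txt[start:end] == m.group())  # True
    print(m.group("first"))             # S
    print(m.groupdict())                # {'first': 'S', 'rest': 'pain'}
```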

