#### Python RegEx

A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.

#### RegEx Module
Python has a built-in package called re, which can be used to work with Regular Expressions.

Import the `re` module:

In [1]:
import re

#### RegEx in Python
When you have imported the re module, you can start using regular expressions:

#### Example
Search the string to see if it starts with "The" and ends with "Spain":

In [2]:
import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)

if x :
    print("Yes! We have a match!")
else:
    print("No match")

Yes! We have a match!


#### RegEx Functions
The `re` module offers a set of functions that allow us to search a string for a match:

| Function | Description |
| --- | --- |
| `findall` | Returns a list containing all matches |
| `search` | Returns a Match object if there is a match anywhere in the string |
| `split` | Returns a list where the string has been split at each match |
| `sub` | Replaces one or many matches with a string |


#### Difference between match, search and findall

* `match`: checks for a match only at the beginning of the string.

* `search`: scans the entire string and returns a Match object for the first match, even if there is more than one.

* `findall`: returns a list of all matching substrings (plain strings, not Match objects).
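
The three functions can be compared side by side on one string. This is a minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

text = "rain in Spain, rain again"

# match() only succeeds if the pattern matches at position 0.
at_start = re.match(r"rain", text)
not_at_start = re.match(r"Spain", text)

# search() scans forward and stops at the first match anywhere.
first_hit = re.search(r"ain", text)

# findall() collects every non-overlapping match as a plain string.
all_hits = re.findall(r"ain", text)

print(at_start.group())   # rain
print(not_at_start)       # None
print(first_hit.span())   # (1, 4)
print(all_hits)           # ['ain', 'ain', 'ain', 'ain']
```

Note that `match` and `search` return either a Match object or `None`, while `findall` always returns a list (possibly empty).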


#### Metacharacters
Metacharacters are characters with a special meaning:

`1. []	A set of characters	"[a-m]"`

In [2]:
import re

txt = "The ran in Spain"
#Find all lower case characters
#alphabetically between "a" and "m":

x = re.findall("[a-m]", txt)
print(x)

['h', 'e', 'a', 'i', 'a', 'i']


In [4]:
import re 

txt = "Hello World!"
pattern = "[A-Z]"

# Find all uppercase letters
x = re.findall(pattern, txt)

print(x)

['H', 'W']


`2. \	Signals a special sequence (can also be used to escape special characters)	"\d"`

In [5]:
import re 

txt  = "that will be 59 dollars"

# Find all digit characters:

x = re.findall(r"\d", txt)
print(x)

['5', '9']


In [5]:
import re 

txt  = "that will be 59 dollars"

# Find every pair of consecutive digits:

x = re.findall(r"\d\d", txt)
print(x)

['59']


In [7]:
import re 

txt  = "that will be 59 dollars"

# Find every run of one or more digits:

x = re.findall(r"\d+", txt)
print(x)

['59']


In [11]:
import re 

txt  = "that will be 59 dollars"

# Find a digit followed by zero or more literal "d" characters
# (note: "\dd*" is not the same as "\d\d*"):

x = re.findall(r"\dd*", txt)
print(x)

['5', '9']


In [1]:
import re 

txt  = "that will be 5999 dollars"

# Find a digit followed by zero or more further digits:

x = re.findall(r"\d\d*", txt)
print(x)

['5999']


In [2]:
import re 

txt  = "that will be 5999 dollars"

# "\d*" can match zero digits, so it also yields an empty
# match at every position that is not part of a digit run:

x = re.findall(r"\d*", txt)
print(x)

['', '', '', '', '', '', '', '', '', '', '', '', '', '5999', '', '', '', '', '', '', '', '', '']


`3. .	Any character (except newline character)	"he..o"`

In [12]:
import re 
txt =  "Hello planet"
# Search for a sequence that starts with
# "He", followed by two (any) characters,
# and an "o":

x = re.findall("He..o", txt)
print(x)

['Hello']


In [13]:
import re 

txt = "Hello guys into Python world, hello"

x = re.findall("He..o|he..o", txt)

print(x)

['Hello', 'hello']


In [18]:
import re 

txt  = "that will be 59!8AAA7889dollars"

# Find all digit characters:

x = re.findall(r"\d", txt)
print(x)

['5', '9', '8', '7', '8', '8', '9']


In [21]:
import re 

txt  = "that will be 59!8AAA7889dollars"

# Find every digit together with the character that follows it:

x = re.findall(r"\d.", txt)  # "." matches any character except a newline
print(x)  # careful: "." is not limited to digits, so '8A' is matched too

['59', '8A', '78', '89']


`4. ^	Starts with	"^hello"`

In [22]:
import re

txt = "hello planet"
# Check if the string starts with "hello"

x = re.findall("^hello", txt)

if x:
    print("Yes, the string starts with 'hello'")
else:
    print("No match")

Yes, the string starts with 'hello'


`5. $	Ends with	"planet$"`

In [23]:
import re

# check if the string ends with 'planet'

txt = "hello planet"
x = re.findall("planet$", txt)

if x:
    print("Yes, the string ends with 'planet'")
else:
    print("No match")

Yes, the string ends with 'planet'


`6.  *	Zero or more occurrences	"he.*o"`

In [24]:
import re

txt = "hello planet"

# Search for a sequence that starts with 
# "he", followed by 0 or more  (any) 
# characters, and an "o":

x = re.findall("he.*o", txt)

print(x)

['hello']


In [25]:
import re 

txt  = "that will be 5979797 dollars"

# Find all digit characters:

x = re.findall(r"\d", txt)
print(x)

['5', '9', '7', '9', '7', '9', '7']


In [4]:
# but when we use '*' along with this 
import re 

txt  = "that will be 5979797 dollars"

# Find runs of digits:

x = re.findall(r"\d\d*", txt)  # one digit, then zero or more
                               # further digits, up to the last one
print(x)

['5979797']


In [33]:
# but when we use '*' along with this 
import re 

txt  = "that will be 5979797 dollars"

# Find runs of digits:

x = re.findall(r"\d\d*", txt)  # here '*' means zero or more occurrences

print(x)

['5979797']


In [38]:
# but when we use '*' along with this 
import re 

txt  = "that will be 5979797 dollars"

# "\d*" alone also matches the empty string between non-digit characters:

x = re.findall(r"\d*", txt)  # here '*' means zero or more occurrences

print(x)

['', '', '', '', '', '', '', '', '', '', '', '', '', '5979797', '', '', '', '', '', '', '', '', '']


In [34]:
# with '+' at least one digit is required, so there are no empty matches
import re

txt  = "that will be 5979797 dollars"

# Find runs of one or more digits:

x = re.findall(r"\d+", txt)

print(x)

['5979797']


In [5]:
# '*' followed by a literal 7
import re

txt  = "that will be 59!797!97 dollars"

# Find zero or more digits followed by a 7:

x = re.findall(r"\d*7", txt)
print(x)

['797', '97']


In [36]:
# '+' followed by a literal 7
import re

txt  = "that will be 59!797!97 dollars"

# Find one or more digits followed by a 7:

x = re.findall(r"\d+7", txt)
print(x)

['797', '97']


In [7]:
import re

txt = "heo planet helllll, hellolll heeeeeeeo"

# Search for an "h", followed by zero or
# more "e" characters, and then an "o"
# (the "*" applies only to the "e"):

x = re.findall("he*o", txt)  # 'e' may occur zero times or as many times as possible

print(x)

['heo', 'heeeeeeeo']


As we can see, the pattern matches an 'h', then zero or more 'e' characters, then an 'o'. 'hellolll' is skipped because an 'l' sits between the 'e' and the 'o'.

In [8]:
import re

txt = "heo planet helllll, heeeeellolll"

# Search for an "h" followed by
# zero or more "e" characters:

x = re.findall("he*", txt) 

print(x)

['he', 'he', 'heeeee']


In [4]:
import re

txt = "heeeo  planet hhhelllll, hellolll"

# Search for an "h" followed by zero or
# more "e" characters; a lone "h" also
# matches, because zero "e"s are allowed:

x = re.findall("he*", txt) 

print(x)

['heee', 'h', 'h', 'he', 'he']


`7.  +	One or more occurrences	"he.+o"`

In [9]:
import re
txt= "hello planet"

#Search for a sequence that starts with
# "he", followed by 1 or more  (any) 
# characters, and an "o":


x = re.findall("he.+o", txt)

print(x)

['hello']


In [30]:
import re

txt = "heo planet helll"

x = re.findall("he.+o", txt) # ".+" needs at least one character between "he" and "o";
                             # "heo" has none, so the regex engine returns nothing.

print(x)

[]


In [31]:
import re

txt = "heo planet helll"

x = re.findall("he.*o", txt) # "*" means zero or more occurrences, so the regex engine
                             # finds 'heo': zero characters between "he" and "o" is allowed.
    

print(x)

['heo']


In [33]:
import re

txt = "heo o planet helll" 

x = re.findall("he.+o", txt) 
# Note: ".+" is greedy, so the match starts at "he" and
# extends to the last "o" it can reach.

    
print(x) 

['heo o']
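
The "extends to the last 'o'" behaviour is called greedy matching. Python's `re` also accepts a non-greedy form, written by adding `?` after the quantifier (`.+?`), which stops at the first position where the rest of the pattern can match. The sketch below compares the two on a made-up sample string.

```python
import re

txt = "hello world, hello moon"

greedy = re.findall(r"he.+o", txt)   # ".+" takes as many characters as it can
lazy = re.findall(r"he.+?o", txt)    # ".+?" takes as few characters as it can

print(greedy)  # ['hello world, hello moo']
print(lazy)    # ['hello', 'hello']
```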


In [13]:
import re

txt = "heo planet helll hhhhhe"

x = re.findall("he+", txt) # without "."

print(x)

['he', 'he', 'he']


In [35]:
import re

txt = "heo planet helll hhhhhe"

x = re.findall("he.+", txt) # with "."

print(x)

['heo planet helll hhhhhe']


In [36]:
import re

txt = "heo planet helll hhhhhe"

x = re.findall("he", txt)

print(x)

['he', 'he', 'he']


`8.  ?	Zero or one occurrences	"he.?o"`

In [37]:
import re 

txt = "hello planet"

# Search for a sequence that starts
# with "he", followed by 0 or 1  (any) 
# character, and an "o":
x = re.findall("he.?o", txt)

print(x)

#This time we got no match, because there 
# were not zero, not one, but two characters
# between "he" and the "o"


[]


In [48]:
import re 

txt = "hello helo  heo planet"

#Search for a sequence that starts
#with "he", followed by 0 or 1  (any) 
# character, and an "o":
x = re.findall("he.?o", txt)

print(x)


['helo', 'heo']


This time we get two matches: 'helo' has one character ('l') between 'he' and 'o', and 'heo' has none. 'hello' still does not match, because it has two.

In [28]:
import re 

txt = "helo heooo oplanet"

#Search for a sequence that starts
#with "he", followed by 0 or 1  (any) 
# character, and an "o":
x = re.findall("he.?o", txt)

print(x) # in 'heooo' the first 'o' is consumed by '.?' and the second 'o' ends the match; the third 'o' is left over.

['helo', 'heoo']


`9. {}	Exactly the specified number of occurrences	"he.{2}o"`

In [12]:
import re

txt  = "hello planet"

#Search for a sequence that starts with
# "he", followed by exactly 2 (any) characters,
# and an "o":

x = re.findall("he.{2}o", txt)

print(x)

['hello']


In [16]:
import re

txt = "hello planet"

x = re.findall("pl.{3}t", txt)

print(x)

['planet']


In [29]:
import re

txt = "hello planet, planet plssst"

x = re.findall("pl.{3}", txt)

print(x)

['plane', 'plane', 'plsss']
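
Besides an exact count, the braces also accept a range: `{m,n}` means between m and n occurrences, and `{m,}` means at least m. The following sketch uses a made-up sample string to show the three forms.

```python
import re

txt = "heo helo hello helllo hellllo"

exactly_two = re.findall(r"hel{2}o", txt)     # exactly two 'l'
two_to_three = re.findall(r"hel{2,3}o", txt)  # two or three 'l'
two_or_more = re.findall(r"hel{2,}o", txt)    # at least two 'l'

print(exactly_two)   # ['hello']
print(two_to_three)  # ['hello', 'helllo']
print(two_or_more)   # ['hello', 'helllo', 'hellllo']
```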


`10.  |	Either or	"falls|stays"`

In [17]:
import re

txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains either "falls" or "stays":

x = re.findall("falls|stays", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['falls']
Yes, there is at least one match!


`11.   ()	Capture and group`

In [16]:
import re

txt = "hello planet,hello "

x = re.findall("(hello planet)", txt)

print(x)

['hello planet']


In [17]:
import re

txt = "hello, planet,hello "

x = re.findall("(hello planet)", txt) # txt has a ',' after "hello" but the pattern does not,
                                      # so there is no match

print(x)

[]


In [30]:
import re

txt = "Hope you are getting whatever you have been taught so far."

x = re.findall("(you have)", txt)

print(x)

['you have']
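
In the examples above the parentheses wrap the whole pattern, so the captured group and the full match are the same text. The effect of capturing becomes visible when only part of the pattern is grouped: `findall()` then returns the group contents instead of the whole match. A small sketch, using a made-up sample string:

```python
import re

txt = "Order 66 shipped on 2024-05-17, order 7 on 2024-06-01"

# No group: findall returns the whole match.
whole = re.findall(r"\d{4}-\d{2}-\d{2}", txt)

# One group: findall returns only the captured part.
years = re.findall(r"(\d{4})-\d{2}-\d{2}", txt)

# Several groups: findall returns one tuple per match.
parts = re.findall(r"(\d{4})-(\d{2})-(\d{2})", txt)

print(whole)  # ['2024-05-17', '2024-06-01']
print(years)  # ['2024', '2024']
print(parts)  # [('2024', '05', '17'), ('2024', '06', '01')]
```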


#### Special Sequences
`A special sequence is a '\' followed by one of the characters in the list below, and has a special meaning:`

`1. \A	Returns a match if the specified characters are at the beginning of the string	"\AThe"`

In [31]:
import re

txt = "The rain in Spain"

#Check if the string starts with "The":

x = re.findall(r"\AThe", txt)

print(x)

if x:
    print("Yes, there is a match!")
else:
    print("No match")


['The']
Yes, there is a match!


In [1]:
import re

txt = "rain in The Spain"

x = re.findall(r"\AThe", txt)

print(x)

if x:
    print("Yes, there is a match")
else:
    print("NO match")
    

[]
NO match


`2.   \b	Returns a match where the specified characters are at the beginning or at the end of a word: \bain matches 'ain' at the beginning of a word, ain\b matches 'ain' at the end of a word
(the "r" in front of the pattern makes sure that the string is treated as a "raw string")	r"\bain" or r"ain\b"`

In [2]:
txt = "this is a broad road"
import re
re.sub(r"\broad", 'rd', txt)

'this is a broad rd'

In [3]:
txt = "this is a broad road"
import re
re.sub(r"road\b", 'rd', txt) # with '\b' at the end, 'road' matches both as a
# separate word and at the end of a longer word such as 'broad'


'this is a brd rd'

In [23]:
import re

txt  = "The rain in Spain"
# Check if 'ain' is present at the beginning
# of a WORD:

x = re.findall(r"\bain", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No Match")
    

[]
No Match


In [33]:
import re

txt  = "The ain in Spain"
# Check if 'ain' is present at the beginning
# of a WORD:

x = re.findall(r"\bain", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No Match")
    

['ain']
Yes, there is at least one match!


In [25]:
# Check if 'ain' is present at the end of a WORD:

import re 
txt = "The rain in Spain"

x = re.findall(r"ain\b", txt)

print(x)

if x:
    print("Yes, there is at least one match")
else:
    print("NO match")

['ain', 'ain']
Yes, there is at least one match
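
The raw-string prefix matters for `\b` in particular, because in an ordinary Python string literal `"\b"` is the backspace character rather than a backslash followed by "b". A short sketch of the difference:

```python
import re

txt = "The ain in Spain"

plain = re.findall("\bain", txt)   # pattern starts with a backspace character
raw = re.findall(r"\bain", txt)    # pattern starts with a word boundary

print(len("\b"), len(r"\b"))  # 1 2
print(plain)                  # []
print(raw)                    # ['ain']
```

Writing every pattern as a raw string avoids having to remember which escapes Python interprets itself.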


`3.   \B	Returns a match where the specified characters are present, but NOT at the beginning (\Bain) or NOT at the end (ain\B) of a word
(the "r" in front of the pattern makes sure that the string is treated as a "raw string")	r"\Bain" or r"ain\B"`

In [26]:
### Check that the match is NOT at the beginning

import re

txt = "the rain in Spain"
# Check if "ain" is present, but NOT at 
# the beginning of a word:

x = re.findall(r"\Bain", txt)

print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match ")

['ain', 'ain']
Yes, there is at least one match!


In [27]:
## check not at end

import re
txt = "The rain in Spain"

#Check if "ain" is present, but NOT at 
#the end of a word:

x = re.findall(r"ain\B", txt)

print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

[]
No match


In [28]:
## check not at end

import re
txt = "The remainder rain in Spain"

# here 'ain' is in the middle of 'remainder',
# so this will give one match
x = re.findall(r"ain\B", txt)

print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['ain']
Yes, there is at least one match!


Here the 'ain' in the match list comes from the word 'remainder', where 'ain' is not at the end of the word. In 'rain' and 'Spain' the 'ain' is at the end of the word, so `ain\B` skips both.

In [4]:
txt = "this is a broad road"
import re
re.sub(r"\Broad", 'rd', txt)

'this is a brd road'

In [5]:
txt = "this is a broad road"
import re
re.sub(r"road\B", 'rd', txt)

'this is a broad road'


`4. \d	Returns a match where the string contains digits (numbers from 0-9)	"\d"`

In [29]:
import re
txt = "The rain in Spain"

#Check if the string contains any digits
#(numbers from 0-9):
x = re.findall(r"\d", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
    

[]
No match


In [30]:
import re
txt = "The rain123 in Spain"

#Check if the string contains any digits
#(numbers from 0-9):
x = re.findall(r"\d", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
    

['1', '2', '3']
Yes, there is at least one match!


`5.  \D	Returns a match at every character that is NOT a digit	"\D"`

In [49]:
import re

txt = "Hello13231 Planet"
# Return a match at every non-digit character

x = re.findall(r"\D", txt)
print(x)

if x:
    print("Yes, there is at least one match!")
    
else:
    print("No Match")
    

['H', 'e', 'l', 'l', 'o', ' ', 'P', 'l', 'a', 'n', 'e', 't']
Yes, there is at least one match!


In [50]:
import re

txt = "Hello_Planet122345670987"
# Return a match at every non-digit character

x = re.findall(r"\D", txt)
print(x)

if x:
    print("Yes, there is at least one match!")
    
else:
    print("No Match")

['H', 'e', 'l', 'l', 'o', '_', 'P', 'l', 'a', 'n', 'e', 't']
Yes, there is at least one match!


`6.  \s	Returns a match where the string contains a white space character	"\s"`

In [51]:
import re

txt = "The rain in Spain"

#Return a match at every white-space
#character 

x = re.findall(r"\s", txt)

print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No Match")

[' ', ' ', ' ']
Yes, there is at least one match!


In [36]:
import re

txt = "The_rain_in_Spain"

#Return a match at every white-space character 

x = re.findall(r"\s", txt)

print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No Match")

[]
No Match


`7.  \S	Returns a match at every character that is NOT a white space character	"\S"`

In [52]:
import re
#Return a match at every NON white-space
#character:
txt = "hello_planet 76587 09"
x = re.findall(r"\S", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No Match")

['h', 'e', 'l', 'l', 'o', '_', 'p', 'l', 'a', 'n', 'e', 't', '7', '6', '5', '8', '7', '0', '9']
Yes, there is at least one match!


In [38]:
import re
#Return a match at every NON white-space
#character:
txt = "hello planet"
x = re.findall(r"\S", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No Match")

['h', 'e', 'l', 'l', 'o', 'p', 'l', 'a', 'n', 'e', 't']
Yes, there is at least one match!


In other words, `\S` matches only non-white-space characters, so the space between the two words is left out.

`8.    \w	Returns a match where the string contains any word characters (letters a-z and A-Z, digits 0-9, and the underscore _ character)	"\w"`

In [53]:
import re

txt = "The rain_in Spain123"
# Return a match at every word character
# (letters a-z and A-Z, digits 0-9,
# and the underscore _ character):

x = re.findall(r"\w", txt)
print(x)

if x:
    print("yes, there is at least one match!")
else:
    print("No match")

['T', 'h', 'e', 'r', 'a', 'i', 'n', '_', 'i', 'n', 'S', 'p', 'a', 'i', 'n', '1', '2', '3']
yes, there is at least one match!


In [2]:
import re

txt = "hello@World"
# Return a match at every word character
# (letters a-z and A-Z, digits 0-9,
# and the underscore _ character):

x = re.findall(r"\w", txt)
print(x)

if x:
    print("yes, there is at least one match!")
else:
    print("No match")

['h', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']
yes, there is at least one match!


`9.   \W	Returns a match at every character that is NOT a word character	"\W"`

In [54]:
import re

txt = "The rain in Spain hello$world"

#Return a match at every NON word character
#(anything that is not a letter, digit, or
# underscore, such as "!", "?", white-space etc.):

x = re.findall(r"\W", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

[' ', ' ', ' ', ' ', '$']
Yes, there is at least one match!


In [15]:
import re

txt = "hello@_world"

#Return a match at every NON word character
#(anything that is not a letter, digit, or
# underscore, such as "!", "?", white-space etc.):

x = re.findall(r"\W", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['@']
Yes, there is at least one match!


` 10.  \Z	Returns a match if the specified characters are at the end of the string	"Spain\Z" `

In [6]:
import re

txt = "the rain in Spain"

# check if the string ends with "Spain"

x = re.findall(r"Spain\Z", txt)

print(x)

if x:
    print("Yes, there is a match!")
    
else:
    print("NO match")

['Spain']
Yes, there is a match!
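
`\Z` and the `$` metacharacter look alike, but they differ when the string ends with a newline: `$` also matches just before a final newline, while `\Z` matches only at the very end of the string. A minimal sketch, assuming a sample string with a trailing newline:

```python
import re

txt = "the rain in Spain\n"   # note the trailing newline

dollar = re.findall(r"Spain$", txt)    # "$" also matches just before a final newline
strict = re.findall(r"Spain\Z", txt)   # "\Z" matches only at the very end

print(dollar)  # ['Spain']
print(strict)  # []
```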


#### Sets
A set is a group of characters inside a pair of square brackets `[]` with a special meaning:

`1.  [arn]	Returns a match where one of the specified characters (a, r, or n) is present`

In [7]:
import re

txt = "The rain on Spain"
#Check if the string has any a, r, or n 
# characters:

x = re.findall("[arn]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
    

['r', 'a', 'n', 'n', 'a', 'n']
Yes, there is at least one match!


`2. [a-n]	Returns a match for any lower case character, alphabetically between a and n`


In [10]:
import re 
txt = "The rain in Spain"

#Check if the string has any characters 
# between a and n:

x = re.findall("[a-n]", txt)


print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No Match")

['h', 'e', 'a', 'i', 'n', 'i', 'n', 'a', 'i', 'n']
Yes, there is at least one match!


In [10]:
import re 
txt = "The rain in Spain"

#Check if the string has any characters
# between a and c:

x = re.findall("[a-c]", txt)


print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No Match")

['a', 'a']
Yes, there is at least one match!


`3.  [^arn]	Returns a match for any character EXCEPT a, r, and n`

In [16]:
import re

txt = "The rain in@ _ Spain"

#Check if the string has other characters 
# than a, r, or n:

x = re.findall("[^arn]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['T', 'h', 'e', ' ', 'i', ' ', 'i', '@', ' ', '_', ' ', 'S', 'p', 'i']
Yes, there is at least one match!


`4.   [0123]	Returns a match where any of the specified digits (0, 1, 2, or 3) are present`

In [13]:
import re

txt = "The rain in Spain"

#Check if the string has any 0, 1, 2, 
# or 3 digits:

x = re.findall("[0123]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


[]
No match


`5.  [0-9]	Returns a match for any digit between 0 and 9`

In [55]:
import re

txt = "The rain i79n Spain"

# Check if the string has any digits:

x = re.findall("[0-9]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("NO match")

['7', '9']
Yes, there is at least one match!


In [2]:
import re

txt = "The rain 1123! in Spain"

# Check if the string has any digits:

x = re.findall("[0-9]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("NO match")

['1', '1', '2', '3']
Yes, there is at least one match!


`6.  [0-5][0-9]	Returns a match for any two-digit numbers from 00 to 59`

In [17]:
import re 

txt = "8 times before 11:45 AM"
#Check if the string has any two-digit
#numbers, from 00 to 59:

x = re.findall("[0-5][0-9]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['11', '45']
Yes, there is at least one match!


In [18]:
import re 

txt = "8 times 89 45 44 before 11:45 AM"
#Check if the string has any two-digit
#numbers, from 00 to 59:

x = re.findall("[0-5][0-9]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['45', '44', '11', '45']
Yes, there is at least one match!


As we can see, the regex engine ignores 89: the first digit must be in `[0-5]`, and 8 is outside that range.

In [18]:
import re 

txt = "8 times 89:45 44 before 11:45 AM"
#Check if the string has any two-digit
#numbers, from 00 to 59:

x = re.findall("[0-5][0-9]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['45', '44', '11', '45']
Yes, there is at least one match!
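
One thing to keep in mind is that `[0-5][0-9]` matches any two adjacent digits in that range, including pairs inside a longer number. If only standalone two-digit numbers are wanted, one option is to add `\b` word boundaries on both sides. The sketch below uses a made-up sample string that contains the longer number 1234.

```python
import re

txt = "8 times 89:45 44 before 11:45 AM, code 1234"

loose = re.findall(r"[0-5][0-9]", txt)       # also picks pairs out of "1234"
strict = re.findall(r"\b[0-5][0-9]\b", txt)  # only standalone two-digit numbers

print(loose)   # ['45', '44', '11', '45', '12', '34']
print(strict)  # ['45', '44', '11', '45']
```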


`7.   [a-zA-Z]	Returns a match for any character alphabetically between a and z, lower case OR upper case`

In [5]:
import re

txt = "8 times before 11:45 AM"

#Check if the string has any characters from
#a to z lower case, and A to Z upper case:

x = re.findall("[a-zA-Z]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")



['t', 'i', 'm', 'e', 's', 'b', 'e', 'f', 'o', 'r', 'e', 'A', 'M']
Yes, there is at least one match!


`[+]	In sets, +, *, ., |, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string`

In [56]:
import re

txt = "8 times before 11:45 AM"

#Check if the string has any + characters:

x = re.findall("[+]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


[]
No match


In [20]:
import re

txt = "8 times before 11:45 AM"

#Check if the string has any e, o, or + characters:

x = re.findall("[eo+]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['e', 'e', 'o', 'e']
Yes, there is at least one match!
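
The same rule makes a set a convenient way to match a literal `.` or `*` without escaping it. A short sketch with made-up sample strings:

```python
import re

txt = "3.5 + 4*2 = 11.5"

any_char = re.findall(r".", "a.b")       # outside a set, "." matches any character
literal_dot = re.findall(r"[.]", "a.b")  # inside a set, "." is just a dot
operators = re.findall(r"[+*=]", txt)    # "+" and "*" are literal inside a set

print(any_char)     # ['a', '.', 'b']
print(literal_dot)  # ['.']
print(operators)    # ['+', '*', '=']
```

A few characters do keep a special role inside a set: `^` as the first character negates the set, `-` between two characters forms a range, and `\` still starts an escape.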


### The findall() Function
The `findall()` function returns a list containing all matches.

In [21]:
import re 

txt = "The rain in Spain"
x = re.findall("ai", txt)

print(x)

['ai', 'ai']


The list contains the matches in the order they are found.

If no matches are found, an empty list is returned:

In [22]:
import re

txt = "The rain in Spain"

x = re.findall("Portugal" , txt)

print(x)

[]


#### The search() Function
The `search()` function searches the entire string for a match, and `returns a Match object` if there is a match.

`If there is more than one match, only the first occurrence of the match will be returned:`

In [6]:
import re

txt = "The rain in Spain"

x = re.search(r"\s", txt)

print("The first white-space character is located in position: ", x.start())

The first white-space character is located in position:  3


In [7]:
x.span()

(3, 4)

In [8]:
x.group()

' '

In [9]:
x.groupdict()

{}

In [24]:
import re

txt = "The rain in Spain"

x = re.search(r"\s", txt)

print(x)

<re.Match object; span=(3, 4), match=' '>


If no matches are found, the value `None` is returned:

In [28]:
import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)

None


In [10]:
import re

txt = "The rain in Spain"

x = re.search("rain", txt)

print("The first occurrence of 'rain' starts at position: ", x.start())

The first occurrence of 'rain' starts at position:  4


In [11]:
x.span()

(4, 8)

In [12]:
x.end()

8

In [14]:
x.endpos

17

In [15]:
len(txt)

17

In [19]:
x.expand("ra")

'ra'

In [20]:
x

<re.Match object; span=(4, 8), match='rain'>

In [23]:
x.groupdict()

{}

In [25]:
x.groups()

()

In [27]:
x.string

'The rain in Spain'

In [28]:
x.pos

0

In [29]:
x.start()

4
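
`groups()` and `groupdict()` come back empty above because the pattern "rain" contains no groups. They become useful once the pattern has capture groups; `groupdict()` in particular needs named groups, written `(?P<name>...)`. A minimal sketch, where the group names `weather` and `place` are chosen only for illustration:

```python
import re

txt = "The rain in Spain"

m = re.search(r"(?P<weather>r\w+) in (?P<place>S\w+)", txt)

print(m.group())         # rain in Spain
print(m.groups())        # ('rain', 'Spain')
print(m.groupdict())     # {'weather': 'rain', 'place': 'Spain'}
print(m.group("place"))  # Spain
print(m.span("place"))   # (12, 17)
```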

### The split() Function
The `split()` function returns a list where the string has been split at each match:

In [29]:
import re

txt = "The rain in Spain"

x = re.split(r"\s", txt)
print(x)

['The', 'rain', 'in', 'Spain']


**Note:**
You can limit the number of splits by specifying the `maxsplit` parameter:

In [30]:
import re

txt = "The rain in Spain"

x = re.split(r"\s", txt, maxsplit=1)
print(x)

['The', 'rain in Spain']


In [33]:
import re

txt = "Hello World 1 2 and 121"

x = re.split(r"\d", txt)
print(x)

['Hello World ', ' ', ' and ', '', '', '']


In [34]:
import re

txt = "Hello World"

x = re.split("[A-Z]", txt)
print(x)

['', 'ello ', 'orld']


In [35]:
import re

txt = "Hello World"

x = re.split(r"\s", txt)
print(x)

['Hello', 'World']
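
As the digit example above shows, splitting on a single-character pattern produces empty strings wherever two separators sit next to each other. Splitting on one or more separators with `+` is a common way to avoid that. A short sketch, assuming a sample string with irregular spacing:

```python
import re

txt = "The  rain   in Spain"   # two and three spaces between some words

single = re.split(r"\s", txt)   # every space is its own separator
runs = re.split(r"\s+", txt)    # a run of spaces counts as one separator

print(single)  # ['The', '', 'rain', '', '', 'in', 'Spain']
print(runs)    # ['The', 'rain', 'in', 'Spain']
```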


#### The sub() Function
The `sub()` function replaces the matches with the text of your choice:

#### Example
Replace every white-space character with the number 9:

In [37]:
import re

txt = "The rain in Spain"

x = re.sub(r"\s", "9", txt)

print(x)

The9rain9in9Spain


In [38]:
import re

txt = "Hello world 123"

x = re.sub(r"\d", "!", txt)
print(x)

Hello world !!!


In [39]:
import re 

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)

print(x)

The9rain9in Spain


In [41]:
import re 

txt = "Hello world 123"
x = re.sub(r"\d", "!", txt, count=2)
print(x)

Hello world !!3


In [11]:
import re

txt  = "the broad and the road"
re.sub("road", "rd", txt)

'the brd and the rd'
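
In the last example "broad" was changed as well, because the pattern "road" matches inside it. Combining `sub()` with the `\b` word boundary restricts the replacement to the whole word. The replacement can also be a function that receives each Match object, which is sketched here with a lambda; the variable names are made up for illustration.

```python
import re

txt = "the broad and the road"

everywhere = re.sub(r"road", "rd", txt)      # replaces inside "broad" too
whole_word = re.sub(r"\broad\b", "rd", txt)  # replaces only the word "road"

# A function can compute each replacement from the Match object.
shout = re.sub(r"\b\w*road\b", lambda m: m.group().upper(), txt)

print(everywhere)  # the brd and the rd
print(whole_word)  # the broad and the rd
print(shout)       # the BROAD and the ROAD
```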

### Match Object

A Match Object is an object containing information about the search and the result.

**Note:**

If there is no match, the value `None` will be returned, instead of the Match Object.

In [42]:
import re

txt = "The rain in Spain"

x = re.search("ai", txt)
print(x) # this will print an object

<re.Match object; span=(5, 7), match='ai'>


The Match object has properties and methods used to retrieve information about the search, and the result:

1.  `.span()` returns a tuple containing the start and end positions of the match.

2.  `.string` returns the string passed into the function

3.  `.group()` returns the part of the string where there was a match

#### Example
Print the position (start- and end-position) of the first match occurrence.

The regular expression looks for any word that starts with an upper case "S":

In [43]:
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)

print(x.span())

(12, 17)


In [44]:
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)


The rain in Spain


In [45]:
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())

Spain


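**Note:**

If there is no match, the value `None` will be returned instead of the Match object, so calling `.group()` or `.span()` on the result is only safe when a match was found, as in the examples below.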

In [47]:
import re

txt = "The rain in Sa"
x = re.search(r"\bS\w+", txt)
print(x.group())

Sa


In [52]:
import re

txt = "The rain in Sqa"
x = re.search(r"\bS\w+", txt)
print(x.span())

(12, 15)
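
When a match is not guaranteed, the result should be checked before any Match method is called, because `None` has no `.group()` or `.span()`. A minimal sketch; the helper name `first_s_word` is hypothetical and only used for this illustration.

```python
import re

def first_s_word(text):
    """Return the first word starting with 'S', or None if there is none."""
    m = re.search(r"\bS\w+", text)
    if m is None:   # no match: m.group() here would raise AttributeError
        return None
    return m.group()

print(first_s_word("The rain in Spain"))     # Spain
print(first_s_word("The rain in Portugal"))  # None
```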


In [5]:
import re
password = input("Enter string to test: ")
if re.fullmatch(r'[A-Za-z0-9@#$%^&+=]{8,}', password):
    print("match") #match
else:
    print("no match")# no match

Enter string to test: qw12
no match


In [10]:
# Password validation in Python
# using naive method
  
# Function to validate the password
def password_check(passwd):
      
    SpecialSym =['$', '@', '#', '%']
    val = True
      
    if len(passwd) < 6:
        print('length should be at least 6')
        val = False
          
    if len(passwd) > 20:
        print('length should not be greater than 20')
        val = False
          
    if not any(char.isdigit() for char in passwd):
        print('Password should have at least one numeral')
        val = False
          
    if not any(char.isupper() for char in passwd):
        print('Password should have at least one uppercase letter')
        val = False
          
    if not any(char.islower() for char in passwd):
        print('Password should have at least one lowercase letter')
        val = False
          
    if not any(char in SpecialSym for char in passwd):
        print('Password should have at least one of the symbols $@#%')
        val = False
    return val
  
# Main method
def main():
    passwd = 'Geek12@'
      
    if (password_check(passwd)):
        print("Password is valid")
    else:
        print("Invalid Password !!")
          
# Driver Code        
if __name__ == '__main__':
    main()

Password is valid


In [11]:
# importing re library
import re
  
def main():
    passwd = 'Geek12@'
    reg = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*#?&])[A-Za-z\d@$!#%*?&]{6,20}$"
      
    # compiling regex
    pat = re.compile(reg)
      
    # searching regex with the compiled pattern
    mat = pat.search(passwd)
      
    # validating conditions
    if mat:
        print("Password is valid.")
    else:
        print("Password invalid !!")
  
# Driver Code     
if __name__ == '__main__':
    main()

Password is valid.
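
The one-line pattern above is dense, so it can help to read it piece by piece. The sketch below writes the same rules with the `re.VERBOSE` flag, which lets whitespace and `#` comments appear inside the pattern (outside character sets). The name `PASSWORD_RE` and the test passwords are made up for illustration.

```python
import re

# Same rules as the one-line pattern, one rule per line.
PASSWORD_RE = re.compile(r"""
    ^
    (?=.*[a-z])                 # at least one lowercase letter
    (?=.*[A-Z])                 # at least one uppercase letter
    (?=.*\d)                    # at least one digit
    (?=.*[@$!%*#?&])            # at least one of these symbols
    [A-Za-z\d@$!#%*?&]{6,20}    # only these characters, 6 to 20 of them
    $
""", re.VERBOSE)

for candidate in ["Geek12@", "geek12@", "Geek12", "Ge1@"]:
    print(candidate, bool(PASSWORD_RE.search(candidate)))
```

Each `(?=...)` is a lookahead: it checks that the condition holds somewhere ahead without consuming any characters, which is why several independent rules can be stacked at the start of the pattern.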
