### Regular Expression
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. A RegEx can be used to check whether a string contains the specified search pattern. In Python, regular expressions are provided by the built-in `re` module.

## Functions

| Function | Description |
|----------|-------------|
| findall | Returns a list containing all matches |
| search | Returns a Match object if there is a match anywhere in the string |
| split | Returns a list where the string has been split at each match |
| sub | Replaces one or many matches with a string |

- The findall() Function

  The findall() function returns a list containing all matches.


- The search() Function

  The search() function searches the string for a match, and returns a Match object if there is a match.
  
  If there is more than one match, only the first occurrence of the match will be returned:
 
 
- The split() Function

  The split() function returns a list where the string has been split at each match:
 
 
- The sub() Function

  The sub() function replaces the matches with the text of your choice:


- Match Object

  A Match Object is an object containing information about the search and the result.
    
  The Match object has properties and methods used to retrieve information about the search, and the result:

  .span() returns a tuple containing the start and end positions of the match.
  
  .string returns the string passed into the function
  
  .group() returns the part of the string where there was a match
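
As a quick orientation before the detailed sections, the sketch below runs the four functions on one sample string and reads the three Match members listed above. The variable names are illustrative only.

```python
import re

txt = "The rain in Spain"

all_matches = re.findall(r"ai", txt)   # list of every match
first = re.search(r"\s", txt)          # Match object for the first match (None if no match)
parts = re.split(r"\s", txt)           # string split at each match
replaced = re.sub(r"\s", "-", txt)     # every match replaced with "-"

print(all_matches)
print(first.span(), repr(first.group()), first.string)
print(parts)
print(replaced)
```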



#### Character set [ ]

Match any character in the set.

In [1]:
import re

txt = "The rain in Spain"

#Find all lower case characters alphabetically between "a" and "m":
x = re.findall("[a-m]", txt)
y = re.findall("[ai]", txt)

print(x)
print(y)


['h', 'e', 'a', 'i', 'i', 'a', 'i']
['a', 'i', 'i', 'a', 'i']


## \ 
Signals a special sequence (can also be used to escape special characters)

In [7]:
txt = "That will be 59$ dollars"

In [5]:
print(re.findall(r"\w\w\w\w", txt)) # returns each run of four consecutive word characters (not only four-letter words)

['That', 'will', 'doll']


In [6]:
print(re.findall(r"\w", txt)) # returns every word character in the string as a list of single-character strings

['T', 'h', 'a', 't', 'w', 'i', 'l', 'l', 'b', 'e', '5', '9', 'd', 'o', 'l', 'l', 'a', 'r', 's']


In [11]:
print(re.findall(r"\W", txt)) # every non-word character

[' ', ' ', ' ', '$', ' ']


In [71]:
print(re.findall(r"\d", txt)) # every digit

['5', '9']


In [72]:
print(re.findall(r"\d\d", txt)) # every pair of consecutive digits

['59']
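
The description above mentions that `\` can also escape special characters, which none of the cells demonstrate. The following is a minimal sketch of that use, reusing the same sample string; `re.escape()` is a standard `re` helper, and the variable names are illustrative.

```python
import re

txt = "That will be 59$ dollars"

# Unescaped, "$" is the end-of-string anchor, so this looks for a "9" at the very end.
unescaped = re.findall(r"9$", txt)

# Escaped, "\$" matches a literal dollar sign.
escaped = re.findall(r"9\$", txt)

# re.escape() adds the backslashes for every special character in a string.
auto = re.findall(re.escape("59$"), txt)

print(unescaped)
print(escaped)
print(auto)
```

Writing patterns as raw strings (`r"..."`) keeps Python from interpreting the backslash before the `re` module sees it.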


# .
Any character (except newline character)

In [95]:
a = "hello planet"

#Search for a sequence that starts with "he", followed by two (any) characters, and an "o":

print( re.findall("he..o", a))
print( re.findall("p..n", a))

['hello']
['plan']


## ^
Starts with

In [80]:
txt = "hello planet"

#Check if the string starts with 'h':

x = re.findall("^h", txt)

if x:
    print("Yes, the string starts with 'h'")
else:
    print("No match")


Yes, the string starts with 'h'


In [82]:
txt = "hello planet"

#Check if the string starts with 'hello':

x = re.findall("^hello", txt)

if x:
    print("Yes, the string starts with 'hello'")
else:
    print("No match")

Yes, the string starts with 'hello'


## $
Ends with

In [83]:
txt = "hello planet"

#Check if the string ends with 't':

x = re.findall("t$", txt)
if x:
    print("Yes, the string ends with 't'")
else:
    print("No match")

Yes, the string ends with 't'


In [84]:
txt = "hello planet"

#Check if the string ends with 'planet':

x = re.findall("planet$", txt)
if x:
    print("Yes, the string ends with 'planet'")
else:
    print("No match")


Yes, the string ends with 'planet'


# *
Zero or more occurrences

In [86]:
txt = "hello planet"

#Search for a sequence that starts with "he", followed by 0 or more (any) characters, and an "o":

print(re.findall("he.*o", txt))

['hello']


In [92]:
txt = "hello planet"

x = re.findall("p.*t", txt)

print(x)

['planet']


# +
One or more occurrences

In [13]:
txt = "hello planet"

#Search for a sequence that starts with "p", followed by 1 or more (any) characters, and a "t":

x = re.findall("p.+t", txt) # One or more occurrences
z = re.findall("p.*t", txt) # Zero or more occurrences
#Using "." four times: exactly four (any) characters between "p" and "t"
y = re.findall("p....t", txt)

print(x)
print(z)
print(y)


['planet']
['planet']
['planet']
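
All three patterns above return the same result, so the difference between `*` and `+` does not show. A small sketch, using two made-up sample words, where the two quantifiers disagree:

```python
import re

# "pt" has nothing between "p" and "t"; "pat" has one character in between.
results = {}
for word in ["pt", "pat"]:
    star = re.findall(r"p.*t", word)  # zero or more characters allowed
    plus = re.findall(r"p.+t", word)  # at least one character required
    results[word] = (star, plus)
    print(word, star, plus)
```

Only `p.*t` matches "pt", because `+` needs at least one character between the "p" and the "t".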


# ?
Zero or one occurrences

In [108]:
txt = "sky hello"

#Search for a sequence that starts with "s", followed by 0 or 1 (any) character, and a "y":

x = re.findall("s.?y", txt)

y = re.findall("h.?o", txt) 
#No match: "?" allows at most one character here, but "hello" has three characters between "h" and "o"

print(x)
print(y)

['sky']
[]
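
The `?` can also follow a specific character rather than `.`, which makes that one character optional. A short sketch with a made-up sample string:

```python
import re

txt = "color colour colouur"

# "u?" means the "u" may appear zero times or once, but not twice.
matches = re.findall(r"colou?r", txt)

print(matches)
```

The last word is skipped because it has two "u" characters where the pattern allows at most one.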


# {}
Exactly the specified number of occurrences

In [120]:
txt = "hello planet"

#Search for a sequence that starts with "he", followed by exactly 2 (any) characters, and an "o":
print(re.findall("he.{2}o", txt))

print(re.findall("p.{4}t", txt))


['hello']
['planet']
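
Besides an exact count, the braces also accept a range written as `{m,n}`. This goes slightly beyond the description above, so treat it as a supplementary sketch; the sample string is made up.

```python
import re

txt = "7 42 2024"

# Exactly two digits: "2024" is consumed as two separate pairs.
exactly_two = re.findall(r"\d{2}", txt)

# Between two and four digits, bounded by word boundaries so whole numbers are returned.
two_to_four = re.findall(r"\b\d{2,4}\b", txt)

print(exactly_two)
print(two_to_four)
```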


# |
Either or

In [121]:
txt = "The rain in Spain falls mainly in the plain!"

print(re.findall("falls", txt))

['falls']


In [123]:
# using | --> or
txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains either "falls" or "stays":

x = re.findall("falls|stays", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")



['falls']
Yes, there is at least one match!


# Special Sequences
A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

# \A
Returns a match if the specified characters are at the beginning of the string

In [17]:
txt = "The rain in Spain"

#Check if the string starts with "The":

x = re.findall(r"\AThe", txt)

print(x)

if x:
    print("Yes, there is a match!")
else:
    print("No match")


['The']
Yes, there is a match!


# \b
Returns a match where the specified characters are at the beginning or at the end of a word
(the "r" in front of the pattern makes sure that the string is treated as a "raw string"; without it, Python would read "\b" as a backspace character)

In [18]:
txt = "The rain in Spain"

#Check if "rai" is present at the beginning of a WORD:

x = re.findall(r"\brai", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['rai']
Yes, there is at least one match!


In [134]:
txt = "The rain in Spain"

#Check if "ain" is present at the end of a WORD:

x = re.findall(r"ain\b", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['ain', 'ain']
Yes, there is at least one match!


In [135]:
txt = "The rain in Spain"

#Check if "Spa" is present at the beginning of a WORD:

x = re.findall(r"\bSpa", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['Spa']
Yes, there is at least one match!


# \B
Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word
(the "r" in the beginning is making sure that the string is being treated as a "raw string")

In [144]:
txt = "The rain in Spain"

#Check if "ain" is present, but NOT at the beginning of a word:

x = re.findall(r"\Bain", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['ain', 'ain']
Yes, there is at least one match!


# \d
Returns a match where the string contains digits (numbers from 0-9)

In [136]:
txt = "The rain in Spain"

#Check if the string contains any digits (numbers from 0-9):

x = re.findall(r"\d", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


[]
No match


In [138]:
txt = "The rain99 in Spain5"

#Check if the string contains any digits (numbers from 0-9):

x = re.findall(r"\d", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['9', '9', '5']
Yes, there is at least one match!


# \D
Returns a match for every character that is NOT a digit

In [147]:
txt = "The rain in Spain99"

#Return a match at every non-digit character:

x = re.findall(r"\D", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['T', 'h', 'e', ' ', 'r', 'a', 'i', 'n', ' ', 'i', 'n', ' ', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!


# \s
Returns a match where the string contains a white space character

In [148]:
txt = "The rain in Spain"

#Return a match at every white-space character:

x = re.findall(r"\s", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


[' ', ' ', ' ']
Yes, there is at least one match!


# \S
Returns a match for every character that is NOT a white space character

In [149]:
txt = "The rain in Spain"

#Return a match at every NON white-space character:

x = re.findall(r"\S", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!


# \w
Returns a match where the string contains any word characters (letters such as a-z and A-Z, digits 0-9, and the underscore _ character)

In [151]:
txt = "The_rain in Spain99"

#Return a match at every word character (letters such as a-z and A-Z, digits 0-9, and the underscore _ character):

x = re.findall(r"\w", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['T', 'h', 'e', '_', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n', '9', '9']
Yes, there is at least one match!


# \W
Returns a match for every character that is NOT a word character

In [167]:
txt = "The@ rain? in^ ^ Spain!"

#Return a match at every NON word character (anything that is not a letter, digit, or underscore, such as "!", "?", or white space):

x = re.findall(r"\W", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['@', ' ', '?', ' ', '^', ' ', '^', ' ', '!']
Yes, there is at least one match!


# \Z
Returns a match if the specified characters are at the end of the string

In [173]:
txt = "The rain in Spain"

#Check if the string ends with "pain":

x = re.findall(r"pain\Z", txt)

print(x)

if x:
    print("Yes, there is a match!")
else:
    print("No match")

['pain']
Yes, there is a match!


In [176]:
txt = "The rain in Spain"

#Check if the string ends with "pain": using "\Z"
print(re.findall(r"pain\Z", txt))

#Check if the string ends with "Spain": using "$"
print(re.findall("pain$", txt))

['pain']
['pain']
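
The two anchors agree in the cell above, but they are not identical. In Python's `re` module, `$` also matches just before a newline at the very end of the string, while `\Z` matches only at the true end. A minimal sketch with a trailing newline added to the sample string:

```python
import re

txt = "The rain in Spain\n"  # note the trailing newline

dollar = re.findall(r"pain$", txt)      # "$" tolerates one final newline
capital_z = re.findall(r"pain\Z", txt)  # "\Z" requires the very end of the string

print(dollar)
print(capital_z)
```

This matters when checking lines read from a file, which usually still carry their newline.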


# Sets
A set is a set of characters inside a pair of square brackets [] with a special meaning:

# [arn]
Returns a match where one of the specified characters (a, r, or n) is present

In [178]:
txt = "The rain in Spain"

#Check if the string has any a, r, or n characters:

x = re.findall("[arn]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['r', 'a', 'n', 'n', 'a', 'n']
Yes, there is at least one match!


# [a-n]
Returns a match for any lower case character, alphabetically between a and n

In [179]:
txt = "The rain in Spain"

#Check if the string has any characters between a and n:

x = re.findall("[a-n]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['h', 'e', 'a', 'i', 'n', 'i', 'n', 'a', 'i', 'n']
Yes, there is at least one match!


# [^arn]
Returns a match for any character EXCEPT a, r, and n

In [180]:
txt = "The rain in Spain"

#Check if the string has other characters than a, r, or n:

x = re.findall("[^arn]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['T', 'h', 'e', ' ', 'i', ' ', 'i', ' ', 'S', 'p', 'i']
Yes, there is at least one match!


In [186]:
# Outside a set, ^ does not negate anything: it anchors the pattern to the start of the string


txt = "The rain in Spain"

#Check if the string starts with "The":

x=re.findall("^The", txt)
print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['The']
Yes, there is at least one match!
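
Because `^` means two different things depending on where it appears, a side-by-side comparison may help. This is a small sketch on a made-up one-word string:

```python
import re

txt = "rain"

# Outside the brackets: the string must START with a, r, or n.
anchored = re.findall(r"^[arn]", txt)

# Inside the brackets, as the first character: any character EXCEPT a, r, or n.
negated = re.findall(r"[^arn]", txt)

print(anchored)
print(negated)
```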


# [0123]
Returns a match where any of the specified digits (0, 1, 2, or 3) are present

In [189]:
txt = "The00 rain 11 in Spain99"

#Check if the string has any 0, 1, 2, or 3 digits:

x = re.findall("[0123]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['0', '0', '1', '1']
Yes, there is at least one match!


# [0-9]
Returns a match for any digit between 0 and 9

In [192]:

txt = "9 times before 11:45 AM"

#Check if the string has any digits:

x = re.findall("[0-9]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['9', '1', '1', '4', '5']
Yes, there is at least one match!


# [0-5][0-9]
Returns a match for any two-digit number from 00 to 59

In [193]:
txt = "8 times before 11:45 AM"

#Check if the string has any two-digit numbers, from 00 to 59:

x = re.findall("[0-5][0-9]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['11', '45']
Yes, there is at least one match!


# [a-zA-Z]
Returns a match for any character alphabetically between a and z, lower case OR upper case

In [195]:
txt = "8 times before 11:45 AM"

#Check if the string has any characters from a to z lower case, and A to Z upper case:

x = re.findall("[a-zA-Z]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")


['t', 'i', 'm', 'e', 's', 'b', 'e', 'f', 'o', 'r', 'e', 'A', 'M']
Yes, there is at least one match!


# [+]
In sets, +, *, ., |, (), $, and {} have no special meaning, so [+] means: return a match for any + character in the string

In [200]:
txt = "8 times before+ 11:45 AM"

#Check if the string has any + characters:

x = re.findall("[+]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['+']
Yes, there is at least one match!


In [201]:
txt = "8 times before @11:45 AM"

#Check if the string has any @ characters:

x = re.findall("[@]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

['@']
Yes, there is at least one match!
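
To make the rule about special characters inside sets concrete, the sketch below compares a set containing `.`, `+`, and `*` with a bare `.` outside a set. The sample string is made up for illustration.

```python
import re

txt = "1+1=2. 2*3=6."

# Inside a set, ".", "+", and "*" are ordinary characters.
literal = re.findall(r"[.+*]", txt)

# Outside a set, "." is a wildcard and matches every character in this string.
any_char = re.findall(r".", txt)

print(literal)
print(len(any_char), len(txt))
```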


# The findall() Function
The findall() function returns a list containing all matches.

In [202]:
txt = "The rain in Spain"
x = re.findall("ai", txt) #find every "ai" in txt
print(x)

['ai', 'ai']


In [203]:
txt = "The rain in Spain"
x = re.findall("Portugal", txt) #no match for "Portugal" in txt, so the list is empty
print(x)

[]


# The search() Function
- The search() function searches the string for a match, and returns a Match object if there is a match.

- If there is more than one match, only the first occurrence of the match will be returned:

In [209]:
txt = "The rain in Spain"
x = re.search(r"\s", txt) #search for the first white-space character
print(x)
print("The first white-space character is located in position:", x.start())

<re.Match object; span=(3, 4), match=' '>
The first white-space character is located in position: 3
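
When nothing matches, `re.search()` returns `None`, so calling `.start()` on the result would raise an `AttributeError`. A minimal sketch of guarding against that; the helper name `first_position` is hypothetical and not part of the `re` module.

```python
import re

def first_position(pattern, text):
    """Return the start index of the first match, or -1 if there is none."""
    match = re.search(pattern, text)
    return match.start() if match else -1

txt = "The rain in Spain"

print(first_position(r"\s", txt))        # a white-space character exists
print(first_position(r"Portugal", txt))  # no match, so the fallback value is returned
```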


# The split() Function
The split() function returns a list where the string has been split at each match:

In [218]:
# splitting at each white-space character \s
txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)

['The', 'rain', 'in', 'Spain']


In [217]:
# splitting at each ' '
txt = "The rain in Spain"
x = re.split(" ", txt) 
print(x)

['The', 'rain', 'in', 'Spain']
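
Two details of `re.split()` are not covered by the cells above: the optional `maxsplit` argument, and what happens when separators repeat. The sketch below illustrates both; the double-space string is made up.

```python
import re

txt = "The rain in Spain"

# maxsplit=1 stops after the first split; the remainder stays in one piece.
limited = re.split(r"\s", txt, maxsplit=1)

# Two consecutive spaces produce an empty string between them...
single = re.split(r"\s", "The  rain")

# ...unless the pattern matches a whole run of white space at once.
runs = re.split(r"\s+", "The  rain")

print(limited)
print(single)
print(runs)
```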


# The sub() Function
The sub() function replaces the matches with the text of your choice:

In [219]:
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)

The9rain9in9Spain


In [227]:
txt = "The rain in Spain"
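#The count argument limits the number of replacements (passed by keyword for clarity)
print(re.sub(r"\s", "9", txt, count=1))
print(re.sub(r"\s", "9", txt, count=2))
print(re.sub(r"\s", "9", txt, count=3))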


The9rain in Spain
The9rain9in Spain
The9rain9in9Spain


In [220]:
txt = "The rain in Spain"
x = re.sub("rain", "RAIN", txt)
print(x)

The RAIN in Spain


# Match Object

In [228]:
txt = "The rain in Spain"
x = re.search("ai", txt)
print(x) #this will print an object

<re.Match object; span=(5, 7), match='ai'>


#### .span() returns a tuple with the start and end positions of the match

In [234]:
txt = "The rain in Spain was heavy"
x = re.search(r"\bS\w+", txt)
print(x.span()) #(start, end) positions of the first word that starts with "S"

(12, 17)


#### .string returns the string that was passed into the function

In [244]:
txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string) #the full original string, not only the matched part

The rain in Spain


#### .group() returns the part of the string that matched the pattern

In [238]:
txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group()) #the first word that starts with "S"

Spain
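
The three members are related: slicing `.string` with the positions from `.span()` gives the same text as `.group()`. A short sketch tying them together, using the `.start()` and `.end()` methods that return the two halves of the span:

```python
import re

txt = "The rain in Spain was heavy"
m = re.search(r"\bS\w+", txt)

start, end = m.span()
sliced = m.string[start:end]  # same text that .group() returns

print(sliced == m.group())
print(m.start(), m.end())
```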
