## Python Regular Expressions
Regular expressions in Python are supported by the built-in re module, which must be imported before its functions can be used. Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.

```python
import re
pattern = re.compile("abc")    # pattern object that searches for 'abc' in a given string
print(pattern)
```

```
re.compile('abc')
```
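A compiled pattern can be reused for substitution as well as matching. The sketch below uses only the standard `search()`, `sub()` and `subn()` pattern methods; the sample strings are made up for illustration:

```python
import re

pattern = re.compile("abc")

# search() scans the whole string and returns a match object (or None)
found = pattern.search("xyzabc")
print(found.group())                              # abc

# sub() returns a new string with every match replaced
replaced = pattern.sub("XYZ", "abc and abc again")
print(replaced)                                   # XYZ and XYZ again

# subn() also reports how many replacements were made
print(pattern.subn("XYZ", "abc and abc again"))   # ('XYZ and XYZ again', 2)
```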

**match()** checks whether the pattern matches at the beginning of the text. It returns a match object if it does, and None otherwise.

```python
m = pattern.match("xyz")   # 'xyz' does not start with 'abc', so this returns None
print(m)
m = pattern.match("abc")
print(m)
```

```
None
<re.Match object; span=(0, 3), match='abc'>
```
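Because match() is anchored at the start of the string, an occurrence later in the text is not found, and since a failed match returns None, it is worth checking the result before calling match-object methods on it. A minimal sketch with illustrative strings:

```python
import re

pattern = re.compile("abc")

# match() only tries position 0, so 'abc' later in the string is missed
print(pattern.match("xyzabc"))     # None

# Test the result before using it, because a failed match is None
m = pattern.match("abcdef")
if m:
    print("Matched:", m.group())   # Matched: abc
else:
    print("No match")
```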


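We can also pass compilation flags to change how a pattern behaves, for example re.I / re.IGNORECASE (ignore case), re.M / re.MULTILINE (multi-line), re.S / re.DOTALL (dot matches any character, including a newline) and re.X / re.VERBOSE (verbose).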

```python
import re
pattern = re.compile(r"[a-z]+", re.I)
m = pattern.match("SravS")                     # re.I ignores case, so uppercase letters match too
print(m)

pattern = re.compile(r"\w+\s\d+", re.M)         # re.M only changes how ^ and $ behave; this pattern
m = pattern.match("year 2017\n month 10")       # uses neither, so match() still stops after the first line
print(m)

pattern = re.compile(
    r'''
    [\w\d.+-]+       # username
    @
    ([\w\d.]+\.)+    # domain name prefix
    (com|org|edu)    # top-level domain
    ''',
    re.VERBOSE)                                 # re.VERBOSE ignores whitespace and allows comments

m = pattern.match("sravs.a@aol.com")
print(m)
```

```
<re.Match object; span=(0, 5), match='SravS'>
<re.Match object; span=(0, 9), match='year 2017'>
<re.Match object; span=(0, 15), match='sravs.a@aol.com'>
```
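The multi-line flag only affects the anchors `^` and `$`, and the "Any Character" flag is re.S (re.DOTALL), which lets `.` match a newline. A small sketch using a made-up two-line string (slightly different from the one above, with no leading space on the second line):

```python
import re

text = "year 2017\nmonth 10"

# Without re.M, ^ matches only at the very start of the string
print(re.findall(r"^\w+", text))           # ['year']

# With re.M (re.MULTILINE), ^ also matches right after each newline
print(re.findall(r"^\w+", text, re.M))     # ['year', 'month']

# By default . does not match a newline
print(re.search(r"2017.month", text))      # None

# With re.S (re.DOTALL), . matches the newline as well
print(repr(re.search(r"2017.month", text, re.S).group()))   # '2017\nmonth'
```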


The match() method only checks whether the regex matches at the start of a string, so start() will always be zero for its matches. The **search()** method, by contrast, scans through the string, so its match may begin at any position.

```python
import re
pat = re.compile("abc")
m = pat.search("Here is abc")
print(m)
```

```
<re.Match object; span=(8, 11), match='abc'>
```
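Putting match() and search() side by side on the same text makes the difference concrete. The sketch also uses the optional `pos` argument that pattern-object match() accepts, which starts the attempt at a given index instead of 0:

```python
import re

pat = re.compile("abc")
text = "Here is abc"

# match() only tries position 0, so it fails on this text
print(pat.match(text))                    # None

# search() scans forward and returns the first occurrence
m = pat.search(text)
print(m.start(), m.end())                 # 8 11

# Anchoring the pattern with ^ makes search() fail just like match()
print(re.compile("^abc").search(text))    # None

# Pattern.match() can start at an explicit position (pos)
print(pat.match(text, 8).span())          # (8, 11)
```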


The match object's group(), start(), end() and span() methods give more information about the match.

```python
print(m.start())     # start index of the matched part
print(m.end())       # end index (exclusive) of the matched part
print(m.group())     # the matched string
print(m.span())      # tuple containing the (start, end) positions of the match
```

```
8
11
abc
(8, 11)
```
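These methods also accept a group number or name when the pattern contains parentheses. A short sketch with a capturing group and a named group `(?P<name>...)`; the pattern and text are illustrative:

```python
import re

# (\d+) is group 1; (?P<animal>\w+) is group 2, also reachable by name
pattern = re.compile(r"(\d+)\s(?P<animal>\w+)")
m = pattern.search("We have 100 pigs")

print(m.group(0))          # 100 pigs  (group 0 is the whole match)
print(m.group(1))          # 100
print(m.group("animal"))   # pigs
print(m.groups())          # ('100', 'pigs')
print(m.span(1))           # (8, 11)
print(m.groupdict())       # {'animal': 'pigs'}
```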


**findall()** returns all non-overlapping matches of a pattern as a list, unlike match() and search(), which return only the first match.

```python
import re
pattern = re.compile(r'\d+\s\w+')
m = pattern.findall('We have 100 pigs, 30 cows, 40 hens and 2 dogs in our farm')
print(m)
```

```
['100 pigs', '30 cows', '40 hens', '2 dogs']
```
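One subtlety: when the pattern contains capturing groups, findall() returns the groups rather than the whole match. A sketch on the same farm sentence:

```python
import re

text = 'We have 100 pigs, 30 cows, 40 hens and 2 dogs in our farm'

# One capturing group: each list item is just that group's text
one_group = re.findall(r'(\d+)\s\w+', text)
print(one_group)      # ['100', '30', '40', '2']

# Two capturing groups: each list item is a tuple of the groups
two_groups = re.findall(r'(\d+)\s(\w+)', text)
print(two_groups)     # [('100', 'pigs'), ('30', 'cows'), ('40', 'hens'), ('2', 'dogs')]

# A non-capturing group (?:...) keeps the whole-match behavior
non_capturing = re.findall(r'\d+\s(?:\w+)', text)
print(non_capturing)  # ['100 pigs', '30 cows', '40 hens', '2 dogs']
```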


findall() builds the complete list of matched strings before returning it, whereas **finditer()** returns an iterator that yields a match object for each match, one at a time.

```python
import re
pattern = re.compile(r'\d+\s\w+')
iterator = pattern.finditer('We have 100 pigs, 30 cows, 40 hens and 2 dogs in our farm')
for match in iterator:
    print(match.group())
```

```
100 pigs
30 cows
40 hens
2 dogs
```
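Because finditer() yields match objects rather than plain strings, positions and groups are available for every match, and the iterator can feed a generator expression directly. A sketch on the same sentence, with the pattern extended to two groups for illustration:

```python
import re

text = 'We have 100 pigs, 30 cows, 40 hens and 2 dogs in our farm'
pattern = re.compile(r'(\d+)\s(\w+)')

# Each item yielded by finditer() is a match object
results = []
for match in pattern.finditer(text):
    count, animal = match.groups()
    results.append((animal, int(count), match.span()))
print(results)
# [('pigs', 100, (8, 16)), ('cows', 30, (18, 25)), ('hens', 40, (27, 34)), ('dogs', 2, (39, 45))]

# Sum the counts without building an intermediate list of matches
total = sum(int(m.group(1)) for m in pattern.finditer(text))
print(total)   # 172
```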


We don't need to create a pattern object every time we search for a pattern. The re module also provides top-level functions such as match(), search(), findall(), finditer() and sub(), which take the pattern as the first argument and the string to search as the second. Like the corresponding pattern methods, match() and search() return either None or a match object, findall() returns a list, finditer() returns an iterator and sub() returns a new string.

```python
import re
text = 'We have 100 pigs, 30 cows, 40 hens and 2 dogs in our farm'

m = re.match(r'\d+\s\w+', text)        # None: the text does not start with digits
print(m)

m = re.search(r'\d+\s\w+', text)
print(m)
print(m.group())

m = re.findall(r'\d+\s\w+', text)
print(m)

matches = re.finditer(r'\d+\s\w+', text)   # named to avoid shadowing the built-in iter()
for m in matches:
    print(m.group())
```

```
None
<re.Match object; span=(8, 16), match='100 pigs'>
100 pigs
['100 pigs', '30 cows', '40 hens', '2 dogs']
100 pigs
30 cows
40 hens
2 dogs
```
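The module-level functions differ in what they return, and sub() is mentioned but not demonstrated above. A minimal sketch on the same sentence; the replacement strings are illustrative:

```python
import re

text = 'We have 100 pigs, 30 cows, 40 hens and 2 dogs in our farm'

first = re.search(r'\d+\s\w+', text)           # a match object (or None)
everything = re.findall(r'\d+\s\w+', text)     # a list of strings
lazy = re.finditer(r'\d+\s\w+', text)          # an iterator of match objects
print(first.group(), len(everything), next(lazy).group())
# 100 pigs 4 100 pigs

# sub() returns a new string; \2 and \1 refer to the captured groups
swapped = re.sub(r'(\d+)\s(\w+)', r'\2: \1', text)
print(swapped)
# We have pigs: 100, cows: 30, hens: 40 and dogs: 2 in our farm

# count limits how many replacements are made
once = re.sub(r'\d+', 'N', text, count=1)
print(once)
# We have N pigs, 30 cows, 40 hens and 2 dogs in our farm
```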


### References
https://docs.python.org/3/howto/regex.html <br>
Mastering Python Regular Expressions by Felix Lopez, Victor Romero