## Regular Expressions in Python
Python supports regular expressions through the 're' module, which must be imported before use.

In [1]:
import re

The most common uses of regular expressions are: <br>

 - Searching a string for a pattern (search and match) <br>
 - Finding all occurrences of a pattern (findall) <br>
 - Breaking a string into substrings (split) <br>
 - Replacing part of a string (sub) <br>

The 're' module provides various functions for querying an input string. The most commonly used ones are listed below:

 - re.match()<br>
 - re.search()<br>
 - re.findall()<br>
 - re.split()<br>
 - re.sub()<br>
 - re.compile()

#### re.match() 
This method finds a match only if it occurs at the start of the string. For example, calling match() on the string below with the pattern 'The' succeeds, whereas a pattern such as 'fox', which occurs later in the string, does not match and returns None. To print the matched text, we use the 'group' method of the returned match object. The 'start' and 'end' methods give the start and end positions of the match.

In [22]:
res1 = re.match(r'The', 'The quick brown fox jumps over the lazy dog')
print(res1)
print(res1.group(0))

res2 = re.match(r'fox', 'The quick brown fox jumps over the lazy dog')
print(res2)

print(res1.start())   # start position of matching pattern in the string
print(res1.end())     # end position of matching pattern in the string


<_sre.SRE_Match object; span=(0, 3), match='The'>
The
None
0
3
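Because match() returns None when the start of the string does not match, calling group() on the result without a check raises an AttributeError. The sketch below shows one way to guard against that; the helper name `match_at_start` is made up for this illustration and is not part of the 're' module.

```python
import re

sentence = 'The quick brown fox jumps over the lazy dog'

def match_at_start(pattern, text):
    """Hypothetical helper: return the text matched at the start, or None."""
    result = re.match(pattern, text)
    if result is None:      # re.match() returns None when there is no match
        return None
    return result.group(0)  # the full matched text

matched = match_at_start(r'The', sentence)
missed = match_at_start(r'fox', sentence)
print(matched)
print(missed)
```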


#### re.search()
This method is similar to match(), but it is not restricted to matches at the beginning of the string. search() can find a pattern at any position in the string, but it returns only the first occurrence.

In [23]:
res3 = re.search(r'fox', 'The quick brown fox jumps over the lazy dog')
print(res3.group(0))

res4 = re.search(r'fox', 'The quick brown fox jumps over the lazy dog and a cunning fox')
print(res4.group(0))

fox
fox
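To make the difference from match() concrete, the following sketch runs both functions with the same pattern on the same string and checks where search() found the first occurrence. The variable names are chosen only for this illustration.

```python
import re

text = 'The quick brown fox jumps over the lazy dog and a cunning fox'

found = re.search(r'fox', text)    # scans the whole string
at_start = re.match(r'fox', text)  # only looks at the beginning

first_span = found.span()          # (start, end) of the first occurrence
print(first_span)
print(text[first_span[0]:first_span[1]])
print(at_start)
```

Slicing the string with the reported span gives back the matched text, which confirms that only the first 'fox' is reported.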


#### re.findall()
This method returns a list of all non-overlapping matches of the pattern, found anywhere in the string. Here it returns both occurrences of fox. Unlike re.search() and re.match(), it returns the matched strings rather than match objects, so use re.findall() when you need every match, and re.search() or re.match() when you need a single match or its position.

In [24]:
res5 = re.findall(r'fox', 'The quick brown fox jumps over the lazy dog and a cunning fox')
print(res5)

['fox', 'fox']
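Two details of findall() are easy to miss: it returns an empty list (not None) when nothing matches, and when the pattern contains one capturing group it returns the text of that group instead of the whole match. A minimal sketch, using a pattern with a group that is introduced here purely as an example:

```python
import re

text = 'The quick brown fox jumps over the lazy dog and a cunning fox'

all_foxes = re.findall(r'fox', text)   # plain strings, one per match
no_cats = re.findall(r'cat', text)     # no match gives an empty list

# With one capturing group, findall() returns only the group's text:
# here, the word that comes right before each 'fox'.
words_before_fox = re.findall(r'(\w+) fox', text)

print(all_foxes)
print(no_cats)
print(words_before_fox)
```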


#### re.split()
This method splits a string at each occurrence of the given pattern. split() takes an argument called 'maxsplit' with a default value of zero, which means the string is split at every occurrence. If maxsplit is given a nonzero value, at most that many splits are made. For example, with maxsplit=1 the method splits only at the first occurrence of the pattern.

In [29]:
res6=re.split(r'o','The quick brown fox jumps over the lazy dog and a cunning fox')
print(res6)

res7=re.split(r'o','The quick brown fox jumps over the lazy dog and a cunning fox', maxsplit=1)
print(res7)

['The quick br', 'wn f', 'x jumps ', 'ver the lazy d', 'g and a cunning f', 'x']
['The quick br', 'wn fox jumps over the lazy dog and a cunning fox']
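Splitting on a single character could also be done with the string method split(); re.split() becomes more useful when the separator is itself a pattern. The sketch below assumes a made-up input string whose fields are separated by commas, semicolons, or runs of whitespace.

```python
import re

# Made-up sample data with inconsistent separators
record = 'apples, oranges;bananas   cherries'

# One or more commas, semicolons, or whitespace characters act as a separator
fields = re.split(r'[,;\s]+', record)
first_two = re.split(r'[,;\s]+', record, maxsplit=1)

print(fields)
print(first_two)
```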


#### re.sub()
This method searches for a pattern and replaces every occurrence of it with a new substring. If the pattern is not found, the string is returned unchanged.


In [30]:
res8=re.sub('fox','cat', 'The quick brown fox jumps over the lazy dog and a cunning fox')
print(res8)

The quick brown cat jumps over the lazy dog and a cunning cat
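sub() also accepts a 'count' argument that limits how many occurrences are replaced. The following sketch shows that, along with the case where the pattern is not found; the pattern 'wolf' is chosen here only because it does not occur in the string.

```python
import re

text = 'The quick brown fox jumps over the lazy dog and a cunning fox'

only_first = re.sub(r'fox', 'cat', text, count=1)  # replace the first match only
unchanged = re.sub(r'wolf', 'cat', text)           # pattern absent: no change

print(only_first)
print(unchanged == text)
```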


#### re.compile()
We can compile a regular expression pattern into a pattern object, which can then be used for pattern matching. This lets us reuse the same pattern several times without rewriting it.

In [33]:
pattern=re.compile('fox')
res9=pattern.findall('The quick brown fox jumps over the lazy dog')
print(res9)

res10=pattern.findall('The quick brown fox jumps over the lazy dog and a cunning fox')
print(res10)

['fox']
['fox', 'fox']
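A compiled pattern object offers the same operations as the module-level functions, as methods. A short sketch of reusing one compiled pattern for searching, counting, and replacing:

```python
import re

pattern = re.compile(r'fox')
text = 'The quick brown fox jumps over the lazy dog and a cunning fox'

first = pattern.search(text)          # first occurrence, as a match object
count = len(pattern.findall(text))    # number of occurrences
replaced = pattern.sub('cat', text)   # replace every occurrence

print(first.start())
print(count)
print(replaced)
```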


#### Compilation Flags
The default behavior of a pattern can be modified with compilation flags. A flag is passed as an extra argument to re.compile(), or directly to functions such as re.search() and re.findall(). Some commonly used flags are: <br>

IGNORECASE -- ignore upper/lowercase differences for matching.<br>
DOTALL -- allow dot (.) to match newline -- normally it matches anything but newline. <br>
MULTILINE -- within a string made of many lines, allow ^ and \$ to match at the start and end of each line. Normally ^ and \$ match only at the start and end of the whole string.

In [27]:
import re

pattern = re.compile(r"[a-z]+")
pattern.search("Felix is awesome")

<_sre.SRE_Match object; span=(1, 5), match='elix'>

In [28]:
import re

pattern = re.compile(r"[a-z]+", re.IGNORECASE)
pattern.search("Felix is awesome")

<_sre.SRE_Match object; span=(0, 5), match='Felix'>
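Flags do not require a compiled pattern; they can also be passed straight to the module-level functions. A minimal sketch using the same string as above:

```python
import re

# Flag passed positionally to re.search()
name = re.search(r'[a-z]+', 'Felix is awesome', re.IGNORECASE)

# Flag passed by keyword to re.findall()
words = re.findall(r'[a-z]+', 'Felix is awesome', flags=re.IGNORECASE)

print(name.group(0))
print(words)
```

Passing the flag by keyword (flags=...) is the safer habit, because in re.sub() and re.split() the positional slot after the string is count or maxsplit, not flags.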

In [40]:
# Without the re.DOTALL flag, the dot (.) does not match newlines,
# so a pattern cannot match a block of text that spans multiple lines

comment = re.compile(r'/\*(.*?)\*/')   # the dot does not match newlines here
text1 = '/* this is a comment */'
text2 = '''/* this is a
              multiline comment */
        '''
print(comment.findall(text1))       

print(comment.findall(text2))        

[' this is a comment ']
[]


In [41]:
# The re.compile() function accepts a flag, re.DOTALL, which is useful when trying to find a match across multiple lines

comment = re.compile(r'/\*(.*?)\*/', re.DOTALL)   # the dot also matches newlines when re.DOTALL is used
text1 = '/* this is a comment */'
text2 = '''/* this is a
              multiline comment */
        '''
print(comment.findall(text1))       

print(comment.findall(text2))

[' this is a comment ']
[' this is a\n              multiline comment ']
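The MULTILINE flag described above can be illustrated in the same way. The sketch below assumes a made-up three-line log string and compares how ^ and $ behave with and without the flag.

```python
import re

# Made-up sample text with three lines
log = 'error: disk full\nwarning: low memory\nerror: timeout'

whole_string = re.findall(r'^error', log)                  # ^ matches only at the very start
per_line = re.findall(r'^error', log, flags=re.MULTILINE)  # ^ matches at the start of each line
line_ends = re.findall(r'\w+$', log, flags=re.MULTILINE)   # $ matches at the end of each line

print(whole_string)
print(per_line)
print(line_ends)
```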


#### Resources Used
1. Python for Data Analysis by Wes McKinney
2. Mastering Python Regular Expressions by Félix López and Víctor Romero
3. Scientific Computing with Python 3 by Claus Führer, Jan Erik Solem and Olivier Verdier
4. Functional Python Programming by Steven Lott
5. https://www.analyticsvidhya.com/
6. https://stackoverflow.com/