# Using Regular Expressions with Python

Regular expressions, also called regexes, regexps or REs, are sequences of characters that define a pattern for matching text.

What are they for?

They are very helpful because they automate tasks such as searching for, deleting or replacing patterns in a text.

Python's built-in `re` module provides regular expression support, so let's import it first:

```python
import re
```
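
Several patterns in this notebook are written as raw strings (`r"..."`), so each backslash reaches `re` unchanged instead of being treated as a Python string escape. A small sketch of the difference; the sample text `"Room 101"` is just an illustration:

```python
import re

# In a raw string the backslash is kept as a literal character
print(r"\d" == "\\d")  # True
print(len(r"\d"))      # 2

# \d is the regex shorthand for a digit (equivalent to [0-9] for ASCII text)
digits = re.search(r"\d+", "Room 101")
print(digits.group())  # 101
```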

## Search

* Search for years in a list of strings. Here a year is four digits: the first is 1 or 2 and the next three are any digit from 0 to 9, which gives the pattern `[1-2][0-9]{3}`

```python
import re

year_strings = []
strings = ['War of 1812', 'There are 5280 feet to a mile', 'Happy New Year 2016!']
for text in strings:
    # re.search returns a Match object if the pattern occurs anywhere in text, else None
    match = re.search(r"[1-2][0-9]{3}", text)
    if match is not None:
        year_strings.append(text)
print(year_strings)
```

```
['War of 1812', 'Happy New Year 2016!']
```
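
`re.search` returns a Match object rather than a plain yes/no answer, so the matched year and its position can be read back as well. A sketch reusing the same pattern; the `\b` word boundaries are an addition not in the example above, used here so that a year is not picked out of a longer number:

```python
import re

strings = ['War of 1812', 'There are 5280 feet to a mile', 'Happy New Year 2016!']
# \b requires a word boundary on each side, so "21999" does not yield "1999"
year_pattern = re.compile(r"\b[1-2][0-9]{3}\b")

found = []
for text in strings:
    match = year_pattern.search(text)
    if match:
        # group() is the matched text, span() its (start, end) indices
        found.append((match.group(), match.span()))
print(found)  # [('1812', (7, 11)), ('2016', (15, 19))]
```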

## Sub

* How to remove punctuation with a regular expression


```python
import string
print(string.punctuation)
```

```
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
```
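
Many of these characters, such as `+`, `?` and `]`, have a special meaning inside a pattern, which is why the next example passes `string.punctuation` through `re.escape`. A short sketch of what goes wrong without escaping:

```python
import re

# Unescaped, "+" means "one or more", so the pattern "1+1" never matches the text "1+1"
print(re.search("1+1", "1+1"))  # None

# re.escape backslash-escapes special characters so they match literally
escaped = re.escape("1+1")
print(escaped)                            # 1\+1
print(re.search(escaped, "1+1").group())  # 1+1
```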


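```python
import re
import string

# Build a character class from all punctuation; re.escape protects ], ^, - and \
punct = '[{}]'.format(re.escape(string.punctuation))
# Compile the pattern into a reusable Pattern object
regex = re.compile(punct)
# Use a different variable name so the string module is not shadowed
text = " Hello! What a beautiful day, how are you?"
print(regex.sub("", text))
```

```
 Hello What a beautiful day how are you
```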
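
Instead of listing every punctuation character, a similar cleanup can be sketched with a negated character class that deletes anything that is neither a word character nor whitespace. This is an alternative pattern, not the one used above, and it behaves differently on some input: `\w` includes the underscore (and non-ASCII letters), so those are kept.

```python
import re

text = " Hello! What a beautiful day, how are you?"
# [^\w\s] matches any character that is not a word character or whitespace
cleaned = re.sub(r"[^\w\s]", "", text)
print(cleaned)  #  Hello What a beautiful day how are you

# The underscore counts as a word character, so it survives
print(re.sub(r"[^\w\s]", "", "snake_case!"))  # snake_case
```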


* How to remove lowercase letters and spaces

```python
import re

# Character class matching any lowercase ASCII letter or a space
pattern = '[a-z ]'
regex = re.compile(pattern)
s = " Hello! What a beautiful day, how are you?"
print(regex.sub("", s))
```

```
H!W,?
```
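
`re.subn` works like `re.sub` but also returns how many replacements were made, which is a quick way to check that a cleanup actually did something. A sketch on the same string:

```python
import re

s = " Hello! What a beautiful day, how are you?"
# subn returns a (new_string, number_of_substitutions) tuple
result, count = re.subn('[a-z ]', '', s)
print(result)  # H!W,?
print(count)   # 37: 29 lowercase letters plus 8 spaces
```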


## Match

* Match a pattern at the start of a string and extract parts of it with groups

```python
import re

line = "Cats are smarter than dogs"

# Group 1 (.*) captures the text before " are "; the lazy group 2 (.*?) captures one word
matchObj = re.match(r'(.*) are (.*?) .*', line)
# You can add flags: re.match(r'(.*) are (.*?) .*', line, re.M | re.I)
# re.I performs case-insensitive matching.
# re.M makes $ match the end of any line (not just the end of the string) and
# makes ^ match the start of any line (not just the start of the string).
if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")
```

```
matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter
```
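
`re.match` only tries the pattern at the beginning of the string, while `re.search` scans the whole string. A short sketch of the difference, plus the `re.I` flag mentioned in the comments of the previous example:

```python
import re

line = "Cats are smarter than dogs"

# match is anchored at position 0, so "dogs" is not found
print(re.match(r"dogs", line))  # None
# search scans forward and finds "dogs" at the end
print(re.search(r"dogs", line).group())  # dogs
# re.I makes the comparison case-insensitive
print(re.match(r"cats", line, re.I).group())  # Cats
```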


The same match with a precompiled pattern object, which is convenient when a pattern is reused:

```python
import re

line = "Cats are smarter than dogs"
# Compile once, then call .match() on the Pattern object
regex = re.compile(r'(.*) are (.*?) .*')
matchObj = regex.match(line)
if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")
```

```
matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter
```
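
When a pattern has several groups, naming them with the `(?P<name>...)` syntax can read more clearly than numeric indices. A sketch of the same pattern with named groups; the names `subject` and `adjective` are illustrative choices, not part of the original example:

```python
import re

regex = re.compile(r'(?P<subject>.*) are (?P<adjective>.*?) .*')
match = regex.match("Cats are smarter than dogs")
if match:
    # Named groups can be read one by one or all at once as a dict
    print(match.group("subject"))  # Cats
    print(match.groupdict())       # {'subject': 'Cats', 'adjective': 'smarter'}
```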


## Findall

```python
import re

years_string = '2015 was a good year, but 2016 will be better!'
# findall returns every non-overlapping match as a list of strings
years = re.findall(r"[1-2][0-9]{3}", years_string)
print(years)
```

```
['2015', '2016']
```
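
One detail worth knowing: when the pattern contains capturing groups, `re.findall` returns the groups instead of the whole match (a tuple per match when there are several groups). `re.finditer` yields Match objects instead, which also carry positions. A sketch on the same string:

```python
import re

years_string = '2015 was a good year, but 2016 will be better!'

# Two groups: the first two digits and the last two digits
groups = re.findall(r"([1-2][0-9])([0-9]{2})", years_string)
print(groups)  # [('20', '15'), ('20', '16')]

# finditer yields Match objects, so start positions are available
positions = [(m.group(), m.start()) for m in re.finditer(r"[1-2][0-9]{3}", years_string)]
print(positions)  # [('2015', 0), ('2016', 26)]
```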


* Check whether brackets are balanced, i.e. every opening bracket is closed by the matching bracket in the right order:
    * `[blah blah]{blah blah}` returns True
    * `([]{})` returns True
    * `)][]` returns False

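```python
import re

line = " [as(dd)]{asdasd}(asd)"

# Keep only the bracket characters, in order (raw string avoids invalid escape warnings)
pattern = re.compile(r'[{}()\[\]]')
final = pattern.findall(line)
opened = ['[', '{', '(']
closed = [']', '}', ')']

result = True

while final and result:
    # Skip the leading run of opening brackets to find the first closing one
    i = 0
    while i < len(final) and final[i] in opened:
        i += 1

    if i == 0 or i == len(final):
        # Starts with a closing bracket, or no closing bracket is left: unbalanced
        result = False
    elif final[i] == closed[opened.index(final[i - 1])]:
        # The innermost pair matches: remove it and show what is left
        final.pop(i - 1)
        final.pop(i - 1)
        print(final)
    else:
        # The closing bracket does not match the last opening one
        result = False

print(result)
```

```
['[', ']', '{', '}', '(', ')']
['{', '}', '(', ')']
['(', ')']
[]
True
```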
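
The same check is often written with a stack, which avoids rescanning the list and popping from its middle on every pass. A sketch of that alternative as a function; the name `is_balanced` is hypothetical, and this is a different technique from the loop above rather than a rewrite of it:

```python
import re

# Map each closing bracket to the opening bracket it must match
PAIRS = {')': '(', ']': '[', '}': '{'}

def is_balanced(text):
    """Return True if the (), [] and {} brackets in text are properly nested."""
    stack = []
    # findall keeps only the bracket characters, in order
    for char in re.findall(r'[{}()\[\]]', text):
        if char in '([{':
            stack.append(char)
        elif not stack or stack.pop() != PAIRS[char]:
            # A closer with nothing open, or with the wrong opener on top
            return False
    # Any bracket still open means the text is unbalanced
    return not stack

print(is_balanced("[blah blah]{blah blah}"))  # True
print(is_balanced("([]{})"))                  # True
print(is_balanced(")][]"))                    # False
print(is_balanced(" [as(dd)]{asdasd}(asd)"))  # True
```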
