# Regular Expressions II

I think regular expressions are perfectly suited to people who like puzzles such as _crosswords_ and _Sudoku_. Naturally, people have created games out of them (more info on that later). Regular expressions can be long, mind-boggling, and indecipherable at times, but we'll start at the very beginning to keep it simple.

There are multiple versions of Regex Engines out there for different programming languages and software (analogous to SQL flavors for databases). The differences usually lie in the more advanced features. The basic nomenclature should hold across most Regex Engines.

A regular expression is a sequence of characters that can be used as a search pattern for a string. Regular expressions are commonly used for __find__ and __find and replace__ string operations and for __validation of user input__ (we'll see examples of this later).

Regular expressions can be used to evaluate Unicode strings (`str`) or 8-bit strings (`bytes`, e.g. `b'abc'`). We'll just worry about the former in this workshop. For character classes, Python has adopted the Perl syntax (e.g. `\w`) rather than the POSIX syntax (e.g. `[:word:]`).

Let's get started and import the regular expression module in Python.

In [1]:
import re

To set up a regular expression (regex), we use the `compile` function.

In [None]:
regex = re.compile(r'\w{7}\s\w+')
regex

To look for a match, we use the `search` method.

In [None]:
re.search(regex, 'This string should match the above regular expression')

The `search` method returns a match object that contains the span of characters that matched along with the matched text. If nothing matches, it returns `None`. A match object always evaluates to `True` in a boolean context, while `None` evaluates to `False`. We'll use that fact to check if we have a match or not.
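
As a quick illustration of that truthiness check, here is a minimal sketch. The pattern and the two sample strings are made up for this example and are not part of the workshop data.

```python
import re

# Hypothetical pattern and strings, chosen only for illustration.
pattern = re.compile(r'\d+')

hit = pattern.search('room 101')     # a match object
miss = pattern.search('no digits')   # None, because nothing matched

print(bool(hit), bool(miss))
print(hit.span(), hit.group(0))
```

The first `print` shows the boolean value of each result, and the second shows the span and the matched text stored on the match object.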

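Here is a custom function we'll be using. It returns a list of `True`/`False` values indicating whether our regex matches each string in a list. Setting `flag` to a non-zero value also prints the match object for each string that matches.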

In [2]:
def Go_Fish(strings, regex, flag=0):
    TF = []
    for string in strings:
        if re.search(regex, string):
            if flag:
                print(re.search(regex,string))
            TF.append(True)
        else:
            TF.append(False)
    return TF

## Warm Up Exercises

You should be able to answer the exercises below using only what was covered in Part I of the workshop: anchors, metacharacters, and ranges of characters.

In [None]:
strings = ['Ada, Ohio', 'Ada...Oklahoma', 'Ada_Oregon', 'Ada_Minnesota', 'Minnesota']
# [True, False, False, True, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

This one is from Part I of the workshop: extract the city/township name from these geocoded addresses. **Hint:** It's okay if it's not a totally clean match in terms of punctuation.

In [None]:
strings = ['300, 600 Woodward Ave, Detroit, MI 48226, USA',
           '2990-2998 Evaline St, Hamtramck, MI 48212, USA',
           '531 Seven Mile E, Highland Park, MI 48203, USA',
           '16935 5 Points St, Redford Charter Twp, MI 48240, USA',
           '21547-21721 W Warren Ave, Dearborn Heights, MI 48127, USA']
# [True, True, True, True, True]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

This is designed to be a trick question.

In [None]:
strings = ['01/19/2017','01-19-2017','01192017','01/19-2017']
# [True, True, True, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

## Capture `()` Groups

Parentheses `()` allow you to group characters together.

Recall the last question in Part I of the workshop asked you to extract city names from addresses. You probably couldn't extract just the city name because you were probably matching the commas before and after the name. Here's where you would use capture groups to isolate the city name. 

In the example below, we group the characters `Detroit` together so that the `?` applies to the whole group and not just the previous character. Without the parentheses, we would not match the last string, since the `?` would operate only on the `t` and the regex engine would still require the string `Detroi`.

In [None]:
strings = ['Detroit, Wayne County', 'Detroit, Michigan', 'Michigan']
# [True, True, True]

In [None]:
regex = re.compile(r'(Detroit)?.+')
Go_Fish(strings, regex, 1)
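
To make the effect of the parentheses concrete, the sketch below runs the grouped pattern next to an ungrouped variant on the same three strings. The ungrouped pattern `Detroit?.+` is written here only for comparison.

```python
import re

strings = ['Detroit, Wayne County', 'Detroit, Michigan', 'Michigan']

grouped = re.compile(r'(Detroit)?.+')   # ? applies to the whole group
ungrouped = re.compile(r'Detroit?.+')   # ? applies only to the final t

grouped_hits = [bool(grouped.search(s)) for s in strings]
ungrouped_hits = [bool(ungrouped.search(s)) for s in strings]

print(grouped_hits)
print(ungrouped_hits)
```

The ungrouped pattern still needs `Detroi` to be present, so it fails on `'Michigan'`.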

The parentheses also have the added benefit of creating a numbered capture group.

In [None]:
match = re.search(regex, 'Detroit, Michigan')
print(match.group(0))
print(match.group(1))

Q: So why is this useful?  
A: It can save you the hassle of parsing the string after the fact.  
We will also see another use for it when using backreferences.

Let's revisit another example from Part I.

In [None]:
strings = ['keep calm and cook bacon',
           'keep calm and cook hash browns',
           'keep calm and cook quinoa',
           'bacon and hash browns are yummy']
# [True, True, False, False]

Recall one solution was `bacon$|browns$`. Here's another solution using parentheses (i.e. capture groups).

In [None]:
regex = re.compile(r'keep calm and cook (bacon|hash browns)')

Now, we have the option of keeping calm and extracting the food item by itself with the numbered capture group, if we wish.

In [None]:
match = re.search(regex, 'keep calm and cook hash browns')
print(match.group(0))
print(match.group(1))

For the example below, I've written a regex for matching the dates. Finish the regex by creating groups for each part of the date (month, day, year).

In [None]:
strings = ['01-11-2017','July 1, 1867','Jan 11, 2017','Dec 25, 2000']
# [False, True, True, True]

In [None]:
regex = re.compile(r'\w{3,} \d{1,2}, \d{4}')
Go_Fish(strings, regex, 1)

Here is an example script to extract the groups after `re.search`. 

**Note:** The Python code will give you an `IndexError: no such group` message if you do not create enough groups in the solution.

In [None]:
for string in strings:
    date = re.search(regex,string)
    if date:
        print('Match={}\tGrp1={}\tGrp2={}\tGrp3={}'.format(date.group(0), date.group(1), date.group(2), date.group(3)))

Write a regex to differentiate between phone numbers and SSNs (match only the phone numbers).

In [None]:
strings = ['734-764-7828','555-234-5678','313-255-9119','555-23-4678']
# [True, True, True, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

Using your answer above, create a group for the area code (first 3 digits) and the exchange (next 3 digits).

In [None]:
regex = re.compile(r'')
for string in strings:
    match = re.search(regex,string)
    if match:
        print('Area Code={}\tExchange={}\t'.format(match.group(1), match.group(2)))

## Nested Groups `( () () )`

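You can also nest groups within other groups. Groups are numbered by the position of their opening parenthesis, from left to right, so an outer group gets a lower number than the groups nested inside it.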

For example, the regex `(January (24)), 2017` creates two captured groups. Group 1 is `January 24` and Group 2 is `24`.
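
The numbering in that example can be checked directly. This is a minimal sketch that applies the regex to the literal string `'January 24, 2017'`, which is assumed here as the test input.

```python
import re

match = re.search(r'(January (24)), 2017', 'January 24, 2017')

print(match.group(0))   # the whole match
print(match.group(1))   # outer group: opens first, so it is group 1
print(match.group(2))   # inner group: opens second, so it is group 2
```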

1. Write a regex to match the three strings
2. Create a group to extract the Month and Day together but also the Month and Day separately.  
3. Create a group for the 4 digit Year and the 2 digit Year.

Counting the whole match as group 0, there should be 6 groups in total (groups 0-5), so you should have 5 sets of parentheses in your solution.

In [None]:
strings = ['July 1, 1867','Jan 24, 2017','Dec 25, 2000']
regex = re.compile(r'')
for string in strings:
    date = re.search(regex,string)
    if date:
        print('Match={0}  MonthDay={1}  Month={2}  Day={3} Y4={4}  Y2={5}'.format(date.group(0), date.group(1), date.group(2), date.group(3), date.group(4), date.group(5)))

## Non-Capturing Groups `(?:)`

You can create groups that are non-capturing by using the `(?:` notation. The substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern. 

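Q: Why would you need to create a group not to capture it?  
A: You still need a group to apply a quantifier or the `|` operator to several characters at once. Making that group non-capturing keeps it out of the numbering, so the groups you actually want to extract keep simple numbers. I've seen it used in real life, mostly with the `|` operator.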

For example, the month is grouped below but not captured. The day is now group 1.

In [None]:
strings = ['July 1, 1867','Jan 24, 2017','Dec 25, 2000']
regex = re.compile(r'(?:\w{3,}) (\d{1,2}), \d{4}')
for string in strings:
    date = re.search(regex,string)
    if date:
        print('Match={}\tGrp1={}'.format(date.group(0), date.group(1)) )
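
One place where a non-capturing group earns its keep is alternation. The sketch below reuses the `keep calm` example from earlier and adds a hypothetical `keep|stay` alternative (not in the original exercise) to show how the group numbering changes.

```python
import re

text = 'keep calm and cook hash browns'

# Both groups capture, so the food item ends up as group 2.
capturing = re.search(r'(keep|stay) calm and cook (bacon|hash browns)', text)

# The first group only groups the alternation, so the food item is group 1.
non_capturing = re.search(r'(?:keep|stay) calm and cook (bacon|hash browns)', text)

print(capturing.groups())
print(non_capturing.groups())
```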

## Conditional Statements `(?(#))`

You can test whether a captured group has matched so far and then match one of two patterns accordingly, like an if/else statement. The syntax is `(?(1)yes|no)`: the regex engine tries the `yes` pattern if group 1 matched and the `no` pattern otherwise.

Suppose we want to extract the first/given name below. The problem lies in the fact that it is written in two different ways. We can use a conditional statement to test which format the name is written in and then have two regex patterns we can use depending on which format we see.

In [199]:
strings = ['Dumbledore, Albus',
           'Harry Potter',
           'Hermione Granger',
           'Snape, Severus']
# [True, True, True, True]

In [211]:
regex = re.compile(r'(,)?(?(1) \w+|\w+ )')
Go_Fish(strings, regex, 1)

<_sre.SRE_Match object; span=(10, 17), match=', Albus'>
<_sre.SRE_Match object; span=(0, 6), match='Harry '>
<_sre.SRE_Match object; span=(0, 9), match='Hermione '>
<_sre.SRE_Match object; span=(5, 14), match=', Severus'>


[True, True, True, True]

In [206]:
for string in strings:
    match = re.search(regex, string)
    print(match.group(0).strip(', '))

Albus
Harry
Hermione
Severus


Match the first sentence in each string if the first sentence contains the word `question` or `answer`.

In [185]:
strings = ['The question is: What is the capital of Nebraska?',
           'The answer is not Omaha.',
           'The question is binary. The answer is either true or false.',
           'There is no FAQ page on this website. I need an answer today.']
# [True, True, True, False]

In [186]:
regex = re.compile(r'(The question)?(?(1) [^\.]+[\.?]|The answer .+)')
Go_Fish(strings, regex, 1)

<_sre.SRE_Match object; span=(0, 49), match='The question is: What is the capital of Nebraska?>
<_sre.SRE_Match object; span=(0, 24), match='The answer is not Omaha.'>
<_sre.SRE_Match object; span=(0, 23), match='The question is binary.'>


[True, True, True, False]

## Comments `(?#)`

You can add a comment within a regex (for clarity?) using the `(?#comment)` notation. Everything up to the first closing parenthesis is a comment.

For example, the regex below does not match any date string because the day is commented out.

In [None]:
strings = ['July 1, 1867','Jan 24, 2017','Dec 25, 2000']
# [False, False, False]

In [None]:
regex = re.compile(r'(\w{3,}) (?#this is a comment\d{1,2}), (\d{4})')
Go_Fish(strings, regex, 1)
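
For contrast, here is a sketch of the same regex with the comment closed before the day subpattern. The comment text `day of month` is made up. The commented regex is compared against a plain version without any comment.

```python
import re

strings = ['July 1, 1867', 'Jan 24, 2017', 'Dec 25, 2000']

# The comment ends at its own closing parenthesis, so the day group is intact.
commented = re.compile(r'(\w{3,}) (?#day of month)(\d{1,2}), (\d{4})')
plain = re.compile(r'(\w{3,}) (\d{1,2}), (\d{4})')

# True for each string where both regexes capture the same groups.
same = [commented.search(s).groups() == plain.search(s).groups() for s in strings]

print(same)
print(commented.search('Jan 24, 2017').groups())
```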

## Backreferences `\#`

We saw in the capture group section how we can extract certain substrings that we are interested in. We can take it one step further and use a captured group to validate the string itself by backreferencing. A backreference specifies that the text at that point must be identical to the text matched by an earlier captured group. Backreferences are useful for checking the consistent use of delimiters in dates and phone numbers.

The backreference nomenclature is a backslash followed by the number of the group (i.e. `\2` backreferences group 2). 

For example, suppose we wanted to match back-to-back letters `i` but with the constraint that they are the same letter case. Backreferencing can be used to enforce the constraint.

In [None]:
strings = ['octopii', 'Star Wars II', 'iI','Ii']
# [True, True, False, False]
regex = re.compile(r'([iI])\1')
Go_Fish(strings, regex, 1)
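
A backreference can repeat more than a single character. As a small sketch, the regex below looks for a word that is immediately repeated. Both sentences are invented for this example.

```python
import re

# \1 must match exactly the same text that group 1 captured.
doubled_word = re.compile(r'\b(\w+) \1\b')

match = doubled_word.search('This sentence has has a typo')
no_match = doubled_word.search('This sentence has no typo')

print(match.group(0), '|', match.group(1))
print(no_match)
```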

### Named Groups `(?P<name>)`

Alternatively, you can name your group if you don't like enumeration using the `(?P<name>regex)` syntax to capture the group and then `(?P=name)` to backreference it.

In [7]:
strings = ['octopii', 'Star Wars II', 'iI','Ii']
# [True, True, False, False]

In [11]:
regex = re.compile(r'(?P<doublei>[iI])(?P=doublei)')
Go_Fish(strings, regex, 1)

<_sre.SRE_Match object; span=(5, 7), match='ii'>
<_sre.SRE_Match object; span=(10, 12), match='II'>


[True, True, False, False]
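
Named groups can also be read back by name from the match object. The sketch below assumes the group names `month`, `day`, and `year` and reuses one of the date strings from earlier.

```python
import re

date = re.search(r'(?P<month>\w{3,}) (?P<day>\d{1,2}), (?P<year>\d{4})', 'Jan 24, 2017')

print(date.group('month'))   # look a group up by name
print(date.groupdict())      # all named groups as a dictionary
print(date.group(1) == date.group('month'))   # named groups are still numbered
```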

Match the strings that have a consistent delimiter using backreferencing. <font color='white'>**Hint**: Use the ? metacharacter to handle the first delimiter.</font>

In [None]:
strings = ['01/19/2017','01-19-2017','01192017','01/19-2017']
# [True, True, True, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

The above example would be more difficult to solve without backreferencing (not sure if I could do it with the tools I've shown you so far).

Match the phone numbers that have a consistent delimiter. <font color='white'>**Hint**: Use the ? metacharacter to handle the '(734)' string.</font>

In [None]:
strings = ['734-764-STAT','(734) 764 7828','734 764 7828','734.764.7828', '734 764-7828', '734/764 7828']
# [True, True, True, True, False, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 0)

## Assertions

Assertions, also referred to as "lookarounds", test for a subpattern without consuming any characters in the string (which is why they are also referred to as zero-length). They only assert whether a match is possible or not. You can have assertions in your pattern that look ahead or behind to ensure that a subpattern does or does not occur. Thus, there are 4 types of assertions:
1. Negative lookahead `(?!)`
2. Positive lookahead `(?=)`
3. Positive lookbehind `(?<=)`
4. Negative lookbehind `(?<!)`
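
To see what "not consuming characters" means in practice, the sketch below compares an ordinary pattern with a lookahead on the assumed sample word `'quiet'`.

```python
import re

text = 'quiet'

consuming = re.search(r'ie', text)      # the e is part of the match
lookahead = re.search(r'i(?=e)', text)  # the e is checked but not consumed

print(consuming.span(), consuming.group(0))
print(lookahead.span(), lookahead.group(0))
```

Both patterns find the `i` at the same position, but the span of the lookahead version is one character shorter.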

### Negative lookahead assertion `(?!)`

Asserts that the current position in the string is not followed by a match for the subpattern in parentheses.  
If the subpattern does not match there, the engine carries on with the rest of the regex from that same position.

For example, we want words that have the letter `i` in them but not immediately followed by the letter `e`.

In [None]:
strings = ['neighborhood','weigh','diner','quiet']
# [True, True, True, False]

A solution based on the regex knowledge I've given you so far is:

In [None]:
regex = re.compile(r'i[^e]')
Go_Fish(strings, regex, 1)

The quasi-equivalent solution using a negative lookahead assertion is:

In [None]:
regex = re.compile(r'i(?!e)')
Go_Fish(strings, regex, 1)

**Note**: The assertion returns a match of only one letter instead of two like above, because the lookahead does not consume any characters.  
**Note 2**: The parentheses of the assertion do not create a captured group. Since the lookahead consumes no characters, there is nothing to capture.
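
The two solutions are only quasi-equivalent because `[^e]` needs some character to be present after the `i`, while the lookahead does not. The sketch below shows the difference on the assumed extra word `'ski'`, which ends in `i`.

```python
import re

char_class = re.compile(r'i[^e]')
lookahead = re.compile(r'i(?!e)')

for word in ['diner', 'ski']:
    print(word, char_class.search(word), lookahead.search(word))

# The assertion's parentheses do not create a captured group.
print(lookahead.search('diner').groups())
```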

Q. So why is this complicated method useful?  
A. It handles more complicated cases.

Consider the scenario where we only want words where `i` is never followed by `e` anywhere later in the word. This would be very hard without assertions. Let's give it a try with the earlier regex, which wrongly matches `diner`.

In [None]:
strings = ['neighborhood','weigh','diner','quiet']
# [True, True, False, False]

In [None]:
regex = re.compile(r'i[^e]')
Go_Fish(strings, regex, 1)

With assertions, it's doable.

In [None]:
regex = re.compile(r'i(?!.*e)')
Go_Fish(strings, regex, 1)

We want all the words that have the letter `q` but not the sequence `qu`. Bonus if you can match the entire word instead of just the letter `q`.

In [None]:
strings = ['Iraq','qat','qabala','quad','quack']
# [True, True, True, False, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

Match the phone numbers (all 12 characters) that do not have the sequence `555` in them. The example is meant to illustrate the fact that assertions do not consume any characters. <font color='white'>**Hint**: Use the \w metacharacter to take care of 'STAT'.</font>

In [None]:
strings = ['734-764-STAT','734-764-7828','734-764-5558','734-555-STAT']
# [True, True, False, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

### Positive lookahead assertion `(?=)`

Asserts that the current position in the string is followed by a match for the subpattern in parentheses.  

For example, we want to match words that have `i` followed by the letter `e`.

In [None]:
strings = ['quiet','diet','diner','neighborhood','weigh']
# [True, True, False, False, False]

In [None]:
regex = re.compile(r'i(?=e)')
Go_Fish(strings, regex, 1)

Extract the dollar amount from the strings containing dollar amounts.

In [None]:
strings = ['1 dollar', '65358 dollars', '200 euros', '314 yens', '1592 pounds', 'The US dollar is worth 1.34 CDN dollars']
# [True, True, False, False, False, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

### Positive lookbehind assertion `(?<=)`

Asserts that the current position in the string is preceded by a match for the subpattern in parentheses.  

For example, we want words in which the letter `u` is preceded by the letter `q`.

In [None]:
strings = ['quad','quack','Iraq','qat','qabala']
# [True, True, False, False, False]

In [None]:
regex = re.compile(r'(?<=q)u')
Go_Fish(strings, regex, 1)

**Note**: An additional constraint on lookbehind assertions is that the subpattern must only match strings of some fixed length. For example, the regex `(?<=q+)u` will throw an error, since the `+` means we don't know how many characters to look back.
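
A minimal sketch of that constraint with Python's built-in `re` module is shown below. The variable names are illustrative, and the exact wording of the error message may differ between Python versions.

```python
import re

rejected = False
try:
    re.compile(r'(?<=q+)u')   # variable-length lookbehind
except re.error as err:
    rejected = True
    print('Rejected:', err)

# The fixed-length version compiles and matches as expected.
fixed = re.compile(r'(?<=q)u')
print(fixed.search('quad'))
```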

Extract the dollar amount from the strings containing dollar amounts. <font color='white'>**Hint**: Start with a positive lookbehind.</font>

In [None]:
strings = ['$1', '$65358', '€200', '¥314', '£1592']
# [True, True, False, False, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

### Negative lookbehind assertion `(?<!)`

Asserts that the current position in the string is not preceded by a match for the subpattern in parentheses.  
As with positive lookbehind assertions, the subpattern must only match strings of some fixed length.

For example, we want phrases that have `Mouse` but have nothing to do with `Mickey`.

In [None]:
strings = ['Mighty Mouse','Modest Mouse','Minnie Mouse','Mickey Mouse','Mickey Rooney']
# [True, True, True, False, False]

In [None]:
regex = re.compile(r'(?<!Mickey) Mouse')
Go_Fish(strings, regex, 1)

Extract the currency and the amount that are not in dollars. <font color='white'>**Hint**: You might want to use the non-word character.</font>

In [None]:
strings = ['$1', '€200', '¥314', '£1592', '$65358']
# [False, True, True, True, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

And that's it (for now) for learning regular expressions. 

> ## Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski

# Practical Applications

This section lets you test what you have learned on some practical applications (and maybe some that are just for fun), with no context or constraint on which tools you can use.

## Name Variations

`Pandas` is a must-have when working with real-world data. __Disclaimer__: I also run a `pandas` workshop.

In [None]:
import pandas as pd
pd.options.display.max_rows = 10

This code block will read in the popular baby names from the year 2015 released by the Social Security Administration.

In [None]:
df = pd.read_csv('yob2015.txt', header=None)
df.columns = ['name','gender','count']

Find as many valid variations of `Alex` as you can (or your own name, if you wish) using a single regex. I got 67 variants when I tried it.

In [None]:
regex = re.compile(r'')
df['match']= df['name'].apply(lambda x: True if re.search(regex,x) else False)
variations = df.query('match == True').drop_duplicates('name')
print('There were {} variants of Alex for boys and girls'.format(variations.shape[0]))
print(variations['name'].tolist())

(Not so practical problem). The top 10 girl names from 2015 are listed in `strings` below. Match the top 5 girl names but not 6-10 in the fewest characters possible (people call this regex golf). I did it in twelve characters.

In [None]:
strings = ['Emma','Olivia','Sophia','Ava','Isabella','Mia','Abigail','Emily','Charlotte','Harper']
# [True, True, True, True, True, False, False, False, False, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

(Not so practical problem). Match the top 5 boy names but not 6-10 in the fewest characters possible. I did it in sixteen characters.

In [None]:
df.query('gender == "M"').head(10)

In [None]:
strings = ['Noah','Liam','Mason','Jacob','William','Ethan','James','Alexander','Michael','Benjamin']
# [True, True, True, True, True, False, False, False, False, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

## VIN Filtering

A VIN is a 17-character alphanumeric string (CAPITAL letters only) used to uniquely identify a vehicle. A VIN can tell you the country of origin, manufacturer, specifications, model year, and assembly plant of the vehicle.

This data file contains valid and invalid VINs. The invalid ones are obtained from actual crash reports where police officers write in other text when the VIN is not available (maybe because the field doesn't allow blanks).

Read in the data file.

In [None]:
df = pd.read_csv('vins.txt', header=None, names=['vin'])

Here are the valid VINs. There are only twenty in this file. They are in the first twenty rows.

In [None]:
pd.options.display.max_rows = 20
df.head(20)

Write a regex to keep only the valid VINs (first 20 rows). You might also want to open the file in a text editor to see what the non-valid entries look like.

In [None]:
regex = re.compile(r'')
TF = df['vin'].apply(lambda x: True if re.search(regex, x) else False)
TF.sum()

Take a look at the rows you've matched. Make sure you've matched the first 20 and not some other random set.

In [None]:
pd.options.display.max_rows = 20
df[TF]

## Web Scraping

Regular expressions can be used in web scraping to help parse HTML tags that have similar or identical styling. Each HTML tag is represented by a row in the data. __Disclaimer__: I also run a webscraping workshop.

We want to scrape some college basketball data from this URL: http://www.usatoday.com/sports/ncaab/sagarin/. Below is a snapshot of the top of the table.

![webpage](img/webpage.png)

Unfortunately, the data is not in an html table and requires more sophisticated techniques to extract the data. 

The goal is to get the table into a DataFrame. Usually, we can filter content based on the font tag and its attributes but this HTML code is kind of a disaster so that is not possible here. We can, however, use regular expressions to help us separate the content. Below is a snippet of the page source that is highlighted in the image above.

![webpage](img/pagesource.png)

The goal is to write a regex that matches the tag content as specifically as possible.
Below is a sample of the data from the page source for the top-ranked and last-ranked teams.
The last line represents the "HOME ADVANTAGE" line.
If the regex is done right for this sample, it should work on the entire table, which we can test later on.

In [None]:
strings = ['  94.89',
           '    9   0   74.42( 100)    1   0  |    2   0',
           '   94.60    2 ',
           '   95.37    1 |',
           '   93.97    2',
           '  52.48',
           '    0   8   71.35( 199)    0   0  |    0   0',
           '   52.06  351 ',
           '   52.28  350 |',
           '   55.12  347',
           '   3.18']

Match the text for the RATING column (list elements 1 and 6)

In [None]:
# [True, False, False, False, False, True, False, False, False, False, False]
regex1 = re.compile(r'') # contains a decimal
Go_Fish(strings, regex1)

Match the text for the Win/Loss and SOS Columns (list elements 2 and 7)

In [None]:
# [False, True, False, False, False, False, True, False, False, False, False]
regex2 = re.compile(r'') # contains a number within parentheses
Go_Fish(strings, regex2)

Match the text for the PREDICTOR columns (list elements 3 and 8)

In [None]:
# [False, False, True, False, False, False, False, True, False, False, False]
regex3 = re.compile(r'') # number followed by a space
Go_Fish(strings, regex3, 1)

Match the text for the GOLDEN MEAN columns (list elements 4 and 9)

In [None]:
# [False, False, False, True, False, False, False, False, True, False, False]
regex4 = re.compile(r'') # ends with a |
Go_Fish(strings, regex4)

Match the text for the RECENT columns (list elements 5, 10)

In [None]:
# [False, False, False, False, True, False, False, False, False, True, False]
regex5 = re.compile(r'') # ends with the rank (no trailing space or |)
Go_Fish(strings, regex5, 1)

If all 5 regexes above matched, try it on the full dataset below. Each tag is represented by a row in the data.

A correct response should print out the value 351 (x 5) for the number of rows (teams).

In [None]:
import pandas as pd
htmltags = pd.read_csv('webscraping.txt', header=None)
for regex in [regex1, regex2, regex3, regex4, regex5]:
    print(htmltags[0].map(lambda x: True if re.search(regex,x) else False).sum())

## Words with consecutive vowels

In [None]:
strings = ['aardvark','ape','feet','strength','Hawaii','kissing','goodbye','moat','vacuum','hut']
# [True, False, True, False, True, False, True, True, True, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

## Words with exactly one vowel

In [None]:
strings = ['aardvark','ape','feet','strength','Hawaii','kissing','goodbye','moat','vacuum','hut']
# [False, False, False, True, False, False, False, False, False, True]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

## Words with 3 or more vowels

In [None]:
strings = ['aardvark','ape','feet','strength','Hawaii','kissing','goodbye','moat','vacuum','hut']
# [True, False, False, False, True, False, True, False, True, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

## Movie Sequels

Match the movie sequels.  
<font color='white'>**Hint**: Use the logical OR | to create separate cases.</font>

In [None]:
strings = ['Mission: Impossible III',
           'Rocky V',
           'Star Wars IX: Not the Last One',
           'Terminator 2: Judgment Day',
           'Fast 8',
           '13 Going on 30',
           '2001: A Space Odyssey',
           '28 Days',
           '40 Days and 40 Nights']
# [True, True, True, True, True, False, False, False, False]

In [None]:
regex = re.compile(r'')
Go_Fish(strings, regex, 1)

# Regex Games

Regex Crossword  
https://regexcrossword.com/

Regex Golf (I don't know why it refers to Golf)  
https://alf.nu/RegexGolf

# References

Online regex tester including real-time matching and explanations. Can only test one string at a time.  
https://regex101.com

Online tutorial with real-time matching  
https://regexone.com/

Regex Cheat Sheet  
https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/

Language-agnostic website about regular expressions  
http://www.regular-expressions.info/

Official Python3 `re` documentation  
https://docs.python.org/3/library/re.html

Python's alternative regular expression module `regex` to replace `re`   
https://pypi.python.org/pypi/regex  
This module allows you to do more fancy things like nested sets and set operations.  
For example, the regex `[[a-z]--[aeiou]]` specifies all lowercase non-vowels.