# Regular Expressions in Python

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.

A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if a particular string matches a given regular expression.

Regular expressions can contain both special and ordinary characters. Most ordinary characters, like 'A', 'a', or '0', are the simplest regular expressions; they simply match themselves.

The **match** function attempts to match an RE pattern at the beginning of a string, with optional flags.

**Syntax:**
re.match(pattern, string, flags=0) 

where,
pattern - the regular expression to be matched.
string - the string that is checked for a match at its beginning.
flags - optional modifiers (such as re.I or re.M), which can be combined with bitwise OR (|).
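
As a small illustration of the signature above, the sketch below calls `re.match` with a flag on two inputs. The sample strings and the names `hit` and `miss` are made up for this example.

```python
import re

# re.match only tries the pattern at the very start of the string.
hit = re.match(r"hello", "Hello world", flags=re.IGNORECASE)
miss = re.match(r"world", "Hello world", flags=re.IGNORECASE)

print(hit.group())   # the text matched at position 0
print(miss)          # None, because "world" is not at the beginning
```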

### Module contents

**1. re.compile(pattern, flags=0)**

Compile a regular expression pattern into a regular expression object, which can be used for matching using its match() and search() methods.
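
A minimal sketch of how a compiled pattern might be reused; the pattern and the sample strings are assumptions chosen for illustration.

```python
import re

# Compile once, then reuse the pattern object for several calls.
word = re.compile(r"[a-z]+", re.IGNORECASE)

first = word.match("Regex rocks")   # anchored at the start of the string
later = word.search("123 go")       # scans forward until something matches

print(first.group(), later.group())
```

Compiling is mostly a convenience when the same pattern is applied many times; the module-level functions accept the same pattern string directly.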

**2. re.search(pattern, string, flags=0)**

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. 
Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
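
The difference between "no match" and a zero-length match can be seen in a short sketch like the following. The sample strings are made up, and `span()` is used only to show where the match was found.

```python
import re

found = re.search(r"\d+", "room 101, floor 3")
print(found.group(), found.span())   # first run of digits and its (start, end)

# A pattern that can match the empty string still yields a match object...
empty = re.search(r"x*", "abc")
# ...whereas a pattern that cannot match anywhere yields None.
nothing = re.search(r"x+", "abc")
print(repr(empty.group()), nothing)
```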

**3. re.match(pattern, string, flags=0)**

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. 
Return None if the string does not match the pattern; note that this is different from a zero-length match.

**4. re.fullmatch(pattern, string, flags=0)**

If the whole string matches the regular expression pattern, return a corresponding match object. 
Return None if the string does not match the pattern; note that this is different from a zero-length match.
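
To make the contrast with match() concrete, here is a small sketch using an assumed four-digit pattern and made-up inputs.

```python
import re

pattern = r"\d{4}"

prefix_ok = re.match(pattern, "2024-01")        # the prefix "2024" is enough
whole_fails = re.fullmatch(pattern, "2024-01")  # trailing "-01" is not covered
whole_ok = re.fullmatch(pattern, "2024")        # the entire string matches

print(prefix_ok, whole_fails, whole_ok)
```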

**5. re.split(pattern, string, maxsplit=0, flags=0)**

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern is also returned as part of the resulting list. 
If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.

In [3]:
import re
re.split(r'\W+', 'Words, words, words.')

['Words', 'words', 'words', '']
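
The capturing-parentheses and maxsplit behaviour described above can be sketched with the same sentence; the variable names are chosen for this example.

```python
import re

text = "Words, words, words."

# Capturing parentheses keep the separators in the result list.
with_seps = re.split(r"(\W+)", text)
# maxsplit=1 performs one split and returns the rest as the last element.
limited = re.split(r"\W+", text, maxsplit=1)

print(with_seps)
print(limited)
```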

## Match Objects

Match objects always have a boolean value of True. Since match() and search() return None when there is no match, you can test whether there was a match with a simple if statement.

In [None]:
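import re
pattern = r"\d+"       # example pattern, assumed for illustration
string = "room 101"    # example input, assumed for illustration
match = re.search(pattern, string)
if match:
    print(match.group())   # stand-in for process(match)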

match.group([group1, ...])

-  Returns one or more subgroups of the match.
-  If there is a single argument, the result is a single string.
-  If there are multiple arguments, the result is a tuple with one item per argument.
-  Without arguments, group1 defaults to zero (the whole match is returned).

In [5]:
m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
m.group(0)       # The entire match: 'Isaac Newton'

m.group(1)       # The first parenthesized subgroup: 'Isaac'

m.group(2)       # The second parenthesized subgroup: 'Newton'

m.group(1, 2)    # Multiple arguments give us a tuple.

('Isaac', 'Newton')

In [6]:
m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
m.group('first_name')
#m.group('last_name')

'Malcolm'

In [7]:
# Named groups can also be referred to by their index
m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
m.group(1)
#m.group(2)

'Malcolm'
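
Putting the two cells above together, a short sketch shows that name-based and index-based lookups return the same tuple. The names `by_name` and `by_index` are chosen for this example.

```python
import re

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")

by_name = m.group("first_name", "last_name")   # tuple, looked up by name
by_index = m.group(1, 2)                       # same tuple, looked up by position

print(by_name, by_index)
```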

### Matching Versus Searching

Python offers two different primitive operations based on regular expressions: 
match() checks for a match only at the beginning of the string, while search() checks for a match anywhere in the string.

In [8]:
import re

line = "Cats are smarter than dogs"

# re.match only checks the beginning of the string, so this finds nothing.
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

# re.search scans the whole string and finds "dogs" at the end.
searchObj = re.search(r'dogs', line, re.M | re.I)
if searchObj:
    print("search --> searchObj.group() : ", searchObj.group())
else:
    print("Nothing found!!")

No match!!
search --> searchObj.group() :  dogs
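
A further sketch of the same distinction, assuming a made-up multi-line string: a leading ^ makes search() behave like match(), and even in MULTILINE mode match() still only tries the start of the string.

```python
import re

line = "Cats are smarter than dogs"

# search with a leading ^ is anchored to the start, so it fails here.
anchored = re.search(r"^dogs", line)

# In MULTILINE mode, match still only tries position 0 of the string...
at_start = re.match(r"X", "A\nB\nX", re.MULTILINE)
# ...while search with ^ can succeed at the start of any line.
any_line = re.search(r"^X", "A\nB\nX", re.MULTILINE)

print(anchored, at_start, any_line)
```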


### Search and Replace

One of the most important functions in the re module is sub.

**Syntax:**
re.sub(pattern, repl, string, count=0, flags=0)

This function replaces occurrences of the RE pattern in string with repl. All occurrences are replaced unless count is given, in which case at most count replacements are made. It returns the modified string.

In [9]:
import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)

Phone Num :  2004-959-559 
Phone Num :  2004959559
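
Two more uses of sub can be sketched with the same phone number: limiting the number of replacements with count, and reusing captured groups in the replacement. The output formats here are arbitrary choices for illustration.

```python
import re

phone = "2004-959-559"

# count=1 stops after the first replacement.
first_only = re.sub(r"-", " ", phone, count=1)
# \1, \2, \3 in the replacement refer to the captured groups.
regrouped = re.sub(r"(\d+)-(\d+)-(\d+)", r"(\1) \2 \3", phone)

print(first_only)
print(regrouped)
```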


**Regular Expression Patterns:**

^
Matches the beginning of the string. With the m (re.MULTILINE) option, also matches at the beginning of each line.
	
$
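Matches the end of the string, or just before a newline at the end of the string. With the m (re.MULTILINE) option, also matches at the end of each line.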
	
.
Matches any single character except newline. Using the s (re.DOTALL) option allows it to match newline as well.
	
[...]
Matches any single character in brackets.
	
[^...]
Matches any single character not in brackets.
	
re*
Matches 0 or more occurrences of preceding expression.
	
re+
Matches 1 or more occurrences of preceding expression.
	
re?
Matches 0 or 1 occurrence of preceding expression.
	
re{n}
Matches exactly n occurrences of preceding expression. Python does not treat the braces as a quantifier if they contain spaces.
	
re{n,}
Matches n or more occurrences of preceding expression.
	
re{n,m}
Matches at least n and at most m occurrences of preceding expression.
	
a|b
Matches either a or b.
	
(re)
Groups regular expressions and remembers matched text.
	
(?imx)
Turns on the i, m, or x options for the whole regular expression. In Python, this form belongs at the very start of the pattern (Python 3.11 and later reject it anywhere else).
	
(?-imx)
Turns off i, m, or x options in some other regex flavors. Python's re module does not accept this standalone form; to turn options off for part of a pattern, use the scoped form with a colon, listed below.
	
(?:re)
Groups regular expressions without remembering matched text.
	
(?imx:re)
Turns on i, m, or x options within the parentheses only (Python 3.6 and later).
	
(?-imx:re)
Turns off i, m, or x options within the parentheses only (Python 3.6 and later).
	
(?#...)
Comment.
	
(?=re)
Positive lookahead: succeeds at a position where re matches next, without consuming any characters.
	
(?!re)
Negative lookahead: succeeds at a position where re does not match next, without consuming any characters.
	
(?>re)
Atomic group: matches re as an independent pattern without backtracking into it. Supported by Python's re module from Python 3.11.
	
\w
Matches word characters.
	
\W
Matches nonword characters.
	
\s
Matches whitespace. For ASCII text, equivalent to [ \t\n\r\f\v].
	
\S
Matches nonwhitespace.
	
\d
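Matches digits. Equivalent to [0-9] for bytes patterns or when the re.ASCII flag is set; for ordinary (Unicode) string patterns it also matches other Unicode decimal digits.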
	
\D
Matches nondigits.
	
\A
Matches beginning of string.
	
\Z
Matches only at the end of the string in Python's re module. (In some other regex flavors it also matches just before a final newline.)
	
\z
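Matches end of string in some other regex flavors. In Python, use \Z for this; many Python versions reject \z as a bad escape.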
	
\G
Matches point where last match finished in some other regex flavors (such as Perl). Python's re module does not support it.
	
\b
Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
	
\B
Matches nonword boundaries.
	
\n, \t, etc.
Matches newlines, carriage returns, tabs, etc.
	
\1...\9
Backreference: matches the same text that the nth group matched.
	
\10
Refers to group 10. In Python, an escape of this kind is read as an octal character code only if it starts with 0 or is three octal digits long.
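
A few of the patterns listed above can be tried out in a short sketch. All sample strings and variable names are assumptions chosen for illustration.

```python
import re

# Quantifier: {2,3} means two to three repetitions (no spaces in the braces).
runs = re.findall(r"a{2,3}", "a aa aaa aaaa")

# Negated character class and alternation.
consonants = re.findall(r"[^aeiou\s]", "regex")
pets = re.findall(r"cat|dog", "cat, dog, bird")

# Non-capturing group: groups "ab" for + without creating a subgroup.
repeats = re.findall(r"(?:ab)+", "ababab ab")

# Lookahead checks what follows without consuming it.
shouted = re.findall(r"\w+(?=!)", "hey! you there!")
not_bar = re.search(r"foo(?!bar)", "foobar foobaz")

# Backreference: \1 must repeat whatever group 1 matched.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")

print(runs, consonants, pets, repeats, shouted)
print(not_bar.start(), doubled.group())
```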



**References:**

https://docs.python.org/3.4/library/re.html#
    
https://www.tutorialspoint.com/python/python_reg_expressions.htm