# Regular Expression

<b> A regular expression is a special sequence of characters that helps you <font color = "Red">match</font> or <font color = "Red">find</font> other strings or sets of strings, using a specialized syntax held in a pattern.</b> 

- The <b><font color ="red" >re</font></b> module provides full support for regular expressions in Python.
- The re module raises the exception re.error if an error occurs while compiling or using a regular expression.
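
As a small illustration of the exception mentioned above, the sketch below compiles one well-formed and one malformed pattern. The helper name `compile_or_report` is hypothetical and exists only for this example; `re.compile` and `re.error` are part of the standard `re` module.

```python
import re

def compile_or_report(pattern):
    """Hypothetical helper: return a compiled pattern, or None if it is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # The exception text describes what is wrong with the pattern
        print(f"Invalid pattern {pattern!r}: {exc}")
        return None

valid = compile_or_report(r"\d+")     # well-formed: one or more digits
invalid = compile_or_report(r"(abc")  # malformed: the parenthesis is never closed
```

Here `valid` is a compiled pattern object, while `invalid` is `None` because the unbalanced parenthesis makes `re.compile` raise `re.error`.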

We will cover two important functions for working with regular expressions:

1. The match function
2. The search function

## Basic patterns that match single chars

| Expression    | Meaning       | 
|:-------------:|:-------------:| 
|a, X, 9, <|Ordinary characters match themselves exactly.|
|. (a period)|Matches any single character except newline '\n'.|
|\w|Matches a "word" character: a letter, digit, or underscore [a-zA-Z0-9_].|
|\W|Matches any non-word character.|
|\b|Matches the boundary between a word and a non-word character.|
|\s|Matches a single whitespace character: space, newline, return, tab.|
|\S|Matches any non-whitespace character.|
|\t, \n, \r|Tab, newline, return.|
|\d|Matches a decimal digit [0-9].|
|^|Matches the start of the string.|
|$|Matches the end of the string.|
|\ (a backslash)|Inhibits the "specialness" of the character that follows it.|
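
To make the table more concrete, here is a minimal sketch that tries several of these expressions with `re.findall`, `re.match`, and `re.search`. The sample text is made up for illustration, and the `+` used below (meaning "one or more of the preceding item") is a quantifier that is not listed in the table.

```python
import re

# Made-up sample text containing a newline
text = "Room 42 opens at 9am\nBring 2 pens."

digits = re.findall(r"\d", text)            # ['4', '2', '9', '2']
words = re.findall(r"\w+", text)            # runs of word characters
after_at = re.findall(r"at.*", text)        # ['at 9am']: "." stops at the newline
whole_word = re.findall(r"\bpens\b", text)  # ['pens']

starts_with_room = re.match(r"^Room", text) is not None  # True
ends_with_period = re.search(r"\.$", text) is not None   # True: "\." is a literal period

print(digits, words, after_at, whole_word, starts_with_room, ends_with_period)
```

Note how the escaped period `\.` matches only a literal period, while the unescaped period in `at.*` matches any character up to the end of the line.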

## The match Function

- This function attempts to match the RE pattern at the beginning of string, with optional flags.
<b><br>
Here is the syntax:

       re.match(pattern, string, flags = 0)
        
</b>         

Here is the description of the parameters:
#### 1. pattern
- This is the regular expression to be matched.
#### 2. string
- This is the string that is checked for a match of the pattern at its beginning.
#### 3. flags
- You can combine different flags using bitwise OR (|). These modifiers are listed in the table of option flags later in this document.



The <font color = "red">re.match</font> function returns a <font color = "red">match object</font> on success and <font color = "red">None</font> on failure. We use the <font color = "red">group(num)</font> or <font color = "red">groups()</font> method of the match object to get the matched expression.
#### group(num = 0)
   - This method returns the entire match (or the specific subgroup num).
	
#### groups()
- This method returns all matching subgroups in a tuple (empty if there weren't any)

### Example:

In [49]:
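import re

line = "A quick Brown Fox Jumps Over A lazy Dog"
# re.I lets "over" match "Over"; re.M has no effect here because the pattern has no ^ or $
matchObj = re.match(r'(.*) over (.*?) .*', line, re.M | re.I)
if matchObj:
    print(matchObj.groups())                      # all captured groups as a tuple
    print("matchObj(0) is :", matchObj.group(0))  # the entire match
    print("matchObj(1) is :", matchObj.group(1))  # first group: greedy (.*)
    print("matchObj(2) is :", matchObj.group(2))  # second group: non-greedy (.*?)
else:
    print("Nothing found!!")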

('A quick Brown Fox Jumps', 'A')
matchObj(0) is : A quick Brown Fox Jumps Over A lazy Dog
matchObj(1) is : A quick Brown Fox Jumps
matchObj(2) is : A


## The search Function

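 - This function scans string for the first location where the RE pattern matches, with optional flags. Unlike match, the location can be anywhere in the string. It returns a match object on success and None on failure.
 
 ##### Syntax:
             re.search(pattern, string, flags = 0)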



In [45]:
import re

line = "A quick Brown Fox Jumps Over A lazy Dog"

searchObj = re.search(r'(.*) Jumps (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.groups() : ", searchObj.groups())
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")

searchObj.groups() :  ('A quick Brown Fox', 'Over')
searchObj.group() :  A quick Brown Fox Jumps Over A lazy Dog
searchObj.group(1) :  A quick Brown Fox
searchObj.group(2) :  Over


## Matching Versus Searching
Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).

In [44]:
import re

line = "Welcome to Python course"

# match succeeds only if the pattern matches at the start of the string
matchObj = re.match(r'Python', line, re.M | re.I)
if matchObj:
    print(matchObj.group())
else:
    print("No match!!")

# search scans the whole string for the first match
searchObj = re.search(r'Python', line, re.M | re.I)
if searchObj:
    print(searchObj.group())
else:
    print("Nothing found!!")


No match!!
Python


## Search and Replace

One of the most important functions in the re module is sub, which performs search and replace.

#### Syntax :

          re.sub(pattern, repl, string, count=0, flags=0)

This function replaces occurrences of the RE pattern in string with repl. It substitutes all occurrences unless count is given, in which case at most count replacements are made, from left to right. It returns the modified string.
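
The effect of limiting the number of replacements can be sketched as follows. The sample string is made up for illustration; the limit is passed by keyword as `count`.

```python
import re

text = "red red red"

# Without a limit, every occurrence is replaced
all_replaced = re.sub(r"red", "blue", text)

# With count=1, only the leftmost occurrence is replaced
first_only = re.sub(r"red", "blue", text, count=1)

print(all_replaced)  # blue blue blue
print(first_only)    # blue red red
```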



In [48]:
import re

# Replace "C" with "Python"
string = "Welcome to C course"
result = re.sub(r'C', "Python", string)
print("String is : ", result)

String is :  Welcome to Python course


In [47]:
# Remove anything other than digits
phone= "for any enquiry please contact 95666-9556666"
num = re.sub(r'\D', "", phone)    
print ("Phone Num : ", num)

Phone Num :  956669556666


## Regular Expression Modifiers: Option Flags

|Modifier       | Description   | 
|:-------------:|:-------------:| 
|re.I|Performs case-insensitive matching.|
|re.L|Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B). In Python 3 this flag can be used only with bytes patterns, and its use is discouraged.|
|re.M|Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).|
|re.S|Makes a period (dot) match any character, including a newline.|
|re.U|Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns, so the flag is redundant there.|
|re.X|Permits more readable ("verbose") regular expression syntax. It ignores whitespace (except inside a set [ ] or when escaped by a backslash) and treats unescaped # as a comment marker.|
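
The following sketch shows how some of these flags change matching behavior. The sample text and the phone-style pattern are made up for illustration; only the flags `re.I`, `re.M`, `re.S`, and `re.X` from the table are used.

```python
import re

text = "first line\nSecond LINE"

# re.I: case-insensitive matching
case_insensitive = re.findall(r"line", text, re.I)  # ['line', 'LINE']

# re.M: ^ matches at the start of every line, not just the start of the string
line_starts = re.findall(r"^\w+", text, re.M)       # ['first', 'Second']

# re.S: "." also matches a newline, so the match can span both lines
without_s = re.search(r"first.*LINE", text)         # None
with_s = re.search(r"first.*LINE", text, re.S)      # a match object

# re.X: whitespace and "#" comments inside the pattern are ignored
number = re.compile(r"""
    (\d{3})   # first three digits
    -         # a literal hyphen
    (\d{4})   # last four digits
""", re.X)
parts = number.search("call 555-1234").groups()     # ('555', '1234')

print(case_insensitive, line_starts, without_s, with_s.group(), parts)
```

Several flags can be combined with bitwise OR, for example `re.I | re.M`.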

## Regular Expression Patterns

<b>
Except for the special characters (metacharacters), <font color="red">(+ ? . * ^ $ ( ) [ ] { } | \)</font>, all characters match themselves. You can escape a special character by preceding it with a backslash.
</b>
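
A minimal sketch of escaping in practice is shown below. The sample strings are made up for illustration. `re.escape` is a standard `re` function that adds the backslashes for you when the text to match comes from a variable.

```python
import re

price_text = "Total: $5.00 (approx.)"

# Unescaped "." matches any character, so "5.00" also matches "5x00"
loose = re.search(r"5.00", "5x00") is not None    # True
strict = re.search(r"5\.00", "5x00") is not None  # False

# "$" and "." must both be escaped to match them literally
literal_price = re.search(r"\$5\.00", price_text).group()  # '$5.00'

# re.escape escapes the parentheses and the period automatically
escaped = re.escape("(approx.)")
literal_note = re.search(escaped, price_text).group()      # '(approx.)'

print(loose, strict, literal_price, literal_note)
```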


### Character Classes

In [3]:
import re
string = "A quick brown fox jumps over a lazy Dog."
re.findall('Dog',string) #Match "Dog"

['Dog']

In [17]:
# match "Dog" or "dog"
print(re.findall('[dD]og',string)) 
string = "A quick brown fox jumps over a lazy dog."
print(re.findall('[dD]og',string))

['Dog']
['dog']


In [32]:
#Match any one lowercase vowel
string = "A quick brown fox jumps over a lazy Dog."
print(re.findall(r'[aeiou]',string))

#Match anything other than a lowercase vowel
print(re.findall(r'[^aeiou]',string))

['u', 'i', 'o', 'o', 'u', 'o', 'e', 'a', 'a', 'o']
['A', ' ', 'q', 'c', 'k', ' ', 'b', 'r', 'w', 'n', ' ', 'f', 'x', ' ', 'j', 'm', 'p', 's', ' ', 'v', 'r', ' ', ' ', 'l', 'z', 'y', ' ', 'D', 'g', '.']


In [34]:
#Match any digit
string = "cube of 4 is 64"
print(re.findall(r'[0-9]',string))  #same as [0123456789]
print(re.findall(r'[0-9][0-9]',string)) #two consecutive digits

#Match anything other than a digit
print(re.findall(r'[^0-9]',string))

['4', '6', '4']
['64']
['c', 'u', 'b', 'e', ' ', 'o', 'f', ' ', ' ', 'i', 's', ' ']


In [28]:
string = "A quick brown fox jumps over a lazy Dog."
print(re.findall(r'[a-z]',string)) #Match any lowercase ASCII letter
print(re.findall(r'[A-Z]',string )) #Match any uppercase ASCII letter

['q', 'u', 'i', 'c', 'k', 'b', 'r', 'o', 'w', 'n', 'f', 'o', 'x', 'j', 'u', 'm', 'p', 's', 'o', 'v', 'e', 'r', 'a', 'l', 'a', 'z', 'y', 'o', 'g']
['A', 'D']


In [31]:
#Match any ASCII letter or digit
string = "phone is: ind/#92501444"
print(re.findall(r'[a-zA-Z0-9]',string))


['p', 'h', 'o', 'n', 'e', 'i', 's', 'i', 'n', 'd', '9', '2', '5', '0', '1', '4', '4', '4']


## Program to Extract Name and Age from a String

In [44]:
import re

NameAge = ''' Ram is 10 and he is fond of listening to music
              Mohan is 12 and he loves to play cricket
              Manish is 15 and he is studious
              Sooraj is 5 and youngest among all.'''

ages = re.findall(r'\d{1,3}',NameAge)
names = re.findall(r'[A-Z][a-z]*',NameAge)

agedict = dict(zip(names, ages))
print(agedict)

{'Ram': '10', 'Mohan': '12', 'Manish': '15', 'Sooraj': '5'}
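
The program above pairs names and ages purely by position, so it relies on every capitalized word being a name and every number being an age. As an alternative sketch (not the method used above), a single pattern with two groups can capture each name together with its age. The variable names `name_age_text`, `pairs`, and `age_by_name` are chosen only for this example, and the pattern assumes each sentence has the form "Name is Age".

```python
import re

name_age_text = """Ram is 10 and he is fond of listening to music
Mohan is 12 and he loves to play cricket
Manish is 15 and he is studious
Sooraj is 5 and youngest among all."""

# Each match yields a (name, age) tuple because the pattern has two groups
pairs = re.findall(r"([A-Z][a-z]*) is (\d{1,3})", name_age_text)

# Convert the ages to integers while building the dictionary
age_by_name = {name: int(age) for name, age in pairs}
print(age_by_name)  # {'Ram': 10, 'Mohan': 12, 'Manish': 15, 'Sooraj': 5}
```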


## Filter Email Addresses

In [60]:
import re
email = "abc@gmail.com ab.com hi.com py@.com pqr21@yahoo.com "
# The dot before the top-level domain is escaped so that it matches only a literal "."
matches = re.findall(r'[\w._%+-]{1,20}@[\w]{2,20}\.[A-Za-z]{2,3}', email)
print("Number of valid email addresses:", len(matches))
print("Valid email addresses:", matches)

Number of valid email addresses: 2
Valid email addresses: ['abc@gmail.com', 'pqr21@yahoo.com']
