### A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.
### RegEx can be used to check if a string contains the specified search pattern.


#### Finding Patterns of Text Without Regular Expressions
Say you want to find an American phone number in a string. You know the
pattern if you’re American: three numbers, a hyphen, three numbers, a
hyphen, and four numbers. Here’s an example: 415-555-4242.

In [2]:
def isPhoneNumber(text):
    if len(text) != 12:
        return False
    for i in range(0, 3):
        if not text[i].isdecimal():
            return False
    if text[3] != '-':
        return False
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False
    if text[7] != '-':
        return False
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True

In [5]:
print('Is 415-555-4242 a phone number?')
print(isPhoneNumber('415-555-4242'))
print('Is Moshi moshi a phone number?')
print(isPhoneNumber('Moshi moshi'))

Is 415-555-4242 a phone number?
True
Is Moshi moshi a phone number?
False


In [21]:
message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'
for i in range(len(message)):
    chunk = message[i:i+12]
    # print(chunk)
    if isPhoneNumber(chunk):
        print('Phone number found: ' + chunk)
print('We are Done')

Phone number found: 415-555-1011
Phone number found: 415-555-9999
We are Done


What about a phone number formatted like 415.555.4242 or (415) 555-4242? 
What if the phone number had an extension, like 415-555-4242 x99? 
The isPhoneNumber() function would fail to validate them.
We could add yet more code for these additional patterns, but there is an easier way.

#### Finding Patterns of Text with Regular Expressions
Regular expressions, called regexes for short, are descriptions of a pattern
of text. For example, \d in a regex stands for a digit character, that
is, any single numeral from 0 to 9. Python uses the regex \d\d\d-\d\d\d-\d\d\d\d
to match the same text pattern that the previous isPhoneNumber() function did.

But regular expressions can be much more sophisticated. For example,
adding a 3 in braces ({3}) after a pattern is like saying, “Match this pattern
three times.” So the slightly shorter regex \d{3}-\d{3}-\d{4} also matches the
correct phone number format.
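
As a quick check of that claim, the sketch below compiles both spellings of the pattern and compares what they find in the same sentence. The variable names (`long_form`, `short_form`, `text`) are illustrative and not part of the original notebook.

```python
import re

# Two spellings of the same phone number pattern (names are illustrative).
long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_form = re.compile(r'\d{3}-\d{3}-\d{4}')

text = 'My number is 415-555-4242.'
long_match = long_form.search(text).group()
short_match = short_form.search(text).group()

print(long_match)
print(short_match)
print(long_match == short_match)  # both regexes find the same text
```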

In [8]:
# Creating Regex Objects
import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Phone number found: ' + mo.group())

Phone number found: 415-555-4242


Passing a string value representing your regular expression to
**re.compile()** returns a Regex pattern object (or simply, a Regex object).
A Regex object’s search() method searches the string it is passed for any
matches to the regex. It returns None if the pattern is not found in the
string. If the pattern is found, search() returns a Match object, which has a
group() method that returns the actual matched text from the searched string.

In the code cell above, we pass our desired pattern to re.compile() and store
the resulting Regex object in phoneNumRegex. Then we call search() on
phoneNumRegex and pass it the string we want to search. The result of the
search is stored in the variable mo. In this example, we know that the pattern
will be found in the string, so we know that a Match object will be returned.
Because **mo** holds a Match object and not the null value None, we can call
group() on **mo** to return the match. Writing mo.group() inside our print()
function call displays the whole match, 415-555-4242.
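
When the input is not known in advance, search() may return None, and calling group() on None raises an AttributeError. A minimal sketch of guarding against that is shown here; the helper name `find_phone` is hypothetical and only serves the illustration.

```python
import re

phone_regex = re.compile(r'\d{3}-\d{3}-\d{4}')

def find_phone(text):
    """Return the first phone number in text, or None if there is none."""
    mo = phone_regex.search(text)
    if mo is None:
        # No match: there is no Match object to call group() on.
        return None
    return mo.group()

print(find_phone('My number is 415-555-4242.'))
print(find_phone('No number here.'))
```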

In [13]:
# Grouping with Parentheses
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print(mo.group(1))
print(mo.group(2))
print(mo.group(0))
print(mo.group())
print(mo.groups()) # Tuple of all captured groups
areaCode, mainNumber = mo.groups()
print(areaCode)
print(mainNumber)

415
555-4242
415-555-4242
415-555-4242
('415', '555-4242')
415
555-4242


### Review of Regular Expression Matching
While there are several steps to using regular expressions in Python, each
step is fairly simple.
1. Import the regex module with import re.
2. Create a Regex object with the re.compile() function. (Remember to use
a raw string.)
3. Pass the string you want to search into the Regex object’s search()
method. This returns a Match object, or None if there is no match.
4. Call the Match object’s group() method to return a string of the actual
matched text.

In [14]:
# Here the phone numbers we want to match have the area code set in parentheses.
# To match literal ( and ) characters, we escape them with a backslash.
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is (415) 555-4242.')
print(mo.group())

(415) 555-4242
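
The formats that isPhoneNumber() could not handle (415.555.4242, (415) 555-4242, and an extension such as x99) can be covered by a single pattern. The following is one possible sketch, not the only way to write it: the name `flexible_phone_regex`, the choice of separators, and the extension syntax are all assumptions made for illustration. It uses re.VERBOSE so the pattern can be split over several commented lines.

```python
import re

# Hypothetical pattern; the accepted separators and extension form are assumptions.
flexible_phone_regex = re.compile(r'''
    (\(\d{3}\)|\d{3})   # area code, with or without parentheses
    [\s.-]?             # optional separator: whitespace, dot, or hyphen
    \d{3}               # first three digits
    [\s.-]              # separator
    \d{4}               # last four digits
    (\s*x\d+)?          # optional extension such as " x99"
''', re.VERBOSE)

samples = ['415-555-4242', '415.555.4242', '(415) 555-4242', '415-555-4242 x99']
found = [flexible_phone_regex.search(s).group() for s in samples]
print(found)
```

Each sample is matched in full, including the extension when one is present.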


In [15]:
# Matching Multiple Groups with the Pipe
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')
mo.group()

'Batmobile'
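
Two details of the pipe are worth seeing in code. In the sketch below, group(1) returns only the alternative that matched inside the parentheses, and when several alternatives occur in the text, the one that appears first in the text is returned. The second regex and its sample sentence are illustrative assumptions.

```python
import re

bat_regex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = bat_regex.search('Batmobile lost a wheel')
full_match = mo.group()     # the whole match
alternative = mo.group(1)   # only the part matched inside the parentheses
print(full_match, alternative)

# With several candidates in the text, the earliest occurrence wins.
hero_regex = re.compile(r'Batman|Tina Fey')
first_hit = hero_regex.search('Tina Fey and Batman').group()
print(first_hit)
```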

In [17]:
# Optional Matching with the Question Mark
batRegex = re.compile(r'Bat(wo)?man')
mo1 = batRegex.search('The Adventures of Batman')
print(mo1.group())
mo2 = batRegex.search('The Adventures of Batwoman')
print(mo2.group())

# The (wo)? part of the regular expression means that the pattern wo is
# an optional group. The regex will match text that has zero instances or one
# instance of wo in it. This is why the regex matches both 'Batwoman' and 'Batman'.


Batman
Batwoman

In [18]:
# Using the earlier phone number example, we can make the regex look
# for phone numbers that do or do not have an area code.
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
mo1 = phoneRegex.search('My number is 415-555-4242')
print(mo1.group())
# Phone number without area code
mo2 = phoneRegex.search('My number is 555-4242')
print(mo2.group())

415-555-4242
555-4242
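
When an optional group does not take part in a match, the Match object reports it as None. The sketch below reuses the pattern from the cell above to show this; the variable names are illustrative.

```python
import re

phone_regex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')

with_area = phone_regex.search('My number is 415-555-4242')
without_area = phone_regex.search('My number is 555-4242')

print(with_area.group(1))      # the optional group matched the area code
print(without_area.group(1))   # the optional group was skipped
```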


In [77]:
# Matching Specific Repetitions with Braces
# For example, the regex (Ha){3} will match the string 'HaHaHa', but it will
# not match 'HaHa', since the latter has only two repeats of the (Ha) group.
# The regex (Ha){3,5} will match 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'.
# (Ha){3,} will match three or more instances of the (Ha) group,
# while (Ha){,5} will match zero to five instances.
# Each of these pairs of regular expressions describes the same set of strings:
#   (Ha){3}    and  (Ha)(Ha)(Ha)
#   (Ha){3,5}  and  ((Ha)(Ha)(Ha))|((Ha)(Ha)(Ha)(Ha))|((Ha)(Ha)(Ha)(Ha)(Ha))
haRegex = re.compile(r'(Ha){3}')
mo1 = haRegex.search('HaHaHa')
print(mo1.group())
mo2 = haRegex.search('Ha')
print(mo2 is None)


HaHaHa
True
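
The ranged forms described in the cell above can be tried the same way. This sketch runs (Ha){3,5}, (Ha){3,}, and (Ha){,5} against sample strings chosen for illustration; note that a ranged repetition takes as many repeats as it is allowed to.

```python
import re

seven_has = 'Ha' * 7  # 'HaHaHaHaHaHaHa'

# {3,5}: at least three, at most five repeats.
three_to_five = re.compile(r'(Ha){3,5}').search(seven_has).group()

# {3,}: three or more repeats, with no upper bound.
three_or_more = re.compile(r'(Ha){3,}').search(seven_has).group()

# {,5}: zero to five repeats, so it can match the empty string.
zero_to_five = re.compile(r'(Ha){,5}').search('xyz').group()

print(three_to_five)
print(three_or_more)
print(repr(zero_to_five))
```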


In [62]:
# findall function
msg = "The quick brown fox jumps over the lazy dog"
x = re.findall("[A-Za-z]he", msg)
print(x)

['The', 'the']
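
findall() also replaces the hand-written loop from the first section: it returns every non-overlapping match in one call. As a sketch, the example below applies it to the earlier message string. When the pattern contains groups, findall() returns a list of tuples of the groups instead of the full matches.

```python
import re

message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'

# No groups in the pattern: a list of the full matches.
numbers = re.findall(r'\d{3}-\d{3}-\d{4}', message)
print(numbers)

# Groups in the pattern: a list of tuples, one tuple per match.
parts = re.findall(r'(\d{3})-(\d{3}-\d{4})', message)
print(parts)
```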


In [66]:
# split function
msg = "The quick brown fox jumps over the lazy dog"
x = re.split(r"\s", msg, maxsplit=2)
print(x)

['The', 'quick', 'brown fox jumps over the lazy dog']


In [71]:
# str.strip() removes leading and trailing whitespace; no regex is needed
msg = "      Tariq Ahmad      "
y = msg.strip()
print(y)

Tariq Ahmad


In [40]:
# sub function
msg = "The quick brown fox jumps over the lazy dog"
x = re.sub(r"\s", '$', msg)
print(x)

The$quick$brown$fox$jumps$over$the$lazy$dog
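
The replacement string passed to sub() can also refer back to captured groups with \1, \2, and so on. The sketch below uses this to keep the area code of each phone number and mask the rest; the masking format is an arbitrary choice for illustration.

```python
import re

message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'

# \1 in the replacement inserts the text captured by the first group.
masked = re.sub(r'(\d{3})-\d{3}-\d{4}', r'\1-***-****', message)
print(masked)
```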


### Match object
#### A Match Object is an object containing information about the search and the result.
The Match object has properties and methods used to retrieve information about the search, and the result:

.span() returns a tuple containing the start and end positions of the match <br>
.string returns the string passed into the function <br>
.group() returns the part of the string where there was a match
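
These three pieces fit together: slicing .string with the positions from .span() gives the same text as .group(). A short sketch of that relationship follows; it also uses the start() and end() methods, which return the two positions individually.

```python
import re

msg = "The quick brown fox jumps over the lazy dog"
mo = re.search("brown", msg)

start, end = mo.span()
sliced = mo.string[start:end]   # slice the searched string by the span

print(start, end)
print(sliced == mo.group())
print(mo.start(), mo.end())     # same positions, returned one at a time
```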

In [55]:
msg = "The quick brown fox jumps over the lazy dog"
x = re.search("brown", msg)
print(x) # prints the Match object's representation

<_sre.SRE_Match object; span=(10, 15), match='brown'>


In [47]:
# span() returns the start and end positions of the match
msg = "The quick brown fox jumps over the lazy dog"
x = re.search("brown", msg)
print(x.span()) 

(10, 15)


In [75]:
# .string returns the entire string that was searched (x is None if there is no match)
msg = "The quick brown fox jumps over the lazy dog"
x = re.search("fox", msg)
print(x.string) 

The quick brown fox jumps over the lazy dog


In [53]:
# group returns the part of the string where there was a match
msg = "The quick brown fox jumps over the lazy dog"
x = re.search("brown", msg)
print(x.group())

brown
