## Important Notes

#### 1
Python’s regular expressions are greedy by default, which means that in
ambiguous situations they will match the longest string possible. The
non-greedy version of the curly brackets, which matches the shortest string
possible, has the closing curly bracket followed by a question mark.

In [2]:
greedyHaRegex = re.compile(r'(Ha){3,5}')
mo1 = greedyHaRegex.search('HaHaHaHaHa')
mo1.group()


'HaHaHaHaHa'

In [3]:
nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
mo2.group()

'HaHaHa'

#### 2
There are times when you want to match a set of characters but the shorthand
character classes (\d, \w, \s, and so on) are too broad. You can define
your own character class using square brackets. For example, the character
class [aeiouAEIOU] will match any vowel, both lowercase and uppercase. <br>
By placing a caret character (^) just after the character class’s opening
bracket, you can make a negative character class. A negative character class
will match all the characters that are not in the character class.
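
As a small illustration of both kinds of character class, the sketch below runs a vowel class and its negated form over the same sentence. The sample sentence and the variable names are made up for this example.

```python
import re

# Custom character class: any vowel, lowercase or uppercase
vowel_regex = re.compile(r'[aeiouAEIOU]')
# Negative character class: anything that is NOT a vowel
non_vowel_regex = re.compile(r'[^aeiouAEIOU]')

text = 'RoboCop eats baby food.'
vowels = vowel_regex.findall(text)
non_vowels = non_vowel_regex.findall(text)

print(vowels)      # ['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o']
print(non_vowels)  # consonants, plus the spaces and the period
```

Note that the negative class also matches spaces and punctuation, since those are not vowels either.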
#### 3
The dot character means “any single character except the newline.” By passing re.DOTALL as
the second argument to re.compile(), you can make the dot character match
all characters, including the newline character.
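
A minimal sketch of the difference, using an invented three-line string: without re.DOTALL the match stops at the first newline, and with it the match runs to the end of the text.

```python
import re

text = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

# Default: the dot stops at the first newline
no_newline = re.compile(r'.*').search(text).group()

# re.DOTALL: the dot also matches newline characters
with_newline = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(no_newline))    # 'Serve the public trust.'
print(repr(with_newline))  # the whole three-line string
```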
#### 4
To make your regex case-insensitive,
you can pass re.IGNORECASE or re.I as a second argument to re.compile().
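
For instance, the sketch below compiles one pattern with re.I and searches three differently capitalized strings. The sample strings are assumptions chosen only for illustration.

```python
import re

robocop = re.compile(r'robocop', re.I)

samples = (
    'RoboCop is part man, part machine.',
    'ROBOCOP protects the innocent.',
    'Why does everyone talk about robocop?',
)
# The match keeps the capitalization found in the searched text
matches = [robocop.search(s).group() for s in samples]
print(matches)  # ['RoboCop', 'ROBOCOP', 'robocop']
```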
#### 5
Sometimes you may need to use the matched text itself as part of the
substitution. In the first argument to sub(), you can type \1, \2, \3, and so
on, to mean “Enter the text of group 1, 2, 3, and so on, in the substitution.”

In [5]:
agentNamesRegex = re.compile(r'Agent (\w)\w*')
agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')

'A**** told C**** that E**** knew B**** was a double agent.'
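
Group references can also reorder matched text. The following is a sketch under the assumption that dates appear as MM/DD/YYYY; the pattern and the sample sentence are hypothetical.

```python
import re

# Groups: 1 = month, 2 = day, 3 = year
date_regex = re.compile(r'(\d{2})/(\d{2})/(\d{4})')

# \3-\1-\2 rebuilds each date as year-month-day
iso = date_regex.sub(r'\3-\1-\2',
                     'Released on 12/25/2023 and patched on 01/07/2024.')
print(iso)  # Released on 2023-12-25 and patched on 2024-01-07.
```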

#### 6
Complicated regular expressions can be hard to read. You can mitigate this
by telling the re.compile() function to ignore whitespace and comments inside
the regular expression string. This “verbose mode” can be enabled by passing
the variable re.VERBOSE as the second argument to re.compile(). For example,
instead of

In [6]:
phoneRegex = re.compile(r'((\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}(\s|-|\.)\d{4}(\s*(ext|x|ext.)\s*\d{2,5})?)')

You can type

In [8]:
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))? # area code
    (\s|-|\.)? # separator
    \d{3} # first 3 digits
    (\s|-|\.) # separator
    \d{4} # last 4 digits
    (\s*(ext|x|ext.)\s*\d{2,5})? # extension
    )''', re.VERBOSE)
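
To check that the two spellings behave the same, the sketch below compiles both and compares their matches on one sample string. The sample text and the names `compact` and `verbose` are assumptions for this example.

```python
import re

compact = re.compile(r'((\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}(\s|-|\.)\d{4}(\s*(ext|x|ext.)\s*\d{2,5})?)')

verbose = re.compile(r'''(
    (\d{3}|\(\d{3}\))?            # area code
    (\s|-|\.)?                    # separator
    \d{3}                         # first 3 digits
    (\s|-|\.)                     # separator
    \d{4}                         # last 4 digits
    (\s*(ext|x|ext.)\s*\d{2,5})?  # extension
    )''', re.VERBOSE)

text = 'Call 415-555-1011 ext 42 or (415) 555-9999.'
compact_hits = [m.group() for m in compact.finditer(text)]
verbose_hits = [m.group() for m in verbose.finditer(text)]

print(verbose_hits)  # ['415-555-1011 ext 42', '(415) 555-9999']
```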

<b>•</b> The ? matches zero or one of the preceding group.<br>
<b>•</b> The * matches zero or more of the preceding group.<br>
<b>•</b> The + matches one or more of the preceding group.<br>
<b>•</b> The {n} matches exactly n of the preceding group.<br>
<b>•</b> The {n,} matches n or more of the preceding group.<br>
<b>•</b> The {,m} matches 0 to m of the preceding group.<br>
<b>•</b> The {n,m} matches at least n and at most m of the preceding group.<br>
<b>•</b> {n,m}? or *? or +? performs a nongreedy match of the preceding group.<br>
<b>•</b> ^spam means the string must begin with spam.<br>
<b>•</b> spam$ means the string must end with spam.<br>
<b>•</b> The . matches any character, except newline characters.<br>
<b>•</b> \d, \w, and \s match a digit, word, or space character, respectively.<br>
<b>•</b> \D, \W, and \S match anything except a digit, word, or space character,
respectively.<br>
<b>•</b> [abc] matches any character between the brackets (such as a, b, or c).<br>
<b>•</b> [^abc] matches any character that isn’t between the brackets.<br>
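
A few of the symbols in this list can be tried out quickly. The sketch below is illustrative only, and the sample strings are assumptions.

```python
import re

# ? : zero or one of the preceding group
optional = re.search(r'Bat(wo)?man', 'Batman').group()
# * : zero or more of the preceding group
star = re.search(r'Bat(wo)*man', 'Batwowowoman').group()
# + : one or more, so 'Batman' (zero repetitions) does not match
plus = re.search(r'Bat(wo)+man', 'Batman')

# Greedy .* takes the longest match; non-greedy .*? takes the shortest
greedy = re.search(r'<.*>', '<To serve man> for dinner.>').group()
nongreedy = re.search(r'<.*?>', '<To serve man> for dinner.>').group()

# ^ and $ together require the whole string to match
anchored = re.search(r'^\d+$', '1234x')

print(optional, star, plus)  # Batman Batwowowoman None
print(greedy)                # <To serve man> for dinner.>
print(nongreedy)             # <To serve man>
print(anchored)              # None
```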

## Exercises

In [1]:
import re

#### 20
How would you write a regex that matches a number with commas for
every three digits? It must match the following: <br>
• '42' <br>
• '1,234'<br>
• '6,368,745'<br>
but not the following:<br>
• '12,34,567' (which has only two digits between the commas)<br>
• '1234' (which lacks commas)<br>

In [130]:
reg = re.compile(r"""
    ^\d{1,3}      # One to three leading digits
    (,\d{3})*     # Zero or more groups of a comma and exactly three digits
    $             # Nothing else may follow
""", re.VERBOSE)
text = '12,34,567'
print(reg.search(text))

None
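
One way to check a candidate pattern is to run it against every case listed in the exercise. The sketch below assumes the pattern `^\d{1,3}(,\d{3})*$`; the names `comma_number` and `results` are hypothetical. A pattern made only of optional groups would match the empty string inside any input, so requiring the leading digits and anchoring both ends matters here.

```python
import re

# Assumed pattern: 1-3 digits, then zero or more ",ddd" groups, anchored
comma_number = re.compile(r'^\d{1,3}(,\d{3})*$')

should_match = ['42', '1,234', '6,368,745']
should_not_match = ['12,34,567', '1234']

# Map each test string to whether the pattern matched it
results = {s: bool(comma_number.search(s))
           for s in should_match + should_not_match}

for s, matched in results.items():
    print(f'{s!r}: {matched}')
```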

#### 21
How would you write a regex that matches the full name of someone
whose last name is Nakamoto? You can assume that the first name that
comes before it will always be one word that begins with a capital letter.
The regex must match the following: <br>
• 'Satoshi Nakamoto'<br>
• 'Alice Nakamoto' <br>
• 'RoboCop Nakamoto'    <br>
but not the following:  <br>
• 'satoshi Nakamoto' (where the first name is not capitalized) <br>
• 'Mr. Nakamoto' (where the preceding word has a nonletter character) <br>
• 'Nakamoto' (which has no first name) <br>
• 'Satoshi nakamoto' (where Nakamoto is not capitalized)

In [150]:
reg = re.compile(r'''
    [A-Z]       # First letter must be a capital
    [a-zA-Z]*   # Rest of the first name: letters only (\w would also allow digits and _)
    \s          # Space
    Nakamoto    # Last name must be Nakamoto
''', re.VERBOSE)

text = 'Satoshi Nakamoto'
reg.search(string=text).group()

'Satoshi Nakamoto'
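
The remaining cases from the exercise can be checked the same way. This is a sketch that assumes the pattern `[A-Z][a-zA-Z]*\sNakamoto`; the names `nakamoto` and `results` are hypothetical.

```python
import re

# Assumed pattern: capitalized, letters-only first name, a space, then Nakamoto
nakamoto = re.compile(r'[A-Z][a-zA-Z]*\sNakamoto')

should_match = ['Satoshi Nakamoto', 'Alice Nakamoto', 'RoboCop Nakamoto']
should_not_match = ['satoshi Nakamoto', 'Mr. Nakamoto', 'Nakamoto',
                    'Satoshi nakamoto']

# Map each test string to whether the pattern matched it
results = {s: bool(nakamoto.search(s))
           for s in should_match + should_not_match}

for s, matched in results.items():
    print(f'{s!r}: {matched}')
```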

### Practice Projects

#### Strong Password Detection 
Write a function that uses regular expressions to make sure the password
string it is passed is strong. A strong password is defined as one that is at
least eight characters long, contains both uppercase and lowercase characters,
and has at least one digit. You may need to test the string against multiple
regex patterns to validate its strength.

In [159]:
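# First attempt: this only finds an uppercase run directly followed by a
# lowercase run, so it cannot check each rule independently.
reg1 = re.compile(r'''
    ([A-Z]){1,}   # At least one capital letter
    ([a-z]){1,}   # At least one lowercase letter
    # ([A-Za-z0-9@#$%^&]){8,}?
''', re.VERBOSE)
password = 'aAbchyt123'
reg1.search(string=password)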

<re.Match object; span=(1, 7), match='Abchyt'>

In [180]:
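reg1 = re.compile(r"[A-Z]")   # At least one uppercase letter
reg2 = re.compile(r"[a-z]")   # At least one lowercase letter
reg3 = re.compile(r"[0-9]")   # At least one digit
# Anchored so the whole password must be 8 or more allowed characters
reg4 = re.compile(r"^[A-Za-z0-9@#$%^&]{8,}$")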


In [183]:
password = 'Aaa1hga'
if reg1.search(password) and reg2.search(password) and reg3.search(password) and reg4.search(password):
    print('Valid password')
elif not reg1.search(password):
    print('Password should have at least one capital letter.')
elif not reg2.search(password):
    print('Password should have at least one lowercase letter.')
elif not reg3.search(password):
    print('Password should have at least one digit.')
elif not reg4.search(password):
    print('Password should be at least 8 characters long and use only allowed characters.')
    print("Allowed special characters are: @ # $ % ^ &")

Password should be at least 8 characters long and use only allowed characters.
Allowed special characters are: @ # $ % ^ &
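
Since the exercise asks for a function, the checks can be wrapped as in the sketch below. The name `is_strong_password` is hypothetical, and this version implements only the rules stated in the exercise (length, uppercase, lowercase, digit); it does not restrict which special characters are allowed.

```python
import re

# One pattern per rule; a strong password must satisfy all of them
LENGTH_REGEX = re.compile(r'.{8,}', re.DOTALL)  # at least eight characters
UPPER_REGEX = re.compile(r'[A-Z]')              # at least one uppercase letter
LOWER_REGEX = re.compile(r'[a-z]')              # at least one lowercase letter
DIGIT_REGEX = re.compile(r'\d')                 # at least one digit


def is_strong_password(password):
    """Return True if the password passes every check above."""
    checks = (LENGTH_REGEX, UPPER_REGEX, LOWER_REGEX, DIGIT_REGEX)
    return all(check.search(password) for check in checks)


print(is_strong_password('aAbchyt123'))  # True
print(is_strong_password('Aaa1hga'))     # False: only seven characters
```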


#### Regex Version of strip()
Write a function that takes a string and does the same thing as the strip()
string method. If no other arguments are passed other than the string to
strip, then whitespace characters will be removed from the beginning and
end of the string. Otherwise, the characters specified in the second argument
to the function will be removed from the string.

In [233]:
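def my_strip_function(string, chars=None):
    """Regex-based equivalent of str.strip()."""
    if chars is None:
        # Default: strip whitespace from both ends
        char_class = r'\s'
    elif chars == '':
        # Nothing to strip; an empty character class would be an invalid regex
        return string
    else:
        # Escape the characters so they are literal inside the character class
        char_class = '[' + re.escape(chars) + ']'
    # \A and \Z anchor to the very start and end, so only the ends are stripped
    reg = re.compile(r'\A' + char_class + r'+|' + char_class + r'+\Z')
    return reg.sub('', string)

In [232]:
print(repr(my_strip_function('   Hello World   ')))
print(repr(my_strip_function('   Hello World   ', 'oWl')))
print(repr(my_strip_function('lolHello Worldlol', 'oWl')))

'Hello World'
'   Hello World   '
'Hello World'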
