# Course Objectives

In this course, we are going to focus on three learning objectives:

- Construct regex patterns

- Validate passwords and user input in web forms

- Extract patterns and replace strings with regex

By the end of this course, you will be able to use regex patterns to validate web forms and to extract and replace strings.

# Project Structure

The hands-on project on Regular Expressions in Python is divided into the following tasks:

Task #1: Introduction to Regular Expressions in Python

Task #2: Intermediate Regular Expressions in Python

Task #3: Password Validation with Regular Expressions

Task #4: Form and User Input Validation with Regular Expressions

Task #5: Extraction and Word Replacement from Server Logs

In [3]:
# regex library
import re

### Metacharacters

^ (caret) - used to search for a pattern at the start of a string

$ (dollar sign) - used to search for a pattern at the end of a string

In [4]:
st = 'dogs are adorable'

In [5]:
# search for a pattern
p = '^dogs'

In [6]:
re.search(p, st)

<re.Match object; span=(0, 4), match='dogs'>

The output of the above cell shows that the pattern p was found: re.search has returned a Match object.

The Match object shows where the pattern was found: its span is the tuple (0, 4), meaning the match starts at index 0 and ends just before index 4.
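As a small sketch of how the Match object can be inspected, the standard `group()`, `span()`, `start()` and `end()` methods give the matched text and its position. The variable name `m` is only an illustration.

```python
import re

st = 'dogs are adorable'
m = re.search('^dogs', st)

# re.search returns a Match object on success and None otherwise.
if m is not None:
    print(m.group())              # the matched text
    print(m.span())               # (start, end) as a tuple
    print(st[m.start():m.end()])  # slicing with the span gives the same text
```

Because `None` is falsy and a Match object is truthy, the result can also be used directly in an `if` statement.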

In [7]:
p2 = 'dogs$'

In [8]:
re.search(p2,st)

In [9]:
print(re.search(p2,st))

None


If there is no match, as in the case above, re.search returns None.

There is no 'dogs' at the end of the string st.

### Quantifiers

? - 0 or 1 occurrence of the preceding element

+  -  1 or more occurrences of the preceding element

* (asterisk) - 0 or more occurrences of the preceding element

In [10]:
st2 = 'color'
st3 = 'colour'

In [11]:
# here the u is optional: the question mark means 0 or 1 occurrence of the preceding element

re.search('colou?r',st2)

<re.Match object; span=(0, 5), match='color'>

In [12]:
re.search('colou?r',st3)

<re.Match object; span=(0, 6), match='colour'>

In [13]:
st4 = 'ac'
st5 = 'abc'
st6 = 'abbc'

In [14]:
# here it means that b needs to be there at least once
print(re.search('ab+c',st4))

# st4 = 'ac' contains no b, so the search returns None

None


In [15]:
print(re.search('ab+c',st5))

<re.Match object; span=(0, 3), match='abc'>


In [16]:
print(re.search('ab+c',st6))

<re.Match object; span=(0, 4), match='abbc'>


In [17]:
re.search('a*',st6)

<re.Match object; span=(0, 1), match='a'>

In [18]:
re.search('d*',st6)

<re.Match object; span=(0, 0), match=''>

In [19]:
print(re.search('a*b*c*',st6))

<re.Match object; span=(0, 4), match='abbc'>
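To compare the three quantifiers side by side, the sketch below checks which of the sample strings match each pattern in full. The dictionary `results` is a hypothetical helper, not part of the course material.

```python
import re

strings = ['ac', 'abc', 'abbc']
patterns = ['ab?c', 'ab+c', 'ab*c']

# For each pattern, collect the strings that match from start to end.
results = {
    p: [s for s in strings if re.fullmatch(p, s)]
    for p in patterns
}

for p, matched in results.items():
    print(p, matched)
```

Using `re.fullmatch` instead of `re.search` avoids surprises with `*`, which can also succeed by matching an empty string.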


### Metacharacters


The hyphen (-) denotes a range inside a set

. (period) - matches any single character except the newline character

Sets
[ ]

[a-n] returns matches for any character in the alphabet between a and n (lowercase)

[^a-n] returns matches for any character in the alphabet except from a to n range

[0145] will return matches for characters 0 or 1 or 4 or 5 

[0-9] will search for any digit between 0 and 9

[0-5][0-9] will search for any two-digit number between 00 and 59

[a-zA-Z] will return matches for all characters in the alphabet, whether they are lowercase or uppercase (note that [a-Z] is not a valid range in Python and raises an error)




In [20]:
st = 'dogs are adorable'

In [21]:
re.search('[a-d]',st)

<re.Match object; span=(0, 1), match='d'>

In [22]:
re.findall('[a-d]',st)

['d', 'a', 'a', 'd', 'a', 'b']
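The set notations listed above can be tried out with a few short calls. This is a minimal sketch with made-up sample strings; the flag `bad_range_ok` is only there to record whether the invalid range compiled.

```python
import re

# Negated set: every character outside the a-n range.
outside = re.findall('[^a-n]', 'dogs')

# Enumerated set: only the characters 0, 1, 4 and 5.
picked = re.findall('[0145]', '2024-05-11')

# Matching both cases needs two ranges.
letters = re.findall('[a-zA-Z]', 'Ab1_c')

# A range must run from a lower to a higher character code,
# so [a-Z] is rejected when the pattern is compiled.
try:
    re.compile('[a-Z]')
    bad_range_ok = True
except re.error as exc:
    bad_range_ok = False
    print('invalid pattern:', exc)

print(outside, picked, letters)
```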

### Special Sequences

\d returns a match where the string contains any digit [0-9]

\w returns a match where the string contains any word character: a letter, a digit, or the underscore [a-zA-Z0-9_]

In [23]:
re.search(r'\d', 'my password is pass1234')

<re.Match object; span=(19, 20), match='1'>

The search has returned the first digit, and the span tuple gives its location.

In [24]:
re.search(r'\d\d', 'my password is pass1234')
# finds the first 2 consecutive digits

<re.Match object; span=(19, 21), match='12'>

In [25]:
re.findall(r'\d', 'my password is pass1234')
# finds all the digits

['1', '2', '3', '4']

In [26]:
re.search(r'\w', 'my password is pass1234')
# finds the first word character and returns it

<re.Match object; span=(0, 1), match='m'>

In [27]:
re.findall(r'\w', 'my password is pass1234')
# returns all the alphanumeric characters - note that it does not return spaces, ?, - or +; the underscore is the only other character \w matches

['m',
 'y',
 'p',
 'a',
 's',
 's',
 'w',
 'o',
 'r',
 'd',
 'i',
 's',
 'p',
 'a',
 's',
 's',
 '1',
 '2',
 '3',
 '4']
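Two details about the special sequences are worth a quick check: writing patterns as raw strings (`r'...'`) keeps the backslash intact for the regex engine, and for `str` patterns `\w` also matches non-ASCII letters unless the `re.ASCII` flag is set. The sample text below is an assumption made for illustration.

```python
import re

text = 'caf\u00e9_42 ok?'   # the string 'café_42 ok?'

# By default \w matches Unicode word characters, including the accented e.
unicode_words = re.findall(r'\w', text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_].
ascii_words = re.findall(r'\w', text, flags=re.ASCII)

digits = re.findall(r'\d', text)

print(unicode_words)
print(ascii_words)
print(digits)
```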

## Password Validation in Python with REGEX 

You need to write a regex that will validate a password to make sure it meets the following criteria:

- At least 8 characters
- Uppercase letters: A-Z
- Lowercase letters: a-z
- numbers: 0-9
- any special characters: @#$%^&+=

Letters/numbers/special characters are optional


In [28]:
import re
# all the requirements are optional except the minimum number of characters in the password
regex1 = '[A-Za-z0-9@#$%^&+=]{8,}'


In [29]:
# pwd = input("enter a password")
pwd = 'hey123456'

In [30]:
pwd

'hey123456'

In [31]:
type(pwd), type(regex1)

(str, str)

In [32]:
re.fullmatch(regex1, pwd)

<re.Match object; span=(0, 9), match='hey123456'>

In [33]:
if re.fullmatch(regex1,pwd):
    print("congratulations, there is a match!")
else:
    print("sorry there is no match")

congratulations, there is a match!


## Password Validation in Python with REGEX 

You need to write a regex that will validate a password to make sure it meets **ALL** the following criteria:

- At least 6 characters long
- Contains Uppercase letter: A-Z
- Contains Lowercase letter: a-z
- Contains number: 0-9
- Contains any special characters: @#$%^&+=

A valid password contains only the characters listed above.


In [34]:
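# assertions:

# POSITIVE LOOKAHEAD ASSERTION
# A lookahead (?=...) succeeds if its sub-pattern can match from the current position, without consuming any characters.
# Each lookahead below scans the string from left to right for one required kind of character;
# the final set then consumes the whole password and enforces the minimum length of 6.

regex = '(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[@#$%^&+=])[A-Za-z0-9@#$%^&+=]{6,}'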

In [35]:
re.fullmatch(regex,pwd)

# it has returned None: pwd = 'hey123456' contains no uppercase letter and no special character


In [36]:
if re.fullmatch(regex,pwd):
    print("match, yay!")
else:
    print("sorry no match")


sorry no match
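One possible way to enforce all five criteria at once is one lookahead per required character type, followed by the allowed character set with a minimum length. This is a sketch under that assumption; `PASSWORD_RE` and `is_valid_password` are hypothetical names, and the sample passwords are made up.

```python
import re

PASSWORD_RE = re.compile(
    r'(?=.*[A-Z])'              # at least one uppercase letter
    r'(?=.*[a-z])'              # at least one lowercase letter
    r'(?=.*[0-9])'              # at least one digit
    r'(?=.*[@#$%^&+=])'         # at least one special character
    r'[A-Za-z0-9@#$%^&+=]{6,}'  # only allowed characters, 6 or more
)

def is_valid_password(pwd):
    """Return True if pwd meets every criterion."""
    return PASSWORD_RE.fullmatch(pwd) is not None

for candidate in ['Abc12@', 'hey123456', 'Abc1@', 'Abc123!@']:
    print(candidate, is_valid_password(candidate))
```

The lookaheads consume nothing, so the last part of the pattern is what actually matches the password text. Without it, `re.fullmatch` could only succeed on an empty string.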


## Task 4: User input 

**Time Format Validation**

A web application calculates health statistics based on the sleep duration of your users.

Your users enter the time they went to bed and the time they woke up.

An example of a correct time format is 12:45

Write a time-format checker that determines whether the input is worth processing further with your backend application.



In [37]:
inputs = ['18:29', '23:55', '123', 'ab:de', '18:299', '99:99']

In [38]:
input1 = '12:455'
input2 = '12:48'

In [39]:
re.fullmatch('[0-9]{2}:[0-9]{2}', input1)

# returns None - meaning it's not the correct format

In [40]:
re.fullmatch('[0-9]{2}:[0-9]{2}', input2)

<re.Match object; span=(0, 5), match='12:48'>

In [41]:
[re.fullmatch('[0-9]{2}:[0-9]{2}', x) for x in inputs]

[<re.Match object; span=(0, 5), match='18:29'>,
 <re.Match object; span=(0, 5), match='23:55'>,
 None,
 None,
 None,
 <re.Match object; span=(0, 5), match='99:99'>]

Next, the given time must be a valid time in 24-hour format, ranging from 00:00 to 23:59

In [42]:
# anything between 00 and 23
regex = '([01][0-9]|2[0-3])'

# the first digit is 0 or 1 and the second is between 0 and 9    OR   the first digit is 2 and the second is between 0 and 3


In [43]:
# anything between 00:00 and 23:59
regex = '([01][0-9]|2[0-3]):([0-5][0-9])'

In [44]:
[re.fullmatch(regex, x) for x in inputs]

[<re.Match object; span=(0, 5), match='18:29'>,
 <re.Match object; span=(0, 5), match='23:55'>,
 None,
 None,
 None,
 None]
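Since the two parenthesised groups capture the hours and the minutes, the same pattern could feed the sleep-duration calculation mentioned in the task. The sketch below assumes the wake-up time falls within 24 hours of the bedtime; `TIME_RE`, `parse_time` and `sleep_minutes` are hypothetical names.

```python
import re

TIME_RE = re.compile(r'([01][0-9]|2[0-3]):([0-5][0-9])')

def parse_time(value):
    """Return (hours, minutes) for a valid HH:MM string, else None."""
    m = TIME_RE.fullmatch(value)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

def sleep_minutes(bedtime, wakeup):
    """Minutes slept, assuming wake-up is less than 24 hours after bedtime."""
    start, end = parse_time(bedtime), parse_time(wakeup)
    if start is None or end is None:
        return None
    start_min = start[0] * 60 + start[1]
    end_min = end[0] * 60 + end[1]
    # The modulo handles nights that cross midnight.
    return (end_min - start_min) % (24 * 60)

print(parse_time('23:55'))
print(sleep_minutes('23:30', '07:15'))
```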

## Email Validation 

In [45]:
inputs = ['rog45@gmail.com','r_duke78o@outlook.com','s.rog78o@outlook.com','r_duke78o@outlook.coma','s.rog78$o@outlook.com']

In [46]:
# raw string; the local part may contain word characters (including _), periods and hyphens
regex = r'^[\w.-]+@\w+[.]\w{2,3}$'

In [47]:
[re.fullmatch(regex, x) for x in inputs]

[<re.Match object; span=(0, 15), match='rog45@gmail.com'>,
 <re.Match object; span=(0, 21), match='r_duke78o@outlook.com'>,
 <re.Match object; span=(0, 20), match='s.rog78o@outlook.com'>,
 None,
 None]

In [48]:
string_e = 'From: steve_doe@outlook.com, alex.drake@hotmail.fr, olivia.Montana123@gmail.com'

In [49]:
print(re.findall('From:.+?@',string_e))

['From: steve_doe@']


In [50]:
string_e2 = ['From: steve_doe@outlook.com', 'From: alex.drake@hotmail.fr', 'olivia.Montana123@gmail.com']

In [51]:
[re.findall('From:.+?@', x) for x in string_e2]

[['From: steve_doe@'], ['From: alex.drake@'], []]
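To pull every address out of a single header string, `re.findall` can be combined with a pattern similar to the validation regex above, minus the anchors. This is a sketch rather than a complete email grammar; `EMAIL_RE` is a hypothetical name.

```python
import re

string_e = 'From: steve_doe@outlook.com, alex.drake@hotmail.fr, olivia.Montana123@gmail.com'

# No ^ or $ here, because the addresses sit in the middle of the string.
EMAIL_RE = re.compile(r'[\w.-]+@\w+\.\w{2,3}')

emails = EMAIL_RE.findall(string_e)
usernames = [e.split('@')[0] for e in emails]

print(emails)
print(usernames)
```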

## User validation

A username may contain only letters, underscores (_) and periods (.)

In [52]:
name_inputs = ['a_roger','aroges','a.roger_de','a.roger_2']

In [53]:
regex = '^[a-zA-Z_.]+$'

In [54]:
[re.fullmatch(regex, x) for x in name_inputs]

[<re.Match object; span=(0, 7), match='a_roger'>,
 <re.Match object; span=(0, 6), match='aroges'>,
 <re.Match object; span=(0, 10), match='a.roger_de'>,
 None]

### Task 5

The question mark ? is the 'optional' quantifier, meaning 0 or 1 occurrence.

The question mark ? can also be used in combination with other quantifiers, where it means something else:

The asterisk * means 0 or more occurrences of the preceding character.

By default, the engine matches as many occurrences as possible (greedy). To match as few occurrences as possible, we add ? after the * (non-greedy).

In [57]:
text = 'abdcefghujhg'

In [58]:
re.findall('ab.*', text)

['abdcefghujhg']

In [59]:
re.findall('ab.*?', text)

['ab']

In [61]:
text2 = 'peter piper picked a peck of pickled peppers'

In [62]:
re.findall('p.*e.*r', text2)

['peter piper picked a peck of pickled pepper']

In [63]:
re.findall('p.*?e.*?r', text2)

['peter', 'piper', 'picked a peck of pickled pepper']
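A common place where the greedy and non-greedy forms differ is text with repeated delimiters. The sketch below uses a made-up HTML-like string to show the two behaviours.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the last '>' in the string.
greedy = re.findall('<.*>', html)

# Non-greedy: .*? stops at the first '>' it can.
lazy = re.findall('<.*?>', html)

print(greedy)
print(lazy)
```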

Problem 1:

Find a match that starts with 'crypto', is followed by at most 30 arbitrary characters, and ends with 'coin'.

In [64]:
text3 = 'crypto-bot that is trading Bitcoin and other currencies'

In [66]:
# the parentheses create a group that matches any character (dot) between 1 and 30 times
re.match('crypto(.{1,30})coin', text3)

<re.Match object; span=(0, 34), match='crypto-bot that is trading Bitcoin'>
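The group in parentheses can be read back with `group(1)`, and the upper bound of 30 can be checked with a string that is too long. The string `too_long` is a hypothetical test input.

```python
import re

text3 = 'crypto-bot that is trading Bitcoin and other currencies'

m = re.match('crypto(.{1,30})coin', text3)
middle = m.group(1)   # the characters captured between 'crypto' and 'coin'
print(repr(middle), len(middle))

# 31 characters between the two words exceed the {1,30} limit.
too_long = 'crypto' + 'x' * 31 + 'coin'
print(re.match('crypto(.{1,30})coin', too_long))
```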

Problem 2:

Given a string, find a list of all occurrences of dollar amounts with optional decimal values.

In [67]:
text4 = '''
If you invested $1 in the year 1801, you would have $18087791.41 today.
This is a 7.967% return on investment.
But if you invested only $0.25 in 1801 you would end up with $4521947.8525
'''

In [73]:
# the dollar sign has a special meaning in regex (it anchors the end of the string), so to match a literal $
# we use a backslash to escape it - same for the . (period or dot)
re.findall(r'(\$[0-9]+(\.[0-9]*)?)',text4)

[('$1', ''),
 ('$18087791.41', '.41'),
 ('$0.25', '.25'),
 ('$4521947.8525', '.8525')]
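When a pattern contains capturing groups, `re.findall` returns one tuple per match, which is why each amount appears next to its decimal part. If only the full amounts are wanted, a non-capturing group `(?:...)` is one option, sketched here with the same text.

```python
import re

text4 = '''
If you invested $1 in the year 1801, you would have $18087791.41 today.
This is a 7.967% return on investment.
But if you invested only $0.25 in 1801 you would end up with $4521947.8525
'''

# (?:...) groups the optional decimal part without capturing it,
# so findall returns plain strings instead of tuples.
amounts = re.findall(r'\$[0-9]+(?:\.[0-9]+)?', text4)
print(amounts)
```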

Problem 3:

Replace Alice Wonderland with Alice Doe, but do not replace occurrences of Alice Wonderland that are enclosed in single quotes.

Introducing the negative lookahead pattern (?!...)


In [78]:
text5 = """
Alice Wonderland married John Doe.
The new name of former 'Alice Wonderland' is 'Alice Doe'.
Alice Wonderland replaces her old name 'Wonderland' with her new name 'Doe'.
Alice's sister Jane Wonderland still keeps her old name.
"""

In [79]:
re.sub("Alice Wonderland(?!')",'Alice Doe', text5)

"\nAlice Doe married John Doe.\nThe new name of former 'Alice Wonderland' is 'Alice Doe'.\nAlice Doe replaces her old name 'Wonderland' with her new name 'Doe'.\nAlice's sister Jane Wonderland still keeps her old name.\n"

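To confirm how many replacements the negative lookahead allowed, `re.subn` returns the new text together with the number of substitutions. This is a small sketch on the same text; the names `new_text` and `count` are illustrative.

```python
import re

text5 = """
Alice Wonderland married John Doe.
The new name of former 'Alice Wonderland' is 'Alice Doe'.
Alice Wonderland replaces her old name 'Wonderland' with her new name 'Doe'.
Alice's sister Jane Wonderland still keeps her old name.
"""

# (?!') rejects a match when the next character is a single quote.
new_text, count = re.subn("Alice Wonderland(?!')", 'Alice Doe', text5)

print(count)
print(new_text)
```

The lookahead only inspects the character after the name, so it relies on the quoted occurrence ending with a quote right after "Wonderland".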
## Examples

\d is for digits [0-9]

In [None]:
line = ['Three dollars is all', 'It will cost you $1.52', 'It will cost you 1.52 dollars']

[re.search(r'\$.+',x) for x in line]


In [None]:
# for x in line:
#     if re.search('^From:', x):
#         print(x)

# will print every string in the list that starts with 'From:'
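The idea in the commented cell can be sketched as a list comprehension over a list of strings. The sample `lines` below are hypothetical and only loosely modelled on mail headers.

```python
import re

# Hypothetical sample lines for illustration.
lines = [
    'From: steve_doe@outlook.com',
    'To: alex.drake@hotmail.fr',
    'Sent From: a phone',
]

# ^ anchors the pattern to the start of each string,
# so 'Sent From: a phone' is not selected.
from_lines = [x for x in lines if re.search('^From:', x)]

for x in from_lines:
    print(x)
```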