# Lab13 - Using RegEx (re) with Python

### Author: <font color='red'> Brian Adams </font>

<div class="alert alert-block alert-warning">
<b>ATTENTION:</b> This lab corresponds to the YouTube video: 
    
<em>Python Tutorial: re Module - How to Write and Match Regular Expressions (Regex) by Corey Schafer (Oct 24, 2017)</em><br>
    <b>Watch this YouTube video and fill in the code to complete the lab.</b><br><br>
    <b>URL: https://youtu.be/K8L6KVGG-7o </b>
</div>


## Lab 13 
#### 10 Questions @ 2 points each = 20 points
#### Part A, B, C & D @ 4 points each part = 80 points

In [1]:
# Import statements
import re

In [2]:
# Character strings 
text_to_search = '''
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890

Ha HaHa

MetaCharacters (Need to be escaped):
. ^ $ * + ? { } [ ] \ | ( )

coreyms.com

321-555-4321
123.555.1234

Mr. Schafer
Mr Smith
Ms. Davis
Mrs. Robinson
Mr. T
'''

A Python __raw string__ is created by prefixing a string literal with ‘r’ or ‘R’. A raw string treats the backslash (\) as a literal character. This is useful when a string contains backslashes that should not be interpreted as escape sequences, which is almost always the case for regular expression patterns.

[Ref: https://www.journaldev.com/23598/python-raw-string ]

In [4]:
# Print string '\tTab' 

### INSERT CODE HERE ###
print('\tTab')

	Tab


In [6]:
# Print RAW string '\tTab'

### INSERT CODE HERE ###
print(r'\tTab')

\tTab
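Raw strings matter for regular expressions because many regex tokens begin with a backslash. As a minimal sketch (the variable names are illustrative and not part of the lab), compare how Python reads `\b` with and without the `r` prefix:

```python
import re

sample = 'Ha HaHa'

# Without the r prefix, Python converts '\b' into a backspace character
# before the regex engine ever sees the pattern.
plain_pattern = '\bHa'
# With the r prefix, the backslash and the 'b' reach the regex engine intact,
# where \b means "word boundary".
raw_pattern = r'\bHa'

print(len(plain_pattern))  # 3 characters: backspace, 'H', 'a'
print(len(raw_pattern))    # 4 characters: backslash, 'b', 'H', 'a'

plain_hits = re.findall(plain_pattern, sample)
raw_hits = re.findall(raw_pattern, sample)
print(plain_hits)  # []
print(raw_hits)    # ['Ha', 'Ha']
```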


## Part A - RegEx Intro (52 points)

__Compiling Regular Expressions__

Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions. 

[Ref: https://docs.python.org/3/howto/regex.html ]

In [7]:
# Define pattern for re.compile() to search text for pattern 'abc'
### INSERT CODE HERE ###
pattern = re.compile(r'abc')

# Search text_to_search for the pattern using finditer() and save value(s) in matches 
### INSERT CODE HERE ###
matches = pattern.finditer(text_to_search)

# Print all the matches
### INSERT CODE HERE ###
for match in matches:
    print(match)

<re.Match object; span=(1, 4), match='abc'>


<font color='blue'> <strong>QUESTION 1: </strong> What is the beginning and ending index of the match? </font>

<font color='red'> <strong>ANSWER 1: </strong> ### INSERT ANSWER HERE ### </font>

The span gives the indexes within the text_to_search variable where 'abc' matched: the match begins at index 1 and ends at index 4 (the end index is exclusive, as in Python slicing).

In [8]:
# Use Python slicing to print the text that matched
### INSERT CODE HERE ###
print(text_to_search[1:4])

abc
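The match object exposes the same information without reading it off the printed representation. The sketch below uses a shortened, hypothetical stand-in for `text_to_search` and shows how `span()`, `start()`, `end()` and `group()` relate to the slice used above:

```python
import re

# Shortened stand-in for text_to_search; like the original, it begins with a newline
text = '\nabcdefghijklmnopqrstuvwxyz\n'

match = re.compile(r'abc').search(text)

start, end = match.span()          # (start, end) with an exclusive end index
print(match.start(), match.end())  # 1 4
matched_text = match.group()       # the text that matched
print(matched_text)                # abc

# Slicing with the span reproduces exactly what group() returns
print(text[start:end] == matched_text)  # True
```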


__Python Regex Metacharacters__

Metacharacters are the building blocks of regular expressions. Regular expressions are patterns used to match character combinations in strings. A metacharacter has a special meaning inside a pattern, and metacharacters are mostly used to define search criteria and text manipulations. To match a metacharacter literally, escape it with a backslash.
[Ref: https://www.geeksforgeeks.org/python-regex-metacharacters/ ]

In [10]:
# Re-define pattern for re.compile() to search for pattern with metacharacter '.'
# NOTE: You will need to copy & paste the code from above for matches & printing them
### INSERT CODE HERE ###
pattern = re.compile(r'\.')

# Search text_to_search for the pattern using finditer() and save value(s) in matches 
### INSERT CODE HERE ###
matches = pattern.finditer(text_to_search)

# Print all the matches
### INSERT CODE HERE ###
for match in matches:
    print(match)

<re.Match object; span=(113, 114), match='.'>
<re.Match object; span=(149, 150), match='.'>
<re.Match object; span=(171, 172), match='.'>
<re.Match object; span=(175, 176), match='.'>
<re.Match object; span=(184, 185), match='.'>
<re.Match object; span=(205, 206), match='.'>
<re.Match object; span=(216, 217), match='.'>
<re.Match object; span=(229, 230), match='.'>


<font color='blue'> <strong>QUESTION 2</strong> - How many matches did you get?</font>

<font color='red'> <strong>ANSWER 2</strong>### INSERT ANSWER HERE ###</font>

we get 8 matches

In [11]:
# Re-define pattern for re.compile() to search for 'coreyms.com' 
# Remember to handle the metacharacter(s)
### INSERT CODE HERE ###
pattern = re.compile(r'coreyms\.com')

# Search text_to_search for the pattern using finditer() and save value(s) in matches 
### INSERT CODE HERE ###
matches = pattern.finditer(text_to_search)

# Print all the matches
### INSERT CODE HERE ###
for match in matches:
    print(match)

<re.Match object; span=(142, 153), match='coreyms.com'>
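To see why the escape matters, the sketch below compares an unescaped and an escaped dot on a made-up sample string (the string `coreymsXcom` is an assumption added for contrast). It also shows `re.escape()`, which escapes metacharacters in a literal string for you:

```python
import re

# Hypothetical sample: the second "address" has an X where the dot should be
sample = 'coreyms.com coreymsXcom'

# An unescaped '.' matches any character except a newline
unescaped_hits = re.findall(r'coreyms.com', sample)
# An escaped '\.' matches only a literal dot
escaped_hits = re.findall(r'coreyms\.com', sample)

print(unescaped_hits)  # ['coreyms.com', 'coreymsXcom']
print(escaped_hits)    # ['coreyms.com']

# re.escape() returns the string with regex metacharacters backslash-escaped
auto_escaped = re.escape('coreyms.com')
print(auto_escaped)    # coreyms\.com
```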


<div class="alert alert-block alert-info">
    <b>NOTE:</b> The <strong>snippets.txt</strong> file has been provided for you if you'd like to view it during the lab. This file should be located in your current directory (the same directory that contains your Jupyter notebook file for this lab). </div>

<font color='blue'> <strong>QUESTION 3: </strong> - Using the snippets.txt file, what do the "backslash-Capital letter" patterns match? (\D, \W, \S, \B) </font>

<font color='red'> <strong>ANSWER 3:</strong>### INSERT ANSWERS HERE ### </font>

* \D Not a digit (0-9)
* \W Not a Word Character
* \S Not Whitespace (space, tab, newline)
* \B Not a word boundary
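A quick way to check these definitions is to run each class against a tiny string. The sample string below is made up for illustration; each uppercase class returns exactly the characters its lowercase counterpart leaves out:

```python
import re

# Hypothetical sample: a letter, a digit, a space, an underscore and punctuation
sample = 'a1 _!'

digits = re.findall(r'\d', sample)          # ['1']
non_digits = re.findall(r'\D', sample)      # ['a', ' ', '_', '!']
word_chars = re.findall(r'\w', sample)      # ['a', '1', '_']
non_word_chars = re.findall(r'\W', sample)  # [' ', '!']
whitespace = re.findall(r'\s', sample)      # [' ']
non_whitespace = re.findall(r'\S', sample)  # ['a', '1', '_', '!']

# \B matches only where there is NO word boundary, so in 'Ha HaHa'
# it finds just the 'Ha' that sits in the middle of a word.
inner_spans = [m.span() for m in re.finditer(r'\BHa', 'Ha HaHa')]

print(non_digits, non_word_chars, non_whitespace)
print(inner_spans)  # [(5, 7)]
```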

In [12]:
# Re-define pattern for re.compile() to search for 'Ha' values with a leading word boundary
# Remember to handle the metacharacter(s)
### INSERT CODE HERE ###
pattern = re.compile(r'\bHa')

# Search text_to_search for the pattern using finditer() and save value(s) in matches 
### INSERT CODE HERE ###
matches = pattern.finditer(text_to_search)

# Print all the matches
### INSERT CODE HERE ###
for match in matches:
    print(match)

<re.Match object; span=(67, 69), match='Ha'>
<re.Match object; span=(70, 72), match='Ha'>


__Anchors__

Anchors are regex tokens that don't match any characters; instead, they assert something about the string or the matching process. An anchor states that the engine's current position in the string is at a specific location: for example, the beginning of the string/line, or the end of a string/line.

This type of assertion is useful for two reasons. First, it lets you specify that you want to match letters/digits at the beginning/end of a string/line, but not anywhere else. Second, when you tell the engine that you want to find a pattern at a certain location, it need not look for that pattern at any other location. This is why it is recommended to use anchors whenever possible.

__^ and $ are two examples of anchor tokens in regex.__

[Ref: https://www.tutorialspoint.com/How-regular-expression-anchors-work-in-Python ]

In [14]:
# Using the sentence variable (redefined here for clarity)
sentence = 'Start a sentence and then bring it to an end'

# Re-define pattern for re.compile() to search for 'end' using the "End of a String" anchor
# Remember to change the text string you are searching!
### INSERT CODE HERE ###
pattern = re.compile(r'end$')

# Search sentence for the pattern using finditer() and save value(s) in matches 
### INSERT CODE HERE ###
matches = pattern.finditer(sentence)

# Print all the matches
### INSERT CODE HERE ###
for match in matches:
    print(match)

<re.Match object; span=(41, 44), match='end'>
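An anchor changes where a pattern is allowed to match, not what it matches. A small sketch on the same sentence (the variable names are illustrative) shows both anchors succeeding and failing:

```python
import re

sentence = 'Start a sentence and then bring it to an end'

start_hit = re.search(r'^Start', sentence)   # 'Start' is at the very beginning
start_miss = re.search(r'^a', sentence)      # 'a' occurs, but not at the beginning
end_hit = re.search(r'end$', sentence)       # 'end' is at the very end
end_miss = re.search(r'Start$', sentence)    # 'Start' occurs, but not at the end

print(start_hit.span())  # (0, 5)
print(start_miss)        # None
print(end_hit.span())    # (41, 44)
print(end_miss)          # None
```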


__Matching Phone Numbers__ using a simple pattern

In [15]:
# Re-define pattern for re.compile() to match both phone numbers in "text_to_search" string
# Remember to change text string you are searching!
### INSERT CODE HERE ###
pattern = re.compile(r'\d\d\d.\d\d\d.\d\d\d\d')

# Search text_to_search for the pattern using finditer() and save value(s) in matches 
### INSERT CODE HERE ###
matches = pattern.finditer(text_to_search)

# Print all the matches
### INSERT CODE HERE ###
for match in matches:
    print(match)

<re.Match object; span=(155, 167), match='321-555-4321'>
<re.Match object; span=(168, 180), match='123.555.1234'>
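Repeating `\d` by hand works, but the same pattern can be written more compactly with the `{n}` quantifier. The sketch below, run on a two-line sample copied from `text_to_search`, suggests the two spellings are interchangeable:

```python
import re

sample = '321-555-4321\n123.555.1234\n'

verbose = re.compile(r'\d\d\d.\d\d\d.\d\d\d\d')
# \d{3} means "exactly three digits", \d{4} "exactly four digits"
compact = re.compile(r'\d{3}.\d{3}.\d{4}')

verbose_hits = verbose.findall(sample)
compact_hits = compact.findall(sample)

print(compact_hits)                  # ['321-555-4321', '123.555.1234']
print(verbose_hits == compact_hits)  # True
```

Note that the unescaped `.` still accepts any separator character, which is why this is only a "simple" pattern.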


<div class="alert alert-block alert-info">
    <b>NOTE:</b> The <strong>data.txt</strong> file has been provided for you. This file should be located in your current directory (the same directory that contains your Jupyter notebook file for this lab). </div>

In [30]:
# Using the pattern created in the previous step (to match phone numbers) ...
### INSERT CODE HERE ###
pattern = re.compile(r'\d\d\d.\d\d\d.\d\d\d\d')

# Open and read the contents of the "data.txt" file, search for phone numbers,
# and print the matches. 
# NOTE: You might want to count the number of matches 
          
### INSERT CODE HERE ###
with open('data.txt', 'r') as f:
    contents = f.read()

    matches = pattern.finditer(contents)

    count = 0
    for match in matches:
        print(match)
        count += 1
        print(count)

<re.Match object; span=(12, 24), match='615-555-7164'>
1
<re.Match object; span=(102, 114), match='800-555-5669'>
2
<re.Match object; span=(191, 203), match='560-555-5153'>
3
<re.Match object; span=(281, 293), match='900-555-9340'>
4
<re.Match object; span=(378, 390), match='714-555-7405'>
5
<re.Match object; span=(467, 479), match='800-555-6771'>
6
<re.Match object; span=(557, 569), match='783-555-4799'>
7
<re.Match object; span=(647, 659), match='516-555-4615'>
8
<re.Match object; span=(740, 752), match='127-555-1867'>
9
<re.Match object; span=(829, 841), match='608-555-4938'>
10
<re.Match object; span=(915, 927), match='568-555-6051'>
11
<re.Match object; span=(1003, 1015), match='292-555-1875'>
12
<re.Match object; span=(1091, 1103), match='900-555-3205'>
13
<re.Match object; span=(1180, 1192), match='614-555-1166'>
14
<re.Match object; span=(1269, 1281), match='530-555-2676'>
15
<re.Match object; span=(1355, 1367), match='470-555-2750'>
16
<re.Match object; span=(1439, 1451), matc

<font color='blue'> <strong>QUESTION 4: </strong> How many phone numbers were found? <strong>HINT:</strong> Add a counter to the print loop above to make this easier :-) </font>

<font color='red'> <strong>ANSWER 4: </strong>### INSERT ANSWER HERE ###</font>

100
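The counter can also be written with `enumerate()`, or skipped entirely by taking the length of `findall()`. The sketch below uses a short made-up string in place of the contents of `data.txt`, so the names and numbers in it are hypothetical:

```python
import re

# Hypothetical stand-in for the contents of data.txt
contents = '''Person One
615-555-0101
Person Two
800-555-0102
'''

pattern = re.compile(r'\d\d\d.\d\d\d.\d\d\d\d')

# enumerate() numbers the matches starting from 1, replacing the manual counter
count = 0
for count, match in enumerate(pattern.finditer(contents), start=1):
    print(count, match.group())

# When only the total is needed, the length of findall() is enough
total = len(pattern.findall(contents))
print(total)  # 2
```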

__Matching Phone Numbers__ only if they have '-' or '.'

In [31]:
# Character strings (UPDATED WITH INVALID PHONE NUMBER STRING)
text_to_search = '''
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890

Ha HaHa

MetaCharacters (Need to be escaped):
. ^ $ * + ? { } [ ] \ | ( )

coreyms.com

321-555-4321
123.555.1234
123*555*1234

Mr. Schafer
Mr Smith
Ms. Davis
Mrs. Robinson
Mr. T

'''

In [33]:
# Re-define pattern for re.compile() to match only phone numbers in updated "text_to_search" string with '-' or '.'
### INSERT CODE HERE ###
pattern = re.compile(r'\d\d\d[-.]\d\d\d[-.]\d\d\d\d')

# Search text_to_search for the pattern using finditer() and save value(s) in matches 
### INSERT CODE HERE ###
matches = pattern.finditer(text_to_search)

# Print all the matches
### INSERT CODE HERE ###
for match in matches:
    print(match)

<re.Match object; span=(155, 167), match='321-555-4321'>
<re.Match object; span=(168, 180), match='123.555.1234'>


__Matching Phone Numbers__ only if they are 800 | 900 numbers

In [34]:
# Re-define pattern for re.compile() to match only 800 & 900 phone numbers in the "data.txt" file
### INSERT CODE HERE ###
pattern = re.compile(r'[89]00[-.]\d\d\d[-.]\d\d\d\d')

# Open and read the contents of the "data.txt" file, search for only 800 & 900 phone numbers,
# and print the matches. 
# NOTE: You might want to count the number of matches 

### INSERT CODE HERE ###
with open('data.txt', 'r') as f:
    contents = f.read()

    matches = pattern.finditer(contents)

    count = 0
    for match in matches:
        print(match)
        count += 1
        print(count)

<re.Match object; span=(102, 114), match='800-555-5669'>
1
<re.Match object; span=(281, 293), match='900-555-9340'>
2
<re.Match object; span=(467, 479), match='800-555-6771'>
3
<re.Match object; span=(1091, 1103), match='900-555-3205'>
4
<re.Match object; span=(1439, 1451), match='800-555-6089'>
5
<re.Match object; span=(1790, 1802), match='800-555-7100'>
6
<re.Match object; span=(2051, 2063), match='900-555-5118'>
7
<re.Match object; span=(2826, 2838), match='900-555-5428'>
8
<re.Match object; span=(3284, 3296), match='800-555-8810'>
9
<re.Match object; span=(3971, 3983), match='900-555-9598'>
10
<re.Match object; span=(4945, 4957), match='800-555-2420'>
11
<re.Match object; span=(5566, 5578), match='900-555-3567'>
12
<re.Match object; span=(6189, 6201), match='800-555-3216'>
13
<re.Match object; span=(6889, 6901), match='900-555-7755'>
14
<re.Match object; span=(7864, 7876), match='800-555-1372'>
15
<re.Match object; span=(8741, 8753), match='900-555-6426'>
16


<font color='blue'> <strong>QUESTION 5: </strong> How many 800/900 phone numbers were found? <strong>HINT:</strong> Add a counter to the print loop above to make this easier :-) </font>

<font color='red'> <strong>ANSWER 5: </strong> ### INSERT ANSWERS HERE ### </font>

16

__Matching specific characters__ while excluding others

In [35]:
# Character strings (UPDATED WITH JUST 3-CHAR WORDS)
text_to_search = '''
cat
mat
pat
bat

'''

In [36]:
# Re-define pattern for re.compile() to match only 3-char words that end in 'at' EXCEPT "bat"
### INSERT CODE HERE ###
pattern = re.compile(r'[^b]at')

# Search text_to_search for the pattern using finditer() and save value(s) in matches 
### INSERT CODE HERE ###
matches = pattern.finditer(text_to_search)

# Print all the matches
### INSERT CODE HERE ###
for match in matches:
    print(match)

<re.Match object; span=(1, 4), match='cat'>
<re.Match object; span=(5, 8), match='mat'>
<re.Match object; span=(9, 12), match='pat'>
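A negated set such as `[^b]` matches any single character that is not listed, including spaces and newlines. The sketch below uses a made-up one-line sample to show the side effect, and one possible way to restrict the match to whole words (the stricter pattern is an illustration, not something the lab requires):

```python
import re

# Hypothetical sample containing the bare word 'at'
sample = 'cat bat eat at'

# [^b] accepts the space before 'at', so ' at' is reported as a match
loose_hits = re.findall(r'[^b]at', sample)
print(loose_hits)   # ['cat', 'eat', ' at']

# [^b\W] excludes 'b' and every non-word character;
# \b on both sides limits the match to whole three-letter words
strict_hits = re.findall(r'\b[^b\W]at\b', sample)
print(strict_hits)  # ['cat', 'eat']
```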


__Using Quantifiers__ ...

In [37]:
# Character strings (UPDATED JUST NAMES)
text_to_search = '''

Mr. Schafer
Mr Smith
Ms. Davis
Mrs. Robinson
Mr. T

'''

In [42]:
# Re-define pattern for re.compile() to match upper-case 'M' followed by 'r' followed by 0 or 1 '.'
# followed by a whitespace character, an uppercase letter, and then 0 or more word characters
### INSERT CODE HERE ###
pattern = re.compile(r'Mr\.?\s[A-Z]\w*')

# Search text_to_search for the pattern using finditer() and save value(s) in matches 
### INSERT CODE HERE ###
matches = pattern.finditer(text_to_search)

# Print all the matches
### INSERT CODE HERE ###
for match in matches:
    print(match)

<re.Match object; span=(2, 13), match='Mr. Schafer'>
<re.Match object; span=(14, 22), match='Mr Smith'>
<re.Match object; span=(47, 52), match='Mr. T'>


<font color='blue'> <strong>QUESTION 6: </strong> How many matches did you get? </font>

<font color='red'> <strong>ANSWER 6: </strong>### INSERT ANSWERS HERE ###</font>

3
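The quantifiers used in this pattern (`?` and `*`) belong to a small family. The following sketch runs each one against short made-up strings so the differences are visible side by side:

```python
import re

# Hypothetical sample with digit runs of length 1, 2 and 3
digits_sample = 'a1b22c333'

one_or_more = re.findall(r'\d+', digits_sample)      # + : one or more
exactly_two = re.findall(r'\d{2}', digits_sample)    # {2} : exactly two
two_to_three = re.findall(r'\d{2,3}', digits_sample) # {2,3} : two to three

print(one_or_more)   # ['1', '22', '333']
print(exactly_two)   # ['22', '33']
print(two_to_three)  # ['22', '333']

# ? : zero or one, so the period after 'Mr' is optional
optional_dot = re.findall(r'Mr\.?', 'Mr Mr.')
print(optional_dot)  # ['Mr', 'Mr.']

# * : zero or more, so a one-letter name such as 'T' still matches
names = re.findall(r'[A-Z]\w*', 'T Smith')
print(names)         # ['T', 'Smith']
```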

__Using Quantifiers and Groups__ ...

In [43]:
# Re-define pattern for re.compile() to match ALL the names using Groups
# Use either of the 2 groupings demonstrated in the video
### INSERT CODE HERE ###
pattern = re.compile(r'M(r|s|rs)\.?\s[A-Z]\w*')

# Search text_to_search for the pattern using finditer() and save value(s) in matches 
### INSERT CODE HERE ###
matches = pattern.finditer(text_to_search)

# Print all the matches
### INSERT CODE HERE ###
for match in matches:
    print(match)

<re.Match object; span=(2, 13), match='Mr. Schafer'>
<re.Match object; span=(14, 22), match='Mr Smith'>
<re.Match object; span=(23, 32), match='Ms. Davis'>
<re.Match object; span=(33, 46), match='Mrs. Robinson'>
<re.Match object; span=(47, 52), match='Mr. T'>
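The parentheses do more than hold the alternatives together: they also capture whichever alternative matched. The sketch below reads that capture with `group(1)` and compares it with a second grouping, `(Mr|Ms|Mrs)`, which is assumed here to be the other grouping the lab comment refers to:

```python
import re

names = 'Mr. Schafer\nMr Smith\nMs. Davis\nMrs. Robinson\nMr. T\n'

# Grouping used in the cell above: only the letters after 'M' are captured
suffix_pattern = re.compile(r'M(r|s|rs)\.?\s[A-Z]\w*')
suffixes = [m.group(1) for m in suffix_pattern.finditer(names)]
print(suffixes)  # ['r', 'r', 's', 'rs', 'r']

# Alternative grouping (an assumption): the whole title is captured
title_pattern = re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')
titles = [m.group(1) for m in title_pattern.finditer(names)]
print(titles)    # ['Mr', 'Mr', 'Ms', 'Mrs', 'Mr']

# Both patterns match the same five names; group(0) is the full match
full_matches = [m.group(0) for m in title_pattern.finditer(names)]
print(full_matches)
```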


## Part B - Putting it all together (4 points)

__Matching email addresses__ ...

In [45]:
# Emails used for upcoming example
emails = '''
CoreyMSchafer@gmail.com
corey.schafer@university.edu
corey-321-schafer@my-work.net
'''

In [48]:
# Re-define pattern for re.compile() to match ALL the emails
# Follow the instructions provided in the video ending with the pattern that uses a group
# Remember to change finditer() to use the correct text string
### INSERT CODE HERE ###
pattern = re.compile(r'[a-zA-Z0-9.-]+@[a-zA-Z-]+\.(com|edu|net)')

# Search emails for the pattern using finditer() and save value(s) in matches 
### INSERT CODE HERE ###
matches = pattern.finditer(emails)

# Print all the matches
### INSERT CODE HERE ###
for match in matches:
    print(match)

<re.Match object; span=(1, 24), match='CoreyMSchafer@gmail.com'>
<re.Match object; span=(25, 53), match='corey.schafer@university.edu'>
<re.Match object; span=(54, 83), match='corey-321-schafer@my-work.net'>


<font color='blue'> <strong>QUESTION 7: </strong> What were the 3 text strings in the group? </font>

<font color='red'> <strong>ANSWER 7: </strong>### INSERT ANSWERS HERE ###</font>

The group `(com|edu|net)` captures the top-level domain of each email, so the three strings in the group are:
* com
* edu
* net
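Printing the match object shows the whole email, not the group. To inspect what the group itself captured, a sketch like the following reads `group(1)` from each match (the list name `captured` is illustrative):

```python
import re

emails = '''
CoreyMSchafer@gmail.com
corey.schafer@university.edu
corey-321-schafer@my-work.net
'''

pattern = re.compile(r'[a-zA-Z0-9.-]+@[a-zA-Z-]+\.(com|edu|net)')

# group(0) is the full match; group(1) is the text captured by (com|edu|net)
captured = [match.group(1) for match in pattern.finditer(emails)]
print(captured)  # ['com', 'edu', 'net']
```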

## Part C - How to capture information from Groups (8 points)

__Matching URLs__ using groups ...

In [49]:
# URLs used for upcoming example
urls = '''
https://www.google.com
http://coreyms.com
https://youtube.com
https://www.nasa.gov
'''

In [55]:
# Re-define pattern for re.compile() to match ALL the URLs using Groups
# Follow the instructions provided in the video
# Remember to change finditer() to use the correct text string
### INSERT CODE HERE ###
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

# Search urls for the pattern using finditer() and save value(s) in matches 
### INSERT CODE HERE ###
matches = pattern.finditer(urls)

# Print all the matches
### INSERT CODE HERE ###
for match in matches:
    print(match.group(0))

https://www.google.com
http://coreyms.com
https://youtube.com
https://www.nasa.gov


<font color='blue'> <strong>QUESTION 8: </strong> What is the returned value when a group has no match? </font>

<font color='red'> <strong>ANSWER 8: </strong>### INSERT ANSWERS HERE ###</font>

None
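The optional `(www\.)?` group is the one that can come back empty. A minimal sketch on two of the URLs shows the `None`, and how `groups()` can substitute a default value instead:

```python
import re

pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

with_www = pattern.search('https://www.google.com')
without_www = pattern.search('http://coreyms.com')

print(with_www.group(1))     # www.
print(without_www.group(1))  # None, because the optional group did not participate

# groups() returns all groups as a tuple; default replaces None for missing groups
raw_groups = without_www.groups()
filled_groups = without_www.groups(default='')
print(raw_groups)     # (None, 'coreyms', '.com')
print(filled_groups)  # ('', 'coreyms', '.com')
```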

__Using Substitution__ ...

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes of ASCII letters are reserved for future use and treated as errors. Other unknown escapes such as \& are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern. 

In [56]:
# Using the pattern from above, use the 'sub' capability 
### INSERT CODE HERE ###
subbed_urls = pattern.sub(r'\2\3', urls)

# Create new URls containing only the domain name & top level domain
### INSERT CODE HERE ###

# Print the new URLs
### INSERT CODE HERE ###
print(subbed_urls)


google.com
coreyms.com
youtube.com
nasa.gov
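The description above mentions three features the cell does not exercise: explicit backreference syntax, a function as `repl`, and limiting the number of replacements. The sketch below tries each on a two-URL sample; the helper name `shout_domain` is hypothetical:

```python
import re

urls = 'https://www.google.com\nhttp://coreyms.com\n'
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

# \g<2> is an unambiguous spelling of the backreference \2
with_backrefs = pattern.sub(r'\g<2>\g<3>', urls)
print(with_backrefs)

# repl may be a function; it receives each match object and returns the replacement
def shout_domain(match):
    return match.group(2).upper() + match.group(3)

with_function = pattern.sub(shout_domain, urls)
print(with_function)

# count limits how many matches are replaced, starting from the left
first_only = pattern.sub(r'\2\3', urls, count=1)
print(first_only)
```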



## Part D - Other methods  (16 points)

__findall__

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

In [58]:
# IMPORTANT: THIS DEVIATES FROM THE VIDEO ... BUT I THINK IT IS WORTH SEEING

# Using the previous pattern with 3 groups ...
### INSERT CODE HERE ###
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

# Use findall(urls) to get the groups 
### INSERT CODE HERE ###
matches = pattern.findall(urls)

# Print matches
### INSERT CODE HERE ###
for match in matches:
    print(match)

('www.', 'google', '.com')
('', 'coreyms', '.com')
('', 'youtube', '.com')
('www.', 'nasa', '.gov')


<font color='blue'> <strong>QUESTION 9: </strong> Because there were groups in the pattern, what Python datatype was returned in the results list? </font>

<font color='red'> <strong>ANSWER 9: </strong>### INSERT ANSWERS HERE ###</font>

tuple
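The shape of the list returned by `findall()` depends on how many groups the pattern has. A small sketch with three simplified, made-up patterns illustrates the three cases:

```python
import re

urls = 'https://www.google.com http://coreyms.com'

# No groups: a list of the full matches (strings)
no_groups = re.findall(r'\w+\.com', urls)
# One group: a list of that group's text (strings)
one_group = re.findall(r'(\w+)\.com', urls)
# Two or more groups: a list of tuples, one tuple per match
two_groups = re.findall(r'(\w+)(\.com)', urls)

print(no_groups)   # ['google.com', 'coreyms.com']
print(one_group)   # ['google', 'coreyms']
print(two_groups)  # [('google', '.com'), ('coreyms', '.com')]
print(type(two_groups[0]).__name__)  # tuple
```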

__match__

If zero or more characters at the beginning of string match this regular expression, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.

In [59]:
# Using the sentence variable (redefined here for clarity)
sentence = 'Start a sentence and then bring it to an end'

# Re-define pattern for re.compile() to search for 'Start'
### INSERT CODE HERE ###
pattern = re.compile(r'Start')
                     
# Use the match method to search pattern    
### INSERT CODE HERE ###
matches = pattern.match(sentence)

# Print the result
### INSERT CODE HERE ###
print(matches)

<re.Match object; span=(0, 5), match='Start'>


__search__

Scan through string looking for the first location where this regular expression produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

In [60]:
# Using the sentence variable (redefined here for clarity)
sentence = 'Start a sentence and then bring it to an end'

# Re-define pattern for re.compile() to search for 'sentence'
### INSERT CODE HERE ###
pattern = re.compile(r'sentence')

# Use the search method to search pattern    
### INSERT CODE HERE ###
matches = pattern.search(sentence)

# Print the result
### INSERT CODE HERE ###
print(matches)

<re.Match object; span=(8, 16), match='sentence'>
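The difference between `match` and `search` only shows up when the pattern is not at the start of the string. The sketch below uses the word 'end' for contrast and adds `fullmatch`, which requires the pattern to cover the entire string:

```python
import re

sentence = 'Start a sentence and then bring it to an end'
pattern = re.compile(r'end')

# match() only looks at the beginning of the string
match_result = pattern.match(sentence)
# search() scans the whole string and returns the first hit
search_result = pattern.search(sentence)
# fullmatch() succeeds only if the pattern matches the entire string
fullmatch_result = pattern.fullmatch(sentence)

print(match_result)          # None
print(search_result.span())  # (41, 44)
print(fullmatch_result)      # None
```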


__Option Flags__ 

Many Python regex functions and methods take an optional argument called “flags”. The flags modify the meaning of the given regex pattern. To specify more than one flag, combine them with the | operator. For example, re.search(pattern, string, flags=re.IGNORECASE|re.MULTILINE|re.UNICODE).

![List of Flags](RegExOptionFlags.jpg)

[Ref: http://xahlee.info/python/python_regex_flags.html ]

In [62]:
# Using the sentence variable (redefined here for clarity)
sentence = 'Start a sentence and then bring it to an end'

# Re-define pattern for re.compile() to search for 'start' while ignoring case
### INSERT CODE HERE ###
pattern = re.compile(r'start', re.I)
                     
# Use the search method to search pattern    
### INSERT CODE HERE ###
matches = pattern.search(sentence)

# Print the result
### INSERT CODE HERE ###
print(matches)

<re.Match object; span=(0, 5), match='Start'>


<font color='blue'> <strong>QUESTION 10: </strong> According to the Regex Flags table above, what flag could be used to match a newline? </font>

<font color='red'> <strong>ANSWER 10: </strong>### INSERT ANSWERS HERE ###</font>

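re.S (re.DOTALL), which makes `.` match any character, including a newline. (re.M only changes where ^ and $ match.)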
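`re.DOTALL` and `re.MULTILINE` are easy to mix up because both concern newlines. A minimal sketch on a made-up two-line string shows what each flag changes, and how flags are combined with `|`:

```python
import re

# Hypothetical two-line sample
text = 'first line\nsecond line'

# By default '.' stops at a newline
default_hit = re.search(r'first.+', text).group()
# re.DOTALL (re.S) lets '.' match the newline as well
dotall_hit = re.search(r'first.+', text, re.DOTALL).group()
print(repr(default_hit))  # 'first line'
print(repr(dotall_hit))   # 'first line\nsecond line'

# By default '^' matches only at the start of the string
default_starts = re.findall(r'^\w+', text)
# re.MULTILINE (re.M) makes '^' and '$' match at every line boundary
multiline_starts = re.findall(r'^\w+', text, re.MULTILINE)
print(default_starts)     # ['first']
print(multiline_starts)   # ['first', 'second']

# Flags are combined with the | operator
combined_hits = re.findall(r'^SECOND.+$', text, re.IGNORECASE | re.MULTILINE)
print(combined_hits)      # ['second line']
```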

In [63]:
text_to_search = 'trala lafa lala'  
pattern = re.compile(r'\bla')    
matches = pattern.finditer(text_to_search)    

for match in matches:
    print(match)


<re.Match object; span=(6, 8), match='la'>
<re.Match object; span=(11, 13), match='la'>


In [64]:
text_to_search = '''  800-700-1234  800-100-1234  900-600-1234  900-900-1234  '''  

pattern = re.compile(r'[89]00[-.][67]00[-.]\d\d\d\d')    
matches = pattern.finditer(text_to_search)    
for match in matches:      
    print(match)


<re.Match object; span=(2, 14), match='800-700-1234'>
<re.Match object; span=(30, 42), match='900-600-1234'>


In [65]:
text_to_search = '''  
I am Sam  Sam I am  
That Sam-I-am!  
Than Sam-I-am!  
I do not like  
that Sam-I-am!    
Do you like green eggs and ham?  
I do not like them, Sam-I-am.  
I do not like green eggs and ham.    
'''    
pattern = re.compile(r'[^h-]am')    
matches = pattern.finditer(text_to_search)    
for match in matches:      
    print(match)


<re.Match object; span=(4, 7), match=' am'>
<re.Match object; span=(8, 11), match='Sam'>
<re.Match object; span=(13, 16), match='Sam'>
<re.Match object; span=(18, 21), match=' am'>
<re.Match object; span=(29, 32), match='Sam'>
<re.Match object; span=(46, 49), match='Sam'>
<re.Match object; span=(79, 82), match='Sam'>
<re.Match object; span=(147, 150), match='Sam'>


In [67]:
url = '''  
http://waketech.edu  
https://www.google.com  
http://coreyms.com  
https://youtube.com  
https://www.nasa.gov  
'''  
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')  
matches = pattern.finditer(url)  
for match in matches:      
    print(match.group(2))


waketech
google
coreyms
youtube
nasa
