<h3>More methods from the <font color = blue>re</font> module</h3>

In this notebook you will be introduced to the `finditer()` and `sub()` functions and to the `group()` method of match objects.

1. The `finditer()` function returns an iterator over all matches of the regex pattern in the string. Note that an iterator object can be used only once.
    
2. The `group()` method of a match object extracts selected portions of a match.  
   a. The `group()` method accepts a group number between 0 and the number of groups in the regex pattern.  
   b. The default value of 0 returns the complete match.  
   c. You can also name a group by placing `?P<groupname>` immediately after its opening parenthesis, for example `(?P<groupname>...)`, and then pass that name to `group()`.
3. The `sub()` function replaces the matches with the text of your choice.

In [1]:
import re

Write a regex to extract each word.

In [2]:
string = 'John has 6 cats but I think my friend Susan has 3 dogs and Mike has 8 fishes'
print(re.findall('[A-Za-z]+',string))

['John', 'has', 'cats', 'but', 'I', 'think', 'my', 'friend', 'Susan', 'has', 'dogs', 'and', 'Mike', 'has', 'fishes']


Write a regex to extract all numbers.

In [3]:
string = 'John has 6 cats but I think my friend Susan has 13 dogs and Mike has 8 fishes'
print(re.findall(r'\d+', string))  # raw string keeps \d intact for the regex engine

['6', '13', '8']


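Write a regex to extract the verb, the number, and the pet from each phrase, for example `has 13 dogs`.  
Note that `\w` matches letters, digits, and the underscore, so `h_as` in the string below is matched as a single word.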

In [4]:
string = 'John h_as 6 cats but I think my friend Susan has 13 dogs and Mike has 8 fishes'
print(re.findall(r'\w+ \d+ \w+', string))  # Note the spaces inside the regex

['h_as 6 cats', 'has 13 dogs', 'has 8 fishes']
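
The output above starts with `h_as` because `\w` accepts the underscore. As a small sketch of the difference, the cell below compares `\w+` with the stricter `[A-Za-z]+` on a fragment of the same string (the variable names `word_chars` and `letters_only` are illustrative only):

```python
import re

text = 'h_as 6 cats'

# \w matches letters, digits and the underscore, so 'h_as' stays in one piece.
word_chars = re.findall(r'\w+', text)

# [A-Za-z] matches ASCII letters only, so the underscore splits 'h_as'
# and the digit is skipped entirely.
letters_only = re.findall(r'[A-Za-z]+', text)

print(word_chars)
print(letters_only)
```

Choosing between the two classes depends on whether digits and underscores should count as part of a word.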


You can extract different elements separately by creating groups.  
Groups can be created by enclosing consecutive elements of interest inside a pair of parentheses.  
Use `finditer()` and `group()` to extract and print the matches of the regex pattern in the example above.

In [8]:
string = 'John has 6 cats but I think my friend Susan has 3 dogs and Mike has 8 fishes'
results = re.finditer(r'[A-Za-z]+ \d+ \w+', string)
for match in results:
    print(match.group(0))

has 6 cats
has 3 dogs
has 8 fishes


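Extract and print only the names of the pet owners.  
Enclosing parts of the pattern in parentheses creates groups: `group(0)` still returns the whole match, while `group(1)` returns only the first group, the owner's name.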

In [5]:
string = 'John has 6 cats but I think my friend Susan has 3 dogs and Mike has 8 fishes'
results = re.finditer(r'([A-Za-z]+) \w+ (\d+) (\w+)', string)

for match in results:
    print(match.group(0))

John has 6 cats
Susan has 3 dogs
Mike has 8 fishes


In [20]:
string = 'John has 6 cats but I think my friend Susan has 3 dogs and Mike has 8 fishes'
results = re.finditer(r'([A-Za-z]+) \w+ (\d+) (\w+)', string)

for match in results:
    print(match.group(1))

John
Susan
Mike
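
`group()` also accepts several group numbers in one call and then returns a tuple, which can be convenient when more than one part of each match is needed. A minimal sketch with the same string and pattern (the names `owner`, `count`, `pet`, and `rows` are illustrative):

```python
import re

string = 'John has 6 cats but I think my friend Susan has 3 dogs and Mike has 8 fishes'
pattern = r'([A-Za-z]+) \w+ (\d+) (\w+)'

rows = []
for match in re.finditer(pattern, string):
    # Passing several group numbers returns a tuple with one entry per group.
    owner, count, pet = match.group(1, 2, 3)
    rows.append((owner, int(count), pet))
    print(owner, count, pet)
```

Each group is returned as a string, so the count is converted with `int()` before it is stored.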


Print the names of the pet owners on one line, then print the number of pets each one owns on the next line.

_Note:_  An iterator object can be used only once.  So, to print out the second line, you will need to create another iterator object.

In [19]:
string = 'John has 6 cats but I think my friend Susan has 3 dogs and Mike has 8 fishes'
results = re.finditer(r'([A-Za-z]+) \w+ (\d+) (\w+)', string)

for match in results:
    print(match.group(1), end=' ')
print()   

results = re.finditer(r'([A-Za-z]+) \w+ (\d+) \w+', string)  # fresh iterator for the second pass
for match in results:
    print(match.group(2), end=' ')
print()


John Susan Mike 
6 3 8 


In [22]:
string = 'John has 6 cats but I think my friend Susan has 3 dogs and Mike has 8 fishes'
results = re.finditer(r'([A-Za-z]+) \w+ (\d+) (\w+)', string)

for match in results:
    print(match.group(1), end=' ')
print()   

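# The second finditer() call is commented out to show the effect:
# `results` is already exhausted, so the loop below prints nothing.
#results = re.finditer(r'([A-Za-z]+) \w+ (\d+) \w+', string)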
for match in results:
    print(match.group(2), end=' ')
print()

John Susan Mike 
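
The empty second line above shows that the first loop exhausted the iterator. One way around this, sketched below, is to store the matches in a list, which can be traversed as often as needed (the variable names `matches`, `owners`, and `counts` are illustrative):

```python
import re

string = 'John has 6 cats but I think my friend Susan has 3 dogs and Mike has 8 fishes'

# list() consumes the iterator once and keeps every match object.
matches = list(re.finditer(r'([A-Za-z]+) \w+ (\d+) (\w+)', string))

owners = [m.group(1) for m in matches]
counts = [m.group(2) for m in matches]  # second pass over the same list

print(' '.join(owners))
print(' '.join(counts))
```

The trade-off is memory: a list holds all match objects at once, whereas the iterator produces them one at a time.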



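Extract parts of each match by name instead of by number.  
You can name a group by placing `?P<groupname>` immediately after its opening parenthesis, as in `(?P<groupname>...)`.  
The pattern below names its three groups _Owner_, _NumOfPets_, and _Pet_ and prints the _Pet_ group; pass `'Owner'` to `group()` instead to print the names of the pet owners.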

In [23]:
string = 'John has 6 cats but I think my friend Susan has 3 dogs and Mike has 8 fishes'
results = re.finditer(r'(?P<Owner>[A-Za-z]+) \w+ (?P<NumOfPets>\d+) (?P<Pet>\w+)', string)

for match in results:
    print(match.group('Pet'))

cats
dogs
fishes
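
As a further sketch with the same named pattern, the cell below reads the _Owner_ group by name and also by position, since a named group keeps its number (the list names `by_name` and `by_number` are illustrative):

```python
import re

string = 'John has 6 cats but I think my friend Susan has 3 dogs and Mike has 8 fishes'
pattern = r'(?P<Owner>[A-Za-z]+) \w+ (?P<NumOfPets>\d+) (?P<Pet>\w+)'

by_name = []
by_number = []
for match in re.finditer(pattern, string):
    by_name.append(match.group('Owner'))  # look the group up by its name
    by_number.append(match.group(1))      # Owner is also group number 1
    print(match.group('Owner'), 'owns', match.group('NumOfPets'), match.group('Pet'))
```

Names tend to make the loop body easier to read than bare numbers, especially when the pattern is edited later and the group order changes.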


The `sub()` function replaces the matches with the text of your choice.  
The syntax of the `sub()` function is as follows:  
`re.sub(pattern, replacement, input string)`

In [24]:
string = "The rain in Spain"
new_string = re.sub(r"\s", "*", string)  # replace each whitespace character with a *
print(new_string)

The*rain*in*Spain


You can pass an additional `count` argument to limit the number of replacements. If `count` is larger than the number of matches, as in the first cell below, every match is replaced.

In [25]:
string = "The rain in Spain"
new_string = re.sub(r"\s", "*", string, count=5)  # only 3 matches exist, so all are replaced
print(new_string)

The*rain*in*Spain


In [9]:
string = "The rain in Spain"
new_string = re.sub(r"\s", "*", string, count=2)  # replace only the first 2 matches
print(new_string)

The*rain*in Spain
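
The same idea applies to the pets string used earlier. The sketch below replaces the numbers with a word, once with `count=1` and once with the default `count=0`, which replaces every match (the replacement word `'several'` and the variable names are illustrative):

```python
import re

string = 'John has 6 cats but I think my friend Susan has 3 dogs and Mike has 8 fishes'

# count=1 stops after the first match, so only the '6' is replaced.
first_only = re.sub(r'\d+', 'several', string, count=1)

# count defaults to 0, which means "replace every match".
all_numbers = re.sub(r'\d+', 'several', string)

print(first_only)
print(all_numbers)
```

Passing `count` as a keyword keeps the call readable and avoids confusing it with the `flags` argument that follows it.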
