# Regular Expressions

## Methods search() and sub()

### The re.search() function performs regular expression-based string searching.
### The re.sub() function performs regular expression-based string substitutions.

#### 1. Does the string "Space" contain a, b, or c? Yes, it contains a and c.

In [None]:
import re
re.search('[abc]', 'Space')

In [None]:
re.search('[ate]', 'h The The Space')
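
When a pattern matches, re.search() returns a match object; when nothing matches, it returns None. The short sketch below shows one way to inspect the result. The variable names are only illustrative.

```python
import re

# A successful search returns a match object.
m = re.search('[abc]', 'Space')
found = m.group()     # the text that matched
position = m.span()   # (start, end) indices of the match

# An unsuccessful search returns None, so test the result before using it.
no_match = re.search('[xyz]', 'Space')

print(found, position, no_match)
```

Only the first match is reported: "Space" contains both a and c, but the search stops at the a.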

#### 2. OK, now find a, b, or c, and replace it with o. Space becomes Spooe.

In [None]:
re.sub('[abc]', 'o', 'Space')

#### 3. Let's take that output (Spooe) and use it as the input to a second substitution that replaces a, e, or u with n.
#### 4. As a result, Space turns into Spoon.

In [None]:
re.sub('[abc]', 'o', 'Space')

In [None]:
re.sub('[aeu]', 'n', 'Spooe')

In [None]:
re.sub('[aeu]', 'n', re.sub('[abc]', 'o', 'Space'))
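
Nesting re.sub() calls gets hard to read once there are more than two steps. One possible alternative, sketched here with a hypothetical helper named apply_substitutions (not part of the re module), is to loop over a list of (pattern, replacement) pairs and feed each result into the next substitution.

```python
import re

def apply_substitutions(text, rules):
    """Apply each (pattern, replacement) pair in order, chaining the results."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

# Same two steps as above: Space -> Spooe -> Spoon
result = apply_substitutions('Space', [('[abc]', 'o'), ('[aeu]', 'n')])
print(result)
```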

## Practice 

In [None]:
import re
re.search('[c]y$', 'emergency')

In [None]:
import re
re.search('[c]y$', 'fancy')

In [None]:
re.search('[^c]y$', 'emergensy')

In [None]:
re.search('[^aeiou]y$', 'emergency')

In [None]:
re.search('[aeiou]y$', 'toy')

## Another example

In [None]:
re.sub('y$', 'ies', 'emergency')

In [None]:
re.sub('y$', 'ies', 'semitransparency y')

#### It is possible to combine these two regular expressions (one to find out if the rule applies, and another to actually apply it) into a single regular expression.
#### Here's what that would look like. We're using a remembered group: the parentheses capture the character before the letter y. Then, in the substitution string, we use a new syntax, \1, which means: that first group we remembered? Put it right here.
#### In this case, we remember the c before the y; when we do the substitution, we substitute c in place of c, and ies in place of y. (If we have more than one remembered group, we can use \2 and \3 and so on.)

In [None]:
re.sub('([c])y$', r'\1ies', 'emergency')

In [None]:
re.sub('([^aeiou])y$', r'\1ies', 'emergency')
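
To see remembered groups in isolation, here is a small sketch. The first part shows that the combined pattern leaves a word untouched when the rule does not apply; the second part uses two groups, \1 and \2, on a made-up string chosen only for illustration.

```python
import re

# The combined pattern only fires when a non-vowel precedes the final y.
changed = re.sub('([^aeiou])y$', r'\1ies', 'vacancy')
unchanged = re.sub('([^aeiou])y$', r'\1ies', 'day')

# Two remembered groups: \1 is the first pair of parentheses, \2 the second.
swapped = re.sub(r'(\d+)-(\d+)', r'\2-\1', '10-20')

print(changed, unchanged, swapped)
```

The replacement strings are raw strings (r'...') so that Python passes the backslash through to the re module instead of interpreting it itself.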

## Example 1: IP address substitution in Python script

In [4]:
import re

ipaddress_old = 'The IPs are 173.254.28.78 and 167.81.178.97'

pattern = re.compile(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b')

ipaddress_new = '127.0.0.1'

replaced = re.sub(pattern, ipaddress_new, ipaddress_old)

print('replaced = %s' %(replaced))

replaced = The IPs are 127.0.0.1 and 127.0.0.1
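
Note that the pattern above checks only the shape of an address (four groups of one to three digits), so it would also match something like 999.1.1.1. If that matters, one option is to pass a function to sub() and validate each candidate with the standard ipaddress module. This is a sketch under that assumption; replace_valid_ips is a hypothetical helper name, not part of the original script.

```python
import ipaddress
import re

pattern = re.compile(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b')

def replace_valid_ips(text, new_ip):
    """Replace only those matches that parse as valid IPv4 addresses."""
    def swap(match):
        try:
            ipaddress.IPv4Address(match.group())
        except ValueError:
            # Looks like an address but is not valid: leave it unchanged.
            return match.group()
        return new_ip
    return pattern.sub(swap, text)

print(replace_valid_ips('ok 173.254.28.78 bad 999.1.1.1', '127.0.0.1'))
```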


## Example 2: Pluralize nouns with regular expressions
#### We will learn a few more regular expression features while pluralizing nouns.

In [7]:
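import re

def pluralize(noun):
    # Rule 1: ends in s, x, or z -> add es
    if re.search('[sxz]$', noun):
        return re.sub('$', 'es', noun)

    # Rule 2: ends in a "noisy" h -> add es
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)

    # Rule 3: ends in y preceded by a non-vowel -> change y to ies
    elif re.search('[^aeiou]y$', noun):
        return re.sub('y$', 'ies', noun)

    # Rule 4: otherwise, just add s
    else:
        return noun + 's'

pluralize('husky')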

'huskies'

### The four branches of the if statement implement the following four rules of pluralization

#### If a word ends in s, x, or z, add es. Example: bass becomes basses, fax becomes faxes, and waltz becomes waltzes.
#### If a word ends in a noisy h, add es; if it ends in a silent h, just add s. What's a noisy h? One that gets combined with other letters to make a sound that we can hear. So coach becomes coaches and rash becomes rashes, because we can hear the ch and sh sounds when we say them. But cheetah becomes cheetahs, because the h is silent.
#### If a word ends in y that sounds like i, change the y to ies; if the y is combined with a vowel to sound like something else, just add s. So vacancy becomes vacancies, but day becomes days.
#### If all else fails, just add s.
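
The four rules can also be written as data rather than as if/elif branches. The sketch below is one possible restatement, not the version used above: RULES and pluralize_with_rules are hypothetical names, and each entry holds the pattern that tests whether the rule applies, the pattern to replace, and the replacement text. It is checked against the example words from the rules.

```python
import re

# (test pattern, pattern to replace, replacement), tried in order
RULES = [
    ('[sxz]$', '$', 'es'),            # bass, fax, waltz
    ('[^aeioudgkprt]h$', '$', 'es'),  # coach, rash
    ('[^aeiou]y$', 'y$', 'ies'),      # vacancy
    ('$', '$', 's'),                  # everything else
]

def pluralize_with_rules(noun):
    for test, search, replacement in RULES:
        if re.search(test, noun):
            return re.sub(search, replacement, noun)

for word in ['bass', 'fax', 'waltz', 'coach', 'rash', 'cheetah', 'vacancy', 'day']:
    print(word, '->', pluralize_with_rules(word))
```

Because the last rule's test pattern is just $, it matches every string, so it plays the role of the else branch.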


## Explanation of 1st if condition:
The square brackets [] mean match exactly one of these characters. So [sxz] means s, or x, or z, but only one of them. The $ matches the end of string. Combined, this regular expression tests whether noun ends with s, x, or z.

In the return statement, re.sub('$', 'es', noun) replaces the end of the string (matched by $) with the string es. In other words, it adds es to the string. We could accomplish the same thing with string concatenation, for example noun + 'es', but we opted to use regular expressions for each rule.
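
As a quick check of that claim, the sketch below compares the two approaches on the example words from the first rule. It assumes plain single-word input with no trailing newline.

```python
import re

words = ['bass', 'fax', 'waltz']

# Substituting at the end-of-string anchor...
with_regex = [re.sub('$', 'es', w) for w in words]

# ...gives the same result as plain concatenation for these words.
with_concat = [w + 'es' for w in words]

print(with_regex)
print(with_regex == with_concat)
```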

## Explanation of 2nd if condition:
This is another new variation. The ^ as the first character inside the square brackets means something special: negation. [^abc] means any single character except a, b, or c. So [^aeioudgkprt] means any character except a, e, i, o, u, d, g, k, p, r, or t. Then that character needs to be followed by h, followed by end of string. We're looking for words that end in h where the h can be heard.
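
To make the negated character class concrete, here is a small sketch that runs the second pattern against a few words ending in h. The word list is chosen for illustration; only coach, rash, and cheetah come from the rules above.

```python
import re

noisy_h = '[^aeioudgkprt]h$'

# True means the rule applies (add es); False means it falls through.
results = {word: bool(re.search(noisy_h, word))
           for word in ['coach', 'rash', 'cheetah', 'path', 'high']}

for word, applies in results.items():
    print(word, applies)
```

The words coach and rash match because c and s are not in the excluded set. The others do not match: the a in cheetah, the t in path, and the g in high are all excluded.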

## Explanation of 3rd if condition:
Same pattern here: match words that end in y, where the character before the y is not a, e, i, o, or u. We're looking for words that end in y that sounds like i.
