# Substitution

Next, we look at `sub()`, a method that replaces the **leftmost non-overlapping occurrences** of a pattern in a given string and returns the new string.

### `sub(repl, string[, count=0])`

- `repl` is the replacement string that is substituted in place of each match.

- `string` is the input text on which substitution takes place.

- `count` is an optional argument (default 0) that specifies the maximum number of substitutions. A value of 0 means there is no limit on the substitution count.
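To make the effect of `count` concrete, here is a minimal sketch. The sample sentence is chosen only for illustration; the calls themselves are the standard `re` API.

```python
import re

txt = "100 cats, 23 dogs, 3 rabbits"
pattern = re.compile(r"\d+")

# count=0 (the default) replaces every match.
all_replaced = pattern.sub("-", txt)

# count=2 stops after the first two matches, scanning left to right.
first_two = pattern.sub("-", txt, count=2)

print(all_replaced)  # - cats, - dogs, - rabbits
print(first_two)     # - cats, - dogs, 3 rabbits
```

Passing `count` as a keyword argument keeps the call readable and avoids confusing it with other positional parameters.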


Let us consider a case where we want to replace every number in the given text with a `-`.

In [1]:
import re

In [2]:
txt = "100 cats, 23 dogs, 3 rabbits"

In [3]:
pattern = re.compile(r"\d+")

In [4]:
pattern.sub("-", txt)

'- cats, - dogs, - rabbits'

### `subn(repl, string[, count=0])`

- Returns a tuple containing the substituted string and the number of substitutions made.

- Can be thought of as a utility function over `sub()`.

In [5]:
pattern.subn("-", txt)

('- cats, - dogs, - rabbits', 3)

In [6]:
pattern.subn("-", txt)[1]

3


In [7]:
pattern.subn("-", txt)[0]

'- cats, - dogs, - rabbits'
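Instead of indexing the result with `[0]` and `[1]`, the tuple returned by `subn()` can be unpacked directly. The sketch below assumes the same sample text; the variable names `new_txt`, `n_subs`, `unchanged` and `zero` are illustrative only.

```python
import re

pattern = re.compile(r"\d+")
txt = "100 cats, 23 dogs, 3 rabbits"

# Unpack the (new_string, number_of_substitutions) tuple in one step.
new_txt, n_subs = pattern.subn("-", txt)
print(new_txt, n_subs)

# When nothing matches, the string comes back unchanged and the count is 0.
unchanged, zero = pattern.subn("-", "no digits here")
print(unchanged, zero)
```

The count is a convenient way to check whether any substitution happened at all, without comparing the old and new strings.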

In [1]:
import re

In [2]:
string ="""U.S. stock-index futures pointed
to a solidly higher open on Monday, 
indicating that major 
benchmarks were poised to USA reboundfrom last week’s sharp decline, 
\nwhich represented their biggest weekly drops in months."""

In [3]:
print(re.sub(r'U\.S\.|USA|US', 'United States', string))

United States stock-index futures pointed
to a solidly higher open on Monday, 
indicating that major 
benchmarks were poised to United States reboundfrom last week’s sharp decline, 

which represented their biggest weekly drops in months.
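Two details in a pattern like this are easy to trip over: an unescaped `.` matches any character, and alternatives separated by `|` are tried left to right, so `US` listed before `USA` wins and leaves a stray `A` behind. The sketch below compares both spellings on a short sentence made up for illustration.

```python
import re

text = "U.S. futures rose; USA benchmarks rebounded; US bonds fell."

# Naive pattern: dots are unescaped and US is tried before USA,
# so "USA" is only partly consumed and the "A" is left over.
naive = re.sub("U.S.|US|USA", "United States", text)

# Careful pattern: dots are escaped and the longer alternative comes first.
careful = re.sub(r"U\.S\.|USA|US", "United States", text)

print(naive)
print(careful)
```

Listing longer alternatives before their prefixes is a common rule of thumb whenever one alternative is a prefix of another.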


# Using Functions with `sub`

In [4]:
# Brief explanation of lambda: start with an ordinary function
def square(x):
    return (x ** 2)

square(3)


9

In [5]:
square = lambda x: x ** 2  # the same function written as a lambda
square(3)

9

In [6]:
string = 'Dan has 3 snails. Mike has 4 cats. Alisa has 9 monkeys.'

In [7]:
re.search(r'(\d+)', string).group()

'3'

In [8]:
re.findall(r'(\d+)', string)

['3', '4', '9']

In [9]:
re.search(r'(\d+)', string)

<re.Match object; span=(8, 9), match='3'>

In [10]:
re.sub(r'(\d+)', '1', string)  # replaces every match, just as findall finds every match

'Dan has 1 snails. Mike has 1 cats. Alisa has 1 monkeys.'

In [11]:
# In this example we replace each number with its square
re.sub(r'(\d+)', lambda x: str(square(int(x.group(0)))), string)

'Dan has 9 snails. Mike has 16 cats. Alisa has 81 monkeys.'

In [12]:
re.sub(r'(\d+)', lambda x: str(x), string)  # x is the match object itself, not the matched text

"Dan has <re.Match object; span=(8, 9), match='3'> snails. Mike has <re.Match object; span=(27, 28), match='4'> cats. Alisa has <re.Match object; span=(45, 46), match='9'> monkeys."

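The lambda passed to `re.sub` in the squaring example works in four steps:

- Step 1: `x` is the match object, and `x.group(0)` returns the matched text.
- Step 2: `int(...)` turns that text into an integer.
- Step 3: `square(...)` squares the integer.
- Step 4: `str(...)` turns the result back into a string, because the replacement must be a string.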
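The same four steps can be spelled out as a named function, which is often easier to read and debug than a nested lambda. This is a sketch only; `square_match` is a hypothetical helper name, not something from the `re` module.

```python
import re

def square_match(match):
    """Return the square of the number found in a match, as a string."""
    digits = match.group(0)   # step 1: matched text from the match object
    number = int(digits)      # step 2: convert the text to an int
    squared = number ** 2     # step 3: square it
    return str(squared)       # step 4: convert back to a string

string = 'Dan has 3 snails. Mike has 4 cats. Alisa has 9 monkeys.'

# re.sub calls square_match once for every match it finds.
result = re.sub(r'\d+', square_match, string)
print(result)  # Dan has 9 snails. Mike has 16 cats. Alisa has 81 monkeys.
```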

In [15]:
re.sub(r'(\d+)', lambda x: x.group(0), string)  # returns each match unchanged

'Dan has 3 snails. Mike has 4 cats. Alisa has 9 monkeys.'

In [16]:
# m is the match object passed to the lambda
import re

# The input string (named text to avoid shadowing the built-in input()).
text = "eat laugh sleep study"

# Use a lambda to append "ing" to every word.
result = re.sub(r"\w+", lambda m: m.group() + "ing", text)

# Display the result.
print(result)

eating laughing sleeping studying


# Backreferencing with `sub`

In [18]:
string = 'Merry Merry Christmas'

In [19]:
re.search(r'(\w+ )(\1)', string).groups()

('Merry ', 'Merry ')

In [20]:
re.search(r'(\w+ )(\1)', string).group(1,2)

('Merry ', 'Merry ')

In [22]:
# Backreferencing example with sub: \1 in the replacement refers to group 1
re.sub(r'(\w+) (\1)', r'Happy \1', string)   # \1 = 'Merry'

'Happy Merry Christmas'

In [23]:
re.sub(r'(\w+) (\1)', r'\1 Happy', string)    # group 1 first: 'Merry Happy'

'Merry Happy Christmas'

In [24]:
re.sub(r'(\w+) (\1)', r'Happy \2', string)    # \2 is the second group, which also holds 'Merry'

'Happy Merry Christmas'
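Numbered references such as `\1` become ambiguous when a digit follows them in the replacement. The `re` module also accepts the `\g<...>` form and named groups. The sketch below reuses the same sample string; the group name `word` is chosen only for illustration.

```python
import re

string = 'Merry Merry Christmas'

# Named group: (?P<word>...) defines it, (?P=word) refers back to it inside
# the pattern, and \g<word> refers to it in the replacement.
collapsed = re.sub(r'(?P<word>\w+) (?P=word)', r'\g<word>', string)
print(collapsed)  # Merry Christmas

# \g<1> followed by a literal 2: writing \12 instead would be read as
# a reference to group 12, which does not exist here.
numbered = re.sub(r'(\w+) (\1)', r'\g<1>2', string)
print(numbered)  # Merry2 Christmas
```

Named groups tend to make longer patterns easier to maintain, since the replacement no longer depends on the position of each group.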