# <center>RegEx in Python</center>

![](images/memes/meme25.jpg)

# Backreferencing

**Backreferences** in a pattern allow you to specify that the contents of an earlier capturing group must also be found at the current location in the string. 

> For example, `\1` succeeds if the exact contents of group `1` can be found at the current position, and fails otherwise.
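A quick way to see this rule in action is `re.search` with a one-character group followed by `\1`: the search succeeds only where a character is immediately repeated. The sample strings below are made up for illustration.

```python
import re

# (\w) captures one word character; \1 then requires that same character again
doubled = re.compile(r"(\w)\1")

m = doubled.search("hello")
print(m.group())   # the whole match: ll
print(m.group(1))  # the captured group: l

# No character in "abc" is immediately repeated, so the search fails
print(doubled.search("abc"))  # None
```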

### Example 1

Consider a scenario where we want to find all the duplicated words in the given text.

In [1]:
import re

In [2]:
txt = """
hello hello
how are you
bye bye
"""

In [3]:
pattern = re.compile("(\\w+) \\1")

In [4]:
pattern.findall(txt)

['hello', 'bye']

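> Python's string literals also use a **backslash followed by digits** to represent arbitrary characters: in a normal string, `"\1"` is the octal escape for the control character `chr(1)`, not a backreference. The backslash must therefore be **escaped** (`"\\1"`) so that the regex engine receives the two characters `\1`. Alternatively, **raw strings** (`r"..."`) leave backslashes untouched, so no extra escaping is needed.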
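The difference between the three spellings can be checked directly in the interpreter: a plain `"\1"` is a one-character string, while `"\\1"` and `r"\1"` are both the two characters the regex engine expects. The `"bye bye"` test string is only an illustration.

```python
import re

plain = "\1"      # octal escape: a single control character, chr(1)
escaped = "\\1"   # backslash followed by the digit 1
raw = r"\1"       # raw string: the same two characters as `escaped`

print(len(plain), plain == chr(1))   # 1 True
print(len(escaped), escaped == raw)  # 2 True

# Only the escaped/raw form acts as a backreference to group 1
print(re.findall("(\\w+) " + escaped, "bye bye"))  # ['bye']
# The plain form looks for a literal chr(1) after the space, so nothing matches
print(re.findall("(\\w+) " + plain, "bye bye"))    # []
```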

Here is an example using raw strings.

In [5]:
pattern = re.compile(r"(\w+) \1")

In [6]:
pattern.findall(txt)

['hello', 'bye']
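One caveat: `(\w+) \1` does not check word boundaries, so it also fires when a word is followed by another word that merely starts with it. Below is a minimal sketch of a stricter version. It assumes we only want whole words separated by whitespace, so it wraps the pattern in `\b` anchors and uses `\s+` between the words. The sample sentence is made up.

```python
import re

sample = "the theory is is sound"

# Original pattern: "the" followed by " the" (the start of "theory") counts as a repeat
loose = re.compile(r"(\w+) \1")
print(loose.findall(sample))   # ['the', 'is']

# \b on both ends forces whole words; \s+ allows any run of whitespace in between
strict = re.compile(r"\b(\w+)\s+\1\b")
print(strict.findall(sample))  # ['is']

# The same backreference can be used in a replacement to drop the duplicate
print(strict.sub(r"\1", sample))  # the theory is sound
```

Note that `\s+` also matches newlines, so this version would also catch a word repeated across a line break.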

### Example 2

Consider a scenario where we want to find all dates in the format `dd/mm/yyyy` and change them to the `yyyy-mm-dd` format.

In [7]:
txt = """
today is 23/02/2019.
yesterday was 22/02/2019.
tomorrow is 24/02/2019.
"""

In [8]:
pattern = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

In [9]:
newtxt = pattern.sub(r"\3-\2-\1", txt)

In [10]:
print(newtxt)


today is 2019-02-23.
yesterday was 2019-02-22.
tomorrow is 2019-02-24.
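In the replacement string, Python's `re.sub` also accepts the `\g<...>` syntax. It refers to a group by number (`\g<3>`), or by name when the pattern uses named groups `(?P<name>...)`. The sketch below does the same conversion with named groups, which keeps the replacement readable without counting parentheses. The group names `day`, `month` and `year` are our own choice.

```python
import re

txt = """
today is 23/02/2019.
yesterday was 22/02/2019.
tomorrow is 24/02/2019.
"""

# Named groups document what each part of the date is
date = re.compile(r"(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})")

# \g<name> inserts the text captured by that named group
newtxt = date.sub(r"\g<year>-\g<month>-\g<day>", txt)
print(newtxt)

# \g<number> also works and avoids ambiguity: \g<1>0 means "group 1, then a literal 0",
# whereas \10 would mean group 10
print(date.sub(r"\g<3>-\g<2>-\g<1>", "23/02/2019"))  # 2019-02-23
```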



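> Backreferences cannot be used inside a character class. In Python, the `\1` in a regex like `(a)[\1b]` does not refer to group 1: inside `[...]` all numeric escapes are treated as characters, so `\1` is read as the octal escape for `chr(1)` and the class matches either that control character or `b`.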
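This behaviour is easy to verify, and so is the usual workaround. To accept "whatever group 1 matched, or `b`", put the backreference in an alternation `(?:\1|b)` instead of a character class. The test strings are made up for illustration.

```python
import re

in_class = re.compile(r"(a)[\1b]")        # \1 here is the octal escape chr(1), not group 1
alternation = re.compile(r"(a)(?:\1|b)")  # \1 here is a real backreference

for s in ["aa", "ab", "a\x01"]:
    print(repr(s), bool(in_class.fullmatch(s)), bool(alternation.fullmatch(s)))
# 'aa' False True
# 'ab' True True
# 'a\x01' True False
```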

![](images/memes/meme26.jpg)