# <center>RegEx in Python</center>

![](images/memes/meme7.jpg)

# The Backslash Plague

Let's start with an example.

Consider a text containing some Windows-style directory paths, in which we have to find the substring `C:\Windows\System32`.

```python
import re
```

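```python
# Raw string (r"..."), so the backslashes in the paths are kept literally.
# Without the r prefix, "\W", "\P" and "\S" only survive because they are not
# valid Python escapes, and Python 3.12+ emits a SyntaxWarning for them.
txt = r"""
C:\Windows
C:\Python
C:\Windows\System32
"""
```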

```python
pattern = re.compile("C:\Windows\System32")
pattern.search(txt)  # returns None: no match
```

### Why are no matches found for the above pattern?

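The regex engine treats `\` as a metacharacter, whereas we intend it to be a literal. In the pattern string, `re` reads `\W` as "any non-word character" and `\S` as "any non-whitespace character", so the pattern no longer asks for a backslash followed by `W` or `S`.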
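To see what the regex engine actually does with this pattern, here is a small check (a sketch; the sample string `"C:!indowsxystem32"` is made up for illustration). Because `\W` and `\S` are character classes, the pattern matches text that contains no backslash at all, yet fails on the real path:

```python
import re

# Same pattern string as above, written as a raw string so the backslashes
# reach the regex engine unchanged (and Python 3.12+ does not warn).
pattern = re.compile(r"C:\Windows\System32")

# \W consumes "!" and \S consumes "x": a match without any backslash.
print(pattern.fullmatch("C:!indowsxystem32"))  # a Match object

# On the real path, \W consumes the backslash, then "indows" is compared
# with "Windows" and the match fails.
print(pattern.search(r"C:\Windows\System32"))  # None
```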

### Solution???

We need to escape the backslashes. Any metacharacter, including `\` itself, can be escaped by putting a `\` before it.
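For an ordinary metacharacter such as `.`, escaping works as described. A minimal sketch with made-up strings, using the raw-string prefix `r` (discussed further below) so that the backslash reaches the regex engine unchanged:

```python
import re

# Unescaped "." is a metacharacter: it matches any character except a newline.
print(re.fullmatch(r"a.c", "abc"))   # a Match object
# Escaped "\." matches only a literal dot.
print(re.fullmatch(r"a\.c", "abc"))  # None
print(re.fullmatch(r"a\.c", "a.c"))  # a Match object
```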

```python
pattern = re.compile("C:\\Windows\\System32")
pattern.search(txt)  # returns None again
```

```python
print("C:\\Windows\\System32")
```

```
C:\Windows\System32
```
### Still no match found. Why???

`\` is used as an escape character at two different levels:

- First, the Python interpreter processes escape sequences in the string literal before the `re` module ever sees the pattern string. For instance, `\n` is converted to a newline character, `\t` to a tab character, and `\\` to a single backslash.

- Then, `re` reads the resulting pattern string and applies its own interpretation of `\` (for example `\W`, `\S`, `\d`).

That is why `"C:\\Windows\\System32"` failed: Python turned each `\\` into a single `\`, so `re` once again saw `\W` and `\S`, as the `print()` output shows.

Hence, to match a **literal** `\`, the regex engine must receive `\\`, and in an ordinary Python string literal each of those two backslashes has to be escaped again, so we write `\\\\`.
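The two levels can be checked separately. This sketch (the variable name `pattern_text` is just for illustration) shows that the Python literal `"\\\\"` becomes a two-character string, which the regex engine then reads as one literal backslash:

```python
import re

# Level 1: Python turns the literal "\\\\" into a 2-character string: \\
pattern_text = "\\\\"
print(len(pattern_text))  # 2

# Level 2: the regex engine reads those two characters as ONE literal backslash.
print(re.fullmatch(pattern_text, "\\"))  # a Match object for the 1-backslash string

# With only "\\" in the literal, re receives a lone "\", an incomplete escape.
try:
    re.compile("\\")
except re.error as exc:
    print("re.error:", exc)
```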

```python
pattern = re.compile("C:\\\\Windows\\\\System32")
pattern.search(txt)
```

```
<_sre.SRE_Match object; span=(22, 41), match='C:\\Windows\\System32'>
```

### Can we use 2 backslashes instead of 4 here?

Yes. By using **raw strings**, we can skip the first (Python) level of escaping.

> Python raw strings are written with an `r` prefix, as in ***r"your string"***. In raw strings, escape sequences like `\n`, `\t`, etc. are not processed, so every backslash is passed on unchanged.
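A quick comparison of ordinary and raw literals (a sketch using only built-in behavior) shows that the raw pattern and the four-backslash pattern are the very same string:

```python
# In a raw string, a backslash does not start an escape sequence.
print(len("\n"), len(r"\n"))  # 1 2

# So both spellings produce identical pattern strings for re.
print(r"C:\\Windows\\System32" == "C:\\\\Windows\\\\System32")  # True
```

One caveat for Windows paths: a raw string cannot end with an odd number of backslashes, so `r"C:\"` is a syntax error.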

```python
pattern = re.compile(r"C:\\Windows\\System32")
pattern.search(txt)
```

```
<_sre.SRE_Match object; span=(22, 41), match='C:\\Windows\\System32'>
```

### Do we really need to use 2 backslashes?

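If the whole pattern is meant to be matched literally (you are **not using any metacharacters** in it), you can let the `re.escape()` function add the escapes for you. In Python 3.6 and earlier, it escapes every character except ASCII letters, digits and `_`; since Python 3.7 it escapes only characters that have a special meaning in regular expressions, so on newer versions the `:` in the output below is left unescaped.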
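Because `re.escape()` returns an ordinary string, its output can also be combined with hand-written regex syntax when only part of the pattern is literal. The sketch below redefines `txt` so it runs on its own, then captures the folder name that follows `C:\Windows\`:

```python
import re

txt = r"""
C:\Windows
C:\Python
C:\Windows\System32
"""

# The literal part is escaped automatically; the rest is regular regex syntax:
# \\ is one literal backslash and (\w+) captures the following folder name.
pattern = re.escape(r"C:\Windows") + r"\\(\w+)"
print(re.findall(pattern, txt))  # ['System32']
```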

```python
re.escape(r"C:\Windows\System32")
```

```
'C\\:\\\\Windows\\\\System32'
```

```python
re.search(re.escape(r"C:\Windows\System32"), txt)
```

```
<_sre.SRE_Match object; span=(22, 41), match='C:\\Windows\\System32'>
```

![](images/memes/meme8.jpg)