# <center>RegEx in Python</center>

<img src="images/memes/meme13.jpg" height=400 width=400>

# Greedy Behaviour

Let's consider an example.

In [1]:
import re

In [2]:
txt = """<html><head><title>Title</title>"""

In [3]:
# Display the raw text

txt

'<html><head><title>Title</title>'

In [4]:
# Create a pattern to find everything written inside "<>".
# Here we use ".*" (any character except a newline, zero or more times).

pattern = re.compile(r"<.*>")

In [5]:
pattern.findall(txt)

['<html><head><title>Title</title>']

In [6]:
from utils import highlight_regex_matches
highlight_regex_matches(pattern, txt)

[43m[1m<html><head><title>Title</title>[0m


In the example above, one might expect four matches: `<html>`, `<head>`, `<title>` and `</title>`. Instead, we get a single match that spans the whole string, `<html><head><title>Title</title>`, because `.*` consumes as many characters as it can before the final `>`.

This behaviour (matching as much text as possible) is called **greedy** behaviour.

> Quantifiers are greedy by default. A greedy quantifier tries to match as much text as possible while still allowing the overall pattern to match.
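To make the greedy match concrete, here is a minimal sketch that reuses the `txt` string from the cells above and inspects the span of the match. The variable names `greedy` and `greedy_span` are only illustrative.

```python
import re

txt = "<html><head><title>Title</title>"

# Greedy: ".*" first consumes everything up to the end of the string,
# then gives characters back only until the final ">" can match.
greedy = re.search(r"<.*>", txt)
greedy_span = greedy.span()

print(greedy.group())  # <html><head><title>Title</title>
print(greedy_span)     # (0, 32)
print(len(txt))        # 32
```

The span covers the entire string, which is why `findall` returns one item rather than four.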

# Non-Greedy Behaviour

The **non-greedy** (or **reluctant**) behaviour can be requested by adding an extra question mark to the quantifier.

For example, `?` becomes `??`, `*` becomes `*?` and `+` becomes `+?`.

> A quantifier marked as reluctant behaves in the opposite way to a greedy one: it tries to match as little text as possible while still allowing the overall pattern to match.
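The difference is easiest to see on a tiny input. The sketch below is illustrative only (the `sample` string and the `results` dictionary are not part of the original notebook). It matches each greedy quantifier and its non-greedy counterpart against the start of the same string.

```python
import re

sample = "aaaa"

# Each greedy quantifier is paired with its non-greedy counterpart.
quantifier_pairs = [("a?", "a??"), ("a*", "a*?"), ("a+", "a+?"), ("a{2,3}", "a{2,3}?")]

results = {}
for greedy, lazy in quantifier_pairs:
    # re.match anchors at the start of the string,
    # so both patterns begin matching at the same position.
    results[greedy] = re.match(greedy, sample).group()
    results[lazy] = re.match(lazy, sample).group()
    print(f"{greedy!r:>10} -> {results[greedy]!r:<8} {lazy!r:>10} -> {results[lazy]!r}")
```

The greedy forms return `'a'`, `'aaaa'`, `'aaaa'` and `'aaa'`, while the non-greedy forms return `''`, `''`, `'a'` and `'aa'`: each takes the minimum number of repetitions that its quantifier allows.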

In [7]:
# Now we want to match only the individual HTML tags

pattern = re.compile(r"<.*?>")

In [8]:
pattern.findall(txt)

['<html>', '<head>', '<title>', '</title>']

In [9]:
highlight_regex_matches(pattern, txt)

[43m[1m<html>[0m[43m[1m<head>[0m[43m[1m<title>[0mTitle[43m[1m</title>[0m


![](images/memes/meme14.jpg)