## -----------------------------WHY "?"----------------------------
# 1. Matching the Shortest Possible String:

When you want the shortest possible match between two delimiters, make the quantifier lazy by writing *? instead of *. For example:
pattern = r'<.*?>': This pattern matches HTML tags in a non-greedy way. Each match stops at the first '>' after the opening '<', so every tag is returned separately.

In [5]:
import re
html_text = "<div>This is <b>bold</b> and <i>italic</i></div>"
pattern = r'<.*?>'
matches = re.findall(pattern, html_text)
print(matches)


['<div>', '<b>', '</b>', '<i>', '</i>', '</div>']


In [4]:
# Without ?, .* is greedy: the match runs from the first < to the last >
import re
html_text = "<div>This is <b>bold</b> and <i>italic</i></div>"
pattern = r'<.*>'
matches = re.findall(pattern, html_text)
print(matches)

['<div>This is <b>bold</b> and <i>italic</i></div>']
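The same idea applies to any pair of delimiters, not only angle brackets. The sketch below uses a made-up sentence (not from the original notebook) to pull out quoted phrases, and compares the lazy and greedy forms side by side.

```python
import re

# Hypothetical sample text, chosen only for illustration.
sentence = 'She said "hello" and then "goodbye" before leaving.'

# Lazy: each match stops at the nearest closing quote.
lazy = re.findall(r'"(.*?)"', sentence)

# Greedy: one match runs from the first quote to the last quote.
greedy = re.findall(r'"(.*)"', sentence)

print(lazy)    # ['hello', 'goodbye']
print(greedy)  # ['hello" and then "goodbye']
```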


# 2. Avoiding Over-Matching:
If a pattern could match more than you intend, you can use *? to stop at the first opportunity. For example: pattern = r'a.*?b': This pattern matches from an 'a' to the nearest following 'b'. Without the ?, r'a.*b' would run from the first 'a' to the last 'b' in the string.

In [1]:
import re
text = "aabababb"
pattern = r'a.*?b'
matches = re.findall(pattern, text)
print(matches)

['aab', 'ab', 'ab']


In [2]:
import re
text = "aabababb"
pattern = r'a.*b'
matches = re.findall(pattern, text)
print(matches)

['aabababb']
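Note that the first lazy match above is 'aab', not 'ab'. A lazy quantifier does not search for the globally shortest substring: the engine still starts at the leftmost possible position and only then extends the match as little as it can. A small sketch using re.finditer makes the start and end positions visible:

```python
import re

text = "aabababb"

# Record (start, end, matched text) for every lazy match.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(r'a.*?b', text)]

print(spans)  # [(0, 3, 'aab'), (3, 5, 'ab'), (5, 7, 'ab')]
```

The first match begins at index 0 because that is the leftmost 'a', even though a shorter 'ab' starts at index 1.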


# 3. Matching Optional Patterns:
When the text between two fixed parts may or may not be present, you can use *?, because it matches zero or more characters. For example: pattern = r'a.*?b': This pattern matches 'a', then as few characters as possible (possibly none), then 'b'. Both ends are still required: if 'b' is not present, the pattern does not match at all. To make a part truly optional, wrap it in a group followed by ?, as in r'a(?:.*?b)?'.

In [3]:
import re
text = "This is a sample text with optional pattern"
pattern = r'sample.*?pattern'
matches = re.findall(pattern, text)
print(matches)


['sample text with optional pattern']
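To check what happens when the ending part is missing, the sketch below runs the same pattern on a shortened string (an assumed variation of the text above) and then on a variant whose tail is wrapped in an optional non-capturing group.

```python
import re

complete = "This is a sample text with optional pattern"
truncated = "This is a sample text"  # assumed variation: no 'pattern' at the end

# Both 'sample' and 'pattern' are required here.
required_tail = r'sample.*?pattern'
with_tail = re.findall(required_tail, complete)
without_tail = re.findall(required_tail, truncated)
print(with_tail)     # ['sample text with optional pattern']
print(without_tail)  # []

# (?: ... )? makes everything after 'sample' optional.
optional_tail = r'sample(?:.*?pattern)?'
opt_with_tail = re.findall(optional_tail, complete)
opt_without_tail = re.findall(optional_tail, truncated)
print(opt_with_tail)     # ['sample text with optional pattern']
print(opt_without_tail)  # ['sample']
```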


# 4. Capturing the Smallest Group:

When capturing groups in regular expressions, *? can be used to keep a group as small as possible. For example: pattern = r'(.*?)(\d+)': This pattern captures the shortest run of characters that precedes a run of digits, so each text/number pair is returned separately. Without the ?, the first group would be greedy and swallow everything up to the final digit, leaving only that last digit for the second group.

In [4]:
import re
text = "abc123def456ghi789"
pattern = r'(.*?)(\d+)'
matches = re.findall(pattern, text)
print(matches)


[('abc', '123'), ('def', '456'), ('ghi', '789')]
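For comparison, here is a short sketch of the greedy version on the same text. The variable names are illustrative only.

```python
import re

text = "abc123def456ghi789"

# Lazy first group: stops as soon as digits can follow.
lazy_pairs = re.findall(r'(.*?)(\d+)', text)

# Greedy first group: takes as much as possible, then backtracks
# just far enough to leave one digit for (\d+).
greedy_pairs = re.findall(r'(.*)(\d+)', text)

print(lazy_pairs)    # [('abc', '123'), ('def', '456'), ('ghi', '789')]
print(greedy_pairs)  # [('abc123def456ghi78', '9')]
```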
