## Regular Expression

In [1]:
import re
import warnings
warnings.filterwarnings(action="ignore")

<center>Cheat Sheet</center>

**Anchors**
- `\A` : Matches only at the beginning of the string.
- `\Z` : Matches only at the end of the string.
- `^` : Matches at the beginning of the string; with `re.MULTILINE`, also at the beginning of each line.
- `$` : Matches at the end of the string or just before a newline that ends the string; with `re.MULTILINE`, also at the end of each line.
- `\n` : Matches a newline character.
- `re.MULTILINE` or `re.M` : Makes `^` and `$` match at the start and end of each line (not just the start and end of the string). `\A` and `\Z` are unaffected.
- `\b` : Matches the boundary between a word character and a non-word character (or the start/end of the string).
- `\B` : Matches positions where `\b` does not match (inside words, or between non-word characters).

In [23]:
# Examples 

# \A
print(re.search(r"\AHello", string="Hello World"))

# \A ignores line breaks: it matches only at the very start of the string.
print(re.search(r"\AWorld", string="Hello\nWorld")) # Not Found
print(re.search(r"\AWorld", string="Hello\nWorld", flags=re.MULTILINE)) # Still not Found


# \Z
print(re.search(r'World!\Z', 'Hello World!'))

print(re.search(r'World!\Z', 'Hello\nWorld!')) # Match Found


# ^
print(re.search("^Hello", string="Hello World"))
print(re.search("^World", string="Hello\nWorld")) # No Match
print(re.search("^World", string="Hello\nWorld", flags=re.MULTILINE)) # Found


# $ 
print(re.search(r'World!$', 'Hello\nWorld!'))
print(re.search(r'Hello$', 'Hello\nWorld!')) # No Match
print(re.search(r'Hello$', 'Hello\nWorld!', re.MULTILINE)) # Found


# \b
print(re.search(r'\bHello\b', 'Hello World!')) # Match Found
print(re.search(r'\bHello\b', 'HelloWorld!')) # No Match

# \B
# Matches "word" only if it is not at a word boundary
print(re.search(r'\Bword\B', 'swordfish'))            # Match found
print(re.search(r'\Bword\B', 'a word in a sentence')) # No match
print(re.search(r'\Bll\B', 'Hello'))                  # Match found



<re.Match object; span=(0, 5), match='Hello'>
None
None
<re.Match object; span=(6, 12), match='World!'>
<re.Match object; span=(6, 12), match='World!'>
<re.Match object; span=(0, 5), match='Hello'>
None
<re.Match object; span=(6, 11), match='World'>
<re.Match object; span=(6, 12), match='World!'>
None
<re.Match object; span=(0, 5), match='Hello'>
<re.Match object; span=(0, 5), match='Hello'>
None
<re.Match object; span=(1, 5), match='word'>
None
<re.Match object; span=(2, 4), match='ll'>
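
The difference between `$` and `\Z` only shows up when the string ends with a newline, and the effect of `re.MULTILINE` is easiest to see with `re.findall`. The following is a minimal sketch of both points; the sample strings and variable names are made up for illustration.

```python
import re

text = "Hello World\n"  # note the trailing newline

# `$` also matches just before a newline that ends the string.
dollar_match = re.search(r"World$", text)
print(dollar_match)  # <re.Match object; span=(6, 11), match='World'>

# `\Z` matches only at the very end, so the trailing newline blocks it.
z_match = re.search(r"World\Z", text)
print(z_match)  # None

lines = "one\ntwo\nthree"

# Without re.MULTILINE, `^` matches only at the start of the string.
first_words_default = re.findall(r"^\w+", lines)
print(first_words_default)  # ['one']

# With re.MULTILINE, `^` matches at the start of every line.
first_words_multiline = re.findall(r"^\w+", lines, flags=re.MULTILINE)
print(first_words_multiline)  # ['one', 'two', 'three']
```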


**Features**

- `|` : Alternation (OR). Combines multiple regular expressions as alternatives. Each alternative can have its own anchors.
- `(pat)` : Capturing group. Groups a pattern or patterns and captures the matched substring for back-references.
- `(?:pat)` : Non-capturing group. Groups a pattern or patterns without capturing the matched substring.
- `(?P<name>pat)` : Named capture group. Groups a pattern and assigns a name to the captured substring.
- `.` : Matches any single character except the newline character.
- `[]` : Character class. Matches one character from the set specified inside the brackets.
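
Because `|` has the lowest precedence, anchors written next to it apply to one alternative only. The sketch below illustrates this with two hypothetical patterns, `loose` and `strict`; wrapping the alternatives in a non-capturing group is one common way to apply the anchors to both sides.

```python
import re

# `^cat|dog$` reads as (`^cat`) OR (`dog$`): each side keeps its own anchor.
loose = re.compile(r"^cat|dog$")

# Grouping the alternatives first applies both anchors to every alternative.
strict = re.compile(r"^(?:cat|dog)$")

loose_hits = loose.findall("cat and dog")
print(loose_hits)    # ['cat', 'dog']

strict_hit = strict.search("cat and dog")
print(strict_hit)    # None

strict_exact = strict.search("dog")
print(strict_exact)  # <re.Match object; span=(0, 3), match='dog'>
```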

In [33]:

# | 
print(re.search(r'cat|dog', 'I have a cat'))
print(re.search(r'cat|dog', 'I have a fish')) # No Match

# (pat)
match = re.search(r'(Hello) (World)', 'Hello World')
print(match.group(1))  # Outputs: Hello
print(match.group(2))  # Outputs: World

text = "Hello\nWorld"
match = re.search(r"(^H\w+)\n(^W\w+)", text, re.MULTILINE)
print(match.group(1))  # Outputs: Hello
print(match.group(2))  # Outputs: World


# (?:pat)
# Groups the pattern for matching but does not capture it.
match = re.search(r'(?:Hello) (World)', 'Hello World')
print(match.group(0))  # Outputs: Hello World
print(match.groups())  # Outputs: ('World',)


# . 
print(re.search(r'H.llo', 'H\nllo')) # No match
print(re.search(r'H.llo', 'Hillo'))  # Match found

# []
print(re.search(r'[aeiou]', 'Hello'))  # Match found
print(re.search(r'[aeiou]', 'Sky'))    # No match
print(re.search(r'[a-z]', '123abc'))   # Match found

<re.Match object; span=(9, 12), match='cat'>
None
Hello
World
Hello
World
Hello World
('World',)
None
<re.Match object; span=(0, 5), match='Hillo'>
<re.Match object; span=(1, 2), match='e'>
None
<re.Match object; span=(3, 4), match='a'>
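
The examples above do not exercise named groups or back-references, so here is a minimal sketch of both. The group names `greeting` and `target` and the sample sentences are chosen only for illustration.

```python
import re

# Named groups: read captures by name instead of by position.
named = re.search(r"(?P<greeting>\w+) (?P<target>\w+)", "Hello World")
print(named.group("greeting"))  # Hello
print(named.groupdict())        # {'greeting': 'Hello', 'target': 'World'}

# Back-reference: \1 must match the same text that group 1 captured,
# which makes it handy for spotting a doubled word.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test")
print(repeated.group(0))  # is is
print(repeated.group(1))  # is
```

Named groups still get a number, so `named.group(1)` returns the same value as `named.group("greeting")`.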
