# Introduction to Regex
A regular expression (regex) is a sequence of characters that defines a search pattern.
Common use cases include validating input, searching text, and extracting substrings.

In [100]:
# Importing the necessary library
import re

### Concept of escape sequence

Escape sequences are combinations of characters used to represent characters that cannot be typed or displayed directly. They are preceded by a backslash (\) and are interpreted specially by the language or system. Escape sequences are commonly used in programming languages, text editors, and command-line interfaces to represent special characters, control characters, and non-printable characters.


In Python and many other programming languages, escape sequences are used in string literals to represent characters such as newline (\n), tab (\t), backspace (\b), and others.


Here are some common escape sequences used in Python:

\n: Newline - Moves the cursor to the beginning of the next line.

\t: Tab - Inserts a tab character.

\r: Carriage Return - Moves the cursor to the beginning of the current line.

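\b: Backspace - Inserts a backspace control character; many terminals display it by moving the cursor back one position, so the previous character may appear to be deleted.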

\\: Backslash - Inserts a literal backslash.

\': Single Quote - Inserts a literal single quote (useful in single-quoted strings).

\": Double Quote - Inserts a literal double quote (useful in double-quoted strings).

\xhh: Hexadecimal Escape - Inserts the character whose code point is the two-digit hexadecimal value hh (for example, \x41 is A).

\uhhhh or \Uhhhhhhhh: Unicode Escape - Inserts the Unicode character with the hexadecimal code point hhhh or hhhhhhhh; \u takes exactly four hex digits and \U exactly eight.
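A quick way to confirm that each escape sequence stands for a single character is to inspect `len()` and `repr()` of a few short literals. This is a small sketch using only built-ins; the dictionary of sample strings is invented for illustration.

```python
# Each escape sequence in a normal (non-raw) string literal becomes ONE character.
samples = {
    "newline": "\n",
    "tab": "\t",
    "backslash": "\\",
    "hex escape": "\x41",        # code point 0x41 -> 'A'
    "unicode escape": "\u00e9",  # code point U+00E9 -> 'é'
}

for name, value in samples.items():
    # repr() shows the character in escaped form; len() confirms it is a single character
    print(f"{name:15} repr={value!r:8} len={len(value)}")
```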


In [182]:
print("\\")

\


In [186]:
print("I'D")

I'D


In [194]:
print('\t\t\tTab')

			Tab


In [105]:
print('Name\tAge\tLocation')
print('Alice\t30\tNew York')
print('Bob\t25\tLos Angeles')


Name	Age	Location
Alice	30	New York
Bob	25	Los Angeles


In [112]:
print("This is a backslash: \\")

This is a backslash: \


In [114]:
print('It\'s a beautiful day')

It's a beautiful day


In [117]:
print("She said, \"Hello!\"")

She said, "Hello!"


In [118]:
print("Hello\nWorld")

Hello
World


In [195]:
print("Hello\rWorld\rvishal")

vishal


In [123]:
print("Hello\bWorld")

HelloWorld


In [132]:
print("\x44\x42\x43")

DBC


For Unicode code points in hexadecimal, see:
https://www.rapidtables.com/code/text/unicode-characters.html

In [165]:
print("\U+2192")


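SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \UXXXXXXXX escape (2257064305.py, line 1)

The U+2192 notation used on Unicode charts is not valid Python syntax: \U must be followed by exactly eight hexadecimal digits. Write the arrow as \u2192 or \U00002192 instead.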

In [166]:
print("\U00002764")


❤
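Unicode charts write code points as U+2192, but Python needs one of the spellings below. This sketch shows four equivalent ways to produce the same arrow; `\N{...}` takes the official Unicode character name, and `chr()` builds the character at runtime from its integer code point.

```python
# Four ways to write the character U+2192 (RIGHTWARDS ARROW)
short_form = "\u2192"                 # \u + exactly 4 hex digits
long_form = "\U00002192"              # \U + exactly 8 hex digits
named_form = "\N{RIGHTWARDS ARROW}"   # \N{...} + the Unicode character name
from_code = chr(0x2192)               # built at runtime from the code point

print(short_form, long_form, named_form, from_code)
print(short_form == long_form == named_form == from_code)  # True
```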


### Types of strings:

In [169]:
unicode_string = "Café"  # Every Python 3 str is a Unicode string
print(unicode_string)

Café


In [170]:
byte_string = b'binary data'
print(byte_string)

b'binary data'


In [172]:
name = "Alice"
age = 30
formatted_string = f"Name: {name}, Age: {age}"
print(formatted_string)

Name: Alice, Age: 30


In [174]:
byte_array = bytearray(b'hello')
print(byte_array)

bytearray(b'hello')


In [175]:
mutable_string = list("hello")
mutable_string[0] = 'H'  # Modify the first character
mutable_string = ''.join(mutable_string)  # Convert back to string
print(mutable_string)

Hello


In [167]:
single_quoted = 'Hello'
double_quoted = "World"
triple_quoted = '''This is a 
                  multi-line string'''

In [178]:
empty_string = ''
print(empty_string)




In [179]:
print(r'C:\Users\Name')  # raw string

C:\Users\Name


In [7]:
# Example of a raw string: the r prefix tells Python not to process escape sequences
print(r'\tTab')

\tTab
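Raw strings matter most when writing regex patterns. In a normal string, `\b` is converted into a backspace character before the `re` module ever sees it, so a word-boundary pattern silently stops matching; a raw string passes the backslash through unchanged. The sample sentence below is made up for this sketch.

```python
import re

sentence = "the cat sat on the mat"

normal = "\bcat\b"   # Python turns each \b into a backspace character (\x08)
raw = r"\bcat\b"     # re receives backslash + 'b', i.e. a word boundary

print(len("\b"), len(r"\b"))        # 1 2
print(re.search(normal, sentence))  # None: the sentence contains no backspace characters
print(re.search(raw, sentence))     # a match object for 'cat'
```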


In Python, re.compile() is a function provided by the re module, which is used for working with regular expressions (regex). The re.compile() function compiles a regular expression pattern into a regex object, which can then be used for various operations such as searching, matching, and replacing text within strings.


Here's how re.compile() works:


Compilation: The re.compile() function takes a regex pattern as its first argument and compiles it into a regex object.


Regex Object: The compiled regex object represents the regex pattern, and it can be reused for multiple string operations without having to recompile the pattern each time.


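Efficiency: Compiling a pattern once avoids re-parsing it on every call. Python's re module also caches recently used patterns internally, so the speedup over module-level functions such as re.search() is often small; the main benefits are reuse and readability when the same pattern is applied many times.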


Flags: Optional flags can be provided as a second argument to re.compile() to modify the behavior of the regex pattern, such as case-insensitive matching, multiline matching, and others.


Usage: Once the regex object is created with re.compile(), it can be used with methods like search(), match(), findall(), finditer(), and sub() to perform various string operations based on the regex pattern.
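The workflow described above (compile once, then reuse the object with several methods, optionally passing flags) can be sketched as follows. The sample sentence is invented for illustration.

```python
import re

# Compile once, then reuse the same pattern object with several methods
digits = re.compile(r"\d+")
text = "Order 66 shipped 3 items on day 120"

print(digits.search(text).group())   # 66  (first match anywhere)
print(digits.findall(text))          # ['66', '3', '120']
print(digits.sub("#", text))         # Order # shipped # items on day #

# The module-level function gives the same result without an explicit compile step
print(re.findall(r"\d+", text) == digits.findall(text))  # True

# Flags are passed as the second argument to re.compile()
order = re.compile(r"order", re.IGNORECASE)
print(order.match(text).group())     # Order
```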


In [10]:
text_to_search = '''
abcdefghijklmnopqurtuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890

Ha HaHa

MetaCharacters (Need to be escaped):
. ^ $ * + ? { } [ ] \ | ( )

havinsoh.com

321-555-4321
123.555.1234
123*555*1234
800-555-1234
900-555-1234

Mr. Vishal
Mr Smith
Ms Davis
Mrs. Robinson
Mr. T

cat
mat
bat
pat
flat
fat
'''

.       - Any Character Except New Line

\d      - Digit (0-9)

\D      - Not a Digit (0-9)

\w      - Word Character (a-z, A-Z, 0-9, _)

\W      - Not a Word Character

\s      - Whitespace (space, tab, newline)

\S      - Not Whitespace (space, tab, newline)


\b      - Word Boundary

\B      - Not a Word Boundary

^       - Beginning of a String

$       - End of a String


[]      - Matches Characters in brackets

[^ ]    - Matches Characters NOT in brackets

|       - Either Or

( )     - Group


Quantifiers:

*       - 0 or More

+       - 1 or More

?       - 0 or One

{3}     - Exact Number

{3,4}   - Range of Numbers (Minimum, Maximum)
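The quantifiers above are easiest to compare side by side. This sketch runs each one against the same made-up string; `findall()` returns the full text of every non-overlapping match, and `{2,3}` is greedy, so it takes three b's whenever they are available.

```python
import re

sample = "a ab abb abbb abbbb"

print(re.findall(r"ab*", sample))      # ['a', 'ab', 'abb', 'abbb', 'abbbb']  (0 or more b)
print(re.findall(r"ab+", sample))      # ['ab', 'abb', 'abbb', 'abbbb']       (1 or more b)
print(re.findall(r"ab?", sample))      # ['a', 'ab', 'ab', 'ab', 'ab']        (0 or 1 b)
print(re.findall(r"ab{3}", sample))    # ['abbb', 'abbb']                     (exactly 3 b)
print(re.findall(r"ab{2,3}", sample))  # ['abb', 'abbb', 'abbb']              (2 to 3 b)
```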

In [211]:
pattern = re.compile(r'abc')
matches = pattern.finditer(text_to_search)
print(matches)
for match in matches:
    print(match)

<callable_iterator object at 0x7f9b4a7f9180>
<re.Match object; span=(1, 4), match='abc'>


In [212]:
print(text_to_search[1:4])

abc


In [214]:
pattern = re.compile(r'.')
matches = pattern.finditer(text_to_search)
for i in matches:
    print(i)

<re.Match object; span=(1, 2), match='a'>
<re.Match object; span=(2, 3), match='b'>
<re.Match object; span=(3, 4), match='c'>
<re.Match object; span=(4, 5), match='d'>
<re.Match object; span=(5, 6), match='e'>
<re.Match object; span=(6, 7), match='f'>
<re.Match object; span=(7, 8), match='g'>
<re.Match object; span=(8, 9), match='h'>
<re.Match object; span=(9, 10), match='i'>
<re.Match object; span=(10, 11), match='j'>
<re.Match object; span=(11, 12), match='k'>
<re.Match object; span=(12, 13), match='l'>
<re.Match object; span=(13, 14), match='m'>
<re.Match object; span=(14, 15), match='n'>
<re.Match object; span=(15, 16), match='o'>
<re.Match object; span=(16, 17), match='p'>
<re.Match object; span=(17, 18), match='q'>
<re.Match object; span=(18, 19), match='u'>
<re.Match object; span=(19, 20), match='r'>
<re.Match object; span=(20, 21), match='t'>
<re.Match object; span=(21, 22), match='u'>
<re.Match object; span=(22, 23), match='v'>
<re.Match object; span=(23, 24), match='w'>
...

In [221]:
text = 'vishal\nsingh\n   \t\tsangral'

In [222]:
pattern = re.compile(r'\s')
matches = pattern.finditer(text)
for match in matches:
    print(match)

<re.Match object; span=(6, 7), match='\n'>
<re.Match object; span=(12, 13), match='\n'>
<re.Match object; span=(13, 14), match=' '>
<re.Match object; span=(14, 15), match=' '>
<re.Match object; span=(15, 16), match=' '>
<re.Match object; span=(16, 17), match='\t'>
<re.Match object; span=(17, 18), match='\t'>


In [223]:
pattern = re.compile(r'\n')
matches = pattern.finditer(text)
for match in matches:
    print(match)

<re.Match object; span=(6, 7), match='\n'>
<re.Match object; span=(12, 13), match='\n'>


In [224]:
pattern = re.compile(r' ')
matches = pattern.finditer(text)
for match in matches:
    print(match)

<re.Match object; span=(13, 14), match=' '>
<re.Match object; span=(14, 15), match=' '>
<re.Match object; span=(15, 16), match=' '>


In [225]:
pattern = re.compile(r'\.')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(113, 114), match='.'>
<re.Match object; span=(150, 151), match='.'>
<re.Match object; span=(172, 173), match='.'>
<re.Match object; span=(176, 177), match='.'>
<re.Match object; span=(224, 225), match='.'>
<re.Match object; span=(254, 255), match='.'>
<re.Match object; span=(267, 268), match='.'>
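Escaping metacharacters by hand gets tedious when a literal contains several of them. The standard library's `re.escape()` escapes them for you; this sketch reuses the domain from `text_to_search` as a sample, and the string `"havinsohXcom"` is invented to show what an unescaped dot lets through.

```python
import re

literal = "havinsoh.com"
pattern = re.escape(literal)   # escapes the '.' so it matches only a literal dot
print(pattern)                 # havinsoh\.com

print(re.search(literal, "havinsohXcom"))  # matches: an unescaped '.' accepts any character
print(re.search(pattern, "havinsohXcom"))  # None: the escaped pattern needs a real '.'
```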


In [17]:
#CONCEPT OF WORD BOUNDARY 
pattern = re.compile(r'\bHa')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(67, 69), match='Ha'>
<re.Match object; span=(70, 72), match='Ha'>


In [234]:
#CONCEPT OF NO WORD BOUNDARIES
pattern = re.compile(r'\BHa')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(72, 74), match='Ha'>


In [235]:
##ONE MORE EXAMPLE
text = 'vishal singh Sa SaSa Sa'

In [236]:
#CONCEPT OF WORD BOUNDARY 
pattern = re.compile(r'\bSa')
matches = pattern.finditer(text)
for match in matches:
    print(match)

<re.Match object; span=(13, 15), match='Sa'>
<re.Match object; span=(16, 18), match='Sa'>
<re.Match object; span=(21, 23), match='Sa'>


In [237]:
#CONCEPT OF NO WORD BOUNDARIES
pattern = re.compile(r'\BSa')
matches = pattern.finditer(text)
for match in matches:
    print(match)

<re.Match object; span=(18, 20), match='Sa'>


In [3]:
sentence = 'Start a sentence and then bring it to an end'

In [5]:
# Concept of the caret (^) anchor
pattern = re.compile(r'^Start')
matches = pattern.finditer(sentence)
for match in matches:
    print(match)

<re.Match object; span=(0, 5), match='Start'>


In [6]:
# No output: the sentence starts with 'S', so '^a' finds no match
pattern = re.compile(r'^a')
matches = pattern.finditer(sentence)
for match in matches:
    print(match)

In [4]:
import re
pattern = re.compile(r'a')
matches = pattern.finditer(sentence)
for match in matches:
    print(match)

<re.Match object; span=(2, 3), match='a'>
<re.Match object; span=(6, 7), match='a'>
<re.Match object; span=(17, 18), match='a'>
<re.Match object; span=(38, 39), match='a'>


In [7]:
pattern = re.compile(r'end$')
matches = pattern.finditer(sentence)
for match in matches:
    print(match)

<re.Match object; span=(41, 44), match='end'>


In [11]:
pattern = re.compile(r'\d\d\d[*]\d\d\d[*]\d\d\d\d')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(182, 194), match='123*555*1234'>


In [12]:
pattern = re.compile(r'[89]00[-.]\d\d\d[-.]\d\d\d\d')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(195, 207), match='800-555-1234'>
<re.Match object; span=(208, 220), match='900-555-1234'>


In [15]:
pattern = re.compile(r'[4-7]')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(58, 59), match='4'>
<re.Match object; span=(59, 60), match='5'>
<re.Match object; span=(60, 61), match='6'>
<re.Match object; span=(61, 62), match='7'>
<re.Match object; span=(160, 161), match='5'>
<re.Match object; span=(161, 162), match='5'>
<re.Match object; span=(162, 163), match='5'>
<re.Match object; span=(164, 165), match='4'>
<re.Match object; span=(173, 174), match='5'>
<re.Match object; span=(174, 175), match='5'>
<re.Match object; span=(175, 176), match='5'>
<re.Match object; span=(180, 181), match='4'>
<re.Match object; span=(186, 187), match='5'>
<re.Match object; span=(187, 188), match='5'>
<re.Match object; span=(188, 189), match='5'>
<re.Match object; span=(193, 194), match='4'>
<re.Match object; span=(199, 200), match='5'>
<re.Match object; span=(200, 201), match='5'>
<re.Match object; span=(201, 202), match='5'>
<re.Match object; span=(206, 207), match='4'>
<re.Match object; span=(212, 213), match='5'>
...

In [17]:
pattern = re.compile(r'[k-zK-Z]')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(11, 12), match='k'>
<re.Match object; span=(12, 13), match='l'>
<re.Match object; span=(13, 14), match='m'>
<re.Match object; span=(14, 15), match='n'>
<re.Match object; span=(15, 16), match='o'>
<re.Match object; span=(16, 17), match='p'>
<re.Match object; span=(17, 18), match='q'>
<re.Match object; span=(18, 19), match='u'>
<re.Match object; span=(19, 20), match='r'>
<re.Match object; span=(20, 21), match='t'>
<re.Match object; span=(21, 22), match='u'>
<re.Match object; span=(22, 23), match='v'>
<re.Match object; span=(23, 24), match='w'>
<re.Match object; span=(24, 25), match='x'>
<re.Match object; span=(25, 26), match='y'>
<re.Match object; span=(26, 27), match='z'>
<re.Match object; span=(38, 39), match='K'>
<re.Match object; span=(39, 40), match='L'>
<re.Match object; span=(40, 41), match='M'>
<re.Match object; span=(41, 42), match='N'>
<re.Match object; span=(42, 43), match='O'>
<re.Match object; span=(43, 44), match='P'>
...

In [42]:
pattern = re.compile(r'[^a-zA-Z]')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(0, 1), match='\n'>
<re.Match object; span=(27, 28), match='\n'>
<re.Match object; span=(54, 55), match='\n'>
<re.Match object; span=(55, 56), match='1'>
<re.Match object; span=(56, 57), match='2'>
<re.Match object; span=(57, 58), match='3'>
<re.Match object; span=(58, 59), match='4'>
<re.Match object; span=(59, 60), match='5'>
<re.Match object; span=(60, 61), match='6'>
<re.Match object; span=(61, 62), match='7'>
<re.Match object; span=(62, 63), match='8'>
<re.Match object; span=(63, 64), match='9'>
<re.Match object; span=(64, 65), match='0'>
<re.Match object; span=(65, 66), match='\n'>
<re.Match object; span=(66, 67), match='\n'>
<re.Match object; span=(69, 70), match=' '>
<re.Match object; span=(74, 75), match='\n'>
<re.Match object; span=(75, 76), match='\n'>
<re.Match object; span=(90, 91), match=' '>
<re.Match object; span=(91, 92), match='('>
<re.Match object; span=(96, 97), match=' '>
<re.Match object; span=(99, 100), match=' '>
...

In [18]:
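# [^b]at matches 'at' preceded by any single character except 'b'
# (it also matches 'lat' inside 'flat': it works on characters, not whole words)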
pattern = re.compile(r'[^b]at')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(272, 275), match='cat'>
<re.Match object; span=(276, 279), match='mat'>
<re.Match object; span=(284, 287), match='pat'>
<re.Match object; span=(289, 292), match='lat'>
<re.Match object; span=(293, 296), match='fat'>


In [19]:
pattern = re.compile(r'\d{3}.\d{3}.\d{4}')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(156, 168), match='321-555-4321'>
<re.Match object; span=(169, 181), match='123.555.1234'>
<re.Match object; span=(182, 194), match='123*555*1234'>
<re.Match object; span=(195, 207), match='800-555-1234'>
<re.Match object; span=(208, 220), match='900-555-1234'>


In [23]:
pattern = re.compile(r'Mr\.')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(222, 225), match='Mr.'>
<re.Match object; span=(265, 268), match='Mr.'>


In [24]:
pattern = re.compile(r'Mr\.?')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(222, 225), match='Mr.'>
<re.Match object; span=(233, 235), match='Mr'>
<re.Match object; span=(251, 253), match='Mr'>
<re.Match object; span=(265, 268), match='Mr.'>


In [28]:
pattern = re.compile(r'Mr\.?\s[A-Z]\w*')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(222, 232), match='Mr. Vishal'>
<re.Match object; span=(233, 241), match='Mr Smith'>
<re.Match object; span=(265, 270), match='Mr. T'>


In [29]:
pattern = re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')
matches = pattern.finditer(text_to_search)
for match in matches:
    print(match)

<re.Match object; span=(222, 232), match='Mr. Vishal'>
<re.Match object; span=(233, 241), match='Mr Smith'>
<re.Match object; span=(242, 250), match='Ms Davis'>
<re.Match object; span=(251, 264), match='Mrs. Robinson'>
<re.Match object; span=(265, 270), match='Mr. T'>


In [42]:
import re

# Pattern that uses a greedy quantifier '*'
greedy_pattern = re.compile(r'<.*>')
# Pattern that uses a lazy quantifier '*?'
lazy_pattern = re.compile(r'<.*?>')

# Test string
text = "<div>Hello</div><span>World</span>"

# Find all matches using greedy and lazy quantifiers
greedy_matches = greedy_pattern.findall(text)
lazy_matches = lazy_pattern.findall(text)

print(f"Greedy matches: {greedy_matches}")
print(f"Lazy matches: {lazy_matches}")

Greedy matches: ['<div>Hello</div><span>World</span>']
Lazy matches: ['<div>', '</div>', '<span>', '</span>']


In [30]:
emails = '''
VishalSsangral@gmail.com
sangral.singh@university.edu
vishal-321-sangral@my-work.net
'''

In [31]:
pattern = re.compile(r'[a-zA-Z0-9.-]+@[a-zA-Z-]+\.(com|edu|net)')
matches = pattern.finditer(emails)
for match in matches:
    print(match)

<re.Match object; span=(1, 25), match='VishalSsangral@gmail.com'>
<re.Match object; span=(26, 54), match='sangral.singh@university.edu'>
<re.Match object; span=(55, 85), match='vishal-321-sangral@my-work.net'>


In [32]:
pattern = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')
matches = pattern.finditer(emails)
for match in matches:
    print(match)

<re.Match object; span=(1, 25), match='VishalSsangral@gmail.com'>
<re.Match object; span=(26, 54), match='sangral.singh@university.edu'>
<re.Match object; span=(55, 85), match='vishal-321-sangral@my-work.net'>


In [33]:
urls = '''
https://www.google.com
http://havinosh.com
https://youtube.com
https://www.isro.gov
'''

In [34]:
print(urls)


https://www.google.com
http://havinosh.com
https://youtube.com
https://www.isro.gov



In [35]:
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
matches = pattern.finditer(urls)
for match in matches:
    print(match)

<re.Match object; span=(1, 23), match='https://www.google.com'>
<re.Match object; span=(24, 43), match='http://havinosh.com'>
<re.Match object; span=(44, 63), match='https://youtube.com'>
<re.Match object; span=(64, 84), match='https://www.isro.gov'>


In [36]:
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
matches = pattern.finditer(urls)
for match in matches:
    print(match.group(0))

https://www.google.com
http://havinosh.com
https://youtube.com
https://www.isro.gov


In [37]:
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
matches = pattern.finditer(urls)
for match in matches:
    print(match.group(1))

www.
None
None
www.


In [38]:
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
matches = pattern.finditer(urls)
for match in matches:
    print(match.group(2))

google
havinosh
youtube
isro


In [39]:
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
matches = pattern.finditer(urls)
for match in matches:
    print(match.group(3))

.com
.com
.com
.gov
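To see every capture group at once, `match.groups()` returns them as a tuple. This also shows why `group(1)` printed `None` for URLs without `www.`: the optional group simply did not take part in the match. A short sketch reusing the URL pattern above on two of the sample URLs:

```python
import re

pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

for url in ["https://www.google.com", "http://havinosh.com"]:
    match = pattern.search(url)
    # groups() returns all capture groups; an optional group that did not match is None
    print(match.groups())

# ('www.', 'google', '.com')
# (None, 'havinosh', '.com')
```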


In [40]:
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
subbed_urls = pattern.sub(r'\2\3',urls)
print(subbed_urls)


google.com
havinosh.com
youtube.com
isro.gov



In [41]:
pattern = re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')
matches = pattern.findall(text_to_search)
for match in matches:
    print(match)

Mr
Mr
Ms
Mrs
Mr
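The output above contains only the titles because `findall()` returns the text of the capturing group when the pattern has exactly one. A non-capturing group `(?:...)` keeps the alternation but lets `findall()` return whole matches. This sketch copies the name lines from `text_to_search` into a standalone string so it runs on its own.

```python
import re

titles = "Mr. Vishal\nMr Smith\nMs Davis\nMrs. Robinson\nMr. T"

capturing = re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')
non_capturing = re.compile(r'(?:Mr|Ms|Mrs)\.?\s[A-Z]\w*')

# With one capturing group, findall() returns only that group's text
print(capturing.findall(titles))
# ['Mr', 'Mr', 'Ms', 'Mrs', 'Mr']

# (?:...) groups the alternatives without capturing, so findall() returns whole matches
print(non_capturing.findall(titles))
# ['Mr. Vishal', 'Mr Smith', 'Ms Davis', 'Mrs. Robinson', 'Mr. T']
```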


### Match Method
The match method checks for a match only at the beginning of the string. If the pattern matches at the start, it returns a match object; otherwise, it returns None.

In [43]:
import re

pattern = re.compile(r'\d+')

text1 = "123abc"
text2 = "abc123"

# Using match method
match1 = pattern.match(text1)
match2 = pattern.match(text2)

print(f"Match result for text1: {match1.group() if match1 else 'No match'}")  # Output: 123
print(f"Match result for text2: {match2.group() if match2 else 'No match'}")  # Output: No match

Match result for text1: 123
Match result for text2: No match


In [98]:
# Flags: re.IGNORECASE can also be written as re.I

In [46]:
import re

pattern = re.compile(r'\d+')

text1 = "123abc"
text2 = "abc123"

# Using finditer method (unlike match, it finds digits anywhere in the string)
match1 = pattern.finditer(text1)
match2 = pattern.finditer(text2)

for i in match1:
    print(i)
    
for i in match2:
    print(i)

<re.Match object; span=(0, 3), match='123'>
<re.Match object; span=(3, 6), match='123'>


### search Method
The search method scans through the entire string and returns the first match it finds. It returns a match object if there is a match anywhere in the string, otherwise None.

In [47]:
import re

pattern = re.compile(r'\d+')

text1 = "123abc"
text2 = "abc123"

# Using search method
search1 = pattern.search(text1)
search2 = pattern.search(text2)

print(f"Search result for text1: {search1.group() if search1 else 'No match'}")  # Output: 123
print(f"Search result for text2: {search2.group() if search2 else 'No match'}")  # Output: 123

Search result for text1: 123
Search result for text2: 123


### Finditer Method
The finditer method returns an iterator yielding match objects for all non-overlapping matches of the pattern in the string.



In [48]:
import re

pattern = re.compile(r'\d+')

text = "123abc456def789"

# Using finditer method
matches = pattern.finditer(text)

for match in matches:
    print(f"Finditer match: {match.group()} at position {match.start()}-{match.end()}")

Finditer match: 123 at position 0-3
Finditer match: 456 at position 6-9
Finditer match: 789 at position 12-15


### Explanation of the Examples:
match method:

- Behavior: Checks whether the pattern matches at the beginning of the string.
- Example:
  - text1: Matches because the string starts with digits (123).
  - text2: Does not match because the string starts with letters (abc).

search method:

- Behavior: Scans the entire string and returns the first occurrence of the pattern.
- Example:
  - text1: Finds 123 at the beginning.
  - text2: Finds 123 after the letters (abc).

finditer method:

- Behavior: Finds all non-overlapping matches and returns them as an iterator of match objects.
- Example:
  - text: Finds three matches: 123, 456, and 789.


## Flags

Flags are special parameters that modify the behavior of the regex engine and control how pattern matching is carried out. Python's re module accepts them as an optional argument to re.compile() and to module-level functions such as re.search() and re.findall(); several flags can be combined with the | operator. Commonly used flags:

re.IGNORECASE (re.I): Makes the pattern case-insensitive, so it matches both uppercase and lowercase letters.

re.MULTILINE (re.M): Allows the ^ and $ anchors to match at the beginning and end of each line within a multiline string, rather than just at the beginning and end of the entire string.

re.DOTALL (re.S): Makes the dot (.) metacharacter match all characters, including newline characters (\n).

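re.ASCII (re.A): Makes \w, \W, \b, \B, \d, \D, \s, and \S perform ASCII-only matching, so non-ASCII characters such as é are not treated as word characters or digits.

re.UNICODE (re.U): Makes \w, \W, \b, \B, \d, \D, \s, and \S perform Unicode matching. In Python 3 this is already the default for str patterns, so the flag is redundant there.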

re.VERBOSE (re.X): Allows you to write regex patterns more legibly by ignoring whitespace and comments within the pattern. Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash. Additionally, # marks the start of a comment, which lasts until the end of the line.
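The effect of re.MULTILINE and re.DOTALL is easiest to see by running the same pattern with and without the flag. The sample text is invented for this sketch; re.M and re.S are the short aliases listed above.

```python
import re

text = "apple pie\nbanana split\napple tart"

# Without re.MULTILINE, ^ and $ only match at the start/end of the whole string
print(re.findall(r"^apple", text))        # ['apple']
print(re.findall(r"^apple", text, re.M))  # ['apple', 'apple']
print(re.findall(r"\w+$", text))          # ['tart']
print(re.findall(r"\w+$", text, re.M))    # ['pie', 'split', 'tart']

# Without re.DOTALL, '.' does not match a newline
print(re.search(r"pie.banana", text))                   # None
print(re.search(r"pie.banana", text, re.S) is not None)  # True
```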

In [50]:
import re

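# Combine flags with the | operator; re.MULTILINE has no visible effect here
# because the pattern contains no ^ or $ anchors
pattern = re.compile(r'hello', re.IGNORECASE | re.MULTILINE)

# findall() returns every case-insensitive match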
text = "Hello, World!\nHello, Universe!"
matches = pattern.findall(text)

print(matches)  # Output: ['Hello', 'Hello']


['Hello', 'Hello']


In [51]:
import re

pattern = re.compile(r'hello.*world', re.DOTALL)

text = "hello\nworld"

match = pattern.search(text)
print(f"DOTALL match: {match.group() if match else 'No match'}")


DOTALL match: hello
world


In [52]:
import re

pattern = re.compile(r'\w+', re.ASCII)

text = "café"

match = pattern.findall(text)
print(f"ASCII match: {match}")


ASCII match: ['caf']


In [53]:
import re

pattern = re.compile(r'\w+', re.UNICODE)

text = "café"

match = pattern.findall(text)
print(f"UNICODE match: {match}")


UNICODE match: ['café']


In [54]:
import re

pattern = re.compile(r'''
    \b      # Word boundary
    \d{3}   # Three digits
    -       # Hyphen
    \d{2}   # Two digits
    \b      # Word boundary
    ''', re.VERBOSE)

text = "Phone number: 123-45"

match = pattern.search(text)
print(f"VERBOSE match: {match.group() if match else 'No match'}")


VERBOSE match: 123-45


In [55]:
import re

pattern = re.compile(r'''
    ^hello      # Line starts with 'hello'
    .*          # Any character (dot) zero or more times (greedy)
    world$      # Line ends with 'world'
    ''', re.IGNORECASE | re.MULTILINE | re.DOTALL | re.VERBOSE)

text = "Hello\nworld"

match = pattern.search(text)
print(f"Combined flags match: {match.group() if match else 'No match'}")


Combined flags match: Hello
world
