In [1]:
import re

https://docs.python.org/3/library/re.html

Basic Matching

In [5]:
pattern=r'xyz'
text='abcdxyz'
match=re.search(pattern,text)
if match:
  print('Match Found,',match.group())
else:
  print('No Match')

Match Found, xyz


Using Raw Strings: prefix a pattern with r so that Python does not interpret its backslashes as escape characters

In [9]:
pattern=r'\d+'
number='12234abc'
match=re.search(pattern,number)
if match:
  print(match.group())
else:
  print('no match')

12234
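To see why the r prefix matters, here is a small sketch comparing the same pattern written with and without it. The variable names and the sample text are made up for illustration.

```python
import re

# In a normal string, '\b' is a single backspace character (ASCII 8).
plain = '\bcat\b'
# In a raw string, r'\b' stays as backslash + 'b', which re reads as a word boundary.
raw = r'\bcat\b'

text = 'a cat sat'
print(len('\b'), len(r'\b'))          # 1 2
print(re.search(plain, text))         # None
print(re.search(raw, text).group())   # cat
```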


Matching Special Characters and Metacharacters:
1. '.' (dot): Matches any single character except a newline.
2. '*' (asterisk): Matches zero or more occurrences of the previous character or group.
3. '+' (plus): Matches one or more occurrences of the previous character or group.
4. '?' (question mark): Matches zero or one occurrence of the previous character or group.
5. '[]' (character class): Matches any single character within the brackets.
6. '|' (pipe): Matches either the pattern before or after the pipe.


In [12]:
pattern = r"gr.y"
text='acbgray'
match=re.search(pattern,text)
if match:
  print(match.group())
else:
  print('No match found')

gray


In [21]:
pattern = r"gr*y"
text='acbgry'
match=re.search(pattern,text)
if match:
  print(match.group())
else:
  print('No match found')

gry


In [25]:
pattern = r"badgr+y"
text='imbadgry'
match=re.search(pattern,text)
if match:
  print(match.group())
else:
  print('No match found')

badgry


In [35]:
pattern = r"[g]"
text='[acg]'
match=re.search(pattern,text)
if match:
  print(match.group())
else:
  print('No match found')

g


In [36]:
pattern=r'cold|hot'
text='cold as ice'
match=re.search(pattern,text)
if match:
  print(match.group())
else:
  print('match not found')

cold
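The '?' metacharacter from the list above has no cell of its own, so here is a minimal sketch. The word list is an assumption chosen for illustration.

```python
import re

pattern = r'colou?r'   # the 'u' may appear zero or one time
words = ['color', 'colour', 'colouur']
results = []
for word in words:
    match = re.search(pattern, word)
    results.append(bool(match))
    print(word, '->', match.group() if match else 'No match')
```

The first two words match; the third does not, because '?' allows at most one 'u'.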


Character classes allow you to define a set of characters that can match at a particular position. 
1. [abc] matches either 'a', 'b', or 'c'.
2. [0-9] matches any digit from 0 to 9.

In [40]:
pattern=r'[a-z]'
text='b'
match=re.search(pattern,text)
if match:
  print(match.group())
else:
  print('match not found')

b


In [43]:
pattern = r"[aeiou]"
text='ant'
match=re.search(pattern,text)
if match:
  print('vowels',match.group())
else:
  print('match not found')

vowels a
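Character classes can also combine several ranges or be negated with a leading ^. A short sketch, using a made-up sample string:

```python
import re

text = 'Room 42B'

# Several ranges in one class: letters and digits, one or more in a row.
tokens = re.findall(r'[A-Za-z0-9]+', text)
print(tokens)        # ['Room', '42B']

# A leading ^ inside the brackets negates the class:
# match any character that is NOT a digit and NOT a space.
others = re.findall(r'[^0-9 ]', text)
print(others)        # ['R', 'o', 'o', 'm', 'B']
```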


Quantifiers

In [49]:
pattern = r"a{3}"
text='scream aaa'
match=re.search(pattern,text)
if match:
  print('True',match.group())
else:
  print('match not found')

True aaa
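Besides an exact count like {3}, braces accept a range {m,n} or an open-ended minimum {m,}. The sketch below uses \b (word boundary) so that each run of 'a' is tested as a whole word; the sample text is an assumption.

```python
import re

text = 'a aa aaa aaaa'

# Between 2 and 3 repetitions, as a whole word.
two_to_three = re.findall(r'\ba{2,3}\b', text)
print(two_to_three)   # ['aa', 'aaa']

# At least 3 repetitions, as a whole word.
three_or_more = re.findall(r'\ba{3,}\b', text)
print(three_or_more)  # ['aaa', 'aaaa']
```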


Anchors:
The caret (^) and dollar sign ($) are anchors: ^ matches at the start of the string and $ at the end. With the re.MULTILINE flag they also match at the start and end of each line.

In [56]:
pattern = r"^love"
text='love is blind'
match=re.search(pattern,text)
if match:
  print('found',match.group())
else:
  print('match not found')

found love
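A minimal sketch of $ and of the re.MULTILINE flag, using a two-line sample string invented for illustration:

```python
import re

text = 'love is blind\nblind is love'

# Without flags, ^ and $ anchor to the whole string.
end_default = re.findall(r'love$', text)
start_default = re.findall(r'^blind', text)
print(end_default)    # ['love']
print(start_default)  # []

# With re.MULTILINE, ^ and $ also anchor to each line.
start_multi = re.findall(r'^blind', text, flags=re.MULTILINE)
end_multi = re.findall(r'blind$', text, flags=re.MULTILINE)
print(start_multi)    # ['blind']
print(end_multi)      # ['blind']
```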


Finding All Matches with re.findall()

In [70]:
pattern=r'\d+'
text='i have 23 apples and 4 bananas'
matches=re.findall(pattern,text)
print(matches)

['23', '4']
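re.findall() returns only the matched strings. When the position of each match is needed too, re.finditer() yields match objects instead. A sketch on the same sentence:

```python
import re

pattern = r'\d+'
text = 'i have 23 apples and 4 bananas'

found = []
for m in re.finditer(pattern, text):
    # m.group() is the matched text, m.span() is its (start, end) index pair.
    found.append((m.group(), m.span()))
    print(m.group(), m.span())
```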


Groups and Capturing

In [71]:
pattern = r"(\d+)"
text = "First number: 123, Second number: 456"
matches = re.findall(pattern, text)

if matches:
    print("Matches found!")
    print("Matched numbers:", matches)
else:
    print("No match.")


Matches found!
Matched numbers: ['123', '456']
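With more than one group, each group can be read separately from the match object. The sketch below reuses the same text; the two-group pattern is an assumption made for illustration.

```python
import re

pattern = r"(\w+) number: (\d+)"
text = "First number: 123, Second number: 456"

match = re.search(pattern, text)
print(match.group(0))   # whole match: First number: 123
print(match.group(1))   # first group: First
print(match.group(2))   # second group: 123

# With several groups, findall returns one tuple per match.
pairs = re.findall(pattern, text)
print(pairs)            # [('First', '123'), ('Second', '456')]
```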


Backreferences: allow you to refer to a previously captured group within the same pattern (\1 is the first group)

In [72]:
import re
pattern = r"(\d+)-\1"  
text = "123-123, 456-456, 789-123"

# With one capturing group, findall returns the group's text, not the whole match.
matches = re.findall(pattern, text)
if matches:
    print("Matches found!")
    for match in matches:
        print("Matched pattern:", match)
else:
    print("No match.")

Matches found!
Matched pattern: 123
Matched pattern: 456
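The output above shows only the captured group, because that is what re.findall() returns when the pattern has one group. To get the full repeated text, one option is re.finditer() with group(0), sketched here on the same input:

```python
import re

pattern = r"(\d+)-\1"
text = "123-123, 456-456, 789-123"

# group(0) is the entire match, including the part matched by \1.
full_matches = [m.group(0) for m in re.finditer(pattern, text)]
print(full_matches)   # ['123-123', '456-456']
```

"789-123" is skipped because the text after the hyphen does not repeat the captured group.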


Splitting with re.split()

In [73]:
pattern=r'\s+'
text='Hello Sakthi'
split=re.split(pattern,text)
print(split)

['Hello', 'Sakthi']
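re.split() becomes more useful when several delimiters are possible. A sketch with a character class as the separator and the optional maxsplit argument; the sample text is made up.

```python
import re

text = 'apples, bananas;cherries   dates'

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r'[,;\s]+', text)
print(parts)        # ['apples', 'bananas', 'cherries', 'dates']

# maxsplit limits the number of splits; the rest stays in the last item.
first_only = re.split(r'[,;\s]+', text, maxsplit=1)
print(first_only)   # ['apples', 'bananas;cherries   dates']
```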


Substituting with re.sub()

In [74]:
pattern = r"apple"
text = "I have an apple."
replacement = "banana"
new_text = re.sub(pattern, replacement, text)
print(new_text) 

I have an banana.
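re.sub() replaces every occurrence by default. The sketch below shows the count argument and a group reference (\1) in the replacement string; the sample sentences are assumptions.

```python
import re

text = "apple pie, apple juice, apple tart"

# count=2 replaces only the first two occurrences.
limited = re.sub(r"apple", "banana", text, count=2)
print(limited)    # banana pie, banana juice, apple tart

# \1 in the replacement inserts whatever the first group captured.
kept_number = re.sub(r"(\d+) apples", r"\1 bananas", "I have 23 apples")
print(kept_number)  # I have 23 bananas
```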


Compiling Regular Expressions: patterns can be compiled once into pattern objects with re.compile() and then reused

In [75]:
pattern = re.compile(r"apple")
text = "I have an apple."
match = pattern.search(text)
print(match)

<re.Match object; span=(10, 15), match='apple'>
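A compiled pattern pays off when the same expression is applied to many strings, and flags such as re.IGNORECASE can be attached at compile time. A sketch with a made-up list of texts:

```python
import re

# Compile once, with case-insensitive matching.
fruit = re.compile(r"apple", re.IGNORECASE)

texts = ["I have an apple.", "APPLE pie", "no fruit here"]
hits = [bool(fruit.search(t)) for t in texts]
print(hits)       # [True, True, False]

# The compiled object offers the same methods as the re module.
variants = fruit.findall("Apple, apple, APPLE")
print(variants)   # ['Apple', 'apple', 'APPLE']
```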


Greedy Matching

In [76]:
text = "Hello, world!"
pattern = r"H.*o"
match = re.search(pattern, text)
if match:
    print("Greedy match:", match.group())
else:
    print("No match.")


Greedy match: Hello, wo


Non-Greedy (Lazy) Matching:

In [77]:
text = "Hello, world!"
pattern = r"H.*?o"
match = re.search(pattern, text)
if match:
    print("Non-greedy match:", match.group())
else:
    print("No match.")


Non-greedy match: Hello
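The difference between greedy and lazy matching is easier to see with several matches in one string. The tag-like sample text below is an assumption used only to illustrate the quantifiers.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the last '>' in the string, giving one long match.
greedy = re.findall(r'<.*>', text)
print(greedy)   # ['<b>bold</b> and <i>italic</i>']

# Lazy: .*? stops at the first '>' it can, giving one match per tag.
lazy = re.findall(r'<.*?>', text)
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
```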


Escaping Special Characters

In [78]:
pattern = r"\."
text = "Hello. World."
match = re.search(pattern, text)
print(match.group())

.
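When the text to search for comes from a variable and may contain metacharacters, re.escape() adds the backslashes for you. A sketch, with a sample string chosen for illustration:

```python
import re

literal = '1+1=2?'          # contains the metacharacters + and ?
text = 'Is 1+1=2? Yes.'

# Used directly, + and ? act as quantifiers, so the literal text is not found.
unescaped = re.search(literal, text)
print(unescaped)            # None

# re.escape() backslash-escapes the special characters first.
escaped = re.search(re.escape(literal), text)
print(escaped.group())      # 1+1=2?
```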
