**1) Regex Syntax**

A regular expression (RegEx) is a sequence of characters that defines a search pattern. The pattern is used to match, find, or replace text in a string.

**2) Quantifiers**

In RegEx, a quantifier specifies how many times the element immediately before it may repeat. Here are the main ones:

- `*`: 0 or more occurrences
- `+`: 1 or more occurrences
- `?`: 0 or 1 occurrence
- `{n}`: exactly n occurrences
- `{n,}`: n or more occurrences
- `{,n}`: from 0 up to n occurrences
- `{n,m}`: at least n, but not more than m occurrences

In [4]:
import re

txt = "I have 123 cats, 4444 dogs, and 1 bird."
print(re.findall(r"\d+", txt)) # raw string keeps \d intact; matches one or more digits

['123', '4444', '1']
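
The counted quantifiers from the list above can be tried on the same sentence. This is a minimal sketch; the variable names are illustrative only. Note that a quantifier does not anchor the match, so `\d{3}` also finds three digits inside a longer run.

```python
import re

txt = "I have 123 cats, 4444 dogs, and 1 bird."

# {3}: exactly three digits (also found inside the longer run 4444)
exactly_three = re.findall(r"\d{3}", txt)

# {2,}: two or more digits
two_or_more = re.findall(r"\d{2,}", txt)

# {1,3}: between one and three digits; greedy, so 4444 is split into 444 and 4
one_to_three = re.findall(r"\d{1,3}", txt)

# ?: the trailing "s" is optional, so the singular "bird" still matches
optional_s = re.findall(r"birds?", txt)

print(exactly_three)  # ['123', '444']
print(two_or_more)    # ['123', '4444']
print(one_to_three)   # ['123', '444', '4', '1']
print(optional_s)     # ['bird']
```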





**3) Metacharacters**

Metacharacters are characters with a special meaning:

- `.`: Any single character (except a newline)
- `^`: Start of the string
- `$`: End of the string
- `*`: Zero or more occurrences
- `+`: One or more occurrences
- `?`: Zero or one occurrence
- `{}`: A specified number of occurrences, e.g. `{3}` or `{2,5}`
- `|`: Either the pattern on the left or the one on the right
- `()`: Capture and group

In [5]:
import re

txt = "I love cats and dogs."
print(re.findall("^I love", txt)) # matches string starting with 'I love'

['I love']
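
The other metacharacters in the list can be sketched on the same sentence. The variable names below are illustrative, not part of any API.

```python
import re

txt = "I love cats and dogs."

# $ anchors the pattern at the end of the string; \. is a literal dot
ends_with = re.findall(r"dogs\.$", txt)

# | matches either alternative
either = re.findall(r"cats|dogs", txt)

# . matches any single character except a newline
any_char = re.findall(r"d.g", txt)

# () captures a group; findall then returns only the captured part
captured = re.findall(r"(cat|dog)s", txt)

print(ends_with)  # ['dogs.']
print(either)     # ['cats', 'dogs']
print(any_char)   # ['dog']
print(captured)   # ['cat', 'dog']
```

When a pattern contains exactly one group, `re.findall()` returns the group contents rather than the whole match, which is why the trailing "s" is missing from the last result.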


**4) Special Sequences**

- `\A`: Matches only at the beginning of the string
- `\b`: Matches the empty position at the beginning or end of a word
- `\B`: Matches a position that is NOT at the beginning or end of a word
- `\d`: Matches any digit (0-9)
- `\D`: Matches any character that is NOT a digit
- `\s`: Matches any whitespace character (space, tab, newline, and so on)
- `\S`: Matches any character that is NOT whitespace
- `\w`: Matches any word character (letters a-z and A-Z, digits 0-9, and the underscore `_`)
- `\W`: Matches any character that is NOT a word character
- `\Z`: Matches only at the end of the string

In [6]:
import re

txt = "3 cats, 2 dogs."
print(re.findall(r"\d", txt)) # matches each digit in the string

['3', '2']
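
A few of the other special sequences can be sketched the same way. The second sample string and all variable names are made up for illustration.

```python
import re

txt = "3 cats, 2 dogs."

words = re.findall(r"\w+", txt)       # runs of word characters
non_digits = re.findall(r"\D+", txt)  # runs of non-digit characters
spaces = re.findall(r"\s", txt)       # each whitespace character
at_start = re.findall(r"\A\d", txt)   # a digit, only at the very start

# \b requires a word boundary, \B requires that there is none
sample = "cats concat catalog"
word_start = re.findall(r"\bcat\w*", sample)
inside_word = re.findall(r"\Bcat", sample)

print(words)        # ['3', 'cats', '2', 'dogs']
print(non_digits)   # [' cats, ', ' dogs.']
print(spaces)       # [' ', ' ', ' ']
print(at_start)     # ['3']
print(word_start)   # ['cats', 'catalog']
print(inside_word)  # ['cat']
```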



**5) Sets**

- `[]`: A set of characters; matches any single character in the set
- `\`: Signals a special sequence (can also be used to escape special characters)
- `. ^ $ * + ? {} [] \ | ()`: Metacharacters; to match one literally, escape it with `\`
- `[^...]`: Matches any character except the ones in the brackets
- `[abc]`: Matches 'a', 'b', or 'c'
- `[a-z]`: Any lowercase letter
- `[A-Z]`: Any uppercase letter
- `[0-9]`: Any digit
- `[0123]`: Matches '0', '1', '2', or '3'

In [7]:
import re

txt = "I love cats and dogs."
print(re.findall("[acdg]", txt)) # matches every 'a', 'c', 'd', or 'g'

['c', 'a', 'a', 'd', 'd', 'g']
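
Ranges and negated sets work the same way. The sketch below uses a made-up sample string and illustrative variable names.

```python
import re

txt = "Room 42, Floor 3, Wing B"

# [A-Z]: any single uppercase letter
uppercase = re.findall(r"[A-Z]", txt)

# [0-9]+: a set combined with a quantifier gives runs of digits
digit_runs = re.findall(r"[0-9]+", txt)

# [^a-zA-Z ]: anything that is not a letter or a space
others = re.findall(r"[^a-zA-Z ]", txt)

print(uppercase)   # ['R', 'F', 'W', 'B']
print(digit_runs)  # ['42', '3']
print(others)      # ['4', '2', ',', '3', ',']
```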


**6) Python re module**

The `re` module in Python provides support for regular expressions. Here are some commonly used functions:

- `re.findall()`: Returns a list containing all matches
- `re.search()`: Returns a Match object for the first match anywhere in the string, or `None` if there is no match
- `re.split()`: Returns a list where the string has been split at each match
- `re.sub()`: Replaces one or many matches with a string

In [8]:
import re

txt = "I have 123 cats, 4444 dogs, and 1 bird."
x = re.findall(r"\d+", txt) # find all groups of digits
print(x)

['123', '4444', '1']
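
Unlike `re.findall()`, `re.search()` stops at the first match and returns a Match object, or `None` when nothing matches. A minimal sketch of handling both cases (the variable names and the second input string are illustrative):

```python
import re

txt = "I have 123 cats, 4444 dogs, and 1 bird."

match = re.search(r"\d+", txt)
if match:
    first_number = match.group()  # the matched text
    position = match.span()       # (start, end) indexes of the match
    print(first_number)  # 123
    print(position)      # (7, 10)

# When there is no match, re.search() returns None, so check before using it
no_match = re.search(r"\d+", "no digits here")
print(no_match)  # None
```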


**7) Methods with regex usage**

In [9]:
import re

txt = "Hello, my name is John Doe and my phone number is 123-456-7890."

# Using re.search(): Find the first phone number in the string
x = re.search(r"\d{3}-\d{3}-\d{4}", txt)
print(x.group())

# Using re.findall(): Find all phone numbers in the string
x = re.findall(r"\d{3}-\d{3}-\d{4}", txt)
print(x)

# Using re.split(): Split the string at every white-space character
x = re.split(r"\s", txt)
print(x)

# Using re.sub(): Replace all digits in the string with "X"
x = re.sub(r"\d", "X", txt)
print(x)

123-456-7890
['123-456-7890']
['Hello,', 'my', 'name', 'is', 'John', 'Doe', 'and', 'my', 'phone', 'number', 'is', '123-456-7890.']
Hello, my name is John Doe and my phone number is XXX-XXX-XXXX.
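
`re.split()` and `re.sub()` also accept optional limits, and the replacement string passed to `re.sub()` can refer back to captured groups. The following is a sketch using the standard `maxsplit` and `count` keyword arguments; the variable names and the reformatted phone layout are illustrative choices.

```python
import re

txt = "Hello, my name is John Doe and my phone number is 123-456-7890."

# maxsplit=2: split only at the first two whitespace characters
parts = re.split(r"\s", txt, maxsplit=2)

# count=3: replace only the first three digits
masked = re.sub(r"\d", "X", txt, count=3)

# \1, \2, \3 in the replacement refer to the three captured groups
reformatted = re.sub(r"(\d{3})-(\d{3})-(\d{4})", r"(\1) \2-\3", txt)

print(parts)
# ['Hello,', 'my', 'name is John Doe and my phone number is 123-456-7890.']
print(masked)
# Hello, my name is John Doe and my phone number is XXX-456-7890.
print(reformatted)
# Hello, my name is John Doe and my phone number is (123) 456-7890.
```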
