### Introduction to Regular Expressions:
Regular expressions are sequences of characters used to define search patterns. They are widely used in text processing to find, match, and manipulate strings based on specific patterns.

### Basic Syntax:
In Python, regular expressions are supported through the re module. To use it, you need to import the module first: import re. The re module provides functions and methods for working with regular expressions.

### Matching Patterns:
The most basic operation in regex is matching patterns. The re.match() function is used to check if a pattern matches at the beginning of a string. For example:

In [19]:
import re

pattern = r"hello"
string = "hello world"

if re.match(pattern, string):
    print("Match found!")
else:
    print("Match not found.")

Match found!


### Searching Patterns:
The re.search() function scans the whole string and looks for the pattern at any position. It returns a match object for the first occurrence, or None if the pattern does not occur. For example:

In [31]:
import re

pattern = r"world"
string = "hello world"

if re.search(pattern, string):
    print("Match found!")
else:
    print("Match not found.")

Match found!
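
The difference between the two functions is easiest to see when the same pattern is tried with both. The following is a minimal sketch; the sample string and variable names are chosen only for illustration.

```python
import re

text = "hello world"

# re.match only tries the pattern at position 0 of the string.
at_start = re.match(r"world", text)
# re.search scans forward until it finds the first match.
anywhere = re.search(r"world", text)

print(at_start)          # None
print(anywhere.group())  # world
```

Because re.match() returns None here, calling .group() on its result without a check would raise an AttributeError.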


### Pattern Modifiers:
Regex supports various modifiers to refine pattern matching. Some commonly used modifiers are:

* re.I or re.IGNORECASE: Ignore case when matching.
* re.M or re.MULTILINE: Make ^ and $ match at the start and end of each line, not only at the start and end of the whole string.
* re.S or re.DOTALL: Make the dot character (.) match any character, including newlines.

These modifiers can be passed as flags to the regex functions. For example:

In [30]:
import re

pattern = r"hello"
string = "HELLO world"

if re.search(pattern, string, re.IGNORECASE):
    print("Match found!")
else:
    print("Match not found.")

Match found!
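
The other two modifiers can be illustrated the same way. This is a small sketch on a made-up two-line string, comparing each call with and without the flag.

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ only matches at the very start of the string.
default_starts = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches right after each newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(default_starts)    # ['first']
print(multiline_starts)  # ['first', 'second']

# By default the dot does not match a newline, so this search fails.
default_dot = re.search(r"first.*second", text)
# With re.DOTALL the dot can cross the newline.
dotall_dot = re.search(r"first.*second", text, re.DOTALL)
print(default_dot)                # None
print(dotall_dot is not None)     # True
```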


### Character Classes:
Character classes allow you to specify a set of characters to match. Some commonly used character classes are:

* [abc]: Matches either 'a', 'b', or 'c'.
* [0-9]: Matches any digit.
* [a-zA-Z]: Matches any lowercase or uppercase letter.

For example:

In [29]:
import re

pattern = r"[aeiou]"
string = "hello world"

matches = re.findall(pattern, string)
print(matches)  # Output: ['e', 'o', 'o']

['e', 'o', 'o']
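
The range forms listed above work the same way as the vowel set. A short sketch, using an invented sample string:

```python
import re

text = "Room 42B, floor 7"

digits = re.findall(r"[0-9]", text)        # every single digit
letters = re.findall(r"[a-zA-Z]", text)    # every single ASCII letter
r_or_f = re.findall(r"[Rf]", text)         # only 'R' or 'f'

print(digits)            # ['4', '2', '7']
print("".join(letters))  # RoomBfloor
print(r_or_f)            # ['R', 'f']
```

Each character class matches exactly one character, which is why the results are lists of single characters.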


### Quantifiers:
Quantifiers allow you to specify how many times a pattern should occur. Some commonly used quantifiers are:

* *: Matches zero or more occurrences.
* +: Matches one or more occurrences.
* ?: Matches zero or one occurrence.
* {n}: Matches exactly n occurrences.
* {n,}: Matches at least n occurrences.
* {n,m}: Matches between n and m occurrences.

In [28]:
import re

pattern = r"ab*"
string = "aa ab abb abbb"

matches = re.findall(pattern, string)
print(matches)  # Output: ['a', 'a', 'ab', 'abb', 'abbb']

['a', 'a', 'ab', 'abb', 'abbb']
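
The counted forms and the + and ? quantifiers can be sketched in the same style. The sample strings are made up, and \b (a word boundary) is used here only to keep each number as a separate match.

```python
import re

numbers = "1 22 333 4444"

exactly_two = re.findall(r"\b\d{2}\b", numbers)      # exactly 2 digits
at_least_three = re.findall(r"\b\d{3,}\b", numbers)  # 3 or more digits
two_or_three = re.findall(r"\b\d{2,3}\b", numbers)   # between 2 and 3 digits
print(exactly_two)     # ['22']
print(at_least_three)  # ['333', '4444']
print(two_or_three)    # ['22', '333']

one_or_more = re.findall(r"ab+", "a ab abb")         # 'a' followed by 1+ 'b'
optional_u = re.findall(r"colou?r", "color colour")  # 'u' is optional
print(one_or_more)     # ['ab', 'abb']
print(optional_u)      # ['color', 'colour']
```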


### Grouping and Capturing:
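You can use parentheses to group parts of a pattern and capture the matched text. In the example below, \1 is a backreference that must repeat whatever the first group captured, and re.findall() returns the text of the captured group rather than the whole match. For example: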

In [27]:
import re

pattern = r"(a[bcd])\1"
string = "abab acac adad"

matches = re.findall(pattern, string)
print(matches)  # Output: ['ab', 'ac', 'ad']

['ab', 'ac', 'ad']
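
Captured groups can also be read from a match object. The sketch below assumes an invented date string and key=value string purely for illustration.

```python
import re

match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2023-07-15.")

whole = match.group(0)   # the entire match
year = match.group(1)    # text captured by the first group
parts = match.groups()   # all captured groups as a tuple
print(whole)   # 2023-07-15
print(year)    # 2023
print(parts)   # ('2023', '07', '15')

# With more than one group, re.findall returns a list of tuples.
pairs = re.findall(r"(\w+)=(\d+)", "x=1, y=22")
print(pairs)   # [('x', '1'), ('y', '22')]
```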


### Special Sequences:
Regex provides special sequences that match specific types of characters. Some commonly used special sequences are:

* \d: Matches any digit.
* \D: Matches any non-digit character.
* \w: Matches any word character (letters, digits, and the underscore).
* \W: Matches any character that is not a word character.
* \s: Matches any whitespace character.
* \S: Matches any non-whitespace character.

For example:

In [26]:
import re

pattern = r"\b\w+\b"
string = "Hello, world!"

matches = re.findall(pattern, string)
print(matches)  # Output: ['Hello', 'world']

['Hello', 'world']
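
The remaining sequences can be tried on a few made-up strings. This is only a sketch; note in particular that \w treats the underscore as a word character.

```python
import re

text = "Order 66 shipped on 2024-01-05"

digit_runs = re.findall(r"\d+", text)      # runs of digits
non_digit_runs = re.findall(r"\D+", text)  # runs of anything else
print(digit_runs)      # ['66', '2024', '01', '05']
print(non_digit_runs)  # ['Order ', ' shipped on ', '-', '-']

words = re.findall(r"\w+", "snake_case is 1 word")
print(words)           # ['snake_case', 'is', '1', 'word']

tokens = re.findall(r"\S+", "a  b\tc\n")   # runs of non-whitespace
print(tokens)          # ['a', 'b', 'c']
```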


### Replacement and Substitution:
You can use regex to replace matched patterns in a string. The re.sub() function is used for substitution. For example:

In [32]:
import re

pattern = r"world"
string = "hello world"

new_string = re.sub(pattern, "Python", string)
print(new_string)  # Output: "hello Python"

hello Python
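
re.sub() accepts more than a fixed replacement string. The sketch below shows three variations on invented inputs: referring to captured groups in the replacement, limiting the number of replacements with count, and passing a function that computes each replacement.

```python
import re

# \1, \2, \3 in the replacement refer to the captured groups.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2023-07-15")
print(swapped)  # 15/07/2023

# count limits how many occurrences are replaced.
limited = re.sub(r"a", "*", "banana", count=1)
print(limited)  # b*nana

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears
```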


### Splitting Strings:
You can split strings based on regex patterns using the re.split() function. For example:

In [34]:
import re

pattern = r"\s"
string = "hello world"

parts = re.split(pattern, string)
print(parts)  # Output: ['hello', 'world']

['hello', 'world']
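
Splitting on a single \s produces empty strings when the input has consecutive spaces. A brief sketch of that pitfall and of two common variations, with sample strings invented for illustration:

```python
import re

single = re.split(r"\s", "a  b")    # two spaces between a and b
runs = re.split(r"\s+", "a  b")     # treat a run of whitespace as one separator
print(single)  # ['a', '', 'b']
print(runs)    # ['a', 'b']

# Several delimiters at once: a comma or semicolon, then optional spaces.
colors = re.split(r"[,;]\s*", "red, green;blue,  yellow")
print(colors)  # ['red', 'green', 'blue', 'yellow']

# maxsplit stops after the given number of splits.
head_tail = re.split(r"\s+", "one two three four", maxsplit=1)
print(head_tail)  # ['one', 'two three four']
```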


### Anchors:
Anchors allow you to match patterns at specific positions in a string. Some commonly used anchors are:

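* ^: Matches at the beginning of a string (and, with re.MULTILINE, at the beginning of each line).
* $: Matches at the end of a string or just before a trailing newline (and, with re.MULTILINE, at the end of each line).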
* \b: Matches at a word boundary.

For example:

In [35]:
import re

pattern = r"^hello"
string = "hello world"

if re.search(pattern, string):
    print("Match found!")
else:
    print("Match not found.")

Match found!
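
The $ and \b anchors behave analogously. A minimal sketch on made-up strings:

```python
import re

ends_with_world = re.search(r"world$", "hello world")
ends_with_hello = re.search(r"hello$", "hello world")
print(ends_with_world is not None)  # True
print(ends_with_hello)              # None

text = "cat concat cats cat."
# Without word boundaries, "cat" is also found inside longer words.
loose = re.findall(r"cat", text)
# With \b on both sides, only the standalone word matches.
strict = re.findall(r"\bcat\b", text)
print(len(loose))   # 4
print(strict)       # ['cat', 'cat']
```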


### Match Objects:
When using the re.search() or re.match() functions, a match object is returned if a match is found. The match object contains information about the match and provides various methods and attributes to access that information. For example:

In [36]:
import re

pattern = r"world"
string = "hello world"

match = re.search(pattern, string)
if match:
    print("Match found!")
    print("Match start:", match.start())
    print("Match end:", match.end())
    print("Match span:", match.span())
    print("Match group:", match.group())
else:
    print("Match not found.")

Match found!
Match start: 6
Match end: 11
Match span: (6, 11)
Match group: world


### Raw Strings:
Regular expressions often contain backslashes, which Python also uses as the escape character in ordinary string literals. Prefixing a string literal with r makes it a raw string, so backslashes reach the regex engine unchanged instead of being interpreted by Python first. For example:

In [37]:
import re

pattern = r"\d+"
string = "12345"

match = re.search(pattern, string)
print(match.group())  # Output: "12345"

12345
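
The example above would work the same without the r prefix only by accident; the difference shows up with sequences that Python itself interprets. The sketch below uses \b, which is a backspace character in an ordinary string but a word boundary in a regex.

```python
import re

# In an ordinary string "\b" is one character (backspace);
# in a raw string r"\b" is two characters: a backslash and the letter b.
print(len("\b"), len(r"\b"))  # 1 2

not_raw = re.findall("\bcat\b", "cat")   # looks for backspace + cat + backspace
raw = re.findall(r"\bcat\b", "cat")      # looks for the whole word "cat"
print(not_raw)  # []
print(raw)      # ['cat']

# Doubling the backslash is the equivalent without a raw string.
print("\\d+" == r"\d+")  # True
```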


### Greedy and Non-Greedy Matches:
By default, quantifiers are greedy, which means they try to match as much text as possible. To make a quantifier non-greedy, so that it matches as little as possible, append ? to it (for example *?, +?, or {n,m}?). For example:

In [38]:
import re

pattern = r"<.*?>"
string = "<p>Hello</p><p>World</p>"

matches = re.findall(pattern, string)
print(matches)  # Output: ['<p>', '</p>', '<p>', '</p>']

['<p>', '</p>', '<p>', '</p>']
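
Running the greedy form on the same string makes the contrast visible. A minimal sketch:

```python
import re

html = "<p>Hello</p><p>World</p>"

# Greedy: .* runs to the last '>' in the string, giving one long match.
greedy = re.findall(r"<.*>", html)
# Non-greedy: .*? stops at the first '>' it can, giving one match per tag.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<p>Hello</p><p>World</p>']
print(lazy)    # ['<p>', '</p>', '<p>', '</p>']
```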


### Lookahead and Lookbehind:
Lookahead and lookbehind assertions allow you to require that a match is followed or preceded by another pattern without including that pattern in the match. Lookahead is written (?=...) (positive) or (?!...) (negative), and lookbehind is written (?<=...) (positive) or (?<!...) (negative). For example:

In [39]:
import re

pattern = r"\w+(?=\sPython)"  # a word that is followed by " Python"
string = "I love Python programming"

match = re.search(pattern, string)
print(match.group())  # Output: "love"

love
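
The other assertion forms can be sketched on a made-up price string. In each case the surrounding symbol is checked but is not part of the returned text.

```python
import re

prices = "cost: $40, discount: 15%, total: $34"

after_dollar = re.findall(r"(?<=\$)\d+", prices)        # lookbehind: preceded by $
before_percent = re.findall(r"\d+(?=%)", prices)        # lookahead: followed by %
not_percent = re.findall(r"\b\d+\b(?!%)", prices)       # negative lookahead
not_dollar = re.findall(r"(?<!\$)\b\d+", prices)        # negative lookbehind

print(after_dollar)    # ['40', '34']
print(before_percent)  # ['15']
print(not_percent)     # ['40', '34']
print(not_dollar)      # ['15']
```

A case-sensitive lookahead such as (?=\spython) will not match " Python"; either match the case exactly or pass re.IGNORECASE.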

### Flags:
Flags provide additional options for pattern matching. They can be passed as arguments to the regex functions, combined with the | operator, or embedded at the start of the pattern using inline syntax such as (?i). Some commonly used flags are:

* re.IGNORECASE or re.I: Ignore case when matching.
* re.MULTILINE or re.M: Make ^ and $ match at the start and end of each line, not only at the start and end of the whole string.
* re.DOTALL or re.S: Make the dot character (.) match any character, including newlines.
* re.VERBOSE or re.X: Ignore whitespace in the pattern and allow # comments, so a pattern can be spread over several lines.

For example:

In [40]:
import re

pattern = r"""
    hello     # Match 'hello'
    \s+       # Match one or more whitespace characters
    world     # Match 'world'
"""
string = "Hello     world"

match = re.search(pattern, string, re.IGNORECASE | re.VERBOSE)
if match:
    print("Match found!")
else:
    print("Match not found.")

Match found!
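
The embedded form mentioned above can be sketched as follows. The inline letters correspond to the flag names (i for IGNORECASE, m for MULTILINE), and the sample strings are invented.

```python
import re

# (?i) at the start of the pattern has the same effect as passing re.IGNORECASE.
inline = re.search(r"(?i)hello", "HELLO world")
flagged = re.search(r"hello", "HELLO world", re.IGNORECASE)
print(inline.group())   # HELLO
print(flagged.group())  # HELLO

# Several inline flags can be combined in one group.
starts = re.findall(r"(?im)^hello", "Hello there\nHELLO again")
print(starts)           # ['Hello', 'HELLO']
```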


### Backreferences:
Backreferences allow you to refer to captured groups within the pattern itself. You can use \1, \2, etc., to refer to the first, second, etc., captured group. For example:

In [42]:
import re

pattern = r"(\w+) \1"
string = "hello hello"

match = re.search(pattern, string)
if match:
    print("Match found!")
else:
    print("Match not found.")

Match found!
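
A common use of backreferences is finding accidentally doubled words. The following is a small sketch on a made-up sentence; the same \1 syntax is used in the replacement string to keep a single copy.

```python
import re

text = "this is is a test of the the parser"

# (\w+) captures a word; \1 requires the same word again after one space.
doubled = re.findall(r"\b(\w+) \1\b", text)
print(doubled)  # ['is', 'the']

# Replace each doubled word with a single copy of it.
cleaned = re.sub(r"\b(\w+) \1\b", r"\1", text)
print(cleaned)  # this is a test of the parser
```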


### Compiled Regular Expressions:
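In addition to using the regex functions directly, you can compile a regular expression into a pattern object with re.compile() and call methods such as search(), match(), findall(), and sub() on it. This avoids re-parsing the pattern when it is reused many times, although the re module also caches recently used patterns, so the speed gain is often modest; the main benefit is that the pattern is defined once and reused by name. For example: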

In [43]:
import re

pattern = re.compile(r"hello")
string = "hello world"

match = pattern.search(string)
if match:
    print("Match found!")
else:
    print("Match not found.")

Match found!
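
Reuse is where a compiled pattern pays off. The sketch below compiles one pattern with a flag and applies it to several made-up log lines; error_re and log_lines are hypothetical names.

```python
import re

# Compile once, including any flags, then reuse the pattern object's methods.
error_re = re.compile(r"^error: (.+)$", re.IGNORECASE)

log_lines = ["error: disk full", "ok", "ERROR: timeout"]  # invented sample data

messages = []
for line in log_lines:
    m = error_re.match(line)
    if m:
        messages.append(m.group(1))
print(messages)  # ['disk full', 'timeout']

# Pattern objects offer the same operations as the module-level functions.
relabeled = error_re.sub(r"failure: \1", "error: disk full")
print(relabeled)  # failure: disk full
```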
