# Regular Expressions in Python

Regular expressions (also known as regex or regexp) are a powerful tool for searching and manipulating text. They allow you to define a pattern or set of rules that describe a particular string of characters, and then search for or manipulate any text that matches that pattern.

Regular expressions are commonly used in programming, particularly for tasks like data validation, searching and replacing text, and parsing strings. They are also useful in text editors, command-line tools, and other applications that involve working with text.

Some of the benefits of using regular expressions include:

- **Flexibility:** Regular expressions are incredibly flexible and can match a wide range of patterns, from simple strings to complex sequences of characters.
- **Efficiency:** A single pattern can replace many lines of manual string-handling code and often performs well on large amounts of text, although plain string methods are usually faster for simple substring checks.
- **Accuracy:** Regular expressions are very precise and can be used to match specific patterns, ensuring that you only work with the data that you need.
- **Standardization:** Regular expressions are a widely accepted standard for working with text, making it easier to share and collaborate on code that involves text processing.

You can experiment with patterns interactively using an online tester such as https://regexr.com/.

## Python RegEx Methods

Python provides a powerful module called `re` for working with regular expressions. Its most commonly used functions include:

### 1. re.search(pattern, string, flags=0) 

The `re.search()` function scans a string for a pattern and returns a match object for the first occurrence. It returns `None` if no match is found. This is similar to using the `in` operator with a Python string, except that the pattern can be a regular expression and the result is a match object rather than a boolean. Since the result is either a match object or `None`, it can be used directly in conditional expressions.

- `pattern`: The regular expression pattern to search for
- `string`: The string to search in
- `flags (optional)`: A set of flags that modify the behavior of the search
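
As a minimal sketch of how the optional `flags` argument changes a search, the example below uses `re.IGNORECASE`, one of the standard flags in the `re` module. The sample text and variable names are made up for illustration.

```python
import re

text = "The quick brown fox"  # illustrative sample text

# Without flags the search is case-sensitive, so "the" is not found.
case_sensitive = re.search(r"the", text)

# re.IGNORECASE lets the same pattern match "The" at the start.
case_insensitive = re.search(r"the", text, flags=re.IGNORECASE)

print(case_sensitive)            # None
print(case_insensitive.group())  # The
```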

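It's a good idea to use raw strings (written as `r'...'`) to define regular expression patterns. Raw strings keep backslashes literal, so sequences such as `\b` or `\d` reach the regex engine unchanged instead of being interpreted as Python string escapes.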
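
To make the raw-string advice concrete, here is a small sketch. It relies on `\b`, which is a backspace character in an ordinary Python string but a word boundary in a regular expression. The sample text is invented for illustration.

```python
import re

text = "cat concatenate"  # illustrative sample text

# In a normal string, "\b" is a single backspace character.
# In a raw string, r"\b" is a backslash followed by "b" (two characters),
# which the regex engine reads as a word boundary.
plain_length = len("\b")
raw_length = len(r"\b")

# The raw pattern matches only the standalone word "cat".
raw_matches = re.findall(r"\bcat\b", text)

# The non-raw pattern looks for backspace + "cat" + backspace and finds nothing.
plain_matches = re.findall("\bcat\b", text)

print(plain_length, raw_length)  # 1 2
print(raw_matches)               # ['cat']
print(plain_matches)             # []
```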

The match object contains information about the match. Some of the useful methods and attributes of the match object are:

- `group()`: Returns the matched string
- `start()`: Returns the starting index of the match
- `end()`: Returns the ending index of the match
- `span()`: Returns a tuple containing the starting and ending indices of the match

```python
import re
```

```python
string = "The quick brown fox jumps over the lazy dog."
pattern = r"he"

match = re.search(pattern, string)
```

```python
# Using re.search() with conditional expressions
if match:
    print("Match Object:", match)
    print("Match Group:", match.group())
    print("Match Start:", match.start())
    print("Match End:", match.end())
    print("Match Span:", match.span())
else:
    print("No match found.")
```

```
Match Object: <re.Match object; span=(1, 3), match='he'>
Match Group: he
Match Start: 1
Match End: 3
Match Span: (1, 3)
```


In the output `<re.Match object; span=(1, 3), match='he'>` above, `re.Match` is the type of the object, `match='he'` is the text that was matched, and `span=(1, 3)` gives the start and end indices of the match within `string`. Indexing starts at `0`, as elsewhere in Python, and the end index is exclusive.
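
The relationship between `span()` and ordinary string slicing can be sketched as follows. Because the end index is exclusive, slicing the original string with the span returns the matched text.

```python
import re

string = "The quick brown fox jumps over the lazy dog."
match = re.search(r"he", string)

# span() returns (start, end); end is exclusive, like a slice bound.
start, end = match.span()
sliced = string[start:end]

print(sliced)                   # he
print(sliced == match.group())  # True
```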


### 2. re.findall(pattern, string, flags=0)

The `re.findall()` function finds all non-overlapping occurrences of a regular expression pattern in a string. It takes the same parameters as `re.search()` and returns a list of all the matches found, or an empty list if there are none. The example below is deliberately simple; pattern design is discussed later.

```python
string = "The quick brown fox jumps over the lazy dog."
pattern = r"[A-Z]he"

matches = re.findall(pattern, string)
```

```python
print(matches)
```

```
['The']
```
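
Because the pattern above matches only once, a second sketch may help show the list behavior. The pattern `[A-Za-z]he` is an illustrative variation that accepts either case for the first letter.

```python
import re

string = "The quick brown fox jumps over the lazy dog."

# Either an upper-case or lower-case letter followed by "he".
matches = re.findall(r"[A-Za-z]he", string)

# A pattern that never matches yields an empty list, not None.
no_matches = re.findall(r"cat", string)

print(matches)     # ['The', 'the']
print(no_matches)  # []
```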


### 3. re.match(pattern, string, flags=0)

`re.match()` checks for a pattern only at the beginning of a string. It returns a match object if the string starts with a match, and `None` otherwise. It takes the same parameters as `re.search()` and `re.findall()`. As with `re.search()`, the returned match object provides methods such as `group()`, `start()`, `end()`, and `span()`.

```python
string = "The quick brown fox jumps over the lazy dog."
pattern = r"[A-Z]he"

match = re.match(pattern, string)
```

```python
# Using re.match() with conditional expressions
if match:
    print("Match Object:", match)
    print("Match Group:", match.group())
    print("Match Start:", match.start())
    print("Match End:", match.end())
    print("Match Span:", match.span())
else:
    print("No match found.")
```

```
Match Object: <re.Match object; span=(0, 3), match='The'>
Match Group: The
Match Start: 0
Match End: 3
Match Span: (0, 3)
```
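
The difference between `re.match()` and `re.search()` shows up when the pattern occurs somewhere other than the start of the string. The following sketch uses the illustrative pattern `fox` to contrast the two.

```python
import re

string = "The quick brown fox jumps over the lazy dog."
pattern = r"fox"

# re.match() only tries the pattern at index 0, so it fails here.
at_start = re.match(pattern, string)

# re.search() scans the whole string and finds "fox" later on.
anywhere = re.search(pattern, string)

print(at_start)         # None
print(anywhere.span())  # (16, 19)
```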


### 4. re.sub(pattern, repl, string, count=0, flags=0)

`re.sub()` replaces occurrences of a pattern in a string with a replacement string. It returns a new string with the replacements made; the original string is left unchanged. Here are the parameters used:

- `pattern`: The regular expression pattern to search for
- `repl`: The replacement string to use in place of each match
- `string`: The string to search in
- `count`: Maximum number of replacements to make; the default `0` replaces every match
- `flags (optional)`: A set of flags that modify the behavior of the search

```python
string = "The quick brown fox jumps over the lazy dog."
pattern = r"[a-z]he"
repl = "The"

# re.sub() returns the modified string, not a match object
result = re.sub(pattern, repl, string, count=1)
```

```python
result
```

```
'The quick brown fox jumps over The lazy dog.'
```
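
The effect of `count` is easier to see on a string with several matches. This is a minimal sketch with an invented sample string, comparing the default `count=0` with an explicit limit.

```python
import re

string = "the cat and the hat and the bat"  # illustrative sample text
pattern = r"[a-z]he"

# count=0 (the default) replaces every match.
replace_all = re.sub(pattern, "The", string)

# count=2 stops after the first two matches.
replace_two = re.sub(pattern, "The", string, count=2)

print(replace_all)  # The cat and The hat and The bat
print(replace_two)  # The cat and The hat and the bat
```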

### 5. re.split(pattern, string, maxsplit=0, flags=0)

`re.split()` splits a string into a list of substrings wherever a regular expression pattern matches, and returns that list. It is similar to the `split()` method of Python `str` objects, except that the separator is a pattern rather than a fixed string. Let's see how each parameter works:

- `pattern`: The regular expression pattern to search for
- `string`: The string to search in
- `maxsplit`: Maximum number of splits to make; the default `0` means no limit
- `flags (optional)`: A set of flags that modify the behavior of the search

```python
string = "The quick brown fox 1 <div> over the 2 lazy dog."
pattern = r"<[a-z]\w+>"

segments = re.split(pattern, string)
```

```python
segments
```

```
['The quick brown fox 1 ', ' over the 2 lazy dog.']
```
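
As a further sketch, the example below shows `maxsplit` in action, along with one behavior of `re.split()` that is easy to miss: if the pattern contains a capturing group, the captured separators are included in the result. The sample string and patterns are invented for illustration.

```python
import re

string = "one, two;three; four"  # illustrative sample text

# Split on a comma or semicolon, plus any whitespace that follows it.
parts = re.split(r"[,;]\s*", string)

# maxsplit=1 performs only the first split and leaves the rest intact.
first_only = re.split(r"[,;]\s*", string, maxsplit=1)

# With a capturing group, the matched separators are kept in the list.
with_separators = re.split(r"([,;])\s*", string)

print(parts)            # ['one', 'two', 'three', 'four']
print(first_only)       # ['one', 'two;three; four']
print(with_separators)  # ['one', ',', 'two', ';', 'three', ';', 'four']
```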