### Regular Expressions

#### Functions

In [26]:
import re

txt = "The rain in Spain"

`search()` scans the string for the first location where the pattern matches and returns a Match object, or `None` if there is no match. If the pattern matches in several places, only the first occurrence is returned.

In [27]:
x = re.search("^The.*Spain$", txt)
print(x)

<re.Match object; span=(0, 17), match='The rain in Spain'>
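
As a small illustration of the "first occurrence only" behaviour and the `None` result, the sketch below reuses `txt` from above; the variable names `match` and `missing` are introduced here for the example only.

```python
import re

txt = "The rain in Spain"

# "ain" occurs twice, but search() reports only the first occurrence.
match = re.search(r"ain", txt)
if match is not None:
    print(match.span())   # (start, end) indices of the first match
    print(match.group())  # the matched substring

# A pattern that occurs nowhere yields None rather than raising an error.
missing = re.search(r"Portugal", txt)
print(missing)
```

Checking the result against `None` before calling methods on it avoids an `AttributeError` when nothing matches.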


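`findall()` returns a list of all non-overlapping matches, in the order they are found. If the pattern contains a capturing group, the list holds the group contents instead of the whole matches.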

In [28]:
x = re.findall("ain", txt)
print(x)

['ain', 'ain']
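
The sketch below shows how a capturing group changes what `findall()` returns, and uses `re.finditer()` when the match positions are needed as well. The names `prefixes` and `positions` are made up for this example.

```python
import re

txt = "The rain in Spain"

# With one capturing group, findall() returns the group contents only.
prefixes = re.findall(r"(\w+)ain", txt)
print(prefixes)  # ['r', 'Sp']

# finditer() yields Match objects, so the start index is available too.
positions = [(m.group(), m.start()) for m in re.finditer(r"ain", txt)]
print(positions)  # [('ain', 5), ('ain', 14)]
```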


`split()` returns a list where the string has been split at each match.

In [29]:
# The r prefix creates a raw string: backslashes stay literal instead of starting Python escape sequences.
x = re.split(r"\s", txt)
print(x)

['The', 'rain', 'in', 'Spain']
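
Two details of `split()` are easy to miss: the optional `maxsplit` argument limits the number of splits, and consecutive separators produce empty strings unless the pattern matches them as one run. A minimal sketch, with sample strings chosen for illustration:

```python
import re

txt = "The rain in Spain"

# maxsplit=1 splits at the first match only; the rest stays in one piece.
limited = re.split(r"\s", txt, maxsplit=1)
print(limited)  # ['The', 'rain in Spain']

# Two spaces in a row: \s splits twice and leaves an empty string between.
single = re.split(r"\s", "The  rain")
print(single)  # ['The', '', 'rain']

# \s+ treats the whole run of whitespace as one separator.
collapsed = re.split(r"\s+", "The  rain")
print(collapsed)  # ['The', 'rain']
```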


`sub()` replaces each match with the text of your choice.

In [30]:
x = re.sub(r"\s", "_", txt)
print(x)

The_rain_in_Spain
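
`sub()` also accepts a `count` argument and supports backreferences such as `\1` in the replacement, and `re.subn()` additionally reports how many replacements were made. The following sketch reuses `txt`; the result variable names are illustrative.

```python
import re

txt = "The rain in Spain"

# count=1 replaces only the first match.
first_only = re.sub(r"\s", "_", txt, count=1)
print(first_only)  # The_rain in Spain

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+) in (\w+)", r"\2 in \1", txt)
print(swapped)  # The Spain in rain

# subn() returns the new string together with the number of replacements.
result, n_replaced = re.subn(r"\s", "_", txt)
print(result, n_replaced)  # The_rain_in_Spain 3
```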


#### Pattern syntax

| Symbol | Description |
|--------|-------------|
| `.`    | Matches any character except a newline. |
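| `^`    | Matches the start of the string (and the start of each line with `re.MULTILINE`). |
| `$`    | Matches the end of the string or just before a trailing newline (and the end of each line with `re.MULTILINE`). |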
| `*`    | Matches 0 or more repetitions of the preceding element. |
| `+`    | Matches 1 or more repetitions of the preceding element. |
| `?`    | Matches 0 or 1 repetition of the preceding element. |
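| `\s`   | Matches a whitespace character (space, tab, newline, and so on). |
| `\d`   | Matches a decimal digit. Equivalent to `[0-9]` only with `re.ASCII` or in bytes patterns; in `str` patterns it also matches other Unicode digits. |
| `\w`   | Matches a word character (letters, digits, and underscore; Unicode-aware in `str` patterns). |
| `\b`   | Matches the empty position at a word boundary: between a word and a non-word character, or at the start or end of the string next to a word character. |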
| `[abc]`| Matches any of the characters a, b, or c. |
| `[a-z]`| Matches any lowercase ASCII letter. |
| `[A-Z]`| Matches any uppercase ASCII letter. |
| `[^abc]`| Matches any character except a, b, or c. |
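
To make a few of the symbols in the table concrete, here is a short sketch. The sample strings and variable names are invented for illustration.

```python
import re

sample = "Order 66 shipped to Box_7 on 2024-05-01"

# \d+ : one or more digits in a row
digit_runs = re.findall(r"\d+", sample)
print(digit_runs)  # ['66', '7', '2024', '05', '01']

# \b[A-Z]\w* : a word that starts with an uppercase ASCII letter
capitalized = re.findall(r"\b[A-Z]\w*", sample)
print(capitalized)  # ['Order', 'Box_7']

# u? : the "u" is optional, so both spellings match
spellings = re.findall(r"colou?r", "color and colour")
print(spellings)  # ['color', 'colour']

# In str patterns \d is Unicode-aware; re.ASCII restricts it to [0-9].
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_hit = re.fullmatch(r"\d", arabic_three) is not None
ascii_hit = re.fullmatch(r"\d", arabic_three, flags=re.ASCII) is not None
print(unicode_hit, ascii_hit)  # True False
```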

#### Examples

In [33]:
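# Email validation (simplified: checks the general name@domain.tld shape only)
email_pattern = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")

# URL matching (unanchored, so it finds http(s) URLs inside longer text and ignores any path or query)
url_pattern = re.compile(r"https?://(?:www\.)?\w+\.\w+(?:\.\w+)*")

# IPv4 address format check (four dot-separated groups of 1-3 digits; does not enforce the 0-255 range)
ip_pattern = re.compile(r"^(\d{1,3}\.){3}\d{1,3}$")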
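
One way these compiled patterns might be used is sketched below. The test strings are made up, and `is_plausible_ipv4` is a hypothetical helper, not part of `re`; it adds a 0-255 range check that the pattern alone does not perform. This is a rough sketch rather than a complete validator (for example, it does not reject leading zeros).

```python
import re

email_pattern = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")
url_pattern = re.compile(r"https?://(?:www\.)?\w+\.\w+(?:\.\w+)*")
ip_pattern = re.compile(r"^(\d{1,3}\.){3}\d{1,3}$")

# match() returns a Match object for a well-formed address, None otherwise.
email_ok = email_pattern.match("user.name@example.com") is not None
email_bad = email_pattern.match("not-an-email") is not None

# The URL pattern is unanchored, so search() can pull a URL out of text.
found = url_pattern.search("see https://www.example.com/docs for details")
url_text = found.group() if found else None


def is_plausible_ipv4(text):
    """Hypothetical helper: format check plus a 0-255 range check per part."""
    if ip_pattern.fullmatch(text) is None:
        return False
    return all(int(part) <= 255 for part in text.split("."))


# The pattern alone accepts out-of-range octets; the helper rejects them.
pattern_only = ip_pattern.match("999.999.999.999") is not None
print(email_ok, email_bad, url_text)
print(pattern_only, is_plausible_ipv4("999.999.999.999"), is_plausible_ipv4("192.168.0.1"))
```

Using `fullmatch()` in the helper also rejects a trailing newline, which `$` on its own would tolerate.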