# RegEx

A RegEx, or regular expression, is a sequence of characters that forms a search pattern. Python has a built-in module called `re`, which can be used to work with regular expressions.

In [3]:
import re

# Check whether the string starts with "Hello" and ends with "you"
string = "Hello, how are you"
if re.search("^Hello.*you$", string):
    print("Yes")
else:
    print("No")

Yes


## RegEx Functions

1. `findall`: Returns a list containing all matches
2. `search`: Returns a match object for the first match anywhere in the string, or `None` if there is no match
3. `split`: Returns a list where the string has been split at each match
4. `sub`: Replaces/substitutes match(es) with a string

In [12]:
string = "The quick brown fox jumps over the lazy dog"
print(re.findall(".he", string))

['The', 'the']


In [13]:
# Split the string at each whitespace character (raw string keeps \s intact)
string = "Is this what I think it is?"
print(re.split(r"\s", string))

['Is', 'this', 'what', 'I', 'think', 'it', 'is?']


In [15]:
# Replace every occurrence of "e" with a string of our choice
txt = "Hello, here you are"
print(re.sub("e", "3", txt))

H3llo, h3r3 you ar3
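
The examples above all find at least one match. As a minimal sketch of the other cases, the cell below shows what `search` and `findall` return when nothing matches, and how the optional `maxsplit` and `count` keyword arguments limit `split` and `sub`. The sample sentence and variable names are only illustrative.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# search returns a match object for the first match, or None if nothing matches
hit = re.search("fox", text)
miss = re.search("cat", text)
print(hit is not None)  # True
print(miss)             # None

# findall returns an empty list when there are no matches
no_matches = re.findall("cat", text)
print(no_matches)       # []

# maxsplit limits the number of splits; count limits the number of replacements
first_two_splits = re.split(r"\s", text, maxsplit=2)
first_two_subs = re.sub("o", "0", text, count=2)
print(first_two_splits)  # ['The', 'quick', 'brown fox jumps over the lazy dog']
print(first_two_subs)    # The quick br0wn f0x jumps over the lazy dog
```

Because `search` can return `None`, it is usually safest to test the result before calling methods on it.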


## Metacharacters

Metacharacters are characters with a special meaning.

| Character | Description                                 | Example         |
|-----------|---------------------------------------------|-----------------|
| []        | A set of characters                         | "[a-m]"         |
| \         | Signals a special sequence                  | "\d"            |
| .         | Any character (except newline character)    | "he..o"         |
| ^         | Starts with                                 | "^hello"        |
| $         | Ends with                                   | "planet$"       |
| *         | Zero or more occurrences                    | "he.*o"         |
| +         | One or more occurrences                     | "he.+o"         |
| ?         | Zero or one occurrences                     | "he.?o"         |
| {}        | Exactly the specified number of occurrences | "he.{2}o"       |
| \|        | Either or                                   | "falls\|stays"  |
| ()        | Capture and group                           | "(he)+"         |
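
To make the quantifiers in the table more concrete, here is a small sketch that tries several of the example patterns against short strings. The helper `matches` is a hypothetical convenience function written for this illustration, not part of `re`, and the test strings are made up.

```python
import re

def matches(pattern, text):
    """Hypothetical helper: True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

print(matches("he.*o", "heo"))        # True: * allows zero characters
print(matches("he.+o", "heo"))        # False: + needs at least one character
print(matches("he.?o", "helo"))       # True: ? allows zero or one character
print(matches("he.?o", "hello"))      # False: two characters sit between "he" and "o"
print(matches("he.{2}o", "hello"))    # True: exactly two characters
print(matches("^hello", "hello world"))    # True: string starts with "hello"
print(matches("planet$", "hello planet"))  # True: string ends with "planet"
print(matches("falls|stays", "The rain in Spain stays mainly in the plain"))  # True

# Parentheses capture the parts of the match they enclose
groups = re.search("(he)(l+)o", "hello").groups()
print(groups)  # ('he', 'll')
```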


## Special Sequences

A special sequence is a `\` followed by one of the characters in the table below, and it has a special meaning.

| Character | Description                                           | Example        |
|-----------|-------------------------------------------------------|----------------|
| \A        | Returns a match if specified characters are at the beginning of the string | "\AThe"         |
| \b        | Returns a match where specified characters are at the beginning or end of a word | r"\bain" r"ain\b" |
| \B        | Returns a match where specified characters are present but NOT at the beginning or end of a word | r"\Bain" r"ain\B" |
| \d        | Returns a match where the string contains digits (numbers 0-9) | "\d"           |
| \D        | Returns a match where the string DOES NOT contain digits | "\D"           |
| \s        | Returns a match where the string contains a white space character | "\s"          |
| \S        | Returns a match where the string DOES NOT contain a white space character | "\S"          |
| \w        | Returns a match where the string contains any word characters (a to z, A to Z, 0-9, and the underscore _) | "\w"         |
| \W        | Returns a match where the string DOES NOT contain any word characters | "\W"         |
| \Z        | Returns a match if specified characters are at the end of the string | "Spain\Z"     |
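
The following sketch runs a few of these sequences against one sample sentence (the sentence is made up for illustration). The patterns are written as raw strings (`r"..."`) because in an ordinary Python string `"\b"` is the backspace character, so the regex engine would never see the word-boundary sequence.

```python
import re

text = "The rain in Spain fell on 24 May"

digits = re.findall(r"\d", text)           # each digit separately
print(digits)                              # ['2', '4']

starts_with_the = re.search(r"\AThe", text) is not None
ends_with_may = re.search(r"May\Z", text) is not None
print(starts_with_the, ends_with_may)      # True True

word_start = re.findall(r"\bain", text)    # "ain" at the start of a word
word_end = re.findall(r"ain\b", text)      # "ain" at the end of a word
print(word_start)                          # []
print(word_end)                            # ['ain', 'ain']

words = re.findall(r"\w+", text)           # runs of word characters
spaces = re.findall(r"\s", text)           # individual whitespace characters
print(words)        # ['The', 'rain', 'in', 'Spain', 'fell', 'on', '24', 'May']
print(len(spaces))  # 7
```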


## Sets

A set is a group of characters inside a pair of square brackets `[]` with a special meaning.

| Character | Description |
|-----------|-------------|
| [arn]     | Returns a match where one of the specified characters (a, r, or n) is present |
| [a-n]     | Returns a match for any lowercase character between a and n alphabetically |
| [^arn]    | Returns a match for any character EXCEPT a, r, and n |
| [0123]    | Returns a match where any of the specified digits (0, 1, 2, or 3) are present |
| [0-9]     | Returns a match for any digit between 0 and 9         |
| [0-5][0-9]| Returns a match for any two-digit number from 00 to 59 |
| [a-zA-Z]  | Returns a match for any character alphabetically between a and z, lowercase OR uppercase |
| [+]       | In sets, +, *, ., (), $, {} have no special meaning, so [+] means: return a match for any + character in the string |
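
As a quick check of the table, the sketch below applies several of these sets with `findall`. The input strings are made up for illustration.

```python
import re

lower_a_to_n = re.findall("[a-n]", "Spain")      # S is uppercase, p comes after n
print(lower_a_to_n)                              # ['a', 'i', 'n']

not_arn = re.findall("[^arn]", "rain")           # everything except a, r, n
print(not_arn)                                   # ['i']

low_digits = re.findall("[0123]", "2024-05-13")  # only the digits 0, 1, 2, 3
print(low_digits)                                # ['2', '0', '2', '0', '1', '3']

minutes = re.findall("[0-5][0-9]", "08:59")      # two-digit numbers from 00 to 59
out_of_range = re.findall("[0-5][0-9]", "61:75")
print(minutes)                                   # ['08', '59']
print(out_of_range)                              # []

letters = re.findall("[a-zA-Z]", "R2-D2")        # letters of either case
print(letters)                                   # ['R', 'D']

plus_signs = re.findall("[+]", "1 + 2 + 3")      # + is literal inside a set
print(plus_signs)                                # ['+', '+']
```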

## Match Object

The `search` function returns a match object, which contains information about the search and the result. If there is no match, it returns `None` instead.

The match object has properties and methods:
* `.span()`: returns a tuple containing the start and end positions of the match
* `.string`: the string passed into the function
* `.group()`: returns the part of the string where there was a match

In [30]:
txt = "Inheritance and Polymorphism are concepts in Object Oriented Programming."
x = re.search(".*re", txt)
print(x.span())
print(x.group())
print(x.string)

(0, 32)
Inheritance and Polymorphism are
Inheritance and Polymorphism are concepts in Object Oriented Programming.
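
When the pattern contains parentheses, `.group()` also accepts a group number. The sketch below reuses the same text with a different, illustrative pattern: it looks for the first word starting with an uppercase "P", captures the rest of that word, and checks for `None` before using the result.

```python
import re

txt = "Inheritance and Polymorphism are concepts in Object Oriented Programming."

# \b anchors the match to the start of a word; (\w+) captures the rest of the word
m = re.search(r"\bP(\w+)", txt)

if m is not None:              # search returns None when nothing matches
    print(m.span())            # (16, 28)
    print(m.group())           # Polymorphism
    print(m.group(1))          # olymorphism
    print(m.start(), m.end())  # 16 28
```

Slicing the original string with the span, `txt[16:28]`, gives the same text as `m.group()`.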
