# The language of pattern matching
## Regular Expressions

### Cheat Sheet



| Symbol | Meaning                                                 |
| ------ | ------------------------------------------------------- |
| --     | ---------------- Metacharacters                         |
|        | `. ^ $ * + ? { } [ ] \ \| ( )`                          |
| --     | ---------------- Anchors                                |
| ^      | string starts with                                      |
| $      | string ends with                                        |
| --     | ---------------- Boundaries                             |
| \b     | word boundary                                           |
| \B     | not a word boundary                                     |
| --     | ---------------- Quantifiers                            |
| ?      | zero or one                                             |
| *      | zero or more                                            |
| +      | one or more                                             |
| {2}    | exactly two occurrences                                 |
| {4,}   | four or more occurrences                                |
| {1,4}  | at least one and up to four occurrences                 |
| --     | ---------------- Character Classes                      |
| [abc]  | any character that is a, b, or c                        |
| [^abc] | any character that is not a, b, or c                    |
| [a-z]  | any character in the range a through z (lower case)     |
| [^a-z] | any character not in the range a through z (lower case) |
| --     | ---------------- Characters                             |
| .      | any character except a line break (to match a literal period, escape it: `\.`) |
| \d     | any digit (roughly [0-9])                               |
| \w     | any word character (roughly [a-zA-Z0-9_])               |
| \s     | any whitespace character (roughly [ \f\n\r\t\v])        |
| \D     | any non-digit                                           |
| \W     | any non-word character                                  |
| \S     | any non-whitespace character                            |
| --     | ---------------- Logic                                  |
| x\|y   | either x or y                                           |
| ()     | capture group                                           |
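
A few entries from the cheat sheet can be tried directly against short strings. The following is a minimal sketch; the sample strings and variable names are made up for illustration and are not part of the notebook's data.

```python
import re

# Anchors and quantifiers: the whole value must be 1 to 4 digits.
vlan_ok = bool(re.search(r"^\d{1,4}$", "1024"))
vlan_bad = bool(re.search(r"^\d{1,4}$", "vlan1024"))

# Alternation inside a capture group: a name starting with Gi or Te.
prefix = re.search(r"^(Gi|Te)", "Te4/1/1").group()

# An unescaped period matches any character; an escaped one matches only ".".
loose = bool(re.search(r"^10.1$", "10x1"))
strict = bool(re.search(r"^10\.1$", "10x1"))

# Word boundary: "Gi" as a whole word, not as the start of a longer word.
whole_word = bool(re.search(r"\bGi\b", "Gi is short for GigabitEthernet"))

print(vlan_ok, vlan_bad, prefix, loose, strict, whole_word)
```

The `loose` and `strict` pair shows why a literal period usually needs the backslash: without it, `10x1` is accepted.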



In [1]:
import re

The Python built-in **re** module.

As a network engineer who prefers built-in modules where possible, I find that, more often than not, **re** is all you need, particularly if you are searching specific values (rather than a long string of text).


In [2]:
import regex

ModuleNotFoundError: No module named 'regex'

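The third-party **regex** module handles Unicode better and offers more advanced features. It is not part of the standard library, so it must be installed (for example, `pip install regex`) before it can be imported; otherwise the import raises the `ModuleNotFoundError` shown above.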
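
When a script has to run on machines where **regex** may be missing, one option is to fall back to **re**. This is only a sketch of that idea, not something the notebook itself does, and the alias `rx` is a hypothetical name chosen for the example.

```python
# Prefer the third-party regex module when it is installed,
# otherwise fall back to the standard-library re module.
try:
    import regex as rx  # third-party package
except ImportError:
    import re as rx  # standard library

# Both modules provide search() with the same basic call signature.
match = rx.search(r"^Gi\d/\d/\d+", "Gi1/0/14")
found = match.group()
print(found)
```

This only works as long as the pattern sticks to syntax both modules understand; features that exist only in **regex** would still fail under the fallback.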

In [None]:
semiform = """
oceans-cs01#show int status | i connected
Gi1/0/1                      connected    1          a-full  a-100 10/100/1000BaseTX
Gi1/0/3                      connected    1          a-half   a-10 10/100/1000BaseTX
Gi1/0/7                      connected    1          a-full  a-100 10/100/1000BaseTX
Gi1/0/14                     connected    1          a-full   a-10 10/100/1000BaseTX
Gi1/0/23                     connected    1          a-full a-1000 10/100/1000BaseTX
Gi1/0/46                     connected    1          a-full a-1000 10/100/1000BaseTX
Gi1/1/4                      connected    1          a-full a-1000 10/100/1000BaseTX SFP
Gi2/0/2                      connected    1          a-full   a-10 10/100/1000BaseTX
Gi2/1/1                      connected    1          a-full a-1000 10/100/1000BaseTX
Gi3/0/31                     connected    1          a-full a-1000 10/100/1000BaseTX SFP
Te4/1/1                      connected    1          a-full a-1000 10/100/1000BaseTX SFP
Fa0                          connected    routed     a-full  a-100 10/100BaseTX
oceans-cs01#
"""

*Avoid searching one big block of text, particularly if it's semi-formatted text.*
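
One reason this can matter is the behavior of anchors. By default `^` matches only at the very start of the string, so a pattern written for a single line can silently miss everything in a multi-line block. The sketch below uses a hypothetical three-line sample (`block`) rather than the full switch output.

```python
import re

# Hypothetical sample standing in for a block of device output.
block = "switch#show int status\nGi1/0/1   connected\nTe4/1/1   connected"

# Against the whole block, ^ anchors only at the start of the string,
# so the interface lines are never considered.
whole_block = re.search(r"^Gi\d", block)

# Line by line, ^ anchors at the start of each individual line.
per_line = [ln for ln in block.splitlines() if re.search(r"^(Gi|Te)\d", ln)]

print(whole_block)
print(per_line)
```

The `re.MULTILINE` flag is another way to make `^` match at each line start, but splitting first also keeps every match tied to the line it came from.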

In [None]:
semiform_lines = semiform.split("\n")
itemized_lines = []
for line in semiform_lines:
    # Make sure line has something in it so you don't wind up with an empty list!
    if line:
        itemized_lines.append(line.strip().split())

In [None]:
len(semiform_lines)

In [None]:
itemized_lines

In [None]:
for line in itemized_lines:
    print(line[0])
    if re.search(r"^Gi\d/0/\d{1,2}$", line[0]):
        print(line)
    else:
        print("\tPattern not found!")
    print()
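
When the same pattern is applied to many individual values, as in the loop above, it can be compiled once and reused. The sketch below assumes a small hypothetical list, `first_tokens`, standing in for the `line[0]` values. It also uses `fullmatch()`, which requires the entire string to match and so makes the `^` and `$` anchors unnecessary.

```python
import re

# Hypothetical sample of first tokens, standing in for line[0] above.
first_tokens = ["oceans-cs01#show", "Gi1/0/1", "Gi1/0/14", "Gi1/1/4", "Te4/1/1", "Fa0"]

# Compile once; reuse for every value.
gi_module0 = re.compile(r"Gi\d/0/\d{1,2}")

# Keep only the tokens that match the pattern in full.
matches = [tok for tok in first_tokens if gi_module0.fullmatch(tok)]
print(matches)
```

`Gi1/1/4` is excluded because the middle field is `1`, not `0`, and `Te4/1/1` is excluded because it does not start with `Gi`.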

In [None]:
for line in semiform_lines:
    if re.search(r"^(Gi|Te)\d/(0|1)/\d", line):
        print(line)
    print()

In [None]:
for line in semiform_lines:
    # \d{1} is equivalent to a single \d
    if re.search(r"^(Gi|Te)\d/1/\d{1}", line):
        print(line)

In [None]:
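# Index 0 is an empty string and index 1 is the command line, so index 2 is the first interface line
print(semiform_lines[2])
intf = re.search(r"^(Gi\d/(0|1)/\d{1})", semiform_lines[2])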

In [None]:
intf

In [None]:
dir(intf)

In [None]:
intf.group()

In [None]:
intf.groups()
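
To make the difference between `group()` and `groups()` easier to see, the sketch below uses named groups on a hypothetical sample line in the same layout as the switch output. The group names (`intf`, `kind`, `switch`, `module`, `port`) are assumptions made for this example. It also uses `\d+` for the port, since `\d{1}` would stop after the first digit of a two-digit port such as `14`.

```python
import re

# Hypothetical sample line in the same layout as the switch output above.
line = "Gi1/0/14                     connected    1          a-full   a-10 10/100/1000BaseTX"

# Named groups label each captured piece; \d+ takes every digit of the port.
pattern = r"^(?P<intf>(?P<kind>Gi|Te)(?P<switch>\d)/(?P<module>\d)/(?P<port>\d+))"
m = re.search(pattern, line)

whole = m.group()          # the entire matched text
by_name = m.group("port")  # a single group, looked up by name
all_groups = m.groups()    # tuple of every captured group, in order
as_dict = m.groupdict()    # named groups as a dictionary

print(whole)
print(by_name)
print(all_groups)
print(as_dict)
```

`group()` with no argument returns the full match, while `groups()` returns only what the parentheses captured, ordered by where each opening parenthesis appears in the pattern.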