
# Regular Expression (re)

Ref: https://pyneng.readthedocs.io/en/latest/book/Part_III.html

Regular expressions can be used, for example, to:
- collect the OS version and uptime from the output of the show version command
- get the lines from a log file that match a template
- get the interfaces from a configuration that do not have a description

In [1]:
import re

Python uses the re module to work with regular expressions.

Core functions of the re module:
- match - looks for a match only at the beginning of the string
- search - looks for the first match of the pattern anywhere in the string
- findall - finds all matches of the pattern and returns them as a list
- finditer - finds all matches of the pattern and returns an iterator of Match objects
- compile - compiles a regex into a pattern object; the functions listed here are then available as methods of that object
- fullmatch - the entire string must match the regex
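A minimal sketch of how these functions differ in practice. The `sample` string is made up for illustration and is not taken from the reference.

```python
import re

# Hypothetical sample string, used only for illustration
sample = 'Gi0/1 is up, Gi0/2 is down'

# match: succeeds only if the pattern matches at the start of the string
at_start = re.match(r'Gi\d/\d', sample)
not_at_start = re.match(r'down', sample)      # None: 'down' is not at the start

# search: first match anywhere in the string
first = re.search(r'down', sample)

# findall: list of all matching substrings
all_names = re.findall(r'Gi\d/\d', sample)

# finditer: iterator of Match objects, one per match
spans = [m.span() for m in re.finditer(r'Gi\d/\d', sample)]

# compile: build a pattern object once, then call the same methods on it
regex = re.compile(r'Gi\d/\d')
compiled_names = regex.findall(sample)

# fullmatch: the whole string must match the pattern
whole = re.fullmatch(r'Gi\d/\d', 'Gi0/1')
partial = re.fullmatch(r'Gi\d/\d', sample)    # None: extra text after 'Gi0/1'

print(at_start.group(), not_at_start, first.group())
print(all_names, spans, compiled_names)
print(whole.group(), partial)
```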

In addition to the functions that search for matches, the module has the following functions:
- re.sub - replaces matches in a string
- re.split - splits a string into parts
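As a short sketch of these two functions (the `mac` and `row` strings are illustrative assumptions in the style of the device output used in this notebook):

```python
import re

# Hypothetical strings for illustration
mac = 'f03a.b216.7ad7'
row = 'GigabitEthernet0/1    unassigned      YES    unset  up'

# re.sub: replace every match of the pattern (here, each literal dot) with ':'
mac_colon = re.sub(r'\.', ':', mac)

# re.split: split on runs of one or more whitespace characters
columns = re.split(r'\s+', row)

print(mac_colon)
print(columns)
```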

The syntax of the search function is:
match = re.search(pattern, string, flags=0)

The search function has three parameters:
1.   pattern - the regular expression
2.   string - the string in which the pattern is searched for
3.   flags - optional flags that change regex behavior
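The effect of the flags parameter can be sketched with `re.IGNORECASE`, one of the flags defined in the re module. The `text` string below is a made-up variation of a log message.

```python
import re

# Hypothetical log text with an upper-case status word
text = 'Line protocol on Interface Ethernet0/3, changed state to DOWN'

# Without flags the search is case-sensitive, so 'down' does not match 'DOWN'
case_sensitive = re.search(r'down', text)

# re.IGNORECASE makes letters match regardless of case
case_insensitive = re.search(r'down', text, flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```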


In [2]:
help(re)

Help on package re:

NAME
    re - Support for regular expressions (RE).

MODULE REFERENCE
    https://docs.python.org/3.11/library/re.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module provides regular expression matching operations similar to
    those found in Perl.  It supports both 8-bit and Unicode strings; both
    the pattern and the strings being processed can contain null bytes and
    characters outside the US ASCII range.
    
    Regular expressions can contain both special and ordinary characters.
    Most ordinary characters, like "A", "a", or "0", are the simplest
    regular expressions; they simply match themselves.  You can
    concatenate ordinary characters, so l

In [3]:
int_line = '  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,'

In the re module, several functions return a **Match** object if a match is found:

- search
- match
- finditer - returns an iterator of Match objects



In [4]:
match = re.search('MTU', int_line)
type(match)

re.Match

In [5]:
match = re.search('MU', int_line)
type(match)

NoneType

If a match is found, the function returns a special **Match** object. If there is no match, the function returns **None**.
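Because the result may be None, it is usually worth checking it before calling any method on it. A minimal sketch, where `contains` is a hypothetical helper name introduced only for this example:

```python
import re

int_line = '  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,'

def contains(pattern, string):
    """Hypothetical helper: True if pattern is found anywhere in string."""
    return re.search(pattern, string) is not None

has_mtu = contains('MTU', int_line)
has_mu = contains('MU', int_line)

print(has_mtu, has_mu)
```

Calling a Match method directly on the result of a failed search raises AttributeError, since None has no such methods.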

In [6]:
match = re.search('MTU', int_line)
print(match)

<re.Match object; span=(2, 5), match='MTU'>


In [8]:
match.span()

(2, 5)

In [9]:
match.span()[0]

2

In [10]:
match.span()[1]

5

In [11]:
int_line[match.span()[0]:match.span()[1]]

'MTU'

A much more convenient way:

In [7]:
match.group()

'MTU'

The re module has special designations for character sets:
- \d - any digit
- \D - any character except a digit
- \s - any whitespace character
- \S - any character except whitespace
- \w - any letter, digit or underscore
- \W - any character except a letter, digit or underscore
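A small sketch of these character sets applied to the interface line from above. The second test string, `'%SW_M-4'`, is a made-up fragment for illustration.

```python
import re

int_line = '  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,'

first_digit = re.search(r'\d', int_line).group()      # first digit in the line
first_non_space = re.search(r'\S', int_line).group()  # skips the leading spaces
first_non_word = re.search(r'\W', int_line).group()   # the leading space itself

# \w also matches the underscore, but not '%' or '-'
word_chars = re.findall(r'\w', '%SW_M-4')

print(repr(first_digit), repr(first_non_space), repr(first_non_word))
print(word_chars)
```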

Repetition operators:
- regex+ - one or more repetitions of the preceding element
- regex* - zero or more repetitions of the preceding element
- regex? - zero or one repetition of the preceding element
- regex{n} - exactly n repetitions of the preceding element
- regex{n,m} - from n to m repetitions of the preceding element
- regex{n,} - n or more repetitions of the preceding element
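Each repetition operator can be tried on a single string. This is only a sketch; the `line` value mimics a MAC address table row and the variable names are chosen for illustration.

```python
import re

line = '100 aab1.a1a1.a5d3 FastEthernet0/1'

plus = re.search(r'a+', line).group()         # one or more 'a'
star = re.search(r'10*', line).group()        # '1' followed by zero or more '0'
optional = re.findall(r'a1?', line)           # 'a' with an optional '1' after it
exactly = re.search(r'\d{3}', line).group()   # exactly three digits
ranged = re.search(r'\d{1,2}', line).group()  # one or two digits (takes two if it can)
at_least = re.search(r'\w{5,}', line).group() # five or more word characters

print(plus, star, exactly, ranged, at_least)
print(optional)
```

By default the operators are greedy: they take as many repetitions as the rest of the pattern allows.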

In [12]:
match = re.search(r'MTU \d+', int_line)

In [13]:
match.group()

'MTU 1500'

In [14]:
match.group(0)

'MTU 1500'

In [15]:
match.group(1)

IndexError: no such group

In [16]:
match = re.search(r'MTU (\d+)', int_line)

In [17]:
match.group()

'MTU 1500'

In [18]:
match.group(0)

'MTU 1500'

In [19]:
match.group(1)

'1500'

In [20]:
match.group(2)

IndexError: no such group

In [21]:
line = '100 aab1.a1a1.a5d3 FastEthernet0/1'
re.search('a1+', line).group()

'a1'

In [22]:
re.search('(a1)+', line).group()

'a1a1'

Special symbols:
- . - any character except the newline character
- ^ - beginning of the line
- $ - end of the line
- [abc] - any one of the characters in square brackets
- [^abc] - any character except those in square brackets
- a|b - element a or element b
- (regex) - the expression is treated as one element. In addition, the substring that matches the expression is captured as a group
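The special symbols can be sketched on the same kind of line as before. The variable names are illustrative.

```python
import re

line = '100 aab1.a1a1.a5d3 FastEthernet0/1'

at_start = re.search(r'^\d+', line).group()           # digits at the beginning of the line
at_end = re.search(r'\S+$', line).group()             # non-whitespace run at the end of the line
any_char = re.search(r'a.b', line).group()            # 'a', any one character, then 'b'
from_set = re.search(r'[ab]+', line).group()          # run of characters from the set a, b
not_in_set = re.search(r'[^0-9 ]+', line).group()     # run of anything except digits and spaces
either = re.search(r'Fast|Gigabit', line).group()     # one alternative or the other

print(at_start, at_end, any_char, from_set, not_in_set, either)
```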

In [23]:
int_line = '  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,'

In [24]:
match = re.search(r'MTU (\d+) .* BW (\d+) .* DLY (\d+)', int_line)

In [25]:
match.group()

'MTU 1500 bytes, BW 10000 Kbit, DLY 1000'

In [26]:
print(match.group(0))
print(match.group(1))
print(match.group(2))
print(match.group(3))

MTU 1500 bytes, BW 10000 Kbit, DLY 1000
1500
10000
1000


In [27]:
match.groups()

('1500', '10000', '1000')

In [28]:
print(match.groups()[0])
print(match.groups()[1])
print(match.groups()[2])

1500
10000
1000
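Captured groups are always strings, so numeric values usually need converting. A minimal sketch of unpacking the groups into integers; the names `mtu`, `bw` and `dly` are chosen here for illustration.

```python
import re

int_line = '  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec,'
match = re.search(r'MTU (\d+) .* BW (\d+) .* DLY (\d+)', int_line)

# groups() returns strings; unpack them and convert each to an integer
mtu, bw, dly = (int(value) for value in match.groups())

# group() also accepts several group numbers and then returns a tuple
mtu_and_dly = match.group(1, 3)

print(mtu, bw, dly)
print(mtu_and_dly)
```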


**Exercise 1**: Find time (HH:MM:SS)

In [35]:
log = '*Jul 7 06:15:18.695: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/3, changed state to down'

In [62]:
match = re.search(r'(\d\d:\d\d:\d\d)', log)

In [63]:
print(match.groups())

('06:15:18',)


**Exercise 2**: Find MAC address

In [64]:
log2 = 'Jun 3 14:39:05.941: %SW_MATM-4-MACFLAP_NOTIF: Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'

In [182]:
match = re.search(r'.{4}\..{4}\..{4}', log2)

In [183]:
print(match)

<re.Match object; span=(51, 65), match='f03a.b216.7ad7'>
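The pattern above works for this log line, but `.{4}` accepts any four characters, so it can also match text that is not a MAC address. A stricter sketch, assuming lower-case hexadecimal digits in the dotted notation shown in the log; the `not_a_mac` string is made up to show the difference.

```python
import re

log2 = 'Jun 3 14:39:05.941: %SW_MATM-4-MACFLAP_NOTIF: Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'

loose = r'.{4}\..{4}\..{4}'
strict = r'[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}'

# Hypothetical text that has the same dotted shape but is not a MAC address
not_a_mac = 'see note.text.file'

loose_hit = re.search(loose, not_a_mac)    # matches 'note.text.file'
strict_hit = re.search(strict, not_a_mac)  # None: the characters are not all hex digits
mac = re.search(strict, log2).group()

print(loose_hit.group(), strict_hit, mac)
```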


**Exercise 3**: Find MAC address and Flapping Ports

In [184]:
log3 = 'Jun 3 14:39:05.941: %SW_MATM-4-MACFLAP_NOTIF: Host f03a.b216.7ad7 in vlan 10 is flapping between port Gi0/5 and port Gi0/15'

In [190]:
re.search(r'([0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}) .+ port (\w+/\d+) .+ port (\w+/\d+)', log3).groups()

('f03a.b216.7ad7', 'Gi0/5', 'Gi0/15')

**Exercise 4**: Find the number of interfaces that are down

In [191]:
sh_ip_int_br = """Router# show ip interface brief
Interface             IP-Address      OK?    Method Status     	Protocol
GigabitEthernet0/1    unassigned      YES    unset  up         	up
GigabitEthernet0/2    192.168.190.235 YES    unset  up         	up
GigabitEthernet0/3    unassigned      YES    unset  up         	up
GigabitEthernet0/4    192.168.191.2   YES    unset  up         	up
TenGigabitEthernet2/1 unassigned      YES    unset  up         	up
TenGigabitEthernet2/2 unassigned      YES    unset  up         	up
TenGigabitEthernet2/3 unassigned      YES    unset  up         	up
TenGigabitEthernet2/4 unassigned      YES    unset  down       	down
GigabitEthernet36/1   unassigned      YES    unset  down        down
GigabitEthernet36/2   unassigned      YES    unset  down        down
GigabitEthernet36/11  unassigned      YES    unset  down       	down
GigabitEthernet36/25  unassigned      YES    unset  down       	down
Te36/45               unassigned      YES    unset  down       	down
Te36/46               unassigned      YES    unset  down       	down
Te36/47               unassigned      YES    unset  down       	down
Te36/48               unassigned      YES    unset  down       	down
Virtual36             unassigned      YES    unset  up         	up"""

In [222]:
match = re.findall(r'(\S+\d+/\d+).*down', sh_ip_int_br)
print(match)
len(match)

['TenGigabitEthernet2/4', 'GigabitEthernet36/1', 'GigabitEthernet36/2', 'GigabitEthernet36/11', 'GigabitEthernet36/25', 'Te36/45', 'Te36/46', 'Te36/47', 'Te36/48']


9

In [237]:
match = re.findall('.*down', sh_ip_int_br)
len(match)

9
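The pattern `.*down` counts every line that contains the word down anywhere. A tighter sketch, assuming the status and protocol are the last two columns of each row, anchors the pattern to line boundaries with `re.MULTILINE`. The `sample` table is a shortened, hypothetical copy of the output above.

```python
import re

# Shortened, hypothetical sample in the same layout as the output above
sample = """Interface             IP-Address      OK?    Method Status      Protocol
GigabitEthernet0/1    unassigned      YES    unset  up          up
TenGigabitEthernet2/4 unassigned      YES    unset  down        down
Te36/45               unassigned      YES    unset  down        down
Virtual36             unassigned      YES    unset  up          up"""

# With re.MULTILINE, ^ and $ match at the start and end of every line
down_interfaces = re.findall(r'^(\S+) .* down\s+down$', sample, flags=re.MULTILINE)

print(down_interfaces)
print(len(down_interfaces))
```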

**Exercise 5**: Show the names of interfaces that have an IP address and whose status is up/up.

In [None]:
sh_ip_int_br = """Router# show ip interface brief
Interface             IP-Address      OK?    Method Status     	Protocol
GigabitEthernet0/1    unassigned      YES    unset  up         	up
GigabitEthernet0/2    192.168.190.235 YES    unset  up         	up
GigabitEthernet0/3    unassigned      YES    unset  up         	up
GigabitEthernet0/4    192.168.191.2   YES    unset  up         	up
TenGigabitEthernet2/1 unassigned      YES    unset  up         	up
TenGigabitEthernet2/2 unassigned      YES    unset  up         	up
TenGigabitEthernet2/3 unassigned      YES    unset  up         	up
TenGigabitEthernet2/4 192.168.192.2   YES    unset  down       	down
GigabitEthernet36/1   unassigned      YES    unset  down        down
GigabitEthernet36/2   unassigned      YES    unset  down        down
GigabitEthernet36/11  unassigned      YES    unset  down       	down
GigabitEthernet36/25  unassigned      YES    unset  down       	down
Te36/45               unassigned      YES    unset  down       	down
Te36/46               unassigned      YES    unset  down       	down
Te36/47               unassigned      YES    unset  down       	down
Te36/48               unassigned      YES    unset  down       	down
Virtual36             unassigned      YES    unset  up         	up"""

In [221]:
match = re.findall(r'(\S+\d+/\d+.* \d+\.\d+\.\d+\.\d+) .*up.*up', sh_ip_int_br)
print(match)

['GigabitEthernet0/2    192.168.190.235', 'GigabitEthernet0/4    192.168.191.2']
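The result above contains the IP address along with the interface name, because the address sits inside the capturing group. One possible variation, sketched on a shortened, hypothetical `sample`, keeps only the name in the group and anchors the pattern to each line with `re.MULTILINE`.

```python
import re

# Shortened, hypothetical sample in the same layout as the output above
sample = """GigabitEthernet0/1    unassigned      YES    unset  up          up
GigabitEthernet0/2    192.168.190.235 YES    unset  up          up
TenGigabitEthernet2/4 192.168.192.2   YES    unset  down        down"""

# Group 1 is the interface name only; the IP address and the
# trailing "up ... up" columns must be present but are not captured
names = re.findall(
    r'^(\S+)\s+\d+\.\d+\.\d+\.\d+ .* up\s+up$',
    sample,
    flags=re.MULTILINE,
)

print(names)
```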
