# Regular Expressions in Python

### First, we import the library

In [1]:
import re

### Creating patterns

In [2]:
pattern = re.compile(r'\$(.*?)\$')  # raw string, so \$ reaches re as an escaped dollar sign
pattern

re.compile(r'\$(.*?)\$', re.UNICODE)

### Searching for patterns in strings


In [3]:
# Note: the name `str` shadows Python's built-in str type; acceptable only in a short demo
str = "Mathematicians like to go from $\\alpha$ to $\\zeta$"
str

'Mathematicians like to go from $\\alpha$ to $\\zeta$'

In [4]:
x = pattern.findall(str)
print(x)

['\\alpha', '\\zeta']
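The `?` after `.*` in the pattern makes the quantifier non-greedy, which is why each formula is captured separately. A minimal sketch of the difference, assuming the same sentence as above (the names `text`, `lazy`, and `greedy` are chosen here for illustration):

```python
import re

text = "Mathematicians like to go from $\\alpha$ to $\\zeta$"

lazy = re.compile(r'\$(.*?)\$')   # non-greedy: stop at the nearest closing $
greedy = re.compile(r'\$(.*)\$')  # greedy: run on to the last $ in the string

lazy_matches = lazy.findall(text)
greedy_matches = greedy.findall(text)

print(lazy_matches)    # two separate captures
print(greedy_matches)  # one capture spanning both formulas
```

With the greedy version, the single capture swallows everything between the first and the last dollar sign, including the word "to".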


### What type is it?

In [5]:
y = pattern.search(str)
print(y)

<re.Match object; span=(31, 39), match='$\\alpha$'>
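`search` returns a match object for the first occurrence only, or `None` if there is none. A short sketch of how its pieces can be read out, assuming the same pattern and sentence as above (the variable names are illustrative):

```python
import re

pattern = re.compile(r'\$(.*?)\$')
text = "Mathematicians like to go from $\\alpha$ to $\\zeta$"

m = pattern.search(text)
if m is not None:          # always check: search returns None when nothing matches
    whole = m.group(0)     # the entire matched text, dollar signs included
    inner = m.group(1)     # only what the parentheses captured
    position = m.span()    # (start, end) indices into text
    print(whole, inner, position)
```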


### Matching at the start of a string


In [6]:
z = pattern.match(str)
print(z)

None


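`match` only succeeds if the pattern matches at the <b>beginning</b> of the string, so the call above returns `None`; use `search` to find a match anywhere in the string.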

In [21]:
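# The pattern matches "$x=y$" at position 0, but \b then fails:
# "$" and the following space are both non-word characters, so there is no word boundary
z = re.match(r"\$(.*?)\$\b","$x=y$ eqeqeqe")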
print(z)

None
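To compare the anchoring behaviour side by side, here is a small sketch contrasting `match`, `search`, and `fullmatch`. The two sample strings are made up for illustration:

```python
import re

pattern = re.compile(r'\$(.*?)\$')

starts_with_math = "$x=y$ and more text"
math_in_middle = "text first, then $x=y$"

at_start = pattern.match(starts_with_math)      # succeeds: the string begins with $
not_at_start = pattern.match(math_in_middle)    # None: match is anchored at position 0
found = pattern.search(math_in_middle)          # succeeds: search scans the whole string
whole = pattern.fullmatch("$x=y$")              # succeeds: the entire string fits the pattern
partial = pattern.fullmatch(starts_with_math)   # None: trailing text is left over

for name, result in [("match", at_start), ("match (middle)", not_at_start),
                     ("search", found), ("fullmatch", whole),
                     ("fullmatch (extra text)", partial)]:
    print(name, "->", result)
```

`fullmatch` is the call that behaves like an exact match; `match` only anchors the start.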


### Search and Replace

In [8]:
n_str = pattern.sub('beta',str)
print(n_str)

Mathematicians like to go from beta to beta


With a single backslash in an ordinary string, Python reads `\b` as a backspace character:

In [9]:
n_str = pattern.sub('\beta',str)
print(n_str)

Mathematicians like to go frometa toeta


In [10]:
n_str = pattern.sub(r"\\beta",str)
print(n_str)

Mathematicians like to go from \beta to \beta


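The `r` prefix means "raw": Python does not process backslash escapes in the literal, so every backslash reaches `re` unchanged. `re` still interprets escapes in the replacement string itself, which is why `r"\\beta"` is needed to insert a literal `\beta`, and why `r"\beta"` still produces a backspace: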

In [11]:
n_str = pattern.sub(r"\beta",str)
print(n_str)

Mathematicians like to go frometa toeta
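One way to sidestep escape handling in the replacement is to pass a function instead of a string: whatever the function returns is inserted as-is. The sketch below also shows a backreference (`\1`) in a replacement template. It assumes the same pattern and sentence as above; the variable names are illustrative.

```python
import re

pattern = re.compile(r'\$(.*?)\$')
text = "Mathematicians like to go from $\\alpha$ to $\\zeta$"

# A callable replacement: its return value is inserted literally,
# so the backslash in r'\beta' is not treated as an escape by re.
literal = pattern.sub(lambda m: r'\beta', text)

# A template replacement: \1 stands for whatever group 1 captured.
bracketed = pattern.sub(r'[\1]', text)

print(literal)
print(bracketed)
```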


### How do we use it?

Let's start with a string

In [12]:
string = 'May the Fourth be with you!'

### Examples

In [13]:
re.findall('Fourth',string)

['Fourth']

### How about all three letter words?

In [14]:
re.findall('[A-Za-z]{3}',string)

['May', 'the', 'Fou', 'rth', 'wit', 'you']

### Not quite

In [15]:
re.findall('\\b[A-Za-z]{3}\\b',string)

['May', 'the', 'you']

`\b` matches at a word boundary; in an ordinary (non-raw) string it has to be written as `\\b`.

For line-by-line searches, there are easier ways to proceed.
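As one possible reading of "easier ways", plain string methods are often enough when each line is checked separately. A minimal sketch, assuming a few log-style lines that are made up here for illustration:

```python
# Hypothetical sample data: three log-style lines in one string
log = """May 1, 1596 ujS2DVVOG8 Studying
April 10, 1697 m9JzMIOv2m Reading
October 4, 1554 HbJvtcTVyk Reading"""

lines = log.splitlines()

# No regular expression needed for simple prefix or substring tests
october_lines = [line for line in lines if line.startswith('October')]
reading_lines = [line for line in lines if 'Reading' in line]

print(october_lines)
print(reading_lines)
```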

## How do we read in longer text?

### Three things - open a file, read it into a string, close the file

In [16]:
f = open('activity_log.txt','r') # open
text = f.read()        # read
f.close()              # close file
print(text)            # check what we get

Day Year UniqueID Activity
May 1, 1596 ujS2DVVOG8 Studying
April 10, 1697 m9JzMIOv2m Reading
August 5, 1544 oTBi4MDxTg Studying
July 17, 1843 n9evYzmdQK MATH!
September 4, 2004 AaHaOBb8vk Reading
October 4, 1554 HbJvtcTVyk Reading
January 3, 1946 PopmqPkUBG Weightlifting
August 23, 1952 9C6XpNDrLO Reading
November 28, 1533 1ayB1SIPUi Programming
September 9, 2016 VjzgHNSWj3 Studying
August 26, 1967 LyLJORwD0O Weightlifting
April 12, 1540 0rtC7y7AVW Weightlifting
May 27, 1781 WpIycPc7qQ Reading
October 28, 1534 dths1p1G4l Swimming
October 13, 1783 g5yWITccms Programming
August 0, 1691 dMzJUwf4NG Weightlifting
January 13, 1559 ARpj40WycE Studying
July 4, 1521 s62xPkf9jV Swimming
August 2, 1668 WKRryzNvpE Reading
May 26, 1904 ux3diRCZYw Programming
August 16, 1550 r2V6MSFIPR MATH!
September 16, 1634 u1SqgFZocb Programming
June 20, 1885 4LGArkpSZR MATH!
May 6, 1668 QF3FxXNvKv MATH!
February 26, 1802 bo4ZhzMsUU Weightlifting
January 27, 1934 B9ApcbhPE2 Weightlifting
October 10, 1671 SbOkMY2
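The same open, read, close sequence is commonly written with a `with` block, which closes the file automatically even if reading fails. The sketch below first writes a two-line stand-in file into a temporary directory so that it runs anywhere; the sample contents and the temporary path are assumptions, not the real `activity_log.txt`.

```python
import os
import tempfile

# Stand-in contents so the example does not depend on an existing file
sample = "Day Year UniqueID Activity\nMay 1, 1596 ujS2DVVOG8 Studying\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'activity_log.txt')

    with open(path, 'w') as f:   # create the stand-in file
        f.write(sample)

    with open(path, 'r') as f:   # open and read; the file closes when the block ends
        text = f.read()

print(text)
```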

### Let's count the number of activities in October

In [17]:
matches = re.findall('October',text)
print(matches)
len(matches)

['October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October', 'October']


60
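Searching for the bare word also counts any "October" that appears elsewhere in a line. A slightly stricter sketch anchors the pattern to the start of each line with `re.MULTILINE`, and then tallies every month with `collections.Counter`. The short `text` below is a stand-in built from a few lines of the log, not the full file.

```python
import re
from collections import Counter

# Stand-in for the file contents: header plus five entries
text = """Day Year UniqueID Activity
May 1, 1596 ujS2DVVOG8 Studying
October 4, 1554 HbJvtcTVyk Reading
October 28, 1534 dths1p1G4l Swimming
October 13, 1783 g5yWITccms Programming
August 2, 1668 WKRryzNvpE Reading"""

# With re.MULTILINE, ^ matches at the start of every line, not just the string
october = re.findall(r'^October\b', text, flags=re.MULTILINE)

# Capture the month name of every entry; the header has no "<word> <digits>," prefix
months = re.findall(r'^([A-Z][a-z]+) \d+,', text, flags=re.MULTILINE)
counts = Counter(months)

print(len(october))
print(counts)
```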

`re` is useful for extracting and counting matches inside a Python program; for simply searching files for a pattern, there is the command-line tool `grep`.

## From Here

- There is much more functionality to re
- Use the tool which seems most comfortable
- You should know how to build some basic regular expressions
- Many editors support regular expressions, for example VS Code
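As a small taste of that additional functionality, the sketch below tries three more features of `re` on the sentence used earlier: splitting, case-insensitive matching, and iterating over match objects. The selection is illustrative and far from a complete tour.

```python
import re

string = 'May the Fourth be with you!'

# re.split: break the sentence at runs of non-word characters
words = re.split(r'\W+', string.rstrip('!'))

# re.IGNORECASE: match regardless of upper or lower case
may = re.findall('may', string, flags=re.IGNORECASE)

# re.finditer: like findall, but yields match objects with positions
three_letter = [(m.group(), m.start()) for m in re.finditer(r'\b\w{3}\b', string)]

print(words)
print(may)
print(three_letter)
```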