# Regular Expressions in Python

### First, we import the library

In [1]:
import re

### Creating patterns

In [2]:
pattern = re.compile(r'\$(.*?)\$')
pattern

re.compile(r'\$(.*?)\$', re.UNICODE)

### Searching for patterns in strings


In [3]:
str = "Mathematicians like to go from $\\alpha$ to $\\zeta$"
str

'Mathematicians like to go from $\\alpha$ to $\\zeta$'

In [4]:
x = pattern.findall(str)
print(x)

['\\alpha', '\\zeta']
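The `?` after `.*` is what keeps each match short. As a small sketch (the names `lazy` and `greedy` are made up for this illustration), compare the pattern above with a greedy variant on the same sentence:

```python
import re

text = "Mathematicians like to go from $\\alpha$ to $\\zeta$"

lazy = re.compile(r'\$(.*?)\$')    # non-greedy: stop at the nearest closing $
greedy = re.compile(r'\$(.*)\$')   # greedy: run on to the last $ in the string

lazy_hits = lazy.findall(text)
greedy_hits = greedy.findall(text)

print(lazy_hits)     # two short matches
print(greedy_hits)   # one long match spanning both formulas
```

With the greedy version, everything between the first and the last `$` ends up in a single match, which is rarely what we want here.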


### What type is it?

In [5]:
y = pattern.search(str)
print(y)

<re.Match object; span=(31, 39), match='$\\alpha$'>
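A match object carries more than its printed form suggests. The following sketch pulls out the pieces that are usually needed; the variable names are only illustrative:

```python
import re

pattern = re.compile(r'\$(.*?)\$')
text = "Mathematicians like to go from $\\alpha$ to $\\zeta$"

m = pattern.search(text)
if m is not None:              # search returns None when nothing matches
    whole = m.group(0)         # the full matched text, including the $ signs
    inner = m.group(1)         # only the first capture group
    start, end = m.span()      # where the match sits in the string
    print(whole, inner, start, end)
```

Checking for `None` before calling `group` avoids an `AttributeError` when the pattern is not found.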


### Matching at the start of a string


In [8]:
z = pattern.match(str)
print(z)

None


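`match` only succeeds if the pattern matches <b>at the start</b> of the string, so it returns `None` here. Use `search` to look anywhere in the string, or `fullmatch` to require that the whole string matches.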

In [9]:
z = re.match(r"\$(.*?)\$","$x=y$")
print(z)

<re.Match object; span=(0, 5), match='$x=y$'>
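To see how the three entry points differ, here is a short side-by-side sketch. The test strings are invented for this example:

```python
import re

pattern = re.compile(r'\$(.*?)\$')
text = "go from $x$ to $y$"

at_start = pattern.match(text)       # None: the string does not begin with $
anywhere = pattern.search(text)      # finds the first formula, wherever it is
whole = pattern.fullmatch("$x=y$")   # succeeds: the pattern covers the entire string
partial = pattern.fullmatch("$x=y$ and more")  # None: trailing text is left over

print(at_start, anywhere.group(0), whole.group(0), partial)
```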


### Search and Replace

In [10]:
n_str = pattern.sub('beta',str)
print(n_str)

Mathematicians like to go from beta to beta


In an ordinary string, `'\b'` is the backspace character, so the backslash is lost:

In [11]:
n_str = pattern.sub('\beta',str)
print(n_str)

Mathematicians like to go from eta to eta


In [12]:
n_str = pattern.sub(r"\\beta",str)
print(n_str)

Mathematicians like to go from \beta to \beta


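The `r` prefix means "raw": Python leaves the backslashes in the literal untouched. `re.sub` still processes escape sequences in its replacement string, which is why `r"\\beta"` above produces `\beta`, while `r"\beta"` is read as a backspace followed by `eta`: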

In [13]:
n_str = pattern.sub(r"\beta",str)
print(n_str)

Mathematicians like to go from eta to eta


### How do we use it?

Let's start with a string

In [14]:
string = 'May the Fourth be with you!'

### Examples

In [15]:
re.findall('Fourth',string)

['Fourth']

### How about all three letter words?

In [16]:
re.findall('[A-Za-z]{3}',string)

['May', 'the', 'Fou', 'rth', 'wit', 'you']

### Not quite

In [17]:
re.findall('\\b[A-Za-z]{3}\\b',string)

['May', 'the', 'you']

`\\b` marks a word boundary, so the pattern now matches only runs of exactly three letters that form a whole word.
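The same pattern reads more easily as a raw string, where a single backslash is enough. A small sketch, which also collects every word and its length (the names `three`, `words` and `lengths` are illustrative):

```python
import re

string = 'May the Fourth be with you!'

# r'\b' and '\\b' are the same two characters: a backslash and a b.
same_pattern = r'\b[A-Za-z]{3}\b' == '\\b[A-Za-z]{3}\\b'

three = re.findall(r'\b[A-Za-z]{3}\b', string)   # whole words of exactly three letters
words = re.findall(r'\b[A-Za-z]+\b', string)     # whole words of any length
lengths = {w: len(w) for w in words}

print(three)
print(lengths)
```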

For line-by-line searches, there are easier ways to proceed.

## How do we read in longer text?

### Three things - open a file, read it into a string, close it

In [None]:
f = open('activity_log.txt','r') # open
text = f.read()        # read
f.close()              # close file
print(text)            # check what we get
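The same three steps are often written with a `with` block, which closes the file for us. The sketch below first writes a small temporary file so that it runs on its own; the file contents are made up and stand in for the real `activity_log.txt`:

```python
import os
import tempfile

# Hypothetical sample content, written to a temporary file for this example only.
sample = "2019-10-01 October run\n2019-11-02 November swim\n"
path = os.path.join(tempfile.mkdtemp(), 'activity_log.txt')
with open(path, 'w') as f:
    f.write(sample)

# The with-block closes the file automatically, even if reading raises an error.
with open(path, 'r') as f:
    text = f.read()

print(text)
```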

### Let's count the number of activities in October

In [None]:
matches = re.findall('October',text)
print(matches)
len(matches)
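Counting can be varied in a few ways. This sketch uses an invented log text, since the real file contents are not shown here, and compares an exact count, a case-insensitive count, and a tally by month:

```python
import re
from collections import Counter

# Hypothetical log text standing in for the contents of the file.
text = "October 3: run\nOctober 9: swim\nNovember 1: run\noctober 30: bike\n"

exact = len(re.findall('October', text))                          # case-sensitive
any_case = len(re.findall('october', text, flags=re.IGNORECASE))  # ignores case
months = Counter(re.findall(r'\b(?:October|November)\b', text))   # tally per month

print(exact, any_case, months)
```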

`re` is useful for finding matches inside Python; for searching through files from the command line there is `grep`.
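For comparison, a rough grep-like search can be sketched in Python by testing each line in turn. The log text is again invented for the example:

```python
import re

# Hypothetical log text; in practice this would come from the file read earlier.
text = "October 3: run\nNovember 1: run\nOctober 9: swim\n"
pattern = re.compile('October')

# Keep the line number and content of every line that contains a match.
hits = [(n, line)
        for n, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)]

for n, line in hits:
    print(n, line)
```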

## From Here

- There is much more functionality to re
- Use the tool which seems most comfortable
- You should know how to build some basic regular expressions
- Many editors support regular expressions, for example VS Code