# Regular Expressions

Regular expressions are useful for finding things in a piece of text.

In [1]:
string = 'search inside of this text, if you please'

print('search' in string)

True


Regular expressions go beyond a simple membership test like this. For example:

- how many times does `search` occur?
- matching more complicated patterns, not just fixed strings
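
As a rough sketch of the first question, `re.findall()` and `re.finditer()` can count occurrences and report where they are. The variable names `matches` and `positions` are only illustrative:

```python
import re

string = 'search inside of this text, if you please'

# findall() returns every non-overlapping match as a list of strings,
# so its length is the number of occurrences.
matches = re.findall('search', string)
print(len(matches))

# finditer() yields Match objects, so the position of each hit is available too.
positions = [m.span() for m in re.finditer('i', string)]
print(positions)
```

For a fixed substring, `str.count()` would also do; the regex functions pay off once the pattern is more than a literal string.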

In [2]:
import re

string = 'search inside of this text, if you please'

print('search' in string)

print(re.search('this', string))

True
<re.Match object; span=(17, 21), match='this'>


Here we get a `Match` object, which tells us more about `'this'` than a simple Boolean does. What can we do with it?

In [5]:
import re

string = 'search inside of this text, if of this you please'

a = re.search('this', string)
print(a.span())

(17, 21)


`span()` returns the start and end indices of the match as a tuple. The end index is exclusive, so `string[17:21]` is `'this'`.

In [7]:
print(a.start()) # Index where the match starts

17


In [8]:
print(a.end()) # Index just past where the match ends

21


In [10]:
print(a.group())

this


`group()` returns the part of the string that matched. `search()` stops at the first match, so we only get one instance of `'this'`, even though the string contains two.

`group()` becomes more useful once a pattern contains capturing groups: `group(n)` returns the text captured by the n-th group.
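
As a rough sketch of `group()` with capturing groups, the pattern below (invented for illustration, not taken from the cells above) wraps two words in parentheses and reads each one back by number:

```python
import re

string = 'search inside of this text, if of this you please'

# Two capturing groups: the word before ' text' and the word after 'of this '.
m = re.search(r'(\w+) text, if of this (\w+)', string)

if m is not None:
    whole = m.group()     # the entire matched text (same as m.group(0))
    first = m.group(1)    # text captured by the first pair of parentheses
    second = m.group(2)   # text captured by the second pair
    print(whole)
    print(first, second)
    print(m.groups())     # all captured groups as a tuple
```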

In [11]:
a = re.search('THis', string)
print(a.group())

AttributeError: 'NoneType' object has no attribute 'group'

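`a` is `None` here. If the regex doesn't find anything, `search()` returns `None`, and calling `.group()` on `None` raises the `AttributeError` above. Matching is case-sensitive by default, which is why `'THis'` does not match `'this'`.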
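
One way to avoid that error, sketched here under the assumption that a missing match should just be reported, is to check the result before using it. The `re.IGNORECASE` flag is shown as an optional extra for when case should not matter:

```python
import re

string = 'search inside of this text, if of this you please'

# search() returns None when nothing matches, so test before calling group().
a = re.search('THis', string)
if a is None:
    print('no match')
else:
    print(a.group())

# With re.IGNORECASE the same pattern matches regardless of case.
b = re.search('THis', string, re.IGNORECASE)
print(b.group() if b is not None else 'no match')
```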

Let's create a reusable pattern object to check for, using `compile()`:

In [17]:
import re

pattern = re.compile('this')

string = 'search inside of this text, if of this you please'

a = pattern.search(string)
b = pattern.findall(string)
c = pattern.fullmatch(string) # the whole string has to match the pattern
print(a.group())
print(b)
print(c)

this
['this', 'this']
None


If the pattern matched the whole string, `fullmatch()` would return a `Match` object:

In [21]:
import re

pattern = re.compile('search inside of this text, if of this you please')

string = 'search inside of this text, if of this you please'

a = pattern.search(string)
b = pattern.findall(string)
c = pattern.fullmatch(string) # the whole string has to match the pattern

print(c)

<re.Match object; span=(0, 49), match='search inside of this text, if of this you please>


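If we add a question mark at the end of the pattern, we still get a match, but not because the string is "found within" the pattern. In a regex, `?` is a quantifier that makes the preceding character optional, so the final `e` may appear once or not at all, and the pattern matches text ending in either `pleas` or `please`:

In [23]:
pattern = re.compile('search inside of this text, if of this you please?')
c = pattern.fullmatch(string) # re-run the match with the new pattern
print(c)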

<re.Match object; span=(0, 49), match='search inside of this text, if of this you please>
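
To make the effect of `?` easier to see, here is a small sketch on a shortened pattern. The candidate strings and the names `results` and `literal` are made up for illustration:

```python
import re

# '?' makes the preceding 'e' optional: zero or one occurrence.
pattern = re.compile('please?')

results = {}
for candidate in ['pleas', 'please', 'pleasee', 'please?']:
    results[candidate] = pattern.fullmatch(candidate) is not None
    print(candidate, results[candidate])

# To match a literal question mark, escape it (re.escape does this for us).
literal = re.compile(re.escape('please?'))
print(literal.fullmatch('please?') is not None)
```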


Another option is `match()`, which only looks for the pattern at the beginning of the string:

In [25]:
import re

pattern = re.compile('search inside of this text, if of this you please!')

string = 'search inside of this text, if of this you please! Hey, Bubba!'

a = pattern.search(string)
b = pattern.findall(string)
c = pattern.fullmatch(string) # the whole string has to match the pattern
d = pattern.match(string)
print(d)

<re.Match object; span=(0, 50), match='search inside of this text, if of this you please>


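`match()` still returns a `Match` object, because the string starts with the pattern. `fullmatch()` does not, because of the extra text after the pattern, so if we `print(c)`: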

In [26]:
print(c) # fullmatch() needs the entire string to match

None
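
As a side-by-side recap, the sketch below runs `search()`, `match()` and `fullmatch()` with the same short pattern. The variable names and the two extra test strings (`'this'` and `'this text'`) are illustrative only:

```python
import re

pattern = re.compile('this')
string = 'search inside of this text, if of this you please'

found_anywhere = pattern.search(string)    # first match anywhere in the string
at_start = pattern.match(string)           # only matches at the beginning
whole_string = pattern.fullmatch(string)   # the entire string must match

print(found_anywhere)
print(at_start)
print(whole_string)

# The same pattern against strings where match() and fullmatch() succeed.
print(pattern.match('this text'))
print(pattern.fullmatch('this'))
```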
