# Python standard library: re
Need some help finding the right regex? Use a regex tool online, for example: https://regexr.com/ 

Want to check which packages and libraries are available on your system?

In [7]:
help('modules')


Please wait a moment while I gather a list of all available modules...

IPython             bz2                 msvcrt              sysconfig
__future__          cProfile            multiprocessing     tabnanny
_abc                calendar            netbios             tarfile
_aix_support        cgi                 netrc               telnetlib
_ast                cgitb               nntplib             tempfile
_asyncio            chunk               normalizeForInterpreter testlauncher
_bisect             cmath               nt                  tests
_blake2             cmd                 ntpath              textwrap
_bootlocale         code                ntsecuritycon       this
_bootsubprocess     codecs              nturl2path          threading
_bz2                codeop              numbers             time
_codecs             collections         numpy               timeit
_codecs_cn          colorama            odbc                timer
_codecs_hk          colorsys        

## Python prefers it raw

In strings, the backslash character (\) is used to escape characters that have special meaning, such as a newline, a quotation mark or the backslash itself. When we prefix a string with the letter `r` or `R`, Python will treat the backslash as a literal character, which is very useful when writing regular expressions.


In [8]:
string = 'Hi\nHello\n'
print(string)

raw_string = R'Hi\nHello'
print(raw_string)

Hi
Hello

Hi\nHello
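
The difference matters as soon as a pattern contains a regex escape such as `\b`. The sketch below uses a made-up sample text (`text` and the result names are only for illustration): in a normal string `'\b'` becomes a backspace character before the regex engine sees it, while in a raw string it arrives as the word-boundary token.

```python
import re

text = "cat concat cat."

# Normal string: '\b' is converted to a backspace character (ASCII 8),
# so the pattern looks for backspace + "cat" + backspace and finds nothing.
normal_result = re.findall('\bcat\b', text)

# Raw string: the backslash is kept, so the regex engine sees \b (word boundary).
raw_result = re.findall(r'\bcat\b', text)

print(len('\n'), len(r'\n'))  # 1 2
print(normal_result)          # []
print(raw_result)             # ['cat', 'cat']
```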


## Who seeks shall find
The __search__ method scans through a string looking for the first location where the regex pattern produces a match. It returns a corresponding Match object, or None if nothing was found. The Match object has methods used to retrieve information about the match:
- span() returns a tuple containing the start and end positions of the match.
- group() returns the part of the string where there was a match.

The __findall__ method returns a list of all non-overlapping matches, in the order they are found.





In [34]:
import re

string = "There are 10 kinds of people, those who understand binary and those who don't! Are you 1 of the 1st group?"

#finding the first digit
result = re.search(r'\d', string)
print("First occurrence = " + result.group())
print("Location of match = " + str(result.span()))

#finding all digits
result = re.findall(r'\d', string)
print(result)
#finding all numbers
result = re.findall(r'\d+', string)
print(result)

#many ways to regex
result = re.findall(r'[0-9]', string)
print(result)


First occurrence = 1
Location of match = (10, 11)
['1', '0', '1', '1']
['10', '1', '1']
['1', '0', '1', '1']
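
Because search returns None when nothing matches, calling group() directly on the result raises an AttributeError in that case. A minimal sketch of a safer pattern, using a made-up sample text, could look like this:

```python
import re

text = "Order 66 shipped on day 3"

match = re.search(r'\d+', text)
if match is not None:                      # search returns None when nothing matches
    first_number = match.group()
    start, end = match.start(), match.end()
    print(first_number, start, end)        # 66 6 8

no_match = re.search(r'\d+', "no digits here")
print(no_match)                            # None

all_numbers = re.findall(r'\d+', text)
print(all_numbers)                         # ['66', '3']
```

Here start() and end() return the same two positions that span() returns as a tuple.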


## To compile or not to compile?
The __compile__ method compiles a regular expression into a pattern object that can be easily reused. 

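The following example finds the same matches as the previous one. Using re.compile and keeping the resulting pattern object is more efficient when the expression is used many times in one program, although the re module also caches recently used patterns, so the gain is small for programs that use only a few expressions.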

In [36]:
import re

string = "There are 10 kinds of people, those who understand binary and those who don't! Are you 1 of the 1st group?"

#compile the pattern once, then reuse it
regex_object = re.compile(r'\d')
result = regex_object.search(string)
print("First occurrence = " + result.group())
result = regex_object.findall(string)
print(result)

result = regex_object.findall("My number is 0456 12 45 78")
print(result)


First occurrence = 1
['1', '0', '1', '1']
['0', '4', '5', '6', '1', '2', '4', '5', '7', '8']
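
A compiled pattern also keeps its flags, which makes it convenient to apply the same expression to many inputs. The sketch below is only an illustration: the sample lines and the name `word_pattern` are made up.

```python
import re

# Compile once with a flag, then reuse the pattern object for every line.
word_pattern = re.compile(r'\bpython\b', re.IGNORECASE)

lines = [
    "Python is fun",
    "I like pythons",        # 'pythons' is a different word, so no match
    "python, PYTHON!",
]

counts = [len(word_pattern.findall(line)) for line in lines]
print(counts)                # [1, 0, 2]
print(word_pattern.pattern)  # \bpython\b
```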


## We've got a match!
What about the method __match__? It only looks for a match at the beginning of the string!

In [40]:
import re

string = "There are 10 kinds of people, those who understand binary and those who don't! Are you 1 of the 1st group?"

#match finds nothing because the string doesn't start with a digit...
regex_object = re.compile(r'\d')
print(regex_object.match(string))
#...while search scans the whole string
print(regex_object.search(string))


None
<re.Match object; span=(10, 11), match='1'>
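
To compare the three anchoring behaviours side by side, here is a small sketch with made-up sample strings. It also uses fullmatch, which only succeeds when the entire string matches the pattern.

```python
import re

digits = re.compile(r'\d+')

# match: only succeeds if the pattern matches at position 0
at_start = digits.match("2024 was a year")
not_at_start = digits.match("year 2024")

# search: scans the whole string for the first match
anywhere = digits.search("year 2024")

# fullmatch: the entire string must match the pattern
whole = digits.fullmatch("2024")
not_whole = digits.fullmatch("2024 was a year")

print(at_start.group())   # 2024
print(not_at_start)       # None
print(anywhere.group())   # 2024
print(whole.group())      # 2024
print(not_whole)          # None
```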


## You need a match to split
The __split__ method returns a list in which the string has been split at each match.


In [45]:
import re

string = "There are 10 kinds of people, those who understand binary and those who don't! Are you 1 of the 1st group?"
#splitting at every digit
regex_object = re.compile(r'\d')
result = regex_object.split(string)
print(result)

['There are ', '', " kinds of people, those who understand binary and those who don't! Are you ", ' of the ', 'st group?']
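
Note the empty string in the result: it is the text between the adjacent digits 1 and 0. Two variations are worth knowing, sketched here on a made-up sample string: maxsplit limits the number of splits, and a capturing group in the pattern keeps the separators in the result.

```python
import re

text = "a1b22c3d"

plain = re.split(r'\d+', text)
limited = re.split(r'\d+', text, maxsplit=1)   # split at the first match only
with_separators = re.split(r'(\d+)', text)     # capturing group keeps the matches

print(plain)            # ['a', 'b', 'c', 'd']
print(limited)          # ['a', 'b22c3d']
print(with_separators)  # ['a', '1', 'b', '22', 'c', '3', 'd']
```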


## Replace or be replaced, substitute

The __sub__ method replaces every match with a given string; the optional count argument limits the number of replacements.

In [46]:
import re

string = "There are 10 kinds of people, those who understand binary and those who don't! Are you 1 of the 1st group?"

#replacing every digit
regex_object = re.compile(r'\d')
result = regex_object.sub("DIGIT", string)
print(result)

There are DIGITDIGIT kinds of people, those who understand binary and those who don't! Are you DIGIT of the DIGITst group?
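
The replacement does not have to be a fixed string. The sketch below, on a made-up sample sentence, shows three variations: limiting the number of replacements with count, referring back to a captured group with `\1`, and passing a function that receives each Match object.

```python
import re

text = "Are you 1 of the 1st group of 10?"

# count limits how many matches are replaced
first_only = re.sub(r'\d', "DIGIT", text, count=1)

# \1 in the replacement refers to the first capturing group
bracketed = re.sub(r'(\d+)', r'[\1]', text)

# The replacement may also be a function that receives the Match object
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)

print(first_only)  # Are you DIGIT of the 1st group of 10?
print(bracketed)   # Are you [1] of the [1]st group of [10]?
print(doubled)     # Are you 2 of the 2st group of 20?
```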


## Practice makes perfect

Try to find:
- all numbers (not the individual digits)
- all numbers starting a string
- all numbers ending a string
- all numbers surrounded by alphabetical characters
- all strings containing only alphabetical characters
- all strings containing only digits
- all strings containing only consonants
- all strings containing only vowels
- all strings consisting of the letters of lorem: lorem, merol, moerl...
- all strings consisting of the letters of a given word (use input)
- can you find more?


In [None]:
import re

string = "Pleff lorem monaq morel plaff lerom baple merol pliff ipsum ponaq mipsu ploff pimsu caple supim pluff sumip qon4q issum daple ussum 100aq ossom fap25 abcde tonaq fghij gaple klmno vonaq pqrst haple uvwxy nonaq zzzz. Laple pleff lorem monaq mo5el plaff sumip qonaq issum daple ussum ponaq gaple klm50 lorem monaq morel plaff lerom baple merol pliff ipsum ponaq mipsu ploff 12345 caple supim pluff sumip qonaq issum daple ussum ronaq ossom faple abc75 tonaq fghij gaple klmno vonaq pqrst haple uvwxy nonaq zzzzz laple pleff lorem monaq morel plaff sumip qonaq issum daple ussum ponaq gapl."

##Go GO GO 
