# Practicing with regex


## Links

Source of the tutorial: https://automatetheboringstuff.com/2e/chapter7/

Official docs of the "re" module: https://docs.python.org/3.8/library/re.html# (includes these examples):

- checking for a pair
- making a phone book
- text munging (mixing letters)
- finding all adverbs
- writing a tokenizer

## Raw string and why it is used in "re" module

A raw string (r"...") makes Python treat every backslash as a literal backslash
character instead of the start of a string escape sequence, so the regex engine receives the pattern exactly as written.
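
A minimal sketch of the difference, using `\b` as the example because it means "backspace" in a normal Python string but "word boundary" in a regex. The variable names are only illustrative.

```python
import re

# In a normal string, "\b" is a single backspace character.
normal = "\bcat\b"
# In a raw string, r"\b" is a backslash followed by "b" (two characters),
# which the regex engine reads as a word boundary.
raw = r"\bcat\b"

print(len(normal), len(raw))  # 5 7

text = "the cat sat"
no_match = re.search(normal, text)   # looks for literal backspace characters
match = re.search(raw, text)         # looks for the whole word "cat"
print(no_match)       # None
print(match.group())  # cat

# Without a raw string, every backslash has to be doubled.
print(r"\d" == "\\d")  # True
```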

## Special characters

In regular expressions, the following characters have special meanings. To match one of them literally, escape it with a backslash ("\"):

.  ^  $  *  +  ?  {  }  [  ]  \  |  (  )
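
As a small illustration of escaping, the sketch below matches a literal dollar sign and dot, then uses `re.escape()` to escape a whole string at once. The sample text is made up for the example.

```python
import re

price = "Total: $5.00 (approx.)"

# Unescaped, "$" anchors the end of the string and "." matches any character.
# Escaped, they match the literal characters.
m = re.search(r"\$\d\.\d\d", price)
print(m.group())  # $5.00

# re.escape() puts a backslash in front of every special character.
literal = "(approx.)"
escaped = re.escape(literal)
print(escaped)  # \(approx\.\)

found = re.search(escaped, price)
print(found.group())  # (approx.)
```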


## Special sequences:

\d - digit (any single character from 0 to 9; for str patterns it also matches other Unicode decimal digits unless the re.ASCII flag is set)

See a regex cheatsheet at https://pythex.org/
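
A short sketch of `\d` in action. It also shows `\w` (word character) and `\s` (whitespace), which are not listed above but appear in later examples in these notes. The sample string is invented.

```python
import re

sample = "Room 42, floor 7"

# \d matches one digit; \d+ matches a run of digits.
single_digits = re.findall(r"\d", sample)
digit_runs = re.findall(r"\d+", sample)
print(single_digits)  # ['4', '2', '7']
print(digit_runs)     # ['42', '7']

# \w matches a letter, digit, or underscore; \s matches whitespace.
words = re.findall(r"\w+", sample)
pieces = re.split(r"\s+", sample)
print(words)   # ['Room', '42', 'floor', '7']
print(pieces)  # ['Room', '42,', 'floor', '7']
```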

## Quantifiers:

* "*"	    0 or more (append ? for non-greedy)
* "+"	    1 or more (append ? for non-greedy)
* "?"	    0 or 1 (append ? for non-greedy)
* "{m}"	    exactly m occurrences
* "{m,n}"	from m to n occurrences (no space after the comma); m defaults to 0, n to infinity
* "{m,n}?"	from m to n occurrences, as few as possible
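
The quantifiers above can be tried out with a sketch like the following. The sample strings are made up, and the comments show what each call returns.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" takes as many characters as possible.
greedy = re.search(r"<.*>", html).group()
# Non-greedy: ".*?" takes as few characters as possible.
lazy = re.search(r"<.*?>", html).group()
print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>

digits = "1234567"
exactly_three = re.search(r"\d{3}", digits).group()
two_to_four = re.search(r"\d{2,4}", digits).group()
two_to_four_lazy = re.search(r"\d{2,4}?", digits).group()
five_or_more = re.search(r"\d{5,}", digits).group()
print(exactly_three)     # 123
print(two_to_four)       # 1234
print(two_to_four_lazy)  # 12
print(five_or_more)      # 1234567
```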

## search() vs match()

Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).
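
A minimal sketch of the difference, reusing the phone-number sentence from the example further down in these notes:

```python
import re

text = "My number is 415-555-4242."
pattern = r"\d{3}-\d{3}-\d{4}"

# match() only looks at the beginning of the string, which starts with "My".
at_start = re.match(pattern, text)
# search() scans the whole string.
anywhere = re.search(pattern, text)
print(at_start)          # None
print(anywhere.group())  # 415-555-4242

# match() succeeds when the pattern fits the start of the string.
prefix = re.match(r"My", text)
print(prefix.group())  # My
```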





## Creating regex object

All the regex functions in Python are in the re module. 

Documentation of re module: https://docs.python.org/3.8/library/re.html

Passing a string value representing your regular expression to re.compile() returns a Regex pattern object (or simply, a Regex object).

## Matching Regex Objects

A Regex object’s search() method searches the string it is passed for any matches to the regex. It returns None if the pattern is not found in the string. If the pattern is found, it returns a Match object, which has a group() method that returns the actual matched text from the searched string.


In [20]:
import re

phoneNumRegex = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")

# mo stands for a match object
# match() and search() methods return None if there is no match
# use if statement to proceed further with a match
mo = phoneNumRegex.search('My number is 415-555-4242.')
if mo:
    print('Phone number found: ' + mo.group())

Phone number found: 415-555-4242


## Named groups

syntax:  (?P<group_name>regexp)

Note: P most likely stands for Python, because named groups were originally a Python-specific extension to regex syntax. More in this thread: https://stackoverflow.com/questions/10059673/named-regular-expression-group-pgroup-nameregexp-what-does-p-stand-for



In [11]:
m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
print(m.groupdict())
print(m.groups())
print(m.group("first_name"))
print(m.group("last_name"))

{'first_name': 'Malcolm', 'last_name': 'Reynolds'}
('Malcolm', 'Reynolds')
Malcolm
Reynolds


## Delete text from a string (e.g. from an email address)

In [29]:
email = "tony@tiremove_thisger.net"
m = re.search("remove_this", email)
print(m.start())
print(m.end())
print(f"{email[:m.start()]}{email[m.end():]}")

7
18
tony@tiger.net
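
The same result can also be obtained with `re.sub()`, which replaces every match with a given string (here, the empty string). This is an alternative sketch, not the approach used in the cell above.

```python
import re

email = "tony@tiremove_thisger.net"

# Replace every occurrence of the pattern with an empty string.
cleaned = re.sub(r"remove_this", "", email)
print(cleaned)  # tony@tiger.net

# The optional count argument limits how many matches are replaced.
first_only = re.sub(r"a", "", "banana", count=1)
print(first_only)  # bnana
```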


## Making a phone book example

In [59]:
text = """Ross McFluff: 834.345.1254 155 Elm Street

Ronald Heathmore: 892.345.3428 436 Finley Avenue
Frank Burger: 925.541.7625 662 South Dogwood Way
  

Heather Albrecht: 548.326.4584 919 Park Place"""

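# Split the text into entries: one or more newlines, plus any whitespace after them.
entries = re.split(r"\n+\s*", text)

# Split each entry into first name, last name, phone number, and address.
split_entries = [re.split(r":? ", entry, maxsplit=3) for entry in entries]

# Capture the three dot-separated parts of a phone number.
phone_num_regex = re.compile(r"^(\d{3})\.(\d{3})\.(\d+)$")

for entry in split_entries:
    phone_num = entry[2]
    # Group 3 holds the digits after the second dot.
    last_numbers = phone_num_regex.search(phone_num).group(3)
    print(last_numbers)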

1254
3428
7625
4584
