# Dis 3: Regular Expression Tutorial

# 0. Resources
1. Most of the code examples can be found in [this link](https://www.datacamp.com/community/tutorials/python-regular-expression-tutorial#REinPython). Please look at it!
1. A great website for testing patterns, with a regular expression cheatsheet: [the link](https://pythex.org/).
1. Some inspiration: how to implement a [chatbot](https://apps.worldwritable.com/tutorials/chatbot/)?
1. The most efficient way to learn regular expressions is to experiment with them yourself.

# 1. Regular Expression    
A regular expression describes a pattern to match within a text or string.   

In [1]:
import re

## Basic Patterns: Ordinary Characters

### r'\[text\]'
<big>The r before "Cookie" marks a raw string literal.   
Some characters have special meanings in programming languages such as Python.   
For example, in a string prefixed with r, \ is just a backslash rather than the start of an escape sequence.   </big>

In [19]:
pattern = r"Cookie"
sequence = "Cookie"

if re.match(pattern, sequence):
    print("Match!")
else:
    print("Not a match!")

Match!
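
The difference between a normal and a raw string can be checked directly. The snippet below is a small sketch; the variable names are only illustrative.

```python
import re

# In a normal string, \t is interpreted as a single tab character.
normal = "a\tb"
# In a raw string, the backslash and the t are kept as two characters.
raw = r"a\tb"
print(len(normal), len(raw))

# Without the r prefix, the backslash itself has to be escaped.
same_pattern = (r"\d" == "\\d")
print(same_pattern)

# Both spellings give the regex engine the same pattern.
digit = re.search(r"\d", "Cookie7").group()
print(digit)
```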


## Wild Card Characters: Special Characters

### '.' (the dot)
<big>Matches any single character except the newline character '\n'.   
In re.search(pattern, string), .group() returns the part of the string (the second argument) that matches the pattern (the first argument).   </big>

In [14]:
re.search(r'Co.k.e', 'Coik_e').group()

'Coik_e'

In [21]:
re.search(r'windows.0', 'windows10').group()

'windows10'
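
Two details about the dot are easy to miss: it does not match a newline unless the `re.DOTALL` flag is passed, and a literal dot must be escaped as `\.`. A short sketch with made-up test strings:

```python
import re

# By default the dot does not match '\n', so this search finds nothing.
no_match = re.search(r'Co.kie', 'Co\nkie')
print(no_match)

# With re.DOTALL the dot matches the newline as well.
with_dotall = re.search(r'Co.kie', 'Co\nkie', re.DOTALL)
print(with_dotall is not None)

# An unescaped dot matches any character; an escaped dot matches only '.'.
unescaped = re.search(r'3.14', '3x14')
escaped = re.search(r'3\.14', '3x14')
print(unescaped is not None, escaped is not None)
```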

### \d and \w
* <big>\d matches any single digit (0 to 9).</big>
* <big>\w (lowercase w) matches any single letter, digit, or underscore.</big>

In [50]:
print(re.match(r'00\d', '00A'))

re.search(r'00\d', '007').group()

None


'007'
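
To see exactly which characters each class accepts, `re.findall` can list every match. This is a minimal sketch on invented sample strings.

```python
import re

# \d picks out every digit in the text.
digits = re.findall(r'\d', 'user_42 paid $7')
print(digits)

# \w accepts letters, digits, and the underscore, but not spaces or punctuation.
word_chars = re.findall(r'\w', 'a_1 !')
print(word_chars)
```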

### '^'    
<big>Matches a pattern at the start of the string.</big>

In [16]:
re.search(r'^Ch', 'Chocolate Cookie').group()

'Ch'

### $   
Matches a pattern at the end of the string.

In [15]:
re.search(r'\d$', 'Cookie10').group()

'0'

In [33]:
re.search(r'cake$', 'Eat cake').group()

'cake'
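
Using `^` and `$` together forces the pattern to cover the whole string. The sketch below uses a hypothetical three-digit pattern to show the effect.

```python
import re

# Anchored at both ends: the entire string must be exactly three digits.
pattern = r'^\d\d\d$'

exact = re.search(pattern, '007')
too_long = re.search(pattern, '0070')
embedded = re.search(pattern, 'agent 007')
print(exact is not None, too_long is not None, embedded is not None)

# Without anchors, the same digits are found anywhere in the string.
unanchored = re.search(r'\d\d\d', 'agent 007').group()
print(unanchored)
```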

### [abc] - Matches a or b or c.

In [53]:
re.search(r'c[abc]ke', 'cake').group()

'cake'

In [6]:
re.search(r'c[abc]ke', 'cabcke').group()

AttributeError: 'NoneType' object has no attribute 'group'
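
The error above occurs because `[abc]` matches exactly one character, so `re.search` returns `None` and `None` has no `.group()`. One way to handle this, sketched below, is to test the result before calling `.group()`. The `+` quantifier (one or more repetitions) is used here to allow several characters from the set.

```python
import re

# [abc] stands for a single character, so 'cabcke' does not match.
one = re.search(r'c[abc]ke', 'cabcke')
result_one = one.group() if one else None
print(result_one)

# [abc]+ allows one or more characters from the set.
many = re.search(r'c[abc]+ke', 'cabcke')
result_many = many.group() if many else None
print(result_many)
```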

### [a-zA-Z0-9] - Matches any single character from a to z, A to Z, or 0 to 9.

In [17]:
re.search(r'Number: [\w\d]', 'Number: a0').group()

'Number: a'
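
The cell above uses `[\w\d]`, where `\d` is redundant because `\w` already includes digits. The sketch below writes the range out explicitly and shows one difference from `\w`: the explicit range does not include the underscore.

```python
import re

# The explicit range also matches a single character after 'Number: '.
first = re.search(r'Number: [a-zA-Z0-9]', 'Number: a0').group()
print(first)

# The underscore is part of \w but not of [a-zA-Z0-9].
underscore_in_range = re.search(r'[a-zA-Z0-9]', '_')
underscore_in_w = re.search(r'\w', '_').group()
print(underscore_in_range, underscore_in_w)

# Ranges can be narrowed, for example to lowercase hexadecimal digits.
hex_digits = re.findall(r'[0-9a-f]', 'x1fz')
print(hex_digits)
```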

### [^...]    
<big>Characters that are not within a set can be matched by complementing the set.   
If the first character of the set is ^, any character that is not in the set will be matched.   
For example, [^5] matches any character except 5.   </big>

In [55]:
re.search(r'Number: [^12345]', 'Number: 0').group()

'Number: 0'

In [63]:
re.search(r'Number: [^1]', 'Number: 1').group()

AttributeError: 'NoneType' object has no attribute 'group'
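
Negated sets are handy for filtering. The following sketch, on invented inputs, keeps only consonants, strips everything except digits, and shows that `^` is an ordinary character when it is not first in the set.

```python
import re

# Every character that is neither a lowercase vowel nor a space.
consonants = re.findall(r'[^aeiou ]', 'eat cake')
print(consonants)

# Remove everything that is not a digit.
digits_only = re.sub(r'[^0-9]', '', '(555) 123-4567')
print(digits_only)

# When ^ is not the first character, it is matched literally.
literal_caret = re.findall(r'[1^]', '1^2')
print(literal_caret)
```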

## Quantifiers

One of the most important quantifiers is $*$.   
It matches zero or more repetitions of the character (or group) to its left.

In [21]:
# Matches 'C', then zero or more 'a', then zero or more 'o', then 'kie'
re.search(r'Ca*o*kie', 'Cokie').group()

'Cokie'

In [13]:
# ? matches zero or one occurrence of the character to its left (here, 'u')
re.search(r'Colou?r', 'Color').group()

'Color'
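
The quantifiers can be compared side by side. This sketch uses `re.fullmatch` so that the whole word must fit the pattern, and it adds `+` (one or more), which is not covered above but appears in the email pattern later on.

```python
import re

words = ['Ckie', 'Cokie', 'Cookie']
results = {}
for word in words:
    results[word] = (
        bool(re.fullmatch(r'Co*kie', word)),  # zero or more 'o'
        bool(re.fullmatch(r'Co+kie', word)),  # one or more 'o'
        bool(re.fullmatch(r'Co?kie', word)),  # zero or one 'o'
    )
    print(word, results[word])
```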

## Match() and Group()

In [3]:
# Loosely check whether addr starts with something shaped like an email address.
# Returns a match object (truthy) or None (falsy).
def is_valid_email(addr):
    return re.match(r'([\w\.-]+)@([\w\.-]+)', addr)

assert is_valid_email('someone@gmail.com')
assert is_valid_email('bill.gates@microsoft.com')
assert not is_valid_email('bob#example.com')
print('ok')

ok
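
Because `re.match` only anchors the pattern at the start of the string, `is_valid_email` also accepts strings with trailing text. A stricter variant could use `re.fullmatch`, as sketched below. The function name `is_valid_email_strict` and its pattern are illustrative assumptions, not a complete email validator.

```python
import re

def is_valid_email_loose(addr):
    # Same pattern as is_valid_email: only the start of the string is checked.
    return re.match(r'([\w\.-]+)@([\w\.-]+)', addr) is not None

def is_valid_email_strict(addr):
    # Hypothetical stricter check: the whole string must match,
    # and the host must contain a dot followed by word characters.
    return re.fullmatch(r'[\w.-]+@[\w.-]+\.\w+', addr) is not None

print(is_valid_email_loose('someone@gmail.com is great'))
print(is_valid_email_strict('someone@gmail.com is great'))
print(is_valid_email_strict('someone@gmail.com'))
```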


In [36]:
# How to use group(): parentheses () in a pattern define a capture group
email_address = 'Please contact us at: support@datacamp.com'
match = re.search(r'([\w\.-]+)@([\w\.-]+)',email_address)
if match:
  print(match.group()) # The whole matched text
  print(match.group(1)) # The username (group 1)
  print(match.group(2)) # The host (group 2)

support@datacamp.com
support
datacamp.com
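
Groups can also be given names with `(?P<name>...)`, which may read more clearly than numeric indices. A sketch using the same sentence, with the group names `user` and `host` chosen for illustration:

```python
import re

email_address = 'Please contact us at: support@datacamp.com'
named = re.search(r'(?P<user>[\w.-]+)@(?P<host>[\w.-]+)', email_address)

if named:
    print(named.group('user'))   # same as named.group(1)
    print(named.group('host'))   # same as named.group(2)
    print(named.groups())        # all captured groups as a tuple
    print(named.groupdict())     # named groups as a dictionary
```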


# 2. Radio Ga Ga: simple ELIZA implementation   
Tell the machine to respond with lyrics from Queen's famous song Radio Ga Ga :)

In [2]:
import random
# Use this module when you want to pick a random response from a 'response' list.
# It is not used here.

In [3]:
lyrics = [
    [r'\wll we hear is(.*)', 'Radio Ga Ga'],
    [r'\wama I just(.*)', 'Sorry this sounds like Bohemian Rhapsody'],
]

In [4]:
def match(sing):
    # Return the response for the first pattern that matches, or None
    for pattern, response in lyrics:
        if re.match(pattern, sing):
            return response
    return None


def main():
    print("This is Queen's Radio Ga Ga")

    while True:
        sing = input('ELIZA: ')
        # Check for the exit command first so that 'quit' does not print None
        if sing == 'quit':
            print('Bye')
            break
        print(match(sing))


if __name__ == "__main__":
    main()

This is Queen's Radio Ga Ga
ELIZA: all we hear is
Radio Ga Ga
ELIZA: quit
Bye
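
The `random` module imported earlier could be used to vary the replies. The sketch below is one possible variant, not the original implementation: the `rules` table, the `respond` function, and the `FALLBACK` reply are hypothetical names, and it runs without `input()` so it can be tried non-interactively.

```python
import random
import re

# Hypothetical rule table: each pattern maps to a list of candidate replies.
rules = [
    (r'\wll we hear is(.*)', ['Radio Ga Ga', 'Radio Goo Goo']),
    (r'\wama I just(.*)', ['Sorry this sounds like Bohemian Rhapsody']),
]

# Assumed default reply when no pattern matches.
FALLBACK = "I don't know that one"

def respond(sing):
    # Pick a random reply for the first pattern that matches the input.
    for pattern, replies in rules:
        if re.match(pattern, sing):
            return random.choice(replies)
    return FALLBACK

print(respond('all we hear is'))
print(respond('Mama I just killed a man'))
print(respond('hello'))
```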


In [11]:
string = 'mask?'
# rstrip() takes a set of characters to strip, not a regular expression
string.rstrip('.?!')

'mask'
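
`str.rstrip` treats its argument as a plain set of characters, so square brackets passed to it are stripped too. When regex semantics are wanted, `re.sub` with an anchored character set is one option, sketched here on made-up inputs.

```python
import re

# Brackets in the rstrip argument are ordinary characters to strip.
stripped_bracket = '[mask]'.rstrip('[.?!]')
print(stripped_bracket)

# Regex version: remove one or more of . ? ! at the end of the string.
cleaned = re.sub(r'[.?!]+$', '', 'mask?!')
print(cleaned)

# With a regex, a trailing bracket is left alone.
kept_bracket = re.sub(r'[.?!]+$', '', '[mask]')
print(kept_bracket)
```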