# Regular Expressions and Text
The primary goal of this practice is to engage with and use regular expressions. Text processing--particularly with the support of regexes (regular expressions)--is one of Python's strong suits. On top of that, as of Python 3.6 (the version you downloaded), regexes are faster than ever.

**If you are not familiar with regular expressions or feel a bit weak in that regard, try a gentle exercise-oriented introduction like: https://regexone.com/lesson/introduction_abcs**

## For the Regex Expert
If, on the other hand, you're a regex pro, make sure you understand how regular expressions work (e.g., regular languages, formal language theory, etc.)--always have a decent grasp of what's going on one level below the one you actually work at. :) Check out these resources:
* https://en.wikipedia.org/wiki/Regular_expression
* https://softwareengineering.stackexchange.com/a/122469
* http://www.moserware.com/2009/03/how-net-regular-expressions-really-work.html <- C#, but still helpful

## Homework

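When using regular expressions, we usually want to start by compiling a pattern. This has two advantages:
* It makes our code more maintainable, since the pattern gets a name and lives in one place
* It gives us a compiled pattern object that we can hold on to and reuse, instead of handing Python the pattern string on every call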
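
As a minimal sketch of the difference (the pattern and the sample strings here are hypothetical, not part of the exercise): the compiled pattern is created once and its methods are reused, while the module-level function takes the pattern string on every call.

```python
import re

# Hypothetical sample data, only for illustration
samples = ['abc 123', 'no digits here', '42 is the answer']

# Compile once, give the pattern a descriptive name, then reuse it
digits = re.compile(r'\d+')
compiled_hits = [m.group() for m in map(digits.search, samples) if m]

# Equivalent module-level call: the pattern string is passed every time
inline_hits = [m.group() for m in (re.search(r'\d+', s) for s in samples) if m]

print(compiled_hits)
print(inline_hits)
```

Both lists should come out the same; the compiled version simply keeps the pattern in one named place.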

## Task: Create a List of All Area Codes in a Text Block
We're going to take some small steps to build up a regular expression.

If you get stuck, copy the regex and the text into regexr.com (or your favorite regex site!).

In [None]:
import re  # Python's built-in regular expression module

In [None]:
# Step 1: Create a regex to match a single number
text = '4'
pat = re.compile(r'\d')
# match only looks for the pattern at the beginning of the string;
# it does not require the whole string to match (fullmatch does that)
m = pat.match(text)  
m  # you should get a result containing the number
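
A short sketch of how `match`, `search`, and `fullmatch` differ on a compiled pattern. The sample strings are made up for illustration.

```python
import re

pat = re.compile(r'\d')

print(pat.match('4abc'))      # matches '4' at the start; the rest is ignored
print(pat.match('abc4'))      # None: there is no digit at position 0
print(pat.search('abc4'))     # finds '4' anywhere in the string
print(pat.fullmatch('4abc'))  # None: the whole string would have to match
print(pat.fullmatch('4'))     # matches, since the entire string is one digit
```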

In [None]:
# Step 2: Generalize to any three digits
texts = ['458', '324', '123', '444']
pat = re.compile(r'\d+')  # `\d+` matches one or more digits
for text in texts:
    m = pat.match(text)
    print(m.group())
    assert len(m.group()) == 3  # make sure there are only 3 characters

In [None]:
# Step 3: Only match the first three digits
# You can specify the count you expect rather than
#  being vague with the `+` or `*`.
# e.g., r't{2,4}' matches 2, 3, or 4 't's
texts = ['4582', '324', '123405', '444-33']
pat = re.compile(r'\d{3}')  # specify number using `{#}`
for text in texts:
    m = pat.match(text)
    print(m.group())
    assert len(m.group()) == 3
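
To make the quantifiers concrete, here is a small sketch on a hypothetical string of six 't's. It compares `+`, `{m,n}`, and `{n}`.

```python
import re

text = 'tttttt'  # six 't's, a made-up sample

plus = re.match(r't+', text).group()        # one or more, as many as possible
ranged = re.match(r't{2,4}', text).group()  # between 2 and 4, greedy
exact = re.match(r't{3}', text).group()     # exactly 3
too_few = re.match(r't{2,4}', 't')          # None: fewer than 2 't's

print(plus, ranged, exact, too_few)
```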

In [None]:
# Step 4: Use capturing parentheses
# Every pair of () captures whatever it matches inside.
# For non-capturing parentheses, use (?:...)
# Build a regex for the entire phone number, but capture
#   only the first three digits
# Use \D for a non-digit, and \D? for an optional non-digit
texts = [
    '4582112312', 
    '324 231 2234',
    '123 405 4999', 
    '444-332-2311']
pat = re.compile(r'(\d{3})\D?\d{3}\D?\d{4}')  
for text in texts:
    m = pat.match(text)
    print(m.group(1))  # group 1 is the first capturing group
    assert len(m.group(1)) == 3
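
A brief sketch of capturing versus non-capturing groups, using one of the sample numbers from this step. The variable names are only for illustration.

```python
import re

text = '444-332-2311'

# Three capturing groups: each () shows up in groups()
capturing = re.compile(r'(\d{3})\D?(\d{3})\D?(\d{4})')
m = capturing.match(text)
print(m.group(0))  # the whole match
print(m.groups())  # one entry per capturing group

# (?:...) still groups the pattern but is left out of groups()
non_capturing = re.compile(r'(\d{3})\D?(?:\d{3})\D?(?:\d{4})')
m2 = non_capturing.match(text)
print(m2.groups())
```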

In [None]:
# Step 5: Account for more variation
# We now need to escape characters that have special
#  meaning in the regex language: +, (, and ).
# To escape a character that has special meaning, put a
#  backslash in front of it. This tells the regex compiler to
#  treat the following symbol literally.
# Other hints:
#   * Don't forget to account for the optional spaces
#        using \s* or \D*
#   * I would account for the ( and ) directly since these
#        are common features of American telephone numbers.
texts = [
    '4582112312', 
    '+1 324 231 2234', # you want to skip the country code
        # to escape the + sign, use \+
    '(123) 405 4999', 
        # to escape the ( and ), use \( and \)
    '444-332-2311']
pat = re.compile(r'(?:\+\d{,3}\D*)?\(?(\d{3})\)?\D?\d{3}\D?\d{4}')  
for text in texts:
    m = pat.match(text)
    print(m.group(1))
    assert len(m.group(1)) == 3
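
Once a regex grows this long, one option is to spell it out with `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments. The sketch below is the same Step 5 pattern split across lines; the name `verbose_pat` is only for illustration.

```python
import re

verbose_pat = re.compile(r'''
    (?:\+\d{,3}\D*)?   # optional country code such as "+1 ", not captured
    \(?                # optional opening parenthesis
    (\d{3})            # area code (group 1)
    \)?                # optional closing parenthesis
    \D?                # optional separator
    \d{3}              # next three digits
    \D?                # optional separator
    \d{4}              # last four digits
''', re.VERBOSE)

texts = ['4582112312', '+1 324 231 2234', '(123) 405 4999', '444-332-2311']
area_codes = [verbose_pat.match(t).group(1) for t in texts]
print(area_codes)
```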

In [None]:
# Step 6: Awesomeness
# Let's use the regex you built!
text = '''
Apple Online Store
Apple.com is a convenient place to purchase Apple products and accessories from Apple and other manufacturers. You can buy online, chat, or call (800) MY–APPLE (800–692–7753), 7 days a week from 7:00 a.m. to 11:00 p.m. Central time.

You can get information about an order you placed on the Apple Online Store through the Order Status page. If you prefer, you can also get order status or make changes by phone at (800) 692–7753, 7 days a week from 7:00 a.m. to 11:00 p.m. Central time.

Apple Retail Stores
Experience the digital lifestyle at any of the Apple Retail Stores around the country. Find store hours and contact information for all locations.

2656 NE University Village Street
Seattle, WA 98105
(206) 892-0433

213 Bellevue Square
Bellevue, WA 98004
(425) 519-0080

3000 184th Street S.W.
Lynnwood, WA 98037
(425) 921-1560

300 Post Street
San Francisco, CA 94108
(415) 486-4800

Apple Stonestown
3251 20th Avenue
San Francisco, CA 94132

3251 20th Avenue
San Francisco, CA 94132
(415) 571-2780

2125 Chestnut Street
San Francisco, CA 94123
(415) 848-4445

6455 Macleod Trail SW
Calgary, Alberta T2H 0K8
(403) 444-3759

3625 Shaganappi Trail NW
Calgary, Alberta T3A 0E2
(403) 648-4865

5015 111 St
Edmonton, Alberta T6H 4M6
(780) 801-3820

320 4700 Kingsway
Burnaby, British Columbia V5H 4J2
(778) 373-4810

Richmond, Richmond Centre
6551 No. 3 Road
Richmond, British Columbia V6Y 2B6
(604) 248-3940

1485 Portage Avenue
Winnipeg, Manitoba R3G 0W4
(204) 777-4500

7001 Mumford Road
Halifax, Nova Scotia B3L 4N9
(902) 442-3495

5000 Highway 7 East
Markham, Ontario L3R 4M9
(905) 513-2860

100 City Centre Drive
Mississauga, Ontario L5B 2C9
(905) 366-0580

Ottawa, Bayshore Shopping Centre
100 Bayshore Drive
Ottawa, Ontario K2B 8C1
(613) 288-7950

Ottawa, Rideau
50 Rideau Street
Ottawa, Ontario K1N 9J7
(613) 688-5575

Toronto, Eaton Centre
220 Yonge Street
Toronto, Ontario M5B 2H1
(647) 258-0801

Toronto, Fairview
1800 Sheppard Avenue East
Toronto, Ontario M2J 5A7
(416) 646-4412

Waterloo, Conestoga
550 King Street North
Waterloo, Ontario N2L 5W6
(519) 772-5150

Brossard, DIX30
9120 boul. Leduc
Brossard, Quebec J4Y 0L3
(450) 618-1400

Laval, Carrefour Laval
3035, boulevard Le Carrefour, local C14B
Laval, Quebec H7T 1C8
(450) 902-4400

Montreal, Sainte-Catherine
1321 Rue Ste-Catherine Ouest
Montreal, Quebec H3G 1P7
(514) 906-8400

Pointe-Claire, Fairview Pointe-Claire
6801, Transcanada Highway
Pointe-Claire, Quebec H9R 5J2
(514) 630-8800

Quebec City, Place Ste-Foy
2450 Boulevard Laurier
Quebec City, Quebec G1V 2L1
(418) 266-8600

17711 Chenal Parkway
Little Rock, AR 72223
(501) 821-5130

8687 North Central Expressway
Dallas, TX 75225
(214) 765-0820

8687 North Central Expressway
Dallas, TX 75225
(214) 765-0820

3401 Nicholasville Road
Lexington, KY 40503
(859) 971-5400

4305 La Jolla Village Drive
San Diego, CA 92122
(858) 795-6870

Get Financing for You, Your Business, or Your School
Apple Financial Services offers financing on Apple products for consumers, educational institutions, and businesses. Speak with your Apple representative to learn more.

Find Consultants
Visit our Apple Consultants Network page to find a consultant in the U.S. or Canada.

Find Authorized Training Centers
Use our Training Center Locator to find Apple Authorized Training Centers worldwide.

How to Buy for Business
If you are a business or professional user, visit the Apple Store for Business or call 1–800–854–3680, 7 days a week from 7:00 a.m. to 7:00 p.m. Central time.

Corporate and Government Sales:

Apple Enterprise Sales (877) 412–7753
Apple Government Sales (877) 412–7753
How to Buy for Education
If you are a student or teacher, visit the Apple Store for Education or call 1–800–692–7753, 7 days a week from 7:00 a.m. to 10:00 p.m. Central time.

If you are buying on behalf of an educational institution, visit the Apple Store for Education Institutions or call 1–800–800–2775, 7 days a week from 9:00 a.m. to 6:00 p.m. Central time.

Find Apple Authorized Resellers
Use our Reseller Locator to find an Apple Authorized Reseller in the U.S.

Apple Authorized Resellers offer industry expertise, multi-platform services, and Mac-based solutions for a wide variety of organizations.
'''
for match in pat.finditer(text):
    print(match.group(1))
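
As an alternative to looping over `finditer`, `findall` returns the captured strings directly when the pattern has exactly one capturing group. A small sketch, assuming a simplified version of the pattern and a two-number sample taken from the text above:

```python
import re

# Simplified variant of the Step 5 pattern (no country code handling)
simple_pat = re.compile(r'\(?(\d{3})\)?\D?\d{3}\D?\d{4}')
sample = 'Call (206) 892-0433 or (425) 519-0080.'

codes = simple_pat.findall(sample)  # list of group-1 strings
print(codes)
```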

Step 7: How many different area codes did you find? (No, don't count by hand, fix the for-in loop!)

Step 7b: What was the most common code you found?
* Check out collections.Counter: https://docs.python.org/3.6/library/collections.html#collections.Counter
    * It takes an iterable/list, so you can create one of those and pass it in
    * Or, you can treat it like a dictionary c[key] += 1

In [None]:
# Step 7
results = set()  # sets will only keep unique elements
for match in pat.finditer(text):
    results.add(match.group(1))
len(results)

In [None]:
# Step 7b, option 1: treat the Counter like a dictionary
from collections import Counter
c = Counter()
for match in pat.finditer(text):
    c[match.group(1)] += 1
c.most_common()

In [None]:
# Step 7b, option 2: build a list and pass it to Counter
results = []
for match in pat.finditer(text):
    results.append(match.group(1))
c = Counter(results)
c.most_common()
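
If you only want the single most common code, `most_common` accepts a count. A minimal sketch on a made-up list of area codes (not the real results from the text):

```python
from collections import Counter

# Hypothetical area codes, only for illustration
codes = ['415', '206', '415', '800', '415', '206']
c = Counter(codes)

top_code, top_count = c.most_common(1)[0]  # first (code, count) pair
distinct = len(c)                          # number of different codes

print(top_code, top_count, distinct)
```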

## Extra Credit
Can you capture the street addresses too? How would you approach that?

* `^` (beginning of line) and `$` (end of line) might help
    * if you use either of these, pass `re.MULTILINE` as the second argument to `re.compile`, as in the snippet below this list
    * you can play with multiline (and other flags) on regexr: look for "Flags" at the top right
    
```
re.compile(r'^...', re.MULTILINE)
```
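
To see what the flag changes, here is a small sketch on three lines borrowed from the text block. Without `re.MULTILINE`, `^` only matches at the very start of the string; with it, `^` matches at the start of every line.

```python
import re

sample = '''300 Post Street
San Francisco, CA 94108
2125 Chestnut Street'''

no_flag = re.findall(r'^\d+', sample)                  # only the first line can match
with_flag = re.findall(r'^\d+', sample, re.MULTILINE)  # every line start is tried

print(no_flag)
print(with_flag)
```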


In [None]:
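# Roughly: group 1 is the street line (a number of 2+ digits plus up to
#   five words); group 2 is a Canadian postal code or a 5-digit US ZIP code
pat = re.compile(r'^\W*?(\d{2,},?(?:\s+\w+){0,5})\W*(?:\w+\W)\w+,\W\w+\W(\w{3}\W\w{3}|\d{5})',
                re.MULTILINE)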
for match in pat.finditer(text):
    print(match.group(0))