# Regex Character Classes and the _findall()_ Method

In [3]:
import re

In [4]:
phoneRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
phoneRegex

re.compile(r'\d\d\d-\d\d\d-\d\d\d\d', re.UNICODE)

## The _findall()_ Method

We used the _search()_ method of a regex object to find the first match of a regular expression.

With the _findall()_ method we can search a string for all instances of a regular expression.

With _phoneRegex_ we are trying to find all instances of a North American phone number in the format: _123-456-7890_

In [5]:
resume = '''
Jesse Kendall
123 Elm Street, Iowa City, IA 52241
H: 319-555-5555, C: 319-444-4444, jkendall@notmail.com
DENTAL HYGIENIST
Dedicated and dynamic Dental Hygienist with extensive experience providing expert dental treatment.
Exceptional patient skills focused on oral hygiene treatment and care as well as preventative management.
Empathetic professional capable of reassuring and alleviating patient fears. Enthusiastic about dental
health; enjoy educating patients on dental care and hygiene.
AREAS OF EXPERTISE
• Preventative Care
• Protective Sealants
• Disease Exam/Screening
• Dental Charting
• Oral Cavity Exam
• Self-Care Programs
• Stain Removal
• Patient Management
• Nutrition Counseling
EDUCATION
XYZ COMMUNITY COLLEGE, Iowa City, Iowa
Associate of Science, Registered Dental Hygienist, Certified Dental Assistant
PROFESSIONAL EXPERIENCE
SMILE IOWA, Iowa City, Iowa, 20xx-Present
Dental Hygienist, Family and Cosmetic Dentistry
Provide exceptional oral hygiene care to a variety of patients. Conduct nutritional and personal hygiene
counseling as well as educate patients on preventative care to ensure health and wellness. Call patients to
schedule recall appointments.
• Care for each patient with personal attention while performing prophylaxis/dental cleanings and
routine oral exams.
• Instructed patients on the best methods for practicing oral hygiene.
• Devised and implemented customized personal treatment plans.
• Delivered fluoride applications.
• Examined patients and took intra-oral X-rays.
JOHN DOWE, DMD, Iowa City, Iowa, 20xx-20xx
Dental Hygienist, Family and Cosmetic Dentistry
Delivered expert oral hygiene care to each patient. Provided patients with preventative care techniques to
ensure optimal dental health and personal well-being.
• Cared for a minimum of eight patients per day.
• Treated some of the same patients for 25 years, and retained the same patients after the dentist
retired.
• Worked collaboratively with patients to develop personalized treatment plans.
• Partnered with the dentist and patients in implementing their dental treatments and continued
dental care.
JOHN SMITH, DMD, Iowa City, Iowa, 20xx-20xx
Dental Assistant
Provided diligent dental and hygiene assistance during surgery, services, and cleanings. Maintained
equipment and sterilized dental instruments. Answered phone calls, scheduled appointments, and charted
dental records. Trained new employees on back office duties.
'''

In [6]:
phoneRegex.search(resume)

<re.Match object; span=(54, 66), match='319-555-5555'>

In [7]:
phoneRegex.findall(resume)

['319-555-5555', '319-444-4444']

When a regex object has zero or one group (a pair of parentheses), the _findall()_ method returns a list of strings, as in the example above. With no groups, each string is the full match; with one group, each string is the text captured by that group.
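As a minimal sketch of the one-group case: the names `noGroupRegex` and `areaCodeRegex` and the shortened sample string are illustrative assumptions, not part of the notebook above. The second pattern wraps only the first three digits in a group, so the list holds just that captured text.

```python
import re

# Hypothetical sample text; the numbers mirror the ones in the resume.
sample = 'H: 319-555-5555, C: 319-444-4444'

# No groups: each list item is the full match.
noGroupRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
fullMatches = noGroupRegex.findall(sample)

# One group: each list item is only the text captured by that group.
areaCodeRegex = re.compile(r'(\d\d\d)-\d\d\d-\d\d\d\d')
areaCodes = areaCodeRegex.findall(sample)

print(fullMatches)
print(areaCodes)
```

Either way the result is a flat list of strings; only what each string contains differs.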

We can see what _findall()_ returns when there are two groups by grouping the first three digits and the remaining seven digits separately:

In [8]:
phoneRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
phoneRegex.findall(resume)

[('319', '555-5555'), ('319', '444-4444')]

As this regular expression has two groups, _findall()_ returns a list of tuples of strings, with one string per group.

The hyphen between the two groups is missing from the results because it is not inside either group.

To also return a string that contains this separator, we can put parentheses around the whole expression; each tuple will now contain three strings:

In [9]:
phoneRegex = re.compile(r'((\d\d\d)-(\d\d\d-\d\d\d\d))')
phoneRegex.findall(resume)

[('319-555-5555', '319', '555-5555'), ('319-444-4444', '319', '444-4444')]
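Because each tuple lists the groups in the order of their opening parentheses, the results can be unpacked directly in a loop. The following is a small sketch under that assumption; the sample string and the dictionary keys (`full`, `area`, `local`) are hypothetical names chosen for illustration.

```python
import re

# Hypothetical sample text; the numbers mirror the ones in the resume.
sample = 'H: 319-555-5555, C: 319-444-4444'
phoneRegex = re.compile(r'((\d\d\d)-(\d\d\d-\d\d\d\d))')

# Group 1 is the outer parentheses, group 2 the first three digits,
# group 3 the remaining seven digits with their hyphen.
parsed = []
for fullNumber, areaCode, localNumber in phoneRegex.findall(sample):
    parsed.append({'full': fullNumber, 'area': areaCode, 'local': localNumber})

print(parsed)
```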

## Character Classes

The shorthand character classes include:

| Shorthand character class | Represents |
| :-: | :- |
| \d | Any numeric **d**igit from 0 to 9 |
| \D | Any character that is _not_ a numeric digit from 0 to 9 |
| \w | Any letter, numeric digit, or the underscore character.<br>(Think of this as matching "**w**ord" characters.) |
| \W | Any character that is _not_ a letter, numeric digit, or the underscore character |
| \s | Any white space character: space, tab, or newline.<br>(Think of this as matching "**s**pace" characters.) |
| \S | Any character that is _not_ a space, tab, or newline. |
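To make the table concrete, here is a short sketch that applies several of the shorthand classes to one string. The sample string and variable names are made up for illustration; the string contains a tab to show that `\s` matches more than the space character.

```python
import re

# Hypothetical sample: letters, digits, an underscore, a tab and spaces.
sample = 'Room 42_b\tis open'

digits = re.findall(r'\d', sample)        # single digits
nonDigitRuns = re.findall(r'\D+', sample) # runs of anything that is not a digit
words = re.findall(r'\w+', sample)        # runs of letters, digits, underscores
nonWord = re.findall(r'\W', sample)       # everything \w rejects
whitespace = re.findall(r'\s', sample)    # spaces and the tab
nonSpaceRuns = re.findall(r'\S+', sample) # runs of non-whitespace

print(digits, nonDigitRuns, words, nonWord, whitespace, nonSpaceRuns)
```

In this particular string `\W` and `\s` return the same characters, because the only non-word characters are whitespace.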


In [11]:
'''
12 Days of Christmas
'''

lyrics = '''
12 drummers drumming
11 pipers piping
10 lords a-leaping
9 ladies dancing
8 maids a-milking
7 swans a-swimming
6 geese a-laying
5 golden rings (five golden rings)
4 calling birds
3 French hens
2 turtle-doves
And 1 partridge in a pear tree
And a partridge in a pear tree
'''
# One or more digits, then two whitespace-separated runs of word characters.
xmasRegex = re.compile(r'\d+\s\w+\s\w+')
xmasRegex.findall(lyrics)

['12 drummers drumming',
 '11 pipers piping',
 '10 lords a',
 '9 ladies dancing',
 '8 maids a',
 '7 swans a',
 '6 geese a',
 '5 golden rings',
 '4 calling birds',
 '3 French hens',
 '1 partridge in']
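In the output above, the hyphenated lines are cut short (for example '10 lords a') and '2 turtle-doves' is missing altogether. This is consistent with `\w` not matching the hyphen. A small sketch on a single line, with illustrative variable names:

```python
import re

line = '2 turtle-doves'

# \w+ stops at the hyphen because '-' is not a word character.
oneWord = re.findall(r'\d+\s\w+', line)

# Asking for a second whitespace-separated word fails here:
# after 'turtle' comes '-', which is not whitespace.
twoWords = re.findall(r'\d+\s\w+\s\w+', line)

# Direct check that a hyphen is not matched by \w.
isHyphenWordChar = re.fullmatch(r'\w', '-') is not None

print(oneWord, twoWords, isHyphenWordChar)
```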

In [18]:
'''
We can use square brackets to create a custom character class.
Below, we can use either regular expression:
'''
vowelRegex = re.compile(r'(a|e|i|o|u)')
vowelRegex = re.compile(r'[aeiou]')

'''
We can use a dash to specify a range, e.g. [a-z], [a-dA-D], [0-6].
'''
vowelRegex = re.compile(r'[aeiouAEIOU]')
vowelRegex.findall('Robocop eats baby food.')

['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o']
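The cell above mentions ranges but does not use one, so here is a brief sketch of the three ranges it lists. The sample string and variable names are hypothetical.

```python
import re

# Hypothetical sample with upper-case letters, lower-case letters and digits.
sample = 'Cab 7 Zed 3'

lowercase = re.findall(r'[a-z]', sample)        # any lower-case ASCII letter
earlyLetters = re.findall(r'[a-dA-D]', sample)  # a to d in either case
lowDigits = re.findall(r'[0-6]', sample)        # digits 0 through 6 only

print(lowercase, earlyLetters, lowDigits)
```

Note that '7' is excluded by `[0-6]` and 'Z' and 'e' are excluded by `[a-dA-D]`.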

In [19]:
'''
We can use curly brackets to match an exact number of occurrences:
'''
DoubleVowelRegex = re.compile(r'[aeiouAEIOU]{2}')
DoubleVowelRegex.findall('Robocop eats baby food.')

['ea', 'oo']
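Curly brackets work with any class, including the shorthand ones. As a sketch, the phone pattern from earlier can likely be shortened with `{3}` and `{4}`; the names `longForm` and `shortForm` and the sample strings are illustrative assumptions. The last line also shows that _findall()_ scans left to right and returns non-overlapping matches.

```python
import re

# Hypothetical sample text; the numbers mirror the ones in the resume.
sample = 'H: 319-555-5555, C: 319-444-4444'

longForm = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
shortForm = re.compile(r'\d{3}-\d{3}-\d{4}')

sameResults = longForm.findall(sample) == shortForm.findall(sample)

# 'queue' contains 'ue', 'eu', 'ue' as overlapping pairs, but only the
# non-overlapping ones found from left to right are returned.
vowelPairs = re.findall(r'[aeiou]{2}', 'queue')

print(sameResults, vowelPairs)
```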

In [20]:
'''
We can use the caret (^) character at the start of a class to match every character that is not in the class:
'''
notVowelRegex = re.compile(r'[^aeiouAEIOU]')
notVowelRegex.findall('Robocop eats baby food.')

['R', 'b', 'c', 'p', ' ', 't', 's', ' ', 'b', 'b', 'y', ' ', 'f', 'd', '.']
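The caret only negates a class when it is the first character inside the square brackets; elsewhere it is an ordinary character. The sketch below illustrates both cases on a made-up string, and compares `[^0-9]` with the `\D` shorthand. On plain ASCII text like this the two agree, although `\d` and `\D` in Python 3 string patterns also take non-ASCII digits into account.

```python
import re

# Hypothetical sample containing a literal caret.
sample = 'a^b 42'

# Caret first: match anything that is NOT a digit 0-9.
notDigits = re.findall(r'[^0-9]', sample)

# The \D shorthand gives the same result on this ASCII string.
shorthand = re.findall(r'\D', sample)

# Caret not first: it is a literal, so this class matches 'a' or '^'.
literalCaret = re.findall(r'[a^]', sample)

print(notDigits, shorthand, literalCaret)
```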