# Matching Characters

## Overview

Individual characters can be matched in a regular expression simply by writing them in the pattern. The special character `.` acts as a wildcard that matches any single character except a newline. Characters that have a special meaning in regular expressions, such as `\`, `(`, `)`, and `.` itself, must be escaped in order to be matched literally. To escape a character, put a `\` in front of it.
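As a minimal sketch of these rules, the snippet below runs three patterns against a made-up sample string (the string and variable names are illustrative assumptions, not part of the lesson text). It shows the wildcard skipping a newline, an escaped `.` matching only a literal dot, and escaped parentheses matching the parenthesis characters themselves.

```python
import re

# Hypothetical sample string; note that "c\nt" contains a real newline
sample = "cat cot c.t c\nt (cut)"

# The wildcard matches any single character except a newline,
# so "c\nt" is skipped while "cat", "cot", "c.t" and "cut" all match
wildcard_matches = re.findall(r"c.t", sample)
print(wildcard_matches)      # ['cat', 'cot', 'c.t', 'cut']

# Escaping the dot restricts the match to a literal "."
literal_dot_matches = re.findall(r"c\.t", sample)
print(literal_dot_matches)   # ['c.t']

# Escaped parentheses match the characters "(" and ")" themselves
paren_matches = re.findall(r"\(cut\)", sample)
print(paren_matches)         # ['(cut)']
```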

## Examples
In these examples we will use the text of Article Five of the United States Constitution, as found on Wikipedia at https://en.wikipedia.org/wiki/Article_Five_of_the_United_States_Constitution.

### Example One

In [13]:
text = r"""
The Congress, whenever two thirds of both houses shall deem it necessary, shall propose amendments to this 
Constitution, or, on the application of the legislatures of two thirds of the several states, shall call a 
convention for proposing amendments, which, in either case, shall be valid to all intents and purposes, as 
part of this Constitution, when ratified by the legislatures of three fourths of the several states, or by 
conventions in three fourths thereof, as the one or the other mode of ratification may be proposed by the 
Congress; provided that no amendment which may be made prior to the year one thousand eight hundred and 
eight shall in any manner affect the first and fourth clauses in the ninth section of the first article; 
and that no state, without its consent, shall be deprived of its equal suffrage in the Senate."""

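# To match every occurrence of the character sequence "the" we just set that string to be the pattern.
# Note that this also matches "the" inside longer words such as "either", "thereof" and "other", and that
# it does not match the capitalized "The".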
pattern=r"the"

# We then import the python regular expression library
import re

# And we look for all instances of this pattern
re.findall(pattern, text)

['the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the',
 'the']
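Some of the matches above come from inside longer words rather than from the standalone word. The sketch below, which uses a shortened phrase adapted from the text (the variable names are illustrative), adds a wildcard on each side of the pattern to reveal the context of each match.

```python
import re

# A shortened phrase adapted from the article text, for illustration only
sample = "in either case, as the one or the other mode"

# The bare pattern finds four occurrences of the sequence "the"
plain_matches = re.findall(r"the", sample)
print(len(plain_matches))    # 4

# One wildcard on each side shows what surrounds each occurrence
context_matches = re.findall(r".the.", sample)
print(context_matches)       # ['ither', ' the ', ' the ', 'other']
```

Only two of the four matches are the standalone word; the other two sit inside "either" and "other".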

### Example Two

In [14]:
# To match all three-letter words that begin with the letter "t" we set our pattern to be a "t" followed
# by two wildcard characters, with a space character before and after to mark the beginning and end of the
# word. Note that this won't work for words which end a sentence, as punctuation isn't a space,
# but we can deal with that in a different manner
pattern=r" t.. "

# And we look for all instances of this pattern
re.findall(pattern, text)

[' two ',
 ' the ',
 ' the ',
 ' two ',
 ' the ',
 ' the ',
 ' the ',
 ' the ',
 ' the ',
 ' the ',
 ' the ',
 ' the ',
 ' the ',
 ' the ',
 ' the ']

## Problems
In these problems you will use the text of section 2 of the Canadian Charter of Rights and Freedoms, as found on Wikipedia at https://en.wikipedia.org/wiki/Section_2_of_the_Canadian_Charter_of_Rights_and_Freedoms.

### Problem 1
Count the number of freedoms listed (they are enumerated in parentheses). Return a single number.

In [15]:
text=r"""
2. Everyone has the following fundamental freedoms:
(a) freedom of conscience and religion;
(b) freedom of thought, belief, opinion and expression, including freedom of the press and other media of communication;
(c) freedom of peaceful assembly; and
(d) freedom of association.
"""

# Your code goes here

In [18]:
# Solution

# First we import the python re module
import re

# In this pattern we want to match any single character in parentheses followed by a space. Note that we have
# to escape the parentheses.
pattern=r"\(.\) "

# Now we find all of the matches and print the length of the resulting list
print(len(re.findall(pattern,text)))

4
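To see why the escaping matters in this solution, the following sketch compares the escaped and unescaped patterns on a short made-up string (an assumption for illustration). Without the backslashes, the parentheses form a capturing group, and `re.findall` returns the captured character before each space instead of the parenthesized labels.

```python
import re

# Hypothetical sample, not the Charter text
sample = "(a) first; (b) second"

# Escaped parentheses match the literal "(" and ")" characters
escaped_matches = re.findall(r"\(.\) ", sample)
print(escaped_matches)       # ['(a) ', '(b) ']

# Unescaped parentheses create a group around the wildcard, so findall
# returns whichever single character precedes each space
unescaped_matches = re.findall(r"(.) ", sample)
print(unescaped_matches)     # [')', ';', ')']
```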


### Problem 2
How many times is a word beginning with the term "free" mentioned in this section of the document? Return a single number.

In [19]:
# Solution

# In this pattern we want to match just the term "free" with a space in front of it. This wouldn't catch words
# that start a sentence or paragraph, as they might not be preceded by a space (e.g. at the start of a new
# line) and they might have different capitalization.
pattern=r" free"

# Now we find all of the matches and print the length of the resulting list
print(len(re.findall(pattern,text)))

6
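The limitation noted in the solution comments can be checked with a small made-up string (an illustrative assumption, not the Charter text): a word that starts a line or is capitalized is not counted by the `" free"` pattern.

```python
import re

# Hypothetical sample: one capitalized word, one word at the start of a
# line, and one word preceded by a space
sample = "Freedom matters.\nfreedom of thought and free speech"

# Only the occurrence preceded by a space is found
space_prefixed = re.findall(r" free", sample)
print(len(space_prefixed))   # 1

# Dropping the leading space also finds the word at the start of the line,
# but still misses the capitalized "Freedom"
no_prefix = re.findall(r"free", sample)
print(len(no_prefix))        # 2
```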


## Reuse

I authorize that this work may be used in further iterations of this or other classes with or without attribution in order to help people learn regex.

I do not authorize this work to be used in further iterations of this or other classes.