<h3>Meta Characters</h3>

In this notebook, we look at more special characters, known as __metacharacters__.

![Metachars.png](attachment:Metachars.png)

In [1]:
import re

_ The `$` matches only if the pattern before it occurs at the end of the string.  
_ It is normally placed at the end of the regex.  
_ By default, `$` behaves almost like `\Z`: both match at the end of the string, but `$` also matches just before a newline that ends the string, whereas `\Z` does not.  
_ If `MULTILINE` mode is on, `$` also matches at the end of every line, that is, immediately before each newline. To turn it on, set the `flags` parameter to either `re.MULTILINE` or `re.M`.

In [2]:
string = "this is the first line\nthis is the second line\nthis is the third line"

In [3]:
print(r'With \Z ', re.findall(r'line\Z', string))
print('With $  ', re.findall('line$', string))

With \Z  ['line']
With $   ['line']


In [4]:
print(r'With \Z with MULTILINE', re.findall(r'line\Z', string, flags=re.MULTILINE))
print('With $ with MULTILINE', re.findall('line$', string, flags=re.M))

With \Z with MULTILINE ['line']
With $ with MULTILINE ['line', 'line', 'line']
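The string above has no trailing newline, so it never exposes the one case where `$` and `\Z` disagree without `MULTILINE`. A minimal sketch, using a made-up string `text` that ends in `\n`:

```python
import re

text = "first line\nlast line\n"  # note the trailing newline

# Without MULTILINE, $ may match just before the newline that ends the string
dollar = re.findall(r"line$", text)
# \Z only matches at the very end of the string, after that newline
end_z = re.findall(r"line\Z", text)

print("With $  ", dollar)  # ['line'] (only the last 'line')
print(r"With \Z ", end_z)  # []
```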


_ The `^` matches only if the pattern after it occurs at the start of the string.  
_ It is normally placed at the start of the regex.  
_ By default, `^` behaves exactly like `\A`.  
_ If `MULTILINE` mode is on, `^` also matches at the start of every line, that is, immediately after each newline. To turn it on, set the `flags` parameter to either `re.MULTILINE` or `re.M`.

In [None]:
string = "this is the first line\nthis is the second line\nthis is the third line"

In [None]:
print(r'With \A ', re.findall(r'\Athis', string))
print('With ^  ', re.findall('^this', string))

In [5]:
print(r'With \A with MULTILINE', re.findall(r'\Athis', string, flags=re.MULTILINE))
print('With ^ with MULTILINE', re.findall('^this', string, flags=re.M))

With \A with MULTILINE ['this']
With ^ with MULTILINE ['this', 'this', 'this']
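As an alternative to the `flags` parameter, Python also accepts the inline flag `(?m)` at the start of a pattern. A small sketch reusing the same three-line string; note that `\A` still ignores `MULTILINE`:

```python
import re

string = "this is the first line\nthis is the second line\nthis is the third line"

# (?m) at the start of the pattern turns on MULTILINE, like flags=re.M
inline = re.findall(r"(?m)^this", string)
# \A is unaffected by MULTILINE: it still matches only at the start of the string
start_only = re.findall(r"(?m)\Athis", string)

print(inline)      # ['this', 'this', 'this']
print(start_only)  # ['this']
```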


_ The `[]` metacharacter is used when you wish to match one of a set of characters. 

_ The `[]` together with the set of characters is called a __character set__.  

_ In the example below, `[se]` matches every occurrence of __s__ or __e__ in the input string.

In [6]:
string = "searching for sets"
print(re.findall('se', string))
print(re.findall('[se]', string))  # find either s or e

['se', 'se']
['s', 'e', 's', 'e', 's']
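Most metacharacters lose their special meaning inside `[]`. A short sketch, with a made-up `price` string, where `.` and `$` inside the set match only literal periods and dollar signs:

```python
import re

price = "Total: $4.99."

# Inside [], '.' and '$' are ordinary characters
literal = re.findall(r"[.$]", price)
# Outside [], '.' matches any character (except a newline)
any_char = re.findall(r".", price)

print(literal)        # ['$', '.', '.']
print(len(any_char))  # 13
```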


When `^` is the first character inside `[]`, it negates the set: the pattern matches every character that does not belong to the set.

In [7]:
string = "searching for sets"
print(re.findall('[^se]', string))

['a', 'r', 'c', 'h', 'i', 'n', 'g', ' ', 'f', 'o', 'r', ' ', 't']
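A negated set matches *any* character not listed, including a newline. A minimal sketch with a hypothetical two-line string:

```python
import re

text = "ab\ncd"

# [^a-z] matches anything that is not a lowercase letter, so the \n matches
not_lower = re.findall(r"[^a-z]", text)

print(not_lower)  # ['\n']
```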


You can use ranges, and combinations of ranges, inside `[]` to build more complex character sets. In the example below, `findall` returns every character from __k__ through __t__, inclusive.

In [8]:
string = "searching for a set"
print(re.findall('[k-t]', string))

['s', 'r', 'n', 'o', 'r', 's', 't']
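Ranges are case-sensitive: `[k-t]` covers only lowercase letters. A sketch using the made-up word `MaKe` shows how `re.IGNORECASE` (also spelled `re.I`) widens the range to uppercase as well:

```python
import re

word = "MaKe"

# Case-sensitive: uppercase M and K are outside [k-t]
sensitive = re.findall(r"[k-t]", word)
# With IGNORECASE, [k-t] also matches K through T
insensitive = re.findall(r"[k-t]", word, flags=re.IGNORECASE)

print(sensitive)    # []
print(insensitive)  # ['M', 'K']
```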


You can include multiple ranges.

In the example below, `findall` returns every character from __k__ through __t__ or from __a__ through __c__, inclusive.

In [23]:
string = "searching for a set"
print(re.findall('[k-ta-c]', string))

['s', 'a', 'r', 'c', 'n', 'o', 'r', 'a', 's', 't']
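Because `-` marks a range, a literal hyphen needs care. Placing it last (or first) in the set makes it an ordinary character. A sketch with a made-up phone number, combining a digit range with a literal hyphen:

```python
import re

phone = "call 555-0199"

# 0-9 is a range; the trailing '-' is a literal hyphen
digits_and_hyphen = re.findall(r"[0-9-]", phone)

print(digits_and_hyphen)  # ['5', '5', '5', '-', '0', '1', '9', '9']
```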


Placing `^` outside (in front of) a character set returns a match only if the string starts with one of the characters in the set.

In [26]:
string = "searching for a set"
print(re.findall('^[k-ta-c]', string))

['s']
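The `MULTILINE` behaviour of `^` from earlier carries over: with `re.M`, the anchored set is tested at the start of every line. A sketch using a hypothetical three-line string:

```python
import re

lines = "searching\nbrowsing\nxray"

# Without MULTILINE, only the first character of the whole string is tested
first_only = re.findall(r"^[k-ta-c]", lines)
# With re.M, the first character of each line is tested
per_line = re.findall(r"^[k-ta-c]", lines, flags=re.M)

print(first_only)  # ['s']
print(per_line)    # ['s', 'b']
```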


Placing `^` both outside and inside a character set (`^[^...]`) returns a match only if the string starts with a character that is NOT in the set. Matching is case-sensitive, so the uppercase __K__ below falls outside the range `k-t`.

In [30]:
string = "Kearching for a set"
print(re.findall('^[^k-ta-c]', string))  # outer ^ anchors to the start; inner ^ negates the set
print(re.findall('[^k-ta-c]', string))

['K']
['K', 'e', 'h', 'i', 'g', ' ', 'f', ' ', ' ', 'e']
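The meaning of `^` depends on where it sits: first inside `[]` it negates, elsewhere inside `[]` it is a literal caret, and outside `[]` it anchors to the start. A sketch with a made-up expression string:

```python
import re

expr = "k^2 + a"

negated = re.findall(r"[^k-ta-c]", expr)  # ^ first inside []: negation
literal = re.findall(r"[k^]", expr)       # ^ not first: a literal caret
anchored = re.findall(r"^[k-t]", expr)    # ^ outside []: start of string

print(negated)   # ['^', '2', ' ', '+', ' ']
print(literal)   # ['k', '^']
print(anchored)  # ['k']
```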


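The `|` (alternation) returns a match if either the pattern on its left or the pattern on its right matches. For single characters, `a|c` finds the same matches as the character set `[ac]`.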

In [31]:
string = "searching for a set"
print(re.findall('a|c', string))
print(re.findall('[ac]', string))

['a', 'c', 'a']
['a', 'c', 'a']
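When the alternatives are longer than one character, their order matters: Python tries them from left to right and keeps the first one that matches, not the longest. A sketch with the made-up word `category`:

```python
import re

text = "category"

# 'cat' is tried first and succeeds, so 'category' is never attempted
short_first = re.findall(r"cat|category", text)
# Listing the longer alternative first lets it win
long_first = re.findall(r"category|cat", text)

print(short_first)  # ['cat']
print(long_first)   # ['category']
```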


You can combine `|` with `()` to match any one of several multi-character substrings.

In [32]:
string = "searching for a set"
print(re.findall('(ea|ch)', string))

['ea', 'ch']
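Parentheses also create a capturing group, and when a pattern has exactly one group, `findall` returns only the group's text. A sketch on the same string, contrasting this with a non-capturing group `(?:...)`:

```python
import re

string = "searching for a set"

# One capturing group: findall returns just the group's text
captured = re.findall(r"s(ea|et)", string)
# Non-capturing group: findall returns the whole match
whole = re.findall(r"s(?:ea|et)", string)

print(captured)  # ['ea', 'et']
print(whole)     # ['sea', 'set']
```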


The period (`.`) matches any character except a newline (`\n`).

In [33]:
string = "this is the first line\nthis is the second line*..\n"
print('With . ', re.findall('.', string))
print('With . ', len(string), len(re.findall('.', string)))  # string length vs. number of matches; the two newlines are skipped

With .  ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 't', 'h', 'e', ' ', 'f', 'i', 'r', 's', 't', ' ', 'l', 'i', 'n', 'e', 't', 'h', 'i', 's', ' ', 'i', 's', ' ', 't', 'h', 'e', ' ', 's', 'e', 'c', 'o', 'n', 'd', ' ', 'l', 'i', 'n', 'e', '*', '.', '.']
With .  50 48
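Two related tools: the `re.DOTALL` flag (also `re.S`) makes `.` match newlines too, and escaping the period as `\.` matches only a literal `.`. A sketch on the same string:

```python
import re

string = "this is the first line\nthis is the second line*..\n"

# With DOTALL, '.' also matches the two newlines
with_dotall = re.findall(r".", string, flags=re.DOTALL)
# \. matches a literal period only
literal_dots = re.findall(r"\.", string)

print(len(string), len(with_dotall))  # 50 50
print(literal_dots)                   # ['.', '.']
```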
