# Python Tutorial

https://www.w3schools.com/python/

## RegEx

In [3]:
print('The `re` module is used for handling regular expressions.\n',
      'Use ^ to match the start of the string.\n',
      'Use $ to match the end of the string.\n',
      'Use . to match any character except a newline.\n',
      'Use * to match zero or more repetitions of the preceding item.')

import re

my_text = 'The rain in Spain'

print(re.search('^The.*Spain$', my_text))

The `re` module is used for handling regular expressions.
 Use ^ to match the start of the string.
 Use $ to match the end of the string.
 Use . to match any character except a newline.
 Use * to match zero or more repetitions of the preceding item.
<re.Match object; span=(0, 17), match='The rain in Spain'>
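
To see what the anchors contribute, here is a minimal sketch that runs the same kind of search with and without `^` and `$`. The variable names (`unanchored`, `anchored_start`, `anchored_end`) are illustrative and not part of the original cell.

```python
import re

my_text = 'The rain in Spain'

# Without anchors, the pattern may match anywhere in the string.
unanchored = re.search('rain', my_text)

# With ^, the match must begin at the very start of the string,
# so this search finds nothing and returns None.
anchored_start = re.search('^rain', my_text)

# With $, the match must end at the very end of the string.
anchored_end = re.search('Spain$', my_text)

print(unanchored)
print(anchored_start)
print(anchored_end)
```

The first and third searches return `Match` objects, while the second returns `None` because `rain` is not at the start of the string.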


### RegEx Functions

In [11]:
print('There are many functions that can be imported from `re`')

import re

some_re_functions = {
    'findall': 'Returns a list containing all matches',
    'search': 'Returns a Match object if there is a match anywhere in the string',
    'split': 'Returns a list where the string has been split at each match',
    'sub': 'Replaces one or many matches with a string'
}

for key in some_re_functions:
    print(f'{key} \n\t {some_re_functions[key]}\n')

my_string = 'The rain in Spain'

print(re.findall('a', my_string))
print(re.search('a', my_string))
print(re.split(' ', my_string))
print(re.sub(' ', '_', my_string))

print('And there are even more that are not fully described here:')

for key in dir(re):
    print(key)

There are many functions that can be imported from `re`
findall 
	 Returns a list containing all matches

search 
	 Returns a Match object if there is a match anywhere in the string

split 
	 Returns a list where the string has been split at each match

sub 
	 Replaces one or many matches with a string

['a', 'a']
<re.Match object; span=(5, 6), match='a'>
['The', 'rain', 'in', 'Spain']
The_rain_in_Spain
And there are even more that are not fully described here:
A
ASCII
DEBUG
DOTALL
I
IGNORECASE
L
LOCALE
M
MULTILINE
Match
Pattern
RegexFlag
S
Scanner
T
TEMPLATE
U
UNICODE
VERBOSE
X
_MAXCACHE
__all__
__builtins__
__cached__
__doc__
__file__
__loader__
__name__
__package__
__spec__
__version__
_cache
_compile
_compile_repl
_expand
_locale
_pickle
_special_chars_map
_subx
compile
copyreg
enum
error
escape
findall
finditer
fullmatch
functools
match
purge
search
split
sre_compile
sre_parse
sub
subn
template
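
A few of the names in the listing above that are not covered by the dictionary are `compile`, `finditer`, `fullmatch`, and `subn`. The following is a short sketch of how they might be used on the same example string; the variable names are illustrative.

```python
import re

my_string = 'The rain in Spain'

# compile() builds a reusable Pattern object.
pattern = re.compile('ai')

# finditer() yields one Match object per match, instead of a list of strings.
spans = [m.span() for m in pattern.finditer(my_string)]
print(spans)

# fullmatch() succeeds only if the entire string matches the pattern.
starts_with_the = re.fullmatch('The.*', my_string) is not None
only_rain = re.fullmatch('rain', my_string) is not None
print(starts_with_the, only_rain)

# subn() works like sub() but also returns the number of replacements made.
new_string, n_replaced = re.subn(' ', '_', my_string)
print(new_string, n_replaced)
```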


In [46]:
print('There are many special characters for grabbing specific characters.')

special_characters = {
    '[]': 'A set of characters. For example, "[a-m]"',
    '\\': 'Signals a special sequence (can also be used to escape special characters). For example, "\\d"',
    '.': 'Any character (except newline character). For example, "he..o"',
    '^': 'Starts with. For example, "^hello"',
    '$': 'Ends with. For example, "planet$"',
    '*': 'Zero or more occurrences. For example, "he.*o"',
    '+': 'One or more occurrences. For example, "he.+o"',
    '?': 'Zero or one occurrences. For example, "he.?o"',
    '{}': 'Exactly the specified number of occurrences. For example, "he.{2}o"',
    '|': 'Either or. For example, "falls|stays"',
    '()': 'Capture and group'
    }

for key in special_characters:
    print(f'{key} \n\t {special_characters[key]}\n')
    
import re

my_string = 'The rain in Spain; It\'s plain. My lullaby.'

print(re.search('[b-f]', my_string))
print(re.search(r'\.', my_string))
print('Note that if a pattern uses escape sequences that Python does \n',
      'not recognize, Pylance may warn you unless you prefix the string \n',
      'with "r" to indicate that it should be treated as a "raw" string.\n',
      r'For example, r"\.".')
print(re.search('T.e', my_string))
print(re.search('^The', my_string))
print(re.search(r'by\.$', my_string))
print(re.search('The.*plain', my_string))
print(re.search('T+he', my_string))
print(re.search('Tg?he', my_string))
print(re.search('l{2}', my_string))
print(re.search('N|M', my_string))
print(re.search('Th(e )?', my_string))

There are many special characters for grabbing specific characters.
[] 
	 A set of characters. For example, "[a-m]"

\ 
	 Signals a special sequence (can also be used to escape special characters). For example, "\d"

. 
	 Any character (except newline character). For example, "he..o"

^ 
	 Starts with. For example, "^hello"

$ 
	 Ends with. For example, "planet$"

* 
	 Zero or more occurrences. For example, "he.*o"

+ 
	 One or more occurrences. For example, "he.+o"

? 
	 Zero or one occurrences. For example, "he.?o"

{} 
	 Exactly the specified number of occurrences. For example, "he.{2}o"

| 
	 Either or. For example, "falls|stays"

() 
	 Capture and group

<re.Match object; span=(2, 3), match='e'>
<re.Match object; span=(29, 30), match='.'>
Note that if a pattern uses escape sequences that Python does 
 not recognize, Pylance may warn you unless you prefix the string 
 with "r" to indicate that it should be treated as a "raw" string.
 For example, r"\.".
<re.Match o
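
The table describes `()` only as "Capture and group", so here is a small sketch of what capturing looks like in practice. It also contrasts the greedy `.*` with its lazy form `.*?` and shows the `{m,n}` range form of `{}`. Neither of those two forms appears in the table above; they are included here as an assumption about what a reader may want next.

```python
import re

my_string = "The rain in Spain; It's plain. My lullaby."

# Parentheses capture the part of the match that they enclose.
match = re.search('(Sp|pl)ain', my_string)
if match is not None:
    whole = match.group()     # the entire match
    captured = match.group(1) # only the text inside the parentheses
    print(whole, captured)

# .* is greedy: it extends to the LAST place where the rest of the pattern fits.
greedy = re.search('The.*ain', my_string).group()

# .*? is lazy: it stops at the FIRST place where the rest of the pattern fits.
lazy = re.search('The.*?ain', my_string).group()
print(greedy)
print(lazy)

# {m,n} accepts a range of repetition counts rather than one exact count.
l_runs = re.findall('l{1,2}', my_string)
print(l_runs)
```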

### Special Sequences

In [61]:
print('There are many special sequences for grabbing specific characters.')

special_sequences = {
r'\A':	r'Returns a match if the specified characters are at the beginning of the string. For example,	"\AThe"	',
r'\b':	r'Returns a match where the specified characters are at the beginning or at the end of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string"). For example,	r"\bain" r"ain\b"',
r'\B':	r'Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string"). For example,	r"\Bain" r"ain\B"',
r'\d':	r'Returns a match where the string contains digits (numbers from 0-9). For example,	"\d"	',
r'\D':	r'Returns a match where the string DOES NOT contain digits. For example,	"\D"	',
r'\s':	r'Returns a match where the string contains a white space character. For example,	"\s"	',
r'\S':	r'Returns a match where the string DOES NOT contain a white space character. For example,	"\S"	',
r'\w':	r'Returns a match where the string contains any word characters (letters from a to z in either case, digits from 0-9, and the underscore _ character). For example,	"\w"	',
r'\W':	r'Returns a match where the string DOES NOT contain any word characters. For example,	"\W"	',
r'\Z':	r'Returns a match if the specified characters are at the end of the string. For example,	"Spain\Z"'
    }

for key in special_sequences:
    print(f'{key} \n\t {special_sequences[key]}\n')
    
import re

my_string = 'The rain in Spain; It\'s plain. My lullaby'
my_other_string = 'one,2,three,4'
my_wordless_string = '*(#*)(@^((*@)))'

print(re.search(r'\AThe', my_string))
print(re.search(r'\bThe', my_string))
print(re.search('by\\b', my_string))
print(r'Observe that in a regular (non-raw) string, \b is the ASCII', '\n',
      'backspace character, so the pattern must be written either as a\n',
      r'raw string (r"\b") or with a doubled backslash ("\\b"). Raw', '\n',
      'strings are generally preferred, because every special sequence\n',
      'can then be written with a single backslash.')
print(re.search('\\Bain', my_string))
print(re.search('rai\\B', my_string))
print(re.search('\\d', my_other_string))
print(re.search('\\D', my_string))
print(re.search('\\s', my_string))
print(re.search('\\S', my_other_string))
print(re.search('\\w', my_string))
print(re.search('\\W', my_wordless_string))
print(re.search('y\\Z', my_string))

There are many special sequences for grabbing specific characters.
\A 
	 Returns a match if the specified characters are at the beginning of the string. For example,	"\AThe"	

\b 
	 Returns a match where the specified characters are at the beginning or at the end of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string"). For example,	r"\bain" r"ain\b"

\B 
	 Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string"). For example,	r"\Bain" r"ain\B"

\d 
	 Returns a match where the string contains digits (numbers from 0-9). For example,	"\d"	

\D 
	 Returns a match where the string DOES NOT contain digits. For example,	"\D"	

\s 
	 Returns a match where the string contains a white space character. For example,	"\s"	

\S 
	 Returns a match where the string DOES NOT contain a white space character. Fo
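
The difference between `'\b'`, `r'\b'`, and `'\\b'` is easier to see by inspecting the strings themselves. The sketch below is only an illustration; the variable names are made up for this example.

```python
import re

my_string = 'The rain in Spain'

# In a regular string literal, '\b' is ONE character (ASCII backspace).
# In a raw string, or with a doubled backslash, it is TWO characters:
# a backslash followed by the letter b.
lengths = (len('\b'), len(r'\b'), len('\\b'))
print(lengths)

# The raw form and the doubled-backslash form are the same pattern text.
same_pattern = (r'\bSpain' == '\\bSpain')
print(same_pattern)

# The non-raw pattern looks for a literal backspace before "Spain",
# which the string does not contain, so nothing is found.
plain_result = re.search('\bSpain', my_string)

# The raw pattern reaches `re` as a word boundary followed by "Spain".
raw_result = re.search(r'\bSpain', my_string)

print(plain_result)
print(raw_result)
```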

### Sets

In [74]:
print('There are many special sets for grabbing specific characters.')

special_sets = {
'[arn]':	'Returns a match where one of the specified characters (a, r, or n) is present	',
'[a-n]':	'Returns a match for any lower case character, alphabetically between a and n	',
'[^arn]':	'Returns a match for any character EXCEPT a, r, and n	',
'[0123]':	'Returns a match where any of the specified digits (0, 1, 2, or 3) are present	',
'[0-9]':	'Returns a match for any digit between 0 and 9	',
'[0-5][0-9]':	'Returns a match for any two-digit numbers from 00 to 59	',
'[a-zA-Z]':	'Returns a match for any character alphabetically between a and z, lower case OR upper case',
'[+]':	'In sets, +, *, ., |, (), $, and {} have no special meaning, so [+] means: return a match for any + character in the string'
    }

for key in special_sets:
    print(f'{key} \n\t {special_sets[key]}\n')
    
import re

my_string = 'The rain in Spain; It\'s plain. My lullaby'
my_other_string = 'one,2,three,4,75'
my_wordless_string = '*(#*)(@^((*@)))'

print(re.search('[a]', my_string))
print(re.search('[a-c]', my_string))
print(re.search('[^qxc]', my_string))
print(re.search('[23]', my_other_string))
print(re.search('[0-9]', my_other_string))
print(re.search('[7-8][5-6]', my_other_string))
print(re.search('[b-dL-N]', my_string))
print(re.search('[.]', my_string))

There are many special sets for grabbing specific characters.
[arn] 
	 Returns a match where one of the specified characters (a, r, or n) is present	

[a-n] 
	 Returns a match for any lower case character, alphabetically between a and n	

[^arn] 
	 Returns a match for any character EXCEPT a, r, and n	

[0123] 
	 Returns a match where any of the specified digits (0, 1, 2, or 3) are present	

[0-9] 
	 Returns a match for any digit between 0 and 9	

[0-5][0-9] 
	 Returns a match for any two-digit numbers from 00 to 59	

[a-zA-Z] 
	 Returns a match for any character alphabetically between a and z, lower case OR upper case

[+] 
	 In sets, +, *, ., |, (), $, and {} have no special meaning, so [+] means: return a match for any + character in the string

<re.Match object; span=(5, 6), match='a'>
<re.Match object; span=(5, 6), match='a'>
<re.Match object; span=(0, 1), match='T'>
<re.Match object; span=(4, 5), match='2'>
<re.Match object; span=(4, 5), match='2'>
<re.Match object; span=(14, 16), matc
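
Since `search()` stops at the first hit, it can be easier to see what a set accepts by collecting every match with `findall()`. The following sketch does that for a negated set, a pair of sets, and a set of literal punctuation. The second and third input strings are invented for this example.

```python
import re

my_other_string = 'one,2,three,4,75'

# A leading ^ inside the brackets negates the set:
# match anything that is NOT a lowercase letter or a comma.
not_letters = re.findall('[^a-z,]', my_other_string)
print(not_letters)

# Two sets in a row match two consecutive characters. Note that the
# pattern also matches inside longer digit runs such as "123".
two_digits = re.findall('[0-5][0-9]', 'times: 07, 59, 60, 123')
print(two_digits)

# Inside a set, . and + are literal; a hyphen is literal when placed last.
symbols = re.findall('[.+-]', '1.5 + 2 - 3')
print(symbols)
```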

### The `findall()` Function

In [76]:
print('The `findall()` function returns all matches. It returns an empty\n',
      'list when there are no matches')

import re

my_string = 'The rain in Spain; it\'s  plain.'

print(re.findall('ai', my_string))
print(re.findall('xyz', my_string))

The `findall()` function returns all matches. It returns an empty
 list when there are no matches
['ai', 'ai', 'ai']
[]
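
One detail worth knowing is that the shape of the list returned by `findall()` depends on whether the pattern contains capture groups. A brief sketch, using illustrative variable names:

```python
import re

my_string = "The rain in Spain; it's  plain."

# No groups: each item is the full text of a match.
full_matches = re.findall(r'\w*ain', my_string)
print(full_matches)

# One group: each item is the text captured by that group only.
prefixes = re.findall(r'(\w*)ain', my_string)
print(prefixes)

# Several groups: each item is a tuple with one entry per group.
pairs = re.findall(r'(\w)(ain)', my_string)
print(pairs)
```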


### The `search()` Function

In [87]:
print('The `search()` function scans a string for the first match,\n',
      'returning a `Match` object if one is found and `None` otherwise.')

import re

my_string = 'The rain in Spain; it\'s  plain.'

print(re.search('ain', my_string))
print(re.search('xyz', my_string))

The `search()` function scans a string for the first match,
 returning a `Match` object if one is found and `None` otherwise.
<re.Match object; span=(5, 8), match='ain'>
None
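
Because `search()` may return `None`, code that uses the result usually checks for that case first. The helper below, `first_match_text`, is a hypothetical name introduced only for this sketch.

```python
import re


def first_match_text(pattern, text):
    """Return the text of the first match, or None when nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        # Calling match.group() here would raise AttributeError.
        return None
    return match.group()


found = first_match_text('ain', 'The rain in Spain')
missing = first_match_text('xyz', 'The rain in Spain')

print(found)
print(missing)
```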


### The `split()` Function

In [84]:
print('The `split()` function splits a string at each match and returns a list.')

import re

my_string = 'The rain in Spain; it\'s  plain.'

print(re.split(' ', my_string))

print('The `maxsplit` parameter limits how many splits are made.')

print(re.split(' ', my_string, maxsplit=3))

The `split()` function splits a string at each match and returns a list.
['The', 'rain', 'in', 'Spain;', "it's", '', 'plain.']
The `maxsplit` parameter limits how many splits are made.
['The', 'rain', 'in', "Spain; it's  plain."]

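The example string contains two consecutive spaces, so splitting on a single space yields an empty string in the result. As a sketch of one way around that, the pattern `\s+` treats any run of whitespace as one separator. The last call uses an invented input string to show that a capture group keeps the separators in the output.

```python
import re

my_string = "The rain in Spain; it's  plain."

# Splitting on one space leaves an empty string between the two spaces.
by_single_space = re.split(' ', my_string)
print(by_single_space)

# Splitting on one-or-more whitespace characters avoids the empty item.
by_whitespace_run = re.split(r'\s+', my_string)
print(by_whitespace_run)

# If the separator pattern has a capture group, the captured text
# is included in the resulting list.
with_separators = re.split(r'(;)\s*', 'a; b;c')
print(with_separators)
```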
### The `sub()` Function

In [86]:
print('The `sub()` function replaces each match with a given string.')

import re

my_string = 'The rain in Spain; it\'s  plain.'

print(re.sub(' ', '_', my_string))

print('The `count` parameter limits how many substitutions are made.')

print(re.sub(' ', '_', my_string, count=5))

The `sub()` function replaces each match with a given string.
The_rain_in_Spain;_it's__plain.
The `count` parameter limits how many substitutions are made.
The_rain_in_Spain;_it's_ plain.

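The replacement passed to `sub()` does not have to be a fixed string. The sketch below shows three variations: a pattern as the search term, a backreference (`\1`) in the replacement, and a function as the replacement. The function name `shout` is made up for this example.

```python
import re

my_string = "The rain in Spain; it's  plain."

# A pattern can collapse a whole run of whitespace into one underscore.
collapsed = re.sub(r'\s+', '_', my_string)
print(collapsed)

# \1 in the replacement refers to the text captured by the first group.
emphasized = re.sub(r'(\w+)ain', r'\1AIN', my_string)
print(emphasized)


def shout(match):
    """Return the matched text in upper case."""
    return match.group().upper()


# A function replacement is called once per match with the Match object.
shouted = re.sub(r'\w*ain', shout, my_string)
print(shouted)
```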
### Match Object

In [95]:

print('The `search()` function searches a string for a match, returning a \n',
      '`Match` object when there is.')

import re

my_string = 'The rain in Spain; it\'s  plain.'

my_match = re.search('ain', my_string)

print(my_match)

print(type(my_match))

print('The methods for a match object are:')

for key in dir(my_match):
    print(key)

print('To get the docstrings from a method, use `__doc__`. For example:')
print(my_match.start.__doc__)

print('`span()` and `.group()` are useful methods of a `Match` object, \n',
      'while `string` is a useful attribute.')

print('`.span()` will make a tuple telling you the indices for the match')

print(my_match.span.__doc__)

print(my_match.span())

print('`.group()` will make a string of the match')

print(my_match.group.__doc__)

print(my_match.group())

print('`.string` holds the original string that was searched')

# `.string` is a plain `str`, so its `__doc__` would only show the
# docstring of the `str` class rather than anything about the match.
print(my_match.string)

print('Observe that `re.search()` returns `None` when nothing matches, so \n',
      'Pylance will warn that `None` has none of these methods or \n',
      'attributes unless you check the result for `None` first.')

The `search()` function searches a string for a match, returning a 
 `Match` object when there is.
<re.Match object; span=(5, 8), match='ain'>
<class 're.Match'>
The methods for a match object are:
__class__
__class_getitem__
__copy__
__deepcopy__
__delattr__
__dir__
__doc__
__eq__
__format__
__ge__
__getattribute__
__getitem__
__gt__
__hash__
__init__
__init_subclass__
__le__
__lt__
__ne__
__new__
__reduce__
__reduce_ex__
__repr__
__setattr__
__sizeof__
__str__
__subclasshook__
end
endpos
expand
group
groupdict
groups
lastgroup
lastindex
pos
re
regs
span
start
string
To get the docstrings from a method, use `__doc__`. For example:
Return index of the start of the substring matched by group.
`span()` and `.group()` are useful methods of a `Match` object, 
 while `string` is a useful attribute.
For match object m, return the 2-tuple (m.start(group), m.end(group)).
(5, 8)
group([group1, ...]) -> str or tuple.
    Return subgroup(s) of the match by indices or names.
    For 0 returns the 
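
As a compact summary of the `Match` members listed above, the following sketch uses a pattern with two named groups. The group names `first` and `rest` are invented for this example and do not appear in the original cell.

```python
import re

my_string = "The rain in Spain; it's  plain."

# (?P<name>...) defines a named capture group.
my_match = re.search(r'(?P<first>\w)(?P<rest>ain)', my_string)

if my_match is not None:
    print(my_match.group(0))      # the whole match
    print(my_match.groups())      # all captured groups as a tuple
    print(my_match.groupdict())   # named groups as a dictionary
    print(my_match.start(), my_match.end())
    print(my_match.span('rest'))  # span of one named group

    # start() and end() are the slice bounds of the match in .string.
    print(my_match.string[my_match.start():my_match.end()])
```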