## CRMDA Python Workgroup
# Week #4 

The first thing that you want to do every time you start working with the materials we are providing is to check for updates on the files. Here is how you can do that with Git. 

Make sure that you have closed all of your Jupyter Notebooks (if you don't know how to shut down Jupyter, go [here](https://github.com/CRMDA-python/tutorial/blob/master/shut_down_jupyter.md)). If you are on OS X/Linux, open Terminal and navigate to the "tutorial" directory (the directory you cloned during week 1). If you are on Windows, open the folder you downloaded during week 1, right-click somewhere in the empty space, and select "Git Bash Here". Once you have a command prompt, type the following commands (note that `git checkout .` discards any changes you have made to the tutorial files, so save a copy of anything you want to keep first):

> git checkout .  

> git pull

Once you do that, Git will output something like this:

_______________________________________________________________
<img src="../content/images/week2/git_pull_output.png">
_______________________________________________________________




<img src="../content/images/caution-h50.png"> These notebooks are designed to guide your reading of the book we have chosen for this workgroup, "Automate the Boring Stuff with Python" by Al Sweigart. The content is available online: https://automatetheboringstuff.com/

The sections we recommend you read and practice are listed below. You will find any missing code in the PDF version of the book. 

# The Boring Stuff - Chapter 6: Manipulating Strings
http://automatetheboringstuff.com/chapter6/

## Working with Strings

### String Literals

In [None]:
spam = 'I am a kitty cat'

### Double Quotes

In [None]:
spam = "That is Alice's cat."

### Escape Characters

In [None]:
spam = 'Say hi to Bob\'s mother.'

### Raw Strings

In [None]:
print(r'That is Carol\'s cat.')
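A raw string keeps every backslash as a literal character, which is why the output above still shows `\'`. A quick way to see the difference is to compare the lengths of the same text written both ways. This is a minimal sketch; the Windows-style path is made up for illustration.

```python
# In a normal string, '\n' is a single newline character.
normal = 'C:\new_folder'
# In a raw string, the backslash and the 'n' are kept as two separate characters.
raw = r'C:\new_folder'

print(len(normal))  # the '\n' counts as one character
print(len(raw))     # one character longer: backslash and 'n' count separately
print(raw)          # prints the path exactly as typed
```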

### Multiline Strings with Triple Quotes

In [None]:
print('''Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob''')

### Multiline Comments

In [None]:
"""This is a test Python program.
Written by Al Sweigart al@inventwithpython.com
This program was designed for Python 3, not Python 2.
"""
def spam():
    """This is a multiline comment to help
    explain what the spam() function does."""
    print('Hello!')

### Indexing and Slicing Strings

In [None]:
spam = 'Hello world!'
spam[0]

In [None]:
spam[4]

In [None]:
spam[-1]

In [None]:
spam[0:5]

In [None]:
spam[:5]

In [None]:
spam[6:]
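Two details of slicing are easy to miss in the cells above: the character at the end index is not included, and slicing never modifies the original string but returns a new one. The sketch below checks both, reusing the `spam` value from this section.

```python
spam = 'Hello world!'

# The end index is not included: this takes characters 0 through 4.
fizz = spam[0:5]
print(fizz)

# Leaving out the start or end index is the same as using 0 or len(spam).
print(spam[:5] == spam[0:5])
print(spam[6:] == spam[6:len(spam)])

# Slicing creates a new string; spam itself is unchanged.
print(spam)
```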

### The in and not in Operators with Strings

In [None]:
'Hello' in 'Hello World'

In [None]:
'Hello' in 'Hello'

In [None]:
'HELLO' in 'Hello World'

In [None]:
'' in 'spam'

In [None]:
'cats' not in 'cats and dogs'

## Useful String Methods

### The _upper()_, _lower()_, _isupper()_, and _islower()_ String Methods

In [None]:
spam = 'Hello world!'
spam = spam.upper()
spam

In [None]:
spam = spam.lower()
spam

In [None]:
print('How are you?')
feeling = input()
if feeling.lower() == 'great':
    print('I feel great too.')
else:
    print('I hope the rest of your day is good.')

In [None]:
spam = 'Hello world!'

In [None]:
spam.islower()

In [None]:
spam.isupper()

In [None]:
'HELLO'.isupper()

In [None]:
'abc12345'.islower()

In [None]:
'12345'.islower()

In [None]:
'12345'.isupper()

In [None]:
'Hello'.upper()

In [None]:
'Hello'.upper().lower()

In [None]:
'Hello'.upper().lower().upper()

In [None]:
'HELLO'.lower()

In [None]:
'HELLO'.lower().islower()
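`upper()` and `lower()` return new strings; they do not change the string they are called on, which is why the earlier cells reassign with `spam = spam.upper()`. The sketch below shows this and wraps the `feeling.lower() == 'great'` idea in a small helper. The name `same_ignoring_case` is hypothetical, chosen for illustration only.

```python
def same_ignoring_case(a, b):
    """Hypothetical helper: compare two strings without regard to case."""
    return a.lower() == b.lower()

greeting = 'Hello world!'
greeting.upper()      # returns a new, uppercased string that is thrown away here...
print(greeting)       # ...so greeting itself is unchanged

print(same_ignoring_case('GREAT', 'great'))
print(same_ignoring_case('Great', 'good'))
```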

### The isX String Methods

In [None]:
'hello'.isalpha()

In [None]:
'hello123'.isalpha()

In [None]:
'hello123'.isalnum()

In [None]:
'hello'.isalnum()

In [None]:
'123'.isdecimal()

In [None]:
' '.isspace()

In [None]:
'This Is Title Case'.istitle()

In [None]:
'This Is Title Case 123'.istitle()

In [None]:
'This Is not Title Case'.istitle()

In [None]:
'This Is NOT Title Case Either'.istitle()

In [None]:
while True:
    print('Enter your age:')
    age = input()
    if age.isdecimal():
        break
    print('Please enter a number for your age.')


while True:
    print('Select a new password (letters and numbers only):')
    password = input()
    if password.isalnum():
        break
    print('Passwords can only have letters and numbers.')
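The two loops above wait for keyboard input, which makes them awkward to try on many values. A minimal sketch, assuming we only want to see how the same checks treat some sample inputs, moves each check into a function. `is_valid_age` and `is_valid_password` are hypothetical names.

```python
def is_valid_age(text):
    """Hypothetical helper: the same check the age loop uses."""
    return text.isdecimal()

def is_valid_password(text):
    """Hypothetical helper: the same check the password loop uses."""
    return text.isalnum()

for candidate in ['42', 'forty-two', '']:
    print(repr(candidate), 'is a valid age:', is_valid_age(candidate))

for candidate in ['swordfish1', 'sword fish!']:
    print(repr(candidate), 'is a valid password:', is_valid_password(candidate))
```

Every isX method returns `False` for an empty string, so simply pressing Enter at the prompt keeps the loop going.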

### The _startswith()_ and _endswith()_ String Methods

In [None]:
'Hello world!'.startswith('Hello') 


In [None]:
'Hello world!'.endswith('world!')

In [None]:
'abc123'.startswith('abcdef')

In [None]:
'abc123'.endswith('12')

In [None]:
'Hello world!'.startswith('Hello world!')

In [None]:
'Hello world!'.endswith('Hello world!')
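`startswith()` and `endswith()` are handy when only the beginning or end of a string matters, for example when filtering file names by extension. Both methods also accept a tuple of strings and return `True` if any of them matches. The file names below are made up for illustration.

```python
# Made-up file names, for illustration only.
filenames = ['report.docx', 'data.csv', 'notes.txt', 'backup_data.csv', 'photo.JPG']

# endswith() accepts a tuple and returns True if any of the suffixes match.
text_files = [name for name in filenames if name.endswith(('.txt', '.csv'))]
print(text_files)

# The check is case sensitive, so lowercase the name first if case should not matter.
jpgs = [name for name in filenames if name.lower().endswith('.jpg')]
print(jpgs)

# startswith() works the same way at the beginning of the string.
backups = [name for name in filenames if name.startswith('backup')]
print(backups)
```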

### The _join()_ and _split()_ String Methods

In [None]:
', '.join(['cats', 'rats', 'bats'])

In [None]:
' '.join(['My', 'name', 'is', 'Simon'])

In [None]:
'ABC'.join(['My', 'name', 'is', 'Simon']) 

In [None]:
'My name is Simon'.split()

In [None]:
'MyABCnameABCisABCSimon'.split('ABC')

In [None]:
'My name is Simon'.split('m')

In [None]:
spam = '''Dear Alice,
How have you been? I am fine. There is a container in the fridge that is labeled "Milk Experiment".
Please do not drink it.
Sincerely,
Bob'''
spam.split('\n')
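`join()` and `split()` undo each other when they use the same separator. One difference worth knowing: `split()` with no argument treats any run of whitespace (spaces, tabs, newlines) as a single separator, while `split(' ')` splits on every single space. This sketch shows both behaviors.

```python
words = 'My name is Simon'.split()
print(words)
print(' '.join(words))   # joining with the same separator rebuilds the string

# With no argument, split() treats any run of whitespace as one separator
# and ignores leading and trailing whitespace.
print('  My   name\tis\nSimon '.split())

# With an explicit ' ' separator, every single space counts,
# so two spaces in a row produce an empty string in the result.
print('My  name'.split(' '))
```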

### Justifying Text with _rjust()_, _ljust()_, and _center()_

In [None]:
'Hello'.rjust(10)

In [None]:
'Hello'.rjust(20)

In [None]:
'Hello World'.rjust(20)

In [None]:
'Hello'.ljust(10)

In [None]:
'Hello'.rjust(20, '*')

In [None]:
'Hello'.ljust(20, '-')

In [None]:
'Hello'.center(20)

In [None]:
'Hello'.center(20, '=')

In [None]:
def printPicnic(itemsDict, leftWidth, rightWidth):
    print('PICNIC ITEMS'.center(leftWidth + rightWidth, '-'))
    for k, v in itemsDict.items():
        print(k.ljust(leftWidth, '.') + str(v).rjust(rightWidth))
picnicItems = {'sandwiches': 4, 'apples': 12, 'cups': 4, 'cookies': 8000}
printPicnic(picnicItems, 12, 5)
printPicnic(picnicItems, 20, 6)
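`printPicnic()` needs the column widths passed in by hand. A possible variation, sketched here under the assumption that the left column should simply be a few characters wider than the longest item name, computes that width from the dictionary and returns the lines instead of printing them. `picnic_lines` is a hypothetical name, not from the book.

```python
def picnic_lines(items_dict, right_width=5):
    """Hypothetical variant of printPicnic(): sizes the left column from the
    longest item name and returns the lines instead of printing them."""
    left_width = max(len(name) for name in items_dict) + 3  # room for a few dots
    lines = ['PICNIC ITEMS'.center(left_width + right_width, '-')]
    for name, count in items_dict.items():
        lines.append(name.ljust(left_width, '.') + str(count).rjust(right_width))
    return lines

picnicItems = {'sandwiches': 4, 'apples': 12, 'cups': 4, 'cookies': 8000}
for line in picnic_lines(picnicItems):
    print(line)
```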


### Removing Whitespace with _strip()_, _rstrip()_, and _lstrip()_

In [None]:
spam = ' Hello World '
spam.strip()

In [None]:
spam.lstrip()

In [None]:
spam.rstrip()

In [None]:
spam = 'SpamSpamBaconSpamEggsSpamSpam'
spam.strip('ampS')
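The argument to `strip()` is treated as a set of characters, not as a substring: the order of the characters does not matter, and stripping stops at the first character (from each end) that is not in the set. The sketch below checks this with the example above.

```python
spam = 'SpamSpamBaconSpamEggsSpamSpam'

# The characters are treated as a set, so their order does not matter.
print(spam.strip('ampS') == spam.strip('Spam'))

# Stripping stops at 'B' on the left and at 's' on the right,
# so the 'Spam' in the middle survives.
print(spam.strip('ampS'))

# Only the ends are affected; characters in the middle are never removed.
print('**Hello**World**'.strip('*'))
```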

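### Copying and Pasting Strings with the _pyperclip_ Module

_pyperclip_ is not part of the Python standard library, so you need to install it first (for example, with `pip install pyperclip`).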

In [None]:
import pyperclip
pyperclip.copy('Hello world!')
pyperclip.paste()

In [None]:
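# Copy some text in another program first, then run this cell:
# paste() returns whatever is currently on the clipboard.
pyperclip.paste()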

# Test yourself

It's important that you are able to answer the following practice questions:

__PAGE 142__: Q4, Q6-8, Q10


### Additional Resources

Python-Course: http://www.python-course.eu/python3_sequential_data_types.php

# The Boring Stuff - Chapter 7: Pattern Matching with Regular Expressions
http://automatetheboringstuff.com/chapter7/

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.'
Now they have two problems." (Jamie Zawinski) 

## Finding Patterns of Text Without Regular Expressions

In [None]:
def isPhoneNumber(text):
    if len(text) != 12:
        return False
    for i in range(0, 3):
        if not text[i].isdecimal():
            return False
    if text[3] != '-':
        return False
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False
    if text[7] != '-':
        return False
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True

print('415-555-4242 is a phone number:')
print(isPhoneNumber('415-555-4242'))
print('Moshi moshi is a phone number:')
print(isPhoneNumber('Moshi moshi'))
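To find phone numbers inside a longer message, the book goes on to slide a 12-character window across the text and test each chunk. The sketch below follows that idea; `is_phone_number` is a compact restatement of `isPhoneNumber()` above, included only so the cell runs on its own.

```python
def is_phone_number(text):
    """Compact restatement of isPhoneNumber(): checks the ddd-ddd-dddd format."""
    if len(text) != 12:
        return False
    for i, char in enumerate(text):
        if i in (3, 7):
            if char != '-':
                return False
        elif not char.isdecimal():
            return False
    return True

message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'
found = []
# Slide a 12-character window across the message, one position at a time.
for i in range(len(message)):
    chunk = message[i:i + 12]
    if is_phone_number(chunk):
        found.append(chunk)
        print('Phone number found: ' + chunk)
print('Done')
```

This works, but it takes a lot of code to handle one fixed format, which is the motivation for the regular expressions below.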

## Finding Patterns of Text with Regular Expressions

### Creating Regex Objects

In [None]:
import re

In [None]:
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

### Matching Regex Objects

In [None]:
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Phone number found: ' + mo.group())

### Review of Regular Expression Matching

While there are several steps to using regular expressions in Python, each step is fairly simple.

1. Import the regex module with `import re`.
2. Create a Regex object with the `re.compile()` function. (Remember to use a raw string.)
3. Pass the string you want to search into the Regex object’s `search()` method. This returns a Match object.
4. Call the Match object’s `group()` method to return a string of the actual matched text.
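The four steps above fit in a single cell. One case the earlier cell does not cover: `search()` returns `None` when the pattern is not found, and calling `group()` on `None` raises an `AttributeError`, so it is worth checking first. `find_phone` is a hypothetical helper name used only for this sketch.

```python
import re  # Step 1: import the regex module

# Step 2: create a Regex object (note the raw string).
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def find_phone(text):
    """Hypothetical helper: return the first phone number in text, or None."""
    mo = phoneNumRegex.search(text)   # Step 3: search() returns a Match object or None
    if mo is None:
        return None
    return mo.group()                 # Step 4: group() returns the matched text

print(find_phone('My number is 415-555-4242.'))
print(find_phone('I do not have a phone.'))
```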

__Note__ (from the book): While I encourage you to enter the example code into the interactive shell, you should also make use of web-based regular expression testers, which can show you exactly how a regex matches a piece of text that you enter. I recommend the tester at http://regexpal.com/.

## More Pattern Matching with Regular Expressions

### Grouping with Parentheses

In [None]:
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')
mo.group(1)

In [None]:
mo.group(2)

In [None]:
mo.group(0)

In [None]:
mo.group()

In [None]:
mo.groups()

In [None]:
areaCode, mainNumber = mo.groups() 
print(areaCode)

In [None]:
print(mainNumber)

In [None]:
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is (415) 555-4242.')
mo.group(1)

In [None]:
mo.group(2)

### Matching Multiple Groups with the Pipe

In [None]:
heroRegex = re.compile(r'Batman|Tina Fey')
mo1 = heroRegex.search('Batman and Tina Fey.')
mo1.group()

In [None]:
mo2 = heroRegex.search('Tina Fey and Batman.')
mo2.group()

In [None]:
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')
mo.group()

In [None]:
mo.group(1)

### Optional Matching with the Question Mark

In [None]:
batRegex = re.compile(r'Bat(wo)?man')
mo1 = batRegex.search('The Adventures of Batman')
mo1.group()

In [None]:
mo2 = batRegex.search('The Adventures of Batwoman')
mo2.group()

In [None]:
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
mo1 = phoneRegex.search('My number is 415-555-4242')
mo1.group()

In [None]:
mo2 = phoneRegex.search('My number is 555-4242')
mo2.group()

### Matching Zero or More with the Star

In [None]:
batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')
mo1.group()

In [None]:
mo2 = batRegex.search('The Adventures of Batwoman')
mo2.group()

In [None]:
mo3 = batRegex.search('The Adventures of Batwowowowoman')
mo3.group()

### Matching One or More with the Plus

In [None]:
batRegex = re.compile(r'Bat(wo)+man')
mo1 = batRegex.search('The Adventures of Batwoman')
mo1.group()

In [None]:
mo2 = batRegex.search('The Adventures of Batwowowowoman')
mo2.group()

In [None]:
mo3 = batRegex.search('The Adventures of Batman')
mo3 is None

### Matching Specific Repetitions with Curly Brackets

In [None]:
haRegex = re.compile(r'(Ha){3}')
mo1 = haRegex.search('HaHaHa')
mo1.group()

In [None]:
mo2 = haRegex.search('Ha')
mo2 is None
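The curly-bracket syntax is shorthand: `(Ha){3}` means the same as writing `(Ha)(Ha)(Ha)`, and `(Ha){3,5}` means the same as listing three, four, or five repetitions with the pipe. The sketch below checks those equivalences on a few strings. It uses `fullmatch()`, which only succeeds when the whole string matches, so it behaves differently from the `search()` calls above.

```python
import re

samples = ['Ha', 'HaHa', 'HaHaHa', 'HaHaHaHa', 'HaHaHaHaHa', 'HaHaHaHaHaHa']

# (Ha){3} should behave exactly like writing the group out three times.
short = re.compile(r'(Ha){3}')
long_form = re.compile(r'(Ha)(Ha)(Ha)')

# (Ha){3,5} should behave like listing the allowed counts with the pipe.
ranged = re.compile(r'(Ha){3,5}')
spelled_out = re.compile(r'((Ha)(Ha)(Ha))|((Ha)(Ha)(Ha)(Ha))|((Ha)(Ha)(Ha)(Ha)(Ha))')

for s in samples:
    # fullmatch() returns a Match object only if the entire string matches.
    print(s, bool(short.fullmatch(s)), bool(ranged.fullmatch(s)))
```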

## Greedy and Nongreedy Matching

In [None]:
greedyHaRegex = re.compile(r'(Ha){3,5}')
mo1 = greedyHaRegex.search('HaHaHaHaHa')
mo1.group()

In [None]:
nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
mo2.group()

## The _findall()_ Method

In [None]:
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('Cell: 415-555-9999 Work: 212-555-0000')
mo.group()

In [None]:
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') # has no groups 
phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')

In [None]:
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)') # has groups 
phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')
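When the regex has groups, `findall()` returns a list of tuples, one string per group, instead of the full matched text. If you still want the whole phone numbers, you can join each tuple back together, as in this short sketch.

```python
import re

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')  # three groups
matches = phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')
print(matches)  # a list of tuples, one string per group

# Rebuild the full numbers from the group tuples.
numbers = ['-'.join(groups) for groups in matches]
print(numbers)
```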

## Character Classes

| Shorthand character class | Represents |
| --- | --- |
| `\d` | Any numeric digit from 0 to 9. |
| `\D` | Any character that is not a numeric digit from 0 to 9. |
| `\w` | Any letter, numeric digit, or the underscore character. (Think of this as matching “word” characters.) |
| `\W` | Any character that is not a letter, numeric digit, or the underscore character. |
| `\s` | Any space, tab, or newline character. (Think of this as matching “space” characters.) |
| `\S` | Any character that is not a space, tab, or newline. |


In [None]:
xmasRegex = re.compile(r'\d+\s\w+')
xmasRegex.findall('12 drummers, 11 pipers, 10 lords, 9 ladies, 8 maids, 7 swans, 6 geese, 5 rings, 4 birds, 3 hens, 2 doves, 1 partridge')

## Making Your Own Character Classes

In [None]:
vowelRegex = re.compile(r'[aeiouAEIOU]')
vowelRegex.findall('RoboCop eats baby food. BABY FOOD.')

In [None]:
consonantRegex = re.compile(r'[^aeiouAEIOU]')
consonantRegex.findall('RoboCop eats baby food. BABY FOOD.')
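Two more details about your own character classes: a hyphen between two characters defines a range (such as `a-z`), and inside the square brackets the usual regex symbols such as `.` are ordinary characters, so they do not need a backslash. The sample strings below are made up.

```python
import re

# A hyphen inside brackets defines a range: lowercase letters, uppercase letters, digits.
alnumRegex = re.compile(r'[a-zA-Z0-9]+')
print(alnumRegex.findall('R2-D2 and C-3PO!'))

# Inside brackets, '.' is an ordinary period, not the wildcard.
digitOrDotRegex = re.compile(r'[0-5.]')
print(digitOrDotRegex.findall('Version 3.9 (build 7)'))
```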

## The Caret and Dollar Sign Characters

In [None]:
beginsWithHello = re.compile(r'^Hello')
beginsWithHello.search('Hello world!')

In [None]:
beginsWithHello.search('He said hello.') is None

In [None]:
endsWithNumber = re.compile(r'\d$')
endsWithNumber.search('Your number is 42')

In [None]:
endsWithNumber.search('Your number is forty two.') is None

In [None]:
wholeStringIsNum = re.compile(r'^\d+$')
wholeStringIsNum.search('1234567890')

In [None]:
wholeStringIsNum.search('12345xyz67890') is None

In [None]:
wholeStringIsNum.search('12 34567890') is None

## The Wildcard Character

In [None]:
atRegex = re.compile(r'.at')
atRegex.findall('The cat in the hat sat on the flat mat.')

### Matching Everything with Dot-Star

In [None]:
nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
mo = nameRegex.search('First Name: Al Last Name: Sweigart')
mo.group(1)

In [None]:
mo.group(2)

In [None]:
nongreedyRegex = re.compile(r'<.*?>')
mo = nongreedyRegex.search('<To serve man> for dinner.>')
mo.group()

In [None]:
greedyRegex = re.compile(r'<.*>')
mo = greedyRegex.search('<To serve man> for dinner.>')
mo.group()

### Matching Newlines with the Dot Character

In [None]:
noNewlineRegex = re.compile('.*')
noNewlineRegex.search('Serve the public trust.\nProtect the innocent. \nUphold the law.').group()

In [None]:
newlineRegex = re.compile('.*', re.DOTALL)
newlineRegex.search('Serve the public trust.\nProtect the innocent. \nUphold the law.').group()

### Review of Regex Symbols
This chapter covered a lot of notation, so here’s a quick review of what you learned:
- The `?` matches zero or one of the preceding group.
- The `*` matches zero or more of the preceding group.
- The `+` matches one or more of the preceding group.
- The `{n}` matches exactly n of the preceding group.
- The `{n,}` matches n or more of the preceding group.
- The `{,m}` matches 0 to m of the preceding group.
- The `{n,m}` matches at least n and at most m of the preceding group.
- `{n,m}?` or `*?` or `+?` performs a nongreedy match of the preceding group.
- `^spam` means the string must begin with spam.
- `spam$` means the string must end with spam.
- The `.` matches any character, except newline characters.
- `\d`, `\w`, and `\s` match a digit, word, or space character, respectively.
- `\D`, `\W`, and `\S` match anything except a digit, word, or space character, respectively.
- `[abc]` matches any character between the brackets (such as a, b, or c).
- `[^abc]` matches any character that isn’t between the brackets.
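The open-ended repetition forms `{n,}` and `{,m}` from this list do not have their own cells in this notebook. The short sketch below tries both; note that `{,m}` also allows zero repetitions, so it can match an empty string.

```python
import re

atLeastThree = re.compile(r'(Ha){3,}')   # three or more
atMostTwo = re.compile(r'(Ha){,2}')      # zero, one, or two

print(atLeastThree.search('HaHaHaHaHa').group())  # greedy, so it takes all five
print(atLeastThree.search('HaHa'))                # fewer than three, so no match
print(atMostTwo.search('HaHaHaHa').group())       # stops after two
print(repr(atMostTwo.search('hello').group()))    # zero repetitions also counts as a match
```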

## Case-Insensitive Matching

In [None]:
robocop = re.compile(r'robocop', re.I)
robocop.search('RoboCop is part man, part machine, all cop.').group()

In [None]:
robocop.search('ROBOCOP protects the innocent.').group()

In [None]:
robocop.search('Al, why does your programming book talk about robocop so much?').group()

## Substituting Strings with the _sub()_ Method

In [None]:
namesRegex = re.compile(r'Agent \w+')
namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')

In [None]:
agentNamesRegex = re.compile(r'Agent (\w)\w*')
agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')

## Managing Complex Regexes

In [None]:
phoneRegex = re.compile(r'((\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}(\s|-|\.)\d{4}(\s*(ext|x|ext.)\s*\d{2,5})?)')
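A one-line pattern like the one above is hard to read. With the `re.VERBOSE` flag, whitespace and `#` comments inside the pattern are ignored, so the same regex can be spread over several lines with a note on each part. The sketch below writes it that way, closely following the approach in the book, and checks on a made-up sample sentence that both versions find the same matches.

```python
import re

# The same pattern as above, on one line.
compactRegex = re.compile(r'((\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}(\s|-|\.)\d{4}(\s*(ext|x|ext.)\s*\d{2,5})?)')

# With re.VERBOSE, whitespace and # comments inside the pattern are ignored.
verboseRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?            # area code
    (\s|-|\.)?                    # separator
    \d{3}                         # first 3 digits
    (\s|-|\.)                     # separator
    \d{4}                         # last 4 digits
    (\s*(ext|x|ext.)\s*\d{2,5})?  # extension
    )''', re.VERBOSE)

text = 'Call 415-555-4242 or (415) 555-0000 x 1234.'
print(verboseRegex.findall(text) == compactRegex.findall(text))
for groups in verboseRegex.findall(text):
    print(groups[0])  # the outer group (group 1) wraps the whole phone number
```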

## Combining re.IGNORECASE, re.DOTALL, and re.VERBOSE

In [None]:
someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL)

In [None]:
someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE)
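The `|` operator combines flags so that all of them apply at once. In this sketch, `re.IGNORECASE` lets lowercase letters match uppercase ones and `re.DOTALL` lets `.` match a newline; dropping `re.DOTALL` makes the same search fail. The pattern and sample text are made up for illustration.

```python
import re

# IGNORECASE lets 'foo' match 'FOO'; DOTALL lets '.' match the newline.
fooBarRegex = re.compile('foo.bar', re.IGNORECASE | re.DOTALL)
print(fooBarRegex.search('FOO\nBAR'))

# Without DOTALL, the same pattern cannot cross the newline.
caseOnlyRegex = re.compile('foo.bar', re.IGNORECASE)
print(caseOnlyRegex.search('FOO\nBAR'))
```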

# Test yourself

It's important that you are able to answer the following practice questions:

__PAGE 167__: All questions in this section are good! 

### Additional Resources

Python-Course: http://www.python-course.eu/python3_re.php (Basic)

http://www.python-course.eu/python3_re_advanced.php (Advanced)