### Regular Expressions

Sample answers, with explanations, are in a separate folder (there may be more than one way to solve each problem!)

1. Using the sample text below and regular expressions, find where the word 'involuntarily' appears (that is, find its string slice).

In [1]:
text = """Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, 
and nothing particular to interest me on shore, I thought I would sail about a little and see the watery 
part of the world. It is a way I have of driving off the spleen and regulating the circulation. 
Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; 
whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; 
and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me 
from deliberately stepping into the street, and methodically knocking people's hats off - then, 
I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. 
With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. 
There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, 
cherish very nearly the same feelings towards the ocean with me."""



In [2]:
import re
comp = re.compile("involuntarily")
result = comp.search(text)
print(result)

<_sre.SRE_Match object; span=(438, 451), match='involuntarily'>
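The match object printed above already carries the slice boundaries. Here is a minimal sketch of reading them with `Match.span()` and then slicing the string. It uses a short stand-in sentence rather than the full passage, so its indices differ from the ones above:

```python
import re

# Short stand-in for the passage above (not the full text)
sample = "whenever I find myself involuntarily pausing"

match = re.search(r"involuntarily", sample)
if match:  # search() returns None when there is no match
    start, end = match.span()  # same as (match.start(), match.end())
    print(start, end)          # prints: 23 36
    print(sample[start:end])   # slicing recovers the matched word: involuntarily
```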


2. Using the text from 9.7.1 above, find how many times the word 'I' is used.

In [3]:
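# Searching for just 'I' would also match words that start with a capital 'I', such as 'It' and 'Ishmael'.
# The word boundary \b on both sides matches 'I' only as a whole word, even before punctuation or a line break.
comp = re.compile(r'\bI\b')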
result = comp.findall(text)
print(len(result))

9
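A pattern with a trailing space, `'I '`, depends on every standalone 'I' being followed by a space. It misses an 'I' before punctuation or at the end of the string. The word-boundary anchor `\b` has no such dependency. Here is a small comparison on a made-up sentence (the sentence is purely illustrative):

```python
import re

# Hypothetical sentence: one "I" before a space, one before a comma, one at the very end
sentence = "I said I, Ishmael, would go. It was I"

space_hits = re.findall(r"I ", sentence)        # only an "I" followed by a space
boundary_hits = re.findall(r"\bI\b", sentence)  # "I" as a whole word, whatever follows

print(len(space_hits), len(boundary_hits))  # prints: 1 3
```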


3. Using regular expressions and the pay_list below, count how people paid; that is, get the count of each of these codes:

```O = Online, P = Phone, Cash = Cash, CC = Credit Card ```

In [4]:
pay_list = ['O[SGC] Paid $123.34', 'Cash - $150.00 - ABC', 'O[ABC] Paid $230.23', 'P[rjf@abc.net] paid 321.12', 
            'O[CGR] Paid $967.21', 'CC[ajk] 245.34', 'P[abc@rjf.net] paid 161.13', 'Cash - $100.00 - rjf', 
            'Cash - $100.00 - gun', 'O[DYTC] Paid $199.99', 'O[ABC] Paid $123.93', 'P[dtc@dtc.com]  paid 109.56',
            'CC[ABC] 186.70', 'CC[CGC] 995.95', 'Cash - $125.00 - pal']
            

In [5]:
# defaultdict(int) doesn't raise a KeyError for a missing key; it creates the key with a default value of 0
from collections import defaultdict
pay_dict = defaultdict(int)

# You could also set up the dict explicitly, as below (assuming you know the payment types in advance)
# pay_dict = {'O':0, 'P':0, 'Cash': 0, 'CC': 0}

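# Compile the pattern outside the loop since it doesn't need to be recomputed for each item
# The [ has to be escaped because it is a special character in regular expressions (it opens a character class);
# the raw string r'...' keeps Python from treating the backslash as a string escape
pattern = re.compile(r'\[')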

# Iterate through all the items:
# if the item contains the word 'cash' (case-insensitive), count it as Cash;
# otherwise, the text before the '[' character is the payment method
for item in pay_list:
    if re.search("cash", item.lower()):
        pay_dict['Cash'] += 1
    else:       
        pay = pattern.split(item)[0]
        pay_dict[pay] += 1
        
print(pay_dict)
    
    

defaultdict(<class 'int'>, {'O': 5, 'Cash': 4, 'P': 3, 'CC': 3})
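The loop above uses a regular expression only to split on `[`. The sketch below is an alternative. It reads the payment code directly with an anchored alternation and tallies it with `collections.Counter`. It assumes every entry begins with one of the four codes. The `"Unknown"` bucket is a hypothetical addition for unexpected formats. Alternation order is not critical here, because no code is a prefix of another (`Cash` and `CC` already differ at the second character).

```python
import re
from collections import Counter

pay_list = ['O[SGC] Paid $123.34', 'Cash - $150.00 - ABC', 'O[ABC] Paid $230.23', 'P[rjf@abc.net] paid 321.12',
            'O[CGR] Paid $967.21', 'CC[ajk] 245.34', 'P[abc@rjf.net] paid 161.13', 'Cash - $100.00 - rjf',
            'Cash - $100.00 - gun', 'O[DYTC] Paid $199.99', 'O[ABC] Paid $123.93', 'P[dtc@dtc.com]  paid 109.56',
            'CC[ABC] 186.70', 'CC[CGC] 995.95', 'Cash - $125.00 - pal']

# match() already anchors at the start of the string; ^ just makes that explicit
code_pattern = re.compile(r"^(Cash|CC|O|P)")

counts = Counter()
for item in pay_list:
    match = code_pattern.match(item)
    if match:
        counts[match.group(1)] += 1  # group(1) is the captured payment code
    else:
        counts["Unknown"] += 1       # hypothetical bucket for entries in an unexpected format

print(dict(counts))  # prints: {'O': 5, 'Cash': 4, 'P': 3, 'CC': 3}
```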


4. Using regular expressions and the pay_list from above, find the total amount of money that was paid. 

In [6]:
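# Initialize the total to 0 and compile a pattern for 3 digits, a literal decimal point, and 2 digits
# The '.' must be escaped: an unescaped '.' matches any character, not just a decimal point
# This fixed-width pattern assumes every payment has exactly 3 digits before the decimal point
total = 0
number_pattern = re.compile(r'\d{3}\.\d{2}')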

# Find the payment - convert to float - add to the total
# We know there is only one payment per item, so the amount we want is always digits[0]
# If an item could contain several payments, we could instead loop over the whole digits list
# and add each amount to the total
for item in pay_list:
    digits = number_pattern.findall(item)
    num = float(digits[0])
    total += num
    
print("The total is: {:.2f}".format(total))

The total is: 4139.50
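The fixed-width pattern `\d{3}\.\d{2}` relies on every amount having exactly three whole-number digits. Loosening it to `\d+\.\d{2}` accepts any number of whole-number digits. Summing with `decimal.Decimal` instead of `float` also keeps the cents exact. Here is a sketch under those assumptions, using a few made-up entries with amounts of different widths:

```python
import re
from decimal import Decimal

# Hypothetical entries whose amounts have different numbers of digits
payments = ['O[XYZ] Paid $5.25', 'Cash - $1200.00 - abc', 'P[a@b.net] paid 87.10']

# One or more digits, a literal decimal point, then exactly two digits
amount_pattern = re.compile(r"\d+\.\d{2}")

total = Decimal("0")
for item in payments:
    for amount in amount_pattern.findall(item):  # also handles several amounts per entry
        total += Decimal(amount)                 # Decimal avoids binary floating-point rounding

print("The total is: {:.2f}".format(total))  # prints: The total is: 1292.35
```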
