# Focus on Fundamentals

1. Know the basic data structures inside out (e.g. `list`, `tuple`, `dict`, `set`, `str`)
2. Know the most common standard library functions
3. Practice, practice, practice!


## Example 1: Parse Text into Words

In [33]:
import re
import string

with open('hamlet.txt') as f:
    text = f.read()

# Split on whitespace, strip digits and punctuation, then lowercase and drop empty strings
words = text.split()
words = [re.sub('[0-9%s ]' % re.escape(string.punctuation), '', x) for x in words]
words = [w.lower() for w in words if w != '']
words

['hamlet',
 'by',
 'william',
 'shakespeare',
 'edited',
 'by',
 'barbara',
 'a',
 'mowat',
 'and',
 'paul',
 'werstine',
 'with',
 'michael',
 'poston',
 'and',
 'rebecca',
 'niles',
 'folger',
 'shakespeare',
 'library',
 'httpsshakespearefolgeredushakespearesworkshamlet',
 'created',
 'on',
 'jul',
 'from',
 'fdt',
 'version',
 'characters',
 'in',
 'the',
 'play',
 'the',
 'ghost',
 'hamlet',
 'prince',
 'of',
 'denmark',
 'son',
 'of',
 'the',
 'late',
 'king',
 'hamlet',
 'and',
 'queen',
 'gertrude',
 'queen',
 'gertrude',
 'widow',
 'of',
 'king',
 'hamlet',
 'now',
 'married',
 'to',
 'claudius',
 'king',
 'claudius',
 'brother',
 'to',
 'the',
 'late',
 'king',
 'hamlet',
 'ophelia',
 'laertes',
 'her',
 'brother',
 'polonius',
 'father',
 'of',
 'ophelia',
 'and',
 'laertes',
 'councillor',
 'to',
 'king',
 'claudius',
 'reynaldo',
 'servant',
 'to',
 'polonius',
 'horatio',
 'hamlets',
 'friend',
 'and',
 'confidant',
 'courtiers',
 'at',
 'the',
 'danish',
 'court',
 ...

### Exercises
* Remove punctuation
* Remove case (lowercase every word)
* Remove numbers
* Remove blank words
* How would you modify the program if the file were 10 GB?
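
One possible answer to the last exercise is to stream the file line by line instead of calling `f.read()`. The sketch below is a minimal illustration, not the only approach: the names `STRIP`, `iter_words`, and `count_words` are hypothetical, and it assumes the file is UTF-8 text with line breaks.

```python
import re
import string
from collections import Counter

# Characters to strip from each token: digits and ASCII punctuation.
STRIP = re.compile('[0-9%s]' % re.escape(string.punctuation))

def iter_words(lines):
    """Yield cleaned, lowercased words one at a time from an iterable of lines."""
    for line in lines:
        for token in line.split():
            word = STRIP.sub('', token).lower()
            if word:  # skip tokens that became empty after stripping
                yield word

def count_words(path):
    """Count words in a file without loading the whole file into memory."""
    with open(path, encoding='utf-8') as f:  # assumption: UTF-8 text
        return Counter(iter_words(f))

# Small in-memory sample standing in for the lines of a file.
sample = ['Hamlet, Prince of Denmark!', 'Act 1, Scene 2']
print(list(iter_words(sample)))
# ['hamlet', 'prince', 'of', 'denmark', 'act', 'scene']
```

Because a file object yields one line at a time, only the current line and the running `Counter` of distinct words are held in memory.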

## Example 2: Find Unique Words

In [43]:
set(['hamlet', 'brian']) - set(words) 

{'brian'}

### Exercise

* Find the intersection of the words in two lists
* Find the difference of the words in two lists
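
As a starting point for these exercises, here is a short sketch of the set operators. The two word lists are made up for illustration and are not taken from `hamlet.txt`.

```python
# Two small hypothetical word lists, used only for illustration.
words_a = ['hamlet', 'ophelia', 'horatio', 'hamlet']
words_b = ['hamlet', 'brian', 'horatio']

set_a, set_b = set(words_a), set(words_b)

common = set_a & set_b   # intersection: words in both lists
only_a = set_a - set_b   # difference: words in the first list only
only_b = set_b - set_a   # difference: words in the second list only
either = set_a ^ set_b   # symmetric difference: words in exactly one list

# Sets are unordered, so sort before printing for a stable result.
print(sorted(common))  # ['hamlet', 'horatio']
print(sorted(only_a))  # ['ophelia']
print(sorted(only_b))  # ['brian']
print(sorted(either))  # ['brian', 'ophelia']
```

Note that set difference is not symmetric: `set_a - set_b` and `set_b - set_a` generally give different results.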

## Example 3: Word Count

In [53]:
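from collections import Counter, defaultdict

# Option 1: Counter does the counting; the slice keeps the 10 least common words
Counter(words).most_common()[-10:]

# Option 2: defaultdict(int) starts every missing key at 0
dd = defaultdict(int)
for w in words:
    dd[w] += 1
dd

# Option 3: plain dict with an explicit check for missing keys
d = {}
for w in words:
    if w not in d:
        d[w] = 0
    d[w] += 1
d

# Sort by count, highest first (only this last expression is displayed)
sorted(d.items(), key=lambda r: -r[1])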

[('the', 1086),
 ('and', 968),
 ('to', 758),
 ('of', 678),
 ('i', 564),
 ('you', 553),
 ('a', 534),
 ('my', 517),
 ('hamlet', 468),
 ('in', 440),
 ('it', 415),
 ('that', 386),
 ('is', 360),
 ('not', 313),
 ('his', 306),
 ('this', 299),
 ('with', 277),
 ('but', 274),
 ('he', 254),
 ('for', 251),
 ('your', 242),
 ('me', 232),
 ('as', 231),
 ('lord', 223),
 ('be', 223),
 ('what', 203),
 ('king', 198),
 ('him', 196),
 ('so', 192),
 ('have', 182),
 ('will', 171),
 ('do', 160),
 ('horatio', 154),
 ('no', 140),
 ('we', 138),
 ('on', 137),
 ('are', 130),
 ('all', 126),
 ('queen', 122),
 ('by', 121),
 ('our', 120),
 ('polonius', 119),
 ('they', 118),
 ('shall', 117),
 ('if', 114),
 ('o', 113),
 ('or', 113),
 ('laertes', 109),
 ('thou', 108),
 ('come', 107),
 ('good', 106),
 ('now', 100),
 ('from', 98),
 ('more', 96),
 ('let', 94),
 ('her', 92),
 ('t', 89),
 ('ophelia', 88),
 ('well', 87),
 ('thy', 87),
 ('how', 86),
 ('was', 86),
 ('at', 83),
 ('most', 81),
 ('would', 81),
 ('like', 79),
 ...

### Exercise

* Find the top 10 words
* Find the bottom 10 words
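
A minimal sketch of both exercises using `Counter.most_common`. It uses a tiny made-up list (`sample_words`) and `n = 2` so the result is easy to check by hand; for the exercise itself, `words` and `n = 10` would take their place.

```python
from collections import Counter

# Hypothetical sample; the real exercise would use the `words` list.
sample_words = ['the', 'and', 'the', 'to', 'the', 'and', 'ghost']
n = 2

counts = Counter(sample_words)

# most_common(n) returns the n highest counts, highest first.
top = counts.most_common(n)

# most_common() with no argument returns every item, so the tail
# of that list holds the lowest counts.
bottom = counts.most_common()[-n:]

print(top)     # [('the', 3), ('and', 2)]
print(bottom)  # [('to', 1), ('ghost', 1)]
```

Words with equal counts are kept in the order they were first encountered, so the exact membership of the bottom `n` can depend on input order when there are ties.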

## Example: Advent of Code 2020 - Day 1

Find the two entries that sum to 2020 and then multiply those two numbers together.

https://adventofcode.com/2020/day/1

In [56]:
nums = [1721, 979, 366, 299, 675, 1456]

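def solution1(nums):
    # Brute force, O(n^2): compare each entry only with the entries after it,
    # so a number is never paired with itself
    for i, x in enumerate(nums):
        for y in nums[i + 1:]:
            if x + y == 2020:
                return x * y

solution1(nums)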

514579

In [57]:
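def solution2(nums):
    # One pass with a set of the values seen so far, O(n).
    # Checking only earlier entries avoids pairing a number with itself (e.g. a single 1010).
    seen = set()
    for x in nums:
        if 2020 - x in seen:
            return x * (2020 - x)
        seen.add(x)

solution2(nums)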

514579
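
In the spirit of knowing the standard library, the same pair search can also be sketched with `itertools.combinations`, which yields each pair of distinct positions exactly once. The name `solution3` and the `target` parameter are additions for illustration, not part of the original notebook.

```python
from itertools import combinations

def solution3(nums, target=2020):
    """Return the product of the first pair of distinct entries that sums to target."""
    for x, y in combinations(nums, 2):
        if x + y == target:
            return x * y
    return None  # no pair found

print(solution3([1721, 979, 366, 299, 675, 1456]))  # 514579
print(solution3([1010, 5]))                         # None
```

This is still quadratic in the worst case, like the nested loops, but it states the intent ("every pair") directly.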

## Example: Advent of Code 2020 - Day 2

To try to debug the problem, they have created a list (your puzzle input) of passwords (according to the corrupted database) and the corporate policy when that password was set.

For example, suppose you have the following list:
```
1-3 a: abcde
1-3 b: cdefg
2-9 c: ccccccccc
```
Each line gives the password policy and then the password. The password policy indicates the lowest and highest number of times a given letter must appear for the password to be valid. For example, 1-3 a means that the password must contain a at least 1 time and at most 3 times.

In the above example, 2 passwords are valid. The middle password, cdefg, is not; it contains no instances of b, but needs at least 1. The first and third passwords are valid: they contain one a or nine c, both within the limits of their respective policies.

How many passwords are valid according to their policies?

https://adventofcode.com/2020/day/2

In [77]:
passwords = [
    '1-3 a: abcde',
    '1-3 b: cdefg',
    '2-9 c: ccccccccc',
]

valid_count = 0
for line in passwords:
    policy, password = line.split(': ')
    pattern, letter = policy.split(' ')
    low, high = [int(x) for x in pattern.split('-')]
    occurrences = password.count(letter)  # how many times the policy letter appears
    check = 'valid' if low <= occurrences <= high else 'invalid'
    if check == 'valid':
        valid_count += 1
    print('{} is {}'.format(password, check))
print('{} passwords are valid'.format(valid_count))

abcde is valid
cdefg is invalid
ccccccccc is valid
2 passwords are valid
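
An alternative way to parse each line is a single regular expression. This is a sketch that assumes every line has exactly the `low-high letter: password` shape shown in the example; the names `LINE`, `is_valid`, and `sample` are hypothetical.

```python
import re

# Assumed line format: '<low>-<high> <letter>: <password>', e.g. '1-3 a: abcde'.
LINE = re.compile(r'(\d+)-(\d+) (\w): (\w+)')

def is_valid(line):
    """Return True if the password on this line satisfies its policy."""
    match = LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError('unexpected line format: %r' % line)
    low, high, letter, password = match.groups()
    return int(low) <= password.count(letter) <= int(high)

sample = ['1-3 a: abcde', '1-3 b: cdefg', '2-9 c: ccccccccc']

# True counts as 1 and False as 0, so sum() gives the number of valid passwords.
print(sum(is_valid(line) for line in sample))  # 2
```

Compared with chained `split` calls, the pattern fails loudly on malformed lines instead of silently producing wrong fields.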
