# Homework 6 Solutions: Fall 2021

# Problem 1:  Game Scores

You are writing a library for a video game.  Each user has a list of all their scores.  

You must take a list of scores, and return the most recent score, the best score, and the top three scores.

Hint: To find the top three, investigate the Python list method sort() and function sorted()

<div class="alert alert-block alert-info">
Fill in your function below.
</div>

In [None]:
from typing import List

def recent(scores: List[int]) -> int:
    """The most recent score"""
    return scores[-1]


def best(scores: List[int]) -> int:
    """The best score"""
    return max(scores)


def top_triple(scores: List[int]) -> List[int]:
    """The three best scores"""
    return sorted(scores, reverse=True)[:3]  # Slice is Forgiving

### Sort the list without changing the list

sort() alters the list.

sorted() copies the list, and sorts the copy

**We sort the list 'scores' in reverse order**

Both sort() and sorted() take an optional Boolean parameter 'reverse'

*Because it is optional and defaults to False, you must pass it explicitly to get descending order*

```python
    return sorted(scores, reverse = True)
```
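
A small demonstration of the points above, using a made-up list of scores (the values are only for illustration): sorted() leaves the original list alone, sort() reorders the list in place and returns None, and a slice such as [:3] quietly returns fewer items when the list is short.

```python
scores = [40, 20, 40, 30]          # illustrative data, not from the assignment

# sorted() returns a new list; the original order is preserved
descending = sorted(scores, reverse=True)
print(descending)                   # [40, 40, 30, 20]
print(scores)                       # [40, 20, 40, 30]

# sort() reorders the list in place and returns None
in_place = scores.copy()
returned = in_place.sort(reverse=True)
print(in_place, returned)           # [40, 40, 30, 20] None

# Slicing never raises IndexError, even when the list is too short
short_slice = [70, 30][:3]
print(short_slice)                  # [70, 30]
```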

### Test cases for Game Scores

In [None]:
def test_game_scores():
    assert recent([100, 0, 90, 30]) == 30
    assert best([40, 100, 70]) == 100
    
    
    scores = [10, 30, 90, 30, 100, 20, 10, 0, 30, 40, 40, 70, 33]  
    assert recent(scores) == 33
    assert top_triple(scores) == [100, 90, 70]
    assert best(scores) == 100
    assert recent(scores) == 33, "Did you reorder the list?"
    
    assert top_triple([20, 10, 30]) == [30, 20, 10]
    assert top_triple([40, 20, 40, 30]) == [40, 40, 30] 
    assert top_triple([30, 70]) == [70, 30] 
    assert top_triple([30]) == [30] 
    assert top_triple([]) == [] 
    
    return('Success!')

test_game_scores()

# Problem 2: Mystical Listicle

You are given a list.  Some of the items in the list are themselves lists.  Sublists will not contain lists.  

Return a list of the items in order.

You can use the built-in function isinstance() to tell if an item is a list.

In [None]:
## Demo of isinstance()

lst = [1, [2], [3, 4]]

for item in lst:
    if isinstance(item, list):
        print(item)

<div class="alert alert-block alert-info">
Fill in your function below.
</div>

In [None]:
from typing import List

def listicle(lst: List) -> List:
    """Flatten a list"""
    result = []
    for item in lst:
        if isinstance(item, list):
            result = result + item
        else:
            result.append(item)
            
    return result
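
One possible variation, offered as a sketch and not as the required answer: `result = result + item` builds a brand-new list each time a sublist is found, while `extend()` adds the items to the existing list. The name `listicle_extend` is made up here to keep it separate from the solution above.

```python
from typing import List


def listicle_extend(lst: List) -> List:
    """Flatten one level of nesting, growing a single result list."""
    result = []
    for item in lst:
        if isinstance(item, list):
            result.extend(item)    # add each element of the sublist
        else:
            result.append(item)    # add the item itself
    return result


print(listicle_extend([1, [2, 3], [], 4]))   # [1, 2, 3, 4]
```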

### Unit Tests

In [None]:
def test_listicle():
    assert listicle([]) == []
    assert listicle([1]) == [1]
    assert listicle([0, 1, 2]) == [0, 1, 2]
    assert listicle([1, [2]]) == [1, 2]
    assert listicle([[1], 2]) == [1, 2]
    assert listicle([1, [2, 3, 4, 5, 6, 7], 8]) == [1, 2, 3, 4, 5, 6, 7, 8]
    assert listicle([[]]) == []  
    assert listicle([1, []]) == [1] 
    assert listicle([1, [], 2]) == [1, 2] 
    
    print('Success!')
    
test_listicle()
    

# Problem 3: Parentheses
### Decide if a string contains valid nested parentheses

You are given a string consisting only of parentheses: (, ), {, }, [, and ]. Write a Boolean function is_valid_parens() that takes a string and decides if it consists of valid nested parentheses.

Hint: When your function sees an open parenthesis, such as '(', it should push it on a stack. When it sees a closing parenthesis, it should pop the stack and compare. If the close parenthesis doesn't match the open parenthesis popped from the stack, the string is invalid. If the stack is empty too soon, or is not empty when you finish the string, the string is invalid.

You can read about stacks here: 

https://en.wikipedia.org/wiki/Stack_(abstract_data_type)

Implement your stack with a list, pushing and popping the final element.  
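
A short sketch of a list used as a stack: append() pushes onto the end, pop() removes and returns the last item, and popping an empty list raises IndexError. The variable names are only for illustration.

```python
stack = []

stack.append('(')      # push
stack.append('[')      # push
top = stack.pop()      # pop returns the most recently pushed item
print(top)             # [
print(stack)           # ['(']

stack.pop()            # the stack is now empty
try:
    stack.pop()        # popping an empty list raises IndexError
    popped_empty = True
except IndexError:
    popped_empty = False
print(popped_empty)    # False
```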

<div class="alert alert-block alert-info">
Fill in your function below.
</div>

In [None]:
# Takes a string, and returns a Boolean 
#  '{()[{}]}' is valid:    return True
#  '{()[{}}' is not:       return False
def is_valid_parens(s):
    """Is this a well-nested set of parentheses?"""
    stack = []   
    pairs = {'[':']', '{':'}', '(':')'}   # a dictionary for things that count as pairs
    
    # loop through each character in the string
    for ch in s:
        # Is this an open paren?
        if ch in pairs:
            stack.append(ch)
        else:
            # Close paren
            try: 
                cur = stack.pop()
            except IndexError:
                return False
            
            # Does the pair match?
            if pairs[cur] != ch:
                return False

    # If the list is empty, we have a valid expression
    return len(stack) == 0 

### Test case for is_valid_parens()

In [None]:
def test_parens():
    assert(is_valid_parens(""))
    assert(is_valid_parens("[]"))
    assert(is_valid_parens('{()[{}]}'))
    assert(is_valid_parens("{}"))
    assert(is_valid_parens("{[]}"))
    assert(is_valid_parens("{}[]"))
    assert(is_valid_parens("([{}({}[])])"))

    assert not is_valid_parens('{()[{}}]'), 'Interlaced parentheses'
    assert not is_valid_parens("[["), "Unmatched opens"
    assert not is_valid_parens("}{"), "Unmatched close"
    assert not is_valid_parens("{]"), "Mismatched parentheses"
    assert not is_valid_parens("{[])"), "Mismatched parentheses"
    assert not is_valid_parens("{[)][]}"), "Mismatched parentheses"
    assert not is_valid_parens("([{])"), "Mismatched parentheses"
    assert not is_valid_parens("[({]})"), "Mismatched parentheses"
    
    return 'Success!'

test_parens()

# Problem 4: Find Large Files

Write a function that takes a directory and a size in bytes, 
and returns a list of files in the directory or below that 
are larger than the size.  

*For example, you can use this function to look for files larger than 1 Meg below your Home directory.*

You will find a Python function that gives you the size of a file in the os.path library: 

https://pymotw.com/3/os.path/

<div class="alert alert-block alert-info">
Fill in your function below.
</div>

In [None]:
import os

def find_large_files(dirname, filesize):
    """Return a list of large files below this point"""

    result = []                                    

    # Walk over the files in this directory
    for name in os.listdir(dirname):
    
        # Construct a full path
        path = os.path.join(dirname, name)

        # Check the size of files, and recurse into directories
        if os.path.isfile(path):
            if os.path.getsize(path) > filesize:         # Check file size
                result.append(path)               
        else:
            result = result + find_large_files(path, filesize)   

    return result                                 

## Add some error handling

In [None]:
import os

def find_large_files(dirname, filesize):
    """Return a list of large files below this point"""

    result = []                                    

    # Walk over the files in this directory
    for name in os.listdir(dirname):
    
        # Construct a full path
        path = os.path.join(dirname, name)

        # Check the size of files, and recurse into directories
        if os.path.isfile(path):
            try:
                if os.path.getsize(path) > filesize:         # Check file size
                    result.append(path)      
            except OSError:
                print(f"Could not get size of {path}")
        elif os.path.isdir(path):
            result = result + find_large_files(path, filesize)   

    return result                                 
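
For comparison, here is a sketch of the same search built on os.walk(), which does the directory recursion for us. This is an alternative, not the solution above; the name `find_large_files_walk` is invented for this example. It skips symbolic links, which is a choice made here and not something the assignment asks for.

```python
import os


def find_large_files_walk(dirname, filesize):
    """Return paths of regular files under dirname larger than filesize bytes."""
    result = []
    # os.walk yields (directory, subdirectory names, file names) for each level
    for root, _dirs, files in os.walk(dirname):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue                      # assumption: ignore symbolic links
            try:
                if os.path.getsize(path) > filesize:
                    result.append(path)
            except OSError as err:            # e.g. permission denied, file vanished
                print(f"Could not get size of {path}: {err}")
    return result
```

Calling `find_large_files_walk('..', 1048576)` should report the same kind of list as the recursive version, though the order of the paths may differ.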

## Show your program in action
Give the parameters and show the results for your program

I looked for files larger than one megabyte (1,048,576 bytes) below the parent directory.   

In [None]:
lst = find_large_files('..', 1048576)
print(len(lst))

for path in lst:
    print(path)

# Problem 5: Wine and Beer

Find the top beer and wine suppliers listed in a CSV file.

The CSV file Beer_Wine.csv lists beer and wine suppliers to one state. Each line of the file records a different supplier, and includes a 5 or 9 digit zip code. When you see a 9-digit zip code, truncate the last 4 digits.  Find the 5-digit zip codes that hold the most suppliers to the state. 

Your function should return a list of lists, with the frequency and the zip code. Organize the list in decreasing order of frequency.
Here are three items from my list of 720 zip codes
```python
    [ ... [9, '65616'], [8, '94573'], [8, '63103'] ...]
```
This tells me that Branson, Missouri, has 9 beer or wine suppliers to the state, and Rutherford, California, has 8.
Print the number of suppliers and the zip code for the 10 most common zip codes in the file.

Use the csv library to read the text file. Use the idiom "Dictionary as a Counter" (section 11.1 of Downey) to count the number of times you see each zip code. Traverse the Dictionary and build a list of lists, and use the functions sort() or sorted() to organize your list so the most common zip codes are first.

To validate your results, you can check the three zip codes above, and you can use Google to map the zip code to a location and check that that is a likely source for wine or beer.

<div class="alert alert-block alert-info">
Fill in your function below.
</div>

## Validation

Here is a simple validator for zip codes

It checks the length, and that all characters are digits using sets.

This will have the side effect of discarding the header, where the field reads 'Zip Code'

In [None]:
import string

def is_valid_zip(zip_string):
    """Is this a valid zip code?"""

    # Is the length right?
    if len(zip_string) not in (5, 9):
        return False

    # Is each character in zip a digit?
    return set(zip_string).issubset(set(string.digits))

## Try this out

In [None]:
for s in ['1234', '12345', '123456', '123456789', 'abcde', '123r5']:
    print(f'{s = }\t {is_valid_zip(s)}')
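
A note on the design choice: str.isdigit() looks like a shorter way to write the check, but it also accepts some non-ASCII characters, such as the superscript two. The set comparison against string.digits accepts only 0 through 9. The helper name `is_ascii_digits` below is made up for this comparison.

```python
import string


def is_ascii_digits(s):
    """True if every character of s is one of 0-9 (same test as the validator)."""
    return set(s).issubset(set(string.digits))


# '\u00b2' is the superscript two character
for s in ['12345', '1234\u00b2']:
    print(f'{s!r}\t isdigit: {s.isdigit()}\t set check: {is_ascii_digits(s)}')
```

Both checks accept the first string; only isdigit() accepts the second.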

### Read in the Data

This uses the validation routine, tossing out rows we think are invalid

In practice, you would want at least a count of how many were tossed
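
A minimal sketch of that bookkeeping, assuming the zip code sits in column 1 as in the solution. The names `count_zips` and `tossed`, the simplified validity check, and the in-memory sample data are all invented for this illustration.

```python
import csv
import io


def count_zips(lines):
    """Return (counts, tossed) for an iterable of CSV lines."""
    counts = {}
    tossed = 0
    for row in csv.reader(lines):
        # Stand-in for is_valid_zip(): 5 or 9 ASCII digits in column 1
        field = row[1] if len(row) > 1 else ''
        if len(field) in (5, 9) and field.isascii() and field.isdigit():
            key = field[:5]                      # trim to 5 digits
            counts[key] = counts.get(key, 0) + 1
        else:
            tossed += 1                          # the header row lands here too
    return counts, tossed


sample = io.StringIO("Name,Zip Code\nA,65616\nB,656161234\nC,bad\nD,94573\n")
counts, tossed = count_zips(sample)
print(counts, tossed)      # {'65616': 2, '94573': 1} 2
```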

### Defaultdict

Used to simplify the coding
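
To see what it simplifies, compare the plain-dictionary form of the counter idiom with the defaultdict form. The sample zip codes are made up.

```python
from collections import defaultdict

zips = ['65616', '94573', '65616']     # illustrative data

# Plain dict: supply a default of 0 for keys not seen yet
plain = {}
for z in zips:
    plain[z] = plain.get(z, 0) + 1

# defaultdict(int): a missing key is created with the value int(), which is 0
counted = defaultdict(int)
for z in zips:
    counted[z] += 1

print(plain)                 # {'65616': 2, '94573': 1}
print(dict(counted))         # {'65616': 2, '94573': 1}
```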

### Exceptions

We'll handle an unexpected error below

In [None]:
# Beer and Wine problem
# Find the top zip codes
#
# Jeff Parker	2021

from typing import List

import sys
import csv
from collections import defaultdict

def import_file(fileName):
    """Takes a filename, returns a dictionary of zip codes"""

    # Dictionary as Counter Design Pattern
    # Dictionary to hold count for each Zip Code
    # We use a Default Dict to simplify the coding
    d = defaultdict(int)

    try:                             # Try to open file: EAFP (easier to ask forgiveness than permission)
        with open(fileName, 'rt') as f:
            reader = csv.reader(f)

            # Get a new line from file
            for row in reader:
                zip_string = row[1]

                # If this is a valid zip code
                if is_valid_zip(zip_string):

                    # Trim it to 5 digits
                    zip_string = zip_string[:5]
                    d[zip_string] = 1 + d[zip_string]

    except FileNotFoundError:
        print(f"No such file: {fileName}")
    except Exception as err:
        print(f"Exception: {err}")

    return d

### Test the general Exception statement

In [None]:
res = import_file('..')

### What I see

```python
Exception: [Errno 21] Is a directory: '..'
```

This is generated by the two lines below: I didn't expect this error, so I'm depending on Python to describe it. 

```python
    except Exception as err:
        print(f"Exception: {err}")
```
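
The order of the except clauses matters: Python tries them top to bottom and runs the first one that matches, so the specific FileNotFoundError goes before the catch-all Exception. A small sketch, with a made-up helper name `classify_open` and a file name assumed not to exist:

```python
def classify_open(path):
    """Try to open path for reading and report which handler ran."""
    try:
        with open(path, 'rt'):
            return 'opened'
    except FileNotFoundError:
        return 'missing'                       # the specific handler
    except Exception as err:
        return f'other: {type(err).__name__}'  # anything we did not anticipate


print(classify_open('no_such_file_for_this_demo.csv'))   # missing
print(classify_open('.'))   # which error a directory raises depends on the platform
```

On Linux or macOS the second call typically reports IsADirectoryError; other platforms may report a different error.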

### Take the Dictionary and return a sorted list

**We sort the list 'res' in reverse order**

Both sort() and sorted() take an optional Boolean parameter 'reverse'

*Because it is optional and defaults to False, you must pass it explicitly to get descending order*

```python
    return sorted(res, reverse = True)
```
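
Lists compare item by item, so sorting [count, zip] pairs orders them by count first and uses the zip code only to break ties. With reverse=True, ties come out in descending zip order, which is why [8, '94573'] appears before [8, '63103'] in the sample from the problem statement. If you would rather break ties in ascending zip order, one option is a key function; that is a variation, not what the solution does.

```python
pairs = [[8, '63103'], [9, '65616'], [8, '94573']]   # the three sample items

# Reverse sort on the whole pair: count descending, then zip descending
by_pair = sorted(pairs, reverse=True)
print(by_pair)    # [[9, '65616'], [8, '94573'], [8, '63103']]

# Variation: count descending, zip ascending on ties
by_key = sorted(pairs, key=lambda pair: (-pair[0], pair[1]))
print(by_key)     # [[9, '65616'], [8, '63103'], [8, '94573']]
```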

In [None]:
def build_list(d):
    """Take the dictionary and return a list of most popular Zip Codes"""

    # Build a list of count and zipcode
    # (zip_code avoids shadowing the built-in zip())
    res = [[count, zip_code] for zip_code, count in d.items()]

    # Sort so the largest are first
    res.sort(reverse=True)

    return res


def list_zips(filename):
    return build_list(import_file(filename))

## Main function from .py file

I wrote this as a program to run from the command line.

This isn't useful in a notebook: we'll call it directly below

In [None]:
# Read in file name from command line
if (len(sys.argv) < 2):
    print("Usage:", sys.argv[0], "<filename>")
else:
    # Get a dictionary counting instances of each zip code
    d   = import_file(sys.argv[1])
    
    # Build a sorted list from the dictionary
    lst = build_list(d)

    for pair in lst[:10]:
        print(pair)

## We need to call it directly

In [None]:
lst = list_zips("../../Data/Beer_Wine.csv")

print(len(lst))
for pair in lst[:10]:
    print(pair)

## Does this make sense?

Zip code 94558 is in Napa County, wine country.  

https://www.unitedstateszipcodes.org/94558/

# Find the longest sequence of consecutive zip codes

64110, 64111, and 64112 are all Zip codes found in the file.

Find the longest sequence of consecutive zip codes in the file.  If two sequences have the same length, print the first one.  

### Approach

My first draft took the list 
```python
    [ ... [9, '65616'], [8, '94573'], [8, '63103'] ...]
```
and transformed it into
```python
    [ ... ['65616', 9], ['94573', 8], ['63103', 8] ...]
```
This version transforms it into this list
```python
    [ ... '65616', '94573', '63103' ...]
```

In [None]:
# While writing this, I had multiple print statements
# I am leaving them in to show the kinds of things I print
# The DEBUG flag lets me turn off the print statements when I think it works
# ... and yet leaves the print statements in place should I find it doesn't work
DEBUG = True

def find_longest_sequence(lst):
    """Find the longest sequence of consecutive zip Codes"""
    
    # Take existing list and pull out only the zip code, toss frequency
    new_lst = []
    for item in lst:
        # Transform the strings into integers
        new_lst.append(int(item[1]))
        
    # Today we would write this as a List Comprehension
    # new_lst = [int(item[1]) for item in lst]
    
    # Did we get the list right?  
    #    Print after every transformation
    if DEBUG:
        for item in new_lst[:10]:
            print(item)
    
    # Sort the zip codes to find consecutive elements
    # I am making a copy: I could have called sort() here
    lst = sorted(new_lst)
    
    # We 'prime the pump' and get the first element
    prev = lst[0]
    
    pos = -1             # Current position
    max_len = 1          # Longest sequence so far
    max_pos = 0          # Starting point of longest sequence (first item if no run is found)
    ln = 1               # Length of current run
    
    # Traverse the rest of the list
    for pos, curr in enumerate(lst[1:]):
        # Watch each step
        if DEBUG:
            print(f"{pos = } {curr = }")
            
        # Is this a consecutive zip code?
        if prev + 1 == curr:
            ln = ln + 1
            if ln > max_len:
                max_len = ln
                max_pos = pos - ln + 2   # List starts at pos = -1
                
                # Are we seeing the sequences?
                if DEBUG:
                    print(f"\t{max_pos = } {max_len = }")
        else:
            # Not consecutive
            ln = 1
            
            if DEBUG:
                print(f"\t reset")
        
        prev = curr
        
    # Prepare the result.  To check my work, I pull directly from the list
    # While I could print this here, I'll soon have the return value to print
    result = []
    for pos in range(max_pos, max_pos + max_len):
        result.append(lst[pos])
        
    return result
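
As a cross-check on the result, here is a shorter sketch of the same idea that works on a plain list of zip strings. It is an alternative written for this illustration (the name `longest_run` is invented), not the solution above. Because the numbers are scanned in ascending order and only a strictly longer run replaces the best one, ties go to the first run.

```python
def longest_run(zips):
    """Return the longest run of consecutive integers among the zip codes."""
    numbers = sorted({int(z) for z in zips})   # unique values, ascending
    best = []
    run = []
    for n in numbers:
        if run and n == run[-1] + 1:
            run.append(n)          # extends the current run
        else:
            run = [n]              # starts a new run
        if len(run) > len(best):   # strictly longer, so the first run wins ties
            best = run.copy()
    return best


print(longest_run(['64112', '10002', '64110', '10001', '64111']))
# [64110, 64111, 64112]
```

To compare against the solution, one could pass it the zip strings alone, for example `longest_run([pair[1] for pair in lst])`.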

## Print the zip codes in the longest consecutive sequence in the file

In [None]:
seq = find_longest_sequence(lst)

print(seq)

## Turn off Debugging and run again

In [None]:
DEBUG = False

seq = find_longest_sequence(lst)

print(seq)