## Solving day 4 of Advent of Code 2020 using dictionaries

A wee bit of housekeeping first to get the data loader.  The loader needs a small edit to work around the lack of `__main__.__file__` when running in a notebook:

In [1]:
# %load ../../common/loaders.py
import os
import __main__


def load_string():
    """
    Returns each line from the input file

    Returns
    -------
    : list of strings
        Each line from the input file
    """
    filepath = os.path.join("..", "input.txt")  # This is the edited line for the notebook
    with open(filepath) as file_handle:
        contents = file_handle.readlines()
    return [line.strip() for line in contents]
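
For illustration, here is a minimal sketch of a loader path helper that copes with both cases.  The original loader isn't shown here, so the helper name `resolve_input_path`, the idea that a script keeps `input.txt` next to itself, and the stand-in objects are all assumptions:

```python
import os
import types


def resolve_input_path(main_module, filename="input.txt"):
    """Return a path to the input file for a script or a notebook (hypothetical helper)."""
    main_file = getattr(main_module, "__file__", None)
    if main_file is None:
        # Notebook or interactive session: no __file__, so fall back to a relative path
        return os.path.join("..", filename)
    # Script: assume the input sits next to the file that was run
    return os.path.join(os.path.dirname(os.path.abspath(main_file)), filename)


# Stand-ins for __main__ in the two situations
notebook_main = types.SimpleNamespace()
script_main = types.SimpleNamespace(__file__=os.path.join("day04", "solve.py"))

print(resolve_input_path(notebook_main))
print(resolve_input_path(script_main))
```

In real use the argument would be the imported `__main__` module itself.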

Quick look at the first 10 lines of the data to check the load:

In [2]:
data = load_string()
data[:10]

['byr:2024 iyr:2016',
 'eyr:2034 ecl:zzz pid:985592671 hcl:033b48',
 'hgt:181 cid:166',
 '',
 'hgt:66cm',
 'pid:152cm',
 'hcl:cfb18a eyr:1947',
 'byr:2020 ecl:zzz iyr:2029',
 '',
 'ecl:gry hcl:#888785 eyr:2023 cid:63']

That looks as expected, so on to processing the data.

### Part 1

Requirement here is to:

1. Parse the input data to extract passport information.  Passports are separated by blank lines, fields within a passport are separated by whitespace, and each field is a key:value pair with a colon between key and value.
2. Count the valid passports, where a passport is valid if it contains all of these keys: byr, iyr, eyr, hgt, hcl, ecl, pid.

First, we'll set up a function to convert the data into a list of dicts:

In [3]:
def generate_passports(data):
    """
    Converts data into a list of dicts.  

    Each dict represents a single passport.  Passports are delineated in the
    data by a blank line.

    """
    passports = []
    passport = {}
    for line in data:
        if not line:  # Relies on "" (i.e. blank line) evaluating to False
            passports.append(passport)
            passport = {}
            continue
        for pair in line.split():
            # Split on the first colon only, so the value is kept whole
            key, value = pair.split(':', 1)
            passport[key] = value
    passports.append(passport)  # Need to include last passport since input
                                # doesn't have a blank line at the end
    return passports

A quick test that generate_passports works roughly as expected using examples from the instructions:

In [4]:
test_data = ["ecl:gry pid:860033327 eyr:2020 hcl:#fffffd",
        "byr:1937 iyr:2017 cid:147 hgt:183cm",
        "",
        "iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884",
        "hcl:#cfa07d byr:1929",]
test_result = generate_passports(test_data)

import pprint
pprint.pprint(test_result)

[{'byr': '1937',
  'cid': '147',
  'ecl': 'gry',
  'eyr': '2020',
  'hcl': '#fffffd',
  'hgt': '183cm',
  'iyr': '2017',
  'pid': '860033327'},
 {'byr': '1929',
  'cid': '350',
  'ecl': 'amb',
  'eyr': '2023',
  'hcl': '#cfa07d',
  'iyr': '2013',
  'pid': '028048884'}]
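
As an aside, the same parsing can be sketched in a more compact form if the input is read as one string rather than a list of lines.  This is only an alternative sketch, not the approach used in the rest of this notebook, and `parse_passports` is a made-up name:

```python
def parse_passports(text):
    """Build one dict per blank-line-separated block of key:value fields."""
    return [
        # block.split() splits on any whitespace, including newlines
        dict(field.split(":", 1) for field in block.split())
        for block in text.split("\n\n")
        if block.strip()  # skip empty blocks, e.g. from trailing blank lines
    ]


sample = (
    "ecl:gry pid:860033327 eyr:2020 hcl:#fffffd\n"
    "byr:1937 iyr:2017 cid:147 hgt:183cm\n"
    "\n"
    "iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884\n"
    "hcl:#cfa07d byr:1929\n"
)
parsed = parse_passports(sample)
print(len(parsed), parsed[1]["hcl"])
```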


Now we need to define a function to validate a single passport:

In [5]:
def is_valid_passport(passport):
    """Returns True for valid passport, False otherwise.

    Valid passport defined as including the following keys:
    byr (Birth Year)
    iyr (Issue Year)
    eyr (Expiration Year)
    hgt (Height)
    hcl (Hair Color)
    ecl (Eye Color)
    pid (Passport ID)

    """
    required_keys = ("byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid")
    # "key in passport" tests the dict's keys directly
    return all(key in passport for key in required_keys)

Verify with the test data.  From instructions, the first test data
passport is valid while the second is not:


In [6]:
[is_valid_passport(passport) for passport in test_result]

[True, False]
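
Since a passport is a dict, the same check can also be sketched with set operations, which has the side benefit of showing which keys are missing.  The names `REQUIRED_KEYS`, `has_required_keys` and `missing_keys` are made up for this sketch:

```python
REQUIRED_KEYS = {"byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid"}


def has_required_keys(passport):
    """True if every required key is present (iterating a dict yields its keys)."""
    return REQUIRED_KEYS.issubset(passport)


def missing_keys(passport):
    """Sorted list of the required keys that the passport lacks."""
    return sorted(REQUIRED_KEYS.difference(passport))


example = {"byr": "1929", "cid": "350", "ecl": "amb", "eyr": "2023",
           "hcl": "#cfa07d", "iyr": "2013", "pid": "028048884"}
print(has_required_keys(example), missing_keys(example))
```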

Now run on the full set of passports.  We only want the total number of valid passports, so use a listcomp to filter the passports down to just the valid ones; the length of the resulting list is the count of valid passports:


In [7]:
passports = generate_passports(data)
len([passport for passport in passports if is_valid_passport(passport)])

247
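
If only the count matters, a generator expression inside `sum` avoids building the intermediate list.  A small sketch, where `count_valid` and the stand-in data are made up for illustration:

```python
def count_valid(passports, validator):
    """Count the passports for which validator returns True, without building a list."""
    return sum(1 for passport in passports if validator(passport))


# Stand-in data and validator, just to show the call
sample_passports = [{"pid": "1"}, {}, {"pid": "2", "cid": "3"}]
print(count_valid(sample_passports, lambda passport: "pid" in passport))
```

For an input of this size the difference is negligible, so the listcomp is a perfectly reasonable choice.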

### Part 2

Part 2 has stricter validation, which needs quite a bit more code.  The requirements are listed in the docstring for `is_valid_passport_part2`:


In [8]:
import re

def is_in_range(number, minimum, maximum):
    """Returns True if number is between minimum and maximum, inclusive.

    number is a string; anything that can't be converted to an int is out of range."""
    try:
        result = minimum <= int(number) <= maximum
    except ValueError:
        result = False
    return result

def is_valid_hcl(hcl):
    """Returns True if hair colour is valid.

    Hair colour is valid if it is: "# followed by 6 characters, 0 to 9 or a to f"
    """
    # fullmatch anchors the pattern at both ends, so this covers the leading #,
    # the overall length and the allowed characters in one go.  Upper case
    # letters are rejected.
    return re.fullmatch(r'#[0-9a-f]{6}', hcl) is not None

def is_valid_pid(pid):
    """Returns True if passport ID is valid.

    Passport ID is valid if it is a 9 digit number including leading zeroes."""
    # Avoid leading zeroes question since pid is still a string at this point - might be
    # something to enforce w/ type hinting?
    # Note that isdecimal needs to be called; the bare method is never False.
    return len(pid) == 9 and pid.isdecimal()

def is_valid_passport_part2(passport):
    """Returns True for valid passport, False otherwise.

    Valid passport defined as including the following keys and values:
    byr (Birth Year) 1920 to 2002
    iyr (Issue Year) 2010 to 2020
    eyr (Expiration Year) 2020 to 2030
    hgt (Height) 150 to 193cm or 59 to 76in
    hcl (Hair Color) # followed by 6 characters, 0 to 9 or a to f
    ecl (Eye Color) exactly one of: amb blu brn gry grn hzl oth
    pid (Passport ID) 9 digit number, including leading zeroes

    """
    valid_ecls = ('amb', 'blu', 'brn', 'gry', 'grn', 'hzl', 'oth')
    try:
        result = is_in_range(passport['byr'], 1920, 2002)
        result = result and is_in_range(passport['iyr'], 2010, 2020)
        result = result and is_in_range(passport['eyr'], 2020, 2030)
        # Height needs a little more processing for the units:
        height = passport['hgt']
        height_ranges = {'cm': (150, 193), 'in': (59, 76)}
        # A missing or unknown unit gives None here, which fails the check
        unit_range = height_ranges.get(height[-2:])
        result = (result and unit_range is not None
                  and is_in_range(height[:-2], *unit_range))
        result = result and passport['ecl'] in valid_ecls
        result = result and is_valid_hcl(passport['hcl'])
        pid = passport['pid']
        result = result and is_valid_pid(pid)
    except KeyError:
        # More code in the try...except than is usual.  This lets us abort as 
        # soon as any required key is missing, rather than proceed with checks 
        # for validity for the other fields.
        result = False

    return result

The test data here is two valid and two invalid passports from the instructions.  Let's check that it loads sensibly:

In [9]:
test_data_part2 = ["pid:087499704 hgt:74in ecl:grn iyr:2012 eyr:2030 byr:1980",
"hcl:#623a2f",
"",
"eyr:2029 ecl:blu cid:129 byr:1989",
"iyr:2014 pid:896056539 hcl:#a97842 hgt:165cm",
"",
"eyr:1972 cid:100",
"hcl:#18171d ecl:amb hgt:170 pid:186cm iyr:2018 byr:1926",
"",
"iyr:2019",
"hcl:#602927 eyr:1967 hgt:170cm",
"ecl:grn pid:012533040 byr:1946",
]
test_result_part2 = generate_passports(test_data_part2)
pprint.pprint(test_result_part2)

[{'byr': '1980',
  'ecl': 'grn',
  'eyr': '2030',
  'hcl': '#623a2f',
  'hgt': '74in',
  'iyr': '2012',
  'pid': '087499704'},
 {'byr': '1989',
  'cid': '129',
  'ecl': 'blu',
  'eyr': '2029',
  'hcl': '#a97842',
  'hgt': '165cm',
  'iyr': '2014',
  'pid': '896056539'},
 {'byr': '1926',
  'cid': '100',
  'ecl': 'amb',
  'eyr': '1972',
  'hcl': '#18171d',
  'hgt': '170',
  'iyr': '2018',
  'pid': '186cm'},
 {'byr': '1946',
  'ecl': 'grn',
  'eyr': '1967',
  'hcl': '#602927',
  'hgt': '170cm',
  'iyr': '2019',
  'pid': '012533040'}]


And then check that the first two are valid and the last two invalid:

In [10]:
[is_valid_passport_part2(passport) for passport in test_result_part2]

[True, True, False, False]
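
In keeping with the dictionary theme, the part 2 rules could also be sketched as a dict that maps each key to a small validator function.  This is an alternative layout rather than the code used above, and `RULES`, `in_range`, `valid_height` and `is_valid_by_rules` are made-up names:

```python
import re


def in_range(text, low, high):
    """True if text is all digits and low <= int(text) <= high."""
    return text.isdecimal() and low <= int(text) <= high


def valid_height(text):
    """True for e.g. '165cm' (150 to 193) or '74in' (59 to 76)."""
    match = re.fullmatch(r"([0-9]+)(cm|in)", text)
    if match is None:
        return False  # no number, or a missing or unknown unit
    value, unit = match.groups()
    low, high = (150, 193) if unit == "cm" else (59, 76)
    return low <= int(value) <= high


# One validator per required key; cid is deliberately absent
RULES = {
    "byr": lambda v: in_range(v, 1920, 2002),
    "iyr": lambda v: in_range(v, 2010, 2020),
    "eyr": lambda v: in_range(v, 2020, 2030),
    "hgt": valid_height,
    "hcl": lambda v: re.fullmatch(r"#[0-9a-f]{6}", v) is not None,
    "ecl": lambda v: v in {"amb", "blu", "brn", "gry", "grn", "hzl", "oth"},
    "pid": lambda v: re.fullmatch(r"[0-9]{9}", v) is not None,
}


def is_valid_by_rules(passport):
    """True if every required key is present and its value passes its rule."""
    return all(key in passport and rule(passport[key])
               for key, rule in RULES.items())


good = {"byr": "1980", "ecl": "grn", "eyr": "2030", "hcl": "#623a2f",
        "hgt": "74in", "iyr": "2012", "pid": "087499704"}
bad = dict(good, hgt="170")  # height without a unit
print(is_valid_by_rules(good), is_valid_by_rules(bad))
```

Adding or changing a rule then only touches one entry in the dict, and a missing key fails without needing a `try`...`except`.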

And finally get the results with a similar listcomp to part 1:

In [11]:
len([passport for passport in passports if is_valid_passport_part2(passport)])

145

And just to finish with a flourish, an f-string with both listcomps embedded:

In [12]:
print("Results are "
      f"{len([passport for passport in passports if is_valid_passport(passport)])}"
      " for part 1, and "
      f"{len([passport for passport in passports if is_valid_passport_part2(passport)])}"
      " for part 2.")

Results are 247 for part 1, and 145 for part 2.
