## Part 1

Count the number of valid passports - those that have all required fields. Treat cid as optional. In your batch file, how many passports are valid?

In [1]:
import pandas as pd
import re

In [2]:
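# Read in data: passports are separated by blank lines, and a single
# passport may span several lines, so collapse each one onto one line.

with open('04input.txt', 'r') as file:
    input_list = [' '.join(x.split()) for x in file.read().split('\n\n')]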

In [3]:
# Into a df

entries_df = pd.DataFrame(input_list, columns=['entry'])

In [4]:
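# Cols indicating if field exists.
# Match 'key:' at the start of the entry or after whitespace, so that a key
# name appearing inside another field's value is not counted as present.

fields = ['byr', 'iyr', 'eyr', 'hgt', 'hcl', 'ecl', 'pid', 'cid']
relevant_fields = ['byr', 'iyr', 'eyr', 'hgt', 'hcl', 'ecl', 'pid']

for field in fields:
    entries_df[field] = entries_df['entry'].str.contains(rf'(?:^|\s){field}:')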

In [5]:
print(f'Number of valid passports: {len(entries_df[entries_df[relevant_fields].all(axis=1)])}')

Number of valid passports: 242
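
As a cross-check on the presence test above, here is a minimal sketch that does the same thing without pandas by splitting each entry into `key:value` pairs. The helper names (`parse_passport`, `has_required_fields`) and the two sample entries are made up for illustration and are not part of the puzzle input.

```python
REQUIRED = {'byr', 'iyr', 'eyr', 'hgt', 'hcl', 'ecl', 'pid'}

def parse_passport(entry):
    """Turn 'byr:1980 hcl:#123abc' into {'byr': '1980', 'hcl': '#123abc'}."""
    return dict(pair.split(':', 1) for pair in entry.split())

def has_required_fields(entry):
    """True if every required key is present; cid is deliberately not required."""
    return REQUIRED.issubset(parse_passport(entry))

# Hypothetical entries, made up for illustration.
sample = [
    'byr:1980 iyr:2015 eyr:2025 hgt:170cm hcl:#123abc ecl:brn pid:000000001',
    # No pid field, although the text 'pid' appears inside the cid value.
    'byr:1980 iyr:2015 eyr:2025 hgt:170cm hcl:#123abc ecl:brn cid:pid',
]
print([has_required_fields(e) for e in sample])  # [True, False]
```

Working on parsed keys rather than on the raw string means a key name that happens to appear inside a value cannot be mistaken for the field itself.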


## Part 2

You can continue to ignore the cid field, but each other field has strict rules about what values are valid for automatic validation:

byr (Birth Year) - four digits; at least 1920 and at most 2002.

iyr (Issue Year) - four digits; at least 2010 and at most 2020.

eyr (Expiration Year) - four digits; at least 2020 and at most 2030.

hgt (Height) - a number followed by either cm or in:
If cm, the number must be at least 150 and at most 193.
If in, the number must be at least 59 and at most 76.

hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.

ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.

pid (Passport ID) - a nine-digit number, including leading zeroes.

cid (Country ID) - ignored, missing or not.

Your job is to count the passports where all required fields are both present and valid according to the above rules.
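
One way to read the rules above is as a table that maps each required key to a small predicate on its value. The sketch below is one possible arrangement, not the approach used in the cells that follow; `year_between`, `valid_height`, `RULES` and `is_valid` are hypothetical names, and the test entry is invented.

```python
import re

def year_between(low, high):
    """Build a check for a four-digit year within [low, high]."""
    def check(value):
        return re.fullmatch(r'[0-9]{4}', value) is not None and low <= int(value) <= high
    return check

def valid_height(value):
    """A number followed by cm (150-193) or in (59-76)."""
    match = re.fullmatch(r'([0-9]+)(cm|in)', value)
    if match is None:
        return False
    number, unit = int(match.group(1)), match.group(2)
    low, high = (150, 193) if unit == 'cm' else (59, 76)
    return low <= number <= high

# One predicate per required field; cid is ignored, so it has no rule.
RULES = {
    'byr': year_between(1920, 2002),
    'iyr': year_between(2010, 2020),
    'eyr': year_between(2020, 2030),
    'hgt': valid_height,
    'hcl': lambda v: re.fullmatch(r'#[0-9a-f]{6}', v) is not None,
    'ecl': lambda v: v in {'amb', 'blu', 'brn', 'gry', 'grn', 'hzl', 'oth'},
    'pid': lambda v: re.fullmatch(r'[0-9]{9}', v) is not None,
}

def is_valid(entry):
    """True if every required field is present and passes its rule."""
    fields = dict(pair.split(':', 1) for pair in entry.split())
    return all(key in fields and rule(fields[key]) for key, rule in RULES.items())

# Hypothetical entry, made up for illustration.
print(is_valid('byr:1980 iyr:2015 eyr:2025 hgt:170cm hcl:#123abc ecl:brn pid:000000001'))  # True
```

Because `re.fullmatch` has to consume the whole value, "exactly six characters" and "nine-digit number" are enforced without extra lookaheads.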

In [6]:
def valid_number(text, regex, low, high):
    """Return True if the first group matched by `regex` is an integer in [low, high]."""
    match = re.search(regex, text)
    if match is None:
        # Field missing or not in the expected format
        return False
    return low <= int(match.group(1)) <= high

In [7]:
def valid_hgt(text):
    """Return True if hgt is 150-193 cm or 59-76 in."""
    # (?!\S) requires the value to end right after the unit
    match = re.search(r'(?<=hgt:)([0-9]{2,3})(cm|in)(?!\S)', text)
    if match is None:
        return False
    num, measure = int(match.group(1)), match.group(2)
    if measure == 'cm':
        return 150 <= num <= 193
    return 59 <= num <= 76

In [8]:
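def valid_hcl(text):
    """Return True if hcl is '#' followed by exactly six characters 0-9 or a-f."""
    # (?!\S) rejects values with anything after the six hex characters
    return re.search(r'(?<=hcl:)#[0-9a-f]{6}(?!\S)', text) is not None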

In [9]:
def valid_ecl(text):
    """Return True if ecl is exactly one of the seven allowed colour codes."""
    valid_colours = ['amb', 'blu', 'brn', 'gry', 'grn', 'hzl', 'oth']
    match = re.search(r'(?<=ecl:)([a-z]{3})(?![a-z])', text)
    return match is not None and match.group(1) in valid_colours

In [10]:
def valid_pid(text):
    """Return True if pid is exactly nine digits (leading zeroes allowed)."""
    return re.search(r'(?<=pid:)\d{9}(?!\d)', text) is not None


In [11]:
# Raw strings keep \d intact; (?!\d) rejects years with more than four digits
entries_df['byr_valid'] = entries_df.apply(lambda x: valid_number(x['entry'], r'(?<=byr:)(\d{4})(?!\d)', 1920, 2002), axis=1)
entries_df['iyr_valid'] = entries_df.apply(lambda x: valid_number(x['entry'], r'(?<=iyr:)(\d{4})(?!\d)', 2010, 2020), axis=1)
entries_df['eyr_valid'] = entries_df.apply(lambda x: valid_number(x['entry'], r'(?<=eyr:)(\d{4})(?!\d)', 2020, 2030), axis=1)
entries_df['hgt_valid'] = entries_df.apply(lambda x: valid_hgt(x['entry']), axis=1)
entries_df['hcl_valid'] = entries_df.apply(lambda x: valid_hcl(x['entry']), axis=1)
entries_df['ecl_valid'] = entries_df.apply(lambda x: valid_ecl(x['entry']), axis=1)
entries_df['pid_valid'] = entries_df.apply(lambda x: valid_pid(x['entry']), axis=1)
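
A "four digits" rule is only enforced if the pattern also rules out a fifth digit. The short sketch below compares a year pattern with and without a trailing negative lookahead `(?!\d)`; the entry is invented for illustration and does not come from the input file.

```python
import re

# Made-up entry whose birth year has one digit too many.
entry = 'byr:20021 pid:012345678'

loose = re.search(r'(?<=byr:)(\d{4})', entry)
strict = re.search(r'(?<=byr:)(\d{4})(?!\d)', entry)

print(loose.group(1))  # 2002 (the first four digits are accepted)
print(strict)          # None (the fifth digit blocks the match)
```

Without the lookahead, `20021` would be read as `2002` and pass the 1920-2002 range check.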

In [12]:
relevant_fields_new = [field + '_valid' for field in relevant_fields]

In [13]:
print(f'Number of valid passports: {len(entries_df[entries_df[relevant_fields_new].all(axis=1)])}')

Number of valid passports: 186
