# Day 1 — Trebuchet?!

- [Part 01](#—-Part-01)
- [Part 02](#—-Part-02)

In [118]:
import re
import numpy as np
import pandas as pd 

Parse calibration document:

In [119]:
data_dict = {'string': []}

In [120]:
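with open(file='day_01_input.txt') as file:
    # splitlines() avoids the trailing empty string that split('\n') yields when the file ends with a newline
    for text in file.read().splitlines():
        # Append text to data dictionary
        data_dict['string'].append(text)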

Create a new dataframe:

In [121]:
df = pd.DataFrame(data_dict)

In [122]:
df.head()

| | string |
|---|---|
| 0 | tsgbzmgbonethreedrqzbhxjkvcnm3 |
| 1 | 7qlpsnhbmnconeeight78 |
| 2 | prbqsfzqn57 |
| 3 | ctrv3hmvjphrffktwothree |
| 4 | 9six9qbgcvljfvccdjslspprgonenine |


### — Part 01

On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number. What is the sum of all of the calibration values?
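
As a quick illustration of the rule, here is a minimal sketch applied to three of the input rows shown earlier. The helper name `first_last_digits` is hypothetical and exists only for this example; it assumes every line contains at least one digit.

```python
def first_last_digits(line: str) -> int:
    # Keep only the ASCII digit characters, in order of appearance
    digits = [ch for ch in line if ch in "0123456789"]
    # Assumption: the line has at least one digit, otherwise this raises IndexError
    return int(digits[0] + digits[-1])

samples = ["prbqsfzqn57", "7qlpsnhbmnconeeight78", "ctrv3hmvjphrffktwothree"]
values = [first_last_digits(s) for s in samples]
print(values)       # [57, 78, 33]
print(sum(values))  # 168
```

Note that a line with a single digit, such as the third sample, uses that digit as both the first and the last.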

In [123]:
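def find_value(x):
    """
    Extract all digits from the string in order of occurrence and return
    the first and last digit joined as a two-character string.
    """
    # Each match is a run of digits; joining the runs gives every digit in order
    numeric_vals = "".join(re.findall(r'[0-9]+', x))
    return numeric_vals[0] + numeric_vals[-1]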

Add new column `calibration_val` to dataframe:

In [124]:
df['calibration_val'] = df['string'].apply(find_value).astype(int)

In [125]:
df.head()

| | string | calibration_val |
|---|---|---|
| 0 | tsgbzmgbonethreedrqzbhxjkvcnm3 | 33 |
| 1 | 7qlpsnhbmnconeeight78 | 78 |
| 2 | prbqsfzqn57 | 57 |
| 3 | ctrv3hmvjphrffktwothree | 33 |
| 4 | 9six9qbgcvljfvccdjslspprgonenine | 99 |


Get sum of `calibration_val`:

In [126]:
df['calibration_val'].sum()

55002

### — Part 02

It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid "digits". Equipped with this new information, you now need to find the real first and last digit on each line.

In [127]:
def detect_numeric_text(x):
    """
    Find every digit and spelled-out number in the string, in order of occurrence, and return the first and last as a two-character string.
    """
    number_map = {'one': '1', 'two': '2', 'three': '3', 'four': '4', 'five': '5', 'six': '6', 'seven':'7', 'eight': '8', 'nine': '9'}
    
    # Store starting index of each string / digit
    index_pos = {}
    
    # Find digits in string
    for index, char in enumerate(x):
        if char.isnumeric():
            index_pos[index] = char
            
    # Find numeric text in string:
    # Use a sliding window and check whether the text inside the window is a key of number_map, i.e. a valid number word
    # All valid number words are 3 to 5 characters long, so scan the input three times, growing the window from 3 to 5
    
    window = 3  # Starting size of window
    while window <= 5:
        start = 0  # Starting index of window
        temp_string = ""  # Temp variable to store characters in current window
        
        # SLIDING WINDOW ---
        for end, char in enumerate(x):
            # Add current char to temp variable
            temp_string += char
            if end - start + 1 == window:
                # Check if temp string is in number_map; if so, store the window's starting index in index_pos with the mapped digit as its value
                if temp_string in number_map:
                    index_pos[start] = number_map[temp_string]
                # Remove first character from temp string
                temp_string = temp_string[1:]
                # Increment starting index by 1
                start += 1
        
        # Increase window size
        window += 1
                
    # Get first and last numeric value in 'x' from the smallest / largest index stored in index_pos
    first_digit, last_digit = index_pos[min(index_pos)], index_pos[max(index_pos)]
    return first_digit + last_digit
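
As a cross-check on the sliding-window function, the same result can be sketched with a regex lookahead. This is an alternative technique, not the approach used in this notebook, and the names `WORDS`, `TO_DIGIT`, `PATTERN`, and `real_value` are hypothetical. Because a lookahead matches without consuming characters, overlapping words such as `oneight` yield both `one` and `eight`.

```python
import re

# Hypothetical names for this sketch; not part of the notebook's own code
WORDS = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']
TO_DIGIT = {word: str(i) for i, word in enumerate(WORDS, start=1)}

# (?=(...)) is zero-width: it captures a token at every position where one starts,
# so tokens that share letters are all found
PATTERN = re.compile(r'(?=([0-9]|' + '|'.join(WORDS) + r'))')

def real_value(line: str) -> int:
    tokens = PATTERN.findall(line)
    # Assumption: the line contains at least one digit or number word
    first, last = tokens[0], tokens[-1]
    # Number words are mapped to digits; plain digits pass through unchanged
    return int(TO_DIGIT.get(first, first) + TO_DIGIT.get(last, last))

print(real_value("tsgbzmgbonethreedrqzbhxjkvcnm3"))  # 13
print(real_value("oneight"))                          # 18
```

Applying `real_value` to the `string` column and comparing against `detect_numeric_text` would be one way to confirm both approaches agree on the full input.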

Add new column `real_calibration_val` to dataframe:

In [128]:
df['real_calibration_val'] = df['string'].apply(detect_numeric_text).astype(int)

In [129]:
df.head()

| | string | calibration_val | real_calibration_val |
|---|---|---|---|
| 0 | tsgbzmgbonethreedrqzbhxjkvcnm3 | 33 | 13 |
| 1 | 7qlpsnhbmnconeeight78 | 78 | 78 |
| 2 | prbqsfzqn57 | 57 | 57 |
| 3 | ctrv3hmvjphrffktwothree | 33 | 33 |
| 4 | 9six9qbgcvljfvccdjslspprgonenine | 99 | 99 |


Get sum of `real_calibration_val`:

In [130]:
df['real_calibration_val'].sum()

55093