# Day 1 AOC 2023
Something is wrong with global snow production, and you've been selected to take a look. The Elves have even given you a map; on it, they've used stars to mark the top fifty locations that are likely to be having problems.

You've been doing this long enough to know that to restore snow operations, you need to check all fifty stars by December 25th.

Collect stars by solving puzzles. Two puzzles will be made available on each day in the Advent calendar; the second puzzle is unlocked when you complete the first. Each puzzle grants one star. Good luck!

You try to ask why they can't just use a weather machine ("not powerful enough") and where they're even sending you ("the sky") and why your map looks mostly blank ("you sure ask a lot of questions") and hang on did you just say the sky ("of course, where do you think snow comes from") when you realize that the Elves are already loading you into a trebuchet ("please hold still, we need to strap you in").

As they're making the final adjustments, they discover that their calibration document (your puzzle input) has been amended by a very young Elf who was apparently just excited to show off her art skills. Consequently, the Elves are having trouble reading the values on the document.

The newly-improved calibration document consists of lines of text; each line originally contained a specific calibration value that the Elves now need to recover. On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.

For example:

```
1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
```

In this example, the calibration values of these four lines are 12, 38, 15, and 77. Adding these together produces 142.

Consider your entire calibration document. What is the sum of all of the calibration values?

In [None]:
sample = """1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
"""

In [None]:
lines = sample.split('\n')
lines

['1abc2', 'pqr3stu8vwx', 'a1b2c3d4e5f', 'treb7uchet', '']

In [None]:
line = 'treb7uchet'
[l for l in line]

['t', 'r', 'e', 'b', '7', 'u', 'c', 'h', 'e', 't']

In [None]:
line = 'treb7uchet'
assert [int(l) for l in line if l.isdigit()] == [7]
line = 'a1b2c3d4e5f'
assert [int(l) for l in line if l.isdigit()] == [1, 2, 3, 4, 5]

In [None]:
def extract_digits(line : str):
    digits = [int(l) for l in line if l.isdigit()]
    first = digits[0]
    last = digits[-1]
    return int(f'{first}{last}')

line = 'treb7uchet'
assert extract_digits(line) == 77
line = 'a1b2c3d4e5f'
assert extract_digits(line) == 15

In [None]:
lines = sample
print(lines)
print([extract_digits(line) for line in lines.split()])
assert [extract_digits(line) for line in lines.split()] == [12, 38, 15, 77]

1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet

[12, 38, 15, 77]


In [None]:
def process_data(data: str):
    return sum([extract_digits(line) for line in data.split()])
print(process_data(sample))
assert process_data(sample) == 142

142


In [None]:
from aocd import get_data
input_data = get_data(day=1, year=2023)
print(input_data[:100])

sq5fivetwothree1
six5gc
txb3qfzsbzbxlzslfourone1vqxgfive
3onethreebrth
cseven7nqqxnkzngndtddfiverkxk


In [None]:
print(process_data(input_data))
assert 54331 == process_data(input_data)

54331


**AI Prompt**

How did I do so far on Part 1 of Day 1, Advent of Code 2023? Anything worth improving?

**AI Response**

Your solution is clean and effective! You broke it down into nice small functions, used good type hints, and included assertions for testing. The `extract_digits` function is particularly well-structured.

A few minor suggestions if you're interested in optimization:
1. Since you're only using the first and last digits, you could avoid creating the full list of digits
2. The f-string for combining digits could be replaced with simple multiplication and addition

Would you like to see either of these optimizations, or would you prefer to move on to part 2 of the puzzle?

**AI Prompt**

Let's talk about 1

**AI Response**

Instead of creating a full list of digits, you could find just the first and last digits directly. Here's the concept:

For a string like 'treb7uchet', you could:
1. Find the first digit by scanning left-to-right until you hit a digit
2. Find the last digit by scanning right-to-left until you hit a digit

Would you like to try implementing this approach yourself, or would you like me to show you how it might look?

In [None]:
def extract_digits(line : str):
    for l in line:
        if l.isdigit():
            break
    first = l
    for l in line[::-1]:
        if l.isdigit():
            break
    last = l
        
    return int(f'{first}{last}')

line = 'treb7uchet'
assert extract_digits(line) == 77
line = 'a1b2c3d4e5f'
assert extract_digits(line) == 15

**AI Prompt**

Like that?

**AI Response**

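Yes, exactly! That's a nice clean implementation that avoids creating the full list of digits, and each loop stops as soon as it finds what it needs. (Note that `line[::-1]` still builds a reversed copy of the string; iterating over `reversed(line)` would avoid that.)

One small note: your solution assumes there will always be at least one digit in the string. If a non-empty line has no digits, neither loop breaks, `l` is left holding a non-digit character, and the final `int(...)` call raises a `ValueError`. For production code, you might want to add explicit error handling for strings with no digits, but for this puzzle it's fine since we know the input format.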
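
To make the no-digit case concrete, here is a minimal sketch of what that error handling might look like. The name `extract_digits_safe` is hypothetical (it is not part of the notebook), and raising `ValueError` is just one reasonable choice:

```python
def extract_digits_safe(line: str) -> int:
    """Hypothetical variant of extract_digits that fails loudly on digit-free lines."""
    # next() stops at the first digit; the default None signals "no digit found".
    first = next((c for c in line if c.isdigit()), None)
    # reversed() walks the string from the end without building a reversed copy.
    last = next((c for c in reversed(line) if c.isdigit()), None)
    if first is None or last is None:
        raise ValueError(f"no digit found in line: {line!r}")
    # Concatenate the two digit characters, then convert once.
    return int(first + last)


assert extract_digits_safe('treb7uchet') == 77
assert extract_digits_safe('a1b2c3d4e5f') == 15
```

Calling `extract_digits_safe('abc')` raises a `ValueError` that names the offending line, which is easier to debug than a conversion error on an unrelated character.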

Would you like to look at the second optimization idea now (replacing the f-string with multiplication and addition)?

**AI Prompt**

sure

**AI Response**

Instead of using f-string `int(f'{first}{last}')`, you could use:

```python
int(first) * 10 + int(last)
```

This directly creates the two-digit number since the first digit needs to be in the tens place (multiply by 10) and the last digit in the ones place.

Would you like to try modifying your code to use this approach? Or would you like me to show you the complete modified version?

**AI Prompt**

Why exactly is it better?

**AI Response**

The multiplication/addition approach is marginally more efficient because:
1. It avoids creating a new string (the f-string)
2. It performs direct numeric operations rather than string manipulation followed by conversion

However, in practice for this puzzle, the difference is tiny and the f-string version might actually be more readable. I probably shouldn't have called it an "optimization" - it's more of an alternative approach than a meaningful improvement.
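
If you are curious, the difference is better measured than assumed. The following is a rough sketch using the standard-library `timeit` module; the variable names are illustrative, and the timings will vary by machine and Python version, so no particular result is claimed here:

```python
import timeit

first, last = '7', '7'

# Both expressions build the same two-digit number from two digit characters.
assert int(f'{first}{last}') == int(first) * 10 + int(last) == 77

# Time repeated evaluations of each; absolute numbers depend on the machine.
fstring_time = timeit.timeit(lambda: int(f'{first}{last}'), number=200_000)
arith_time = timeit.timeit(lambda: int(first) * 10 + int(last), number=200_000)

print(f'f-string:   {fstring_time:.3f}s')
print(f'arithmetic: {arith_time:.3f}s')
```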

Would you like to move on to part 2 of the puzzle, or do you have any other questions about the current solution?

In [None]:
def extract_digits(line : str):
    for l in line:
        if l.isdigit():
            break
    first = int(l)
    for l in line[::-1]:
        if l.isdigit():
            break
    last = int(l)
        
    return first * 10 + last

line = 'treb7uchet'
assert extract_digits(line) == 77
line = 'a1b2c3d4e5f'
assert extract_digits(line) == 15

In [None]:
def process_data(data: str):
    return sum([extract_digits(line) for line in data.split()])
print(process_data(sample))
assert process_data(sample) == 142

print(process_data(input_data))
assert 54331 == process_data(input_data)

142
54331


# Part 2

Your calculation isn't quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid "digits".

Equipped with this new information, you now need to find the real first and last digit on each line. For example:

```
two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen
```

In this example, the calibration values are 29, 83, 13, 24, 42, 14, and 76. Adding these together produces 281.

What is the sum of all of the calibration values?

In [None]:
sample_data = """two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen"""
print(sample_data)

two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen


**AI Prompt**

Okay, let's create `sample_data` with the current data I defined plus your edge cases. Then let's also create a `sample_data_expected`, which is a list of tuples with the first and last digits for that data.

**AI Response**

Good idea - this will help us test our solution thoroughly. Would you like me to combine the original sample with the new Part 2 sample, or should we start fresh with just the Part 2 examples? 

Also, just to clarify: for the expected results, should we include just the raw digits (like (2,9) for "two1nine"), or would you prefer the final calibration values (like 29)?

Let me know your preference and I'll help you set up the test data.

**AI Prompt**

Okay, yeah, sure, the final calibration values are fine. And yeah, we can include all the sample data from Part 1 too.

**AI Response**

Great! Let's combine both sample sets and their expected results. I'll help you set up the test data:

```python
sample_data = """1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen"""

sample_data_expected = [
    12,    # "1abc2"
    38,    # "pqr3stu8vwx"
    15,    # "a1b2c3d4e5f"
    77,    # "treb7uchet"
    29,    # "two1nine"
    83,    # "eightwothree"
    13,    # "abcone2threexyz"
    24,    # "xtwone3four"
    42,    # "4nineeightseven2"
    14,    # "zoneight234"
    76     # "7pqrstsixteen"
]
```

Would you like to verify this setup before we proceed with solving Part 2?

**AI Prompt**

Can you come up with a list of edge cases as well in same format we just described.

**AI Response**

Let's think about some tricky edge cases for strings with written numbers. Here are some challenging scenarios:

```python
edge_cases = """oneight
twone
threeight
fiveighthree
eighthree
sevenine
nineight
one2one
4nineeight8eight
1eighthree2
"""

edge_cases_expected = [
    18,    # "oneight" (one + eight overlap)
    21,    # "twone" (two + one overlap)
    38,    # "threeight" (three + eight overlap)
    53,    # "fiveighthree" (five + eight + three)
    83,    # "eighthree" (eight + three overlap)
    79,    # "sevenine" (seven + nine overlap)
    98,    # "nineight" (nine + eight overlap)
    11,    # "one2one" (repeated number)
    48,    # "4nineeight8eight" (multiple numbers, mix of written and digits)
    12     # "1eighthree2" (mix of digits and written numbers)
]
```

These cases specifically test:
1. Overlapping number words
2. Multiple occurrences of numbers
3. Mix of written and digit numbers
4. Numbers that share letters at boundaries

Would you like me to add these to our previous test data, or would you like to modify any of these cases?

**AI Prompt**

Okay, let's put these into a list of tuples.

[('1abc2', 12), ('pqr3stu8vwx', 38), ('a1b2c3d4e5f', 15), ..... ('4nineeight8eight', 48), ('1eighthree2', 12)]

**AI Response**

Here's how we can combine both sample sets into a list of tuples:

```python
test_cases = [
    ('1abc2', 12),
    ('pqr3stu8vwx', 38),
    ('a1b2c3d4e5f', 15),
    ('treb7uchet', 77),
    ('two1nine', 29),
    ('eightwothree', 83),
    ('abcone2threexyz', 13),
    ('xtwone3four', 24),
    ('4nineeightseven2', 42),
    ('zoneight234', 14),
    ('7pqrstsixteen', 76),
    # edge cases
    ('oneight', 18),
    ('twone', 21),
    ('threeight', 38),
    ('fiveighthree', 53),
    ('eighthree', 83),
    ('sevenine', 79),
    ('nineight', 98),
    ('one2one', 11),
    ('4nineeight8eight', 48),
    ('1eighthree2', 12)
]
```

Would you like to verify this structure before we proceed with using it to test our solution?

In [None]:
test_cases = [
    ('1abc2', 12),
    ('pqr3stu8vwx', 38),
    ('a1b2c3d4e5f', 15),
    ('treb7uchet', 77),
    ('two1nine', 29),
    ('eightwothree', 83),
    ('abcone2threexyz', 13),
    ('xtwone3four', 24),
    ('4nineeightseven2', 42),
    ('zoneight234', 14),
    ('7pqrstsixteen', 76),
    # edge cases
    ('oneight', 18),
    ('twone', 21),
    ('threeight', 38),
    ('fiveighthree', 53),
    ('eighthree', 83),
    ('sevenine', 79),
    ('nineight', 98),
    ('one2one', 11),
    ('4nineeight8eight', 48),
    ('1eighthree2', 12)
]

**AI Prompt**

Create a dictionary that maps each of these keys to its corresponding integer: one, two, three, four, five, six, seven, eight, and nine

**AI Response**

```python
word_to_digit = {
    'one': 1,
    'two': 2,
    'three': 3,
    'four': 4,
    'five': 5,
    'six': 6,
    'seven': 7,
    'eight': 8,
    'nine': 9
}
```

Would you like to create the reverse mapping (digit to word) as well, or shall we proceed with this dictionary?

In [None]:
word_to_digit = {
    'one': 1,
    'two': 2,
    'three': 3,
    'four': 4,
    'five': 5,
    'six': 6,
    'seven': 7,
    'eight': 8,
    'nine': 9
}

In [None]:
line = '4nineeightseven2'
words = [w for w in word_to_digit if w in line]
print(words)

['seven', 'eight', 'nine']


In [None]:
print(line)
word_idxs = [(word_to_digit[w], line.find(w)) for w in word_to_digit if w in line]
print(word_idxs)

4nineeightseven2
[(7, 10), (8, 5), (9, 1)]


In [None]:
print(line)
num_idxs = [(int(n), i) for i,n in enumerate(line) if n.isdigit()]
print(num_idxs)

4nineeightseven2
[(4, 0), (2, 15)]


In [None]:
print(line)
word_num_idxs = sorted(word_idxs + num_idxs, key=lambda x:x[-1])
print(word_num_idxs)

4nineeightseven2
[(4, 0), (9, 1), (8, 5), (7, 10), (2, 15)]


In [None]:
def process_line(line:str):
    word_idxs = [(word_to_digit[w], line.find(w)) for w in word_to_digit if w in line]
    num_idxs = [(int(n), i) for i,n in enumerate(line) if n.isdigit()]
    all_idxs = sorted(word_idxs + num_idxs, key=lambda x:x[-1])
    return all_idxs[0][0] * 10 +  all_idxs[-1][0]

In [None]:
print(line)
process_line(line)

4nineeightseven2


42

In [None]:
for line, expected in test_cases:
    actual = process_line(line)
    if actual != expected:
        print('WRONG ANSWER')
        print(line)
        print(f'Expected: {line}, {expected}')
        print(f'Processed: {line}, {actual}')

WRONG ANSWER
one2one
Expected: one2one, 11
Processed: one2one, 12


**AI Prompt**

Oh, I see the issue. I am using `find`, which only finds the first occurrence of the word. I actually need to find all of them. I think there is a function called `index`, right?

**AI Response**

Actually, to find all occurrences, you might want to use `findall()` from the `re` module, or you could keep using `find()` and call it repeatedly, passing a start position each time.

The `index()` method is similar to `find()` but it raises an exception when the substring isn't found, while `find()` returns -1. Neither of these methods find all occurrences by default.
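
A quick sketch of that difference, using the failing test case as the example string:

```python
line = 'one2one'

# find() returns the index of the first match, or -1 when there is none.
assert line.find('one') == 0
assert line.find('two') == -1

# index() returns the same index on success but raises ValueError on a miss.
assert line.index('one') == 0
try:
    line.index('two')
except ValueError:
    print("index() raised ValueError for 'two'")

# Both accept an optional start position, which is what a repeated search relies on.
assert line.find('one', 1) == 4
```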

Would you like me to show you how to use either:
1. A loop with `find()` to get all positions
2. The regex `findall()` approach

Which would you prefer to try?

**AI Prompt**

I think `re` is more performant right? I also suck at regex cause I don't use it often. Can you show me some examples of its use?

**AI Response**

I'll show you some basic regex examples using `re.findall()` that are relevant to our problem:

```python
import re

text = "one2one"

# Find all digits
re.findall(r'\d', text)  # ['2']

# Find all occurrences of 'one'
re.findall(r'one', text)  # ['one', 'one']

# Get positions using re.finditer()
list(re.finditer(r'one', text))  # will give match objects with position info
```

Would you like me to show you how to use these patterns in our solution, or would you like to see more regex examples first?

In [None]:
line = 'one2one'
import re
list(re.finditer(r'one', line))

[<re.Match object; span=(0, 3), match='one'>,
 <re.Match object; span=(4, 7), match='one'>]

**AI Prompt**

I just need to parse this output right to get the locations?

**AI Response**

Yes! Each match object has a `span()` method that returns the (start, end) positions. You can access them like this:

```python
for match in re.finditer(r'one', line):
    start, end = match.span()  # or match.start(), match.end()
    print(f"Found at position {start}")
```

Would you like me to show you how to use this in our solution to get all the number positions?

**AI Prompt**

Unless there is a more pythonic approach. A loop with find might be easier to read? Thoughts?

**AI Response**

Both approaches have their merits:

`find()` loop approach:
```python
def find_all(s, sub):
    positions = []
    pos = 0
    while True:
        pos = s.find(sub, pos)
        if pos == -1:
            break
        positions.append(pos)
        pos += 1
    return positions
```

Pros:
- More readable for simple pattern matching
- No regex syntax to remember
- Easy to understand what's happening

Regex approach:
```python
positions = [m.start() for m in re.finditer(r'one', s)]
```

Pros:
- More concise
- Generally faster for complex patterns
- More powerful for complex matching needs

For our specific case with simple word matching, the `find()` loop might be more maintainable. Would you like to try that approach?

In [None]:
print(line)
[m.start() for m in re.finditer(r'one', line)]

one2one


[0, 4]

In [None]:
[m.start() for word in word_to_digit for m in re.finditer(r'{word}', line)]

[]
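
The empty list above most likely comes from `r'{word}'` being a raw string rather than an f-string: the braces are never interpolated, so the regex searches for the literal text `{word}`. A minimal sketch of the difference (the `rf` prefix combines raw and formatted strings; the variable names here are illustrative):

```python
import re

word_to_digit = {
    'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
    'six': 6, 'seven': 7, 'eight': 8, 'nine': 9,
}
line = 'one2one'

# Raw string: the pattern is the literal text "{word}", which never appears.
literal_hits = [m.start() for word in word_to_digit
                for m in re.finditer(r'{word}', line)]
assert literal_hits == []

# rf-string: the current word is substituted into the pattern.
word_hits = [(word, m.start()) for word in word_to_digit
             for m in re.finditer(rf'{word}', line)]
assert word_hits == [('one', 0), ('one', 4)]
```

Because these words contain no regex metacharacters, passing `word` directly as the pattern works just as well.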

In [None]:
print(line)
for word in word_to_digit:
    print(word)
    locs = [m.start() for m in re.finditer(word, line)]
    print(locs)

one2one
one
[0, 4]
two
[]
three
[]
four
[]
five
[]
six
[]
seven
[]
eight
[]
nine
[]


In [None]:
def process_line(line:str):
    word_idxs = []
    for word in word_to_digit:
        locs = [m.start() for m in re.finditer(word, line)]
        for idx in locs:
            word_idxs.append((word_to_digit[word],idx))
    num_idxs = [(int(n), i) for i,n in enumerate(line) if n.isdigit()]
    all_idxs = sorted(word_idxs + num_idxs, key=lambda x:x[-1])
    return all_idxs[0][0] * 10 +  all_idxs[-1][0]

In [None]:
for line, expected in test_cases:
    actual = process_line(line)
    if actual != expected:
        print('WRONG ANSWER')
        print(line)
        print(f'Expected: {line}, {expected}')
        print(f'Processed: {line}, {actual}')
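
Since only the first and last tokens on a line matter, one alternative that avoids regex entirely is to pair `str.find` with `str.rfind`. This is a sketch, not the notebook's method; `process_line_rfind` and the `tokens` mapping are hypothetical names introduced for illustration:

```python
word_to_digit = {
    'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
    'six': 6, 'seven': 7, 'eight': 8, 'nine': 9,
}

# Treat the digit characters as tokens too, so words and digits share one code path.
tokens = {**word_to_digit, **{str(d): d for d in range(10)}}


def process_line_rfind(line: str) -> int:
    """Hypothetical alternative to process_line using find() and rfind()."""
    hits = []
    for token, value in tokens.items():
        first = line.find(token)
        if first != -1:
            # Record both the leftmost and the rightmost position of this token.
            hits.append((first, value))
            hits.append((line.rfind(token), value))
    hits.sort()  # order by position in the line
    return hits[0][1] * 10 + hits[-1][1]


assert process_line_rfind('one2one') == 11
assert process_line_rfind('oneight') == 18
```

Like `process_line`, this assumes every line contains at least one digit or number word; an empty `hits` list would raise an `IndexError`.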

In [None]:
def process_lines(input_data: str):
    return sum([process_line(l) for l in input_data.split()])

In [None]:
process_lines("""two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen""")

281

In [None]:
process_lines(input_data)

54518
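
For comparison, the per-word loop could also be collapsed into a single regex pass. The sketch below relies on a zero-width lookahead so that overlapping words such as `oneight` are both found; `TOKEN`, `to_int`, and `process_line_regex` are hypothetical names, and this is an alternative technique rather than the approach used above:

```python
import re

word_to_digit = {
    'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
    'six': 6, 'seven': 7, 'eight': 8, 'nine': 9,
}

# The lookahead (?=...) consumes no characters, so matches may overlap:
# "oneight" yields both "one" and "eight".
TOKEN = re.compile(r'(?=(\d|' + '|'.join(word_to_digit) + r'))')


def to_int(token: str) -> int:
    """Convert a matched token (digit character or number word) to an int."""
    return int(token) if token.isdigit() else word_to_digit[token]


def process_line_regex(line: str) -> int:
    """Hypothetical single-pass variant of process_line."""
    found = [m.group(1) for m in TOKEN.finditer(line)]
    return to_int(found[0]) * 10 + to_int(found[-1])


assert process_line_regex('oneight') == 18
assert process_line_regex('one2one') == 11
```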