**Meta-analysis**: part 1 of this took me a LONG time, because I misread the description below. Where it says "using one character for each block", I took that to mean that by definition a block can hold only one character, rather than that this was a shorthand representation adopted only for the illustration. (This bothered me at the time, because how do you represent multidigit IDs with one char?) The later description of moving "file blocks one at a time" then seemed to imply fragmenting multidigit numbers by moving blocks that represent individual digits.

I think reading very carefully and translating into a case analysis would have helped, as would noticing the missing abstraction for a "block extent".

He shows you the disk map (your puzzle input) he's already generated. For example:

```
2333133121414131402
```

The disk map uses a dense format to represent the layout of files and free space on the disk. The digits alternate between indicating the length of a file and the length of free space.

So, a disk map like 12345 would represent a one-block file, two blocks of free space, a three-block file, four blocks of free space, and then a five-block file. A disk map like 90909 would represent three nine-block files in a row (with no free space between them).

Each file on disk also has an ID number based on the order of the files as they appear before they are rearranged, starting with ID 0. So, the disk map 12345 has three files: a one-block file with ID 0, a three-block file with ID 1, and a five-block file with ID 2. Using one character for each block where digits are the file ID and . is free space, the disk map 12345 represents these individual blocks:

```
0..111....22222
```

The first example above, `2333133121414131402`, represents these individual blocks:

```
00...111...2...333.44.5555.6666.777.888899
```

The amphipod would like to move file blocks one at a time from the end of the disk to the leftmost free space block (until there are no gaps remaining between file blocks). For the disk map 12345, the process looks like this:

```
0..111....22222
02.111....2222.
022111....222..
0221112...22...
02211122..2....
022111222......
```

The first example requires a few more steps:

```
00...111...2...333.44.5555.6666.777.888899
009..111...2...333.44.5555.6666.777.88889.
0099.111...2...333.44.5555.6666.777.8888..
00998111...2...333.44.5555.6666.777.888...
009981118..2...333.44.5555.6666.777.88....
0099811188.2...333.44.5555.6666.777.8.....
009981118882...333.44.5555.6666.777.......
0099811188827..333.44.5555.6666.77........
00998111888277.333.44.5555.6666.7.........
009981118882777333.44.5555.6666...........
009981118882777333644.5555.666............
00998111888277733364465555.66.............
0099811188827773336446555566..............
```

The final step of this file-compacting process is to update the filesystem checksum. To calculate the checksum, add up the result of multiplying each of these blocks' position with the file ID number it contains. The leftmost block is in position 0. If a block contains free space, skip it instead.

Continuing the first example, the first few blocks' position multiplied by its file ID number are `0 * 0 = 0, 1 * 0 = 0, 2 * 9 = 18, 3 * 9 = 27, 4 * 8 = 32`, and so on. In this example, the checksum is the sum of these, 1928.

Compact the amphipod's hard drive using the process he requested. What is the resulting filesystem checksum?

In [None]:
sample = '2333133121414131402'

## Plan:

- function to decode into a string
- function to do one compaction step
- reduce to compact until finished
- function to compute checksum

In [None]:
def decode(diskmap:str):
    ID = 0
    result = ''
    for pos,ch in enumerate(diskmap):
        repeat_count = int(ch)
        if pos % 2 == 0:
            # what if ID is more than one digit?
            new_chars = str(ID) * repeat_count
            ID += 1
        elif pos % 2 == 1:
            new_chars = '.' * repeat_count
        result += new_chars
    return result

In [None]:
decode('12345')

'0..111....22222'

In [None]:
decode(sample)

'00...111...2...333.44.5555.6666.777.888899'

In [None]:
decode('233313312141413140211')

'00...111...2...333.44.5555.6666.777.888899.10'

In [None]:
import re
re.match?

Signature: re.match(pattern, string, flags=0)
Docstring:
Try to apply the pattern at the start of the string, returning
a Match object, or None if no match was found.
File:      /usr/local/lib/python3.11/re/__init__.py
Type:      function


In [None]:
ms = list(re.finditer(pattern="[0-9]",string='02.111....2222.'))
ms

[<re.Match object; span=(0, 1), match='0'>,
 <re.Match object; span=(1, 2), match='2'>,
 <re.Match object; span=(3, 4), match='1'>,
 <re.Match object; span=(4, 5), match='1'>,
 <re.Match object; span=(5, 6), match='1'>,
 <re.Match object; span=(10, 11), match='2'>,
 <re.Match object; span=(11, 12), match='2'>,
 <re.Match object; span=(12, 13), match='2'>,
 <re.Match object; span=(13, 14), match='2'>]

**AI Prompt**

How do I get the starting position of a match?

**AI Response**

You can use the `start()` method of a match object. For example:

```python
ms[0].start()  # Will give you the starting position of the first match
```

Would you like me to explain more about what other useful methods match objects have?

In [None]:
ms[-1].start()

13

In [None]:
ms[-1].pos

0

In [None]:
ms[-1].start

<function Match.start(group=0, /)>
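To spell out the difference between the three expressions tried above: `start()` is a method and must be called, while `pos` is an attribute that records where the search was asked to begin, not where the match was found. A short side-by-side, using the same string as before:

```python
import re

m = list(re.finditer(r"[0-9]", "02.111....2222."))[-1]
print(m.start())  # 13: index where the match begins
print(m.end())    # 14: index just past the match
print(m.span())   # (13, 14)
print(m.pos)      # 0: start of the search region, not the match position
```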

In [None]:
def rightmost(pat,s) -> int|None:
    ms = list(re.finditer(pat,s))
    if not ms: return None
    else: return ms[-1].start()

In [None]:
def has_free_spaces(s):
    dots = list(re.finditer(r"\.",s))  # raw string avoids an invalid-escape warning
    if not dots: return False 
    ldot = dots[0].start()
    rdigit = rightmost("[0-9]",s)
    if rdigit is None:
        print('no digits')
        return False
    return  ldot < rdigit

In [None]:
has_free_spaces('00...111...2...333.44.5555.6666.777.888899')

True

In [None]:
has_free_spaces('0099811188827773336446555566.............')

False

In [None]:
def lspace_rdigit(s):
    dots = list(re.finditer(r"\.",s))  # raw string avoids an invalid-escape warning
    ldot = dots[0].start() if dots else None
    digits = list(re.finditer("[0-9]",s))
    rdigit = digits[-1].start() if digits else None
    return  (ldot,rdigit)

In [None]:
lspace_rdigit('00...111...2...333.44.5555.6666.777.888899')

(2, 41)

In [None]:
len('00...111...2...333.44.5555.6666.777.888899')

42

In [None]:
def compact(s) -> str|None:
    "returns compacted by one char, or None if cannot be compacted further"
    lspace,rdigit = lspace_rdigit(s)
    if lspace is None or rdigit is None: return None
    if not lspace < rdigit: return None
    s[lspace] = s[rdigit]
    s[rdigit] = '.'
    return s

**AI Prompt**

I recall strings are not mutable in python so the above is incorrect. I know I could produce the same effect by building a new string with splices. But is there a mutable buffer construct which would be more idiomatic?

**AI Response**

Yes! For a mutable string-like buffer in Python, you can use `bytearray` or `list`. Since we're working with single characters, a `list` would be most straightforward. You would modify your function like this:

```python
def compact(s):
    "returns compacted by one char, or None if cannot be compacted further"
    chars = list(s)  # convert string to list of characters
    lspace,rdigit = lspace_rdigit(''.join(chars))
    if lspace is None or rdigit is None: return None
    if not lspace < rdigit: return None  # lspace_rdigit returns int positions
    chars[lspace] = chars[rdigit]
    chars[rdigit] = '.'
    return ''.join(chars)  # convert back to string
```

Would you like me to explain why this approach might be better than string slicing?

In [None]:
# redefining
def compact(s) -> str|None:
    "returns compacted by one char, or None if cannot be compacted further"
    lspace,rdigit = lspace_rdigit(s)
    if lspace is None or rdigit is None: return None
    if not lspace < rdigit: return None
    ls = list(s)
    ls[lspace] = ls[rdigit]
    ls[rdigit] = '.'
    return ''.join(ls)

In [None]:
compact('00...111...2...333.44.5555.6666.777.888899')

'009..111...2...333.44.5555.6666.777.88889.'

In [None]:
import itertools, functools

**AI Prompt**

What's a function like `itertools.reduce` but which keeps applying the lambda function to its output, until some predicate returns true? I think it might be called NestUntil in Wolfram Language?

**AI Response**

In Python's `itertools`, there isn't a direct equivalent, but you can use `itertools.takewhile()` with a custom function, or create your own using `functools.reduce`. A common pattern is to use `iter()` with a sentinel value.

Here's a simple way to write it:

```python
def iterate_until(f, x, pred):
    while not pred(x):
        x = f(x)
    return x
```

Would you like me to explain how this works, or would you prefer to see alternative approaches?

In [None]:
def fully_compact(s):
    while (compacted := compact(s)) is not None:
        s = compacted
    return s

In [None]:
fully_compact('00...111...2...333.44.5555.6666.777.888899')

'0099811188827773336446555566..............'

**AI Prompt**

Refresh my memory on how the checksum is supposed to be calculated?

**AI Response**

According to the instructions, to calculate the checksum:
1. For each position in the final compacted string
2. If the position contains a digit (not a dot), multiply:
   - The position number (starting from 0 from the left)
   - Times the file ID number (the digit in that position)
3. Sum up all these products

For example, if a file ID 9 is in position 2, that contributes 2 * 9 = 18 to the sum. Dots are skipped and don't contribute to the sum.

Would you like help implementing this calculation?

In [None]:
def checksum(compacted:str):
    v = 0
    for i,ch in enumerate(compacted):
        if ch == '.':
            break
        ID = int(ch)
        v += i * ID
    return v

In [None]:
checksum('0099811188827773336446555566..............')

1928

In [None]:
def checksum_diskmap(diskmap:str):
    return checksum( fully_compact( decode(diskmap) ))

In [None]:
checksum_diskmap('2333133121414131402')

1928

In [None]:
from aocd import get_data
inp = get_data(day=9, year=2024)

In [None]:
#checksum_diskmap(inp) #times out

My note: `fully_compact` is not efficient, since it rescans the whole string on every step. I need to search once, and then mutate and iterate.

In [None]:
len(decode(inp))

239474

In [None]:
# redefining
def fully_compact2(s) -> str|None:
    "returns the fully compacted string, or None if s has no free space or no digits"
    lspace,rdigit = lspace_rdigit(s)
    if lspace is None or rdigit is None: return None
    ls = list(s)
    while lspace < len(ls) and 0 <= rdigit and lspace < rdigit:
        assert ls[lspace] == '.'
        assert ls[rdigit] != '.'
        ls[lspace] = ls[rdigit]
        ls[rdigit] = '.'
        while ls[lspace] != '.':
            lspace += 1
        while ls[rdigit] == '.':
            rdigit -= 1
    return ''.join(ls)

In [None]:
fully_compact('00...111...2...333.44.5555.6666.777.888899')

'0099811188827773336446555566..............'

In [None]:
fully_compact2('00...111...2...333.44.5555.6666.777.888899')

'0099811188827773336446555566..............'

In [None]:
def checksum_diskmap2(diskmap:str):
    return checksum( fully_compact2( decode(diskmap) ))

In [None]:
checksum_diskmap(sample), checksum_diskmap2(sample)

(1928, 1928)

In [None]:
inp_decded = decode(inp)

In [None]:
len(inp_decded)

239474

In [None]:
inp_decded[:20]

'00000....111.2222222'

In [None]:
checksum_diskmap2(inp)

90779541117

In [None]:
checksum_diskmap2('2333133121414131402')

1928

**AI Prompt**

Can you see anything wrong with my answer above?

**AI Response**

Looking at your checksum function, I notice it stops at the first dot:

```python
def checksum(compacted:str):
    v = 0
    for i,ch in enumerate(compacted):
        if ch == '.':
            break        # <-- This breaks at first dot
        ID = int(ch)
        v += i * ID
    return v
```

According to the instructions, we should calculate for all digits in the string, skipping dots but continuing past them. The checksum should include all digit positions, not just the ones before the first dot.

Would you like help fixing this?

**AI Prompt**

But won't there be nothing but dots to the right of the leftmost dot, once the string is compacted?

**AI Response**

Actually, looking back at the instructions, I see that `fully_compact2` has a bug. It should continue moving digits leftward until there are no gaps between file blocks. In your current implementation, once a digit is moved to the first available dot position, it stays there even if there are still dots between it and the previous digit.

Looking at the example from the instructions:
```
00...111...2...333.44.5555.6666.777.888899
```
Should become:
```
0099811188827773336446555566
```
(followed by dots)

Would you like help fixing the compaction logic?

**AI Prompt**

Can you not see that I have already tested fully_compact2 on exactly that input??

**AI Response**

I apologize for my mistake! You're absolutely right - looking at your test results, `fully_compact2` does produce exactly the correct output, and your original observation about the checksum function is correct - once fully compacted, all dots will be at the end, so breaking at the first dot is fine.

I should have looked more carefully at your test results before making incorrect suggestions. Your implementation appears to be working correctly.

Would you like me to look for any other potential issues, or would you prefer to move on to something else?

**AI Prompt**

Keep looking. Look harder.

**AI Response**

Let me carefully review the instructions and your code again...

Looking at the example in the instructions:
```
00...111...2...333.44.5555.6666.777.888899
```

The instructions say each file has an ID based on the order they appear, starting with ID 0. But in your checksum calculation, you're using the actual digit values from the compacted string, not the file IDs!

So in the final compacted string:
```
0099811188827773336446555566
```

Those '9's should be treated as file ID 8 or 9 (depending on their original position), not literally multiplied by 9. The same goes for all other digits - they should be matched with their original file IDs.

Is this the issue you were wanting me to find?

In [None]:
inp[:20]

'54318470705324965486'

What's wrong with my solution, so that it doesn't pass?

Re-reading the problem, the only ambiguity I can see is *what character or characters to use to encode multi-digit IDs*. I just do `str(ID)`, which will encode ID 10 as `"10"`, but what if it is supposed to use only 1 block, which means one character? Should I truncate to `"1"` and use 1? Take the smallest digit and use `"0"`? What if I'm supposed to have 2 blocks for 10; then should I use `"10"` or `"11"` or `"00"`? The problem statement seems ambiguous: it says each file on disk has "an ID number based on the order of the files" and describes counting such that an ID could go above 9. Then it says "Using one character for each block where digits are the file ID..."

Is "where digits are the file ID" a restrictive clause? That makes no sense, since digits are always the file ID. Is it nonrestrictive? Then it is uninformative, because it does not say how to translate a multidigit ID to a single digit.

Let's try three possible strategies:

In [None]:
# redefining
def decode(diskmap:str,strategy='units'):
    ID = 0
    result = ''
    for pos,ch in enumerate(diskmap):
        repeat_count = int(ch)
        if pos % 2 == 0:
            if strategy == 'append': 
                new_chars = str(ID) * repeat_count
            elif strategy == 'truncate':
                new_chars = str(ID) * repeat_count
                new_chars = new_chars[:repeat_count]
            elif strategy == 'units':
                # note: [0] keeps the leading digit of the ID, not the units digit
                new_chars = str(ID)[0] * repeat_count
            else: raise NotImplementedError(strategy)
            ID += 1
        elif pos % 2 == 1:
            new_chars = '.' * repeat_count
        result += new_chars
    return result

In [None]:
checksum_diskmap2(sample)

1928

In [None]:
checksum_diskmap2(inp)

5719521608

In [None]:
checksum_diskmap2(inp)

90779541117

In [None]:
checksum_diskmap2(inp)

5906755819

Realized: strategy needs to affect decode, compact, and checksum.

Let's stick with 'append' as the strategy and update checksum

*Writing in retrospect*: I think the biggest confusion was me reading "Using one character for each block where digits are the file ID and . is free space, the disk map 12345 represents these individual blocks" and understanding this to _define_ a block rather than to be a temporary representation they were using for illustration which did not represent desired behavior in the multidigit case. 
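A small made-up example shows why the character representation breaks once IDs reach 10. The disk map here is hypothetical (11 one-block files, no free space), chosen only to make the mismatch visible:

```python
# Hypothetical disk map: 11 one-block files (IDs 0..10) with no free space.
diskmap = "1" + "01" * 10

# One list entry per block, holding the integer file ID.
as_list = [pos // 2
           for pos, ch in enumerate(diskmap) if pos % 2 == 0
           for _ in range(int(ch))]

# The string encoding spends two characters on the single block of file 10.
as_string = "".join(str(file_id) for file_id in as_list)

print(len(as_list), len(as_string))                            # 11 12
print(sum(i * file_id for i, file_id in enumerate(as_list)))   # 385
print(sum(i * int(ch) for i, ch in enumerate(as_string)))      # 295
```

The string has one character too many, so the per-character checksum uses the wrong IDs for the final block and would shift the positions of anything after it.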

Here I rebuilt under the assumption that the character representation was meaningless.

- "block": a single unit of storage in an array.
- "sector": a sequence of blocks representing a single file or a run of free space.

In [None]:
from typing import NewType, Union
from dataclasses import dataclass
from enum import Enum

@dataclass
class File:
    ID:int
    length:int

@dataclass
class Free:
    pass

Block = File | Free

In [None]:
def _blocks_for_file(f:File): return [f] * f.length
_blocks_for_file(File(49,1))

[File(ID=49, length=1)]

In [None]:
def _blocks_for_freesector(length:int): return [Free() for _ in range(length)]
_blocks_for_freesector(5)

[Free(), Free(), Free(), Free(), Free()]

In [None]:
def diskmap_to_blocklist(diskmap):
    ID = 0
    result = []
    for pos,ch in enumerate(diskmap):
        if pos % 2 == 0:
            result.extend(_blocks_for_file(File(ID,int(ch))))
            ID += 1
        elif pos % 2 == 1:
            result.extend(_blocks_for_freesector(int(ch)))
    return result

In [None]:
bl = diskmap_to_blocklist(sample)
bl

[File(ID=0, length=2),
 File(ID=0, length=2),
 Free(),
 Free(),
 Free(),
 File(ID=1, length=3),
 File(ID=1, length=3),
 File(ID=1, length=3),
 Free(),
 Free(),
 Free(),
 File(ID=2, length=1),
 Free(),
 Free(),
 Free(),
 File(ID=3, length=3),
 File(ID=3, length=3),
 File(ID=3, length=3),
 Free(),
 File(ID=4, length=2),
 File(ID=4, length=2),
 Free(),
 File(ID=5, length=4),
 File(ID=5, length=4),
 File(ID=5, length=4),
 File(ID=5, length=4),
 Free(),
 File(ID=6, length=4),
 File(ID=6, length=4),
 File(ID=6, length=4),
 File(ID=6, length=4),
 Free(),
 File(ID=7, length=3),
 File(ID=7, length=3),
 File(ID=7, length=3),
 Free(),
 File(ID=8, length=4),
 File(ID=8, length=4),
 File(ID=8, length=4),
 File(ID=8, length=4),
 File(ID=9, length=2),
 File(ID=9, length=2)]

In [None]:
def rightmost_file_block(blocklist) -> int|None:
    "index of the last File block, or None if there is none"
    for i in reversed(range(len(blocklist))):
        if isinstance(blocklist[i],File): return i
    return None

def leftmost_free_block(blocklist) -> int|None:
    "index of the first Free block, or None if there is none"
    try: return next(i for i,v in enumerate(blocklist) if isinstance(v,Free))
    except StopIteration: return None

In [None]:
leftmost_free_block(bl), rightmost_file_block(bl)

(2, 41)

In [None]:
# redefining
def compact_blocklist(bl:list) -> list:
    "fully compacts bl in place with two indices, and returns it"
    lspace,rdigit = 0,len(bl)-1
    # bounds checks guard against a list with no Free blocks or no File blocks
    while lspace < len(bl) and isinstance(bl[lspace], File): lspace += 1
    while rdigit >= 0 and isinstance(bl[rdigit], Free): rdigit -= 1
    while lspace < rdigit:
        bl[lspace] = bl[rdigit]
        bl[rdigit] = Free()
        # after the move a Free block sits at rdigit and a File at lspace,
        # so both scans stop before running off the list
        while isinstance(bl[lspace], File): lspace += 1
        while isinstance(bl[rdigit], Free): rdigit -= 1
    return bl

In [None]:
compact_blocklist( diskmap_to_blocklist( sample))

[File(ID=0, length=2),
 File(ID=0, length=2),
 File(ID=9, length=2),
 File(ID=9, length=2),
 File(ID=8, length=4),
 File(ID=1, length=3),
 File(ID=1, length=3),
 File(ID=1, length=3),
 File(ID=8, length=4),
 File(ID=8, length=4),
 File(ID=8, length=4),
 File(ID=2, length=1),
 File(ID=7, length=3),
 File(ID=7, length=3),
 File(ID=7, length=3),
 File(ID=3, length=3),
 File(ID=3, length=3),
 File(ID=3, length=3),
 File(ID=6, length=4),
 File(ID=4, length=2),
 File(ID=4, length=2),
 File(ID=6, length=4),
 File(ID=5, length=4),
 File(ID=5, length=4),
 File(ID=5, length=4),
 File(ID=5, length=4),
 File(ID=6, length=4),
 File(ID=6, length=4),
 Free(),
 Free(),
 Free(),
 Free(),
 Free(),
 Free(),
 Free(),
 Free(),
 Free(),
 Free(),
 Free(),
 Free(),
 Free(),
 Free()]

In [None]:
def checksum(compacted:list[Block]):
    v = 0
    for i,b in enumerate(compacted):
        if isinstance(b,Free): pass
        if isinstance(b,File):
            v += i * b.ID
    return v

In [None]:
checksum( compact_blocklist( diskmap_to_blocklist( sample )))

1928

In [None]:
checksum( compact_blocklist( diskmap_to_blocklist( inp )))

6382875730645