### Refactored Original Solution

Building the full polymer as a Python list is not feasible at this size, since the sequence roughly doubles every step. numpy is out of the question for the same reason.
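
To put a number on that: each step inserts one character between every adjacent pair, so a polymer of length n becomes 2n - 1. A quick check for the 4-character example template `NNCB` (the helper name `polymer_length` is mine, purely for illustration):

```python
def polymer_length(start_len: int, steps: int) -> int:
    """Length of the polymer after `steps` rounds of pair insertion."""
    # Each step maps n characters to 2n - 1; unrolling that gives this closed form.
    return (start_len - 1) * 2 ** steps + 1


print(polymer_length(4, 10))  # 3073
print(polymer_length(4, 40))  # 3298534883329
```

Ten steps is easy to hold in memory; forty steps is trillions of characters even for the tiny example.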

Brief Example of what I Wanted To Do:
- Each step, keep track of the elements that need to be split in a dictionary: `{'NN': 1, 'NC': 1, 'CB': 1, 'N': 1, 'B': 1}`
    - I keep the start & end as their own single key so that I can just divide every character by 2 at the end. 
- I then can iterate over the keys & values of my count dictionary:
    - Find the insertion 
    - Take the 0th element of the key and pair with insertion, store off value as count
    - Take insertion and pair with 1st element, store off value as count
    - Example:
        - Assume we are looking at `NN`:
        - This will need to generate `NCN`, which can actually be stored as: `NC, CN` since this is what we care about in the next step
        - So, our insertion is `C`
        - We build one pair of `NC` 
        - We then build a pair of `CN`
        - Notice that C is being handled twice here, this is intentional since each of the `N` elements would be shared by another pair as indicated in the prompt. 
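
The `NN` example above can be sketched in a few lines. This is only an illustration: the one-rule `rules` dict and the count of 3 are made up for the demo.

```python
from collections import Counter

rules = {"NN": "C"}    # hypothetical one-rule table for the demo
pair, count = "NN", 3  # pretend NN currently occurs 3 times

insertion = rules[pair]
new_pairs = Counter()
new_pairs[pair[0] + insertion] += count  # left half:  N + C -> NC
new_pairs[insertion + pair[1]] += count  # right half: C + N -> CN

print(dict(new_pairs))  # {'NC': 3, 'CN': 3}
```

Both new pairs inherit the full count of the pair they came from, which is why no string ever needs to be built.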

General Strategy:
- Start by building a dictionary of counts that has all pairs to iterate through from initial polymer as well as the first and last elements stored as single keys.
    - key: pair of strings (or single string for first & last element)
    - value: count (this is why I kept screwing up with actual data, need to not assume a single instance of these keys at the start)
- Each step we are going to iterate through the `(key,value)` in the dictionary and:
    - Identify the insertion character
    - Build out two new pairs -> `(key[0] + insertion)` and `(insertion + key[1])`
    - store in a new dictionary of counts
        - we can assume that the value represents the count for these new pairs since an `insertion`, `key[0]` or `key[1]` will need to exist as many times as the `key` it was generated from. 
- We then have an updated & expanded dictionary with all proper insertions. We can restart the process on the next step, ensuring we now iterate over the updates. 
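
One way to package a single step, as a sketch rather than the exact code in the cells below. The function name `step` is an assumption, and the three-rule table is chosen for illustration so that every pair in `NNCB` has a rule.

```python
from collections import Counter


def step(counts, rules):
    """Apply one round of insertions to a dict of pair and end-marker counts."""
    new_counts = Counter()
    for key, n in counts.items():
        if len(key) == 1:
            # first/last character markers are carried over unchanged
            new_counts[key] += n
            continue
        ins = rules[key]
        new_counts[key[0] + ins] += n  # left character + insertion
        new_counts[ins + key[1]] += n  # insertion + right character
    return new_counts


rules = {"NN": "C", "NC": "B", "CB": "H"}  # illustrative rules only
start = Counter({"NN": 1, "NC": 1, "CB": 1, "N": 1, "B": 1})
after = step(start, rules)
print(sorted(after.items()))
```

With these rules the result holds the pairs `NC`, `CN`, `NB`, `BC`, `CH`, `HB` once each, plus the untouched `N` and `B` end markers, which is the pair view of `NCNBCHB`.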

Final Solve:
- I intentionally stored the first & last elements of our initial string so that I "double-count" every character.
    - By double count I mean this: `NNCB` is treated as `N, NN, NC, CB, B`, with each element doubled due to being shared across two pairs. 
- At the very end I just need to take the last count dictionary (stored with a key being a pair of characters and the value being the count) and count up all the various characters, dividing by 2 due to double-counting
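
A sketch of that final tally, assuming integer division is acceptable since every doubled total is even. The function name `char_counts` is hypothetical, and the input is the starting dictionary for `NNCB`.

```python
from collections import Counter


def char_counts(counts):
    """Count characters across all keys, then halve to undo the double count."""
    doubled = Counter()
    for key, n in counts.items():
        for ch in key:  # a pair contributes both characters, an end marker one
            doubled[ch] += n
    return {ch: total // 2 for ch, total in doubled.items()}


start = {"NN": 1, "NC": 1, "CB": 1, "N": 1, "B": 1}
print(char_counts(start))  # {'N': 2, 'C': 1, 'B': 1}
```

Using `// 2` keeps the counts as integers, whereas multiplying by `0.5` produces floats.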

      

In [1]:
from collections import Counter, defaultdict

def calcCounts(d):
    """Turn pair/end-marker counts into per-character counts, halving to undo the double count."""
    count_dict = defaultdict(lambda: 0)
    for k,v in d.items():
        if len(k) == 1:
            count_dict[k] += (v*0.5)
        else:
            count_dict[k[0]] += (v*0.5)
            count_dict[k[1]] += (v*0.5)
            
    return count_dict

def dataClean(path):
    """Read in data and output pair insertion dict and initial state string"""
    with open(path) as fh:
        data = [line.strip('|\n') for line in fh.readlines()]

    # start by handling initial polymer
    poly_temp = data[0]
    
    pair_insertion = {}

    # finish by handling the pair insertion rules, which always start at the 3rd row
    for dirs in data[2:]:
        r, i = dirs.split(' -> ')
        pair_insertion[r] = i
    
    return poly_temp, pair_insertion

In [2]:
# read data for test case
poly_temp, pair_insertion = dataClean('data/day14_test.txt')

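# New addition: generate the initial dict of pair counts
temp_dict = {}
for i in range(len(poly_temp) - 1):
    pair = poly_temp[i:i+2]
    # accumulate rather than assign: a pair can occur more than once in the template
    temp_dict[pair] = temp_dict.get(pair, 0) + 1

# Add the ends so I can just divide all values by 2
# (accumulate here too, in case the template starts and ends with the same character)
temp_dict[poly_temp[0]] = temp_dict.get(poly_temp[0], 0) + 1
temp_dict[poly_temp[-1]] = temp_dict.get(poly_temp[-1], 0) + 1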
temp_dict

{'NN': 1, 'NC': 1, 'CB': 1, 'N': 1, 'B': 1}

In [3]:
# run all 40 insertion steps:
for _ in range(40):
    
    # pass over what we just did last step
    counter_dict = temp_dict

    # definitely need a default dict here for updating keys that didn't exist
    temp_dict = defaultdict(lambda: 0)

    for k,v in counter_dict.items():

        if len(k) == 1:
            temp_dict[k] = v
            continue

        # We can build out the insertion impact to sequence
        l = k[0]
        i = pair_insertion[k]
        r = k[1]

        # we build the two new pairs and store off counts from original pair (key)
        temp_dict[l+i] += v
        temp_dict[i+r] += v

# Take final dict & determine various counts
temp = calcCounts(temp_dict)
assert(temp[max(temp, key = temp.get)] - temp[min(temp, key = temp.get)] == 2188189693529)

### Actual Parts 1 and 2

In [4]:
# Part 1
poly_temp, pair_insertion = dataClean('data/day14.txt')

# New addition: generate the initial dict of pair counts
temp_dict = defaultdict(lambda: 0)

for i in range(len(poly_temp) - 1):
    temp_dict[poly_temp[i:i+2]] += 1

# Add the ends so I can just divide all values by 2
# (+= rather than = so a template that starts and ends with the same character still works)
temp_dict[poly_temp[0]] += 1
temp_dict[poly_temp[-1]] += 1

# Part 1: run 10 insertion steps
for _ in range(10):
    
    # pass over what we just did last step
    counter_dict = temp_dict

    # definitely need a default dict here for updating keys that didn't exist
    temp_dict = defaultdict(lambda: 0)

    for k,v in counter_dict.items():

        if len(k) == 1:
            temp_dict[k] = v
            continue

        # We can build out the insertion impact to sequence
        l = k[0]
        i = pair_insertion[k]
        r = k[1]
        
        #print(f"Split {k} into {(l,i)} and {(i,r)}")

        # we build the two new pairs
        temp_dict[l+i] += v
        temp_dict[i+r] += v

# Take final dict & determine various counts (only needed once, after the last step)
temp = calcCounts(temp_dict)

assert(temp[max(temp, key = temp.get)] - temp[min(temp, key = temp.get)] == 3408)

In [5]:
# Part 2
poly_temp, pair_insertion = dataClean('data/day14.txt')

# New addition: generate the initial dict of pair counts
temp_dict = defaultdict(lambda: 0)

for i in range(len(poly_temp) - 1):
    temp_dict[poly_temp[i:i+2]] += 1

# Add the ends so I can just divide all values by 2
# (+= rather than = so a template that starts and ends with the same character still works)
temp_dict[poly_temp[0]] += 1
temp_dict[poly_temp[-1]] += 1

# Part 2: same loop as part 1, but run 40 insertion steps
for _ in range(40):
    
    # pass over what we just did last step
    counter_dict = temp_dict

    # definitely need a default dict here for updating keys that didn't exist
    temp_dict = defaultdict(lambda: 0)

    for k,v in counter_dict.items():

        if len(k) == 1:
            temp_dict[k] = v
            continue

        # We can build out the insertion impact to sequence
        l = k[0]
        i = pair_insertion[k]
        r = k[1]
        
        #print(f"Split {k} into {(l,i)} and {(i,r)}")

        # we build the two new pairs
        temp_dict[l+i] += v
        temp_dict[i+r] += v

# Take final dict & determine various counts (only needed once, after the last step)
temp = calcCounts(temp_dict)

temp[max(temp, key = temp.get)] - temp[min(temp, key = temp.get)]

3724343376942.0