# 2022-12-01

## Performance Summary
Initial: ~655 µs ± 9.89 µs

Best: ~226 µs ± 2.63 µs

I think all of my solutions are $O(n)$ in the input size, apart from the sorting step, which is $O(k \log k)$ for $k$ elves.

## Initial solution
Woke up at 5:55 am (about midnight UTC-5) and did it on time.
```
-------Part 1--------   -------Part 2--------
Day       Time  Rank  Score       Time  Rank  Score
  1   00:08:55  4880      0   00:10:30  3906      0
```

### First Puzzle
Although not optimal, split the file on newlines, rejoin the lines with spaces, and split again on double spaces, so that each blank line becomes an elf boundary.
Then iterate through the list, map the values to ints, and sum them for each elf.

In [101]:
# Run this if you are running a clean version of Python

# %%capture
# %pip install numpy

In [102]:
import numpy as np

with open("data/day-1.txt") as f:
    data = " ".join(f.read().split("\n")).split("  ")
sums = np.array(
    [np.array(list(map(lambda x: int(x), x))).sum() for x in [x.split() for x in data]]
)
sums.max()

71506
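
The join/split trick is easier to see on a tiny input. Here is a minimal sketch in plain Python (no numpy), assuming a made-up three-elf `sample` string rather than the real puzzle input:

```python
# Made-up sample input: three elves separated by blank lines
sample = "1000\n2000\n\n4000\n\n5000\n6000"

# Newlines become spaces, so a blank line turns into a double space
joined = " ".join(sample.split("\n"))
print(joined)  # 1000 2000  4000  5000 6000

# Splitting on the double space gives one string per elf
per_elf = joined.split("  ")
print(per_elf)  # ['1000 2000', '4000', '5000 6000']

# Sum each elf's calories
sample_sums = [sum(map(int, elf.split())) for elf in per_elf]
print(sample_sums)  # [3000, 4000, 11000]
```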

### Second Puzzle
Sort the list and then find the top three and sum them up.

In [103]:
sums.sort()
sums[::-1][:3].sum()

209603
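
A full sort does more work than the top three needs. As an alternative sketch (not the approach used above), the standard library's `heapq.nlargest` picks the largest values directly; the `sample_sums` list is made up for illustration:

```python
import heapq

# Made-up per-elf totals
sample_sums = [6000, 4000, 11000, 24000, 10000]

# The three largest totals, largest first, without sorting the whole list
top_three = heapq.nlargest(3, sample_sums)
print(top_three)       # [24000, 11000, 10000]
print(sum(top_three))  # 45000
```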

## Improved or alternate solutions

## First alternative
This is the first alternative I came up with. It has some small improvements over the first solution, and runs 3x faster.

In [147]:
def first_alternative(data=None):
    if data is None:
        with open("data/day-1.txt") as f:
            data = f.read()
    elf_calorie_sums = [
        sum(list(map(int, elf_strings.split("\n"))))
        for elf_strings in data.split("\n\n")
    ]
    elf_calorie_sums.sort()
    elf_calorie_sums.reverse()
    return elf_calorie_sums[0], sum(elf_calorie_sums[:3])


solution_1, solution_2 = first_alternative()
print(f"The most an elf is carrying is {solution_1} calories")
print(f"The three elves carrying the most have {solution_2} calories in total")

The most an elf is carrying is 71506 calories
The three elves carrying the most have 209603 calories in total


## Second alternative
I expected this to be much faster, but it is in fact much slower than the other alternatives (about 1.38 ms per run). I'm leaving it here for reference. It could be further optimised by reducing the number of arg-min operations, but I am not sure whether that is the true bottleneck.

In [108]:
def second_alternative(data=None):
    if data is None:
        with open("data/day-1.txt") as f:
            data = f.read()

    # hyperparameter
    top_n = 3

    temp = []
    elf_storage = []
    partial_solution = [0] * top_n
    chars = len(data)
    i = 0
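    while i < chars:
        if data[i].isalnum():
            temp.append(data[i])
            i += 1
        else:
            elf_storage.append(int("".join(temp)))
            temp = []
            # Two identical separators in a row (a blank line) end the current elf
            if i + 1 < chars and data[i] == data[i + 1]:
                elf_calories = sum(elf_storage)
                smallest = min(
                    range(len(partial_solution)), key=partial_solution.__getitem__
                )
                if elf_calories > partial_solution[smallest]:
                    partial_solution.pop(smallest)
                    partial_solution.append(elf_calories)
                elf_storage = []
                i += 2
            else:
                i += 1
    # Flush the last elf: the input does not end with a blank line,
    # so the loop above never counts it
    if temp:
        elf_storage.append(int("".join(temp)))
    if elf_storage:
        elf_calories = sum(elf_storage)
        smallest = min(range(len(partial_solution)), key=partial_solution.__getitem__)
        if elf_calories > partial_solution[smallest]:
            partial_solution[smallest] = elf_calories
    solution_1 = max(partial_solution)
    solution_2 = sum(partial_solution)
    return solution_1, solution_2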


solution_1, solution_2 = second_alternative()
print(f"The most an elf is carrying is {solution_1} calories")
print(f"The three elves carrying the most have {solution_2} calories in total")

The most an elf is carrying is 71506 calories
The three elves carrying the most have 209603 calories in total
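
One way to cut down the repeated arg-min scans might be a min-heap, where the smallest of the current top `n` always sits at index 0. This is only a sketch and not a measured improvement; `top_n_sums` is a hypothetical helper, and it assumes groups of integers separated by blank lines:

```python
import heapq


def top_n_sums(data, top_n=3):
    """Return the top_n largest per-elf sums, largest first."""
    heap = []  # min-heap holding at most top_n sums
    for block in data.strip().split("\n\n"):
        calories = sum(map(int, block.split()))
        if len(heap) < top_n:
            heapq.heappush(heap, calories)
        else:
            # Push the new sum and drop the smallest in one step
            heapq.heappushpop(heap, calories)
    return sorted(heap, reverse=True)


# Made-up sample input with five elves
sample = "1000\n2000\n\n4000\n\n5000\n6000\n\n7000\n8000\n9000\n\n10000"
best = top_n_sums(sample)
print(best[0], sum(best))  # 24000 45000
```

The string splitting here is the same as in the first alternative; only the top-`n` bookkeeping differs.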


# Other learnings
It appears that numpy can be much slower depending on how it is installed and imported. I'm not sure why this is the case, but I'm leaving it here for reference.

- Importing numpy after installing it via the first cell produces much slower results than running in another Python environment where numpy is already installed.
- `int` can be passed directly to `map`; there is no need to wrap it in a lambda as in the initial solution.
- While working on the second alternative, I learned that you cannot modify the iterator (for example, skip ahead by changing the loop index) while iterating with a `for` loop, which is why that solution uses a `while` loop with a manual index.
- Obviously, I didn't need to reverse the sort, although it felt more natural to me to have the largest number at the top of the list.
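
The `for`-loop point can be shown in a few lines: reassigning the loop variable does not skip anything, because the next iteration overwrites it, whereas a `while` loop with a manual index can jump ahead. A minimal sketch with made-up data:

```python
items = ["a", "b", "c", "d", "e"]

# Trying to skip ahead inside a for loop has no effect
seen_for = []
for i in range(len(items)):
    seen_for.append(items[i])
    i += 2  # overwritten by the next value from range()
print(seen_for)  # ['a', 'b', 'c', 'd', 'e']

# A while loop controls the index itself, so the skip works
seen_while = []
i = 0
while i < len(items):
    seen_while.append(items[i])
    i += 2
print(seen_while)  # ['a', 'c', 'e']
```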

# Runtimes

## Initial solution

In [None]:
def initialSolution(data=None):
    if data is None:
        with open("data/day-1.txt") as f:
            data = " ".join(f.read().split("\n")).split("  ")
    else:
        data = " ".join(data.split("\n")).split("  ")
    sums = np.array(
        [
            np.array(list(map(lambda x: int(x), x))).sum()
            for x in [x.split() for x in data]
        ]
    )
    sums.sort()
    return sums.max(), sums[::-1][:3].sum()

In [None]:
%%timeit

initialSolution()

637 µs ± 1.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


## First alternative

In [None]:
%%timeit

with open("data/day-1.txt") as f:
    data = f.read()
elf_calorie_sums = [
    sum(list(map(int, elf_strings.split("\n")))) for elf_strings in data.split("\n\n")
]
elf_calorie_sums.sort()
elf_calorie_sums.reverse()
solution_1 = elf_calorie_sums[0]
solution_2 = sum(elf_calorie_sums[:3])

## Second alternative

In [109]:
%%timeit
second_alternative()

1.38 ms ± 2.82 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
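
The timings above include opening and reading the file on every loop. To time only the parsing and summing, something like the following sketch with the standard library's `timeit` module could be used outside a notebook. The `solve` function and the in-memory `sample` are assumptions made for illustration, not the real input:

```python
import timeit


def solve(data):
    """Top-1 and top-3 calorie sums, in the style of the first alternative."""
    sums = sorted(
        (sum(map(int, block.split("\n"))) for block in data.split("\n\n")),
        reverse=True,
    )
    return sums[0], sum(sums[:3])


# Made-up input held in memory, so no file access is timed
sample = "\n\n".join("\n".join(["1000", "2000", "3000"]) for _ in range(250))

# Best of 5 repeats of 200 calls each, reported per call
runs = timeit.repeat(lambda: solve(sample), number=200, repeat=5)
per_call_us = min(runs) / 200 * 1e6
print(f"{per_call_us:.1f} microseconds per call")
```

The numbers from this sketch are not comparable with the `%%timeit` results above, since the input and the timed work differ.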


# Just other things

In [None]:
# Just some statistics from the original data
with open("data/day-1.txt") as f:
    data = f.read()
elf_calorie_sums = [
    list(map(int, elf_strings.split("\n"))) for elf_strings in data.split("\n\n")
]
# Flatten once for per-item statistics; sum each elf once for per-elf statistics
items = np.array([item for sublist in elf_calorie_sums for item in sublist])
sums_per_elf = np.array([sum(sublist) for sublist in elf_calorie_sums])
mean, std = items.mean(), items.std()
mean_sums, mean_sums_std = sums_per_elf.mean(), sums_per_elf.std()

mean, std, mean_sums, mean_sums_std

(5923.5075, 5550.487886658591, 46277.40234375, 11665.198320545209)

In [None]:
import numpy as np


def generator(n=256, mean=5900, std=5550):
    # generate random calorie counts
    nums = []
    while len(nums) < n:
        nums.append(int(np.random.normal(mean, std)))
        if nums[-1] < 0:
            nums.pop()

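    # split nums into consecutive random groups of 1 to 16 items,
    # so that every number lands in exactly one group
    groups = []
    i = 0
    while i < len(nums):
        size = np.random.randint(1, 17)  # upper bound is exclusive
        groups.append(nums[i : i + size])
        i += size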

    # create a string of the groups where each number in the group is separated by newline and each group is separated by two newlines
    data = "\n\n".join(["\n".join([str(x) for x in group]) for group in groups])
    return data
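
A quick sanity check for generated input is to parse the text back and confirm that no number was lost or duplicated. The sketch below uses a hypothetical stdlib-only stand-in, `make_input`, built on `random` instead of numpy and with a fixed seed; it is an assumption for illustration, not the generator above:

```python
import random


def make_input(n=256, mean=5900, std=5550, seed=0):
    """Hypothetical stdlib-only stand-in for a calorie-list generator."""
    rng = random.Random(seed)
    nums = []
    while len(nums) < n:
        value = int(rng.gauss(mean, std))
        if value >= 0:  # discard negative calorie counts
            nums.append(value)

    # Cut nums into consecutive groups of 1 to 16 items
    groups = []
    i = 0
    while i < n:
        size = rng.randint(1, 16)  # both bounds inclusive
        groups.append(nums[i : i + size])
        i += size
    return nums, "\n\n".join("\n".join(map(str, g)) for g in groups)


nums, data = make_input()

# Parse the text back and check that the numbers round-trip unchanged
parsed = [int(x) for block in data.split("\n\n") for x in block.split("\n")]
print(parsed == nums)  # True
```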