# December 04

Today's puzzle is very similar to the problem John and Rikk have recently worked on.

We need to build a timeline of states, where an elf is either awake or asleep at each minute.
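To make the timeline idea concrete, here is a minimal sketch for a single night. It assumes one NumPy array with an entry per minute of the midnight hour, and the four events are those of Guard #10 on 1518-11-01 in the puzzle's sample data. The names `timeline` and `minutes_asleep` are only illustrative.

```python
import numpy as np

# One entry per minute of the midnight hour: 0 = awake, 1 = asleep.
timeline = np.zeros(60)

# Each event overwrites the state from its minute to the end of the hour,
# so a later event corrects the tail written by an earlier one.
timeline[5:] = 1    # falls asleep at 00:05
timeline[25:] = 0   # wakes up at 00:25
timeline[30:] = 1   # falls asleep at 00:30
timeline[55:] = 0   # wakes up at 00:55

minutes_asleep = int(timeline.sum())
print(minutes_asleep)  # 45
```

This only works if the events are applied in chronological order, which is why the input has to be sorted first.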

In [None]:
import re  # the standard-library module is enough for these patterns
import numpy as np
import pandas as pd

In [None]:
# Sample data as provided in question
data= """
[1518-11-01 00:00] Guard #10 begins shift
[1518-11-01 00:05] falls asleep
[1518-11-01 00:25] wakes up
[1518-11-01 00:30] falls asleep
[1518-11-01 00:55] wakes up
[1518-11-01 23:58] Guard #99 begins shift
[1518-11-02 00:40] falls asleep
[1518-11-02 00:50] wakes up
[1518-11-03 00:05] Guard #10 begins shift
[1518-11-03 00:24] falls asleep
[1518-11-03 00:29] wakes up
[1518-11-04 00:02] Guard #99 begins shift
[1518-11-04 00:36] falls asleep
[1518-11-04 00:46] wakes up
[1518-11-05 00:03] Guard #99 begins shift
[1518-11-05 00:45] falls asleep
[1518-11-05 00:55] wakes up
""".strip().splitlines()

In [None]:
# From now on we store the puzzle input in a file, to avoid pasting it into the notebook.
# Create a file with the name below and save your real puzzle input there.
# If you want to run just the sample data, skip this block.
with open("./04-kws.txt", "r") as FILE:
    data = FILE.read().strip().splitlines()

In [None]:
data[:5]

In the real data the 'events' are not sorted, so before we do anything else we need to sort the list. The wonderful thing about the ISO 8601 date format (year, then month, then day) is that dates sort chronologically when sorted as plain text. Keep that in mind whenever you put a date in a filename!

Since every line starts with such a timestamp, putting the events in chronological order just means sorting the list of strings.

In [None]:
data.sort()
data[:5]
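As a quick sanity check of that claim, the sketch below sorts a few timestamps twice, once as plain strings and once after parsing them with `datetime.strptime`, and compares the results. The four timestamps are taken from the sample data; the variable names are just for illustration.

```python
from datetime import datetime

stamps = [
    "1518-11-03 00:24",
    "1518-11-01 23:58",
    "1518-11-01 00:05",
    "1518-11-02 00:40",
]

# Sort as plain text, then sort by the parsed date and time.
by_text = sorted(stamps)
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M"))

print(by_text == by_time)  # True
```

The equivalence relies on every field being zero-padded to a fixed width, which is the case in the puzzle input.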

In [None]:
guard_on_duty = None

sleep_times = dict()

# Raw strings keep backslashes such as \d and \[ from being treated as string escapes
pattern_guard_change = re.compile(r".*Guard #(\d+) begins shift")
pattern_date = re.compile(r"\[(\d\d\d\d-\d\d-\d\d) (\d\d):(\d\d)\].*")
for line in data:
    # Check if the current line is a guard change
    match = pattern_guard_change.match(line)
    if match:
        guard_on_duty = match[1]
        continue
        
    # If not a guard change, then it has to be an awake or sleep line
    match = pattern_date.match(line)
    date = match[1]
    minutes = int(match[3])
    
    date_key = "{}|{}".format(date,guard_on_duty)
    sleep_array = sleep_times.get(date_key, np.zeros(60))
        
    # Falling asleep marks every remaining minute of the hour as asleep;
    # a later "wakes up" event resets the rest of the hour to awake
    if "asleep" in line:
        sleep_array[minutes:] = 1
    elif "wakes up" in line:
        sleep_array[minutes:] = 0
        
    sleep_times[date_key] = sleep_array

df_sleep = pd.DataFrame.from_dict(sleep_times,orient="index")

# Split date and elf into separate index columns
df_sleep["date"] = df_sleep.index.map(lambda x: x.split("|")[0])
df_sleep["elf"] = df_sleep.index.map(lambda x: x.split("|")[1])
df_sleep = df_sleep.set_index(["date","elf"])

df_sleep

First we want to find the elf that spends the most minutes asleep in total.

In [None]:
total_sleep = df_sleep.groupby('elf').sum().sum(axis=1).sort_values(ascending=False)
total_sleep

We can use the [idxmax()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.idxmax.html) function to find the index label (here, the elf ID) of the row with the highest value.

In [None]:
elf_with_max_sleep = total_sleep.idxmax()
elf_with_max_sleep
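For the sample data the totals are small enough to check by hand: Guard #10 sleeps 20 + 25 + 5 = 50 minutes and Guard #99 sleeps 10 + 10 + 10 = 30 minutes. The sketch below builds a Series from those hand-computed totals (the name `totals` is illustrative) to show that `idxmax()` returns the label while `max()` returns the value.

```python
import pandas as pd

# Totals worked out by hand from the sample data, keyed by elf ID.
totals = pd.Series({"10": 50.0, "99": 30.0})

print(totals.idxmax())  # '10' -> the index label of the largest entry
print(totals.max())     # 50.0 -> the largest entry itself
```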

To see how often each elf is asleep in a particular minute, we sum up minute-by-minute for each elf by using
[groupby()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html).

In [None]:
total_by_minute = df_sleep.groupby('elf').sum().reset_index()
total_by_minute
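To see what the group-and-sum step does, here is a small sketch on made-up data: three "nights" of only three minutes each, with the same `(date, elf)` index layout as `df_sleep`. The values and the names `toy` and `per_minute` are hypothetical.

```python
import pandas as pd

# Hypothetical 3-minute timelines: two nights for elf 10, one for elf 99.
toy = pd.DataFrame(
    [[0, 1, 1], [1, 1, 0], [0, 0, 1]],
    index=pd.MultiIndex.from_tuples(
        [("d1", "10"), ("d2", "10"), ("d1", "99")], names=["date", "elf"]
    ),
)

# Grouping by the "elf" index level adds up the nights column by column,
# giving the number of nights each elf was asleep in each minute.
per_minute = toy.groupby("elf").sum()
print(per_minute)
```

Elf 10 ends up with the row `[1, 2, 1]`: asleep in the middle minute on both nights, and in the outer minutes on one night each.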

The question now is in which minute the elf with the most sleep (as we found above) is most likely to be asleep. We can use the idxmax function again, this time across the columns, to create a column holding the minute in which each elf is most often asleep, and the [max()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.max.html) function to find the corresponding count.

The puzzle asks for the product of that minute and the ID of the elf, so we also create a `solution` column to hold this value.

In [None]:
total_by_minute["most_common_sleep_minute"] = total_by_minute[list(range(0,60))].idxmax(axis=1)
total_by_minute["most_common_sleep_minute_value"] = total_by_minute[list(range(0,60))].max(axis=1)

# The "solution" is the product of the elf ID with the most_common_sleep_minute
total_by_minute["solution"] = total_by_minute["elf"].astype(int) * total_by_minute["most_common_sleep_minute"]

# Finally, recreate the total number of minutes asleep from above. We want to sort the table by this.
total_by_minute["total_min_asleep"] =  total_by_minute[list(range(0,60))].sum(axis=1)

total_by_minute.sort_values(by="total_min_asleep", ascending=False)
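The row-wise use of `idxmax` and `max` can be seen in isolation on a tiny table. This is only a sketch with made-up counts for three minutes; `toy`, `minute_cols`, `peak_minute` and `peak_count` are hypothetical names.

```python
import pandas as pd

# Hypothetical per-minute counts for two elves over three minutes.
toy = pd.DataFrame({0: [1, 0], 1: [2, 0], 2: [1, 3]}, index=["10", "99"])
minute_cols = [0, 1, 2]

# axis=1 works across the columns of each row: idxmax gives the column
# label (the minute) of the largest count, max gives the count itself.
toy["peak_minute"] = toy[minute_cols].idxmax(axis=1)
toy["peak_count"] = toy[minute_cols].max(axis=1)
print(toy)
```

When several minutes tie for the highest count, `idxmax` reports the first of them, so a tie would need separate handling if the puzzle input contained one.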

The answer can be found in the top row above. We can select this with [head()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html).

In [None]:
total_by_minute[["elf","total_min_asleep","most_common_sleep_minute","most_common_sleep_minute_value","solution"]]\
  .sort_values(by="total_min_asleep", ascending=False).head(1)

# Part 2

> Of all guards, which guard is most frequently asleep on the same minute?

We have already computed everything we need above, so we just need to sort by `most_common_sleep_minute_value` and read off the `solution` column.

In [None]:
total_by_minute[["elf","total_min_asleep","most_common_sleep_minute","most_common_sleep_minute_value", "solution"]]\
  .sort_values(by="most_common_sleep_minute_value", ascending=False).head(1)
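As an independent cross-check of both parts, here is a minimal pure-Python sketch that counts sleep minutes with `collections.Counter` instead of pandas. It is a different technique from the DataFrame approach above, not a replacement for it, and it assumes the fixed-width line format of the sample data. The helper `minute_counts` and the variable names are illustrative.

```python
import re
from collections import Counter, defaultdict

sample = """
[1518-11-01 00:00] Guard #10 begins shift
[1518-11-01 00:05] falls asleep
[1518-11-01 00:25] wakes up
[1518-11-01 00:30] falls asleep
[1518-11-01 00:55] wakes up
[1518-11-01 23:58] Guard #99 begins shift
[1518-11-02 00:40] falls asleep
[1518-11-02 00:50] wakes up
[1518-11-03 00:05] Guard #10 begins shift
[1518-11-03 00:24] falls asleep
[1518-11-03 00:29] wakes up
[1518-11-04 00:02] Guard #99 begins shift
[1518-11-04 00:36] falls asleep
[1518-11-04 00:46] wakes up
[1518-11-05 00:03] Guard #99 begins shift
[1518-11-05 00:45] falls asleep
[1518-11-05 00:55] wakes up
""".strip().splitlines()


def minute_counts(lines):
    """Return {guard_id: Counter(minute -> number of nights asleep)}."""
    counts = defaultdict(Counter)
    guard = None
    fell_asleep = None
    for line in sorted(lines):
        minute = int(line[15:17])  # the MM part of "[YYYY-MM-DD HH:MM]"
        shift = re.search(r"#(\d+)", line)
        if shift:
            guard = int(shift.group(1))
        elif "falls asleep" in line:
            fell_asleep = minute
        elif "wakes up" in line:
            # Asleep from fell_asleep up to, but not including, this minute.
            counts[guard].update(range(fell_asleep, minute))
    return counts


counts = minute_counts(sample)

# Part 1: the guard with the most total minutes asleep, times their peak minute.
sleepiest = max(counts, key=lambda g: sum(counts[g].values()))
part1 = sleepiest * counts[sleepiest].most_common(1)[0][0]

# Part 2: the guard with the highest count for any single minute, times that minute.
most_regular = max(counts, key=lambda g: counts[g].most_common(1)[0][1])
part2 = most_regular * counts[most_regular].most_common(1)[0][0]

print(part1, part2)  # 240 4455
```

On the sample data this gives Guard #10 at minute 24 for part 1 and Guard #99 at minute 45 for part 2, which should match the `solution` column in the tables above when the notebook is run on the sample input.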