--- Day 3: Rucksack Reorganization ---

Go to Advent of Code (https://adventofcode.com/) for the full description of the challenge solved here.

Summary of challenge: The Day 3 input contains a series of letters which represent items in a rucksack. Each line of text corresponds to a rucksack carried by one elf, and each rucksack is divided into 2 compartments containing an equal number of items (letters). In part 1, we need to calculate the total priority score of the letters that appear in both compartments of a rucksack. In part 2, we need to separate the elves into groups of 3 (1 group = 3 rucksacks) and calculate the total priority score based on the letter that occurs in all 3 rucksacks of each group.

In [1]:
import pandas as pd
import itertools # product() yields every pair of letters from two strings
# will be necessary for scoring letters:
from string import ascii_lowercase
from string import ascii_uppercase

Part 1

What is the total priority score of letters that are the same in both rucksack compartments?

In [26]:
# load Day3_input as a dataframe

excel = pd.read_excel("Day3_input.xlsx", header=None)
excel.head()

Unnamed: 0,0
0,WwcsbsWwspmFTGVV
1,RHtMDHdSMnDBGMSDvnvDjtmpTpjTFggpmjmTFggTjmpP
2,vtCSGRMBDzHddvBHBzRhrlcZhlLzWNlqblhzcr
3,shhszHNHHZWqSzVNdClMjlFjBBbNTB
4,tQQGmnrMnJnGfmvrRRPCjlbljFBdjFCjTjnP


In [27]:
# creating a dataframe that contains the first and second half of the rucksack

first=[] # list will contain the first half of the string (rucksack)
second=[] # list will contain the second half of the string (rucksack)

for x in excel[0]:
  # determine the midpoint of the string with integer division
  half=len(x)//2
  # add the first and second half of the string to lists, respectively
  first.append(x[:half])
  second.append(x[half:])

# add the elements of the lists as columns in a new dataframe
df = pd.DataFrame()
df["first"]=first
df["second"]=second
df.head()

Unnamed: 0,first,second
0,WwcsbsWw,spmFTGVV
1,RHtMDHdSMnDBGMSDvnvDjt,mpTpjTFggpmjmTFggTjmpP
2,vtCSGRMBDzHddvBHBzR,hrlcZhlLzWNlqblhzcr
3,shhszHNHHZWqSzV,NdClMjlFjBBbNTB
4,tQQGmnrMnJnGfmvrRR,PCjlbljFBdjFCjTjnP
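
The same split can be sketched on a single string, without a dataframe. This is only an illustration of the slicing step above; `split_rucksack` is a hypothetical helper name that does not appear in the notebook, and the sample string is the first row of the input shown earlier.

```python
def split_rucksack(items):
    """Return the two compartments of a rucksack as a (first, second) tuple."""
    # Integer division gives the midpoint; the puzzle guarantees an even length.
    half = len(items) // 2
    return items[:half], items[half:]


first_half, second_half = split_rucksack("WwcsbsWwspmFTGVV")
print(first_half, second_half)
```

Returning a tuple keeps both halves together, so they can be unpacked directly where they are needed.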


In [28]:
# There are 300 rucksacks

df.shape

(300, 2)

In [29]:
# storing the letters that repeat in both rucksack compartments

store=[] # used to store repeating letters
for z in range(len(df)):
  for i, j in itertools.product(df["first"][z], df["second"][z]): # itertools.product pairs every letter of the first compartment with every letter of the second
    if i==j:
      store.append(i) # append to the new list when the letter repeats between the first and second compartment
      break # break because the letter repetition should only be recorded once

In [30]:
# store holds one letter per rucksack, so its length should equal the number of rucksacks (300)

len(store)

300
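
As an alternative to comparing every pair of letters, the shared item can be found with a set intersection. This is a minimal sketch, assuming (as the puzzle states) that exactly one item type appears in both compartments; `common_item` is a hypothetical name, not something defined in the notebook.

```python
def common_item(first, second):
    """Return the single letter that appears in both compartments."""
    shared = set(first) & set(second)
    # Assumption taken from the puzzle statement: exactly one shared item type.
    if len(shared) != 1:
        raise ValueError(f"expected one shared item, found {sorted(shared)}")
    return shared.pop()


print(common_item("WwcsbsWw", "spmFTGVV"))
```

Sets ignore how often a letter occurs, so no `break` is needed to avoid recording the same letter twice.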

In [31]:
# Creating a dataframe that contains priority rules

# creating a list of all the letters of the alphabet
list_score=[]
for i in ascii_lowercase:
  list_score.append(i)
for i in ascii_uppercase:
  list_score.append(i)

# creating a new priority dataframe
priority=pd.DataFrame()
# adding the letters of the alphabet to the priority dataframe
priority["alphabet"]=list_score

In [32]:
# The priority dataframe contains 52 rows
# lowercase letters in the alphabet column need to have priorities 1 through 26
# uppercase letters in the alphabet column need to have priorities 27 through 52

priority.shape

(52, 1)

In [33]:
# adding priorities to the priority dataframe

# Creating a list that contains numbers between 1 and 52 (inclusive)
priorities=[]
for i in range(1, 53):
  priorities.append(i)

# Creating a priority column in the priority dataframe
priority["priority"]=priorities

# There are now 2 columns in the priority dataframe
priority.shape

(52, 2)
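
The priority rules can also be held in a plain dictionary, which makes each lookup a single step. The sketch below is an illustration only; `priority_of` is a hypothetical name, and the mapping follows the rules in the comments above (lowercase 1 through 26, uppercase 27 through 52).

```python
from string import ascii_lowercase, ascii_uppercase

# Map each letter to its priority: a-z -> 1..26, A-Z -> 27..52.
priority_of = {
    letter: rank
    for rank, letter in enumerate(ascii_lowercase + ascii_uppercase, start=1)
}

print(priority_of["a"], priority_of["z"], priority_of["A"], priority_of["Z"])
```

With this mapping, a total score could be computed as `sum(priority_of[letter] for letter in letters)` for any list of letters.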

In [35]:
# Calculating the total priority score from the input

total=[]
# loop through all the repeated letters
for z in range(0, 300):
  for i in store[z]:
    # loop through all the rows of the priority dataframe
    for u in range(0, 52):
      if i==priority["alphabet"][u]: # when the repeated letter matches the priority letter
        total.append(priority["priority"][u]) # then append the appropriate priority score to the total score

# Viewing the total score
sum(total)

7850

Part 2

What is the total priority score based on the letter that occurs in all 3 rucksacks of each group of elves?

In [37]:
# Creating a loop that splits elves into groups of 3

groups=[]
for i in range(0,300,3):
  groups.append(excel[0][i:i+3]) # reminder: excel dataframe contains the original input

groups[1]

3                  shhszHNHHZWqSzVNdClMjlFjBBbNTB
4            tQQGmnrMnJnGfmvrRRPCjlbljFBdjFCjTjnP
5    mRwtfGrMmJtwRDvQJQrJpMLSzVDHzhzHZqZzqSzcWVWH
Name: 0, dtype: object

In [38]:
# Creating a dataframe that contains each group in a separate column

group_df=pd.DataFrame(groups)
group_df=group_df.transpose()
group_df.columns = list(range(100)) # naming columns as numbers 0 through 99
group_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,90,91,92,93,94,95,96,97,98,99
0,WwcsbsWwspmFTGVV,,,,,,,,,,...,,,,,,,,,,
1,RHtMDHdSMnDBGMSDvnvDjtmpTpjTFggpmjmTFggTjmpP,,,,,,,,,,...,,,,,,,,,,
2,vtCSGRMBDzHddvBHBzRhrlcZhlLzWNlqblhzcr,,,,,,,,,,...,,,,,,,,,,
3,,shhszHNHHZWqSzVNdClMjlFjBBbNTB,,,,,,,,,...,,,,,,,,,,
4,,tQQGmnrMnJnGfmvrRRPCjlbljFBdjFCjTjnP,,,,,,,,,...,,,,,,,,,,


In [39]:
# Note:
# There are 297 nulls in each column because each member of each group retains its original index
# This means that 3 rows hold the group's rucksacks and the other 297 rows (the indexes of other groups) are null
# However, all the data required is in the dataframe

group_df.isnull().sum().head()

0    297
1    297
2    297
3    297
4    297
dtype: int64
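
The nulls can be avoided altogether by keeping the groups as plain lists rather than dataframe columns. This is a minimal sketch, not the notebook's method; `group_in_threes` is a hypothetical helper, and the six sample lines are the first six rows of the input shown earlier.

```python
def group_in_threes(rucksacks):
    """Split a list of rucksacks into consecutive groups of three."""
    return [rucksacks[i:i + 3] for i in range(0, len(rucksacks), 3)]


sample = [
    "WwcsbsWwspmFTGVV",
    "RHtMDHdSMnDBGMSDvnvDjtmpTpjTFggpmjmTFggTjmpP",
    "vtCSGRMBDzHddvBHBzRhrlcZhlLzWNlqblhzcr",
    "shhszHNHHZWqSzVNdClMjlFjBBbNTB",
    "tQQGmnrMnJnGfmvrRRPCjlbljFBdjFCjTjnP",
    "mRwtfGrMmJtwRDvQJQrJpMLSzVDHzhzHZqZzqSzcWVWH",
]
sample_groups = group_in_threes(sample)
print(len(sample_groups), sample_groups[1][0])
```

Each inner list holds exactly the three rucksacks of one group, so no null handling is needed afterwards.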

In [40]:
# Creating a loop that determines the badge of each group of elves
# The badge is the letter that is contained within the rucksack of every elf in the group

badge=[]
# column `counter` of group_df holds one group; its three rucksacks sit in rows u, u+1 and u+2
for counter, u in enumerate(range(0, 300, 3)):
  for i, j in itertools.product(group_df[counter][u], group_df[counter][u+1]):
    # when the item type is shared by the 1st and 2nd elf and the 3rd elf also carries it
    if i==j and i in group_df[counter][u+2]:
      badge.append(i) # then append the item to the badge list
      break # the badge is found, so move on to the next group

# Viewing the length of the badge list
len(badge) # length should be equal to the number of groups (100)

100
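
The badge search can also be written as an intersection of three sets. The sketch below is an illustration under the puzzle's assumption that each group shares exactly one item type; `badge_of` is a hypothetical name, and the sample group is the first three rows of the input shown earlier.

```python
def badge_of(group):
    """Return the single letter carried by every elf in the group."""
    # Intersect the letter sets of all rucksacks in the group.
    shared = set.intersection(*(set(rucksack) for rucksack in group))
    # Assumption taken from the puzzle statement: exactly one common item type.
    if len(shared) != 1:
        raise ValueError(f"expected one badge, found {sorted(shared)}")
    return shared.pop()


first_group = [
    "WwcsbsWwspmFTGVV",
    "RHtMDHdSMnDBGMSDvnvDjtmpTpjTFggpmjmTFggTjmpP",
    "vtCSGRMBDzHddvBHBzRhrlcZhlLzWNlqblhzcr",
]
print(badge_of(first_group))
```

Because the check raises an error when a group has no unique badge, a malformed group is reported instead of being silently skipped.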

In [41]:
# Calculating the total badge score for part 2

total_p2=[]

# loop through all the letters in the badge list
for z in range(0, 100):
  for i in badge[z]:
    # loop through all the outcomes in the priority dataframe
    for u in range(0, 52):
      if i==priority["alphabet"][u]: # when the letter in the badges list matches the letter in the priority dataframe
        total_p2.append(priority["priority"][u]) # then append the appropriate score from the priority dataframe

# Viewing the total sum for part 2
sum(total_p2)

2581