--- Day 6: Custom Customs ---
As your flight approaches the regional airport where you'll switch to a much larger plane, customs declaration forms are distributed to the passengers.

The form asks a series of 26 yes-or-no questions marked a through z. All you need to do is identify the questions for which anyone in your group answers "yes". Since your group is just you, this doesn't take very long.

However, the person sitting next to you seems to be experiencing a language barrier and asks if you can help. For each of the people in their group, you write down the questions for which they answer "yes", one per line. For example:

abcx
abcy
abcz
In this group, there are 6 questions to which anyone answered "yes": a, b, c, x, y, and z. (Duplicate answers to the same question don't count extra; each question counts at most once.)

Another group asks for your help, then another, and eventually you've collected answers from every group on the plane (your puzzle input). Each group's answers are separated by a blank line, and within each group, each person's answers are on a single line. For example:

abc

a
b
c

ab
ac

a
a
a
a

b
This list represents answers from five groups:

The first group contains one person who answered "yes" to 3 questions: a, b, and c.
The second group contains three people; combined, they answered "yes" to 3 questions: a, b, and c.
The third group contains two people; combined, they answered "yes" to 3 questions: a, b, and c.
The fourth group contains four people; combined, they answered "yes" to only 1 question, a.
The last group contains one person who answered "yes" to only 1 question, b.
In this example, the sum of these counts is 3 + 3 + 3 + 1 + 1 = 11.

For each group, count the number of questions to which anyone answered "yes". What is the sum of those counts?
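
The worked example above can be checked in plain Python before touching the real input. This is only a minimal sketch: `example` and `count_anyone` are names made up for this illustration, and building one set of letters per group is just one straightforward reading of the rule.

```python
# Worked example from the puzzle text: five groups separated by blank lines.
example = "abc\n\na\nb\nc\n\nab\nac\n\na\na\na\na\n\nb"

def count_anyone(group):
    """Number of distinct questions answered "yes" by at least one person."""
    # Dropping the line breaks leaves only answer letters; set() removes repeats.
    return len(set(group.replace("\n", "")))

counts = [count_anyone(group) for group in example.split("\n\n")]
print(counts, sum(counts))  # [3, 3, 3, 1, 1] 11
```

The per-group counts and their sum match the 3 + 3 + 3 + 1 + 1 = 11 given in the puzzle text.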

Your puzzle answer was 6947.

The first half of this puzzle is complete! It provides one gold star: *

In [1]:
import pandas as pd
import numpy as np

In [2]:
file_path = 'C:/Users/Loren/Desktop/advent_of_code_day6.txt'

In [3]:
#ingest data: the lines between blank lines belong to a single group,
#so can't go straight to df; use a detour to manage the split by blank line

with open(file_path) as f:
    lines = f.read().strip() #strip so a trailing new line can't end up inside the last group

groups = lines.split("\n\n") #two new line characters in a row = one blank line, so this splits on blank rows instead of on every new line
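
To illustrate why the split is on two consecutive new line characters, and what a new line at the very end of the text does to the last group, here is a small sketch. The string `sample` is invented for this illustration and stands in for the file contents; whether the real input ends with a new line is not known here.

```python
# Hypothetical stand-in for the file contents: two groups plus a trailing new line.
sample = "ab\nac\n\nb\n"

# A single "\n" separates people; "\n\n" is the blank line between groups.
naive_groups = sample.split("\n\n")
clean_groups = sample.strip().split("\n\n")

print(naive_groups)  # ['ab\nac', 'b\n']
print(clean_groups)  # ['ab\nac', 'b']
```

The leftover new line in the last element of `naive_groups` is harmless when counting unique letters, but it matters for any logic that counts lines to work out how many people are in a group.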

In [4]:
df = pd.DataFrame(groups)
df.rename(columns={0:'raw_groups'}, inplace=True)

In [5]:
def clean_up_and_count(raw_groups):
    '''Count the distinct questions answered yes within a group.
        New lines are removed from the group's string of yes responses,
        then set() keeps one copy of each letter.
        Count = number of unique letters in the group'''
    no_new_lines = raw_groups.replace('\n','') #replace deals with new lines in middle of string, strip would deal with only start and end occurrences
    unique_letters = set(no_new_lines)
    return len(unique_letters)

df['count_unique_yeses'] = df['raw_groups'].apply(clean_up_and_count)

In [6]:
df.head()

   raw_groups                                         count_unique_yeses
0  xav\nuavx\nxavsi\nyavx                             7
1  efokjptizdcwmqnuh\nqgfdvurtnjwpichxk\ntaqkcunf...  24
2  mzbg\ntmg\nrlvge\nhgpbzn\ncagkijyu                 19
3  ahynbmqljzpwxokcfrtsgeud\nxwzcmdhkrjnupegqlyoa...  24
4  jrxcnyadsgbtpvoze\nsecpytarvdzjgb\nycsfzgtedar...  18


In [7]:
#answer - sum uniques per group
df['count_unique_yeses'].sum()

6947

--- Part Two ---

As you finish the last group's customs declaration, you notice that you misread one word in the instructions:

You don't need to identify the questions to which anyone answered "yes"; you need to identify the questions to which everyone answered "yes"!

Using the same example as above:

abc

a
b
c

ab
ac

a
a
a
a

b
This list represents answers from five groups:

In the first group, everyone (all 1 person) answered "yes" to 3 questions: a, b, and c.
In the second group, there is no question to which everyone answered "yes".
In the third group, everyone answered yes to only 1 question, a. Since some people did not answer "yes" to b or c, they don't count.
In the fourth group, everyone answered yes to only 1 question, a.
In the fifth group, everyone (all 1 person) answered "yes" to 1 question, b.
In this example, the sum of these counts is 3 + 0 + 1 + 1 + 1 = 6.

For each group, count the number of questions to which everyone answered "yes". What is the sum of those counts?
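
The "everyone" rule can also be checked against the worked example with set intersection. This is a hedged sketch rather than the approach used in the notebook: `example` and `count_everyone` are names invented for the illustration, and it assumes every group contains at least one person.

```python
# Worked example from the puzzle text: five groups separated by blank lines.
example = "abc\n\na\nb\nc\n\nab\nac\n\na\na\na\na\n\nb"

def count_everyone(group):
    """Number of questions answered "yes" by every person in the group."""
    # One set of letters per person (one person per line).
    answers = [set(person) for person in group.split()]
    # Keep only the letters common to all of those sets.
    return len(set.intersection(*answers))

counts = [count_everyone(group) for group in example.split("\n\n")]
print(counts, sum(counts))  # [3, 0, 1, 1, 1] 6
```

The result agrees with the 3 + 0 + 1 + 1 + 1 = 6 stated in the puzzle text.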

Your puzzle answer was 3398.

Both parts of this puzzle are complete! They provide two gold stars: **

In [11]:
def yes_for_all_count(raw_groups):
    '''Count the questions to which every member of a group answered yes.
        Each non-empty line is one group member; a letter scores when it
        appears as many times as there are group members.'''

    #number of non-empty lines = number of group members (a trailing new line is ignored)
    group_members_count = len(raw_groups.split())

    #dict for letter counts in each group
    group_letter_counts = {}

    for char in raw_groups:
        if char.isalpha(): #exclude new lines from char count
            #start at 0 for letters not yet counted, then add one
            group_letter_counts[char] = group_letter_counts.get(char, 0) + 1

    #which letters have count == number of group members?
    yes_for_all = [letter for letter, count in group_letter_counts.items()
                   if count == group_members_count]

    return len(yes_for_all)

df['yes_for_all_count'] = df['raw_groups'].apply(yes_for_all_count)
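
The same counting idea (a letter scores when its tally equals the number of people in the group) can be written more compactly with `collections.Counter` from the standard library. This is a sketch under the same assumption the puzzle input appears to satisfy, namely that no person lists the same letter twice; `yes_for_all_with_counter` is a hypothetical name used only here.

```python
from collections import Counter

def yes_for_all_with_counter(raw_group):
    """Count letters whose tally equals the number of people in the group."""
    members = len(raw_group.split())  # one non-empty line per person
    letter_counts = Counter(ch for ch in raw_group if ch.isalpha())
    # A letter scores when it was tallied once per group member.
    return sum(1 for n in letter_counts.values() if n == members)

print(yes_for_all_with_counter("ab\nac"))   # 1
print(yes_for_all_with_counter("a\nb\nc"))  # 0
```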

In [12]:
df.head()

   raw_groups                                         count_unique_yeses  yes_for_all_count
0  xav\nuavx\nxavsi\nyavx                             7                   3
1  efokjptizdcwmqnuh\nqgfdvurtnjwpichxk\ntaqkcunf...  24                  13
2  mzbg\ntmg\nrlvge\nhgpbzn\ncagkijyu                 19                  1
3  ahynbmqljzpwxokcfrtsgeud\nxwzcmdhkrjnupegqlyoa...  24                  22
4  jrxcnyadsgbtpvoze\nsecpytarvdzjgb\nycsfzgtedar...  18                  14


In [13]:
#answer - sum of the per-group counts of questions everyone answered yes to
df['yes_for_all_count'].sum()

3398