# Signals, I guess

We were given a broken communicator, and now we need to write a subroutine that finds the first run of four consecutive characters that are all different. That run, wherever it appears, is called the start-of-packet marker. In the example on the website, `mjqjpqmgbljsphdztnvjfqwrcgsmlb`, the start-of-packet marker is `jpqm` (indices 3 to 6), so the signal, which begins directly after the marker, starts at index 7.

We need to figure out at which index the signal starts.
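To make the index arithmetic concrete, here is a quick sketch (not part of the original notebook) that walks the example's four-character windows and stops at the first one with no repeats. `example` and `window` are just illustrative local names.

```python
# Walk the 4-character windows of the example from the puzzle text
# and stop at the first one whose characters are all different.
example = 'mjqjpqmgbljsphdztnvjfqwrcgsmlb'

for i in range(len(example) - 3):
    window = example[i:i + 4]
    # a set drops repeated characters, so 4 elements means 4 distinct ones
    distinct = len(set(window)) == 4
    print(i, window, distinct)
    if distinct:
        # the signal begins right after the last character of the marker
        print('signal starts at index', i + 4)
        break
```

The fourth line it prints is `3 jpqm True`, so the marker occupies indices 3 to 6 and the signal starts at index 7.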

## Input formatting
A wall of text, all random lowercase letters. Looks like we don't have to split anything today!

In [10]:
# the with-block closes the file for us once we've read it
with open('input.txt') as file:
    datastream = file.read()
datastream

'zdnnfgfsgffgllwrwprwrgwwpssznzrnznllstszsttpdptdpdmdsdzsdscsmcmttdllbsbwwtwnwswcchshlhjhfhwfftcchnnfwwbqwqwrqqgmgzmmwzwfzwfzzzsmzzrczcmmhphzhbbbgdbgddmggwwbbttvmtvttfsfttjlttdfdsdqqczqzffbrfbbfbrrmdrrlslshllwzwrzrzzlqldqdjdjwjvjzjrjjcsszjjqfqnfqqsrqrccbhhwphwhbwwlzwwjwfjwfwzfwzffssvjsjddcsdslslrsrfsrffsggdffrcrcdcpprrzbrzbbtstvvqttbqqgfgsggtvtrvtvbbrqrsqrsqsvsbbzmbmgmvgmmrqrzzbbnjjlwjjfssdrdbrbffwrrrjjgcgtgvvjbjjjsqjsqqncntnndcdrcrhhsgstslldwdbwdbdtbdbggpnndhdvhhvrrlzzfjjffzszvvzgzhhqzqttdhdrrwdwzdwdbwwfsfwsfwfqqzwzbzmzwmmvgggvssvwswfwswhwzzqtqrtthhbbjggjppnpfnfmnnghgrhhtvtqtsttbpprzzwqqfhqfflttzffrprwpwspplzpztptgtltjlttwwsrsrprwrsswnwttcscqsqlsqqhshbhlblnnpznnzlnndrnrcncvcqcjqqvhvppjzjddzbzsztzqqlmlnlnblnlwnllswszwzrzddqhdqhqffhfhjjpbbhvbvmmfhmhcccjlclhcllbrlblddnpplggcmmvddmmqzzmqmppnjpjjjzllqjqccrwwhzhnnlmlhhbbtztvtltvtnvtvltlrtrllbttbzzfwfrrjzjbbmgghjhqhlljnjhhhjzjcjvcvwwzczwcwgcclflnnsvvcncbncnqqpjqjnjwjrrgqrrmqmfmmmjgjfgjfftggqdgqddgcdcdwdrdsswqqphpjhhdjjwswfwfnfqnqccvhvzvmzvzz

In [11]:
# there's one lonely newline character at the end that we're going to remove...
# rstrip only removes it if it's actually there, so no real character gets chopped off
datastream = datastream.rstrip('\n')
datastream

'zdnnfgfsgffgllwrwprwrgwwpssznzrnznllstszsttpdptdpdmdsdzsdscsmcmttdllbsbwwtwnwswcchshlhjhfhwfftcchnnfwwbqwqwrqqgmgzmmwzwfzwfzzzsmzzrczcmmhphzhbbbgdbgddmggwwbbttvmtvttfsfttjlttdfdsdqqczqzffbrfbbfbrrmdrrlslshllwzwrzrzzlqldqdjdjwjvjzjrjjcsszjjqfqnfqqsrqrccbhhwphwhbwwlzwwjwfjwfwzfwzffssvjsjddcsdslslrsrfsrffsggdffrcrcdcpprrzbrzbbtstvvqttbqqgfgsggtvtrvtvbbrqrsqrsqsvsbbzmbmgmvgmmrqrzzbbnjjlwjjfssdrdbrbffwrrrjjgcgtgvvjbjjjsqjsqqncntnndcdrcrhhsgstslldwdbwdbdtbdbggpnndhdvhhvrrlzzfjjffzszvvzgzhhqzqttdhdrrwdwzdwdbwwfsfwsfwfqqzwzbzmzwmmvgggvssvwswfwswhwzzqtqrtthhbbjggjppnpfnfmnnghgrhhtvtqtsttbpprzzwqqfhqfflttzffrprwpwspplzpztptgtltjlttwwsrsrprwrsswnwttcscqsqlsqqhshbhlblnnpznnzlnndrnrcncvcqcjqqvhvppjzjddzbzsztzqqlmlnlnblnlwnllswszwzrzddqhdqhqffhfhjjpbbhvbvmmfhmhcccjlclhcllbrlblddnpplggcmmvddmmqzzmqmppnjpjjjzllqjqccrwwhzhnnlmlhhbbtztvtltvtnvtvltlrtrllbttbzzfwfrrjzjbbmgghjhqhlljnjhhhjzjcjvcvwwzczwcwgcclflnnsvvcncbncnqqpjqjnjwjrrgqrrmqmfmmmjgjfgjfftggqdgqddgcdcdwdrdsswqqphpjhhdjjwswfwfnfqnqccvhvzvmzvzz

In [14]:
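def find_start_of_packet(stream):
    # slide a 4-character window along the stream; stopping 3 short of the
    # end means we never index past the last character
    for i in range(len(stream) - 3):
        # a set drops duplicates, so 4 elements means 4 distinct characters
        if len(set(stream[i:i + 4])) == 4:
            # the signal starts right after the marker
            return i + 4
    return 'you goofed'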

# testing with example in description
# expecting 7
find_start_of_packet('mjqjpqmgbljsphdztnvjfqwrcgsmlb')

7

In [15]:
# expecting 5
print(find_start_of_packet('bvwbjplbgvbhsrlpgdmjqwftvncz'))

# expecting 10
print(find_start_of_packet('nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg'))

5
10


In [16]:
find_start_of_packet(datastream)

1566

# Got it in one!!!
Now we need to find the start-of-message marker, which consists of 14 distinct characters.

In [17]:
# sanity check: a set drops duplicates, so its length counts the distinct items...
len(set([1, 2, 2, 6]))

3

In [21]:
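def find_start_of_message(stream):
    # same idea as before, but with a 14-character window; stopping 13 short
    # of the end keeps the slice inside the stream
    for i in range(len(stream) - 13):
        # 14 elements in the set means none of the 14 characters repeat
        if len(set(stream[i:i + 14])) == 14:
            # the message starts right after the marker
            return i + 14
    return 'you goofed'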

# expecting 23
print(find_start_of_message('bvwbjplbgvbhsrlpgdmjqwftvncz'))

# expecting 19
print(find_start_of_message('mjqjpqmgbljsphdztnvjfqwrcgsmlb'))

23
19


In [22]:
find_start_of_message(datastream)

2265

# Success!

Technically, we could combine these two functions into one that takes the number of distinct characters we are looking for as well as the datastream.
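A minimal sketch of that combined version might look like this. The name `find_marker` and its `size` parameter are hypothetical, and it returns `None` instead of `'you goofed'` so the return type is either an int or nothing.

```python
def find_marker(stream, size):
    """Return the number of characters processed when the first run of
    `size` distinct characters ends, or None if no such run exists."""
    # the last valid window starts at len(stream) - size
    for i in range(len(stream) - size + 1):
        # a set drops repeats, so `size` elements means no character repeats
        if len(set(stream[i:i + size])) == size:
            return i + size
    return None

# the examples from the puzzle description
print(find_marker('mjqjpqmgbljsphdztnvjfqwrcgsmlb', 4))   # 7
print(find_marker('bvwbjplbgvbhsrlpgdmjqwftvncz', 14))    # 23
```

With this, `find_marker(datastream, 4)` and `find_marker(datastream, 14)` replace the two separate functions.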