# Day 2: Gift Shop
```
You get inside and take the elevator to its only other stop: the gift shop. "Thank you for visiting the North Pole!" gleefully exclaims a nearby sign. You aren't sure who is even allowed to visit the North Pole, but you know you can access the lobby through here, and from there you can access the rest of the North Pole base.

As you make your way through the surprisingly extensive selection, one of the clerks recognizes you and asks for your help.

As it turns out, one of the younger Elves was playing on a gift shop computer and managed to add a whole bunch of invalid product IDs to their gift shop database! Surely, it would be no trouble for you to identify the invalid product IDs for them, right?

They've even checked most of the product ID ranges already; they only have a few product ID ranges (your puzzle input) that you'll need to check. For example:

11-22,95-115,998-1012,1188511880-1188511890,222220-222224,
1698522-1698528,446443-446449,38593856-38593862,565653-565659,
824824821-824824827,2121212118-2121212124
(The ID ranges are wrapped here for legibility; in your input, they appear on a single long line.)

The ranges are separated by commas (,); each range gives its first ID and last ID separated by a dash (-).

Since the young Elf was just doing silly patterns, you can find the invalid IDs by looking for any ID which is made only of some sequence of digits repeated twice. So, 55 (5 twice), 6464 (64 twice), and 123123 (123 twice) would all be invalid IDs.

None of the numbers have leading zeroes; 0101 isn't an ID at all. (101 is a valid ID that you would ignore.)

Your job is to find all of the invalid IDs that appear in the given ranges. In the above example:

11-22 has two invalid IDs, 11 and 22.
95-115 has one invalid ID, 99.
998-1012 has one invalid ID, 1010.
1188511880-1188511890 has one invalid ID, 1188511885.
222220-222224 has one invalid ID, 222222.
1698522-1698528 contains no invalid IDs.
446443-446449 has one invalid ID, 446446.
38593856-38593862 has one invalid ID, 38593859.
The rest of the ranges contain no invalid IDs.
Adding up all the invalid IDs in this example produces 1227775554.

What do you get if you add up all of the invalid IDs?
```

In [3]:
import numpy as np
import math
import sys
sys.path.append("..")
from common.utils import read_lines

test_path = "test.txt"
data_path = "data.txt"

In [21]:
"""
This could get really unwieldy really fast if I use too many checks
"""

class Day2A:
    def __init__(self, path):
        self.data = read_lines(path)
        self.ranges = self._read_ranges_from_data()

    def _read_ranges_from_data(self):
        """
        Split on the commas, then on the hyphens, and
        create and store a range object for each
        element in the list
        """

        self.string_ranges = self.data[0].split(",")
        ranges = []
        for sr in self.string_ranges:
            start, stop = sr.split("-")
            # NOTE: Python ranges work from start to stop-1 in step=1
            ranges.append(range(int(start), int(stop) + 1, 1))

        return ranges
    
    def _get_invalid_ids(self):
        """
        Steps to find the invalid IDs within each range:

            for element in range[i]:
                get the length of the element (*):
                    if the length is odd, it cannot be invalid
                    -> it cannot be formed exclusively by repeating a sequence of digits twice.

                    if even, then turn it into a string and check:
                        str[:half] == str[half:]
                        if so, it is invalid (add it to the running sum)


        (*) I can rule out a whole bunch of elements at once by checking the length
        of the start and end.
        """
        result = 0
        for numeric_range in self.ranges:
            # range.stop is exclusive, so the last ID in the range is stop - 1
            first, last = numeric_range.start, numeric_range.stop - 1

            # Number of digits of the first and last ID; len(str(x)) avoids the
            # floating-point rounding that int(math.log10(x)) + 1 can hit for large x
            ndigits_first, ndigits_last = len(str(first)), len(str(last))

            # If all numbers in the range have an odd number of digits, none are invalid:
            if ndigits_first == ndigits_last and ndigits_first % 2 == 1:
                continue

            # TODO: identify valid sub-ranges of numbers to remove unnecessary checks

            for element in numeric_range:
                str_element = str(element)
                length = len(str_element)
                # An odd length can never match: the two slices differ in size
                if str_element[:length//2] == str_element[length//2:]:
                    result += element
        
        return result


day2 = Day2A(test_path)
day2._get_invalid_ids()


1227775554

In [22]:
day2 = Day2A(data_path)
day2._get_invalid_ids()

22062284697
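
The TODO above asks for a way to skip unnecessary checks. One possible approach, sketched here and not part of the original solution, is to build the doubled IDs directly: a block `h` with `k` digits written twice equals `h * (10**k + 1)`, so it is enough to enumerate the blocks whose doubled form lands inside the range. The function name `doubled_ids_in_range` and the inlined example string are assumptions made for this illustration.

```python
def doubled_ids_in_range(first, last):
    """Yield every ID in [first, last] that is a digit block written twice."""
    max_half_len = len(str(last)) // 2
    for half_len in range(1, max_half_len + 1):
        multiplier = 10 ** half_len + 1      # e.g. 64 * 101 == 6464
        smallest_half = 10 ** (half_len - 1)  # no leading zero allowed
        largest_half = 10 ** half_len - 1
        # Clamp the block so that block * multiplier stays inside [first, last].
        lo = max(smallest_half, -(-first // multiplier))  # ceiling division
        hi = min(largest_half, last // multiplier)
        for half in range(lo, hi + 1):
            yield half * multiplier


# Example ranges copied from the puzzle statement.
example = ("11-22,95-115,998-1012,1188511880-1188511890,222220-222224,"
           "1698522-1698528,446443-446449,38593856-38593862,565653-565659,"
           "824824821-824824827,2121212118-2121212124")

total = 0
for text_range in example.split(","):
    first, last = map(int, text_range.split("-"))
    total += sum(doubled_ids_in_range(first, last))

print(total)  # 1227775554, the example answer from the puzzle
```

The work done per range then depends on how many doubled IDs it contains, not on how wide the range is.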

# Part Two
```
The clerk quickly discovers that there are still invalid IDs in the ranges in your list. Maybe the young Elf was doing other silly patterns as well?

Now, an ID is invalid if it is made only of some sequence of digits repeated at least twice. So, 12341234 (1234 two times), 123123123 (123 three times), 1212121212 (12 five times), and 1111111 (1 seven times) are all invalid IDs.

From the same example as before:

11-22 still has two invalid IDs, 11 and 22.
95-115 now has two invalid IDs, 99 and 111.
998-1012 now has two invalid IDs, 999 and 1010.
1188511880-1188511890 still has one invalid ID, 1188511885.
222220-222224 still has one invalid ID, 222222.
1698522-1698528 still contains no invalid IDs.
446443-446449 still has one invalid ID, 446446.
38593856-38593862 still has one invalid ID, 38593859.
565653-565659 now has one invalid ID, 565656.
824824821-824824827 now has one invalid ID, 824824824.
2121212118-2121212124 now has one invalid ID, 2121212121.
Adding up all the invalid IDs in this example produces 4174379265.

What do you get if you add up all of the invalid IDs using these new rules?
```
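
Before reaching for anything clever, the new rule can be read almost literally: try every block length that divides the ID's length and see whether repeating the first block rebuilds the whole ID. The sketch below is a baseline under that reading; `is_repeated_block` is a hypothetical helper name, not something from the puzzle or the solution that follows.

```python
def is_repeated_block(text):
    """True if text is some block of digits repeated at least twice."""
    n = len(text)
    # A block repeated at least twice can be at most half the total length.
    for block_len in range(1, n // 2 + 1):
        if n % block_len == 0 and text[:block_len] * (n // block_len) == text:
            return True
    return False


# Invalid IDs quoted in the puzzle statement.
for candidate in ["12341234", "123123123", "1212121212", "1111111"]:
    assert is_repeated_block(candidate)

# 101 is a valid ID, and a single digit cannot be a repetition.
assert not is_repeated_block("101")
assert not is_repeated_block("5")
```

This is quadratic in the number of digits in the worst case, which is negligible for IDs of around ten digits.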

In [106]:
class Day2B(Day2A):
    def _get_invalid_ids_B(self):
        """
        Now the number of repetitions is not fixed at 2, so how to check this new condition?

        - I found something about the Knuth-Morris-Pratt algorithm:
        https://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/StringMatch/kuthMP.htm

        Property of the failure function (hypothesis):
        if the string is made up of a repeating substring, the failure function
        is n 0's (where n is the length of the substring)
        and then monotonically ascending numbers from 1 to m-n (m is the length of the string)

        BUT the same is true when the last repetition is incomplete (e.g. 12312),
        and such an ID is not invalid:

        we need to check that n * k = m where k is an integer
        """
        result = 0
        for numeric_range in self.ranges:

            for element in numeric_range:
                str_element = str(element)

                if is_repetition(str_element):
                    result += element
        return result

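def FailureFunction(pattern):
    """KMP failure function: pi[i] is the length of the longest proper prefix
    of pattern[:i + 1] that is also a suffix of it."""
    m = len(pattern)
    pi = np.zeros(m, dtype=int)
    j = 0

    for i in range(1, m):
        # Fall back to shorter borders until the next character matches
        while j > 0 and pattern[i] != pattern[j]:
            j = pi[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        pi[i] = j
    return pi


def is_repetition(pattern):
    pi = FailureFunction(pattern)
    n = len(pattern)
    border = pi[-1]  # longest proper prefix that is also a suffix
    # n - border is the smallest period; it must divide n exactly
    return border > 0 and n % (n - border) == 0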


day2 = Day2B(test_path)
day2._get_invalid_ids_B()

4174379265
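
As an independent cross-check of the failure-function test, there is a well-known string trick: a non-empty string is a repetition of a shorter block exactly when it appears inside its own doubling with the first and last characters removed. The sketch below swaps in that technique instead of KMP; `is_repetition_by_doubling` and the inlined example string are assumptions made for this illustration.

```python
def is_repetition_by_doubling(text):
    """True if the non-empty text is a shorter block repeated at least twice."""
    # Dropping the first and last characters rules out the two trivial matches.
    return text in (text + text)[1:-1]


# Example ranges copied from the puzzle statement.
example = ("11-22,95-115,998-1012,1188511880-1188511890,222220-222224,"
           "1698522-1698528,446443-446449,38593856-38593862,565653-565659,"
           "824824821-824824827,2121212118-2121212124")

total = 0
for text_range in example.split(","):
    first, last = map(int, text_range.split("-"))
    for element in range(first, last + 1):
        if is_repetition_by_doubling(str(element)):
            total += element

print(total)  # 4174379265, the example answer from the puzzle
```

Both checks should agree on every ID, so a mismatch between them would point to a bug in one of the two.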

In [None]:
"""
Property of the failure function (hypothesis):
if the string is made up of a repeating substring, the failure function
is n 0's (where n is the length of the substring)
and then monotonically ascending numbers from 1 to m-n (m is the length of the string)

BUT the same is true when the last repetition is incomplete (e.g. 12312),
and such an ID is not invalid:

we need to check that n * k = m where k is an integer

also, the last element of FailureFunction must be length - subpattern length

"""
#######
"""
Turns out there is a way to check for specifically this property, and the
condition is:

given
 - n = len(pattern)
 - l = pi[n-1]

l > 0 and n mod (n - l) = 0
"""


In [None]:
# Testing
pattern = "1188511885"
is_repetition(pattern)


np.True_
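
To make the condition `l > 0 and n mod (n - l) = 0` concrete, the sketch below prints the failure function for three strings. It uses a plain-list helper, `prefix_function`, which is a hypothetical name introduced here so the example runs without NumPy; it is intended to mirror `FailureFunction` above.

```python
def prefix_function(text):
    """pi[i] = length of the longest proper prefix of text[:i+1] that is also its suffix."""
    pi = [0] * len(text)
    j = 0
    for i in range(1, len(text)):
        while j > 0 and text[i] != text[j]:
            j = pi[j - 1]
        if text[i] == text[j]:
            j += 1
        pi[i] = j
    return pi


for text in ["123123", "12312", "1188511885"]:
    pi = prefix_function(text)
    n, border = len(text), pi[-1]
    period = n - border  # smallest shift that maps the string onto itself
    print(text, pi, "period:", period, "repetition:", border > 0 and n % period == 0)

# 123123     [0, 0, 0, 1, 2, 3]              period 3, 6 % 3 == 0 -> repetition
# 12312      [0, 0, 0, 1, 2]                 period 3, 5 % 3 != 0 -> not a repetition
# 1188511885 [0, 1, 0, 0, 0, 1, 2, 3, 4, 5]  period 5, 10 % 5 == 0 -> repetition
```

The last case shows that the first block of the failure function is not always all zeros (here `pi[1]` is 1 because the block itself starts with `11`), so only the final value and the divisibility test are reliable.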

In [107]:
day2 = Day2B(data_path)
day2._get_invalid_ids_B()

46666175279