# About

I tried to rebuild a short superpermutation (for n=7) on my own and built the submission on top of it. Building the superpermutation myself may leave room to tweak the solution later on.

So far, the shortest known superpermutation for n=7 has length [5906](https://www.kaggle.com/ilialar/santa-2021-baseline-and-optimization-ideas).

## References:
- Some helpful info: https://www.kaggle.com/ilialar/santa-2021-baseline-and-optimization-ideas

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import itertools


# Data preparation

Let's define the number of symbols for the superpermutation (in this competition n=7):

In [None]:
NUM_SYMBOLS = 7

Build all permutations:

In [None]:
# from https://www.kaggle.com/ilialar/santa-2021-baseline-and-optimization-ideas
permutations = [''.join(x) for x in itertools.permutations([str(i+1) for i in range(NUM_SYMBOLS)], NUM_SYMBOLS)]
len(permutations)

I put the permutations in a DataFrame. (There is plenty of room to speed up the calculations later on, but that is not my focus today. Putting the data in a table makes it easy to examine.)

In [None]:
data = pd.DataFrame(permutations, columns=['perm']).sort_values(by='perm').reset_index(drop=True)

Next I build a rolling value (`roll_val`) for each permutation: the permutation followed by its first 6 symbols. This string of length 13 implicitly contains 7 permutations, namely all cyclic rotations of the original.
Here is an example:

```text
1234567
.2345671
..3456712
...4567123
....5671234
.....6712345
......7123456
=============
1234567123456
```

Each of these permutations gets assigned to the same group (`roll_grp`). In the example: 1234567.
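
To see why a single `roll_val` covers a whole group, the following sketch slides a window of length 7 over the string and compares the result with the cyclic rotations of the permutation. The helper names `rotations` and `windows` are illustrative and not part of the notebook.

```python
def rotations(perm):
    """Return all cyclic rotations of perm, starting with perm itself."""
    return [perm[i:] + perm[:i] for i in range(len(perm))]


def windows(s, n):
    """Return every length-n substring of s, from left to right."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


perm = '1234567'
roll_val = perm + perm[:-1]  # '1234567123456'

# The 7 windows of the rolling value are exactly the 7 rotations.
assert windows(roll_val, len(perm)) == rotations(perm)
```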

In [None]:
data['roll_grp'] = data.perm.apply(lambda x: x[x.find('1'):]+x[:x.find('1')])
data['roll_val'] = data.perm+data.perm.str[:-1]

data

Now I combine different `roll_val` strings with the largest possible overlap.
The possible overlap sizes are 5, 4, 3, 2 and 1.

Example:

```text
12345671 23456
         23456 17234561
=======================
12345671 23456 17234561
```
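
A minimal sketch of the merge step shown above, assuming the overlap is searched from the largest size downwards. `max_overlap` and `merge` are hypothetical helper names; the notebook does the same comparison inline in `gen_code`.

```python
def max_overlap(left, right, max_ov=5):
    """Largest k <= max_ov such that the last k chars of left equal the first k of right."""
    for ov in range(max_ov, 0, -1):
        if left[-ov:] == right[:ov]:
            return ov
    return 0


def merge(left, right, max_ov=5):
    """Append right to left, writing the shared characters only once."""
    ov = max_overlap(left, right, max_ov)
    return left + right[ov:]


a = '1234567123456'  # roll_val of 1234567
b = '2345617234561'  # roll_val of 2345617
print(max_overlap(a, b))  # 5
print(merge(a, b))        # 123456712345617234561
```

The merged string has 13 + 13 - 5 = 21 characters, matching the example.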


In [None]:
overlaps = [(i+1) for i in range(0, NUM_SYMBOLS-2)]
overlaps.reverse()
overlaps

# Approximating the Superpermutation length

If we just concatenate the `roll_val` strings (one per `roll_grp`), we get a length of 9360:

`len(roll_val) * count(distinct roll_grp) = (2*NUM_SYMBOLS-1)*(NUM_SYMBOLS-1)! = 13 * 720 = 9360`


Overlaps reduce that length by 3447:

```python
from math import factorial

redc = 0
for ov in overlaps:  # [5, 4, 3, 2, 1], as defined above
    # ov * ov! * grp_compression_rate, with grp_compression_rate = ov
    redc += ov * factorial(ov) * ov

# => 5*5!*5 + 4*4!*4 + 3*3!*3 + 2*2!*2 + 1*1!*1 = 3447
```

So I expect a superpermutation length of 9360 - 3447 = 5913.
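
The estimate can be reproduced in a few lines. The function name `estimate_length` and the generalisation to other values of n are assumptions for illustration; the notebook only works out n=7. As a cross-check, for n from 3 to 7 the result coincides with 1! + 2! + ... + n!.

```python
from math import factorial


def estimate_length(n):
    """Return (naive length, overlap savings, expected length) for n symbols."""
    # One roll_val of length 2n-1 per roll_grp, and (n-1)! groups.
    naive = (2 * n - 1) * factorial(n - 1)
    # Overlap sizes 1 .. n-2, each contributing ov * ov! * ov.
    saving = sum(ov * factorial(ov) * ov for ov in range(1, n - 1))
    return naive, saving, naive - saving


print(estimate_length(7))  # (9360, 3447, 5913)
```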

> Note: so far I could not figure out where I lose the remaining 7 characters that separate this from the best known solution of 5906. It's likely that the score cannot improve further with my approach, because the `roll_val` always ties 7 permutations together (see comment section below).

# Generating the Superpermutation

Calculate the short superpermutation:

In [None]:
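def gen_code(av_data = data.copy(), gen_code_grp = True):
    """Greedily chain `roll_val` strings, always taking the largest available overlap.

    When `gen_code_grp` is True, each processed group is also assigned to one of
    three buckets in the global `data.code_grp`.
    """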
    code = ''
    num_found=0

    while av_data.shape[0]>0:
        for idx, row in av_data.iterrows():
            found=False

            if code == '':
                code = row.roll_val
                found=True
                num_found +=1
                # only label groups when requested, so later calls do not overwrite code_grp
                if gen_code_grp:
                    data.loc[data.roll_grp==row.roll_grp, 'code_grp']=0
                av_data = av_data[av_data.roll_grp!=row.roll_grp]
                break
            else:
                for ov in overlaps:
                    for _, row2 in av_data.iterrows():
                        if code[-ov:] == row2.roll_val[:ov]:
                            v_last_ov = ov
                            code += row2.roll_val[ov:]
                            found=True
                            break
                    if found:
                        break
                if found:
                    num_found +=1
                    
                    # put processed permutation in one of three groups
                    if gen_code_grp:
                        if num_found <= data.roll_grp.drop_duplicates().shape[0]//3:
                            data.loc[data.roll_grp==row2.roll_grp, 'code_grp']=0
                        elif num_found <= data.roll_grp.drop_duplicates().shape[0]//3*2:
                            data.loc[data.roll_grp==row2.roll_grp, 'code_grp']=1
                        else:
                            data.loc[data.roll_grp==row2.roll_grp, 'code_grp']=2

                    av_data = av_data[av_data.roll_grp!=row2.roll_grp]
                    break

    print(f'The generated string containing all permutations has {len(code)} characters.\n')

    return code

In [None]:
%%time

code = gen_code(av_data = data.copy(), gen_code_grp = True)
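
The greedy idea behind `gen_code` can also be sketched without pandas. This is a simplified restatement under assumptions, not the notebook's implementation: `greedy_superperm` is a hypothetical name, candidates are scanned in sorted order, and n is assumed to be at least 3. Its output for n=7 is not guaranteed to match the cell above.

```python
from itertools import permutations as iter_perms


def greedy_superperm(n):
    """Chain rolling values greedily, preferring the largest overlap (n >= 3)."""
    symbols = [str(i + 1) for i in range(n)]
    perms = sorted(''.join(p) for p in iter_perms(symbols))

    def group(p):
        # Rotation of p that starts with '1' identifies the group.
        k = p.index('1')
        return p[k:] + p[:k]

    # (roll_val, group) pairs in sorted permutation order.
    candidates = [(p + p[:-1], group(p)) for p in perms]
    code, used = candidates[0]
    candidates = [c for c in candidates if c[1] != used]

    while candidates:
        match = None
        for ov in range(n - 2, 0, -1):
            match = next((c for c in candidates if code[-ov:] == c[0][:ov]), None)
            if match:
                code += match[0][ov:]
                break
        if match is None:
            raise ValueError('no overlapping candidate left')
        # A rolling value covers its whole group, so drop the group.
        candidates = [c for c in candidates if c[1] != match[1]]
    return code


print(greedy_superperm(3))  # 123121321
```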


# Split and extend Superpermutation for competition
For the competition the superpermutation needs to be split into three parts.
We chop the code into three pieces with overlapping ends and beginnings, so we do not lose permutations at the split points.
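
A small sketch of why the overlap matters: if neighbouring pieces share n-1 = 6 characters, every window of length 7 survives the split. `split_with_overlap` and `windows` are hypothetical helpers, and the slice boundaries here are an assumption that differs slightly from the notebook's own slicing.

```python
def split_with_overlap(s, parts=3, n=7):
    """Cut s into `parts` chunks; each chunk extends n-1 chars into the next one."""
    step = len(s) // parts
    chunks = []
    for i in range(parts):
        start = i * step
        end = len(s) if i == parts - 1 else (i + 1) * step + (n - 1)
        chunks.append(s[start:end])
    return chunks


def windows(s, n=7):
    """Set of all length-n substrings of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


# Toy input: 40 characters cycling through the symbols 1..7.
demo = ''.join(str(i % 7 + 1) for i in range(40))
chunks = split_with_overlap(demo)

# No window of the original string is lost by splitting.
assert set().union(*(windows(c) for c in chunks)) == windows(demo)
```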

In [None]:
lc = len(code)

fin_queues = [None]*3

fin_queues[0] = code[:(lc//3+6)]
fin_queues[1] = code[(lc//3):-(lc//3-6)]
fin_queues[2] = code[-(lc//3+6):]

max_len = 0
for it, q in enumerate(fin_queues):
    max_len = max(max_len, len(q))
    print(f'String {it} contains {len(q)} characters')

print(f'\nThe longest String contains {max_len} characters')

It is likely that the overlapping characters produce some overhead.

While calculating the superpermutation we assigned each permutation to one of three groups (`data.code_grp`).
Let's try to calculate an independent superpermutation for each group.

In [None]:
%%time
fin_queues = [None]*3
for it in range(3):
    fin_queues[it] = gen_code(av_data = data[data.code_grp==it].copy(), gen_code_grp = False)
    
max_len = 0
for it, q in enumerate(fin_queues):
    max_len = max(max_len, len(q))
    print(f'String {it} contains {len(q)} characters')

print(f'\nThe longest String contains {max_len} characters')

We got rid of the overlapping characters and reduced the longest string from 1977 to **1973** characters.

Assert no permutation is missing:

In [None]:
i=0
for p in permutations:
    if (p in fin_queues[0])|(p in fin_queues[1])|(p in fin_queues[2]):
        pass 
    else:
        print(p)
        i+=1
assert(i == 0)
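
The same check can be written as a reusable function that returns the missing permutations instead of printing them. This is a sketch; `missing_permutations` is a hypothetical name, and the toy inputs use 3 symbols to keep the example short.

```python
from itertools import permutations as iter_perms


def missing_permutations(strings, symbols):
    """Return the permutations of `symbols` that occur in none of the strings."""
    all_perms = (''.join(p) for p in iter_perms(symbols))
    return [p for p in all_perms if not any(p in s for s in strings)]


print(missing_permutations(['123121321'], '123'))  # []
print(missing_permutations(['1231'], '123'))       # ['132', '213', '312', '321']
```

In the notebook this could be used as `assert not missing_permutations(fin_queues, '1234567')`.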

Find the best Santa couple (a couple that is equally distributed among the three strings):

In [None]:
perm_santa_couple = [''.join(x) for x in itertools.permutations([str(i+1) for i in range(NUM_SYMBOLS)], 2)]

for ps in perm_santa_couple:
    log = f'Santas: {ps}\n'
    
    avg = 0
    mn = 9999
    for qn, q in enumerate(fin_queues):
        avg += q.count(ps)/3
        mn = min(mn, q.count(ps))
        log += f'counts in Q({qn}):{q.count(ps)}\n'
    log+=f'avg: {avg:.2f}, min: {mn}\n'
    
    if (mn >= 50)&(avg>=50):
        print(log)
        

I pick 45 as the Santa couple and append all missing Santa-couple permutations to the end of each string.

In [None]:
santa_couple_code = '45'

def fill_q(fin_queues):
    santa_couple_add = 0
    for q_id in range(3):
        print(f'queue({q_id})#')
        print(f'length before santa couple added: {len(fin_queues[q_id])}')
        for perm in data[data.perm.str.startswith(santa_couple_code)].perm.values:
            if perm not in fin_queues[q_id]:
                fin_queues[q_id] = fin_queues[q_id]+perm
                santa_couple_add +=1
        print(f'length after: {len(fin_queues[q_id])}\n')
    
    print(f'SantasCoder to add: {santa_couple_add}\n')
    print(f'* {len(fin_queues[0]),len(fin_queues[1]),len(fin_queues[2])}\n')
    return fin_queues

In [None]:
fin_queues_ext = fill_q(fin_queues)

We end up with the longest string having **2533** characters.
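
To double-check the fill step, one could count how many of the permutations starting with the Santa couple occur in each string. The helper `pair_coverage` is hypothetical; for n=7 each count should be 5! = 120 after filling. The toy example uses 4 symbols.

```python
from itertools import permutations as iter_perms


def pair_coverage(strings, pair, symbols='1234567'):
    """For each string, count the permutations starting with `pair` that it contains."""
    rest = [c for c in symbols if c not in pair]
    targets = [pair + ''.join(p) for p in iter_perms(rest)]
    return [sum(t in s for t in targets) for s in strings]


# With 4 symbols and the pair '12', the targets are '1234' and '1243'.
print(pair_coverage(['12341243', '1243', '4321'], '12', symbols='1234'))  # [2, 1, 0]
```

In the notebook this would be called as `pair_coverage(fin_queues_ext, santa_couple_code)`.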

# Submission

In [None]:
replace_dict = {
    "4": '🎅', 
    "5": '🤶', 
    "3": '🦌', 
    "1": '🧝', 
    "2": '🎄', 
    "6": '🎁', 
    "7": '🎀', 
}

ans = fin_queues_ext.copy()
for i in range(3):
    for k,v in replace_dict.items():
        ans[i] = ans[i].replace(k, v)

In [None]:
ans[0][:20]
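
As an alternative to the nested `replace` loop, the symbol mapping can be applied in a single pass with `str.maketrans` and `str.translate`. This is a sketch with the same mapping as above; `to_emoji` is a hypothetical helper name.

```python
replace_dict = {
    "4": '🎅',
    "5": '🤶',
    "3": '🦌',
    "1": '🧝',
    "2": '🎄',
    "6": '🎁',
    "7": '🎀',
}

# Build the translation table once and reuse it for every string.
table = str.maketrans(replace_dict)


def to_emoji(s):
    """Replace each digit 1..7 with its emoji in one pass."""
    return s.translate(table)


print(to_emoji('4512367'))  # 🎅🤶🧝🎄🦌🎁🎀
```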

In [None]:
sub = pd.DataFrame()
sub['schedule'] = [ans[i] for i in range(3)]

sub

In [None]:
sub.to_csv('submission.csv', index = False)