# The Merry Movie Montage

> José Luis Lobera del Castillo and Rafael Andrade Ruíz Capetillo

## Overview

### Description

People seem to be getting in the Christmas spirit earlier and earlier each year. Decorations appear for sale in stores in the fall, Christmas songs are on the radio in October…

The Elves at the North Pole are starting to recognize this, and need to work as fast as possible to launch their latest holiday offering: SantaTV+! A 24/7 streaming television channel where it’s “Always Christmas, All the Time.” To debut their new station, they’ve decided to kick things off with a made-for-television Christmas movie marathon! They’re excited for the premiere of such movies as 🎅, 🤶, 🦌, 🧝, 🎄, 🎁, and 🎀!

But elves know that the order in which the movies are aired is just as important as the movies themselves. So the elves have decided that the best way to figure out which order is best is to watch all the movies in every possible combination and see which feels the most Christmas-y.

Your job is to help the elves by giving them the shortest viewing schedules that show them every combination of movies, so they can get SantaTV+ live as soon as possible! The elves have formed three movie-watching teams to lighten the load, so every combination must be seen by at least one of their groups. But they’re also pretty sure they want to kick off the movie marathon with the 🎅 and 🤶 movies back-to-back, so be sure that each group has all the combinations that start with those. And finally, the elves have agreed to two sugar breaks, so you’re allowed to give each group up to two 🌟 wildcards, which will play all the movies at once while they’re snacking, which will help speed things along.

They can’t launch SantaTV+ until all the groups have finished watching, so help them find the most efficient schedule to see every Christmas movie combination and get back to making toys!

### Objective

Your objective is to find a set of three strings containing every permutation of the seven symbols 🎅, 🤶, 🦌, 🧝, 🎄, 🎁, and 🎀 as substrings, subject to the following conditions:

- Every permutation must be in at least one string.

- Each permutation beginning with 🎅🤶 must be in all three strings.

- Each string may have up to two wildcards 🌟, which will match any symbol in a permutation.

- No string of length seven containing more than one wildcard will count as a permutation.

Your score is the length of the longest of the three strings. This is a minimization problem, so lower scores are better.
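The conditions above can be expressed as a small checker. The sketch below is only an illustration and is not the competition's scoring code. The helper names `covered_permutations`, `is_valid`, and `score` are hypothetical. It assumes that every symbol, including 🌟, is a single Python character, which holds for these emoji. Each schedule is scanned once with a sliding window of length n. A window with exactly one 🌟 resolves to the single permutation it can represent, and windows with more than one 🌟 are skipped.

```python
from itertools import permutations

WILDCARD = '🌟'

def covered_permutations(schedule, symbols, wildcard=WILDCARD):
    """Hypothetical helper: set of permutations of `symbols` found in `schedule`."""
    n = len(symbols)
    alphabet = set(symbols)
    covered = set()
    for start in range(len(schedule) - n + 1):
        window = schedule[start:start + n]
        n_wild = window.count(wildcard)
        if n_wild > 1:
            continue  # windows with more than one wildcard never count
        known = set(window) - {wildcard}
        if len(known) != n - n_wild or not known <= alphabet:
            continue  # a repeated symbol means this window is not a permutation
        if n_wild == 1:
            (missing,) = alphabet - known  # exactly one symbol is left for the wildcard
            window = window.replace(wildcard, missing)
        covered.add(window)
    return covered

def is_valid(schedules, symbols, prefix, max_wildcards=2, wildcard=WILDCARD):
    """Hypothetical helper: every permutation in some schedule, every `prefix` one in all."""
    if any(s.count(wildcard) > max_wildcards for s in schedules):
        return False
    all_perms = {''.join(p) for p in permutations(symbols)}
    covers = [covered_permutations(s, symbols, wildcard) for s in schedules]
    must_share = {p for p in all_perms if p.startswith(prefix)}
    return set().union(*covers) == all_perms and all(must_share <= c for c in covers)

def score(schedules):
    """Hypothetical helper: length of the longest schedule (lower is better)."""
    return max(len(s) for s in schedules)

# The three-symbol, two-string examples from the Example section
print(is_valid(['🤶🎅🦌🤶🎅🤶🦌', '🎅🤶🦌🎅🤶'], '🎅🤶🦌', '🎅🤶', max_wildcards=0))  # True
print(is_valid(['🎅🤶🌟🦌🤶🎅', '🎅🤶🦌🎅🤶'], '🎅🤶🦌', '🎅🤶', max_wildcards=1))  # True
print(score(['🎅🤶🌟🦌🤶🎅', '🎅🤶🦌🎅🤶']))  # 6
```

For the full problem, the call would use the seven symbols as `symbols` and `'🎅🤶'` as `prefix`, with three schedules.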

### Example

Let's consider a simplified problem that uses only the three symbols 🎅, 🤶, and 🦌, no wildcards, and a solution made of only two strings.

There are six permutations of these three symbols: 🎅🤶🦌, 🎅🦌🤶, 🤶🎅🦌, 🤶🦌🎅, 🦌🎅🤶, and 🦌🤶🎅. The permutation 🎅🤶🦌 must be a substring of both solution strings while the other five permutations must be in at least one of the strings.

A valid solution for this problem is:

1. 🤶🎅🦌🤶🎅🤶🦌

2. 🎅🤶🦌🎅🤶

which would have a score of 7, the length of string 1.

If we were allowed the use of one wildcard, we could have the solution:

1. 🎅🤶🌟🦌🤶🎅

2. 🎅🤶🦌🎅🤶

with a score of 6. The wildcard can represent different symbols in different permutations: in string 1, the 🌟 stands for 🦌 in 🎅🤶🌟 and for 🎅 in 🤶🌟🦌.

### Submission

Your solution should consist of three schedules containing permutations of the seven symbols with optional wildcards as described above. 
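Judging from `sample_submission.csv` (three rows and a single `schedule` column), a submission can be written with Python's standard `csv` module. This is a sketch: `write_submission` is a hypothetical helper, and the exact layout the grader expects should be confirmed against the sample file.

```python
import csv

def write_submission(schedules, path='submission.csv'):
    """Hypothetical helper: one 'schedule' column, one row per schedule, no index."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['schedule'])  # header expected by the sample file
        for schedule in schedules:
            writer.writerow([schedule])
```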

## Genetic Algorithms Approach

In [43]:
import re
import random
import pprint
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from copy import copy, deepcopy

### Exploratory Data Analysis

#### distance_matrix.csv

This file gives the distance between each pair of permutations in the TSP formulation. The entry at index `[i, j]` is the distance from permutation `i` to permutation `j`.
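The values shown below are consistent with an overlap distance. Under that reading, the distance from `i` to `j` is the number of symbols that must be appended to a string ending in permutation `i` so that it ends in permutation `j`. Equivalently, it is 7 minus the length of the longest suffix of `i` that is also a prefix of `j`. For example, 🎅🤶🦌🧝🎄🎀🎁 ends with 🎀🎁 and 🎀🎁🎄🦌🤶🎅🧝 starts with it, which gives 5. The sketch below assumes that definition. `overlap_distance` is a hypothetical helper and is not part of the dataset.

```python
def overlap_distance(a, b):
    """Hypothetical helper (assumed definition): symbols to append after `a` to end with `b`.

    Assumes `a` and `b` are permutations of the same n symbols.
    """
    n = len(a)
    for k in range(n, 0, -1):  # try the longest overlap first
        if a[n - k:] == b[:k]:
            return n - k
    return n  # no overlap: append all n symbols

print(overlap_distance('🎅🤶🦌🧝🎄🎁🎀', '🎀🎁🎄🦌🤶🎅🧝'))  # 6
print(overlap_distance('🎅🤶🦌🧝🎄🎀🎁', '🎀🎁🎄🦌🤶🎅🧝'))  # 5
```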

In [44]:
df_distance_matrix = pd.read_csv("./santa-2021/distance_matrix.csv")

In [45]:
print(df_distance_matrix.shape) # 7! rows; 7! distance columns plus the Permutation column
df_distance_matrix.head()

(5040, 5041)


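| | Permutation | 🎅🤶🦌🧝🎄🎁🎀 | 🎅🤶🦌🧝🎄🎀🎁 | 🎅🤶🦌🧝🎁🎄🎀 | 🎅🤶🦌🧝🎁🎀🎄 | 🎅🤶🦌🧝🎀🎄🎁 | 🎅🤶🦌🧝🎀🎁🎄 | 🎅🤶🦌🎄🧝🎁🎀 | 🎅🤶🦌🎄🧝🎀🎁 | 🎅🤶🦌🎄🎁🧝🎀 | ... | 🎀🎁🎄🦌🤶🎅🧝 | 🎀🎁🎄🦌🤶🧝🎅 | 🎀🎁🎄🦌🧝🎅🤶 | 🎀🎁🎄🦌🧝🤶🎅 | 🎀🎁🎄🧝🎅🤶🦌 | 🎀🎁🎄🧝🎅🦌🤶 | 🎀🎁🎄🧝🤶🎅🦌 | 🎀🎁🎄🧝🤶🦌🎅 | 🎀🎁🎄🧝🦌🎅🤶 | 🎀🎁🎄🧝🦌🤶🎅 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 🎅🤶🦌🧝🎄🎁🎀 | 0 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | ... | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| 1 | 🎅🤶🦌🧝🎄🎀🎁 | 7 | 0 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | ... | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 2 | 🎅🤶🦌🧝🎁🎄🎀 | 7 | 7 | 0 | 7 | 7 | 7 | 7 | 7 | 7 | ... | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| 3 | 🎅🤶🦌🧝🎁🎀🎄 | 7 | 7 | 7 | 0 | 7 | 7 | 7 | 7 | 7 | ... | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| 4 | 🎅🤶🦌🧝🎀🎄🎁 | 7 | 7 | 7 | 7 | 0 | 7 | 7 | 7 | 7 | ... | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |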


#### permutations.csv

All permutations of the symbols 🎅, 🤶, 🦌, 🧝, 🎄, 🎁, and 🎀.
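The same list can be regenerated with `itertools.permutations` as a cross-check. When the symbols are passed in the order 🎅, 🤶, 🦌, 🧝, 🎄, 🎁, 🎀, the output order appears to match the rows shown below; for example, row 120 is 🎅🦌🤶🧝🎄🎁🎀. This is a sketch and is not necessarily how the file was produced.

```python
from itertools import permutations

SYMBOLS = ['🎅', '🤶', '🦌', '🧝', '🎄', '🎁', '🎀']

# Lexicographic order with respect to the symbol order above
all_perms = [''.join(p) for p in permutations(SYMBOLS)]

print(len(all_perms))                               # 5040 = 7!
print(all_perms[0], all_perms[120])                 # 🎅🤶🦌🧝🎄🎁🎀 🎅🦌🤶🧝🎄🎁🎀
print(sum(p.startswith('🎅🤶') for p in all_perms))  # 120 = 5!
```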

In [46]:
df_permutations = pd.read_csv("./santa-2021/permutations.csv")

In [47]:
print(df_permutations.shape) # 7!
df_permutations.head()

(5040, 1)


| | Permutation |
|---|---|
| 0 | 🎅🤶🦌🧝🎄🎁🎀 |
| 1 | 🎅🤶🦌🧝🎄🎀🎁 |
| 2 | 🎅🤶🦌🧝🎁🎄🎀 |
| 3 | 🎅🤶🦌🧝🎁🎀🎄 |
| 4 | 🎅🤶🦌🧝🎀🎄🎁 |


#### sample_submission.csv

A submission file in the correct format.

In [48]:
df_sample_submission = pd.read_csv("./santa-2021/sample_submission.csv")

In [49]:
print(df_sample_submission.shape)
df_sample_submission.head()

(3, 1)


| | schedule |
|---|---|
| 0 | 🎅🤶🦌🧝🎄🎁🎀🎁🎄🧝🦌🎅🤶🦌🧝🎄🎀🎁🤶🎄🧝🦌🎀🎅🤶🦌🧝🎁🎄🎀🤶🎁🎄🧝🦌🎅🤶🦌🧝🎁🎀🎄🤶🧝🦌🎀... |
| 1 | 🎅🤶🦌🧝🎄🎁🎀🎅🤶🦌🧝🎄🎀🎁🎅🤶🦌🧝🎁🎄🎀🎅🤶🦌🧝🎁🎀🎄🎅🤶🦌🧝🎀🎄🎁🎅🤶🦌🧝🎀🎁🎄🎅🤶🦌🎄... |
| 2 | 🎅🤶🦌🧝🎄🎁🎀🎅🤶🦌🧝🎄🎀🎁🎅🤶🦌🧝🎁🎄🎀🎅🤶🦌🧝🎁🎀🎄🎅🤶🦌🧝🎀🎄🎁🎅🤶🦌🧝🎀🎁🎄🎅🤶🦌🎄... |


#### wildcards.csv

This file maps each seven-symbol substring (Factor) that contains exactly one wildcard symbol to the Permutation it represents.
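The sketch below shows one way such a mapping can be built. Each of the seven positions of every permutation is masked with 🌟, which gives 7! × 7 = 35,280 factors. The six visible symbols always determine the hidden one, so each factor maps to exactly one permutation. `wildcard_factors` is a hypothetical helper. The row order it produces appears to match the first rows shown below.

```python
from itertools import permutations

SYMBOLS = ['🎅', '🤶', '🦌', '🧝', '🎄', '🎁', '🎀']
WILDCARD = '🌟'

def wildcard_factors(symbols=SYMBOLS, wildcard=WILDCARD):
    """Hypothetical helper: yield (factor, permutation) pairs, masking one position at a time."""
    for p in permutations(symbols):
        perm = ''.join(p)
        for i in range(len(perm)):
            yield perm[:i] + wildcard + perm[i + 1:], perm

pairs = list(wildcard_factors())
factor_to_perm = dict(pairs)  # every factor has a single permutation

print(len(pairs), len(factor_to_perm))  # 35280 35280
print(pairs[0])                         # ('🌟🤶🦌🧝🎄🎁🎀', '🎅🤶🦌🧝🎄🎁🎀')
```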

In [50]:
df_wildcards = pd.read_csv("./santa-2021/wildcards.csv")

In [51]:
print(df_wildcards.shape) # 7! * 7
df_wildcards.head()

(35280, 2)


| | Factor | Permutation |
|---|---|---|
| 0 | 🌟🤶🦌🧝🎄🎁🎀 | 🎅🤶🦌🧝🎄🎁🎀 |
| 1 | 🎅🌟🦌🧝🎄🎁🎀 | 🎅🤶🦌🧝🎄🎁🎀 |
| 2 | 🎅🤶🌟🧝🎄🎁🎀 | 🎅🤶🦌🧝🎄🎁🎀 |
| 3 | 🎅🤶🦌🌟🎄🎁🎀 | 🎅🤶🦌🧝🎄🎁🎀 |
| 4 | 🎅🤶🦌🧝🌟🎁🎀 | 🎅🤶🦌🧝🎄🎁🎀 |


In [52]:
df_wildcards['Factor'][0]

'🌟🤶🦌🧝🎄🎁🎀'

In [53]:
for i, schedule in enumerate(df_sample_submission['schedule']):
    print(f"Len of str {i}:", len(schedule))

Len of str 0: 3497
Len of str 1: 840
Len of str 2: 6985


#### Permutations starting with 🎅🤶, Mr. & Mrs. Claus (`df_with_mrms_start`)

In [54]:
df_with_mrms_start = df_permutations[df_permutations['Permutation'].str.startswith('🎅🤶')]

In [55]:
print(df_with_mrms_start.shape)
df_with_mrms_start.head()

(120, 1)


| | Permutation |
|---|---|
| 0 | 🎅🤶🦌🧝🎄🎁🎀 |
| 1 | 🎅🤶🦌🧝🎄🎀🎁 |
| 2 | 🎅🤶🦌🧝🎁🎄🎀 |
| 3 | 🎅🤶🦌🧝🎁🎀🎄 |
| 4 | 🎅🤶🦌🧝🎀🎄🎁 |


#### Permutations not starting with 🎅🤶 (`df_without_mrms_start`)

In [56]:
df_without_mrms_start = df_permutations[~df_permutations['Permutation'].str.startswith('🎅🤶')]

In [57]:
print(df_without_mrms_start.shape)
df_without_mrms_start.head()

(4920, 1)


| | Permutation |
|---|---|
| 120 | 🎅🦌🤶🧝🎄🎁🎀 |
| 121 | 🎅🦌🤶🧝🎄🎀🎁 |
| 122 | 🎅🦌🤶🧝🎁🎄🎀 |
| 123 | 🎅🦌🤶🧝🎁🎀🎄 |
| 124 | 🎅🦌🤶🧝🎀🎄🎁 |


### Objective Function


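These functions check whether three schedules satisfy the conditions of the problem. Every permutation must appear in at least one schedule. Every permutation starting with 🎅🤶 ("mrms", for Mr. & Mrs. Claus) must appear in all three. `general_objective_func` handles schedules with 🌟 wildcards, and `delete_nodes_objective_func` handles schedules without them. `count_nodes_objective_func` counts permutation occurrences and is used as the fitness of wildcard placements.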
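One way to handle wildcards is to build seven variants of each schedule, replacing every 🌟 with one symbol, and then search the variants for permutations. A window may contain at most one 🌟, so trying all seven symbols covers every possible match. A window with two 🌟 becomes a window with a repeated symbol, so it never matches. The variants must be kept apart, though. If they are simply concatenated, windows that straddle two variants can produce false matches. The sketch below uses digits in place of the movie symbols to illustrate this, and shows that joining with a separator avoids the problem. `wildcard_variants` is a hypothetical helper.

```python
SYMBOLS = ['1', '2', '3']  # digits stand in for the movie symbols
WILDCARD = '🌟'

def wildcard_variants(schedule, symbols=SYMBOLS, wildcard=WILDCARD):
    """Hypothetical helper: one copy of `schedule` per symbol, every wildcard replaced by it."""
    return [schedule.replace(wildcard, s) for s in symbols]

variants = wildcard_variants('🌟12')  # ['112', '212', '312']
glued = ''.join(variants)             # '112212312'
separated = '|'.join(variants)        # '112|212|312'

# '🌟12' can only represent 312, but the glued string also contains 123 and 231
print([p for p in ['123', '231', '312'] if p in glued])      # ['123', '231', '312']
print([p for p in ['123', '231', '312'] if p in separated])  # ['312']
```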

In [58]:
def count_perms_in_substring(substrings, permutations):
    # Boolean array: True for each permutation found in at least one of the three schedules
    aux_perm = [perm in substrings[0] or perm in substrings[1] or perm in substrings[2] for perm in permutations]
    return np.array(aux_perm)

def count_mrms_in_substring(substrings, permutations):
    # Boolean array: True for each permutation found in all three schedules
    aux_perm = [perm in substrings[0] and perm in substrings[1] and perm in substrings[2] for perm in permutations]
    return np.array(aux_perm)

In [59]:
# general_objective_func(df_substrings: pandas DataFrame) -> bool
# True when the three schedules (which may contain 🌟 wildcards) cover every permutation
# and every permutation starting with 🎅🤶 appears in all three schedules.

def general_objective_func(df_substrings):

    symbols = ['🎅', '🤶', '🦌', '🧝', '🎄', '🎁', '🎀']

    # 7 variants of each schedule, with every 🌟 replaced by the same symbol. A window may hold
    # at most one wildcard, so trying all 7 symbols covers every possible match, and a window
    # with two wildcards gets a repeated symbol and never matches.
    all_wildcards = np.array([[s.replace('🌟', symbol) for s in df_substrings['schedule']] for symbol in symbols])
    # Join the variants with a separator so that no window spans two variants
    wildcard_subs = ['|'.join(all_wildcards[:, i]) for i in range(3)]

    # CHECK PERMUTATIONS THAT START WITH MR. & MRS. CLAUS (must be in all three schedules)
    mrms_in_wildcards = count_mrms_in_substring(wildcard_subs, df_with_mrms_start['Permutation'])

    # CHECK PERMUTATIONS THAT DON'T START WITH MR. & MRS. CLAUS (must be in at least one)
    perm_in_wildcards = count_perms_in_substring(wildcard_subs, df_without_mrms_start['Permutation'])

    # Number of permutations that are still missing
    score = df_permutations.shape[0] - (np.sum(mrms_in_wildcards) + np.sum(perm_in_wildcards))

    return score == 0

# delete_nodes_objective_func(df_substrings: pandas DataFrame) -> bool
# Same check as general_objective_func, for schedules without wildcards.

def delete_nodes_objective_func(df_substrings):
    # CHECK PERMUTATIONS THAT START WITH MR. & MRS. CLAUS (must be in all three schedules)
    mrms_in_sub = count_mrms_in_substring(df_substrings['schedule'], df_with_mrms_start['Permutation'])

    # CHECK PERMUTATIONS THAT DON'T START WITH MR. & MRS. CLAUS (must be in at least one)
    perms_in_sub = count_perms_in_substring(df_substrings['schedule'], df_without_mrms_start['Permutation'])

    # Number of permutations that are still missing
    score = df_permutations.shape[0] - (np.sum(mrms_in_sub) + np.sum(perms_in_sub))

    return score == 0

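# count_nodes_objective_func(df_substrings: pandas DataFrame) -> int
# Counts every occurrence of every permutation in the 7 wildcard variants of each schedule
# (higher is better). An occurrence that does not use a wildcard is counted once per variant,
# i.e. 7 times, which is why three copies of the superpermutation score 5040 * 3 * 7 = 105840.

def count_nodes_objective_func(df_substrings):

    symbols = ['🎅', '🤶', '🦌', '🧝', '🎄', '🎁', '🎀']
    c = 0

    for symbol in symbols:
        # Replace every wildcard with the current symbol
        wildcard_subs = [s.replace('🌟', symbol) for s in df_substrings['schedule']]

        for perm in df_permutations['Permutation']:
            for substring in wildcard_subs:
                c += substring.count(perm)
    return c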


In [60]:
#general_objective_func(df_sample_submission)
#print(delete_nodes_objective_func(df_sample_submission))
print(count_nodes_objective_func(df_sample_submission))

59367


### Individual Generation

#### Random string of length N

In [61]:
N = 5000
symbols = ['🎅', '🤶', '🦌', '🧝', '🎄', '🎁', '🎀']

individual = ''.join(np.random.choice(symbols, N))

A genetic approach based on random strings would create a population of M individuals, each made of three strings (one per team).
The mutation process would randomly remove genes (symbols), aiming for an equal or better score with a shorter string.
Crossover could swap a string between individuals or mix two strings into a better one.
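Below is a minimal, single-string sketch of that mutation idea. It uses a plain hill climber instead of a full population: delete one random symbol, and keep the shorter string only if it still contains every permutation. It ignores the three-schedule and 🎅🤶 constraints. `contains_all` and `deletion_hill_climb` are hypothetical names.

```python
import random
from itertools import permutations

def contains_all(s, symbols):
    """Hypothetical helper: True if every permutation of `symbols` is a substring of `s`."""
    n = len(symbols)
    windows = {s[i:i + n] for i in range(len(s) - n + 1)}
    return all(''.join(p) in windows for p in permutations(symbols))

def deletion_hill_climb(s, symbols, tries=1000, rng=None):
    """Hypothetical helper: delete random positions, keeping only changes that stay valid."""
    rng = rng if rng is not None else random.Random()
    for _ in range(tries):
        i = rng.randrange(len(s))
        candidate = s[:i] + s[i + 1:]
        if contains_all(candidate, symbols):
            s = candidate  # shorter and still covers every permutation
    return s

# Start from the naive concatenation of all 3! permutations (length 18)
start = ''.join(''.join(p) for p in permutations('123'))
result = deletion_hill_climb(start, '123', tries=500, rng=random.Random(0))
print(len(start), len(result), contains_all(result, '123'))
```

The result can never be shorter than 9, the length of the shortest superpermutation on three symbols.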

#### Superpermutation

A superpermutation is *a string that contains each permutation of n symbols as a substring*. Starting from one guarantees that every permutation is present, and from there the string can be shortened.

The shortest known superpermutation of the 7 digits 1, 2, 3, 4, 5, 6, 7, with 5,906 symbols:

In [62]:
superpermutation = '123456712345617234561273456123745613274561372456137425613745261374562137456123475613247561342756134725613475261347562134756123457612345167234516273451623745162347516234571623451762345126734512637451263475126345712634517263451276345123674512364751236457123645172364512736451237645123467512346571234651723465127346512437651243675124365712436517243651274365124736512463751246357124635172463512746351247635124673514267351462735146723514673251467352164735216743521673452163745216347521634572163452716345217643521764532716453276145327641532764513267451326475132645713264517326451372645317264537126453721645372614537264153726451327645312764532176452317645213764521736542173652417365214736521743652173465217364521763452167354216375421635742163547216354271635421763542167352416375241635724163527416352471635241763524167352146735124653712465317246531274653124765312467531426753146275314672531467523164753216475312647531624753164275316472531647523167453216745312674531627453167245316742531674523167542316752431675234167523146753214675312465731246513724651327465132476513246715324671352467132546712354671253467125436715243671542367154326751432675413267543126754321675432617453621745361274536172453617425361745236174532617435261743256174326517423651742635174265317426513742651734261573426175342167534217653421756342175364217534621753426173542617345261734256173426517432615743621574361257431625743126574132657412365741263574126537412657341265743125674132567412356741253674125637412567341256743125764132576143257613425761324576132547613257461325764123576412537614253761245376125437615243761542376154327615437261543762154376125347612537461253764125736412576341257643125746312574361527436157243615742361574326175436217543612754361725436175243617542361754326715436271543672154367125463712546731254763125473615247361542736154723615473261457362145763214756321476532147635214763251476321547632145762314576213457621435762145376214573612457361425736145273614572361457326147536214753612475361427536147253
61475236147532614735261473256147326514723651472635147265314726513472651437265147326154736215473612547316254731265471326547123654712635471265347126543716253471625374162537146253716425371624537162543716524371654237165432716543721654371265473125647132564712356471253647125634712564372156437251643275614327564132756431275643217564327156432751643257163425176342516734251637425163472516342751634257163245176324516732451637245163274516324751632457163254716325741632571463275146327154632714563271465327146352714632571643527164357216435712643517264351276435126743512647351264375126435716243517624351672435162743516247351624375162435716423517642351674235164723516427351642375146237514263751423675142376514273651427635142765314276513427651432765142375614235761423567143256714352671435627143567214356712435617243561274356124735612437561243576124356714235617423561472356142735614237516423571643251764325167432516473251643725614372564137256431725643712564731254671324567132465713246751324615732461753246173524617325416723541762354716235476123547621354762315467231546273154623715462317564231576421356742135647213564271356421735624137562413576241356724135627413562471356241735621473562174356217345621735462173564213756421357642153746215374261537421653742156374215367421537642157364215763421576432157642315674231564723156427315642371564231756243157624315672431562743156247315624371562431756234157623415672341562734156237415623471562341756231475623174562317546321745632174653217463521746325174632157463217546312754631725463175246315724631527463152476315246731524637152463175426315742631547263154276315426731542637154263175462315746235174623571462357416235746123574621357462315476235147623541726354172365417235641723546172354167253417625314762531746253176425317624531762543176524317654231765432176543127654317265431762534172653417256341725364172534617253416725431672541367251436725134672153476215347261534721653472156347215364721534672135467213456721346572136457213654721365742136572413657214365721346752136475213674521
367542136752413675214376521437562143752614375216437521463725146372154637214563721465372146357214637521436752134672513647251367425136724513672541637254167325417632541736251473625174362517346257136425713624571362547136257413625714362571346275136427513624751362745136275416327541623754126375412367541237654132765413726541376251437625134762513746251376425137624513762541376524137654213765412375641237546132754613725461375246137542613754621375461237541627354126735412763541273654127356412735461273541627534126753412765341275634127536412753461275341627543162754136275143627513462715342671354267134526713425671342657143265714236571426357142653714265731426571342675134267153427615342716534271563427153642715346271354627134562713465271364527136542713652471365274136527143652713462573146257341625734612573462157346251736425173624517362541732654173256417325461732456173246517324615372461532746153247615324167532416573214657321645731264573162457316425731645273165427316524731652743165273416527314652731645723165472316574231657243165723416572314657231645732165473216574321657342165732416537241653274165324716532417653241567321456731245637124563172456312745631247563124576312456731425637142563174256314725631427563142576314256731452637145236714532671453627145367214536712453671425367145237614523716452371465237416523746152347651234765213476523147652341765234716523476152346715234617523461572346152734615237465123746521374652317465237145623714526317452631475263145726314527631452673145627314567231456732154673215647321567432156734215673241563724156327415632471563241756324157632415367241536274153624715362417536241573624153762415326741532647153264175326415732641523764152367415236471523641752364157236415273641526374152634715263417526341572634152763415267341526437152643175264315726431527643152674315264731526413752641357261435726134572613547261357426135724613572641352761435276134527613542761352476135274613527641352674135264713526417352641'
len(superpermutation)

5906
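A string of length L on n symbols has L − n + 1 windows of length n, and at least n! of them must be distinct permutations. For the superpermutation above, 5,906 − 6 = 5,900 windows cover the 5,040 permutations, so at least 860 windows contain no new permutation. The sketch below checks a candidate string and lists any missing permutations. It is shown on the three-symbol superpermutation 123121321, and `missing_permutations` is a hypothetical helper.

```python
from itertools import permutations

def missing_permutations(s, alphabet):
    """Hypothetical helper: permutations of `alphabet` that do not occur as substrings of `s`."""
    n = len(alphabet)
    windows = {s[i:i + n] for i in range(len(s) - n + 1)}
    return [''.join(p) for p in permutations(alphabet) if ''.join(p) not in windows]

print(missing_permutations('123121321', '123'))  # []
print(missing_permutations('123121', '123'))     # ['132', '213', '321']
```

On the digit string, `missing_permutations(superpermutation, '1234567')` should return an empty list if the string is a valid superpermutation.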

In [63]:
# Map the digits 1-7 to the seven movie symbols
digit_to_symbol = dict(zip('1234567', ['🎅', '🤶', '🦌', '🧝', '🎄', '🎁', '🎀']))
superpermutation = superpermutation.translate(str.maketrans(digit_to_symbol))
len(superpermutation)

5906

In [64]:
phenotype = [superpermutation for _ in range(3)]
print(len(phenotype), len(phenotype[0]), len(phenotype[1]), len(phenotype[2]))
print(np.array(list(phenotype[0])).shape)

3 5906 5906 5906
(5906,)


In [65]:
count_nodes_objective_func(pd.DataFrame(phenotype, columns=['schedule']))

105840

In [66]:
a = pd.DataFrame(['🎅🤶🦌🧝🎄🎁🎀🎁🎄🧝🦌🎅🤶🦌🧝🎄🎀🎁🤶🎄🧝🦌🎀🎅🤶🦌🧝🎁🎄🎀🤶🎁🎄🧝', '🎅🤶🦌🧝🎄🎁🎀🎁🎄🧝🦌🎅🤶🦌🧝🎄🎀🎁🤶🎄🧝🦌🎀🎅🤶🦌🧝🎁🎄🎀🤶🎁🎄🧝', '🎅🤶🦌🧝🎄🎁🎀🎁🎄🧝🦌🎅🤶🦌🧝🎄🎀🎁🤶🎄🧝🦌🎀🎅🤶🦌🧝🎁🎄🎀🤶🎁🎄🧝'], columns = ['schedule'])
delete_nodes_objective_func(a)

False

In [67]:
class Individual:
    # Genotype: for each of the g_size schedules, the set of positions deleted from a base schedule
    def __init__(self, superm_len=5906, g_size=3, wildcards = None):
        self.g_size = g_size
        self.wildcards = wildcards
        # Positions index into the base schedules, so keep the length of each one
        if wildcards is not None:
            self.base_lens = [len(s) for s in wildcards.get_phenotype()['schedule']]
        else:
            self.base_lens = [superm_len] * g_size

        self.genotype = [set() for _ in range(g_size)]
        self.add_init_genes()
        self.fitness = -1

    def get_phenotype(self):
        # Base schedules: the wildcard-marked schedules if given, otherwise the superpermutation
        if self.wildcards is not None:
            phenotype = self.wildcards.get_phenotype()
            phenome = [np.array(list(a)) for a in phenotype['schedule']]
        else:
            phenome = [np.array(list(superpermutation)) for _ in range(self.g_size)]
        # Delete the positions stored in the genotype
        phenome = [''.join(np.delete(phenome[i], list(self.genotype[i]))) for i in range(self.g_size)]

        return pd.DataFrame(phenome, columns=['schedule'])

    def add_genes(self, str_idx=None, value=None):
        # By default, delete one random position from a random schedule
        if str_idx is None:
            str_idx = random.randrange(self.g_size)
        if value is None:
            value = [random.randrange(self.base_lens[str_idx])]  # valid index in [0, len)

        self.genotype[str_idx].update(value)

    def add_init_genes(self):
        # Start by deleting one random position from each schedule
        for i in range(self.g_size):
            self.add_genes(str_idx=i)

    def get_fitness(self):
        # True if the shortened schedules still satisfy every condition
        if self.wildcards is not None:
            self.fitness = general_objective_func(self.get_phenotype())
        else:
            self.fitness = delete_nodes_objective_func(self.get_phenotype())
        return self.fitness

    def get_local_fitness(self):
        # Total number of deleted positions (higher is better)
        return np.sum([len(gene) for gene in self.genotype])

    def mutate(self):
        self.add_genes()

    def get_final_score(self):
        # Competition score: length of the longest schedule
        return max(len(s) for s in self.get_phenotype()['schedule'])

    def phenotype_to_csv(self):
        # Same layout as sample_submission.csv: a single 'schedule' column, no index
        self.get_phenotype().to_csv('submission.csv', index=False)

    def __str__(self):
        lens = [len(gene) for gene in self.genotype]
        return 'Movies eliminated: ' + str(np.sum(lens)) + ' ' + pprint.pformat(lens) + '\t\n'

In [68]:
g01 = Individual()
g01.get_fitness()
print(g01)


Movies eliminated: 3 [1, 1, 1]	



In [96]:
class Population:
    def __init__(self, pop_size, wildcards=None, tolerance = 5):
        self.pop_size = pop_size
        self.tolerance = tolerance
        self.wildcards = wildcards
        self.individuals = [Individual(wildcards=self.wildcards) for _ in range(self.pop_size)]
        
        self.survivors, self.local_fitness = [], []
        self.elite = [None, -1] # Individual, fitness
        self.bests_per_cicle = []
        self.generations_without_survivors = 0

    def get_survivors(self):
        new_survivors = []
        new_local_fitness = []

        for individual in self.individuals:
            if individual.get_fitness():
                new_survivors.append(individual)
                new_local_fitness.append(individual.get_local_fitness())
        
        #self.generations_without_survivors = len(new_survivors)
        if not new_survivors:
            print('No change 1')
            self.generations_without_survivors += 1
            return deepcopy(self.survivors), deepcopy(self.local_fitness)
        else:
            self.generations_without_survivors = 0
            return new_survivors, np.array(new_local_fitness, dtype=int)
        

    def new_population(self, pop_size = None, crossover_size=1):
        if self.generations_without_survivors >= self.tolerance: return True
        if pop_size: self.pop_size = pop_size
        
        
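        self.survivors, self.local_fitness = self.get_survivors()
        # Stop if no valid individual has been found yet (np.argmax fails on an empty list)
        if len(self.survivors) == 0: return True

        best_fit_idx = np.argmax(self.local_fitness)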
        if self.local_fitness[best_fit_idx] > self.elite[1]:
            self.elite = [self.survivors[best_fit_idx], self.local_fitness[best_fit_idx]]
        
        self.bests_per_cicle.append(self.local_fitness[best_fit_idx])

        self.individuals = []
        for _ in range(self.pop_size):
            A, B = random.choices(self.survivors, weights=self.local_fitness, k=2)
            self.individuals.append(self.crossover(A, B, crossover_size))
        self.mutation()
        
        return False

    
    def crossover(self, A, B, sample_size=1):
        sample_genotype = B.genotype
        C = deepcopy(A)

        for i in range(3):
            rnd_genes = np.random.choice(list(sample_genotype[i]), size=sample_size, replace=True)
            C.add_genes(i, rnd_genes)

        return C

    def mutation(self):
        for individual in self.individuals:
            individual.mutate()

    def get_results(self):
        print("ELITE INDIVIDUAL: \n\t{}".format(self.elite[0])) # , self.elite[0].get_fitness()
        return self.elite[0]

    
    def __str__(self):
        s = ''
        for i, individual in enumerate(self.individuals):
            s += str(i+1) + '. ' + individual.__str__()
        return s

In [108]:
def genetic_algorithm(pop_size=100, n_gen = 1000, crossover_size = 7, wildcards=None):
    population = Population(pop_size, wildcards=wildcards)               # Init population
    
    g = 0
    result = None

    while g < n_gen:
        clear_output(wait=True)
        result = population.new_population(pop_size, np.random.randint(1, crossover_size))
        
        print("GENETIC ALGORITHM | THE MERRY MOVIE MONTAGE ")
        print(f'Generation: {g}/{n_gen}'.ljust(len(str(g)) + len(str(n_gen)) + 20) + f'Population: {pop_size}'.ljust(20) + f'Elite: {population.elite[1]}'.ljust(20) + f'Survivors: {len(population.survivors)}'.ljust(20) + f'Generations Without Survivors: {population.generations_without_survivors}'.ljust(20))

        if(result): break
        g += 1

    return population.get_results()

In [109]:
result = genetic_algorithm(10, 10, 7)

GENETIC ALGORITHM | THE MERRY MOVIE MONTAGE 
Generation: 9/10       Population: 10      Elite: 40           Survivors: 9        Generations Without Survivors: 0
ELITE INDIVIDUAL: 
	Movies eliminated: 40 [12, 13, 15]	



In [107]:
phenotype = result.get_phenotype()
phenotype

| | schedule |
|---|---|
| 0 | 🎅🤶🦌🧝🎄🎁🎀🎅🤶🦌🧝🎄🎁🎅🎀🤶🦌🧝🎄🎁🎅🤶🎀🦌🧝🎄🎁🎅🤶🦌🎀🧝🎄🎁🎅🦌🤶🎀🧝🎄🎁🎅🦌🎀🤶🧝... |
| 1 | 🎅🤶🦌🧝🎄🎁🎀🎅🤶🦌🧝🎄🎁🎅🎀🤶🦌🧝🎄🎁🎅🤶🎀🦌🧝🎄🎁🎅🤶🦌🎀🧝🎄🎁🎅🦌🤶🎀🧝🎄🎁🎅🦌🎀🤶🧝... |
| 2 | 🎅🤶🦌🧝🎄🎁🎀🎅🤶🦌🧝🎄🎁🎅🎀🤶🦌🧝🎄🎁🎅🤶🎀🦌🧝🎄🎁🎅🤶🦌🎀🧝🎄🎁🎅🦌🤶🎀🧝🎄🎁🎅🦌🎀🤶🧝... |


In [100]:
class Wildcards:
    def __init__(self, phenotype, wildcard_symbol = '🌟'):
        self.wildcards = []
        self.wildcard_symbol = wildcard_symbol
        self.fitness = -1
        self.phenotype = phenotype

        for i in range(3):
            self.wildcards.append(list(np.random.randint(0, len(self.phenotype['schedule'][i]), 2)))

    def update_gene(self, str_idx=None, wldcrd_idx=None, value=None):
        # Without a value, move one wildcard to a random valid position
        if str_idx is None and wldcrd_idx is None and value is None:
            str_idx = random.randint(0,2)
            wldcrd_idx = random.randint(0,1)
            value = random.randrange(len(self.phenotype['schedule'][str_idx]))  # in [0, len)

        elif str_idx is not None and wldcrd_idx is not None and value is None:
            value = random.randrange(len(self.phenotype['schedule'][str_idx]))  # in [0, len)
        
        self.wildcards[str_idx][wldcrd_idx] = value

    def get_phenotype(self):
        phenome_lsts = [list(p) for p in self.phenotype['schedule']]
        for i in range(3):
            for j in range(2):
                phenome_lsts[i][self.wildcards[i][j]] = self.wildcard_symbol
        phenome = [''.join(phenome_lst) for phenome_lst in phenome_lsts]
        return pd.DataFrame(phenome, columns=['schedule'])

    def mutate(self):
        self.update_gene()
    
    def get_fitness(self):
        self.fitness = count_nodes_objective_func(self.get_phenotype())
        return self.fitness

    def __str__(self):
        return f'{pprint.pformat(self.wildcards)} {self.get_fitness()}\n'

In [101]:
a = Wildcards(phenotype=phenotype)
count_nodes_objective_func(a.get_phenotype())
print(a)

[[4920, 2388], [1798, 1463], [4935, 2241]] 103595



In [102]:
class WildcardPopulation:
    def __init__(self, pop_size, phenotype):
        self.pop_size = pop_size
        self.phenotype = phenotype
        self.individuals = [Wildcards(phenotype) for _ in range(self.pop_size)]
        self.elite = [None, -1]
        self.update_fitnesses()  # sets self.fitnesses (update_fitnesses returns None)

    def evolve(self, n_gen):    
        g = 0
        result = None

        while g < n_gen:
            clear_output(wait=True)
            result = self.new_population()
            
            print("WILDCARDS GENETIC ALGORITHM | THE MERRY MOVIE MONTAGE ")
            print(f'Generation: {g}/{n_gen}'.ljust(len(str(g)) + len(str(n_gen)) + 20) + f'Population: {self.pop_size}'.ljust(20) + f'Elite: {max(self.fitnesses)}'.ljust(20))

            #if(result): break
            g += 1
        return self.get_results()


    def update_fitnesses(self):
        self.fitnesses = [count_nodes_objective_func(individual.get_phenotype()) for individual in self.individuals]
        

    def new_population(self):

        new_population = []
        weights = self.fitnesses
        for _ in range(self.pop_size):
            A, B = random.choices(self.individuals, weights=weights, k=2)
            new_population.append(self.crossover(A, B))

        self.individuals = new_population
        self.mutation()
        self.get_elite()

    def get_elite(self):
        self.update_fitnesses()
        best_idx = np.argmax(self.fitnesses)
        if(self.fitnesses[best_idx] > self.elite[1]):
            self.elite = [self.individuals[best_idx], self.fitnesses[best_idx]]
    
    def crossover(self, A, B, sample_size=1):
        sample_genotype = B.wildcards
        C = deepcopy(A)

        rdm_x_idx = random.randint(0,2)
        rdm_y_idx = random.randint(0,1)

        C.update_gene(rdm_x_idx, rdm_y_idx, sample_genotype[rdm_x_idx][rdm_y_idx])

        return C

    def mutation(self):
        for individual in self.individuals:
            individual.mutate()

    def get_results(self):
        print("ELITE INDIVIDUAL: \n\t{0} {1}".format(self.elite[0], self.elite[1])) # , self.elite[0].get_fitness()
        return self.elite

    
    def __str__(self):
        s = ''
        for i, individual in enumerate(self.individuals):
            s += str(i+1) + '. ' + individual.__str__() + '\n'
        return s

In [103]:
p = WildcardPopulation(5, phenotype=phenotype)
wildcards = p.evolve(2)

WILDCARDS GENETIC ALGORITHM | THE MERRY MOVIE MONTAGE 
Generation: 1/2       Population: 5       Elite: 103605       
ELITE INDIVIDUAL: 
	[[3362, 4882], [2682, 3603], [938, 5030]] 103627
 103627


In [111]:
final = genetic_algorithm(10, 10, 3, wildcards=wildcards[0])
#final.phenotype_to_csv()

GENETIC ALGORITHM | THE MERRY MOVIE MONTAGE 
Generation: 9/10       Population: 10      Elite: 35           Survivors: 10       Generations Without Survivors: 0
ELITE INDIVIDUAL: 
	Movies eliminated: 35 [13, 10, 12]	



In [None]:
df_submission = pd.read_csv('./submission.csv')
df_submission

| | schedule |
|---|---|
| 0 | 🎅🤶🧝🎄🎀🎁🦌🎅🤶🧝🎄🎁🎀🦌🤶🎅🧝🎄🎁🎀🦌🤶🧝🎅🎄🎁🎀🦌🤶🧝🎄🎅🎁🎀🦌🤶🧝🎄🎁🎅🎀🦌🤶🧝🎄🎁... |
| 1 | 🎅🤶🎄🎁🎀🦌🧝🎅🤶🎄🎀🎁🦌🧝🎅🤶🎀🦌🧝🎄🎁🎅🤶🧝🎄🎁🎀🦌🎅🤶🧝🎄🎀🎁🦌🤶🎅🧝🎄🎀🎁🦌🤶🧝🎅🎄... |
| 2 | 🎅🤶🦌🎁🧝🎄🎀🎅🤶🎁🦌🧝🎄🎀🎅🤶🎁🎀🦌🧝🎄🤶🎅🎁🎀🦌🧝🎄🤶🎁🎅🎀🦌🧝🎄🤶🎁🎀🎅🦌🧝🎄🤶🎁🎀🦌... |


In [None]:
print(general_objective_func(df_submission))
print(count_nodes_objective_func(df_submission))
print(df_with_mrms_start.shape,  df_without_mrms_start.shape)

for string in df_submission['schedule']:
    print(np.where(np.array(list(string)) == '🌟'))

True
40089
(120, 1) (4920, 1)
(array([ 95, 300]),)
(array([116, 523]),)
(array([339, 544]),)
