# Tutorial notebook on how to search the barrier sequence database(s) to get sequences with specific attributes

In [1]:
import pandas as pd
import sys
sys.path.append("..")  # Use sys to add the parent directory (where src/hexmaze lives) to the path

## 1. Load a barrier sequence database

There are several different databases in the [`Barrier_Sequence_Databases`](../Barrier_Sequence_Databases/) folder:

`barrier_sequence_database` contains 3126 barrier sequences. This is a good place to start.
- Sequences are 4-6 mazes long
- Every barrier change results in at least one path getting shorter and one getting longer, AND the optimal path order changes (criteria_type=ALL)
- At least 9 hexes differ, combined across all optimal paths for all mazes in a sequence (min_hex_diff=9)

`barrier_sequences_starting_from_all_mazes` contains 55896 barrier sequences, each starting from a different maze in the `maze_configuration_database`.
- Some of these "sequences" contain only one maze because no good barrier changes were found. These mazes are still included as they make good candidates for probability change experiments where barrier changes are not needed.
- `barrier_sequence_database` (above) is the subset of this database where `sequence_length >= 4`
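The subset relationship can be checked directly. The sketch below assumes that both databases are already loaded as DataFrames and that they share the same index (the index of `barrier_sequence_database` appears to be the index of each sequence's starting maze). The helper `is_length_subset` is hypothetical, and the toy frames are made-up data, not real database rows:

```python
import pandas as pd


def is_length_subset(full_df: pd.DataFrame, subset_df: pd.DataFrame, min_length: int) -> bool:
    """Return True if subset_df holds exactly the rows of full_df with sequence_length >= min_length.

    Assumes both DataFrames share the same index (e.g. the starting maze index).
    """
    expected = full_df[full_df["sequence_length"] >= min_length]
    # Compare sorted indices so that row order does not matter
    return expected.index.sort_values().equals(subset_df.index.sort_values())


# Toy stand-ins for the real databases (made-up values, for illustration only)
full = pd.DataFrame({"sequence_length": [1, 4, 3, 6, 5]}, index=[0, 1, 2, 3, 4])
subset = full[full["sequence_length"] >= 4]

print(is_length_subset(full, subset, min_length=4))  # True
print(is_length_subset(full, subset, min_length=5))  # False
```

With the real data, you would pass the loaded `barrier_sequences_starting_from_all_mazes` DataFrame as `full_df` and `barrier_sequence_df` as `subset_df`, with `min_length=4`.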

`long_barrier_sequences` contains 438 long barrier sequences (generated by allowing `get_barrier_sequence` to make up to 200 recursive calls instead of the default 40).
- Sequences are 6-7 mazes long
- Every barrier change results in at least one path getting shorter and one getting longer, AND the optimal path order changes (criteria_type=ALL)
- At least 9 hexes differ, combined across all optimal paths for all mazes in a sequence (min_hex_diff=9)

`single_choice_point` contains 3720 barrier sequences where all mazes in the sequence have a single choice point.
- All sequences are at least 3 mazes long
- 104 sequences are 6 mazes long
- Every barrier change results in at least one path getting shorter and one getting longer, AND the optimal path order changes (criteria_type=ALL)
- At least 9 hexes differ, combined across all optimal paths for all mazes in a sequence (min_hex_diff=9)

`criteria_type_any_long_sequences` contains 1734 barrier sequences 5+ mazes long, with relaxed criteria (criteria_type=ANY) to generate longer sequences.
- All sequences are at least 5 mazes long
- 1338 sequences are 6+ mazes long, 564 are 7+, 334 are 8+, 220 are 9
- Every barrier change results in at least one path getting shorter and one getting longer, OR the optimal path order changes (criteria_type=ANY)
- At least 16 hexes differ, combined across all optimal paths for all mazes in a sequence (min_hex_diff=16)
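The "6+, 7+, 8+" counts above are cumulative. Here is a minimal sketch of how such counts can be computed from the `sequence_length` column. It uses a made-up toy DataFrame, and the helper name `count_at_least` is hypothetical:

```python
import pandas as pd


def count_at_least(df: pd.DataFrame, lengths) -> dict:
    """For each n in lengths, count the sequences that are at least n mazes long."""
    return {n: int((df["sequence_length"] >= n).sum()) for n in lengths}


# Toy data (made-up values, not taken from any real database)
toy_df = pd.DataFrame({"sequence_length": [5, 6, 6, 7, 9]})
print(count_at_least(toy_df, range(5, 10)))
# {5: 5, 6: 4, 7: 2, 8: 1, 9: 1}
```

For the non-cumulative distribution (exactly n mazes long), `df["sequence_length"].value_counts().sort_index()` gives one count per length instead.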

`criteria_type_any_starting_from_all_mazes` contains 55896 barrier sequences, each starting from a different maze in the `maze_configuration_database`, with relaxed criteria (criteria_type=ANY) to generate longer sequences.
- `criteria_type_any_long_sequences` (above) is the subset of this database where `sequence_length >= 5`
- Some of these "sequences" contain only one maze because no good barrier changes were found. These mazes are still included as they make good candidates for probability change experiments where barrier changes are not needed.
- Every barrier change results in at least one path getting shorter and one getting longer, OR the optimal path order changes (criteria_type=ANY)
- At least 16 hexes differ, combined across all optimal paths for all mazes in a sequence (min_hex_diff=16)

`1_choice_point_all_path_lengths_different_first5000` contains 112 barrier sequences where all mazes have a single choice point AND all 3 path lengths are different.
- All sequences are at least 3 mazes long; 6 sequences are 4 mazes long
- This was generated starting from the first 5000 mazes in the `maze_configuration_database`. To add to it, start from maze 5001


As I add new databases, I will document them here (and if you generate a new database, please document it here as well!)
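To compare databases side by side, you can load several of them at once. This is a minimal sketch. It assumes that every database is stored as `<name>.pkl` in `../Barrier_Sequence_Databases/`, which is only confirmed for `barrier_sequence_database.pkl` (loaded in the next cell). The helpers `load_databases` and `summarize` are hypothetical. So that the example runs on its own, it writes two made-up toy databases to a temporary folder:

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_databases(names, folder="../Barrier_Sequence_Databases"):
    """Load each database '<folder>/<name>.pkl' into a dict keyed by name."""
    folder = Path(folder)
    return {name: pd.read_pickle(folder / f"{name}.pkl") for name in names}


def summarize(databases):
    """One row per database: number of sequences and min/max sequence length."""
    rows = {
        name: {
            "n_sequences": len(df),
            "min_length": int(df["sequence_length"].min()),
            "max_length": int(df["sequence_length"].max()),
        }
        for name, df in databases.items()
    }
    return pd.DataFrame.from_dict(rows, orient="index")


# Self-contained demo: write two toy databases (made-up values) to a temp folder
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"sequence_length": [4, 5, 6]}).to_pickle(Path(tmp) / "toy_a.pkl")
    pd.DataFrame({"sequence_length": [6, 7]}).to_pickle(Path(tmp) / "toy_b.pkl")
    dbs = load_databases(["toy_a", "toy_b"], folder=tmp)
    summary = summarize(dbs)

print(summary)
```

With the real folder, a call might look like `load_databases(["barrier_sequence_database", "long_barrier_sequences"])`, provided the file names follow the assumed pattern.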

In [2]:
# Load the barrier sequence database of your choice as 'barrier_sequence_df'
barrier_sequence_df = pd.read_pickle('../Barrier_Sequence_Databases/barrier_sequence_database.pkl')
display(barrier_sequence_df)

| | barrier_sequence | sequence_length | barrier_changes | reward_path_lengths | choice_points |
|---|---|---|---|---|---|
| 18 | [(39, 7, 10, 42, 18, 20, 23, 26, 30), (37, 39,... | 4 | [[26, 37], [10, 17], [20, 24]] | [[25, 19, 17], [15, 19, 19], [25, 17, 19], [17... | [{29}, {17, 26, 29}, {29}, {26}] |
| 34 | [(34, 36, 37, 7, 45, 14, 17, 20, 28), (34, 36,... | 6 | [[17, 25], [36, 26], [26, 41], [41, 32], [34, ... | [[23, 19, 17], [15, 19, 23], [21, 19, 15], [15... | [{35}, {13}, {35}, {26, 35, 13}, {35, 36, 13},... |
| 45 | [(32, 34, 11, 44, 13, 46, 15, 21, 30), (32, 34... | 5 | [[13, 16], [16, 10], [30, 20], [20, 36]] | [[21, 15, 19], [17, 25, 19], [19, 15, 19], [19... | [{24}, {31}, {16, 24, 31}, {31}, {16}] |
| 46 | [(37, 7, 42, 44, 14, 20, 25, 28, 31), (37, 7, ... | 5 | [[31, 17], [17, 16], [16, 30], [37, 21]] | [[15, 17, 21], [23, 17, 17], [15, 19, 17], [15... | [{13}, {29}, {26}, {26, 13, 29}, {29}] |
| 60 | [(34, 37, 10, 45, 14, 15, 24, 25, 27), (34, 37... | 4 | [[24, 39], [39, 28], [27, 36]] | [[19, 21, 15], [19, 15, 17], [19, 17, 15], [17... | [{36}, {16, 24, 36}, {16, 35, 36}, {16}] |
| ... | ... | ... | ... | ... | ... |
| 55834 | [(39, 8, 42, 16, 19, 21, 25, 27, 30), (39, 8, ... | 5 | [[16, 10], [21, 36], [36, 17], [17, 24]] | [[17, 23, 17], [19, 15, 17], [17, 15, 19], [21... | [{31}, {16, 29, 31}, {16}, {29}, {31}] |
| 55848 | [(7, 42, 44, 14, 20, 23, 28, 30, 31), (7, 42, ... | 4 | [[31, 17], [17, 37], [30, 17]] | [[15, 17, 21], [23, 17, 17], [15, 17, 19], [23... | [{13}, {29}, {26, 13, 29}, {29}] |
| 55850 | [(36, 39, 8, 10, 42, 19, 21, 24, 27), (39, 8, ... | 4 | [[36, 30], [24, 17], [21, 24]] | [[19, 17, 23], [19, 25, 17], [21, 15, 17], [17... | [{16}, {31}, {29}, {31}] |
| 55852 | [(37, 8, 9, 45, 20, 21, 23, 27, 29), (35, 37, ... | 5 | [[29, 35], [35, 34], [37, 36], [36, 31]] | [[19, 21, 15], [19, 17, 25], [19, 19, 15], [17... | [{36}, {13}, {35, 36, 13}, {13}, {35}] |
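Each row stores one whole sequence. `barrier_sequence` holds one tuple of barrier locations per maze, and `barrier_changes` holds one pair per transition between consecutive mazes. Judging from the truncated rows shown, each pair appears to be `[barrier removed, barrier added]`, but this reading is an assumption. The sketch below checks it for one sequence. The helper `check_barrier_changes` is hypothetical, and the toy row is made up:

```python
def check_barrier_changes(barrier_sequence, barrier_changes) -> bool:
    """Check that each change [old, new] turns maze i into maze i+1.

    Assumes each pair is [barrier removed, barrier added]; this interpretation
    is inferred from the displayed rows, not confirmed by the database docs.
    """
    if len(barrier_changes) != len(barrier_sequence) - 1:
        return False
    for prev, nxt, (old, new) in zip(barrier_sequence, barrier_sequence[1:], barrier_changes):
        # Order within each tuple is arbitrary, so compare as sets
        if (set(prev) - {old}) | {new} != set(nxt):
            return False
    return True


# Made-up toy row (not from the real database)
toy_sequence = [(1, 2, 3), (1, 3, 4), (3, 4, 5)]
toy_changes = [[2, 4], [1, 5]]
print(check_barrier_changes(toy_sequence, toy_changes))  # True
```

To test the assumption on a whole database, you could run `barrier_sequence_df.apply(lambda row: check_barrier_changes(row['barrier_sequence'], row['barrier_changes']), axis=1).all()`.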


## 2. Filter the database based on certain criteria

In [3]:
# For example, keep only sequences that are at least 5 mazes long
filtered = barrier_sequence_df[barrier_sequence_df['sequence_length'] >= 5]
print(f"There are {len(filtered)} barrier sequences in our database that are at least 5 mazes long:")
display(filtered)

There are 1179 barrier sequences in our database that are at least 5 mazes long:


| | barrier_sequence | sequence_length | barrier_changes | reward_path_lengths | choice_points |
|---|---|---|---|---|---|
| 34 | [(34, 36, 37, 7, 45, 14, 17, 20, 28), (34, 36,... | 6 | [[17, 25], [36, 26], [26, 41], [41, 32], [34, ... | [[23, 19, 17], [15, 19, 23], [21, 19, 15], [15... | [{35}, {13}, {35}, {26, 35, 13}, {35, 36, 13},... |
| 45 | [(32, 34, 11, 44, 13, 46, 15, 21, 30), (32, 34... | 5 | [[13, 16], [16, 10], [30, 20], [20, 36]] | [[21, 15, 19], [17, 25, 19], [19, 15, 19], [19... | [{24}, {31}, {16, 24, 31}, {31}, {16}] |
| 46 | [(37, 7, 42, 44, 14, 20, 25, 28, 31), (37, 7, ... | 5 | [[31, 17], [17, 16], [16, 30], [37, 21]] | [[15, 17, 21], [23, 17, 17], [15, 19, 17], [15... | [{13}, {29}, {26}, {26, 13, 29}, {29}] |
| 123 | [(37, 40, 9, 42, 17, 18, 20, 25, 28), (37, 40,... | 6 | [[17, 10], [42, 32], [32, 36], [36, 26], [25, ... | [[23, 17, 17], [15, 19, 17], [17, 19, 15], [15... | [{29}, {17, 26, 29}, {17, 36, 29}, {17}, {29},... |
| 296 | [(34, 37, 7, 41, 10, 44, 15, 18, 25), (34, 37,... | 5 | [[7, 17], [17, 20], [20, 11], [44, 24]] | [[15, 17, 19], [21, 15, 19], [15, 21, 19], [17... | [{24, 17, 26}, {24}, {26}, {16, 24, 26}, {26}] |
| ... | ... | ... | ... | ... | ... |
| 55773 | [(32, 37, 7, 40, 11, 44, 17, 20, 28), (32, 35,... | 5 | [[17, 35], [35, 21], [37, 36], [21, 17]] | [[21, 17, 15], [17, 17, 23], [19, 17, 15], [17... | [{29}, {13}, {13, 36, 29}, {13}, {29}] |
| 55797 | [(32, 34, 37, 10, 45, 14, 15, 25, 30), (32, 34... | 5 | [[30, 23], [23, 40], [32, 41], [40, 20]] | [[19, 15, 21], [19, 17, 15], [19, 15, 17], [17... | [{16}, {16, 35, 36}, {16, 24, 36}, {16, 24, 26... |
| 55834 | [(39, 8, 42, 16, 19, 21, 25, 27, 30), (39, 8, ... | 5 | [[16, 10], [21, 36], [36, 17], [17, 24]] | [[17, 23, 17], [19, 15, 17], [17, 15, 19], [21... | [{31}, {16, 29, 31}, {16}, {29}, {31}] |
| 55852 | [(37, 8, 9, 45, 20, 21, 23, 27, 29), (35, 37, ... | 5 | [[29, 35], [35, 34], [37, 36], [36, 31]] | [[19, 21, 15], [19, 17, 25], [19, 19, 15], [17... | [{36}, {13}, {35, 36, 13}, {13}, {35}] |


In [4]:
# Or keep only sequences where every maze has a single choice point
filtered = barrier_sequence_df[barrier_sequence_df['choice_points'].apply(lambda x: all(len(cp) == 1 for cp in x))]
print(f"There are {len(filtered)} barrier sequences in our database where each maze in the sequence has only 1 choice point:")
display(filtered)

There are 1147 barrier sequences in our database where each maze in the sequence has only 1 choice point:


| | barrier_sequence | sequence_length | barrier_changes | reward_path_lengths | choice_points |
|---|---|---|---|---|---|
| 66 | [(9, 42, 44, 17, 22, 23, 24, 27, 30), (9, 42, ... | 4 | [[24, 20], [17, 13], [20, 26]] | [[17, 23, 17], [23, 17, 17], [15, 21, 17], [23... | [{31}, {29}, {31}, {29}] |
| 118 | [(39, 9, 42, 10, 15, 22, 24, 25, 30), (35, 39,... | 4 | [[24, 35], [9, 11], [35, 20]] | [[15, 21, 17], [15, 17, 19], [17, 15, 19], [17... | [{31}, {17}, {16}, {31}] |
| 193 | [(34, 37, 42, 13, 45, 14, 18, 19, 25), (34, 37... | 4 | [[14, 7], [42, 27], [7, 31]] | [[21, 17, 17], [15, 19, 17], [17, 19, 15], [19... | [{35}, {26}, {36}, {35}] |
| 201 | [(34, 37, 41, 10, 14, 15, 17, 19, 25), (34, 37... | 4 | [[17, 31], [31, 29], [19, 30]] | [[21, 17, 17], [17, 17, 21], [17, 21, 17], [17... | [{35}, {16}, {26}, {16}] |
| 252 | [(34, 37, 40, 10, 12, 18, 20, 25, 27), (34, 35... | 4 | [[20, 35], [35, 31], [27, 42]] | [[17, 21, 17], [17, 17, 21], [21, 17, 17], [15... | [{36}, {17}, {24}, {17}] |
| ... | ... | ... | ... | ... | ... |
| 55556 | [(32, 37, 40, 10, 12, 45, 18, 25, 29), (32, 37... | 4 | [[40, 13], [32, 41], [13, 31]] | [[17, 17, 21], [17, 19, 15], [15, 19, 17], [15... | [{17}, {36}, {26}, {17}] |
| 55682 | [(37, 42, 10, 12, 45, 18, 25, 29, 30), (37, 10... | 4 | [[30, 28], [42, 27], [28, 40]] | [[15, 17, 19], [15, 19, 17], [17, 19, 15], [17... | [{17}, {26}, {36}, {17}] |
| 55712 | [(34, 37, 10, 42, 45, 14, 19, 24, 25), (34, 37... | 4 | [[24, 17], [17, 31], [42, 13]] | [[17, 21, 17], [21, 17, 17], [17, 17, 21], [19... | [{26}, {35}, {16}, {35}] |
| 55755 | [(9, 42, 10, 44, 13, 15, 22, 25, 30), (9, 10, ... | 4 | [[13, 29], [9, 8], [29, 24]] | [[15, 21, 17], [15, 17, 19], [17, 15, 19], [17... | [{31}, {17}, {16}, {31}] |
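You can combine conditions with `&`, wrapping each condition in parentheses. The minimal sketch below roughly mimics the `1_choice_point_all_path_lengths_different` criteria on a made-up toy DataFrame. The helper names are hypothetical; the column names match the database above:

```python
import pandas as pd


def all_single_choice_point(choice_points) -> bool:
    """True if every maze in the sequence has exactly one choice point."""
    return all(len(cp) == 1 for cp in choice_points)


def all_path_lengths_different(reward_path_lengths) -> bool:
    """True if, in every maze, the reward path lengths are all different."""
    return all(len(set(lengths)) == len(lengths) for lengths in reward_path_lengths)


# Toy rows (made-up values, shaped like the real columns)
toy_df = pd.DataFrame({
    "sequence_length": [3, 3, 4],
    "choice_points": [[{31}, {29}, {16}], [{31}, {16, 29}, {16}], [{13}, {29}, {26}, {29}]],
    "reward_path_lengths": [
        [[15, 17, 19], [17, 19, 21], [15, 19, 21]],
        [[15, 17, 19], [17, 19, 21], [15, 19, 21]],
        [[15, 17, 19], [17, 17, 21], [15, 19, 21], [15, 17, 21]],
    ],
})

mask = (
    (toy_df["sequence_length"] >= 3)
    & toy_df["choice_points"].apply(all_single_choice_point)
    & toy_df["reward_path_lengths"].apply(all_path_lengths_different)
)
print(toy_df[mask].index.tolist())  # [0]
```

Row 1 fails because one maze has two choice points, and row 2 fails because one maze has two paths of length 17.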


Note that searches like the one above rule out every sequence in which ANY maze has more than one choice point.
If a sequence breaks this rule only at, say, its 5th maze (as many long sequences do), the whole sequence is discarded, even though its first 4 mazes may form a valid subsequence.
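To salvage the valid part of such sequences from an existing database, one option is to keep the longest run of consecutive mazes that meets the criterion. This minimal sketch assumes that `barrier_changes[i]` is the change between maze `i` and maze `i + 1`; the helper names are hypothetical. `min_hex_diff` was computed over the whole original sequence, so a trimmed subsequence is not guaranteed to still meet it:

```python
def longest_valid_run(choice_points):
    """Return (start, end) of the longest run of consecutive mazes with exactly
    one choice point each, as a half-open slice; (0, 0) if there is none."""
    best = (0, 0)
    start = None
    for i, cp in enumerate(choice_points):
        if len(cp) == 1:
            if start is None:
                start = i
            if i + 1 - start > best[1] - best[0]:
                best = (start, i + 1)
        else:
            start = None
    return best


def trim_sequence(row, start, end):
    """Slice every per-maze column of a row (dict or pandas Series) to mazes start..end-1."""
    return {
        "barrier_sequence": row["barrier_sequence"][start:end],
        # One change per transition, so keep changes start..end-2 (empty if end <= start + 1)
        "barrier_changes": row["barrier_changes"][start:max(start, end - 1)],
        "reward_path_lengths": row["reward_path_lengths"][start:end],
        "choice_points": row["choice_points"][start:end],
        "sequence_length": end - start,
    }


# Made-up toy row; barrier tuples replaced by placeholder strings for brevity
toy_row = {
    "barrier_sequence": ["maze0", "maze1", "maze2", "maze3", "maze4"],
    "barrier_changes": [[7, 17], [17, 20], [20, 11], [44, 24]],
    "reward_path_lengths": [[15, 17, 19], [21, 15, 19], [15, 21, 19], [17, 19, 21], [19, 17, 15]],
    "choice_points": [{24, 17, 26}, {24}, {26}, {16}, {16, 24, 26}],
}

start, end = longest_valid_run(toy_row["choice_points"])
trimmed = trim_sequence(toy_row, start, end)
print(start, end, trimmed["sequence_length"])  # 1 4 3
```

On a real DataFrame, `barrier_sequence_df['choice_points'].apply(longest_valid_run)` gives one `(start, end)` pair per row. You can then keep the rows whose run is long enough.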

In these cases, it is often preferable to generate a new database that fits the desired criteria (which is why we have the `single_choice_point` database!). 


See [`Generate_Custom_Barrier_Sequence_Database.ipynb`](Generate_Custom_Barrier_Sequence_Database.ipynb) for more information.