## Distance predictor with simple NN

A simple self-supervised setup for predicting the number of moves needed to reach the final position.

This is a starting point for checking things like the state-space representation (for example, does `state = [1, 2, 3, 1]` work?) and, more generally, how well NNs can approximate permutation problems.

### Basic outline

- Generate a sequence of `n` moves by sampling uniformly from the available actions, with `n` drawn from some probability distribution
- Use `greedy_reduce` to simplify those moves into `n'` moves, $n' \leq n$
- Apply these `n'` moves to the puzzle to reach `start_state`
- A network $\mathcal{F}$ takes `start_state` as input, and the target output is `n'`
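
The outline above can be sketched end to end on a toy puzzle. Everything here is an assumption for illustration: the four-position puzzle, the move names, the uniform distribution over `n`, and `cancel_adjacent_inverses`, which only removes adjacent move/inverse pairs and is a stand-in for `greedy_reduce`, not its actual implementation.

```python
import numpy as np

# Hypothetical toy puzzle: 4 positions, two moves and their inverses.
# A move is a permutation p applied as new_state[i] = state[p[i]].
TOY_MOVES = {
    "a": [1, 2, 0, 3],   # 3-cycle on positions 0..2
    "-a": [2, 0, 1, 3],  # inverse of "a"
    "b": [0, 2, 3, 1],   # 3-cycle on positions 1..3
    "-b": [0, 3, 1, 2],  # inverse of "b"
}


def inverse_name(name):
    """Map "a" -> "-a" and "-a" -> "a"."""
    return name[1:] if name.startswith("-") else f"-{name}"


def cancel_adjacent_inverses(path):
    """Stand-in reducer: drop a move when it directly follows its inverse."""
    reduced = []
    for move in path:
        if reduced and reduced[-1] == inverse_name(move):
            reduced.pop()
        else:
            reduced.append(move)
    return reduced


def apply_moves(state, path, moves):
    """Apply the named moves to `state` in order."""
    for name in path:
        state = [state[i] for i in moves[name]]
    return state


def make_example(rng, moves, solved, max_len=10):
    """Return one (start_state, n') training pair."""
    names = list(moves)
    n = int(rng.integers(1, max_len + 1))  # n ~ Uniform{1..max_len} (assumed)
    path = [names[i] for i in rng.integers(0, len(names), size=n)]
    reduced = cancel_adjacent_inverses(path)  # n' = len(reduced) <= n
    start_state = apply_moves(solved, reduced, moves)
    return start_state, len(reduced)


rng = np.random.default_rng(0)
start_state, target = make_example(rng, TOY_MOVES, solved=[0, 1, 2, 3])
```

With this stand-in reducer, `n'` is only an upper bound on the true distance: for example, `["a", "a", "a"]` returns the toy puzzle to its solved state but is still labeled 3.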

### Some preliminary details

- Loss function: mean squared error (MSE)
- Neural network weights: $|s| \times 128$, $128 \times 128$, $128 \times 1$
- For 2x2 puzzles, we only need sequences of up to 10 to 14 moves
- Should the state-space representation be normalized?
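
A minimal NumPy sketch of a forward pass and loss with the layer shapes listed above. The ReLU activations, bias terms, He-style initialization, and the division by `state_size - 1` as an input normalization are assumptions; the notes above do not specify them.

```python
import numpy as np


def init_params(state_size, hidden=128, seed=0):
    """Create weights of shape |s| x 128, 128 x 128, 128 x 1, with zero biases."""
    rng = np.random.default_rng(seed)
    sizes = [(state_size, hidden), (hidden, hidden), (hidden, 1)]
    # He-style initialization (assumed, not taken from the notes)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)),
         np.zeros(fan_out))
        for fan_in, fan_out in sizes
    ]


def forward(params, states):
    """states: (batch, state_size) array -> (batch,) predicted move counts."""
    x = np.asarray(states, dtype=np.float64)
    x = x / (x.shape[1] - 1)  # assumed normalization of indices to [0, 1]
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)  # ReLU (assumed)
    w, b = params[-1]
    return (x @ w + b)[:, 0]


def mse_loss(pred, target):
    """Mean squared error between predictions and target move counts."""
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))


params = init_params(state_size=24)
batch = np.stack([np.arange(24), np.arange(24)[::-1]])
pred = forward(params, batch)
loss = mse_loss(pred, [0, 5])
```

The normalization line is one possible answer to the open question in the last bullet; dropping it leaves the raw indices as input.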

### Some hypotheses

- Expect better performance at low `n` than at high `n`
- Expect weird things to happen at `n` > 10
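
One way to check both hypotheses is to break the error down by target distance rather than report a single average. The sketch below does this for arbitrary prediction and target arrays; the function name and the choice of mean absolute error are illustrative assumptions.

```python
import numpy as np


def error_by_distance(pred, target):
    """Return {n: mean absolute error over examples whose target is n}."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target)
    return {
        int(n): float(np.mean(np.abs(pred[target == n] - n)))
        for n in np.unique(target)
    }


# Made-up numbers, only to show the output format
errors = error_by_distance(pred=[1.0, 1.5, 9.0, 12.0], target=[1, 1, 11, 11])
```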


In [37]:
import json
from typing import Dict, List
from collections import OrderedDict

import numpy as np

%pprint

Pretty printing has been turned OFF


In [63]:
from src.mechanism.permute import reverse_perm, permute_with_swap, perm_to_swap
from src.mechanism.utils import get_inverse_move
from src.mechanism.reduce import iterate_reduce_sequence

### Load a puzzle
Each puzzle is trained separately, so we load one puzzle at a time.

In [58]:
from typing import Tuple


def load_puzzle_moves(
    puzzle_name: str, convert_to_swaps=True
) -> Tuple[Dict[str, List[int]], int]:
    """Load the puzzle's moves (plus their inverses) and return them with the state size."""
    # load the moves:
    with open(f"puzzles/{puzzle_name}/moves.json") as f:
        moves = json.load(f)

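    # every move is a permutation of all positions, so its length is the state size
    num_states = len(next(iter(moves.values())))
    # add inverse moves, named with a "-" prefix
    reversed_moves = {}
    for move_name, perm in moves.items():
        reversed_perm = reverse_perm(perm)
        if reversed_perm == perm:
            # the move is its own inverse, so no separate entry is needed
            continue
        reversed_moves[f"-{move_name}"] = reversed_perm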

    moves.update(reversed_moves)

    if convert_to_swaps:
        moves = {name: perm_to_swap(perm) for name, perm in moves.items()}

    # # get final position (from the first puzzle), note that the actual state of this position doesn't really matter
    # # we just need to get the structure of the puzzle
    # df = pd.read_csv(f'puzzles/{puzzle_name}/puzzles.csv')
    # state = df.iloc[0].to_numpy()[3]

    return moves, num_states


puzzle_name = "cube_2x2x2"
move_dict, num_states = load_puzzle_moves(puzzle_name)

move_names = list(move_dict.keys())

final_state = list(range(num_states))

print(f"Loaded {puzzle_name} with {len(move_names)} moves and {num_states} states")

Loaded cube_2x2x2 with 12 moves and 24 states
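
For reference, the inverse of a permutation can be computed as below. This is a sketch of what a helper like `reverse_perm` presumably does, assuming a move `perm` acts as `new_state[i] = state[perm[i]]`; the actual implementation in `src.mechanism.permute` is not shown in this notebook and may use a different convention. The names `invert_permutation` and `apply_perm` are hypothetical.

```python
import numpy as np


def invert_permutation(perm):
    """Return inv such that inv[perm[i]] == i for every i."""
    return np.argsort(perm).tolist()


def apply_perm(state, perm):
    """Apply a permutation under the assumed convention new[i] = state[perm[i]]."""
    return [state[i] for i in perm]


perm = [2, 0, 3, 1]
inv = invert_permutation(perm)
state = list("abcd")
# applying a move and then its inverse restores the original state
restored = apply_perm(apply_perm(state, perm), inv)
```

A move whose inverse equals itself (for example a plain swap of two positions) needs no separate inverse entry, which is why `load_puzzle_moves` skips those.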


### Data generation

The self-supervised part: generate a sequence of `n` moves by sampling uniformly from the available actions, with `n` drawn from some probability distribution.

In [86]:
def sample_moves(move_names: List[str], n: int) -> List[str]:
    """Sample `n` move names uniformly at random, with replacement."""
    return np.random.choice(move_names, n).tolist()


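def generate_state_from_moves(move_names, move_dict, state, inverse=False):
    """Apply the named moves to `state` in order and return the resulting state.

    With `inverse=True`, each move is replaced by its inverse, but the order
    of the moves is not reversed.
    """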
    for move_name in move_names:
        if inverse:
            move_name = get_inverse_move(move_name)
        move = move_dict[move_name]
        state = permute_with_swap(state, move)

    return state


path = sample_moves(move_names, 5)

print(list(path))

path = iterate_reduce_sequence(list(path), puzzle_name)
print(path)

generate_state_from_moves(path, move_dict, final_state)

['r0', 'r1', '-f0', '-d0', 'r0']
['r0', 'r1', '-f0', '-d0', 'r0']


[4, 23, 10, 9, 21, 12, 15, 13, 1, 14, 0, 8, 11, 2, 5, 6, 17, 7, 20, 22, 18, 16, 19, 3]