## Restoration of superposition matrices

This notebook uses functions from `algorithms.py` and `utils.py`. Refer to `requirements.txt` to install the required packages. The code was tested in an environment with the Python version shown below.

In [1]:
import sys
print(sys.version)

3.7.7 (default, May  7 2020, 21:25:33) 
[GCC 7.3.0]


In [None]:
!wget https://raw.githubusercontent.com/Intelligent-Systems-Phystech/Neychev_PhD_Thesis/main/SymbolicRegressionPaper/code/algorithms.py -nc
!wget https://raw.githubusercontent.com/Intelligent-Systems-Phystech/Neychev_PhD_Thesis/main/SymbolicRegressionPaper/code/utils.py -nc

In [None]:
import random

import numpy as np
import matplotlib
from matplotlib import pyplot as plt

In [None]:
from algorithms import add_noise_to_matrix, make_random_correct_adj_matrix, restore_matrix
from utils import do_exp, do_exp_multiple, make_plot, make_plot_multiple, generate_arities_list

Sanity check of `generate_arities_list`, `make_random_correct_adj_matrix` and `add_noise_to_matrix`. The original (correct) and noisy adjacency matrices are printed below. The noise level is set explicitly and should take a value in $[0, 1]$.

In [None]:
np.random.seed(421)
random.seed(a=472443)

arities = generate_arities_list(3)
var_number = 1
print(arities)

new_matrix = make_random_correct_adj_matrix(arities, var_number, complexity_limit=-1)

print(new_matrix)

noisy_matrix = add_noise_to_matrix(
    new_matrix, noise_level=0.2, noise_variant="uniform", calibration_variant="linear"
)

print(noisy_matrix)
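To make the role of `noise_level` concrete, here is a minimal sketch of one plausible noise model. It is an assumption, not the code behind `add_noise_to_matrix`: the hypothetical helper `add_uniform_noise_linear` blends the 0/1 matrix linearly with uniform noise, so a level of 0 leaves the matrix unchanged and a level of 1 replaces it with pure noise.

```python
import numpy as np


def add_uniform_noise_linear(matrix, noise_level, rng):
    """Hypothetical sketch: blend a 0/1 adjacency matrix with uniform noise.

    Returns (1 - noise_level) * matrix + noise_level * U with U ~ Uniform[0, 1),
    so every entry stays in [0, 1]; noise_level = 0 returns the matrix unchanged.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must lie in [0, 1]")
    matrix = np.asarray(matrix, dtype=float)
    uniform = rng.random(matrix.shape)
    return (1.0 - noise_level) * matrix + noise_level * uniform


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1], [0, 0, 0], [0, 0, 0]])
print(add_uniform_noise_linear(adj, 0.2, rng).round(2))
```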

Restoring the original matrix from its noisy version:

In [None]:
restore_matrix(
    arities,
    var_number,
    noisy_matrix,
    method="kmst_prim_incor",
    eps=0.0001,
    max_cmplx=-1,
    prize_coef=0.5,
)
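The method names `prim_fast` and `kmst_prim*` suggest that some restoration routines build on Prim's algorithm. As a generic illustration only (not the implementation in `algorithms.py`), the sketch below runs the maximum-weight variant of Prim's algorithm on a dense weight matrix. It assumes the weights are symmetric (a directed noisy matrix could, for instance, be symmetrized first) and ignores arity constraints, which the real methods presumably enforce.

```python
import numpy as np


def prim_maximum_spanning_tree(weights):
    """Prim's algorithm (maximum-weight variant) on a dense symmetric matrix.

    Grows the tree from node 0 and returns a 0/1 matrix with tree[parent, child] = 1.
    """
    weights = np.asarray(weights, dtype=float)
    n = weights.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = weights[0].copy()         # heaviest known edge from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree node that provides that edge
    tree = np.zeros((n, n), dtype=int)
    for _ in range(n - 1):
        # Attach the outside node with the heaviest connecting edge.
        candidates = np.where(in_tree, -np.inf, best)
        child = int(np.argmax(candidates))
        tree[parent[child], child] = 1
        in_tree[child] = True
        # The new node may offer heavier edges to nodes still outside the tree.
        improve = (~in_tree) & (weights[child] > best)
        best[improve] = weights[child][improve]
        parent[improve] = child
    return tree


noisy = np.array([
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.2, 0.7],
    [0.8, 0.2, 0.0, 0.3],
    [0.1, 0.7, 0.3, 0.0],
])
print(prim_maximum_spanning_tree(noisy))  # edges 0->1, 0->2, 1->3
```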

__Experimental setup:__

- 200 arities (from 5 to 50);
- 100 adjacency matrices generated for each;
- each matrix noised and restored several (10) times;
- adjustable parameters: noise level and type, number of variables, arity, complexity.
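The measurement protocol of this setup can be sketched as follows: for each noise level, every matrix is noised and restored several times, and the fraction of exactly restored matrices is recorded. Everything here is a toy assumption rather than what `do_exp_multiple` does: the hypothetical `recovery_rate` uses a linear uniform-noise blend, random 0/1 matrices instead of correct superposition matrices, and a deliberately trivial restoration by thresholding at 0.5.

```python
import numpy as np


def recovery_rate(matrices, noise_level, restarts, rng):
    """Hypothetical protocol: fraction of (matrix, restart) pairs restored exactly.

    Assumed noise model: linear blend with Uniform[0, 1) noise.
    Assumed restoration: threshold the noisy matrix at 0.5.
    """
    hits = 0
    for matrix in matrices:
        for _ in range(restarts):
            noisy = (1.0 - noise_level) * matrix + noise_level * rng.random(matrix.shape)
            restored = (noisy > 0.5).astype(int)
            hits += int(np.array_equal(restored, matrix))
    return hits / (len(matrices) * restarts)


rng = np.random.default_rng(42)
# 100 random 0/1 matrices stand in for the correct adjacency matrices.
matrices = [rng.integers(0, 2, size=(6, 6)) for _ in range(100)]
noise_grid = np.linspace(0.0, 1.0, 51)
rates = [recovery_rate(matrices, a, restarts=10, rng=rng) for a in noise_grid]
```

Under this toy model recovery is exact for every noise level below 0.5, because a true edge keeps a value above $1 - a > 0.5$ and a non-edge stays below $a < 0.5$; at level 1 the noisy matrix is pure noise and exact recovery becomes essentially impossible.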

### Experiments

### Small arities, main algorithms

In [None]:
algs = [
    "greedy_dfs",
    "greedy_bfs",
    "prim_fast",
    "kmst_pure",
    "kmst_dfs",
    "kmst_bfs",
    "kmst_prim",
]

In [None]:
%%time
# Warning: with repeats = 50 this cell takes around 15 minutes to run

repeats = 2

recovered_total = []
for alg in algs:
    np.random.seed(421)
    random.seed(a=472443)
    recovered_total_per_alg = do_exp_multiple(repeats,
                                              1, (5, 20), [alg], 20, 5, np.linspace(0.0, 1.0, 51),
                                              'uniform', 'linear', 1, 5, -1, 0.2, 0.5)
    recovered_total.append(recovered_total_per_alg)
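The loop above reseeds both `random` and `np.random` before each algorithm. Assuming `do_exp_multiple` draws only from these two global generators, every algorithm is then evaluated on the same random matrices and the same noise, which makes the comparison fair. The small check below (with a hypothetical `draw_sample` standing in for an experiment) shows that reseeding reproduces identical draws.

```python
import random

import numpy as np


def draw_sample():
    """Draw from both global generators, as an experiment using both would."""
    return random.random(), np.random.rand(3)


samples = []
for _ in range(2):  # e.g. two different algorithms
    np.random.seed(421)
    random.seed(a=472443)
    samples.append(draw_sample())

(py_a, np_a), (py_b, np_b) = samples
print(py_a == py_b, np.array_equal(np_a, np_b))  # True True
```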

In [None]:
print(recovered_total[0].shape)
new_recovered_total_array = np.zeros(
    (recovered_total[0].shape[0], len(algs), recovered_total[0].shape[2])
)
for i, rec in enumerate(recovered_total):
    new_recovered_total_array[:, i : i + 1, :] = rec.copy()
print(new_recovered_total_array.shape)
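The slice assignment `[:, i : i + 1, :]` suggests that each per-algorithm result has shape `(n, 1, m)`. Under that assumption, the preallocate-and-fill loop is equivalent to concatenating the results along the algorithm axis, as this sketch with synthetic arrays shows (the sizes 4, 7 and 51 are arbitrary).

```python
import numpy as np

# Synthetic stand-ins: 7 algorithms, each result shaped (n_experiments, 1, n_noise_levels).
rng = np.random.default_rng(0)
per_alg = [rng.random((4, 1, 51)) for _ in range(7)]

# Preallocate-and-fill, as in the cell above.
filled = np.zeros((per_alg[0].shape[0], len(per_alg), per_alg[0].shape[2]))
for i, rec in enumerate(per_alg):
    filled[:, i : i + 1, :] = rec

# Equivalent one-liner: concatenate along the algorithm axis.
stacked = np.concatenate(per_alg, axis=1)
print(stacked.shape)  # (4, 7, 51)
```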

In [None]:
saveas = "main_algs_full_alpha006_maxarity_5_20.eps"
make_plot_multiple(
    algs,
    new_recovered_total_array,
    repeats,
    0.06,
    0.0,
    np.linspace(0.0, 1.0, 51),
    saveas,
    (14, 7),
)
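For readers without `utils.py` at hand, the sketch below shows one way to plot such results with plain matplotlib: the values are averaged over the first axis and drawn as one curve per algorithm against the noise level. It is not `make_plot_multiple`; the roles of its positional arguments `0.06` and `0.0` are not documented in this notebook, so the sketch omits them, and `plot_mean_recovery` and the synthetic data are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import numpy as np
from matplotlib import pyplot as plt


def plot_mean_recovery(alg_names, results, noise_grid, figsize=(14, 7)):
    """Plot results averaged over axis 0, one curve per algorithm.

    `results` is assumed to have shape (n_experiments, n_algorithms, n_noise_levels).
    """
    fig, ax = plt.subplots(figsize=figsize)
    mean = results.mean(axis=0)
    for name, curve in zip(alg_names, mean):
        ax.plot(noise_grid, curve, label=name)
    ax.set_xlabel("noise level")
    ax.set_ylabel("recovery rate")
    ax.legend()
    return fig, ax


names = ["alg_a", "alg_b", "alg_c"]
grid = np.linspace(0.0, 1.0, 51)
fake = np.random.default_rng(0).random((4, len(names), grid.size))
fig, ax = plot_mean_recovery(names, fake, grid)
```

The figure can then be written to disk in the same format as above with `fig.savefig("some_name.eps")`.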

In [None]:
saveas = "main_algs_full_alpha000_maxarity_5_20.eps"
make_plot_multiple(
    algs,
    new_recovered_total_array,
    repeats,
    0.00,
    0.0,
    np.linspace(0.0, 1.0, 51),
    saveas,
    (14, 7),
)

In [None]:
saveas = "main_algs_local_alpha000_maxarity_5_20.eps"
make_plot_multiple(
    algs,
    new_recovered_total_array[:, :, 24:35],
    repeats,
    0.0,
    0.0,
    np.linspace(0.48, 0.68, 11),
    saveas,
    (5, 7),
)
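The zoomed plot slices indices `24:35` of the noise axis and labels them with `np.linspace(0.48, 0.68, 11)`. The full grid `np.linspace(0.0, 1.0, 51)` has step 0.02, so index $i$ corresponds to noise level $i / 50$; the quick check below confirms that the slice and the local grid coincide.

```python
import numpy as np

full_grid = np.linspace(0.0, 1.0, 51)  # step 0.02, index i -> i / 50
local_grid = np.linspace(0.48, 0.68, 11)

# Indices 24..34 of the full grid are exactly the 11 points of the zoomed grid.
print(full_grid[24:35])
print(np.allclose(full_grid[24:35], local_grid))  # True
```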

### Small arities, basic algorithms enhanced with proposed heuristics

In [None]:
algs = [
    "greedy_dfs",
    "greedy_bfs",
    "prim_fast",
    "kmst_pure_incor",
    "kmst_dfs_incor",
    "kmst_bfs_incor",
    "kmst_prim_incor",
]

In [None]:
%%time
# Warning: with repeats = 50 this cell takes around 15 minutes to run

repeats = 2

recovered_total_incor = []
for alg in algs:
    np.random.seed(421)
    random.seed(a=472443)
    recovered_total_per_alg = do_exp_multiple(repeats,
                                              1, (5, 20), [alg], 20, 5, np.linspace(0.0, 1.0, 51),
                                              'uniform', 'linear', 1, 5, -1, 0.2, 0.5)
    recovered_total_incor.append(recovered_total_per_alg)

In [None]:
print(recovered_total_incor[0].shape)
new_recovered_total_incor_array = np.zeros(
    (recovered_total_incor[0].shape[0], len(algs), recovered_total_incor[0].shape[2])
)
for i, rec in enumerate(recovered_total_incor):
    new_recovered_total_incor_array[:, i : i + 1, :] = rec.copy()
print(new_recovered_total_incor_array.shape)

In [None]:
saveas = "incor_algs_full_alpha006_maxarity_5_20.eps"
make_plot_multiple(
    algs,
    new_recovered_total_incor_array,
    repeats,
    0.06,
    0.0,
    np.linspace(0.0, 1.0, 51),
    saveas,
    (14, 7),
    True,
)

In [None]:
saveas = "incor_algs_full_alpha000_maxarity_5_20.eps"
make_plot_multiple(
    algs,
    new_recovered_total_incor_array,
    repeats,
    0.00,
    0.0,
    np.linspace(0.0, 1.0, 51),
    saveas,
    (14, 7),
    True,
)

In [None]:
saveas = "incor_algs_local_alpha000_maxarity_5_20.eps"
make_plot_multiple(
    algs,
    new_recovered_total_incor_array[:, :, 24:35],
    repeats,
    0.0,
    0.0,
    np.linspace(0.48, 0.68, 11),
    saveas,
    (5, 7),
    True,
)